2021 — Rockstar Data Engineer Roadmap
This article presents a roadmap for those who want to become Data Engineers in 2021. It also serves as a reference for learning and deepening your understanding of the skills this position requires.
For each skill category, I list the recommended skills with links to video and text courses, plus some reference books at the end. I have no association with the authors, and this is not a referral article. The goal is to give you an easy way to navigate and improve your skills :)!
To become a skilled data engineer, you need to know the fundamentals of computer science, software development, networking, and databases.
Networking Fundamentals
Reference Books
Computer Networking: A Top-Down Approach
Database Fundamentals
- Relational Algebra / Normalisation.
- SQL.
- ACID / CAP.
- OLTP vs OLAP.
- Data warehouses and data marts.
- Data lake / Cloud Data Platform.
Reference Books
Readings in Database Systems
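To make the ACID item in the list above more concrete, here is a minimal sketch of atomicity using Python's built-in sqlite3 module. The `accounts` table, the `transfer` helper, and the sample balances are hypothetical names invented for this illustration; they do not come from any of the referenced courses or books.

```python
import sqlite3


def make_db():
    # Hypothetical in-memory table; the CHECK constraint forbids negative balances.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            # Credit first on purpose: if the debit below fails,
            # this credit must be undone as well.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

Calling `transfer(conn, "alice", "bob", 500)` on a fresh database returns `False` and leaves both balances untouched, which is the "A" in ACID: either every statement in the transaction takes effect or none does.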
After the fundamentals, you need to get comfortable with specific technologies. Here is a non-exhaustive list of tools to store and process data and to build data pipelines.
Data Storage
- Relational: MySQL, PostgreSQL, MariaDB, Amazon Aurora.
- Document: MongoDB, Elasticsearch, Apache CouchDB.
- Wide column: Cassandra, HBase, Google Bigtable.
- Graph: Neo4j, Amazon Neptune.
- Key-value: Redis, Memcached, Amazon DynamoDB.
- Messaging: RabbitMQ, Apache ActiveMQ.
- Data warehouses: Snowflake, Presto, Apache Hive, Amazon Redshift, Google BigQuery.
- Data lakes: Amazon S3, ADLS Gen2.
Reference Books
Designing Data-Intensive Applications
Data Processing
- Cluster Computing: Hadoop, HDFS, MapReduce.
- Batch: Apache Pig, Apache Arrow, Apache Impala.
- Hybrid: Apache Spark, Apache Beam, Apache Flink.
- Streaming: Apache Kafka, Apache Storm, Apache Samza, Amazon Kinesis.
- Managed Solutions: Databricks, Amazon EMR, Google Dataproc, Azure HDInsight.
Workflow Scheduling / ETL
DevOps & Security
- Jenkins, Azure DevOps, AWS CodePipeline, GitHub Actions.
- Docker, Kubernetes, and Helm.
- Active Directory / Azure Active Directory.
- Encryption, Key Management.
- Data Governance, GDPR.
It’s time to prove your skills. Here are some certifications that can help you get your dream job or just make you more confident.
Valuable Certifications
Conclusion
Keep this story bookmarked! It can help you learn efficiently and become a better data engineer. You are not born a guru; it is continuous trial and error that improves your skills.