Before discussing the skills you need to be a successful Data Engineer, it is important to understand who a Data Engineer is and what a Data Engineer does.
Who is a Data Engineer?
A Data Engineer is responsible for the integration, migration, and transformation of data into a data platform so that Data Analysts, Data Scientists, and other Data Engineers can use that data easily and efficiently. If, as a data professional, you find yourself doing any of these tasks, you have been involved in Data Engineering.
Earlier in my journey (still early 😉) as a Data Engineer, I was curious to know the tools and technologies I needed to become a top-tier Data Engineer. From my search, two skills were consistent, and no, they weren’t SQL, Python, Apache, or even Excel.
Every blog I checked had its own Top 10 tools and technologies every Data Engineer MUST have, but I noticed most of the skills were not consistent in ranking or recurrence from one list to the next. The inconsistency makes sense because, as Data Engineers, we are problem solvers, and we face different problems depending on our company’s industry, data culture, and technology stack.
It is easy to spot the hottest skill right now, but Data Engineering tools are evolving so fast that today’s skill might be obsolete tomorrow. A good example: the design I made last month (March 2022) to build a data lakehouse by integrating Databricks with GCP might not be the best approach this month (April 2022), given the recent introduction of BigLake (currently in Preview). If you get stuck thinking you need one ultimate tool, you will miss out on the constantly evolving ecosystem. One good strategy to win is to understand the fundamentals of data engineering and master the skill of LEARNING.
Yes, learning is one of the ultimate skills for every Data Engineer. There will always be a better way to solve yesterday’s problems today. Hence, you shouldn’t worry about the tools or the platform, but about how you can transfer your knowledge of one technology to another.
Don’t overwhelm yourself with what is predicted to trend in the next 10 years; instead, focus on understanding the fundamentals needed to solve the problem you are facing now. That is the knowledge you will leverage to solve future problems. Learn at your own pace and be willing to take on new challenges.
Learning shouldn’t focus only on tools and technologies. Learn from professionals in your field: find out how they are doing things and what they plan to explore, then evaluate their path against your vision for your own career. Don’t follow blindly.
The second skill is the ability to solve problems. Data Engineers love solving problems, and logical problem-solving is a great asset to every Data Engineer because, combined with your knowledge of new technologies and processes, it lets you break complex real-life problems down into solutions that fit the available technology. The more you learn, the simpler your solutions will be.
CALL TO ACTION
Yes, it can be overwhelming to start learning about a completely new field. To ease that pressure, I recommend you cut to the chase and start with a not-so-perfect plan: learn one Data Engineering topic or technology at a time, and start with the topic that interests you most to get your learning momentum going. Learn what is relevant to your current challenges, then look ahead to future solutions. Take it a day at a time.
“Do what you can, Now!”
For starters, you need to understand SQL (this is important) and a scripting language like Python; then you can go a step further by learning to write basic bash scripts and use the Linux terminal. But overall, no single tool is the most important for all Data Engineers. The tool that matters least to one Data Engineer could be the only reason another Data Engineer gets hired at a different company.
When you understand the different data formats and how they can be transformed (CSV to JSON, GraphQL to CSV, JSON to an RDBMS…), you will find it much easier to extract and transform data from different APIs. Knowledge of SQL will aid your understanding of data modelling. Now you see how each new skill can make it easier to acquire the next one.
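To make the format conversions above a little more concrete, here is a minimal Python sketch that turns CSV text into JSON and then loads that JSON into a relational table. It is only an illustration under my own assumptions: the sample data, the `people` table, and the helper names `csv_to_json` and `json_to_sqlite` are all made up for this example, and I use an in-memory SQLite database as a stand-in for whatever RDBMS you actually work with.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample data; the columns are invented for illustration.
CSV_TEXT = "id,name,city\n1,Ada,Lagos\n2,Grace,Accra\n"


def csv_to_json(csv_text: str) -> str:
    """Parse CSV text and return the rows as a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows)


def json_to_sqlite(json_text: str, conn: sqlite3.Connection) -> int:
    """Load a JSON array of objects into a table and return the row count."""
    records = json.loads(json_text)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS people (id INTEGER, name TEXT, city TEXT)"
    )
    # CSV values arrive as strings, so cast the id explicitly before loading.
    conn.executemany(
        "INSERT INTO people (id, name, city) VALUES (?, ?, ?)",
        [(int(r["id"]), r["name"], r["city"]) for r in records],
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]


# CSV -> JSON -> relational table, using an in-memory database.
json_text = csv_to_json(CSV_TEXT)
conn = sqlite3.connect(":memory:")
row_count = json_to_sqlite(json_text, conn)
names = [row[0] for row in conn.execute("SELECT name FROM people ORDER BY id")]

print(json_text)
print(row_count, names)
```

Notice that the CSV step gives you every value as a string, which is why the load step has to decide on types. That small decision is exactly the kind of fundamental that carries over from one tool to the next.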
If you are still in doubt about how to start your Data Engineering journey, I suggest you begin with SQL and one of the top cloud computing platforms (Azure, GCP, or AWS). Next, work on understanding the fundamentals of Data Engineering, and the list of things to learn goes on…
Still in doubt? Feel free to contact me on LinkedIn.