In our data-driven economy, new job roles are emerging. Two of these are data scientists and data engineers. But what do they involve? Let’s find out.
Although we are only at the frontier of the information age, it has already spawned a digital revolution. Core to this is big data: the constant stream of information that’s reshaping the way our society and economy work. The existence of big data alone has transformed our shopping habits, our access to healthcare and education, how our businesses are run, and of course, our job market. Two fresh fields in this area are data science and data engineering. But what’s the difference between them, and which, if either, is the right one for you?
In this post, we’ll look at the differences between data science and data engineering, asking:
- Data science vs. data engineering: what’s the difference?
- What are the key skills for data scientists and data engineers?
- How much do data scientists and data engineers earn?
- Should you become a data scientist or a data engineer?
- Key takeaways
Ready to learn about two possible new career paths? Read on.
1. Data science vs. data engineering: what’s the difference?
Because data science and data engineering are relatively new, related fields, there is sometimes confusion about what distinguishes them. Toss the word ‘data’ into a job title, and people (at least those who aren’t in the know) tend to lump things in together! In reality, data science and data engineering are two very distinct roles. Let’s explore further.
What is data science?
Data science is an interdisciplinary field of scientific study. It focuses on obtaining insights from very large datasets (or ‘big data’). Data scientists may work in any number of industries, from business to government or the applied sciences. However, all data scientists share a common goal: to analyze information and to obtain insights from that information that are relevant to their field of work.
For example, in business, big tech companies often hire data scientists to help them perfect their customer recommendation algorithms (or to tailor the customer experience in other ways). The finance industry uses data science to help inform the creation of new products. In healthcare, big data can be used to diagnose disease. The list goes on and on.
Most data scientists start their careers in areas related to math and statistics. They usually then develop into areas like data analytics and machine learning. Skills required range from knowledge of computer science to information visualization, communication, and business. However, data scientists also require a great deal of technical knowledge, such as how to apply complex data modeling architectures. This is one area where data science overlaps with data engineering (which we’ll explore later).
Increasingly, many data scientists are carving niche careers in very specialized areas. This is possible due to the deluge of data that now impacts every part of our lives. In every industry, the demand for data scientists is growing. This is why data science is considered one of the ‘sexiest’ careers of the 21st century!
What is data engineering?
Data engineering (also known as information engineering, or information systems engineering) is a software engineering approach. A data engineer’s job is to build the appropriate software architecture to collect and funnel big data. Others working in the field (including data scientists) can then use these data. While data engineering and data science both involve working with big data, this is largely where the similarities end. Data engineering has a much more specialized focus.
A data engineer’s role is to build or unify different aspects of complex systems, taking into account the information required, a business’s goals, and the needs of the end-user. This involves creating highly complex data pipelines. Just like oil pipelines, these data pipelines collect raw, unstructured data from any number of different sources. They then channel them into a single database (or larger structure) where they are stored. While data scientists also source data as part of their role, unlike data engineers, this is not their main focus.
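To make the pipeline idea concrete, here is a minimal sketch in Python. It is only an illustration, not a production design: the two inline sources, the field names, and the `purchases` table are all made-up assumptions. It pulls records from two differently shaped sources, normalizes them, and loads them into one database.

```python
import csv
import io
import json
import sqlite3

# Hypothetical raw inputs standing in for two separate source systems.
CSV_SOURCE = "customer_id,amount\n1,19.99\n2,5.00\n"
JSON_SOURCE = '[{"customer": 3, "total": "42.50"}, {"customer": 1, "total": "7.25"}]'


def extract_csv(text):
    """Read rows from a CSV export as (customer_id, amount) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [(int(row["customer_id"]), float(row["amount"])) for row in reader]


def extract_json(text):
    """Read records from a JSON feed that uses different field names."""
    return [(int(rec["customer"]), float(rec["total"])) for rec in json.loads(text)]


def load(rows, connection):
    """Store the normalized rows in a single table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS purchases (customer_id INTEGER, amount REAL)"
    )
    connection.executemany("INSERT INTO purchases VALUES (?, ?)", rows)
    connection.commit()


def run_pipeline(connection):
    """Collect from every source, then load everything into one place."""
    rows = extract_csv(CSV_SOURCE) + extract_json(JSON_SOURCE)
    load(rows, connection)
    return len(rows)
```

Calling `run_pipeline(sqlite3.connect(":memory:"))` loads all four records into an in-memory database. Real pipelines add scheduling, validation, and error handling on top of this basic extract-and-load shape.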
Unsurprisingly, data engineers need an in-depth understanding of dozens of big data technologies and how these technologies interact. From beginning to end, a data engineer’s job involves strategic planning, data modeling, designing appropriate systems, and finally, prototyping, constructing, and implementing those systems.
Without data, there is no data science. By extension, we need the right structures to collect and store information. This is a particular challenge for older, larger organizations, whose legacy architecture is often insufficient for 21st century needs. That’s why, even though data engineering is not generally considered to be as ‘hot’ as data science, talented data engineers are highly in demand. If you’re considering a new career, take note!
2. What are the key skills for data scientists and data engineers?
OK, so we now have a fairly good understanding of the difference between data scientists and data engineers. Now let’s dive a bit deeper and look at the core skills and responsibilities for each role.
Key skills and responsibilities of a data scientist
Most data scientists have backgrounds in areas like mathematics or statistics. Key skills for a data scientist include:
- Advanced math, statistics, or similar (including the relevant Ph.D. or master’s).
- Domain knowledge, i.e. subject matter expertise in a particular field.
- Excellent business acumen.
- Advanced analytics skills, e.g. knowledge of predictive, diagnostic, or sentiment analytics models, etc.
- In-depth knowledge of machine learning and artificial intelligence algorithms (and their uses).
- Solid understanding of big data tools, e.g. Apache Spark, Hadoop, SQL, etc.
- Exceptional visualization, communication, and reporting skills, e.g. multimedia reports, dashboards, presentations.
Key skills and responsibilities of a data engineer
Since their role is much more focused on software architecture, a data engineer’s skills lean toward programming and systems know-how. A data engineer’s key skills usually include:
- Advanced programming in languages like Java, Scala, and Python (as well as knowledge of many others).
- Specialized knowledge of distributed computing.
- Knowledge of database systems, e.g. SQL, NoSQL, object-oriented databases, etc.
- Expertise in perhaps dozens of big data technologies, e.g. Amazon Web Services (AWS), Spark, Hadoop, Hive, Kafka (and others in the Apache big data ecosystem).
- The ability to understand and combine different frameworks and to build suitable data pipelines.
- Knowledge of Extract, Transform, Load (ETL) tools (used for merging data from multiple sources).
- Expertise in application programming interfaces (APIs), used to connect different software applications.
Overlapping skills between data scientists and data engineers
When two roles share a similar focus (big data), it’s inevitable that they should share some core skills. This overlap is why data engineering is often lumped under the broader umbrella of data science. Some dispute this, though. When two roles are confused, it can cause tension. If a data engineer is expected to carry out data science tasks (or vice versa), this does a great disservice to the specialized skills of both roles. To distinguish them better, we need to understand where they overlap:
- Data analysis: Since analyzing data is what they spend most of their time doing, data scientists are experts in data analytics. However, data engineers also need basic to intermediate data analysis skills. This helps them effectively plan their work and to make sense of how the data they’re working with will eventually be used.
- Programming: Conversely, data engineers are expert programmers, often with a background in software engineering. While data science relies much less heavily on programming skills, it is still a requirement. For instance, data scientists often need to code algorithms built using languages like Python or R.
- Big data: We’ve already mentioned this but it doesn’t hurt to be explicit! Data scientists and data engineers both work with big data. The difference is in how they use it. Data engineers build big data architectures, while data scientists analyze big data. Either way, both roles require a natural flair for working with unstructured datasets. You can learn more about big data in this post.
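As a rough illustration of the kind of code a data scientist might write, here is a minimal Python sketch that fits a straight-line trend to a handful of values and uses it to forecast the next one. The monthly sales figures are invented for the example, and `fit_line` is a hypothetical helper, not part of any library. Real analyses typically use far larger datasets and dedicated libraries.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Made-up monthly sales figures, purely for illustration.
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept  # projected sales for month 6
```

Here the programming serves the analysis: the code is short, and the interesting part is what the fitted trend says about the data.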
3. How much do data scientists and data engineers earn?
The amount that data scientists and data engineers earn depends on many factors. These include the industry they’re working in, their skill level, an organization’s understanding (or, more often, lack of understanding) about what the job involves, and even the job title. However, for a rough measure of the different salaries data scientists and data engineers can expect, we’ve looked to the salary comparison website, Payscale. The following figures were correct at the time of writing.
In the US, data scientists earn a median salary of about $96K. This can range from around $67K for entry-level positions to about $134K for very senior roles.
Meanwhile, data engineers can earn a median of $92K. Salaries range from $65K to $132K, depending on skill level.
While data scientists earn a little more on average than data engineers, there are a couple of caveats. First, as we’ve mentioned, there is currently a real buzz around data science. While data scientists and data engineers are of pretty equal importance, this buzz can artificially inflate salary expectations. In reality, data architecture is fundamental to the way businesses are run, meaning that good data engineers are often in higher demand than data scientists.
Secondly, many organizations (or more accurately, many management teams) lack clarity about what data scientists and data engineers actually do. For instance, some expect data scientists to be able to construct complex data pipelines. Others might expect data engineers to conduct complex analyses. As organizations evolve a more nuanced understanding about the differences between data science and data engineering (and the vital importance of solid architecture) we may see data engineers earning more. One to keep your eye on.
4. Should you become a data scientist or a data engineer?
Since data-related jobs are quickly evolving, there’s no single path into one arena or the other. This can be both a blessing and a curse. Up until recently, most people tended to ‘fall into’ these types of jobs, by specializing their existing skills.
For instance, many of those with statistical backgrounds picked up analytical skills to take their work further. These people became today’s data scientists. Likewise, many developers specialized in the area of big data, leading to the emergence of today’s data engineers.
Only more recently, as these roles have become better defined, have people started actively aspiring to careers in one or the other. But which one is right for you?
Should you become a data scientist?
Are you mathematically minded? Do you have a Ph.D. or master’s, perhaps in a field like statistics? Are you a subject matter expert, maybe in the sciences? Or are you an excellent communicator with a flair for business? Most of all, do you love analyzing data to detect patterns and trends? If so, have you developed programming skills to advance your analytics abilities (rather than for the love of programming itself)? Are you fascinated by the potential of fields like machine learning and artificial intelligence? If the answer to all these questions is yes then you might have what it takes to progress in the field of data science.
On the other hand…
Should you become a data engineer?
Have you been fiddling around with code since you first switched on a PC? Do you come from a technical background like software development? Are you a perfectionist who loves to build new applications that solve challenging problems? Does figuring out new technologies thrill you? Most of all, do you love the challenge of collecting and structuring information in complex systems? If your answer to all (or most!) of these questions is yes, then you could have a bright future as a data engineer.
While data science and data engineering are distinct roles, they are not mutually exclusive. The joy of the emerging data economy is that it is constantly changing. As you progress on your chosen career path, you’ll likely find new routes that you hadn’t considered before, or that might not have existed when you set out. For instance, machine learning engineers combine the rigor of data engineering with the pursuit of knowledge that is so fundamental to data science. Keep an open mind and you never know where a career in data might take you.
5. Key takeaways
In this post, we’ve explored the differences between data science and data engineering. We’ve learned that:
- Data science is an interdisciplinary field of scientific study, which focuses on obtaining insights from big data.
- Data engineering involves planning, designing, building, and implementing software architecture to collect and funnel big data from numerous sources.
- Data scientists tend to have strong backgrounds in statistics and math and need to be experts in data analysis.
- Data engineers tend to have backgrounds in software development and need to be experts in working with involved, complex data structures.
- Presently, both data scientists and data engineers earn about the same. However, as large organizations update their legacy architecture, data engineers are increasingly in demand.
As big data reshapes the industrial landscape for the 21st century, new roles are constantly popping up. That makes this a prime time to consider a new career in data. Explore more with a free, five-day data analytics short course, and check out the following: