From census records to birth registers, we’ve been collecting data for centuries. However, the amount of data we create and collect has exploded to phenomenal proportions since the dawn of the internet. In 2013, it was claimed that 90% of all the data in the world was created in the previous two years. And the volume of data has multiplied many times over since.
First coined in 2005, the term “big data” is used to describe these huge quantities of information—datasets so vast that they defy traditional analysis. Today, governments, private companies, and public service providers are all trying to tap the potential of big data. However, while it has many potential benefits, it also comes with some risks.
In this post, we’ll explore both the potential and risks posed by big data, with a particular focus on the privacy, security, and ethical issues it brings. We’ll cover the following:
- What are the benefits of big data?
- What are the risks of big data?
- Examples of dangerous big data in action
- How can we minimize the dangers of big data?
- Key takeaways
Hold on to your hats, because this one’s a bumpy ride…
1. What are the benefits of big data?
Before we get to the risks of big data, it makes sense to understand why so many organizations are trying to utilize it. Big data has already revolutionized many aspects of our lives. Let’s explore some of its benefits.
Big data offers better insights
Big data offers the potential for vastly enhanced data analytics. Used properly, it lets organizations spot entirely new trends, segment customers with an astonishing degree of accuracy, and innovate in technology and product design at unprecedented levels.
Big data offers a unique competitive advantage
Big data often arrives as a continuous flow of near-real-time information. By harnessing this flow, organizations can adapt to changes as they happen. This means they can stay ahead of the competition in ways that companies of the past could only dream of.
Big data has huge potential to improve productivity
Big data tools like Apache Hadoop and Spark allow data analysts to work with datasets that would otherwise be too large to handle. This doesn’t just make analysts themselves more productive: with better tools, they can glean far greater insights and detect patterns that boost productivity across the wider organization, too.
Big data and the Internet of Things
For the most part, the internet is used by humans to communicate with each other, using machines as a go-between. However, with the Internet of Things, we are starting to see devices communicate directly with each other. This has tonnes of potential. For example, your thermostat could automatically adjust the temperature based on weather reports, your car could send information to the manufacturer to improve safety measures, or your fridge could simply remind you to buy milk!
This is just a taster of how big data can potentially transform the world around us. While it’s pretty exciting, with all this potential comes plenty of risk, too. Let’s explore this in more detail.
2. What are the risks of big data?
While it’s easy to get caught up in the opportunities big data offers, it’s not necessarily a cornucopia of progress. If gathered, stored, or used wrongly, big data poses some serious dangers. However, the key to overcoming these is to understand them. So let’s get ahead of the curve.
Broadly speaking, the risks of big data can be divided into four main categories: security issues, ethical issues, the deliberate abuse of big data by malevolent players (e.g. organized crime), and unintentional misuse.
Big data’s security issues
The more data an organization collects, the more expensive and difficult it is to store safely.
This is already a problem. According to Risk Based Security’s mid-year data breach report, 4.1 billion records were exposed through data breaches in the first half of 2019 alone. This highlights just how important data security is, but also the challenges organizations face in keeping our data safe.
Related to this is the issue of privacy. Governments, social media giants, insurance companies, and healthcare providers are just a handful of organizations that have unprecedented levels of access to our data. While they’re bound by data protection laws (with the potential for huge fines), the increasing number of high-profile data breaches in the last few years shows that more action is needed. Organizations, especially big tech, may have information on where we live, where we go, how we spend our money, and so on. With personal bank details and other sensitive information under their protection, and cyberattacks on the rise, this raises the question: just because companies can store vast amounts of data, does that mean they should? This segues nicely into the next section…
Ethical issues with big data
Presuming organizations manage to keep our data safe from hackers and cyberattacks, that does not preclude the possibility that they might misuse the information themselves. While data protection laws are in place, there is still some grey area about how data can be used by companies who have obtained it legally.
Take insurance providers and credit card companies. It’s no revelation that these organizations impose premiums and limits based on customer behaviors. For instance, if you’ve ever had a car accident, you’ll know your car insurance premium goes up. Big data allows these companies to make ever-more refined predictions about the future, allowing them to conduct ever-more invasive financial profiling.
Way back in 2009 (even before big data was as big as it is today), one man had his credit limit cut, simply because other customers who shopped in the same stores as him had poor repayment histories. This is just one small example of a murky area of big data use that has clear ethical implications. There are multiple other ethical issues too, around consent, ownership, and privacy. These concerns have given rise to the idea of a “right to be forgotten,” which is now written into laws such as the EU’s GDPR.
Abuse of big data by malevolent players
Another danger arises when malicious third parties get their hands on sensitive information. In 2020, it’s estimated that we’ll produce 2.5 quintillion bytes of data every day. That’s tough to visualize, but you can trust that it’s an immense amount, far more than any organization can easily manage or analyze. That makes it an attractive target for hackers and cyberattackers, who can steal data to sell on the dark web.
Phishing, bank fraud, and insurance scams are all common examples of how big data can be deliberately misused by organized crime groups. The days of try-their-luck emails offering you a million dollars if you just send through your bank details are long gone! If you’ve recently been the victim of a scam, you’ll know just how sophisticated they can be.
Big data also plays a big part in the misinformation and fake news that have characterized public debate for the last half-decade. Nefarious organizations can use big data to target ads or fake news that aim to influence our ideas, beliefs, and even who we vote for. So much fake news succeeds because it is well targeted and preys on people’s fears, all of which can be tracked (or at least inferred) from big data. With the risks of data theft growing by the day, this issue remains to be solved.
Unintentional misuse of big data (including systematic errors)
While those deliberately seeking to abuse big data are one problem, not all dangers are necessarily premeditated.
Enter machine learning. This is a crucial tool for analyzing and extracting insights from big data.
However, while machine learning algorithms learn on their own, people must first decide how they learn and which data they learn from, which allows human bias to sneak into the algorithm. Human bias, bad practice in data analytics, or even just poor-quality data can all lead to bad insights. If these insights are used to make important financial or safety decisions, for example, the consequences can be serious.
Since data science is a relatively new field, we can’t yet predict how problems like these will evolve. The use of artificial intelligence is rising, and while it’s unlikely that machines will rise to overthrow us any time soon, this nascent technology carries risks we don’t yet fully understand. AI can already do amazing things, but it has limitations. For example, it is not very good at nuance and lacks the intuition of a human being. This can have tragic results, as illustrated by the self-driving Uber car that struck and killed a woman in 2018. It turns out that the car’s software was not designed to account for pedestrians who jaywalk.
To avoid these kinds of risks in future, we must address systemic problems before the technology becomes more widely adopted.
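One modest way to catch the kind of bias described above is to compare a model’s outcomes across groups before anyone acts on them. The Python sketch below is illustrative only: the group labels, the sample decisions, and the function names are hypothetical, and a low ratio is a prompt to investigate rather than proof of bias.

```python
from collections import defaultdict


def approval_rates(decisions):
    """Return the share of approved applications for each group.

    `decisions` is an iterable of (group, approved) pairs.
    """
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, was_approved in decisions:
        totals[group] += 1
        if was_approved:
            approved[group] += 1
    return {group: approved[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates):
    """Ratio of the lowest group approval rate to the highest (1.0 = parity)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was approved, so no group is favoured
    return min(rates.values()) / highest


# Hypothetical model outputs: (group label, approved?)
sample_decisions = [
    ("A", True), ("A", True), ("A", True), ("A", False),
    ("B", True), ("B", False), ("B", False), ("B", False),
]

rates = approval_rates(sample_decisions)
ratio = disparate_impact_ratio(rates)
print(rates)
print(f"Disparate impact ratio: {ratio:.2f}")
```

In this made-up sample, group A is approved three times as often as group B, which is exactly the sort of gap worth questioning before a model is used for real decisions.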
3. Examples of dangerous big data in action
Before looking at how we might tackle some of the problems big data poses, here are some real-world examples of how it has been misused.
Big data and election interference
Probably the most obvious examples of big data misuse are the 2016 US Presidential Election and the 2016 Brexit referendum in the UK.
Following shock results in both polls, Vote Leave in the UK and the Trump campaign in the US were linked to a controversial data analytics firm called Cambridge Analytica. The now-defunct firm was reported to have used data improperly harvested from Facebook users to inform communications strategies for both polls. The outcomes of both polls have shaped the global political scene ever since.
Big data and state surveillance
The Chinese government is currently launching a new social credit system.
Linked to each citizen’s permanent record, it aims to promote good citizen behavior. ‘Good’ citizens, e.g. those giving to charity or paying bills on time, will receive credit that can be exchanged for things like first-class airplane or train tickets. Meanwhile, ‘bad’ citizens, e.g. those with traffic violations or unpaid debts, could receive disincentives such as slower internet connections or reduced access to private education. This system, due to be launched in 2020, clearly represents the dark potential of big data.
Big data and racial profiling
Once again, deliberate misuse is not the only danger of big data.
A prime example is Amazon’s facial recognition software, Rekognition. In a 2018 test by the American Civil Liberties Union (ACLU), the software incorrectly matched 28 members of the US Congress with mugshots of people who had been arrested. While this highlighted an overall problem with the software, a disproportionate number of those misidentified were people of color. This is not an isolated incident: numerous studies have shown there is significant racial (and in some cases gender) bias within these kinds of technologies.
4. How can we minimize the dangers of big data?
Big data poses clear dangers that we cannot ignore, but we shouldn’t toss the baby out with the bathwater, either. Big data’s potential for positive change is huge. Luckily for us, it’s not a binary choice.
Big data analytics is a new discipline. Naturally, mistakes will be made. The key thing is to learn from these mistakes and improve safety. By implementing security measures and ethical guidelines, we can reap big data’s benefits while mitigating its risks. Here are a few ways that data analysts and data scientists can advocate for safer use of big data.
Stay vigilant about security measures
For any curator of big data, it’s crucial to have effective security measures in place and to ensure that these are up to date. One area where many organizations trip up is their back-end security. While it’s common to have well-guarded front ends, backup data is often stored in disaster recovery systems or test environments that are not always as well protected.
Eliminate unnecessary information
One of the surest ways to prevent a data breach is not to have sensitive data in the first place.
Many companies stockpile data they don’t use, thinking it may be helpful in the future. However, by conducting regular audits, organizations can keep the data necessary for their business operations while purging the rest. Good housekeeping has the added benefit of focusing analytics tasks where they’re most needed.
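As a rough illustration of what such an audit might automate, here is a minimal Python sketch. Everything in it is an assumption made for the example: the field names, the two-year retention window, and the sample customer records are hypothetical, and a real retention policy should come from your legal and compliance teams.

```python
from datetime import date, timedelta

# Hypothetical policy: the only fields the business actually uses,
# and how long a record may be kept after the customer's last activity.
REQUIRED_FIELDS = {"customer_id", "email", "last_active"}
RETENTION = timedelta(days=365 * 2)


def minimize(records, today):
    """Drop stale records, then strip every field not in REQUIRED_FIELDS."""
    kept = []
    for record in records:
        if today - record["last_active"] > RETENTION:
            continue  # past the retention window: purge the whole record
        kept.append({k: v for k, v in record.items() if k in REQUIRED_FIELDS})
    return kept


# Made-up sample data for illustration only.
customers = [
    {"customer_id": 1, "email": "a@example.com", "last_active": date(2024, 5, 1),
     "date_of_birth": date(1990, 1, 1), "home_address": "1 Example Street"},
    {"customer_id": 2, "email": "b@example.com", "last_active": date(2019, 3, 15),
     "date_of_birth": date(1985, 6, 30), "home_address": "2 Example Street"},
]

cleaned = minimize(customers, today=date(2025, 1, 1))
print(cleaned)
```

Here the inactive customer is purged entirely, and the active customer’s record keeps only the three fields the business needs. Data that was never retained can’t be exposed in a breach.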
Check compliance with data legislation
Although we have data protection legislation to secure people’s data, many companies don’t fully comply with it.
For instance, a 2019 survey by Talend found that 58% of the businesses surveyed failed to respond to individuals’ requests for a copy of their personal data within the one-month deadline set by GDPR. To protect data, companies need to invest properly in data protection and security, as well as adhering to other guidelines. As a data analyst, it’s important to advocate for your organization’s compliance with data protection measures.
A Hippocratic oath of big data
An individual company’s actions are important for big data security, but other initiatives are needed, too.
British mathematician and data scientist Hannah Fry has called on data scientists to take an ethical pledge. The idea is much like the medical Hippocratic oath that doctors take to “do no harm.” Though controversial, the idea of a Data Science Oath encourages discussion about the ethics of big data, which is no bad thing. Alongside this, many data scientists are also lobbying governments to introduce stricter rules around how big data can be collected, stored, and used.
5. Key takeaways
In this post, we’ve explored the benefits and risks of big data. To answer our initial question—“is big data dangerous?”—in short, it’s only dangerous if we allow it to be. As we’ve seen:
- Big data has vast potential: it can be used to glean ever more powerful insights and to transform the way the world works.
- Big data comes with security issues: the more data an organization holds, the harder it is to keep secure and private.
- Bad players can abuse big data: if data falls into the wrong hands, it can be used for phishing, scams, and to spread disinformation.
- Insights are only as good as the quality of the data they come from: bad, noisy, or ‘dirty’ data (or poor analytical practice) can lead to poor insights, which can be risky in the wrong situations.
- There are ethical issues: as a new field, the ethics of big data is still evolving. This is why some are pushing for a Data Science Oath and for ethical guidelines to be developed.
The battle between big data’s potential and its dangers remains ongoing. However, identifying and acknowledging its potential risks goes a long way to resolving them. Ultimately, we all need to do our part to promote a culture of integrity within data science. Putting safeguards in place, and regularly reviewing them, is key.
The field of data science and big data analytics continues to evolve and grow. It’s a thrilling field to work in. If you’re considering a career in data analytics, why not try out our free, five-day data analytics short course?