What is big data?
The term “big data” is one of those phrases, like ‘viral marketing’ and ‘permission marketing’, that is quickly becoming a victim of its own success. When a phrase becomes a thing, it takes on all manner of meanings – some right and some wrong.
Don’t be misled into thinking that big data is the latest fad just because it is being talked about a lot. The big data industry has been around for a long time; it’s only in recent years that there has been an explosion in companies taking advantage of it.
Technologies have evolved to make big data accessible to smaller organisations. Understanding why requires an appreciation of all the benefits and uses of big data, as well as some forward-thinking about what will soon be possible.
90% of all the data in the world was created in the past two years
Big data is not just masses of bytes of information. The term refers to the collection of masses of information from multiple sources, but it also refers to the analysis of that data.
The phrase refers to the wealth and disparity of data, and it refers to the potential for computational analysis of that data.
The data must be usable – and this is where the true power of big data lies. Correlation is the real benefit of big data – the ability to spot patterns that you might not expect.
Consider the Wal-Mart executive who discovered, thanks to big data, that sales of Pop-Tarts increase seven-fold prior to a hurricane.
As reported by the New York Times, in 2004, Hurricane Frances was on its way towards the Florida coast. A week ahead of the storm’s arrival, Linda M. Dillman, Wal-Mart’s chief information officer, pressed her staff to come up with forecasts based on what had happened when Hurricane Charley struck several weeks earlier.
She decided that, with the trillions of bytes’ worth of shopper history stored in Wal-Mart’s computer network, the company could “start predicting what’s going to happen, instead of waiting for it to happen”.
If they had relied on gut feel, Wal-Mart may have stocked up on bottled water, but the data revealed that the top seller before Hurricane Charley was beer. Not only that, but sales of Strawberry Pop-Tarts increased by a multiple of seven.
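The comparison behind a finding like this can be sketched in a few lines of Python. This is only an illustration, not Wal-Mart's actual method: the sales figures, the `SALES` records and the function names are all made up. It compares units sold in the week before a storm with an ordinary week and ranks products by the size of the jump.

```python
from collections import defaultdict

# Hypothetical sales records: (product, period, units_sold).
# "baseline" is an ordinary week, "pre_storm" is the week before a hurricane.
# Every number here is invented for illustration.
SALES = [
    ("bottled water", "baseline", 100),
    ("bottled water", "pre_storm", 180),
    ("beer", "baseline", 200),
    ("beer", "pre_storm", 500),
    ("strawberry pop-tarts", "baseline", 30),
    ("strawberry pop-tarts", "pre_storm", 210),
]


def uplift_by_product(records):
    """Return {product: pre-storm units divided by baseline units}."""
    totals = defaultdict(lambda: defaultdict(int))
    for product, period, units in records:
        totals[product][period] += units
    return {
        product: periods["pre_storm"] / periods["baseline"]
        for product, periods in totals.items()
        if periods["baseline"] > 0  # skip products with no baseline sales
    }


def rank_by_uplift(records):
    """Products sorted from the largest sales multiple to the smallest."""
    uplift = uplift_by_product(records)
    return sorted(uplift.items(), key=lambda item: item[1], reverse=True)


for product, ratio in rank_by_uplift(SALES):
    print(f"{product}: {ratio:.1f}x")
```

In this toy data beer still sells the most units before the storm, while Strawberry Pop-Tarts show the biggest multiple, which is the kind of result a gut-feel stock order would miss.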
Big data is not about the data
Gary King of Harvard University, stating that the data itself doesn’t matter as much as the analysis
In 2005, in the ground-breaking book Freakonomics, economist Steven Levitt and journalist Stephen Dubner set out the controversial theory, drawn from Levitt’s research with John Donohue, that the 1990s fall in crime in the USA was related not to broken windows policing but to the legalisation of abortion 20 years earlier.
Debates have raged about this since the book came out. Tipping Point author Malcolm Gladwell wrote his own appraisal in 2006, because the Freakonomics theory contradicted his own reference to crime reduction.
I am not arguing the right and wrong of the theory here, but the story does show how, with a wide enough range of data to cross-reference, correlations can emerge from the data itself rather than answers being contrived from assumptions.
Some big data stories you may enjoy
- NCR has internet-enabled millions of ATMs and cash registers so it can collect diagnostic data on machines constantly. This has enabled the company to predict breakdowns and send the right engineer to a machine with the right part before it breaks down.
- Analysis of data relating to mobile tower outages helped AT&T identify which types of outages caused the most customer disruption. It was able to prioritise its repairs, resulting in a 59% uplift in customer satisfaction.
- US retailer Target used big data to discover that a pregnancy can be predicted through the combination of products bought together.
- Email service provider MailChimp sends out 35 billion emails each year on behalf of 3 million users. It has harnessed all the data from those emails to collate an amazing array of insights into email marketing.
Big data is what helps us understand the reason. Our own cognitive biases and blinkered experiences often prevent us from seeing the bigger picture.
Without big data you are blind and deaf in the middle of a freeway
Geoffrey Moore, management consultant
Take medicine, for example. A doctor who administers one drug to one person and sees a positive result could easily conclude that his diagnosis and treatment would work for everyone else.
Humans are not identical. The same drug administered to many people showing similar symptoms won’t always produce the same response. That’s why drugs are tested on a sample of people, so medics can compare outcomes and behaviours.
Only with a wide statistical base can you draw strong conclusions, but even then you need to cross-reference multiple sources.
You can have data without information, but you can’t have information without data
Daniel Keys Moran
What if a drug works on males more than females, or on people with diabetes but not people with hepatitis? How do different lifestyles and geographies affect the performance of a drug?
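Questions like these come down to comparing outcomes across subgroups. Here is a minimal Python sketch of that idea, assuming a made-up set of trial records; the `PATIENTS` data and the `response_rate_by` helper are hypothetical and far simpler than a real clinical analysis.

```python
from collections import defaultdict

# Hypothetical trial records: (sex, existing_condition, responded_to_drug).
# The data is invented purely to illustrate the grouping step.
PATIENTS = [
    ("male", "diabetes", True),
    ("male", "diabetes", True),
    ("male", "hepatitis", True),
    ("male", "hepatitis", False),
    ("female", "diabetes", True),
    ("female", "diabetes", False),
    ("female", "hepatitis", False),
    ("female", "hepatitis", False),
]


def response_rate_by(records, field_index):
    """Share of patients who responded, grouped by one field of the record."""
    counts = defaultdict(lambda: [0, 0])  # group -> [responders, total]
    for record in records:
        group = record[field_index]
        counts[group][0] += int(record[2])
        counts[group][1] += 1
    return {
        group: responders / total
        for group, (responders, total) in counts.items()
    }


by_sex = response_rate_by(PATIENTS, 0)
by_condition = response_rate_by(PATIENTS, 1)
print(by_sex)
print(by_condition)
```

With only eight records the rates mean very little, which is the point made above: the groups need to be large before a difference between them can be trusted.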
This is what a big data approach is designed for: finding correlations that can lead to statistically sound conclusions. Most importantly, the benefits of big data are not just for the corporation collecting the data. Big data has had a major impact on social engineering in a number of ways.
In India, the daily Satyamev Jayate TV show is watched by millions of people. Presented by Bollywood star Aamir Khan, the show tackles meaty social issues like feticide, caste discrimination and dowry deaths. The TV show analyses millions of messages it receives on social media and uses the findings to push for political change.
Google Flu Trends helps to track the spread of the flu virus around the world by tracking search query trends. In the map below, you can see how the southern hemisphere (where it is winter) has a higher concentration of flu-related searches.
Big data is helping seismologists develop models that could help them predict when and where earthquakes will occur. Each new tracked event provides new data, which can enhance the ability of algorithms to analyse and predict. A combination of factors such as animal patterns, tremor strength and placements, and changes in atmosphere can act as early warning systems but, one day, we may be able to predict accurately what is likely to occur based on previous patterns.
The introduction of the Oyster card on London’s transport network has enabled Transport for London (TfL) to get a better understanding of exact journeys. Because Oyster users swipe their cards at the start and end of each journey, TfL is able to plan more efficient transportation around the city.
This is an important development for London – a city of more than eight million people, with a population that is expected to rise to 10 million quickly.
In the USA, app creator Mark Phillip invented an app called Are You Watching This? The app aggregates votes from millions of people watching sports to help users find what, among the many available live sports channels, is worth watching.
The problem with big data
While the benefits of big data – past, present and future – are boundless, there are also many problems.
IBM said that 90% of all the data in the world was created in the past two years. Imagine how many people it would take to manually process all that data – let alone how many more we would need in two years’ time when we have even more data.
Digital technology is the only way to deal with the load. The emergence and rapid growth of technological solutions is the reason we are now seeing the rapid growth of the big data revolution.
Think about the humble traffic light. It controls traffic at junctions all over the world, 24 hours a day, automatically. But, just a century ago, people used to do that job. Imagine how many policemen were employed all around the world to stand at junctions and control the traffic flow.
The first traffic light was installed outside the Houses of Parliament in 1868. That first light was operated by a policeman. It lasted a month because the gas light on top blew up and injured the policeman who was controlling it.
Similar models sprang up all over the USA, all controlled by policemen, who would blow a whistle before changing the sign from GO to STOP.
Since then, traffic has grown exponentially. We have traffic lights at millions of junctions and it is impossible to man them all with policemen. Thanks to computers, we don’t need to.
One day, we may not even need traffic lights. If all cars on the road end up as self-driving vehicles, they will be able to be programmed to handle junctions and avoid each other without the need for traffic lights.
Google’s map technology has been collecting masses of data for a long time now. Imagine what kind of information can be programmed into a self-driving car. Not only the ‘how to drive safely’ instructions – the car could be programmed with information about the least congested routes and the locations of the cheapest petrol.
Imagine, even, being able to feed your shopping list into your car and the car takes you to the nearest supermarket that has the best prices. All things are possible with big data.
We are now in the second machine age
The arrival of automated looms created riots as weavers and textile workers feared for their jobs. One worker, Ned Ludd, is said to have smashed two stocking frames in 1779. His name was later taken up by the activists known as ‘Luddites’, who carried out acts of aggression against mechanisation, fighting the British Army in some cases.
One of the important factors of big data is that conclusions are drawn from cold hard correlated statistics rather than assumptions. If you go looking for proof of a pre-conceived idea, you aren’t doing it properly.
Big data purists believe the answers should present themselves. Discovering the unexpected is one of the core benefits of big data. You can only do that if you have all the data from all the relevant sources.
Without technology, we cannot keep increasing the benefits that big data offers. That technology needs to include automation of tasks that are traditionally done by people. Just as the automated looms and knitting machines revolutionised the textile industry in the industrial revolution, today’s tech revolution – the second machine age – will see the emergence of machine learning and artificial intelligence.
It is a capital mistake to theorise before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.
Sherlock Holmes in A Scandal in Bohemia
A core component of the big data revolution is the explosion of algorithmic activity. Big data and marketing need to be inextricably linked. Big data and algorithms are inseparable. Ergo, marketing needs to start investing in algorithmic solutions. These solutions can be anything from reporting to decision automation.
How do you crunch millions of pieces of data to find correlations, then report that in a meaningful way and make decisions on what to do with your discoveries? You can create a program to do it all for you – as long as the expertise behind that program is sound.
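As a rough idea of what such a program might look like at its simplest, here is a Python sketch that scans every pair of metrics for a strong correlation and reports the strongest first. It is an illustration under stated assumptions: the `METRICS` figures are made up, the function names are hypothetical, and the 0.8 threshold is an arbitrary choice rather than a recommended value.

```python
from itertools import combinations
from math import sqrt

# Hypothetical weekly metrics for one website; every number is invented.
METRICS = {
    "email_sends": [10, 20, 30, 40, 50],
    "site_visits": [12, 24, 33, 41, 55],
    "support_tickets": [9, 7, 8, 6, 7],
}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


def strongest_correlations(metrics, threshold=0.8):
    """Return (metric_a, metric_b, r) for pairs with |r| at or above threshold."""
    pairs = []
    for name_a, name_b in combinations(metrics, 2):
        r = pearson(metrics[name_a], metrics[name_b])
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, r))
    # Strongest relationship first, whether positive or negative.
    return sorted(pairs, key=lambda pair: abs(pair[2]), reverse=True)


for name_a, name_b, r in strongest_correlations(METRICS):
    print(f"{name_a} vs {name_b}: r = {r:.2f}")
```

The scan does not start from a theory about which metrics should be related; it simply reports the pairs that move together, and deciding what to do with them still depends on the expertise behind the program.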
Humans versus algorithms
The idea that algorithms can replace humans is scary to some. How could a computer ever write an article that anyone would want to read?
Well, this already happens. Associated Press uses a service called Automated Insights to produce thousands of financial reports every quarter. Many people who invest in shares on the basis of Associated Press reports probably have no idea that all their information is being written by an algorithm.
Using programs like CoolTrade, you can automatically buy and sell shares while you are away from your computer; even financial advisors could be replaced by computers – the benefit being that the advice is based on data rather than assumption.
Computers can do many things simultaneously, without judgment and without fatigue. While I have been writing this article, and concentrating on finding references for it, I haven’t been able to do anything else. An algorithm could have written thousands of variations of this article by now.
People have flaws – we can only hold one conscious thought at a time, we get tired or bored, we are driven by emotion. When marketing is driven by data and not by gut feel, we make smarter decisions. We could end up selling thousands of Pop-Tarts instead of over-stocking the water.
This revolution in marketing and technology is what’s driving Vertical Leap’s approach to digital marketing. With the combination of specialist experts and a deep data platform, we can make real data-driven decisions.
We have invested a lot of time, effort and money into building Apollo Insights as a big data platform for small businesses. Our vision is to pull every piece of accessible data we can pull in relation to customers’ websites.
With a detailed reporting layer, we can present that data in meaningful ways and plan actions, but we aren’t finished. There are lots of routine tasks that experts need to perform on a site again and again. We’re building software to perform these tasks – this will free us up to do more inventive stuff and it will also mean those tasks are performed many times over by algorithms that don’t need to work in a linear fashion.
Here’s some recommended reading for you
- Data-Ism by Steve Lohr
- Big Data by Viktor Mayer-Schönberger and Kenneth Cukier
- The Formula by Luke Dormehl
- The Second Machine Age by Erik Brynjolfsson and Andrew McAfee
- Automate This by Christopher Steiner
Watch our presentation on the big data dilemma
This presentation was for a webinar that I presented with The Drum Network. Here are the slides from that presentation. They include many of the same examples cited above, with some additions.
Alternatively, click below to watch the full webinar: