Roger Hislop, Senior Engineer: R&D

For all our amazing new data gathering, storing and processing capability, enterprises still need actionable information about humans and what they’re up to – Roger Hislop, Senior Engineer: R&D at Internet Solutions.


To quantify the hype around Big Data, try googling ‘benefits of Big Data’. The phrase returns more than 20 million results.

Big Data is both a specific technical methodology and a frustratingly nebulous promise of valuable business intelligence, hidden deep within the billions of documents, statistics and records that consumers generate daily in their interactions with enterprises.

It’s nebulous, because much of what we’re talking about in technical or business terms when it comes to Big Data is not new. What is new is the cost point at which we can do incredible things with mountains of data.

Exponential growth in computing power, the economies of scale offered by the cloud and the rapid digitisation of business (so many manual processes are now tech-enabled) mean that the mountains of data generated every second by commerce can now be collected, stored and processed affordably.

Big Data is supposed to enable improved business outcomes in the future, based on what the organisation has learned. However, examples of Big Data producing tangible insights impossible to generate with conventional data analysis tools are few and far between, because there are fatal flaws in its proposition.

First, volume does not equal value. Second, inference is not insight. Third, complexity is not completeness.

Too-big data

Imagine for a moment that a big-box retailer called Megastores tracks the demographics of all shoppers entering each of its stores every day, using advanced facial recognition technology. In addition to basic biographical information (name, age, gender and so on), the retailer gathers data on every education facility each customer has ever attended (from nursery school to tertiary institutions), estimates their body measurements from the clothing sizes they try on during their visits, and records the history of the garments they buy – all matched to individual customer profiles. Add to this the weather outside at the time of the visit, and the date and time, all cross-referenced against promotional activities.

We’ll assume that customers have no objection to providing such personal information, and that Megastores somehow manages to gather it all without disrupting the buying or selling experience. Let’s also assume that Megastores’ board has given the IT department an infinite budget to collate and store all that data, generated daily.

So much valuable data. And so much of it isn’t useful. Education level may broadly indicate income and therefore propensity to spend, but knowing each customer’s preschool alma mater is unlikely to let Megastores make meaningful predictions. Volume does not equal value.

Now let’s imagine that Megastores’ data reveals that a high proportion of its customers wear blue jerseys while shopping in its stores. The data also reveals that tinned tomatoes are among its most popular products. These two data points may be related, but without independent validation, Megastores shouldn’t take it as fact that shoppers wearing blue jerseys will buy tinned tomatoes. Inference is not insight.
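To see why, consider a toy sketch in Python (every number here is invented for the hypothetical Megastores scenario): cold weather makes shoppers both wear jerseys and stock up on tinned tomatoes, so the two correlate even though neither causes the other.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # one (made-up) row per store visit

# Hidden common driver: was it a cold day?
cold_day = rng.random(n) < 0.4

# Cold weather makes both blue jerseys and tinned-tomato purchases more likely.
wears_blue_jersey = (rng.random(n) < np.where(cold_day, 0.7, 0.2)).astype(float)
buys_tinned_tomatoes = (rng.random(n) < np.where(cold_day, 0.6, 0.15)).astype(float)

# Naive read: jerseys and tomatoes look clearly related (r is about 0.23).
print(np.corrcoef(wears_blue_jersey, buys_tinned_tomatoes)[0, 1])

# Condition on the hidden driver and the "relationship" vanishes (r near 0).
for label, mask in (("cold days", cold_day), ("warm days", ~cold_day)):
    r = np.corrcoef(wears_blue_jersey[mask], buys_tinned_tomatoes[mask])[0, 1]
    print(label, round(r, 3))
```

Split the visits by the hidden driver and the apparent relationship evaporates – classic confounding, and exactly the trap that correlation-hunting tools walk into.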

Technology now provides enterprises with many more sources of information. Data storage costs are tending towards zero. The silicon needed to crunch vast amounts of data is plentiful. But none of this guarantees meaningful business intelligence: Big Data is a powerful tool only in the right hands, and those hands are neither cheap nor plentiful.

For Big Data to deliver on its promise, enterprises must invest in specialised data science resources: people who can ensure the comprehensive capture of quality, appropriate data, who understand how data sets relate to each other, and who know where the dependencies lie.

Given enough data points, the advanced analytic techniques available today will definitely generate a result. But only skilled data scientists properly understand the nature of data sources, the idiosyncrasies and trade-offs of the various analytic techniques, and how noise, sampling errors and computational artefacts can make fools of the most advanced off-the-shelf Big Data tools.
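That risk is easy to demonstrate. The sketch below (plain Python, pure random numbers) screens thousands of candidate ‘drivers’ against a year of weekly sales; even though every series is noise, the best-looking candidate still shows an apparently impressive correlation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_weeks, n_candidates = 52, 10_000

# A year of "weekly sales" and thousands of candidate drivers - all pure noise.
sales = rng.normal(size=n_weeks)
candidates = rng.normal(size=(n_candidates, n_weeks))

# Correlate every candidate with sales and keep the most impressive one.
corrs = np.array([np.corrcoef(c, sales)[0, 1] for c in candidates])
best = np.abs(corrs).max()
print(f"best |r| among {n_candidates} noise series: {best:.2f}")  # roughly 0.5-0.6
```

An off-the-shelf tool that simply surfaces the strongest correlation would report that ‘driver’ with great confidence; a skilled data scientist would correct for the thousands of comparisons behind it.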

A wrong answer accurate to eleven decimal places is still wrong. And just because something is true does not mean it’s actionable.

Small data is useful data

Just as CIOs are getting to grips with their Big Data strategy, along comes something completely different. Enterprises should seriously look to small data – data in a volume and format that is immediately accessible, informative and actionable.

Author Martin Lindstrom explains in his book Small Data the difference between big and small data like so: “Big Data is all about finding correlations. Small data is all about finding the causation, the reason why.”

Small data enables companies to arrive at definitive conclusions without navigating the overwhelming volume of information that Big Data bombards them with. It is a return to directly observing consumer behaviour instead of analysing reams of statistics. It is a return to looking at whether something is on or off, hot or cold, open or closed.

Lindstrom talks about travelling to Stockholm to meet the owner of IKEA, Ingvar Kamprad. Instead of escorting him to an executive office suite, IKEA staff directed Lindstrom to Kamprad’s ‘usual spot’ – behind a till. Kamprad often spent time personally serving customers at the checkout, discussing their preferences and purchases as he rang them up. In his words, it was “the cheapest and the most efficient research ever”.

Just as Big Data and related technologies such as Machine Learning and Artificial Intelligence matured quickly on the back of Web-scale architectures and the falling cost of compute for giant data sets, so small data is rapidly coming of age thanks to the radical fall in the cost of wireless connectivity for physical “things”.

The explosion in cheap, reliable and convenient “Internet of Things” devices allows small data to be gathered – information such as temperature, electricity use, door openings, movement and so on. We’ve always been able to do this using trusty (expensive) copper wires or familiar (expensive) GPRS – but now, using Low Power WAN (LPWAN) technologies such as LoRaWAN, Sigfox or NB-IoT, we can connect a device to the Internet for a rand or two per month.

A national burger chain could gather information on the time and amount of every customer purchase at every branch for the next ten years, to establish whether its franchisees are sticking to the official operating hours – that’s Big Data.

Or it could install a small wireless sensor on the door of each store to record when it opens to the public – that’s small data.

One can give you insights into trends over a period. The other can tell you exactly what is happening in real time. Big Data can find patterns deep in a sea of information. Small data can pull up a few fish to tell you what’s down there right now. One lets you validate the other – and that’s really powerful.
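To make the door-sensor version concrete, here is a minimal sketch of the small-data pipeline (the store names, payload format and 09:00 opening time are all invented for illustration):

```python
from datetime import datetime, time

OFFICIAL_OPENING = time(9, 0)  # assumed official opening time

# Example door-open events, roughly as an LPWAN backend might deliver them
# (store names, field names and timestamps are all made up).
events = [
    {"store": "Rosebank", "ts": "2024-05-20T08:57:00"},
    {"store": "Claremont", "ts": "2024-05-20T09:41:00"},
]

for event in events:
    opened_at = datetime.fromisoformat(event["ts"]).time()
    status = "on time" if opened_at <= OFFICIAL_OPENING else "LATE"
    print(f"{event['store']}: doors opened {opened_at} ({status})")
```

One sensor, one event, one unambiguous answer per store per day – no data lake required.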

Big Data, done right by people with the right skills, is changing how we make decisions. Small data is ensuring that the decisions we take are correct and effective.