Why 'lean data' beats big data | Media Network | Guardian Professional
Matti Keltanen | Tuesday 16 April 2013 15.06 BST
The big data hype may not help you make the right decisions for your business – and there are four reasons why a lean approach makes better sense.
Big data is one of today's hottest topics. We're told big data is "the new oil" and that it will lead to a distinct and measurable competitive advantage. Over time, they say, enough big data could even make the scientific method obsolete.
These predictions suggest a certainty and unqualified belief that a big data revolution is imminent. However, they neglect to mention what businesses should do about data. This insistence on going big (at a price) seems strange in such economic times, where large businesses aspire to act more like start-ups. And it is the questions that big data evangelists ignore that are most revealing: what should I measure? How can I use data to help decision making? What do I need to invest in to use data well?
Big data can do most of the things your laptop does; the difference is simply industrial scale. By contrast, we can use the term 'lean data' to describe an Occam's razor approach to data capture and analysis: the lightest, simplest way to achieve your data analysis goals is the best one.
Here are four reasons to prefer lean – rather than big – data.
1. Starting with 'big' puts the cart before the horse
Without knowing what your data needs are, it's counterproductive to start with the assumption of industrial-scale data. If your data strategy consists of collecting a few thousand data points a day, you're not in the big data club. And maybe the most meaningful data is quite small – as in the case of Austin-based startup Food on the Table (described by Eric Ries in his book, The Lean Startup), which initially offered its service to only a handful of customers.
2. Everyday tools pack a lot of punch
Chances are, the optimal amount of data storage and processing capability for your business is going to be less than Google's. Lean data relies on picking the right tools for the job, and you may already have them. Fjord recently helped Harvard Medical School redesign interactive paediatric growth charts to be used on tablets, using relatively simple data judiciously to improve doctors' decision making and potentially reduce significant harm to patients.
3. Diminishing returns still apply
Statements like "data is the new oil" make it sound like data is currency, when it's actually an investment. In all statistical measurements, once enough data points have been collected to establish a result, each additional data point adds less and less accuracy. This should be a pressing concern when you're investing increasing amounts of money, time and resources into capturing and analysing data.
For example, the American statistician Nate Silver frequently uses polls with sample sizes ranging from hundreds to thousands, and his model explicitly accounts for diminishing returns.
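To make the diminishing returns concrete, here is a minimal sketch of the textbook margin of error for a polled percentage. It assumes a simple random sample, a 95% confidence level (z = 1.96) and a worst-case 50/50 split; the function name and the sample sizes are chosen for illustration only, and this is not Silver's model.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion p estimated from a
    simple random sample of size n (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)


# Each row uses ten times as many respondents as the row before it.
for n in (100, 1_000, 10_000, 100_000):
    print(f"n = {n:>7}: +/- {margin_of_error(n) * 100:.2f} percentage points")
```

Because the error shrinks with the square root of the sample size, every tenfold increase in data narrows the margin by only a factor of about 3.2, so halving the margin costs four times the data.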
4. The hard part is still done by humans
The dirty secret of big data is that no algorithm can tell you what's significant, or what it means. Data then becomes another problem for you to solve. A lean data approach suggests starting with questions relevant to your business and finding ways to answer them through data, rather than sifting through countless data sets.
Furthermore, purely algorithmic extraction of rules from data is prone to creating spurious connections, such as false correlations.
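A rough illustration of how this happens, assuming nothing more than random noise: the sketch below generates unrelated variables and counts how many pairs look strongly correlated anyway. The variable counts, the 0.8 threshold and the function names are arbitrary choices for the example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_strong_pairs(n_variables, n_observations, threshold=0.8, seed=0):
    """Count pairs of independent random variables whose sample
    correlation exceeds the threshold in absolute value."""
    rng = random.Random(seed)
    columns = [
        [rng.gauss(0, 1) for _ in range(n_observations)]
        for _ in range(n_variables)
    ]
    strong = 0
    for i in range(n_variables):
        for j in range(i + 1, n_variables):
            if abs(pearson(columns[i], columns[j])) > threshold:
                strong += 1
    return strong


n_variables, n_observations = 200, 10
total_pairs = n_variables * (n_variables - 1) // 2
found = count_strong_pairs(n_variables, n_observations)
print(f"{found} of {total_pairs} unrelated pairs have |r| > 0.8")
```

None of these variables has anything to do with any other, yet with enough of them and only a few observations each, some pairs will pass almost any correlation threshold by chance. The more data sets are sifted without a question in mind, the more such coincidences turn up.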
None of this is to say there aren't opportunities in big data. New advances are constantly being made that require increasing amounts of data processing power, and large companies will almost by definition need to deal in vast data sets. But today's big data hype seems more concerned with indiscriminate hoarding than helping businesses make the right decisions.