Believe more in Big Data or in magic?
One year, I spent a lot of time with professional magicians, and a few showed me the secrets behind their tricks. Whenever they did, I was struck that the skill and dexterity required for sleight of hand were far more impressive than the illusion that magic had been performed. It reminded me of my own experience with statistics.
Data analysis has a lot in common with performing magic. With great skill, you can pull things together and create the perception of surprising relationships. Often the magic lies in getting people to look at one thing when they should be seeing another. Likewise in statistics, it’s often not the correlation that’s interesting but what you did to find it.
This is important to keep in mind as the world embarks on the big data revolution. Big data refers to the very large data sets, collected by governments, corporations, and institutions, that are becoming increasingly available. Using this data, firms and policymakers can figure out which programs work and what consumers want, and the deluge of information is expected to increase efficiency and lower prices. In a recent report, the McKinsey Global Institute advocates greater availability of big data, estimating that wider access to it has the potential to create US$3 trillion a year in value.
It is generally true that more information is better, though big data comes with costs in privacy and in data collection. What concerns me, though, is the proper interpretation of big data. An earlier McKinsey report addresses this issue, noting a dearth of trained statisticians: it estimates that America is short 140,000 to 190,000 workers with the skills to handle data. But a lack of talent is not just an impediment; it’s a potential source of danger. People, even those who know better, often take correlations literally and make decisions based on them without appreciating the magic behind the numbers.
Interpreting data is more of an art than a science. But unlike magicians, most researchers do not intentionally mislead people. The big concerns when you run statistics are bias and mistaking correlation for causation. You can get biased results either by using the wrong data or by using an inappropriate estimation technique. Minimizing bias requires making subjective judgments. If you ran numbers on a large data set without inspecting it, removing outliers, and choosing the best model, you would end up with much more bias than if you had used some discretion.
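To make the outlier point concrete, here is a minimal Python sketch using hypothetical numbers, not real data. The helper names `ols_slope` and `mad_filter` are illustrative, and the median-absolute-deviation screen with a cutoff of `k=3` is just one crude rule assumed for this example; deciding what counts as an outlier is exactly the kind of judgment call described above.

```python
from statistics import median

def ols_slope(xs, ys):
    """Ordinary least squares slope of ys on xs: cov(x, y) / var(x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

def mad_filter(xs, ys, k=3.0):
    """Drop points whose y lies more than k median absolute deviations
    from the median y. A crude screening rule, assumed for illustration."""
    med = median(ys)
    mad = median([abs(y - med) for y in ys])
    if mad == 0:
        return list(xs), list(ys)  # no spread to judge against; keep everything
    kept = [(x, y) for x, y in zip(xs, ys) if abs(y - med) <= k * mad]
    return [x for x, _ in kept], [y for _, y in kept]

# Hypothetical data: y rises exactly 2 units per unit of x ...
xs_clean = list(range(1, 11))
ys_clean = [2 * x for x in xs_clean]
# ... plus one bad record, e.g. a data-entry error.
xs_all = xs_clean + [11]
ys_all = ys_clean + [-50]

print(round(ols_slope(xs_clean, ys_clean), 2))  # 2.0
print(round(ols_slope(xs_all, ys_all), 2))      # -1.27: one record flips the sign
xs_kept, ys_kept = mad_filter(xs_all, ys_all)
print(len(xs_kept), round(ols_slope(xs_kept, ys_kept), 2))  # 10 2.0
```

A single bad record turns a clear positive relationship into a negative one. Whether the point at x = 11 is an error or a genuine signal is something only inspection of the data can settle; no mechanical rule decides it for you.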
Human nature
The process is complicated by human nature. It is easy to be seduced by your own results when they confirm what you expected to find. Take the financial crisis, in which bad statistics played a large role. Many quants priced exotic housing securities using models that were fed data from areas where house prices had never fallen. This made the price of risk look very attractive, but the products could not remain viable once house prices did fall. In most cases the oversight was not intentional; it reflected the data available and the industry standard at the time.
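A toy Python sketch of how the sample a model is fed can make risk look cheap. Every number below is hypothetical, chosen only for illustration, and is not historical mortgage data; the 50% loss-given-default figure and the helper names are likewise assumptions.

```python
# Hypothetical loan records: (prices_fell_in_region, loan_defaulted)
rising = [(False, False)] * 95 + [(False, True)] * 5
falling = [(True, False)] * 60 + [(True, True)] * 40
all_loans = rising + falling

LOSS_GIVEN_DEFAULT = 0.5  # assumed share of the loan lost when it defaults

def default_rate(loans):
    """Fraction of loans in the sample that defaulted."""
    return sum(defaulted for _, defaulted in loans) / len(loans)

def expected_loss_per_100(loans):
    """Expected loss on a $100 loan, using the sample's default rate."""
    return 100 * default_rate(loans) * LOSS_GIVEN_DEFAULT

# A model trained only where prices never fell ...
print(round(expected_loss_per_100(rising), 2))     # 2.5
# ... versus one that also sees regions where prices fell.
print(round(expected_loss_per_100(all_loans), 2))  # 11.25
```

In this made-up example, the model that never saw falling prices puts the expected loss at less than a quarter of what the fuller sample implies, and nobody had to do anything dishonest for that to happen.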
Often what’s most interesting isn’t the statistical relationship itself but the data that was required to find it. Take the oft-cited statistic that American life expectancy is lower than that of many other OECD countries. That would seem to suggest that American healthcare is less successful than other systems. But when you look more deeply at the data, a different story emerges: once you account for people who died from injury (such as violence or car accidents) or from obesity-related disease, American life expectancy is similar to Canada’s. America’s lower life expectancy is still alarming and should get the attention of policymakers. But to remedy it, we need to understand what’s causing more car fatalities and obesity, and what factors, such as poverty or arcane drug laws, lead to so much violence.
Such examples may seem straightforward, but in practice they are hard to spot, even for the most experienced and well-intentioned professionals. That’s why, in academia, statistical work undergoes a rigorous peer review process. Just as it takes a magician to tell an impressive trick from a dirty one, it takes a community with the same expertise to spot sources of bias. But expert peer review won’t be realistic as data becomes more widely available. It should be a serious concern that people without adequate experience might unknowingly produce biased results and make important decisions based on them.
Still, the use of big data is worth the risk. Statistical analysis is an imperfect process, but it’s all we have to make sense of big data. McKinsey advocates more training and apprenticeships so that more people can run and manage data. This is certainly necessary, but not sufficient. We must also view any statistical result with the same humility and skepticism we bring to a magic trick.
Allison Schrager is a New York-based writer and economist. She has written for the Economist, Quartz and National Review.
Copyright © 1999- Shanghai Daily. All rights reserved.