I wrote this piece as a perspective on the emergence of Big Data. It was published on 6th June 2016.
In 1805 Francis Beaufort from Navan devised the scale of wind strength now named after him. He had been shipwrecked at the age of fifteen, and devoted his career to the development of maritime charts for safe navigation.
In 1831 a naval captain, Robert Fitzroy, sought Beaufort’s suggestions on which scientist to bring with him on a charting expedition to South America. Beaufort introduced the young captain of HMS Beagle to Charles Darwin, who subsequently wrote ‘On the Origin of Species’ based on his research during the voyage.
Fitzroy himself was a keen observer of the weather. He invented a barometer and wrote a widely used maritime manual on weather indicators. As a protégé of Beaufort, the famous Irish maritime cartographer, Fitzroy was appointed in 1854 as chief of a new government department to collect and analyse weather data at sea. In due course this became the UK Meteorological Office. Fitzroy fastidiously built a nationwide network of weather observers: local harbourmasters, lighthouse keepers and ships’ captains. Each day he collated by hand literally hundreds of weather reports to produce a three-day maritime forecast for the entire British Isles.
Tragically, in 1859 Captain Thomas Taylor of the passenger steam clipper ‘Royal Charter’ chose to ignore Fitzroy’s predictions and technology. His ship sank in a storm off the coast of Anglesey, with major loss of life, having just left Dun Laoghaire for Liverpool near the end of a long voyage which had originated in Melbourne, Australia. Other ships had heeded Fitzroy’s warnings for that evening on the Irish Sea, and avoided the storm. The resultant public outcry did much to enhance Fitzroy’s status as a weather forecaster.
Fitzroy’s weather predictions are an early example of ‘big data’ and Fitzroy was arguably the first ever ‘data scientist’. While weather forecasting was the first application of big data, it was soon followed by political opinion polling.
Famously, the US general interest magazine ‘Literary Digest’ conducted straw polls amongst its readership, by the simple expedient of asking readers to return a postcard, on a series of US Presidential elections from 1920 to 1932. These polls always accurately predicted the winner. Buoyed by their success, the editors decided that for the 1936 Presidential election they would go really ‘big’ so as to obtain the largest amount of data possible. They polled not only their own readership but also registered automobile owners and telephone subscribers, reaching some ten million potential respondents. Their forecast then proved disastrously incorrect! Worse still, their competitor Gallup accurately predicted the election based on a comparatively tiny sample of just 50,000. The Literary Digest methodology had concentrated only on the wealthy and middle classes, missing blue-collar workers who had suffered under the Great Depression.
Big data is not always good data.
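The Literary Digest lesson can be illustrated with a toy simulation. The sketch below is not a reconstruction of the 1936 polls: the electorate size, the class split and the voting preferences are all made-up figures chosen only to show the effect, and the function name `simulate_polls` is hypothetical. It compares a very large sample drawn only from affluent voters with a much smaller sample drawn at random from everyone.

```python
import random


def simulate_polls(seed=1936):
    """Compare a large biased poll with a small random poll (toy figures)."""
    rng = random.Random(seed)

    # Hypothetical electorate: 30% affluent, 70% blue-collar.
    # Assumed support for the incumbent: 40% among the affluent,
    # 70% among blue-collar voters. None of these are historical data.
    population = []
    for _ in range(200_000):
        affluent = rng.random() < 0.30
        support_rate = 0.40 if affluent else 0.70
        votes_incumbent = rng.random() < support_rate
        population.append((affluent, votes_incumbent))

    def share(votes):
        """Fraction of a list of votes cast for the incumbent."""
        return sum(votes) / len(votes)

    truth = share([vote for _, vote in population])

    # Big but biased: only affluent voters can be reached
    # (think car owners and telephone subscribers).
    reachable = [vote for affluent, vote in population if affluent]
    big_biased = share(rng.sample(reachable, 50_000))

    # Small but representative: a simple random sample of everyone.
    all_votes = [vote for _, vote in population]
    small_random = share(rng.sample(all_votes, 2_000))

    return {"truth": truth, "big_biased": big_biased, "small_random": small_random}


if __name__ == "__main__":
    for name, value in simulate_polls().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the huge poll misses the true result by a wide margin, while the small random sample lands close to it. Adding more respondents to the biased poll does not help, because the error comes from who is asked, not from how many.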
Political data scientists today are increasingly accurate. Nate Silver, a US baseball league analyst, successfully predicted the outcome in each of the 50 US states during the last Presidential election in 2012. His simultaneously published book ‘The Signal and the Noise’ explains how mathematical analysis can extract meaningful trend indicators, and became a nationwide best seller. Nevertheless, just a couple of weeks ago he publicly blogged on how he had “screwed up on Donald Trump”, having wrongly predicted Trump’s demise as the Republican candidate for the forthcoming US Presidential election.
Predictions using big data are a hot theme for the technology industry. Digital advertising is the largest application. The preferences and behaviour of literally hundreds of millions of people, including you and me, are fastidiously tracked by computers in the hope that judiciously placed adverts on our individual smartphones and devices will influence our purchasing behaviour. But there are other application areas. For example, IBM’s Watson prediction technology is being applied to areas such as health care, the legal profession, and scientific research.
Big data is big money.