Spotfire Data Science with Hadoop
Using Spotfire Data Science to Operationalize Data Science in the Age of Big Data
THE RISE OF BIG DATA
BIG DATA: A REVOLUTION IN ACCESS
Large-scale data sets are nothing new. After all, before the term “big data,” airline reservation systems tracked millions of flight segments and bookings, and phone companies kept billions of call detail records. But now it is possible for small companies and individuals to access the same massive computational and storage resources using inexpensive commodity hardware and the cloud.
Central to this data-ubiquity story is the open-source distributed computational
framework called Apache™ Hadoop.® Created at Yahoo, based on Google’s
MapReduce and Google File System publications, Hadoop allows large datasets to
be stored and parallel-processed by spreading files across a large number of small
commodity servers. Hadoop has a large following in both the commercial and
open-source software communities. Reduction in the cost of hardware and linear scalability of Hadoop has resulted in an unprecedented amount of data being stored and analyzed to increase our understanding of the physical world, predict human behavior, and improve
performance and security.