Large Scale Computations
When you start operating with data at the scale of the web, the fundamental approach and process of analysis must change. To cope with the ever-increasing amount of data, Google developed the MapReduce paradigm. This programming model has become the de facto standard for large-scale batch processing since the release of Apache Hadoop, the open-source MapReduce framework, in 2007.
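To make the model concrete, here is a minimal sketch of the classic word-count example in plain Python. It runs locally in memory rather than on a cluster and is not tied to any particular Hadoop API; it simply mirrors the three phases of the paradigm: a map phase that emits (key, value) pairs, a shuffle that groups values by key, and a reduce phase that combines each group.

```python
from collections import defaultdict
from itertools import chain

# Map phase: each document is processed independently, emitting (word, 1) pairs.
def map_words(document):
    for word in document.lower().split():
        yield (word, 1)

# Shuffle phase: group all emitted values by key.
# On a real cluster, the framework performs this step across the network.
def shuffle(pairs):
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

# Reduce phase: combine the values for each key into a single result.
def reduce_counts(key, values):
    return (key, sum(values))

if __name__ == "__main__":
    documents = [
        "the quick brown fox",
        "the lazy dog",
        "the quick dog",
    ]
    mapped = chain.from_iterable(map_words(doc) for doc in documents)
    grouped = shuffle(mapped)
    counts = dict(reduce_counts(key, values) for key, values in grouped.items())
    print(counts)  # {'the': 3, 'quick': 2, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 2}
```

The point of the paradigm is that both the map and reduce functions are written as if they operate on a single record or a single key, which lets the framework run them in parallel across many machines and shuffle the intermediate data between them.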
- Books
- Mining Massive Datasets: Stanford course resources on large-scale machine learning and MapReduce, with an accompanying book.
- Data-Intensive Text Processing with MapReduce: An introduction to algorithms for the indexing and processing of text that teaches you to “think in MapReduce.”
- Hadoop: The Definitive Guide: The most thorough treatment of the Hadoop framework, a great tutorial and reference alike.
- Programming Pig: An introduction to the Pig framework for programming data flows on Hadoop.
- Courses
- UC Berkeley: Analyzing Big Data with Twitter: A course, taught in close collaboration with Twitter, that focuses on the tools and algorithms for data analysis as applied to Twitter microblog data, with a project-based curriculum.
- Coursera: Web Intelligence and Big Data: An introduction to dealing with large quantities of data from the web, and how the tools and techniques for acquiring, manipulating, querying, and analyzing data change at scale.
- CMU: Machine Learning with Large Datasets: A course on scaling machine learning algorithms on Hadoop to handle massive datasets.
- U of Chicago: Large Scale Learning: A treatment of handling large datasets through dimensionality reduction, classification, feature parametrization, and efficient data structures.
- UC Berkeley: Scalable Machine Learning: A broad introduction to the systems, algorithms, models, and optimizations necessary at scale.