The growing volume of scientific data collected through sensors or produced by computational simulations can be processed with new analytics techniques to extract meaning from the raw data. The purpose of this workshop is to present to scientists the tools and techniques, open issues, recent developments, applications, and enhancements for MapReduce and similar systems. Over the years, MapReduce has become one of the programming models of choice for processing large data sets. Although it was originally developed for processing web information, the technique has gained a lot of attention from the scientific community for its applicability to large-scale parallel data analysis. Participants will learn how to combine tools and techniques from statistics and computer science to solve their problems more efficiently. The course will consist of introductory lectures given by guest data-analysis experts, and hands-on sessions.
Basic principles of Python, MapReduce, and technologies such as Hadoop and Spark. A basic understanding of problem analysis and optimization. Project design and strategies for building a scalable data analysis application. About half of the course will consist of practical hands-on sessions. The programme will include one invited talk from a guest speaker working in the field.
Students, PhD candidates, and researchers in computational sciences and other scientific areas, from different backgrounds, who are looking for new technologies and methods to process and analyse large amounts of data.
Participants must have basic knowledge of programming in Python and of using GNU/Linux-based systems.