Tools and techniques for massive data analysis



The growing volume of scientific data collected through sensors and computational simulations calls for new analytics techniques that can extract meaning from raw data. The purpose of this workshop is to present to scientists the tools and techniques, open issues, recent developments, applications and enhancements of MapReduce and similar systems. Over the years, MapReduce has become one of the main programming models for processing large data sets. Although it was originally developed for processing web information, the technique has gained considerable attention from the scientific community for its applicability to large-scale parallel data analysis. Participants will learn how to combine tools and techniques from statistics and computer science to solve their problems more efficiently. The course will consist of introductory lectures given by guest experts in data analysis, and hands-on sessions.


Basic principles of Python, MapReduce, and technologies such as Hadoop and Spark. Basic understanding of problem analysis and optimization. Project design and strategies for building a scalable data analysis application. About half of the course will consist of practical hands-on sessions. The programme will include one invited talk by a guest speaker working in the field.

Target audience: 

Students, PhD candidates, and researchers in computational sciences and other scientific areas, from different backgrounds, who are looking for new technologies and methods to process and analyse large amounts of data.


Participants must have basic knowledge of programming in Python and of using GNU/Linux-based systems.

Research Institutions
Minimum number of attendants required: 


Any questions?

For HPC and computer graphics courses, write to


Cineca is a non-profit consortium made up of 70 Italian universities, 5 Italian research institutions and the Italian Ministry of Education.

Today it is the largest Italian computing centre and one of the most important worldwide. With more than seven hundred employees, it operates in the technology transfer sector through high-performance scientific computing, the management and development of networks and web-based services, and the development of complex information systems for handling large amounts of data.

It develops advanced Information Technology applications and services, acting as a link between the academic world, the sphere of pure research, and the world of industry and Public Administration.
