High Performance Bioinformatics

Monday, 4 December 2023 10:00 to Wednesday, 6 December 2023 17:30

Provided as:

Ordinary Course

Where:

Cineca Site - ROME Via dei Tizii, 6b, 00185 Roma RM

Registrations closing:

Monday, 13 November 2023 at 10:00

The course is FREE of charge.

It will be held EXCLUSIVELY IN PERSON (no streaming available) and will be taught in Italian.

Organizer:

Silvia Gioiosa

Teachers:

Silvia Gioiosa, Alessandro Grottesi, Bala Chandramouli, Juan Mata Naranjo

We live in a big-data era, and simple serial bioinformatics pipelines cannot efficiently handle huge input datasets. High Performance Computing (HPC) is therefore a good solution for researchers who need to analyze their data and address new biological questions.

This course is both theoretical and practical and is addressed to bioinformaticians who want to scale up their analysis on a cluster machine. It mainly focuses on the development and execution of automated data analysis pipelines.

On the first day, students will become familiar with a cluster machine (e.g. hardware, software, module environment, data storage) and will learn how to submit a single batch script via the SLURM scheduler.

On the second day, participants will be introduced to the world of Next Generation Sequencing (NGS) and will learn how to build a fully automated RNA-seq pipeline able to handle large input datasets, focusing on job concatenation and on optimizing HPC resource requests.
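As a rough illustration of job concatenation, the sketch below builds SLURM submission commands in which each step starts only after the previous one has succeeded. It is not taken from the course material: the script names `align.sh` and `count.sh` and the job ID `12345` are hypothetical, and the commands are only printed, not executed.

```python
from typing import List, Optional


def build_sbatch_command(script: str, depends_on: Optional[str] = None) -> List[str]:
    """Return the sbatch argument list for one pipeline step."""
    # --parsable makes sbatch print the job ID in a machine-readable form,
    # which is convenient to capture and pass to the next step.
    command = ["sbatch", "--parsable"]
    if depends_on is not None:
        # afterok: start this job only if the given job ended with exit code 0.
        command.append(f"--dependency=afterok:{depends_on}")
    command.append(script)
    return command


# Hypothetical step scripts and job ID, for illustration only.
first = build_sbatch_command("align.sh")
second = build_sbatch_command("count.sh", depends_on="12345")
print(" ".join(first))
print(" ".join(second))
```

On a real cluster, each command would typically be run with `subprocess.run`, and the job ID printed by the first submission would be passed as `depends_on` for the second.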

On the last day, students will be introduced to cloud computing and to the world of Snakemake, a Python-based workflow management system for creating reproducible and scalable data analyses, with particular attention to scaling workflows on cluster and cloud without modifying the workflow definition.

Ad-hoc hands-on sessions, aimed at applying the concepts explained during the course, will be held every afternoon.

Skills:

By the end of the course each student should be able to:

- Know all the conventions and opportunities offered by CINECA for accessing HPC resources;
- Download datasets from public repositories and/or transfer input files from the user’s local computer to the CINECA clusters;
- Navigate through the software environment set up by CINECA;
- Run single-step jobs on a supercomputer via the SLURM scheduler;
- Combine several bioinformatics applications into a fully automated pipeline able to run on a supercomputer;
- Iterate through samples in order to manage huge input datasets;
- Have an overview of how to take advantage of Snakemake to build a portable, scalable and fully automated pipeline.
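To give a flavor of what iterating through samples can look like, here is a minimal sketch that groups paired-end FASTQ files by sample name so that each sample can be processed as one unit. The `_R1.fastq.gz` / `_R2.fastq.gz` naming convention and the file names are assumptions made for the example, not something prescribed by the course.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumed naming convention for paired-end reads (hypothetical).
READ_SUFFIXES = ("_R1.fastq.gz", "_R2.fastq.gz")


def group_by_sample(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Map each sample name to its read files, ignoring unrelated files."""
    samples = defaultdict(list)
    for name in sorted(filenames):
        for suffix in READ_SUFFIXES:
            if name.endswith(suffix):
                # The sample name is the file name without the read suffix.
                samples[name[: -len(suffix)]].append(name)
    return dict(samples)


# Hypothetical input directory listing.
files = [
    "s2_R1.fastq.gz",
    "s1_R2.fastq.gz",
    "s1_R1.fastq.gz",
    "s2_R2.fastq.gz",
    "notes.txt",
]
for sample, reads in group_by_sample(files).items():
    print(sample, reads)
```

Each resulting group could then be handed to one job, so the number of jobs grows with the number of samples rather than with the number of files.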

Target audience:

Biologists, bioinformaticians and computer scientists interested in approaching large-scale NGS-data analysis.

Course prerequisites:

Good knowledge of Python and the shell command line.

A very basic knowledge of R and biology is recommended but not strictly required.

Intended for:

Companies

Health

Research Institutions

Area:

Science

Course material and recordings:

https://learn.cineca.it/course/view.php?id=1643

Files and attachments:

agenda_2023_-_high_performance_bioinformatics.pdf
