An HPC performance monitoring tool is available on the MARCONI cluster and is currently in its pre-production phase. It is based on a software daemon (hpcmd) that, for each running job it detects, runs or queries standard Linux command-line tools (perf, ps), virtual file system files (such as /proc/loadavg), and proprietary tools (mmpmon, opainfo) at regular intervals (epochs) to obtain the related metrics. The hpcmd daemon is extremely lightweight and runs in the background, invisible to the user. The data it generates is written to syslog records, which are collected via rsyslog and stored in a database for subsequent analysis and visualization. The data collected for a user's jobs can be visualized only through a web interface, about 4 hours after the data is generated.
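As a rough illustration of the sampling idea described above, the following minimal Python sketch parses the contents of /proc/loadavg and formats one record per epoch. It is not hpcmd's actual code or record format: the function names, the `job_id` and `epoch` fields, and the JSON layout are all assumptions made for this example.

```python
import json
from pathlib import Path


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict of metrics.

    The file holds five space-separated fields: the 1-, 5- and 15-minute
    load averages, "runnable/total" scheduling entities, and the last PID.
    """
    fields = text.split()
    runnable, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),
        "load5": float(fields[1]),
        "load15": float(fields[2]),
        "runnable": int(runnable),
        "total": int(total),
    }


def format_record(job_id, epoch, metrics):
    """Build one record as a JSON string (hypothetical layout, not hpcmd's)."""
    return json.dumps({"job_id": job_id, "epoch": epoch, **metrics}, sort_keys=True)


def sample_once(job_id, epoch, path="/proc/loadavg"):
    """Read the load averages once and return the formatted record (Linux only)."""
    return format_record(job_id, epoch, parse_loadavg(Path(path).read_text()))
```

In a real daemon, a loop would call something like `sample_once` once per epoch and hand the resulting string to the system logger, from which rsyslog could forward it to the database.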
The aim of this session is to present the tool to EUROfusion users of the MARCONI cluster, focusing mainly on the web interface, where users can visualize the data collected for the jobs they have run on the cluster.