GridKa Technical Overview

High-Throughput Compute Farm

The GridKa compute farm consists of approximately 250 compute nodes (2024). The setup is optimized for the independent high-throughput jobs common in high-energy physics and astroparticle physics computing, which do not require low-latency interconnects between the compute nodes. GPUs are available for R&D and production compute jobs. The HTCondor batch systems manage ~48,000 logical CPU cores (including 3840 ARM cores) and 56 GPUs (2024).
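
As a rough illustration of how such independent high-throughput jobs reach the farm, the following sketch uses the HTCondor Python bindings to submit a single job. The executable, arguments, and resource requests are hypothetical placeholders, not GridKa-specific settings.

    # Minimal sketch: submitting one independent high-throughput job via the
    # HTCondor Python bindings. All names and resource values below are
    # illustrative placeholders, not GridKa-specific settings.
    import htcondor

    job = htcondor.Submit({
        "executable": "analysis.sh",      # hypothetical user payload
        "arguments": "run_2024.cfg",
        "request_cpus": "1",              # independent single-core job
        "request_memory": "2GB",
        "request_gpus": "0",              # set to 1 for a GPU R&D job
        "output": "job.out",
        "error": "job.err",
        "log": "job.log",
    })

    schedd = htcondor.Schedd()            # local scheduler daemon
    result = schedd.submit(job, count=1)  # queue one instance
    print("submitted cluster", result.cluster())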

Batch Farm Monitoring

Online Storage

In order to serve as a data hub for the Worldwide LHC Computing Grid, GridKa operates a large software-defined online storage installation. Based on IBM Spectrum Scale™ with an internal InfiniBand network, the GridKa online storage is highly scalable in both capacity and performance. User access is provided through the dCache and xrootd middleware. In 2024, ~68 PB of capacity with a total throughput of more than 200 GB/s are available to users.
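
A hedged sketch of how users typically read data through the xrootd door, using the XRootD Python bindings; the endpoint and file path below are made-up examples, not actual GridKa namespace entries.

    # Minimal sketch: opening and reading a file over the xrootd protocol.
    # Endpoint and path are hypothetical examples for illustration only.
    from XRootD import client
    from XRootD.client.flags import OpenFlags

    with client.File() as f:
        status, _ = f.open("root://xrootd.example.org//store/user/sample.root",
                           OpenFlags.READ)
        if not status.ok:
            raise RuntimeError(status.message)
        status, data = f.read(offset=0, size=1024)  # read the first kilobyte
        print(len(data), "bytes read")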

Online Storage Monitoring

Offline Storage

The GridKa Offline Storage system provides the capacity for efficient long-term storage of the experiments' raw data. It offers more than 140 PB of capacity for the four LHC experiments and Belle II. Since mid-2024, all data has been stored in a Spectra Logic TFinity® library with TS1160 drives, managed by the High Performance Storage System (HPSS).
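
Since tape-resident data must be recalled to disk before it can be read, workflows typically issue a stage request to the storage middleware first. The sketch below shows one way such a request can look, using the prepare/stage call of the XRootD Python bindings; the door address and file path are hypothetical, and the actual recall is carried out by the tape backend.

    # Minimal sketch: requesting that a tape-resident file be staged to disk
    # via an xrootd "prepare" request. Endpoint and path are hypothetical.
    from XRootD import client
    from XRootD.client.flags import PrepareFlags

    fs = client.FileSystem("root://dcache-door.example.org")
    files = ["/pnfs/example.org/data/raw/run_123456.raw"]

    status, response = fs.prepare(files, PrepareFlags.STAGE)
    if not status.ok:
        raise RuntimeError(status.message)
    print("stage request accepted:", response)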

Tape Monitoring

Network

High-bandwidth wide-area network connections are essential to receive data directly from CERN and to transfer data to and from other WLCG centers across the globe. Two 100 Gbit/s connections to CERN and two 100 Gbit/s connections to the internet allow GridKa to cope with the data rates expected during LHC Run 3.
The internal network backbone connects the online storage system, management servers, and compute nodes.
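
As a back-of-the-envelope aid only, the aggregate external bandwidth implied by these links can be converted to bytes per second as follows; this is an idealized calculation that ignores protocol overhead and redundancy considerations.

    # Idealized aggregate WAN capacity, assuming all links run in parallel
    # at full rate; ignores protocol overhead and redundancy considerations.
    cern_links = 2 * 100e9       # 2 x 100 Gbit/s to CERN
    internet_links = 2 * 100e9   # 2 x 100 Gbit/s to the internet

    total_bits_per_s = cern_links + internet_links
    total_gbytes_per_s = total_bits_per_s / 8 / 1e9
    print(f"aggregate WAN capacity: {total_gbytes_per_s:.0f} GB/s")  # 50 GB/s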

Network Monitoring