Ongoing Projects
Verbundprojekt CausalNet - A flexible, robust, and efficient framework for integrating causality into machine learning models
Existing machine learning (ML) models typically rely on correlation rather than causation. This can lead to errors, bias, and ultimately suboptimal performance. To address this, we aim to develop novel ways to integrate causality into ML models. In the project CausalNet, we advance causal ML toward flexibility, efficiency, and robustness: (1) Flexibility: We develop a general-purpose causal ML model for high-dimensional, time-series, and multi-modal data. (2) Efficiency: We develop techniques for efficient learning algorithms (e.g., synthetic pre-training, transfer learning, and few-shot learning) that are carefully tailored to causal ML. (3) Robustness: We create new environments/datasets for benchmarking. We also develop new techniques for verifying and improving the robustness of causal ML. (4) Open source: We fill gaps in the causal ML toolchain to improve industry uptake. (5) Real-world applications: We demonstrate performance gains through causal ML in business, public policy, and bioinformatics for scientific discovery.
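To illustrate one basic way causality enters ML models, the following minimal sketch (illustrative only, not CausalNet code; the variable names and the data-generating process are assumptions) estimates an average treatment effect with a simple T-learner on synthetic, confounded data:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))                              # covariates
T = rng.binomial(1, 1.0 / (1.0 + np.exp(-X[:, 0])))      # confounded treatment assignment
Y = 2.0 * T + X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Fit one outcome model per treatment arm, then contrast their predictions.
m1 = GradientBoostingRegressor().fit(X[T == 1], Y[T == 1])
m0 = GradientBoostingRegressor().fit(X[T == 0], Y[T == 0])
ate = np.mean(m1.predict(X) - m0.predict(X))             # close to the true effect of 2
print(f"estimated average treatment effect: {ate:.2f}")

A purely correlational regression of Y on T alone would be biased by the confounder, which is exactly the kind of error causal ML aims to avoid.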
Tier-2 Online Storage for ATLAS and CMS at GridKa
2024-10-01 to 2025-12-31
Purchase and operation of experiment-specific Tier-2 online storage for ATLAS and CMS at GridKa at KIT.
Distributional Copula Regression for Space-Time Data
Distributional copula regression for space-time data develops novel models for multivariate spatio-temporal data using distributional copula regression. Of particular interest are tests for the significance of predictors and automatic variable selection using Bayesian selection priors. In the long run, the project will consider computationally efficient modeling of non-stationary dependencies using stochastic partial differential equations.
DFG-priority program 2298 Theoretical Foundations of Deep Learning
The goal of this project is to use deep neural networks as building blocks in a numerical method to solve the Boltzmann equation. This is a particularly challenging problem since the equation is a high-dimensional integro-differential equation, which at the same time possesses an intricate structure that a numerical method needs to preserve. Thus, artificial neural networks might be beneficial, but cannot be used out-of-the-box.
We follow two main strategies to develop structure-preserving neural-network-enhanced numerical methods for the Boltzmann equation. First, we target the moment approach, where a structure-preserving neural network will be employed to model the minimal entropy closure of the moment system. By enforcing convexity of the neural network, one can show that the intrinsic structure of the moment system, such as hyperbolicity, entropy dissipation and positivity, is preserved. Second, we develop a neural network approach to solve the Boltzmann equation directly at the discrete-particle-velocity level. Here, a neural network is employed to model the difference between the full non-linear collision operator of the Boltzmann equation and the BGK model, which preserves the entropy dissipation principle. Furthermore, we will develop strategies to generate training data that fully sample the input space of the respective neural networks to ensure properly functioning models.
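As a rough illustration of the first strategy, the sketch below (a toy under stated assumptions, not the project's implementation; layer sizes and the moment dimension are invented) shows how an input-convex neural network can parameterize a convex closure of the moment system, with convexity enforced by non-negative weights on the propagated path and a convex, non-decreasing activation:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ICNN(nn.Module):
    """Input-convex network: the output is convex in the input u."""
    def __init__(self, dim_in, hidden=64, depth=3):
        super().__init__()
        self.Wx = nn.ModuleList([nn.Linear(dim_in, hidden) for _ in range(depth)])
        self.Wz = nn.ModuleList([nn.Linear(hidden, hidden, bias=False)
                                 for _ in range(depth - 1)])
        self.out = nn.Linear(hidden, 1)

    def forward(self, u):
        z = F.softplus(self.Wx[0](u))
        for Wx, Wz in zip(self.Wx[1:], self.Wz):
            # Non-negative weights on the z-path plus a convex, non-decreasing
            # activation keep z (and hence the output) convex in u.
            z = F.softplus(Wx(u) + F.linear(z, Wz.weight.clamp(min=0.0)))
        return F.linear(z, self.out.weight.clamp(min=0.0), self.out.bias)

h = ICNN(dim_in=4)                                 # e.g. four moments (toy choice)
u = torch.randn(8, 4, requires_grad=True)          # batch of moment vectors
alpha = torch.autograd.grad(h(u).sum(), u)[0]      # gradient of the convex closure w.r.t. the moments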
Countrywide service bwGitLab
2024-07-01 to 2029-06-30
Based on the GitLab software, a state-wide service for the administration, versioning and publication of software repositories for universities in Baden-Württemberg is being created as part of the IT alliance. GitLab also offers numerous possibilities for collaborative work and provides extensive functionality for software development and automation. The service enables or simplifies cross-location development projects between universities and with external partners, opens up new possibilities in research data management, and can be used profitably in teaching. It also creates an alternative to cloud services such as GitHub and makes it easier to keep data from research and teaching in Baden-Württemberg.
Smart weather predictions for the 21st century through probabilistic AI models initialized with storm-resolving climate projections - SmartWeather21
2024-05-01 to 2027-04-30
The ongoing warming of the Earth's climate due to man-made climate change is fundamentally changing our weather. Traditionally, weather forecasts have been based on numerical models, so-called Numerical Weather Prediction (NWP) models. Data-driven machine learning models, and in particular deep neural networks, offer the potential to serve as surrogate models for fast and (energy-)efficient emulation of NWP models. As part of the SmartWeather21 project, we want to investigate which deep learning (DL) architecture for NWP is best suited for weather forecasting in a warmer climate, based on the high-resolution climate projections generated with ICON as part of WarmWorld. To handle the high resolution of the WarmWorld climate projections, we will develop data- and model-parallel approaches and architectures for weather forecasting in a warmer climate. Furthermore, we will investigate which (learnable combinations of) variables from the ICON climate projections provide the best, physically plausible forecast accuracy for weather prediction in a warmer climate. With this in mind, we develop dimension-reduction techniques for the various input variables that learn a latent, lower-dimensional representation driven by the accuracy of the downstream weather forecast. The increased spatial resolution of the ICON simulations also allows conclusions to be drawn about the uncertainties of individual input and output variables at lower resolutions. As part of SmartWeather21, we will develop methods that parameterise these uncertainties using probability distributions and use them as input variables with lower spatial resolution in DL-based weather models, where they can be propagated through the model as part of probabilistic forecasts.
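The following toy sketch (shapes, layer sizes and variable names are assumptions, not project code) illustrates the idea of learning a lower-dimensional latent representation end to end, driven by the accuracy of the downstream forecast rather than by a separate reconstruction objective:

import torch
import torch.nn as nn

encoder = nn.Sequential(                 # compresses high-resolution fields to a latent code
    nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.GELU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.GELU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 64),
)
forecaster = nn.Sequential(nn.Linear(64, 128), nn.GELU(), nn.Linear(128, 8))

opt = torch.optim.Adam(list(encoder.parameters()) + list(forecaster.parameters()), lr=1e-3)
x_now = torch.randn(4, 8, 128, 128)      # 8 input variables on a 128x128 grid (toy)
y_next = torch.randn(4, 8)               # coarse forecast target at the next step (toy)

loss = nn.functional.mse_loss(forecaster(encoder(x_now)), y_next)
opt.zero_grad(); loss.backward(); opt.step()   # the representation is trained by forecast skill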
EOSC Beyond
2024-04-01 to 2027-03-31
EOSC Beyond's overall objective is to advance Open Science and innovation in research in the context of the European Open Science Cloud (EOSC) by providing new EOSC Core capabilities that allow scientific applications to find, compose and access multiple Open Science resources and to offer them as integrated capabilities to researchers.
bwNET 2.0
Resilient, powerful and flexible network operation and the provision of individualized and secure network services are at the core of bwNET2.0. The aim is to contribute to fundamental research in the field of autonomous networks on the one hand and to design and implement concrete building blocks and mechanisms that can be used to further develop BelWü and campus networks on the other.
bwJupyter for teaching
The aim of this project is to strengthen research-oriented teaching, especially in the areas of AI, machine learning, simulation and modeling, by providing a state-wide service bwJupyter.
AI-enhanced differentiable Ray Tracer for Irradiation-prediction in Solar Tower Digital Twins - ARTIST
Solar tower power plants play a key role in facilitating the ongoing energy transition, as they deliver dispatchable, climate-neutral electricity and direct heat for chemical processes. In this project we develop a heliostat-specific differentiable ray tracer capable of modeling the energy transport at the solar tower in a data-driven manner. This enables heliostat surface reconstruction and thus drastically improves the irradiance prediction. Additionally, such a ray tracer drastically reduces the amount of data required for alignment calibration. In principle, this makes learning for a fully AI-operated solar tower feasible. The goal is to develop a holistic AI-enhanced digital twin of the solar power plant for design, control, prediction, and diagnosis, based on the physical differentiable ray tracer. Any operational parameter in the solar field that influences the energy transport can be optimized with it. For the first time, gradient-based field design, aim-point control, and current-state diagnosis become possible. By extending it with AI-based optimization techniques and reinforcement learning algorithms, it should be possible to map real, dynamic environmental conditions to the twin with low latency. Finally, due to the full differentiability, visual explanations for the predicted operational actions are possible. The proposed AI-enhanced digital twin environment will be verified at a real power plant in Jülich. Its inception marks a significant step towards a fully automatic solar tower power plant.
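A minimal sketch of the underlying idea (a toy stand-in for the ray tracer; the names and the flux model are assumptions, not ARTIST code) shows how full differentiability lets an operational parameter such as an aim point be optimized by gradient descent:

import torch

target = torch.tensor([0.0, 0.0])                        # desired flux centre on the receiver
aim = torch.tensor([0.8, -0.5], requires_grad=True)      # operational parameter to optimize

def predicted_flux_error(aim_point):
    # Stand-in for the differentiable ray tracer: penalize the distance of the
    # predicted flux centroid from the target spot.
    return torch.sum((aim_point - target) ** 2)

opt = torch.optim.Adam([aim], lr=0.1)
for _ in range(100):
    opt.zero_grad()
    loss = predicted_flux_error(aim)
    loss.backward()                                      # gradients via automatic differentiation
    opt.step()
print(aim.detach())                                      # converges towards the target aim point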
Authentication and Authorisation for Research Collaboration Technical Revision to Enhance Effectiveness - AARC TREE
Collaboration and sharing of resources is critical for research. Authentication and Authorisation Infrastructures (AAIs) play a key role in enabling federated interoperable access to resources.
The AARC Technical Revision to Enhance Effectiveness (AARC TREE) project takes the successful and globally recognised "Authentication and Authorisation for Research Collaboration" (AARC) model and its flagship outcome, the AARC Blueprint Architecture (BPA), as the basis to drive the next phase of integration for research infrastructures: expanding federated access management to integrate user-centred technologies, expanding access to federated data and services (authorisation), and consolidating existing capacities while avoiding fragmentation and unnecessary duplication.
SCC participates in AARC TREE to continue developing the Blueprint Architecture. Here we contribute to technical recommendations as well as to policy development. Since SCC is also a core member of the IAM project of the German NFDI, we can raise awareness of NFDI requirements in AARC and feed new developments back to NFDI in a very timely manner.
Holistic Imaging and Molecular Analysis in life-threatening Ailments - HIMALAYA
2024-02-01 to 2027-01-31
The overall goal of this project is to improve the radiological diagnosis of human prostate cancer in clinical MRI by AI-based exploitation of information from higher-resolution modalities. We will use the brilliance of HiP-CT imaging at beamline 18 and an extended histopathology of the entire prostate to optimise the interpretation of MRI images in the context of a research prototype. In parallel, the correlation of the image data with the molecular properties of the tumours is planned for a better understanding of invasive tumour structures. An interactive multi-scale visualisation across all modalities forms the basis for vividly conveying the immense amounts of data. As a result, at the end of the three-year project phase, the conventional radiological application of magnetic resonance imaging (MRI) is to be developed, with the help of innovative AI algorithms, into a diagnostic standard that also reliably identifies patients with invasive prostate tumours who have often been misdiagnosed to date. In the medium term, a substantial improvement in the care of patients with advanced prostate carcinoma can therefore be expected. In addition, we will make the unique multimodal data set created in the project, including visualisation tools, available as open data to enable further studies to better understand prostate cancer, which could potentially lead to novel diagnostic and therapeutic approaches.
bwHPC-S5: Scientific Simulation and Storage Support Services Phase 3 - bwHPC-S5 Phase 3
Together with the insights, assessments and recommendations gained, the current challenges and defined fields of action of the framework concept of the universities of the state of Baden-Württemberg for HPC and DIC in the period 2025 to 2032 are to be put into concrete terms in the project through the following measures:
• Further development of scientific support with regard to competencies for supporting novel system and method concepts (AI, ML or quantum computing), networking with methods research, holistic needs analyses and support strategies (e.g. onboarding)
• Increasing energy efficiency through awareness-raising as well as the investigation and use of new operating models and workflows, including optimized software
• Testing and flexible integration of new system components and architectures, resources (e.g. cloud) as well as virtualization and containerization solutions
• Implementation of new software strategies (e.g. sustainability and development processes)
• Expansion of the functionalities of the Baden-Württemberg data federation (e.g. data transfer service)
• Implementation of concepts for handling sensitive data and for establishing digital sovereignty
• Networking and cooperation with other research infrastructures
bwCloud 3
Enhancement of the bwCloud infrastructure (locations FR, KA, MA, UL) and addition of further bwServices on top of it. (Predecessor of the project bwCloud SCOPE.)
ICON-SmART
Contact: Dr. Jörg Meyer
Funding: Hans-Ertel-Zentrum für Wetterforschung
"ICON-SmART" addresses the role of aerosols and atmospheric chemistry in the simulation of seasonal to decadal climate variability and change. To this end, the project will enhance the capabilities of the coupled composition, weather and climate modelling system ICON-ART (ICON: the icosahedral nonhydrostatic model developed by DWD, MPI-M and DKRZ; ART: the atmospheric composition module for aerosols and reactive trace gases developed by KIT) for seasonal to decadal predictions and climate projections in seamless global-to-regional model configurations with ICON-Seamless-ART (ICON-SmART).
Based on previous work, chemistry is a promising candidate for speed-up by machine learning. In addition, the project will explore machine learning approaches for other processes. The ICON-SmART model system will provide scientists, forecasters and policy-makers with a novel tool to investigate atmospheric composition in a changing climate and will allow us to answer questions that were previously out of reach.
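As a hedged illustration of such a speed-up (a toy setup; the species count, data and network are assumptions, not the project's emulator), a neural network can be trained to emulate one chemistry time step from reference solver output:

import torch
import torch.nn as nn

n_species = 32                                    # assumed number of chemical species
emulator = nn.Sequential(nn.Linear(n_species, 256), nn.GELU(),
                         nn.Linear(256, 256), nn.GELU(),
                         nn.Linear(256, n_species))

# Toy stand-ins for states sampled from reference chemistry-solver runs.
state_before = torch.rand(1024, n_species)
state_after = torch.rand(1024, n_species)

opt = torch.optim.Adam(emulator.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(emulator(state_before), state_after)
    loss.backward()
    opt.step()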
Photonic materials with properties on demand designed with AI technology
This project uses artificial neural networks in an inverse design problem of finding nano-structured materials with optical properties on demand. Achieving this goal requires generating large amounts of data from 3D simulations of Maxwell's equations, which makes this a data-intensive computing problem. Tailored algorithms are being developed that address both the learning process and the efficient inversion. The project complements research in the SDL Materials Science on AI methods, large data sets generated by simulations, and workflows.
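A minimal sketch of the inversion step (assumed workflow and dimensions, not the project's code): once a neural surrogate of the Maxwell solver has been trained, structure parameters can be optimized by gradient descent through the frozen network towards a target spectrum:

import torch
import torch.nn as nn

# Surrogate mapping 10 structure parameters to a 50-point spectrum (toy sizes);
# in practice it would first be trained on data from 3D Maxwell simulations.
surrogate = nn.Sequential(nn.Linear(10, 128), nn.Tanh(), nn.Linear(128, 50))
for p in surrogate.parameters():
    p.requires_grad_(False)                      # freeze the (trained) surrogate

target_spectrum = torch.rand(50)                 # desired optical response (toy)
params = torch.zeros(10, requires_grad=True)     # nano-structure parameters to design

opt = torch.optim.Adam([params], lr=0.05)
for _ in range(300):
    opt.zero_grad()
    loss = nn.functional.mse_loss(surrogate(params), target_spectrum)
    loss.backward()                              # gradients flow through the surrogate
    opt.step()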
Artificial intelligence for the Simulation of Severe AccidentS - ASSAS
The ASSAS project aims at developing a proof-of-concept SA (severe accident) simulator based on ASTEC (Accident Source Term Evaluation Code). The prototype basic-principle simulator will model a simplified generic Western-type pressurized light water reactor (PWR). It will have a graphical user interface to control the simulation and visualize the results. It will run in real-time and even much faster for some phases of the accident. The prototype will be able to show the main phenomena occurring during a SA, including in-vessel and ex-vessel phases. It is meant to train students, nuclear energy professionals and non-specialists.
In addition to its direct use, the prototype will demonstrate the feasibility of developing different types of fast-running SA simulators while keeping the accuracy of the underlying physical models. Thus, different computational solutions will be explored in parallel. Code optimisation and parallelisation will be implemented. Besides these reliable techniques, different machine-learning methods will be tested to develop fast surrogate models. This alternative path is riskier, but it could drastically enhance the performance of the code. A comprehensive review of ASTEC's structure and available algorithms will be performed to define the most relevant modelling strategies, which may include the replacement of specific calculation steps, entire modules of ASTEC, or more global surrogate models. Solutions will be explored to extend the models developed for the PWR simulator to other reactor types and SA codes. The training database of SA sequences used for machine learning will be made openly available. Developing an enhanced version of ASTEC and interfacing it with a commercial simulation environment will make it possible for industry to develop engineering and full-scale simulators in the future. These can be used to design SA management guidelines, to develop new safety systems and to train operators to use them.
Data and services to support marine and freshwater scientists and stakeholders - AquaINFRA
The AquaINFRA project aims to develop a virtual environment equipped with FAIR multi-disciplinary data and services to support marine and freshwater scientists and stakeholders in restoring healthy oceans, seas, coastal and inland waters. The AquaINFRA virtual environment will enable the target stakeholders to store, share, access, analyse and process research data and other digital research objects from their own discipline, across research infrastructures, disciplines and national borders, leveraging EOSC and the other existing operational data spaces. Besides supporting the ongoing development of the EOSC as an overarching research infrastructure, AquaINFRA addresses the specific need of enabling researchers from the marine and freshwater communities to work and collaborate across those two domains.
Implementation of an InfraStructure for dAta-BasEd Learning in environmental sciences - ISABEL
2022-12-01 to 2025-11-30
The amount and diversity of digitally available environmental data is continuously increasing. However, they are often hardly accessible or scientifically usable. The datasets frequently lack sufficient metadata description, are stored in a variety of data formats, and are still saved on local storage devices instead of data portals or repositories. Based on the virtual research environment V-FOR-WaTer, which was developed in a previous project, ISABEL aims at making this data abundance available in an easy-to-use web portal. Environmental scientists get access to data from different sources, e.g. state offices or university projects, and can share their own data through the portal. Integrated tools help to easily pre-process and scale the data and make them available in a consistent format. Further tools for more complex scientific analyses will be included. These are both implemented by the developers of the portal according to the requirements of the scientific community and contributed directly by the portal’s users. The possibility to store workflows together with the tools and respective data ensures reproducible data analysis. Additionally, interfaces with existing data repositories enable easy publication of the scientists’ data directly from the portal. ISABEL addresses the needs of researchers of hydrology and environmental science to not only find and access datasets but also conduct efficient data-based learning with standardised tools and reproducible workflows.
DAPHONA
Nano-optics deals with the optical properties of structures that are comparable to or smaller than the wavelength. All optical properties of a scatterer are determined by its T-matrix. Currently, these T-matrices are recalculated over and over again and are not used systematically. This wastes computing resources and does not allow novel questions to be addressed.
DAPHONA remedies this deficiency. The project provides technologies with which the geometric and material properties of an object and its optical properties are brought together in a data structure. This data is systematically used to extract the T-matrix for a given object. It should also be possible to identify objects with predefined optical properties. Using these approaches, the DAPHONA project will answer novel questions that can only be addressed using this data-driven approach.
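A minimal sketch of such a data structure (all field names are assumptions, not the DAPHONA specification) that stores an object's geometry and material together with its T-matrix, so that the matrix can be looked up and reused instead of being recomputed:

from dataclasses import dataclass
import numpy as np

@dataclass
class TMatrixRecord:
    geometry: str          # e.g. "sphere", "rod", or a mesh reference
    size_nm: float
    material: str          # e.g. "Au", "Si"
    wavelength_nm: float
    tmatrix: np.ndarray    # complex T-matrix for this configuration

database: list[TMatrixRecord] = []

def find_tmatrix(geometry, size_nm, material, wavelength_nm, tol=1e-6):
    """Return a stored T-matrix for the given object, or None if absent."""
    for rec in database:
        if (rec.geometry == geometry and rec.material == material
                and abs(rec.size_nm - size_nm) < tol
                and abs(rec.wavelength_nm - wavelength_nm) < tol):
            return rec.tmatrix
    return None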
The aim of the project is also to train young scientists at various qualification levels and to anchor the described approach in teaching. In addition, the data structure is to be coordinated within the specialist community. The data will be discussed in workshops and available methods for its use will be disseminated. The DAPHONA concept is open, based on the FAIR principles and will bring sustainable benefits to the entire community.
Skills for the European Open Science Commons: Creating a Training Ecosystem for Open and FAIR Science (Skills4EOSC)
Skills4EOSC brings together leading experts from national, regional, institutional and thematic open science and data competence centres from 18 European countries. The aim is to unify the current training and education landscape into a common cross-European ecosystem in order to train researchers and data specialists from Europe at an accelerated pace in the fields of FAIR open data, data-intensive science and scientific data management.
Scalable and efficient uncertainty quantification for AI-based time series forecasting - EQUIPE
2022-09-01 to 2025-08-31
The EQUIPE project deals with the quantification of uncertainties in large transformer models for time series prediction. Although the transformer architecture is able to achieve astonishingly high prediction accuracy, it requires immense amounts of computational resources. Common approaches to error estimation in neural networks are equally computationally intensive, which currently makes their use in transformers considerably more difficult.
The research work within EQUIPE aims to solve these problems and to develop scalable algorithms for quantifying uncertainties in large neural networks, which will enable the methods to be used in real-time systems in the future.
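As a point of reference (a common baseline, not the EQUIPE method; shapes and sizes below are assumptions), Monte-Carlo dropout illustrates how inexpensive sampling can yield a predictive uncertainty estimate for a neural forecaster:

import torch
import torch.nn as nn

# Forecaster mapping the last 24 observations to the next 6 values (toy sizes).
model = nn.Sequential(nn.Linear(24, 128), nn.ReLU(), nn.Dropout(0.1),
                      nn.Linear(128, 6))
x = torch.randn(1, 24)

model.train()                                          # keep dropout active while sampling
samples = torch.stack([model(x) for _ in range(100)])
mean, std = samples.mean(dim=0), samples.std(dim=0)    # forecast and its uncertainty band

The drawback motivating EQUIPE is visible even here: every uncertainty estimate costs many extra forward passes, which becomes prohibitive for large transformer models.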
iMagine
iMagine is an EU-funded project that provides a portfolio of ‘free at the point of use’ image datasets, high-performance image analysis tools empowered with Artificial Intelligence (AI), and Best Practice documents for scientific image analysis. These services and materials enable better and more efficient processing and analysis of imaging data in marine and freshwater research, relevant to the overarching theme of ‘Healthy oceans, seas, coastal and inland waters’.
Artificial Intelligence for the European Open Science Cloud - AI4EOSC
2022-09-01 to 2025-08-31
Project page: ai4eosc.eu
AI4EOSC (Artificial Intelligence for the European Open Science Cloud) is an EU-funded project that delivers an enhanced set of advanced services for the development of AI/ML/DL models and applications in the European Open Science Cloud (EOSC). These services are bundled together into a comprehensive platform providing advanced features such as distributed, federated and split learning; novel provenance metadata for AI/ML/DL models; and event-driven data processing services. The project builds on top of the DEEP-Hybrid-DataCloud outcomes and the EOSC compute platform.
Development and validation of a hybrid grid/particle method for turbulent flows supported by high performance computations with OpenFOAM - hGVtSOF
The main goal of the present project is the further development and validation of a new computational fluid dynamics (CFD) method combining grid-free (particle) and grid-based techniques. A fundamental assumption of this novel approach is the decomposition of any physical quantity into a grid-based (large-scale) part and a fine-scale part, whereby large scales are resolved on the grid and fine scales are represented by particles. The dynamics of the large and fine scales are calculated from two coupled transport equations, one of which is solved on the grid, whereas the second one utilizes the Lagrangian grid-free Vortex Particle Method (VPM).
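In generic notation (assumed here, not the project's exact formulation), the decomposition reads

\phi(\mathbf{x}, t) = \bar{\phi}(\mathbf{x}, t) + \phi'(\mathbf{x}, t),

where the grid-based part \bar{\phi} carries the large scales resolved on the grid and the fine-scale part \phi' is represented by the Lagrangian particles, each part being evolved by its own transport equation and coupled to the other through exchange terms.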
InterTwin
InterTwin co-designs and implements the prototype of an interdisciplinary Digital Twin Engine (DTE), an open-source platform that provides generic and tailored software components for modelling and simulation to integrate application-specific Digital Twins (DTs). Its specifications and implementation are based on a co-designed conceptual model - the DTE blueprint architecture - guided by the principles of open standards and interoperability. The ambition is to develop a common approach to the implementation of DTs that is applicable across the whole spectrum of scientific disciplines and beyond, to facilitate developments and collaboration.
Gaia-X4 for the Innovations Campus Future Mobility - Gaia-X4ICM
Gaia-X is a European initiative that aims to provide a secure and trustworthy platform for data sharing and collaboration across various industries and sectors in Europe. One of the main goals of the Gaia-X4ICM research initiative for the Innovations Campus Future Mobility (ICM) is to create the basic infrastructure with all necessary hardware and software components that play a significant role in connecting the various sectors involved in an industrial production process. The SCC builds and runs such a cloud infrastructure, operating on its own hardware and retaining control over the digital infrastructure and data, thus ensuring data sovereignty, privacy and security.
Materialized holiness - toRoll
In the project "Materialized Holiness" Torah scrolls are studied as an extraordinary codicological, theological and social phenomenon. Unlike, for example, copies of the Bible, the copying of sacred scrolls has been governed by strict regulations since antiquity and is complemented by a rich commentary literature.
Together with experts in Jewish studies, materials research, and the social sciences, we would like to build a digital repository of knowledge that does justice to the complexity of this research subject. Jewish scribal literature with English translations, material analyses, paleographic studies of medieval Torah scrolls, as well as interview and film material on scribes of the present day are to be brought together in a unique collection and examined in an interdisciplinary manner for the first time. In addition, a 'virtual Torah scroll' to be developed will reveal minute paleographic details of the script and its significance in cultural memory.
Metaphors of Religion - SFB1475
The SFB 1475, located at the Ruhr-Universität Bochum (RUB), aims to understand and methodically record the religious use of metaphors across times and cultures. To this end, the subprojects examine a variety of scriptures from Christianity, Islam, Judaism, Zoroastrianism, Jainism, Buddhism, and Daoism, originating from Europe, the Near and Middle East, as well as South, Central, and East Asia, and spanning the period from 3000 BC to the present. For the first time, comparative studies on a unique scale are made possible through this collaborative effort.
Within the Collaborative Research Center, the SCC, together with colleagues from the Center for the Study of Religions (CERES) and the RUB, is leading the information infrastructure project "Metaphor Base Camp", in which the digital data infrastructure for all subprojects is being developed. The central component will be a research data repository with state-of-the-art annotation, analysis and visualization tools for the humanities data.
Helmholtz Platform for Research Software Engineering - Preparatory Study (HiRSE_PS)
The HiRSE concept sees the establishment of central activities in RSE and the targeted sustainable funding of strategically important codes by so-called Community Software Infrastructure (CSI) groups as mutually supportive aspects of a single entity.
Bayesian Machine Learning with Uncertainty Quantification for Detecting Weeds in Crop Lands from Low Altitude Remote Sensing
Weeds are one of the major contributors to crop yield loss. As a result, farmers deploy various approaches to manage and control weed growth in their agricultural fields, the most common being chemical herbicides. However, herbicides are often applied uniformly to the entire field, which has negative environmental and financial impacts. Site-specific weed management (SSWM) takes the variability in the field into account and localizes the treatment. Accurate localization of weeds is the first step for SSWM. Moreover, information on the prediction confidence is crucial for deploying methods in real-world applications. This project aims to develop methods for weed identification in croplands from low-altitude UAV remote sensing imagery, together with uncertainty quantification using Bayesian machine learning, in order to arrive at a holistic approach to SSWM. The project is supported by the Helmholtz Einstein International Berlin Research School in Data Science (HEIBRiDS) and co-supervised by Prof. Dr. Martin Herold from the GFZ German Research Centre for Geosciences.
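The following toy sketch (array shapes and the threshold are assumptions, not the project's pipeline) shows how per-pixel class probabilities from several stochastic forward passes can be turned into a predictive-entropy confidence map for site-specific treatment decisions:

import numpy as np

# 20 stochastic forward passes, 64x64 pixels, 3 classes (toy shapes).
probs = np.random.dirichlet([2.0, 2.0, 2.0], size=(20, 64, 64))
mean_probs = probs.mean(axis=0)                                    # averaged per-pixel prediction
entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=-1)  # predictive entropy map
labels = mean_probs.argmax(axis=-1)                                # e.g. weed / crop / soil
uncertain = entropy > 0.9                                          # pixels flagged for caution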
Sustainable concepts for the campus networks of universities in BW - bwCampusnetz
In the bwCampusnetz project, several universities in Baden-Württemberg are working together to shed light on university campus networks away from the constraints of day-to-day business, but still very close to the realities of operation. Future-proof concepts are to be found and their practical feasibility is to be investigated and demonstrated through prototypical implementations.
PUNCH4NFDI
PUNCH4NFDI is the NFDI consortium of particle, astro-, astroparticle, hadron and nuclear physics, representing about 9,000 scientists with a Ph.D. in Germany, from universities, the Max Planck Society, the Leibniz Association, and the Helmholtz Association. PUNCH physics addresses the fundamental constituents of matter and their interactions, as well as their role in the development of the largest structures in the universe - stars and galaxies. The achievements of PUNCH science range from the discovery of the Higgs boson through the installation of a one-cubic-kilometre particle detector for neutrino detection in the Antarctic ice to the detection of the quark-gluon plasma in heavy-ion collisions and the first-ever picture of the black hole at the heart of the Milky Way.
The prime goal of PUNCH4NFDI is the setup of a federated and "FAIR" science data platform, offering the infrastructures and interfaces necessary for the access to and use of data and computing resources of the involved communities and beyond. The SCC plays a leading role in the development of the highly distributed Compute4PUNCH infrastructure and is involved in the activities around Storage4PUNCH, a distributed storage infrastructure for the PUNCH communities.
NFDI-MatWerk
The NFDI-MatWerk consortium receives a five-year grant within the framework of the National Research Data Infrastructure (NFDI) for the development of a joint materials research data space. NFDI-MatWerk stands for Materials Science and Engineering; its aim is to characterize the physical mechanisms in materials and to develop resource-efficient high-performance materials with the most suitable properties for the respective application.
Data from scientific groups distributed across Germany are to become addressable via a knowledge-graph-based infrastructure in such a way that fast and complex search queries and evaluations become possible.
At KIT, the Scientific Computing Center (SCC) and the Institute for Applied Materials (IAM) are involved. At the SCC, we will establish the Digital Materials Environment with the infrastructure services for the research data and their metadata together with the partners.
Simulated worlds
The Simulated Worlds project aims to provide students in Baden-Württemberg with a deeper critical understanding of the possibilities and limitations of computer simulations. The project is jointly supported by the Scientific Computing Center (SCC), the High Performance Computing Center Stuttgart (HLRS) and the University of Ulm and is already working with several schools in Baden-Württemberg.
Shallow priors and deep learning: The potential of Bayesian statistics as an agent for deep Gaussian mixture models
Despite significant overlap and synergy, machine learning and statistical science have developed largely in parallel. Deep Gaussian mixture models, a recently introduced model class in machine learning, address the unsupervised tasks of density estimation and high-dimensional clustering used for pattern recognition in many applied areas. In order to avoid over-parameterized solutions, dimension reduction by factor models can be applied at each layer of the architecture. The choice of architecture can be interpreted as a Bayesian model choice problem, which means that every possible model satisfying the constraints has to be fitted. The authors propose a much simpler approach: only one large model needs to be trained, and unnecessary components empty out. The idea that parameters can be assigned prior distributions is highly unorthodox in this context but extremely simple, bringing together two sciences, namely machine learning and Bayesian statistics.
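A minimal one-layer illustration of the "components empty out" idea (using an off-the-shelf Bayesian mixture, not the authors' deep model; the data and prior setting are assumptions): a deliberately overfitted mixture with a sparse prior on the weights concentrates its mass on the components actually needed:

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3.0, 1.0, size=(300, 2)),    # three true clusters
               rng.normal(0.0, 1.0, size=(300, 2)),
               rng.normal(4.0, 1.0, size=(300, 2))])

bgm = BayesianGaussianMixture(n_components=10,           # intentionally too many
                              weight_concentration_prior=1e-2,
                              max_iter=500).fit(X)
print(np.round(bgm.weights_, 3))                          # most components end up (near) empty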
bwIDM-Security and Community
The outlined project bwIDM2 is dedicated to the increased demands on IT security and takes into account current technical developments. It creates the prerequisites for the integration of services across higher education institutions and establishes a group/role administration for supraregional and national communities with delegation mechanisms. In addition, specialist concepts for the integration of a long-term person identifier in bwIDM, as required for use in research data management, are being developed.
NFFA Europe Pilot - NEP
NEP provides important resources for nanoscientific research and develops new cooperative working methods. The use of innovative research data and metadata management technologies is becoming increasingly important. In the NEP project, the SCC contributes to the establishment of a joint research data infrastructure with new methods for metadata enrichment, the development of large data collections, and the provision of virtual services.
Joint Lab VMD - JL-VMD
Within the Joint Lab VMD, the SDL Materials Science develops methods, tools and architectural concepts for supercomputing and big-data infrastructures that are tailored to tackle the specific application challenges and to facilitate the digitalization of materials research and the creation of digital twins. In particular, the Joint Lab develops a virtual research environment (VRE) that integrates computing and data storage resources into existing workflow management systems and interactive environments for simulation and data analyses.
Joint Lab MDMC - JL-MDMC
Within the framework of the Joint Lab "Integrated Model and Data Driven Materials Characterization" (MDMC), the SDL Materials Science is developing a concept for a data and information platform to make data on materials available in a knowledge-oriented way as an experimental basis for digital twins and for the development of simulation-based methods for predicting material structure and properties. It defines a metadata model to describe samples and datasets from experimental measurements and harmonizes data models for material simulation and correlative characterization using materials science vocabularies and ontologies.
NFDI4Ing
NFDI4Ing is a consortium of the engineering sciences and promotes the management of technical research data. NFDI4Ing was founded back in 2017 and is in close exchange with researchers from all engineering disciplines. The consortium offers a unique method-oriented and user-centred approach to making technical research data FAIR: findable, accessible, interoperable, and reusable. An important challenge here is the large number of sub-disciplines in engineering and their subject-specific peculiarities. KIT is involved with two co-spokespersons: Britta Nestler from the Institute for Applied Materials (IAM) and Achim Streit from the Scientific Computing Center (SCC).
As part of NFDI4Ing, the SCC is developing and implementing the concepts for federated research data infrastructures, data management processes, repositories and metadata management in close cooperation with the partners. The NFDI4Ing application https://doi.org/10.5281/zenodo.4015200 describes the planned research data infrastructure in detail.
NFDI4Chem Chemistry Consortium in the NFDI
The vision of NFDI4Chem is the digitization of all work processes in chemical research. To this end, infrastructure is to be established and expanded to support researchers in collecting, storing and archiving, processing and analyzing research data, as well as in publishing the data in repositories together with descriptive metadata and DOIs, thus making them referenceable and reusable. As a professional consortium, NFDI4Chem represents all disciplines of chemistry and works closely with the major professional societies to this end.
Boosting copulas - multivariate distributional regression for digital medicine
Traditional regression models often provide an overly simplistic view of the complex associations and relationships in contemporary data problems in the area of biomedicine. In particular, correctly capturing relevant associations between multiple clinical endpoints is highly relevant in order to avoid model misspecifications, which can lead to biased results and even wrong or misleading conclusions and treatments. As such, the development of statistical methods tailored to such problems in biomedicine is of considerable interest. The aim of this project is to develop novel conditional copula regression models for high-dimensional biomedical data structures by bringing together efficient statistical learning tools for high-dimensional data and established methods from economics for multivariate data structures that allow complex dependence structures between variables to be captured. These methods will allow us to model the entire joint distribution of multiple endpoints simultaneously and to automatically determine the relevant influential covariates and risk factors via algorithms originally proposed in the area of statistical and machine learning. The resulting models can then be used both for the interpretation and analysis of complex association structures and for predictive inference (e.g., simultaneous prediction intervals for multiple endpoints). Implementation in open software and application in various studies will highlight the potential of this project's methodological developments in the area of digital medicine.
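In a generic bivariate form (notation assumed, not the project's exact specification), such a model factorizes the joint conditional distribution as

F(y_1, y_2 \mid \mathbf{x}) = C\bigl(F_1(y_1 \mid \mathbf{x}),\, F_2(y_2 \mid \mathbf{x});\, \theta(\mathbf{x})\bigr),

where F_1 and F_2 are marginal distributional regression models, C is a parametric copula, and the dependence parameter \theta(\mathbf{x}) is itself a function of the covariates, estimated for example by componentwise gradient boosting with its built-in variable selection.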
HAICORE
The Helmholtz AI COmpute REsources (HAICORE) infrastructure project was launched in early 2020 as part of the Helmholtz Incubator "Information & Data Science" to provide high-performance computing resources for artificial intelligence (AI) researchers in the Helmholtz Association. Technically, the AI hardware is operated as part of the high-performance computing systems JUWELS (Jülich Supercomputing Centre) and HoreKa (KIT) at the two centers. The SCC primarily covers prototypical development operations in which new approaches, models and methods can be developed and tested. HAICORE is open to all members of the Helmholtz Association in the field of AI research.
Regression Models Beyond the Mean – A Bayesian Approach to Machine Learning
Recent progress in computer science has led to data structures of increasing size, detail and complexity in many scientific studies. Particularly nowadays, where such big-data applications not only allow but also require more flexibility to overcome modelling restrictions that may result in model misspecification and biased inference, further insight into more accurate models and appropriate inferential methods is of enormous importance. This research group will therefore develop statistical tools for both univariate and multivariate regression models that are interpretable and that can be estimated extremely fast and accurately. Specifically, we aim to develop probabilistic approaches to recent innovations in machine learning in order to estimate models for huge data sets. To obtain more accurate regression models for the entire distribution, we construct new distributional models that can be used for both univariate and multivariate responses. In all models we will address the issues of shrinkage and automatic variable selection to cope with a huge number of predictors, and the possibility to capture any type of covariate effect. This proposal also includes software development as well as applications in the natural and social sciences (such as income distributions, marketing, weather forecasting, chronic diseases and others), highlighting its potential to successfully contribute to important facets of modern statistics and data science.
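As a generic example of regression beyond the mean (notation assumed, not the group's specific model), a Gaussian distributional regression lets every distribution parameter depend on covariates,

y_i \mid \mathbf{x}_i \sim \mathcal{N}\bigl(\mu(\mathbf{x}_i), \sigma^2(\mathbf{x}_i)\bigr), \qquad \mu(\mathbf{x}_i) = \mathbf{x}_i^\top \boldsymbol{\beta}_\mu, \qquad \log \sigma(\mathbf{x}_i) = \mathbf{x}_i^\top \boldsymbol{\beta}_\sigma,

with shrinkage priors on \boldsymbol{\beta}_\mu and \boldsymbol{\beta}_\sigma providing automatic variable selection among a large number of predictors.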
Helmholtz Metadata Collaboration Platform - HMC
With the Helmholtz Metadata Collaboration Platform, an important topic area of the Helmholtz Incubator "Information & Data Science" was launched at the end of 2019, bringing together the expertise of Helmholtz centers and shaping the topic of "Information & Data Science" across the boundaries of centers and research fields. The overarching goal of the platform is to advance the qualitative enrichment of research data with metadata in the long term, to support researchers, and to implement this in the Helmholtz Association and beyond.
With the work package FAIR Data Commons Technologies, SCC develops technologies and processes to make research data from the research fields of the Helmholtz Association and beyond available to researchers according to the FAIR principles. This is achieved on a technical level by providing uniform access to metadata using standardized interfaces that are based on recommendations and standards adopted by consensus within globally networked research data initiatives, e.g., the Research Data Alliance (RDA, https://www.rd-alliance.org/). For researchers, these interfaces are made usable through easy-to-use tools, generally applicable processes and recommendations for handling research data in everyday scientific life.
Helmholtz AI
The Helmholtz AI Platform is a research project of the Helmholtz Incubator "Information & Data Science". The platform's overall mission is the "democratization of AI for a data-driven future": making AI algorithms and approaches available to a broad user group in an easy-to-use and resource-efficient way.
RTG 2450 - GRK 2450 (DFG)
In the Research Training Group (RTG) "Tailored Scale-Bridging Approaches to Computational Nanoscience" we investigate problems that are not tractable with standard tools of computational chemistry. The research is organized into seven projects. Five projects address scientific challenges such as friction, materials aging, material design and biological function. In two further projects, new methods and tools in mathematics and computer science are developed and provided for the special requirements of these applications. The SCC is involved in projects P4, P5 and P6.
Helmholtz Federated IT Services - HIFIS
Helmholtz Federated IT Services (HIFIS) establishes a secure and easy-to-use collaborative environment with ICT services that are efficient and accessible from anywhere. HIFIS also supports the development of research software with a high level of quality, visibility and sustainability.
Computational and Mathematical Modeling Program - CAMMP
CAMMP stands for Computational and Mathematical Modeling Program. It is an extracurricular offer of KIT for students of different ages. We want to make the public aware of the social importance of mathematics and simulation sciences. For this purpose, students actively engage in problem solving with the help of mathematical modeling and computer use in various event formats together with teachers. In doing so, they explore real problems from everyday life, industry or research.