Projects

Ongoing Projects

Structured explainability for interactions in deep learning models applied to pathogen phenotype prediction

Contact: Prof. Dr. Nadja Klein
Funding: DFG
since 2025-01-01 - 2028-12-31
Project page: gepris.dfg.de/gepris/projekt/498589566?language

Explaining and understanding the underlying interactions of genomic regions is crucial for proper pathogen phenotype characterization, such as predicting the virulence of an organism or its resistance to drugs. Existing methods for classifying large-scale genome sequence data face challenges with regard to explainability due to the high dimensionality of the data, which makes it difficult to visualize, access and justify classification decisions. This is particularly the case in the presence of interactions, such as those between genomic regions. To address these challenges, we will develop methods for variable selection and structured explainability that capture the interactions of important input variables. More specifically, (i) we work within a deep mixed models framework for binary outcomes that fuses generalized linear mixed models with a deep variant of structured predictors. We thereby combine statistical logistic regression models with deep learning to disentangle complex interactions in genomic data. In particular, we enable estimation when no explicitly formulated inputs are available for the models, as is the case, for instance, with genomic data. Further, (ii) we will extend methods for explaining classification decisions, such as layerwise relevance propagation, to explain these interactions. By investigating these two complementary approaches on both the model and the explainability level, our main objective is to formulate structured explanations that not only give first-order, single-variable explanations of classification decisions, but also take their interactions into account. While our methods are motivated by genomic data, they can be useful in, and extended to, other application areas in which interactions are of interest.

Probabilistic learning approaches for complex disease progression based on high-dimensional MRI data

Contact: Prof. Dr. Nadja Klein
Funding: DFG
since 2025-01-01 - 2028-12-31
Project page: https://gepris.dfg.de/gepris/projekt/498590773?language=en

This project proposes informed, data-driven methods to reveal pathological trajectories based on high-dimensional medical data obtained from magnetic resonance imaging (MRI). Such data are relevant as both inputs and outputs in regression equations for performing early diagnosis and for modeling, understanding, and predicting current and future disease progression. For this, we will fuse deep learning (DL) methods with Bayesian statistics to (1) accurately predict the complete outcome distributions of individual patients based on MRI data and further confounders and covariates (such as clinical or demographic variables), so that uncertainty in predictions is adequately quantified, in contrast to point predictions, which deliver no measure of confidence, and (2) model temporal dynamics in biomedical patient data. Regarding (1), we will develop deep distributional regression models for image inputs to accurately predict the entire distributions of the different disease scores (e.g. symptom severity), which can be multivariate and are typically highly non-normally distributed. Regarding (2), we will model the complex temporal evolution in neurological diseases by developing DL-based state-space models. Neither model is tailored to a specific disease, but both will be developed and tested by way of example for two neurological diseases, namely Alzheimer’s disease (AD) and multiple sclerosis (MS), chosen for their different disease progression profiles.

Distributional Copula Regression for Space-Time Data

Contact: Prof. Dr. Nadja Klein
Funding: DFG
since 2024-10-01 - 2028-06-30
Project page: gepris.dfg.de/gepris/projekt/544966988

Distributional copula regression for space-time data develops novel models for multivariate spatio-temporal data using distributional copula regression. Of particular interest are tests for the significance of predictors and automatic variable selection using Bayesian selection priors. In the long run, the project will consider computationally efficient modeling of non-stationary dependencies using stochastic partial differential equations.

DFG-priority program 2298 Theoretical Foundations of Deep Learning

since 2024-09-01 - 2027-08-31
Project page: https://www.foundationsofdl.de/

The goal of this project is to use deep neural networks as building blocks in a numerical method to solve the Boltzmann equation. This is a particularly challenging problem since the equation is a high-dimensional integro-differential equation, which at the same time possesses an intricate structure that a numerical method needs to preserve. Thus, artificial neural networks might be beneficial, but cannot be used out-of-the-box. We follow two main strategies to develop structure-preserving neural network-enhanced numerical methods for the Boltzmann equation. First, we target the moment approach, where a structure-preserving neural network will be employed to model the minimal entropy closure of the moment system. By enforcing convexity of the neural network, one can show that the intrinsic structure of the moment system, such as hyperbolicity, entropy dissipation and positivity, is preserved. Second, we develop a neural network approach to solve the Boltzmann equation directly at the discrete particle velocity level. Here, a neural network is employed to model the difference between the full non-linear collision operator of the Boltzmann equation and the BGK model, which preserves the entropy dissipation principle. Furthermore, we will develop strategies to generate training data that fully sample the input space of the respective neural networks to ensure properly functioning models.
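As an illustration of what "enforcing convexity of the neural network" can mean in practice, the following minimal sketch builds a small input-convex network in NumPy. It is not the project's architecture, which is not specified here. It shows one common construction: weights acting on hidden activations are constrained to be non-negative, and the activation (softplus) is convex and non-decreasing, so the output is convex in the input. All names (make_icnn, icnn) and layer sizes are hypothetical.

```python
import numpy as np

def softplus(z):
    # Convex, non-decreasing activation: log(1 + exp(z)), computed stably.
    return np.logaddexp(0.0, z)

def make_icnn(dim_in, hidden, rng):
    # Hypothetical two-hidden-layer input-convex network.
    # Weights that multiply hidden activations (Wz, wz_out) are made
    # non-negative; weights acting directly on the input are unconstrained.
    return {
        "W0": rng.normal(size=(hidden, dim_in)),
        "b0": rng.normal(size=hidden),
        "Wz": np.abs(rng.normal(size=(hidden, hidden))),
        "Wx": rng.normal(size=(hidden, dim_in)),
        "b1": rng.normal(size=hidden),
        "wz_out": np.abs(rng.normal(size=hidden)),
        "wx_out": rng.normal(size=dim_in),
    }

def icnn(params, x):
    # Layer 1: convex function of an affine map of x.
    z1 = softplus(params["W0"] @ x + params["b0"])
    # Layer 2: non-negative mix of convex z1 plus an affine skip term in x,
    # passed through a convex non-decreasing activation, so still convex.
    z2 = softplus(params["Wz"] @ z1 + params["Wx"] @ x + params["b1"])
    # Output: non-negative mix of convex z2 plus a linear term in x.
    return float(params["wz_out"] @ z2 + params["wx_out"] @ x)

rng = np.random.default_rng(0)
params = make_icnn(dim_in=3, hidden=8, rng=rng)

# Midpoint convexity check on random pairs: f((a+b)/2) <= (f(a)+f(b))/2.
for _ in range(100):
    a, b = rng.normal(size=3), rng.normal(size=3)
    assert icnn(params, 0.5 * (a + b)) <= 0.5 * (icnn(params, a) + icnn(params, b)) + 1e-9
```

In a trained model, the non-negativity constraint would have to be maintained during optimization, for example by reparameterizing or clipping the constrained weights; taking absolute values of random weights here only serves to show the structure.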

Bayesian Machine Learning with Uncertainty Quantification for Detecting Weeds in Crop Lands from Low Altitude Remote Sensing

Contact: Prof. Dr. Nadja Klein
Funding: Helmholtz-Gemeinschaft
since 2022-01-01
Project page: www.heibrids.berlin/people/doctoral-students/

Weeds are one of the major contributors to crop yield loss. As a result, farmers deploy various approaches to manage and control weed growth in their agricultural fields, the most common being chemical herbicides. However, herbicides are often applied uniformly to the entire field, which has negative environmental and financial impacts. Site-specific weed management (SSWM) considers the variability in the field and localizes the treatment. Accurate localization of weeds is the first step for SSWM. Moreover, information on the prediction confidence is crucial for deploying methods in real-world applications. This project aims to develop methods for weed identification in croplands from low-altitude UAV remote sensing imagery and for uncertainty quantification using Bayesian machine learning, in order to develop a holistic approach for SSWM. The project is supported by the Helmholtz Einstein International Berlin Research School in Data Science (HEIBRiDS) and co-supervised by Prof. Dr. Martin Herold from the GFZ German Research Centre for Geosciences.

Simulated worlds

Contact: Dr. Jasmin Hörter, Dr. Katharina Bata
Funding: MWK
since 2021-09-01 - 2025-03-31
Project page: simulierte-welten.de

The Simulated Worlds project aims to provide students in Baden-Württemberg with a deeper critical understanding of the possibilities and limitations of computer simulations. The project is jointly supported by the Scientific Computing Center (SCC), the High Performance Computing Center Stuttgart (HLRS) and the University of Ulm and is already working with several schools in Baden-Württemberg.

Shallow priors and deep learning: The potential of Bayesian statistics as an agent for deep Gaussian mixture models

Contact: Prof. Dr. Nadja Klein
Funding: Volkswagenstiftung

Despite significant overlap and synergy, machine learning and statistical science have developed largely in parallel. Deep Gaussian mixture models, a recently introduced model class in machine learning, are concerned with the unsupervised tasks of density estimation and high-dimensional clustering used for pattern recognition in many applied areas. In order to avoid over-parameterized solutions, dimension reduction by factor models can be applied at each layer of the architecture. However, the choice of architectures can be interpreted as a Bayesian model choice problem, meaning that every possible model satisfying the constraints is then fitted. The authors propose a much simpler approach: only one large model needs to be trained, and unnecessary components will empty out. The idea that parameters can be assigned prior distributions is highly unorthodox but extremely simple, and it brings together two sciences, namely machine learning and Bayesian statistics.

Boosting copulas - multivariate distributional regression for digital medicine

Contact: Prof. Dr. Nadja Klein
Funding: DFG
since 2020-09-01 - 2025-03-31
Project page: gepris.dfg.de/gepris/projekt/428239776

Traditional regression models often provide an overly simplistic view of complex associations and relationships in contemporary data problems in the area of biomedicine. In particular, capturing relevant associations between multiple clinical endpoints correctly is of high relevance to avoid model misspecifications, which can lead to biased results and even wrong or misleading conclusions and treatments. As such, the development of statistical methods tailored to such problems in biomedicine is of considerable interest. It is the aim of this project to develop novel conditional copula regression models for high-dimensional biomedical data structures by bringing together efficient statistical learning tools for high-dimensional data and established methods from economics for multivariate data structures that make it possible to capture complex dependence structures between variables. These methods will allow us to model the entire joint distribution of multiple endpoints simultaneously and to automatically determine the relevant influential covariates and risk factors via algorithms originally proposed in the area of statistical and machine learning. The resulting models can then be used both for the interpretation and analysis of complex association structures and for prediction inference (simultaneous prediction intervals for multiple endpoints). Additional implementation in open software and its application in various studies highlight the potential of this project’s methodological developments in the area of digital medicine.

Regression Models Beyond the Mean – A Bayesian Approach to Machine Learning

Contact: Prof. Dr. Nadja Klein
Funding: DFG

Recent progress in computer science has led to data structures of increasing size, detail and complexity in many scientific studies. Such big data applications not only allow but also require more flexibility to overcome modelling restrictions that may result in model misspecification and biased inference, so further insight into more accurate models and appropriate inferential methods is of enormous importance. This research group will therefore develop statistical tools for both univariate and multivariate regression models that are interpretable and that can be estimated extremely quickly and accurately. Specifically, we aim to develop probabilistic approaches to recent innovations in machine learning in order to estimate models for huge data sets. To obtain more accurate regression models for the entire distribution, we construct new distributional models that can be used for both univariate and multivariate responses. In all models, we will address the issues of shrinkage and automatic variable selection to cope with a huge number of predictors, as well as the possibility of capturing any type of covariate effect. This proposal also includes software development as well as applications in the natural and social sciences (such as income distributions, marketing, weather forecasting, chronic diseases and others), highlighting its potential to successfully contribute to important facets of modern statistics and data science.

RTG 2450 - GRK 2450 (DFG)

since 2019-04-01 - 2028-03-31
Project page: www.compnano.kit.edu

In the Research Training Group (RTG) "Tailored Scale-Bridging Approaches to Computational Nanoscience" we investigate problems that are not tractable with the standard tools of computational chemistry. The research is organized in seven projects. Five projects address scientific challenges such as friction, materials aging, material design and biological function. In two further projects, new methods and tools in mathematics and computer science are developed and provided for the special requirements of these applications. The SCC is involved in projects P4, P5 and P6.

Computational and Mathematical Modeling Program - CAMMP

since 2015-01-01
Project page: forschung/CAMMP

CAMMP stands for Computational and Mathematical Modeling Program. It is an extracurricular program offered by KIT for students of different ages. We want to make the public aware of the social importance of mathematics and simulation sciences. For this purpose, students, together with teachers, actively engage in problem solving with the help of mathematical modeling and computers in various event formats. In doing so, they explore real problems from everyday life, industry or research.

Finished Projects

i2Batman - i2batman

Contact: Prof. Dr. Martin Frank
Funding: Helmholtz-Gemeinschaft
since 2020-08-01 - 2023-07-31

Together with partners at Forschungszentrum Jülich and Fritz Haber Institute Berlin, our goal is to develop a novel intelligent management system for electric batteries that can make better decisions about battery charging cycles based on a detailed surrogate model ("digital twin") of the battery and artificial intelligence.