DESCARTES PhD

PhD positions to be filled

Contact descartes-hiring@cnrsatcreate.sg. Please include your CV and the topics you are interested in.

Interactive Explainability for Hybrid AI

Modern AI systems often behave like black boxes that are hard to understand.

The aim of the PhD is to develop deductive reasoning-based explanations for the predictions of AI models, as well as for violations of key properties. We will study interactive mechanisms for producing explanations while taking into account the cognitive load that understanding these explanations places on human users. A particular focus can be on the NLP side, to understand the user's purpose, or on the visualization side, to represent the explanations in an intuitive way.

Hybridization for regular and safe Deep Learning

Deep Learning has revolutionized machine learning, bringing extremely efficient solutions for tasks which seemed out of reach a decade ago. However, due to several shortcomings (brittleness to perturbations…), it is unsafe, in particular for use in critical systems. The aim of this PhD is to develop new hybrid methods that leverage deep learning but produce intrinsically regular AI systems that are by construction safer to use. A starting point would be to break down neural networks into several components (coding, decoding, latent/feature space…) and to consider which components can be reliably improved.

Building Self-Healing Autonomous Systems via NeuroSymbolic Reasoning

Intelligent systems can be flexible in their behaviour, adapting it to their inference of the intent of the human agent (or physical environment) they are interacting with. This makes the problem of checking intelligent systems particularly challenging. We want to provide formal guarantees about intelligent systems, considering all of their possible behaviours. To do so, we combine formal reasoning with learning techniques, an approach known as neuro-symbolic reasoning. Moving beyond verification, it is critical for autonomous systems to employ self-healing in response to changes in the environment, which includes humans. This would make the hybrid AI model more adaptable and resilient. The self-healing process may often take the role of program/system hardening, to adapt to changes in the learning component of the software. This provides one concrete set-up for neuro-symbolic reasoning, where deductive reasoning is used to compensate for the data overfitting of the inductive reasoning inherent in AI model synthesis. Another concrete set-up involves deductive reasoning over a conventional (physics-based) model or a software system, which is complemented by inductive reasoning when the deductive reasoning runs into performance bottlenecks.

Regular Strategies through Hybrid Evolutionary Algorithms and Deep Reinforcement Learning

Intelligent control is a difficult task, for which Deep Reinforcement Learning brings efficient solutions. The strategies produced in practice are however noisy, taking the form of large look-up tables which are very hard to understand and come with no guarantees on their behaviour. The aim of this PhD is to hybridize Deep Reinforcement Learning with evolutionary algorithms in order to produce policies that are regular: that is, less noisy, more explainable, and amenable to strong guarantees.

Data Quality, Provenance, and Uncertainty Management

In this PhD project, we focus on data quality, uncertainty, and lineage models in databases. We aim to advance the state of the art by addressing the design of data models and query languages that are able to represent and process data of varying quality, for instance the computation of possible and certain answers over incomplete and uncertain data, or the management of provenance information.

Keywords: data quality, data provenance, semi-ring provenance, probabilistic databases, probabilistic query processing, knowledge compilation, sampling and optimization, data analysis, machine learning.
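
To illustrate what possible and certain answers mean, here is a minimal sketch that enumerates the possible worlds of a tiny incomplete relation. The relation, the attribute names, and the helper functions are hypothetical and chosen only for illustration. Brute-force enumeration is exponential in the number of unknown values and is not the approach a real system would take.

```python
from itertools import product

# Hypothetical incomplete relation: each row is (name, candidate cities).
# A set with several values stands for an unknown attribute.
incomplete_rows = [
    ("alice", {"Paris"}),
    ("bob", {"Paris", "Singapore"}),  # bob's city is uncertain
    ("carol", {"Singapore"}),
]

def possible_worlds(rows):
    """Enumerate every complete relation obtained by fixing each unknown."""
    names = [name for name, _ in rows]
    choices = [sorted(cities) for _, cities in rows]
    for combo in product(*choices):
        yield list(zip(names, combo))

def query(world, city):
    """Select the names of the people living in `city`."""
    return {name for name, c in world if c == city}

def certain_and_possible(rows, city):
    """Certain answers hold in every world, possible answers in at least one."""
    answers = [query(world, city) for world in possible_worlds(rows)]
    return set.intersection(*answers), set.union(*answers)

certain, possible = certain_and_possible(incomplete_rows, "Paris")
print("certain:", sorted(certain))
print("possible:", sorted(possible))
```

Here alice is a certain answer to "who lives in Paris?", while bob is only a possible one, because one of the two possible worlds places him in Singapore.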

Christoffel-Darboux kernels with applications in deep learning explainability

Lasserre’s Hierarchy is a generic tool which can be used to solve global polynomial optimization problems under polynomial positivity constraints. The general idea is to reformulate the initial problem as an optimization problem over probability measures. Recent research has investigated the ability of Christoffel-Darboux kernels to capture information about the support of an unknown probability measure. A distinguishing feature of this approach is that it allows one to infer support characteristics from the knowledge of finitely many moments of the underlying measure. The first investigation track will consist of analyzing the last layer of an existing classification network with Christoffel-Darboux kernels. A more theoretical goal will be the study of Christoffel-Darboux kernels to extend the approach from [8] to measures supported on specific classes of mathematical varieties. In a further step, we intend to apply this framework to deep learning network models for which latent representations correspond to such low-dimensional varieties. Numerical experiments will be performed on several benchmark suites, including MNIST, CIFAR-10 and Fashion-MNIST.
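
As a rough illustration of how finitely many moments carry support information, the sketch below builds an empirical moment matrix from one-dimensional samples and evaluates the diagonal of the Christoffel-Darboux kernel, K(x, x) = v(x)^T M^{-1} v(x), where v(x) is the monomial basis up to a chosen degree. The univariate setting, the synthetic uniform samples, the degree, and all function names are assumptions made for this example; it does not analyze a network layer.

```python
import numpy as np

def monomial_basis(x, degree):
    """Rows are (1, x, x^2, ..., x^degree) for each point in x."""
    return np.vander(np.atleast_1d(x), degree + 1, increasing=True)

def fit_moment_matrix(samples, degree):
    """Empirical moment matrix M = (1/n) * sum_i v(x_i) v(x_i)^T."""
    V = monomial_basis(samples, degree)
    return V.T @ V / len(samples)

def cd_kernel_diagonal(x, moment_matrix, degree):
    """Diagonal of the Christoffel-Darboux kernel, K(x, x) = v(x)^T M^{-1} v(x)."""
    V = monomial_basis(x, degree)
    solved = np.linalg.solve(moment_matrix, V.T).T  # rows are M^{-1} v(x)
    return np.einsum("ij,ij->i", V, solved)

# Synthetic data: samples from a measure supported on [-1, 1] (an assumption).
rng = np.random.default_rng(0)
samples = rng.uniform(-1.0, 1.0, size=500)
degree = 4

M = fit_moment_matrix(samples, degree)
inside = cd_kernel_diagonal(0.0, M, degree)[0]   # point inside the support
outside = cd_kernel_diagonal(3.0, M, degree)[0]  # point far outside the support
on_samples = cd_kernel_diagonal(samples, M, degree)

print("K(x, x) inside the support: ", inside)
print("K(x, x) outside the support:", outside)
```

K(x, x) stays moderate on the support and grows quickly away from it, so thresholding it (or its reciprocal, the Christoffel function) gives a simple support estimate. As a sanity check, the average of K(x, x) over the samples equals the size of the basis, here degree + 1.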

Theoretical Studies on Deep Learning and Machine Learning optimisation

Deep Learning and Machine Learning methodologies have proven to be very effective. Optimization via gradient descent plays a central part in the success of Deep Learning and Machine Learning. However, there remain major unresolved theoretical issues in the optimization of these large AI models. In this PhD programme, we aim to explore these issues using alternative optimization methods such as entropic sampling methods based on statistical physics.

Coupling model reduction and Koopman operator for faster than real-time simulation of complex parametrized dynamical systems

In this PhD, we deal with large parametrized dynamical systems that model physical systems (or systems of systems) such as those encountered in smart cities. We aim at developing an effective strategy to construct and simulate such complex dynamical systems for fast predictions. The construction has to be performed in an offline stage, from all available engineering knowledge (coming from physics-based models and/or from stored sensing data). We wish to address this challenge by coupling reduction techniques (such as POD or PGD) applied to physics-based models with the data-driven Koopman operator, which makes it possible to design and manage complex dynamical systems without knowing the underlying physics equations. In particular, we plan to build a hybrid twin in which the Koopman operator acts on a correction part of the dynamical system, complementary to the description potentially provided by a given physics-based model. We also plan to tailor the Koopman operator by using physics-based basis functions coming from model reduction.

Quantitative accuracy assessment (certification) of the resulting parametrized hybrid model will be performed, for prognosis and control purposes. Adaptive modelling will also be addressed, so as to reach the required accuracy at the right computational cost. The final objective of the PhD is to assess performance and validate the proposed approach on targeted applications of the DesCartes programme (digital energy, monitoring of structures, air traffic control).

Strong collaborations with other researchers in the DesCartes project, working on topics of interest (learning from Koopman operator, control synthesis, etc.), will be conducted during the PhD.
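
For readers unfamiliar with data-driven Koopman approximations, the following minimal sketch fits a finite-dimensional Koopman matrix from snapshot pairs by least squares, using the state itself as the only observable (that is, plain dynamic mode decomposition). The linear test system, the matrix values, and the function name are assumptions made for illustration; the project's coupling with POD/PGD bases and the hybrid-twin correction term are not represented here.

```python
import numpy as np

def fit_koopman_matrix(snapshots):
    """Least-squares fit of K such that x_{k+1} ≈ K x_k.

    `snapshots` has shape (state_dim, n_steps); columns are successive states.
    """
    X = snapshots[:, :-1]  # states at times 0 .. n-2
    Y = snapshots[:, 1:]   # states at times 1 .. n-1
    return Y @ np.linalg.pinv(X)

# Hypothetical ground-truth dynamics, used only to generate data:
# a lightly damped rotation in the plane.
A_true = np.array([[0.9, 0.2],
                   [-0.2, 0.9]])

state = np.array([1.0, 0.0])
trajectory = [state]
for _ in range(20):
    state = A_true @ state
    trajectory.append(state)
snapshots = np.stack(trajectory, axis=1)  # shape (2, 21)

K = fit_koopman_matrix(snapshots)

# One-step prediction from the last observed state, without using A_true.
predicted_next = K @ snapshots[:, -1]
print(K)
```

For nonlinear systems, the same least-squares fit would be applied to a dictionary of observables of the state rather than to the state itself; the choice of that dictionary is where physics-based basis functions could enter.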

Optimization and machine learning with application to signal and image processing

The main objective of WP3 is to support the whole DesCartes programme by developing advanced optimization-based solutions in the context of hybrid AI. Any AI system or machine learning algorithm ultimately involves a formulation with an objective or loss function to be minimized. The modelling of the problem and the choice of the algorithm used to optimize the objective function are crucial to the success of the overall AI task. This is all the more crucial in the context of hybrid AI, which seeks to integrate physics-inspired models with machine learning algorithms. We will address this problem from two complementary angles, namely optimization-based methods and machine learning-based methods.

WP3 is looking for a PhD candidate in optimization and machine learning with application to signal and image processing. The objective is to tackle inverse problems by proposing new approaches intertwining optimization and machine learning.

Human-in-the-loop Machine Learning and Re-Learning In the Presence of Out-of-Distribution and Noisy Human Inputs

In this thesis, we will examine human-in-the-loop machine learning systems that combine human expertise with data-driven machine learning approaches in making critical decisions. Human experts can influence machine learning systems in several ways, including the labeling of data (active learning) and deferral to experts. We are interested in addressing several open challenges in this thesis: (i) how can the machine learning system be made more robust to noisy human inputs (e.g., data poisoning attacks)? (ii) how can the machine learning system learn better from the inputs of human experts? (iii) how can the machine learning system re-learn efficiently upon discovering incorrect inputs from experts?

Hate speech detection

Hate Speech (HS) and harassment are particularly widespread in online communication, especially due to users’ freedom and anonymity and the lack of regulation by social media platforms. This phenomenon has prompted a growing interest in using artificial intelligence and Natural Language Processing techniques to address social and ethical issues. An extensive body of work has been proposed to automatically detect HS relying on a variety of deep learning methods (Founta and Nunes, 2018; Schmidt and Wiegand, 2017). Most research focuses on HS as expressed in texts, without taking into account the contexts in which they were uttered. This PhD aims to bridge the gap by investigating for the first time how HS is expressed and detected in multi-party dialogues. We will propose new dialogue datasets for HS detection as well as new context-based deep learning methods that leverage the conversation thread to account for hateful content and how it evolves as the dialogue proceeds.

Smart Data for Hybrid Twin Digital Energy Systems

In this thesis, we will develop a methodology to prepare data according to physical constraints, and to adjust physical models to real situations captured by data. The idea will be to combine imperfect data from models and from observations distributed in time and space, exploiting any relevant physical constraints, to produce a more accurate and comprehensive picture of the system as it evolves in time.

Key ingredients will be data assimilation tools and a Bayesian framework that integrate physical constraints, statistical correlations and confidence metrics. Data assimilation will help to address dynamic state estimation, parameter adjustment, and unknown input identification, while the Bayesian framework will make it possible to manage uncertainties. The self-consistent learning process renews the prior distribution of correlated features jointly with the estimation of the environment with stochastic Bayesian estimation (SBE).

Different time scales (dynamics and time horizons) can be identified in order to deal with all decisions in a complex critical system such as an energy grid. These objectives place constraints on the models and data that must be managed accordingly.

In this thesis, a methodology will be developed that applies tools from applied mathematics under real-world constraints, in order to perform real-time dynamic estimation that is robust against out-of-distribution data and noise. Specifically, we will develop robust estimation methods that perform well over a wide range of noisy scenarios with outliers.

Sensitivity analysis will help us to identify the main uncertainties that have to be considered. For instance, human behavior is one of the main sources of uncertainty, which leads us to direct our research towards the identification of unknown inputs in energy system usage.

We will also address engineering challenges such as model size, model errors, limited data availability, and statistical performance issues for a real-world application. In order to implement effective controls in real time for a real-world system, states need to be estimated for a realistic-size system in real time on real computation hardware.

Robust and Predictable Out-of-Distribution (OOD) Data Detection and Reasoning for Safety-Critical Systems

PhD Objectives:

Hybrid-AI based OOD detection and reasoning: in this task we plan to explore multi-modal inputs (e.g., radar and lidar together with video streams) and physics-based system models to augment the OOD detection and reasoning capabilities of DNN based solutions. In particular, focusing on the generative model based approaches (e.g., VAEs and their variants), we aim to develop an integrated Hybrid-AI technique that uses the physics-based models parameterized with low dimensional data (e.g., expected trajectory for objects obtained with radar/lidar data) to improve the generalizability and robustness of the OOD decisions.
Probabilistic quantification of OOD uncertainty (Probably Approximately Correct – PAC): In this research activity, we plan to explore this direction using black-box (deriving guarantees based only on DNN inputs/outputs) and grey-box (additionally using DNN intermediate layer statistics to derive guarantees) techniques. The objective would be to derive a relatively simple surrogate model using scenario optimization to characterize the uncertainty in the DNN based OOD detection and reasoning task.
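
To give a flavour of a PAC-style guarantee obtained through scenario optimization, the sketch below treats the simplest possible surrogate: a single threshold on a scalar OOD score, set to the maximum score observed on N independent in-distribution calibration samples. Under the assumption that the calibration scores are i.i.d. draws from the in-distribution, the probability that a fresh in-distribution sample is falsely flagged exceeds epsilon with probability at most (1 - epsilon)^N. The scalar score, the single-parameter surrogate, and the function names are assumptions made for this example, not the method planned in the project.

```python
import math

def scenario_sample_size(epsilon, beta):
    """Smallest N such that (1 - epsilon)**N <= beta.

    epsilon: tolerated probability of falsely flagging in-distribution data.
    beta: tolerated probability that the guarantee itself fails.
    """
    return math.ceil(math.log(beta) / math.log(1.0 - epsilon))

def scenario_threshold(calibration_scores):
    """One-parameter scenario solution: the largest calibration score."""
    return max(calibration_scores)

def is_out_of_distribution(score, threshold):
    """Flag a sample whose OOD score exceeds the calibrated threshold."""
    return score > threshold

n_required = scenario_sample_size(epsilon=0.05, beta=1e-3)
print("calibration samples required:", n_required)

# Hypothetical calibration scores, for illustration only.
threshold = scenario_threshold([0.12, 0.40, 0.33, 0.27])
print("flagged:", is_out_of_distribution(0.9, threshold))
```

Richer surrogates with more decision variables require more calibration samples for the same epsilon and beta, which is one reason a relatively simple surrogate model is attractive.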

Hybrid AI in advanced engineering systems

TBD

This research is supported by the National Research Foundation, Prime Minister’s Office, Singapore under its Campus for Research Excellence and Technological Enterprise (CREATE) programme.

CREATE is an international collaboratory housing research centres set up by top universities. At CREATE, researchers from diverse disciplines and backgrounds work closely together to perform cutting-edge research in strategic areas of interest, for translation into practical applications leading to positive economic and societal outcomes for Singapore. The interdisciplinary research centres at CREATE focus on four thematic areas of research, namely human systems, energy systems, environmental systems and urban systems. More information on the CREATE programme can be obtained from www.create.edu.sg.
