Statistical Machine Learning and Modelling of Biological Systems

Thomas Walter, Team Leader

An increasing number of new technologies enable the study of living organisms on a hitherto unexplored scale. For example, next-generation sequencing can be used to read the complete genetic information of a biological sample, DNA chips directly measure the activity of thousands of genes simultaneously, mass spectrometry characterizes proteins expressed in a tissue, high-resolution imaging tracks changes in cell cultures, and high-throughput screening characterizes the biological activity of a large number of molecules. In addition, electronic health records contain large amounts of text data, images, and biological time series that describe the dynamics of patient diagnoses and response to treatment.

These technologies all generate huge amounts of raw data, which are often difficult to interpret directly. To exploit these data more effectively, and notably to extract from them relevant biological and medical information for predictive and precision medicine, our team is developing mathematical methods and innovative algorithms, based on our extensive expertise in mathematical modelling, statistics, machine learning, bioimage informatics and structural biology.

We are developing new tools and methods to examine specific questions of medical or biological interest, notably:

  • In silico basic and systems biology: We develop innovative approaches to reverse engineer biological networks from omics data, model tumor progression at the genomic, transcriptomic and epigenetic levels, and automatically annotate new proteins and functional elements through integration of complex and heterogeneous data, including data obtained from high-throughput sequencing or time-lapse video-microscopy.
  • Towards predictive and precision medicine: We develop tools to classify tumors and identify biomarkers for diagnosis, prognosis, and prediction of drug response. These classifications are based on large amounts of data including clinical data, somatic mutations, gene and alternative transcript expression, or structural DNA modification, and involve high-dimensional statistical machine learning techniques.
  • Drug design: We develop new virtual screening and chemoinformatics methods that help identify new molecules likely to inhibit certain therapeutic targets and to lead to novel drug candidates. We make use of sequence-based, graph-based, and 3D representations of proteins and their ligands, and develop in silico chemogenomic approaches to jointly analyze the chemical space of small molecules and the biological space of protein targets, leading in particular to the prediction of secondary targets, efficacy profiles, and adverse effects.
Data, machine learning, applications.
Machine learning makes use of diverse data types for applications to bioimage informatics, systems biology, drug design or precision medicine.

Key publications

Year of publication 2020

Racha Chouaib, Adham Safieddine, Xavier Pichon, Arthur Imbert, Oh Sung Kwon, Aubin Samacoits, Abdel-Meneem Traboulsi, Marie-Cécile Robert, Nikolay Tsanov, Emeline Coleno, Ina Poser, Christophe Zimmer, Anthony Hyman, Hervé Le Hir, Kazem Zibara, Marion Peter, Florian Mueller, Thomas Walter, Edouard Bertrand (2020 Aug 14)

A Dual Protein-mRNA Localization Screen Reveals Compartmentalized Translation and Widespread Co-translational RNA Targeting.

Developmental Cell : 773-791.e5 : DOI : S1534-5807(20)30584-0

Year of publication 2019

Collier Olivier, Stoven Véronique, Vert Jean-Philippe (2019 Sep 25)

A Single- and Multitask Machine Learning Algorithm for the Prediction of Cancer Driver Genes

PLOS Computational Biology

Slim L., Chatelain C., Azencott C.A., Vert J.P. (2019 Jun 1)

kernelPSI: a Post-Selection Inference Framework for Nonlinear Variable Selection

International Conference on Machine Learning : 5857-5865

Mélanie Durand, Thomas Walter, Tiphène Pirnay, Thomas Naessens, Paul Gueguen, Christel Goudot, Sonia Lameiras, Qing Chang, Nafiseh Talaei, Olga Ornatsky, Tatiana Vassilevskaia, Sylvain Baulande, Sebastian Amigorena, Elodie Segura (2019 May 11)

Human lymphoid organ cDC2 and macrophages play complementary roles in T follicular helper responses.

The Journal of Experimental Medicine : DOI : jem.20181994

Peter Naylor, Marick Lae, Fabien Reyal, Thomas Walter (2019 Feb 5)

Segmentation of Nuclei in Histopathology Images by Deep Regression of the Distance Map.

IEEE Transactions on Medical Imaging : 448-459 : DOI : 10.1109/TMI.2018.2865709

Year of publication 2018

Aubin Samacoits, Racha Chouaib, Adham Safieddine, Abdel-Meneem Traboulsi, Wei Ouyang, Christophe Zimmer, Marion Peter, Edouard Bertrand, Thomas Walter, Florian Mueller (2018 Nov 4)

A computational framework to study sub-cellular RNA localization.

Nature Communications : 4584 : DOI : 10.1038/s41467-018-06868-w