Statistical Machine Learning and Modelling of Biological Systems

J-P. Vert

Jean-Philippe Vert, Team Leader

An increasing number of new technologies enable the study of living organisms on a hitherto unexplored scale. For example, next-generation sequencing can be used to read the complete genetic information of a biological sample, DNA chips directly measure the activity of thousands of genes simultaneously, mass spectrometry characterizes proteins expressed in a tissue, high-resolution imaging tracks changes in cell cultures, and high-throughput screening characterizes the biological activity of a large number of molecules.

These technologies all generate huge amounts of raw data, which are often difficult to interpret directly. To exploit these data more effectively, and notably to extract from them biological and medical information relevant to predictive and precision medicine, our team develops mathematical methods and innovative algorithms, drawing on our extensive expertise in mathematical modelling, statistics, machine learning, bioimage informatics and structural biology.

We are developing new tools and methods to examine specific questions of medical or biological interest, notably:

– In silico basic and systems biology: We develop innovative approaches to reverse engineer biological networks from omics data, model tumor progression at the genomic, transcriptomic and epigenetic levels, and automatically annotate new proteins and functional elements by integrating complex and heterogeneous data, including data obtained from high-throughput sequencing or time-lapse video-microscopy.

– Towards predictive and precision medicine: We develop tools to classify tumors and identify biomarkers for diagnosis, prognosis, and prediction of drug response. These classifications are based on large amounts of data, including clinical data, somatic mutations, gene and alternative transcript expression, and structural DNA modifications, and involve high-dimensional statistical machine learning techniques.

– Drug design: We develop new virtual screening and chemoinformatics methods that help identify new molecules likely to inhibit certain therapeutic targets and to lead to novel drug candidates. These methods rely on modelling the 3D structures of proteins and their ligands, and on original statistical approaches to the growing body of data on structures and interactions (Figure 1). We also develop in silico chemogenomic approaches that jointly analyze the chemical space of small molecules and the biological space of protein targets, leading in particular to the prediction of efficacy profiles and side effects.

Figure 1 : This image shows the 3D surface of interaction between a membrane protein and a small molecule. The modeling and statistical study of these interfaces enable us to predict which molecules are likely to interact with proteins of therapeutic interest, notably membrane receptors, so as to propose new leads in the search for drugs.

Key publications

Year of publication 2014

Elsa Bernard, Laurent Jacob, Julien Mairal, Jean-Philippe Vert (2014 May 9)

Efficient RNA isoform identification and quantification from RNA-Seq data with network flows.

Bioinformatics (Oxford, England) : 2447-55 : DOI : 10.1093/bioinformatics/btu317

Ferhat Ay, Evelien M Bunnik, Nelle Varoquaux, Sebastiaan M Bol, Jacques Prudhomme, Jean-Philippe Vert, William Stafford Noble, Karine G Le Roch (2014 Mar 26)

Three-dimensional modeling of the P. falciparum genome during the erythrocytic cycle reveals a strong connection between genome architecture and gene expression.

Genome research : 974-88 : DOI : 10.1101/gr.169417.113

Year of publication 2013

Veronika Graml, Xenia Studera, Jonathan L D Lawson, Anatole Chessel, Marco Geymonat, Miriam Bortfeld-Miller, Thomas Walter, Laura Wagstaff, Eugenia Piddini, Rafael E Carazo-Salas (2013 Nov 2)

A genomic Multiprocess survey of machineries that control and link cell shape, microtubule organization, and cell-cycle progression.

Developmental cell : 227-39 : DOI : 10.1016/j.devcel.2014.09.005

James C Costello, Laura M Heiser, Elisabeth Georgii, Mehmet Gönen, Michael P Menden, Nicholas J Wang, Mukesh Bansal, Muhammad Ammad-ud-din, Petteri Hintsanen, Suleiman A Khan, John-Patrick Mpindi, Olli Kallioniemi, Antti Honkela, Tero Aittokallio, Krister Wennerberg, James J Collins, Dan Gallahan, Dinah Singer, Julio Saez-Rodriguez, Samuel Kaski, Joe W Gray, Gustavo Stolovitzky (2013 Jul 20)

A community effort to assess and improve drug sensitivity prediction algorithms.

Nature biotechnology : 1202-12 : DOI : 10.1038/nbt.2877

Year of publication 2012

Rosa M Suárez, Franciane Chevot, Andrea Cavagnino, Nicolas Saettel, François Radvanyi, Sandrine Piguel, Isabelle Bernard-Pierrot, Véronique Stoven, Michel Legraverend (2012 Apr 19)

Inhibitors of the TAM subfamily of tyrosine kinases: synthesis and biological evaluation.

European journal of medicinal chemistry : 2-25 : DOI : 10.1016/j.ejmech.2012.06.005