Moving towards predictive models for diabetes patients
Date:
Changed on 15/09/2025
What can we learn from data obtained from the monitoring of diabetics to improve the care of these patients? To answer this question, Louis Potier, a doctor in the Diabetology and Endocrinology Department of the Bichat - Claude-Bernard Hospital, called upon the Inria teams in 2021. The aim was to exploit data from the APHP Health Data Warehouse (EDS) which was created in July 2017 and stores data on more than 19 million patients having received treatment at the 38 APHP hospitals. “During the Covid crisis, I had set up a project to exploit this immense data source in order to assess the impact of the virus among diabetics. Quite naturally, I wanted to go further and explore the possibility of improving diagnostic tools for those patients most at risk of complications”, the specialist recalls. That is how the TRADIAB project came about (Complication Risk Trajectory and Therapeutic Responses in Type 2 Diabetes), involving Professor Potier’s teams, the SODA project team of the Inria Saclay centre, and the Pharmacoepidemiology Centre of La Pitié-Salpêtrière Hospital.
There is much at stake because diabetes is the most common cause of acquired disability in adults. Indeed, this chronic disease can lead to multiple complications - ophthalmological, renal, cardiovascular or podiatric - which can have severe consequences such as blindness or amputations.
“Although we know the main determining risk factors - unbalanced diet, sedentary lifestyle, age - we still need to fine-tune our ability to identify patients who are the most likely to be affected.”
Louis Potier, doctor in the Diabetes and Endocrinology Department at Bichat-Claude Bernard Hospital
As a result, the researchers have formed the CODIA cohort from the EDS patient records of some 1.3 million identified diabetics. These records contain all information relating to their care: sociodemographic data, blood test results, medical reports, etc. This is a gold mine that makes it possible to establish statistical links between certain patient characteristics and the progression of their illness.
The problem is the heterogeneous nature of this raw data.
“Between unevenly filled-in medical reports, prescriptions varying from one doctor to another, missing or erroneous blood test dates, etc., a huge amount of work was required to model the missing information so that the data could be exploited.”
Judith Abécassis, Inria researcher in the SODA project team
This is where the SODA team’s expertise in data science comes into its own, particularly in machine learning, the family of techniques behind the recent AI boom. “To process gigantic volumes of data, these techniques work extremely well compared to other, standard statistical methods which are more suited to small samples”, the researcher points out. The team also contributed its skills in natural language processing (NLP), essential for extracting information from handwritten notes.
The SODA team does research at the intersection of machine learning, databases, and quantitative social sciences (e.g. empirical economics, epidemiology).
Its researchers develop computational data processing tools to generate insights and predictions from today's large databases that characterize populations. They contribute statistical machine learning tools to answer data science problems, typically on relational data. The main applications are in health and education.
“In this project, our skills complement one another perfectly: Inria provided the data science expertise, and the diabetologists their knowledge of the disease and medically pertinent questions on the topic. We have added our own expertise in the design of an epidemiological health study, in particular in terms of the biases to be avoided and the particularities linked to the use of data from health data warehouses.”
Florence Tubach, Head of the Public Health Department at La Pitié-Salpêtrière Hospital and of the APHP Pharmacoepidemiology Centre
The first two studies have confirmed on a large scale the results previously suggested by clinical studies based on a more limited population sample. Replicating results that are already known is a key step in validating the quality of the CODIA cohort and thus enabling studies to be carried out on more innovative topics. The first study characterises the outcome of patients affected by diabetic foot ulcers (DFUs) - a severe complication of the disease. “It reveals higher rates of death and amputation in patients hospitalised for a first DFU than in other cohorts, as well as the already known risk factors, and it also highlights the possible role of inflammation”, summarises Judith Abécassis. This first study has just been published in the journal Diabetes and Metabolism. The second study, on the monitoring of two well-known treatments (including Ozempic (semaglutide)), confirms their benefits and limitations in controlling weight and blood sugar levels. “The advantage of these studies is that they are based on ‘real-life’ data that is representative of the diversity of patients that we treat, whereas clinical studies are based on a carefully selected sample of the population. This can lead to biases, such as a lack of patients over the age of 70, or an overrepresentation of Caucasians”, emphasises Louis Potier.
The third study, conducted as part of a European project coordinated by Nicolas Venteclef (INEM) and named Intercept-T2D, is more groundbreaking in its conclusions. It establishes a link between certain ‘pro-inflammatory’ profiles, i.e. patients in whom immune system activation is higher than average, and the risk of developing complications, particularly cardiovascular ones. This link had already been suggested by previous studies, but “we revealed more specifically that certain patients with a high level of monocytes, a particular type of white blood cell that is a macrophage precursor, were especially at risk”, explains Judith Abécassis. This paves the way for an early diagnosis aid for this patient sub-group. The partners are now preparing a second study in which they will propose a new screening score - a medical decision aid tool that combines the different risk factors in a single value. “Compared to existing scores, this one will attach greater importance to inflammatory markers, which are actually easy to measure via a simple blood test already carried out as a matter of routine”, states Louis Potier. The aim is to test this new score on the CODIA cohort. If it proves capable of providing better complication predictions, this will lead doctors to adjust their medical prescriptions: different drugs or doses, closer cardiovascular monitoring, etc.
“We also plan to go a step further in the analysis of inflammatory profiles, by taking into account a wider range of immunity markers and studying their impacts on the progression of the disease”, adds Judith Abécassis. “Past and future studies of the CODIA cohort represent a major breakthrough for diabetologists and health authorities, but also for manufacturers who develop new therapeutic molecules and will be able to have their effectiveness assessed on the basis of real-life data, in addition to clinical studies”, Florence Tubach concludes.