OrphaMine: a tool to better understand rare diseases
Like Dr House in the eponymous television series, doctors are sometimes confronted with enigmas when faced with certain pathologies. The OrphaMine platform, developed as part of the ANR Hybride project, aims to provide specialists with a better understanding of rare diseases. Tested in-house, it will soon be offered to a broader panel made up of doctors, researchers and representatives of the pharmaceutical industry.
Yannick Toussaint: A researcher, he co-founded the ORPAILLEUR team in 1998. For the Hybride project, he received a four-year grant from the French National Research Agency (ANR). The funding was extended for another year to enable his team to finalise the OrphaMine platform.
Chedy Raïssi: A researcher with the ORPAILLEUR team at Inria Nancy - Grand Est since 2009, he works with Yannick Toussaint on the OrphaMine platform.
How did the OrphaMine platform begin?
YT: We realised that specialists in rare diseases had difficulty bringing together existing knowledge. Over 8,000 diseases have been identified and characterised, but in reality there are more than 15,000. Furthermore, some pathologies manifest themselves in very different ways depending on the patient, so diagnosis can take several years. Because each disease affects only a small number of patients, few doctors take an interest in them. Our aim is therefore to provide doctors with a platform for visualising all of the knowledge acquired about these rare pathologies. One of our sources is the bibliographic database Medline, which compiles millions of medical texts. The goal is to extract the most relevant data from this mass of texts.
Which methods do you use?
YT: Within the Hybride project, we work with teams from the French National Institute of Health and Medical Research (Inserm), as well as the Greyc* and MoDyCo** laboratories. We analyse the texts in order to extract information from them. This is similar to machine learning: our algorithms need to recognise words and determine whether they refer to a disease, a symptom, a bacterium or a treatment. Then, to extract information, we make sure the software recognises the syntax of each sentence so that it can understand the links between words (causality, opposition, etc.), even in complex sentences. The link between a disease and a symptom, for example, can be expressed in many different ways, and the shades of meaning can be very subtle. The MoDyCo team helps us a lot with this.
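The two steps described above, recognising what each term refers to and then finding the link between terms, can be illustrated with a deliberately simplified sketch. It is not the Hybride pipeline: the lexicon entries, entity names and cue phrases are hypothetical placeholders, and a plain dictionary lookup plus keyword matching stands in for the learned models and syntactic analysis mentioned in the interview.

```python
import re

# Hypothetical mini-lexicon mapping terms to entity types. A real system
# would rely on curated vocabularies and learned models instead.
LEXICON = {
    "disease a": "disease",
    "joint pain": "symptom",
    "gene b": "gene",
    "drug c": "treatment",
}

# Hypothetical cue phrases signalling that a disease causes a symptom.
CAUSAL_CUES = ("causes", "leads to", "results in", "is associated with")


def tag_entities(sentence):
    """Return (term, type) pairs for every lexicon term found in the sentence."""
    text = sentence.lower()
    found = []
    for term, kind in LEXICON.items():
        # Word boundaries stop "gene b" from matching inside "gene bx".
        if re.search(r"\b" + re.escape(term) + r"\b", text):
            found.append((term, kind))
    return found


def extract_links(sentence):
    """Link a disease to a symptom when a causal cue appears between them."""
    text = sentence.lower()
    entities = tag_entities(sentence)
    diseases = [term for term, kind in entities if kind == "disease"]
    symptoms = [term for term, kind in entities if kind == "symptom"]
    links = []
    for disease in diseases:
        for symptom in symptoms:
            # Text between the end of the disease mention and the symptom;
            # empty if the symptom comes first (a limitation of this toy).
            between = text[text.index(disease) + len(disease):text.index(symptom)]
            if any(cue in between for cue in CAUSAL_CUES):
                links.append((disease, "has_symptom", symptom))
    return links


print(tag_entities("Disease A often leads to joint pain."))
# [('disease a', 'disease'), ('joint pain', 'symptom')]
print(extract_links("Disease A often leads to joint pain."))
# [('disease a', 'has_symptom', 'joint pain')]
```

Keyword matching like this misses exactly the subtle phrasings the interview points to (negation, opposition, reversed word order), which is why real extraction relies on sentence syntax rather than on surface cues.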
CR: In concrete terms, starting from medical texts, we build a network of interconnected data: a disease is linked to several symptoms and to the genes involved, for example. This initial work provides a representation of existing knowledge. I then work on detecting hidden patterns and links in this data network, using data mining algorithms. Here is an example: our data network tells us that a disease A is linked to a gene B. It also tells us that gene B is present, with certain variations, in mice, and that proteins encoded by the mouse genome interact directly with gene B. It would be in doctors' interest to study the human equivalents of these proteins in more detail in order to improve their knowledge of the disease. My work makes it possible to highlight such links.
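The disease A / gene B example can be expressed as a query over a small graph of (source, relation, target) triples. The sketch below is a toy illustration under stated assumptions: the node names and relation labels (`linked_to_gene`, `has_mouse_ortholog`, `interacts_with`, `has_human_equivalent`) are hypothetical, and a hand-written path query stands in for the data mining algorithms, which the interview does not describe.

```python
from collections import defaultdict

# Hypothetical toy network of (source, relation, target) triples.
TRIPLES = [
    ("disease A", "linked_to_gene", "gene B"),
    ("gene B", "has_mouse_ortholog", "mouse gene B"),
    ("mouse protein P1", "interacts_with", "mouse gene B"),
    ("mouse protein P2", "interacts_with", "mouse gene B"),
    ("mouse protein P1", "has_human_equivalent", "human protein P1"),
]


def build_index(triples):
    """Index edges by (node, relation), adding an inverse edge for each triple."""
    index = defaultdict(set)
    for src, rel, dst in triples:
        index[(src, rel)].add(dst)
        index[(dst, "inv_" + rel)].add(src)
    return dict(index)


def follow(index, nodes, relation):
    """Return every node reachable from `nodes` through one `relation` edge."""
    reached = set()
    for node in nodes:
        reached |= index.get((node, relation), set())
    return reached


def candidate_proteins(index, disease):
    """Disease -> genes -> mouse orthologs -> interacting mouse proteins -> human equivalents."""
    genes = follow(index, {disease}, "linked_to_gene")
    mouse_genes = follow(index, genes, "has_mouse_ortholog")
    mouse_proteins = follow(index, mouse_genes, "inv_interacts_with")
    return follow(index, mouse_proteins, "has_human_equivalent")


index = build_index(TRIPLES)
print(candidate_proteins(index, "disease A"))  # {'human protein P1'}
```

In this toy data, mouse protein P2 also interacts with the mouse gene but has no recorded human equivalent, so only P1 is returned. Mining algorithms would discover such multi-step patterns from the data rather than follow a path fixed in advance.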
In concrete terms, how does the platform operate?
YT: We still have work to do to make our platform accessible and easy to use for doctors and, eventually, the general public. The aim is for users to be able to query the software about a rare disease. For example, they can enter the name of a disease to find all of the symptoms associated with it. They will also be able to enter a set of symptoms observed in their patient: the more symptoms entered, the smaller the group of possible diseases.
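The two kinds of question described above, looking up the symptoms of a disease and narrowing down diseases from observed symptoms, can be sketched with a simple lookup table. The table contents and function names are hypothetical; this is not the OrphaMine interface.

```python
# Hypothetical disease-symptom table; names are placeholders.
DISEASE_SYMPTOMS = {
    "disease A": {"fever", "joint pain", "rash"},
    "disease B": {"fever", "rash"},
    "disease C": {"fever", "fatigue"},
}


def symptoms_of(disease):
    """List the symptoms recorded for a disease (empty if unknown)."""
    return sorted(DISEASE_SYMPTOMS.get(disease, set()))


def candidate_diseases(observed):
    """Return diseases whose recorded symptoms include every observed symptom."""
    observed = set(observed)
    return sorted(d for d, recorded in DISEASE_SYMPTOMS.items() if observed <= recorded)


print(symptoms_of("disease B"))                             # ['fever', 'rash']
print(candidate_diseases(["fever"]))                        # ['disease A', 'disease B', 'disease C']
print(candidate_diseases(["fever", "rash"]))                # ['disease A', 'disease B']
print(candidate_diseases(["fever", "rash", "joint pain"]))  # ['disease A']
```

Requiring every observed symptom to match is the simplest rule and reproduces the narrowing effect. Because rare diseases can present very differently from one patient to the next, a practical tool would more likely rank diseases by how many symptoms they share with the patient than exclude them outright.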
CR: We have already presented our platform to doctors in several hospitals, and they give us regular feedback on our work. A doctor from the Nancy University Hospital (CHU) works within the team to refine our results. In January, the panel of users will be extended to include doctors, researchers and representatives of the pharmaceutical industry.
*Greyc: Computer science laboratory at the Université de Caen
**MoDyCo: Models, Dynamics, Corpora Linguistics laboratory bringing together researchers from the Université Paris-Ouest Nanterre-La Défense and the CNRS.
The ORPAILLEUR team is a joint project team between the CNRS, Inria and the Université de Lorraine.