Charlotte Renauld - 2/12/2015

A Look Back at the Business Convention on Big Data

The first business convention on the subject of "Big Data", organised by Paris-Saclay University at HEC Paris, was held on 24 and 25 November 2015. 

A look back at this two-day event, which drew almost 550 visitors and 40 exhibitors.

This first edition of the Business Convention on Big Data was designed to create the best possible conditions for bringing together the skills, know-how and needs of the academic world and of the socio-economic and industrial ecosystem: major groups, SMEs and innovative start-ups.

The themes of machine learning, data science, open data, health, energy distribution, digital marketing and transport were discussed during seminars moderated by members of Paris-Saclay University (Inria, CNRS, Ecole Polytechnique, etc.) and of major industrial groups (IBM, Renault, Sanofi, etc.).

Paris-Saclay University was privileged to welcome Christopher Bishop, Director of Microsoft Research Cambridge; Masaru Kitsuregawa, CEO of the National Institute of Informatics, Japan; Jean-Noël Georges, Global Program Director at Frost & Sullivan; and Françoise Soulié-Fogelman, Professor at the School of Computer Software, Tianjin University, China.

Among the 40 exhibitors (3 large groups, 3 SMEs/SMIs, 12 start-ups and 22 institutions), Inria presented its Big Data research, represented by four research teams: Geometrica (ToMATo project), Oak (CliqueSquare project), Parietal (Scikit-learn project) and Tao (STOIC project).


A look back at the projects presented at Inria's booth:

* Project-team Geometrica: ToMATo - Topological Mode Analysis Tool

What is ToMATo?

ToMATo is a novel scheme for classification and clustering of point-cloud data generated by simulations or measurements of physical processes.

It is highly flexible and has sound theoretical foundations. It provides feedback on the structure of the data in the form of a 2-dimensional diagram called a "persistence diagram". Such feedback can be used for determining the number of clusters, and for distinguishing between the signal and the noise.

ToMATo is able to perform both hard and soft clustering, and scales up with the size and dimensionality of the data. 

Some of the methods developed for ToMATo are integrated into the open-source library Gudhi, developed by Geometrica.
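
To make the idea of mode-based clustering concrete, here is a minimal sketch in Python. It is a simplified illustration written for this article, not the Gudhi implementation: the function name `tomato_sketch`, the ball-counting density estimate, the fixed-radius neighborhood graph and the restriction to 1-D points are all assumptions. Each point climbs towards its densest neighbor, and a cluster whose density peak rises less than `tau` above the point where it meets a denser cluster is treated as noise and merged. The heights of the peaks above their merge points are the kind of information a persistence diagram displays.

```python
def tomato_sketch(points, radius, tau):
    """Cluster 1-D points; return one peak index (cluster label) per point."""
    n = len(points)
    # Neighborhood graph: link points that lie within `radius` of each other.
    neighbors = [
        [j for j in range(n) if j != i and abs(points[i] - points[j]) <= radius]
        for i in range(n)
    ]
    # Crude density estimate (assumption): points in the ball, self included.
    density = [len(nb) + 1 for nb in neighbors]
    # Visit points from densest to sparsest; ties keep the input order.
    order = sorted(range(n), key=lambda i: -density[i])
    rank = {i: r for r, i in enumerate(order)}
    parent = list(range(n))  # union-find forest whose roots are density peaks

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in order:
        higher = [j for j in neighbors[i] if rank[j] < rank[i]]
        if not higher:
            continue  # local density peak: i starts a new cluster
        # Hill climbing: join the cluster of the densest neighbor.
        parent[i] = find(min(higher, key=rank.get))
        # i may connect several clusters: merge the ones with low prominence.
        for j in higher:
            a, b = find(i), find(j)
            if a == b:
                continue
            low, high = (a, b) if rank[a] > rank[b] else (b, a)
            if density[low] - density[i] < tau:
                parent[low] = high
    return [find(i) for i in range(n)]


# Toy data (made up): two dense groups around 1 and 5, joined by a sparse point at 3.
points = [0, 1, 1, 1, 2, 3, 4, 5, 5, 5, 6]
fine = tomato_sketch(points, radius=1, tau=1)
coarse = tomato_sketch(points, radius=1, tau=3)
print(len(set(fine)), len(set(coarse)))
```

With the small threshold the two groups stay separate; with the larger one the less prominent peak is absorbed, which is how the merging parameter controls the number of clusters.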


Steve Oudot, Geometrica project-team - Inria Saclay - Île-de-France


* Project-team Oak: CliqueSquare - RDF data management platform based on the Hadoop architecture

What is CliqueSquare?

RDF (Resource Description Framework) is the data format for the semantic web. CliqueSquare allows storing and querying very large volumes of RDF data in a massively parallel fashion in a Hadoop cluster. The system uses its own partitioning and storage model for the RDF triples in the cluster.

CliqueSquare evaluates queries expressed in a dialect of the SPARQL query language. It is particularly efficient when processing complex queries, because it is capable of translating them into MapReduce programs guaranteed to have the minimum number of successive jobs. Given the high overhead of a MapReduce job, this advantage is considerable.
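
As a rough illustration of why fewer jobs matter, the Python sketch below evaluates a query with two triple patterns that share the same subject variable in a single map/reduce-style pass, instead of one pass per join. It is a toy model written for this article: the data, the `star_join` function and the grouping-by-subject strategy are assumptions and do not reproduce CliqueSquare's actual partitioning or planning algorithm.

```python
from collections import defaultdict

# Toy RDF triples (subject, predicate, object); the data is made up.
triples = [
    ("alice", "worksFor", "inria"),
    ("alice", "livesIn", "paris"),
    ("bob", "worksFor", "inria"),
    ("bob", "livesIn", "lyon"),
    ("carol", "worksFor", "cnrs"),
    ("carol", "livesIn", "paris"),
]


def star_join(triples, patterns):
    """Return the subjects that match every (predicate, object) pattern.

    Roughly equivalent to the SPARQL query
        SELECT ?x WHERE { ?x :worksFor :inria . ?x :livesIn :paris }
    """
    # "Map" step: key every triple by its subject.
    groups = defaultdict(set)
    for s, p, o in triples:
        groups[s].add((p, o))
    # "Reduce" step: one pass per subject checks all patterns at once,
    # so the whole multi-pattern join needs a single job.
    return sorted(s for s, po in groups.items() if all(pat in po for pat in patterns))


result = star_join(triples, [("worksFor", "inria"), ("livesIn", "paris")])
print(result)
```

A plan built from binary joins only would chain one job per additional pattern; grouping all patterns that share a variable into one n-ary join keeps the plan flat.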


Ioana Manolescu, Benjamin Djahandideh, Project-team Oak - Inria Saclay - Île-de-France / LRI


* Project-team Parietal: Scikit-learn

What is Scikit-learn?

Scikit-learn can be used as middleware for prediction tasks. For example, many web start-ups adapt Scikit-learn to predict the buying behavior of users, provide product recommendations, and detect trends or abusive behavior (fraud, spam).

Scikit-learn is used to extract the structure of complex data (text, images) and to classify such data with state-of-the-art techniques.

Easy to use, efficient and accessible to non-experts in data science, Scikit-learn is an increasingly popular machine learning library in Python. In a data exploration step, users can enter a few lines in an interactive (but non-graphical) interface and immediately see the results of their request. Scikit-learn is a prediction engine.

Scikit-learn is developed in open source, and available under the BSD license.
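
The "few lines" mentioned above might look like the following sketch, typed in an interactive Python session. The choice of the bundled iris dataset, of a k-nearest-neighbors classifier and of the parameter values is ours and purely illustrative; any other estimator follows the same `fit` / `score` pattern.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small example dataset shipped with the library (no download needed).
X, y = load_iris(return_X_y=True)

# Keep a quarter of the samples aside to check the predictions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Train a classifier and measure its accuracy on the held-out samples.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```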


Bertrand Thirion, Gaël Varoquaux, Olivier Grisel, Project-team Parietal - Inria Saclay - Île-de-France


* Project-team Tao: STOIC

What is STOIC?

Current marketing strategies rely heavily on the analysis of online media and social networks. For example, identifying the opinion leaders gives a competitive advantage in selling and promoting products. 

STOIC allows one to identify online opinion leaders from data such as their blog posts or their Twitter profiles.

The key ingredients of STOIC are learning-to-rank techniques and ground-truth data.
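
To give a flavor of what learning to rank from a ground truth means, here is a minimal pairwise sketch in Python. It is not STOIC's method, which the description above does not detail: the perceptron-style update, the function name `train_pairwise_ranker`, the two features and the toy values are all assumptions. The ground truth supplies pairs of profiles where one is known to be more influential than the other, and the model learns weights that score the better profile higher.

```python
def train_pairwise_ranker(features, relevance, epochs=50, lr=0.1):
    """Learn linear weights so that more relevant items get higher scores."""
    n, dim = len(features), len(features[0])
    w = [0.0] * dim
    # Ground truth expressed as ordered pairs: item i should outrank item j.
    pairs = [(i, j) for i in range(n) for j in range(n) if relevance[i] > relevance[j]]
    for _ in range(epochs):
        for i, j in pairs:
            diff = [a - b for a, b in zip(features[i], features[j])]
            # Perceptron update whenever the pair is not ranked correctly.
            if sum(wk * dk for wk, dk in zip(w, diff)) <= 0:
                w = [wk + lr * dk for wk, dk in zip(w, diff)]
    return w


# Toy profiles (made up): [normalized audience size, normalized posting activity].
features = [[0.9, 0.8], [0.6, 0.7], [0.4, 0.2], [0.1, 0.3]]
relevance = [3, 2, 1, 0]  # hypothetical ground-truth influence levels

w = train_pairwise_ranker(features, relevance)
scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in features]
ranking = sorted(range(len(features)), key=lambda i: -scores[i])
print(ranking)
```

The learned weights can then score profiles that were not part of the ground truth, which is what makes the quality of that ground truth so important.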


Philippe Caillou, Project-team Tao - Inria Saclay - Île-de-France / LRI


Keywords: INRIA Saclay - Île-de-France, Big data, Université Paris-Saclay