Learning visual models from large-scale data

Thoth is a joint team of Inria and Laboratoire Jean Kuntzmann that started in January 2016. It is a follow-up to the LEAR team (2003-2015).

Thoth is motivated by today's context, in which the quantity of digital images and videos available on-line continues to grow at a phenomenal speed: home users put their movies on YouTube and their images on Flickr; journalists and scientists set up web pages to disseminate news and research results; and audiovisual archives from TV broadcasts are opening to the public. There is thus a pressing and growing demand to annotate and index this visual content for home and professional users alike. Current object recognition and scene understanding technology mostly relies on fully supervised classification engines, and visual models are essentially (piecewise) rigid templates learned from hand-labeled images. The sheer scale of on-line data and the nature of the embedded annotation call for a departure from this fully supervised scenario. The main objective of the Thoth project-team is to develop a new framework for learning the structure and parameters of visual models by actively exploring large digital image and video sources (off-line archives as well as growing on-line content) and by exploiting the weak supervisory signal provided by the accompanying meta-data.

Inria centre(s)
Inria Centre at Université Grenoble Alpes


Team leader

Nathalie Gillot

Team assistant