ANR Autogramm Project


Autogramm (Induction of descriptive grammar from annotated corpora) is a project funded by the ANR (Agence Nationale de la Recherche), Projet-ANR-21-CE38-0017. It started in January 2022 and runs for four years.


The goal of this project is to automate, as far as possible, the extraction of descriptive grammars and grammatical descriptions from annotated corpora for linguistic and typological studies. We aim for descriptions that

  1. highlight the main properties of the corpus (and by extension the language or variety that the corpus represents);
  2. are easily understandable for a linguist;
  3. can be presented as text, diagrams, or tables, including in grammar databases, which are generally oriented toward comparative and typological studies;
  4. and may vary in size.

As these grammatical descriptions are induced from a corpus, they contain quantitative information associated with each observation made on that corpus, as well as relevant examples extracted from it.
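
To make the idea of a quantified observation with attached examples concrete, here is a minimal sketch in Python. It is not the project's own extraction method: the two-sentence corpus, its annotation, and the helper names `to_conllu` and `order_statistics` are invented for illustration. It assumes a treebank in the tab-separated CoNLL-U format with a `subj` dependency relation, and it counts how often a subject precedes or follows its head while keeping the sentences that support each count.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration only.
# Each token is (ID, FORM, UPOS, HEAD, DEPREL).
ROWS = [
    [("1", "she", "PRON", "2", "subj"),
     ("2", "sleeps", "VERB", "0", "root")],
    [("1", "here", "ADV", "2", "mod"),
     ("2", "comes", "VERB", "0", "root"),
     ("3", "Mary", "PROPN", "2", "subj")],
]

def to_conllu(sentences):
    """Serialize the toy rows as tab-separated, 10-column CoNLL-U text."""
    blocks = []
    for sent in sentences:
        lines = ["# text = " + " ".join(form for _, form, _, _, _ in sent)]
        for tid, form, upos, head, deprel in sent:
            lines.append("\t".join(
                [tid, form, "_", upos, "_", "_", head, deprel, "_", "_"]))
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"

def order_statistics(conllu_text, deprel="subj"):
    """Count whether dependents labelled `deprel` precede or follow their head.

    Returns the counts and, for each order, the sentences that attest it.
    """
    counts = Counter()
    examples = defaultdict(list)
    for block in conllu_text.strip().split("\n\n"):
        text = ""
        for line in block.splitlines():
            if line.startswith("# text = "):
                text = line[len("# text = "):]
            elif line and not line.startswith("#"):
                cols = line.split("\t")
                # Skip multiword-token ranges (1-2) and empty nodes (1.1),
                # and keep only exact matches of the requested relation.
                if not cols[0].isdigit() or cols[7] != deprel:
                    continue
                order = "before" if int(cols[0]) < int(cols[6]) else "after"
                counts[order] += 1
                examples[order].append(text)
    return counts, dict(examples)

counts, examples = order_statistics(to_conllu(ROWS))
total = sum(counts.values())
for order in ("before", "after"):
    print(f"subj {order} head: {counts[order]}/{total}", examples.get(order, []))
```

The sketch matches the relation label exactly, so subtyped labels would need a looser comparison. A real description would also aggregate many such observations and select which ones are worth reporting.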

The project also promotes the development of treebanks for under-resourced languages, so that quantitative descriptive grammars can be extracted for these languages. The project uses the SUD (Surface-syntactic Universal Dependencies) annotation scheme, the query tool Grew-match, and the annotation tool ArboratorGrew.



sylvain at