Natural Language Processing

Industrial use of natural language processing techniques is subject to specific constraints (real-time operation, short development cycles, limited in-house linguistic expertise, etc.) that are not particularly compatible with the methods usually applied in classical approaches to computational linguistics. However, recent advances in corpus-based linguistics open up a whole set of new possibilities. In particular, research in Natural Language Processing at the LIA focuses on text mining (knowledge extraction from textual data), automatic production of syntactic tools, and evaluation of NLP tools.

Our text-mining methods are based on techniques developed for information retrieval using a Distributional Semantics approach. In such methods, semantic proximities are derived from co-frequency matrices computed on large textual corpora. Various similarity measures are used to characterize the proximity between queries and documents, which are represented in a unified way as projections in a high-dimensional vector space of relevant terms.
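
As a rough illustration of the idea, the sketch below builds co-frequency vectors from a toy corpus, projects a query and two documents into the same term space, and compares them with cosine similarity. The function names, the window size, the toy corpus, and the choice of cosine as the similarity measure are all assumptions made for this example; they are not taken from the LIA's actual tools.

```python
import math
from collections import Counter, defaultdict


def cofrequency_vectors(corpus, window=2):
    """Map each term to a Counter of the terms seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i, term in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[term][tokens[j]] += 1
    return vectors


def project(text, vectors):
    """Represent a query or a document as the sum of its terms' co-frequency vectors."""
    profile = Counter()
    for term in text.lower().split():
        profile.update(vectors.get(term, Counter()))
    return profile


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus; a real system would use a large textual corpus.
corpus = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "stocks fell on the market",
    "the market rallied as stocks rose",
]
vectors = cofrequency_vectors(corpus)

query = project("cat", vectors)
doc_animals = project("dog mouse", vectors)
doc_finance = project("stocks market", vectors)

print(cosine(query, doc_animals), cosine(query, doc_finance))
```

In this toy setting the query "cat" comes out closer to the document about animals than to the one about finance, even though neither document contains the word "cat": the proximity comes from shared contexts rather than shared terms. A realistic system would also weight or filter very frequent terms such as "the".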

Methods for the automatic production of syntactic tools aim to implement probabilistic techniques and models that operate on textual corpora (raw or annotated texts) in order to adapt various generic algorithms to specific applications: part-of-speech tagging, speech recognition, information retrieval, etc.
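
To make the notion of a tool derived from an annotated corpus concrete, here is a minimal sketch of a most-frequent-tag part-of-speech tagger. It is a deliberately simple baseline chosen for illustration, not the method used at the LIA; the function names, the tag set, and the toy annotated corpus are all hypothetical.

```python
from collections import Counter, defaultdict


def train_unigram_tagger(tagged_sentences):
    """Estimate, for each word, its most frequent tag in the annotated corpus."""
    counts = defaultdict(Counter)
    tag_totals = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
            tag_totals[tag] += 1
    # Unknown words fall back to the most frequent tag overall.
    default_tag = tag_totals.most_common(1)[0][0]
    model = {word: c.most_common(1)[0][0] for word, c in counts.items()}
    return model, default_tag


def tag(tokens, model, default_tag):
    """Tag each token with its most frequent training tag, or the default tag."""
    return [(token, model.get(token.lower(), default_tag)) for token in tokens]


# Hypothetical annotated corpus: lists of (word, tag) pairs.
annotated = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("chase", "VERB"), ("cats", "NOUN")],
]
model, default_tag = train_unigram_tagger(annotated)

result = tag(["The", "cat", "barks", "loudly"], model, default_tag)
print(result)
```

Retraining the same generic procedure on a different annotated corpus yields a tagger adapted to a different domain, which is the kind of adaptation the paragraph above describes. Practical taggers usually also model tag sequences, for example with hidden Markov models, rather than tagging each word in isolation.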

Current projects:

Past projects: