The Défi Fouille de Textes (DEFT) is a text mining evaluation campaign that has been organized since 2005. Each year, new themes are proposed. For the 2011 edition, we proposed mining scientific papers from the Humanities, asking participants to match each abstract with its article.
In this new edition, we work again on this corpus of scientific papers, this time focusing on indexing: the task is to identify the keywords chosen by the authors to index their paper, considering both the abstract and the full article. The closing workshop is expected to take place during the JEP/TALN2012 conference in Grenoble (38).
Teams participating in DEFT2012 must register online and sign an agreement to access the corpora. From February 6th, the training corpora will be provided to registered participants who have sent their signed agreement. These corpora make up 60% of the original corpus; the remaining 40% will be used for the test corpora. The test stage will take place from April 9th to 15th. Starting from the day they choose within this period, participants will have three days to apply, on the test corpora, the methods they designed during the training stage.
Results will be evaluated using standard evaluation measures (recall, precision, F-measure), comparing the keywords submitted by the participants with the keywords from the reference.
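To illustrate how such measures are commonly computed for keyword indexing, here is a minimal sketch for a single document. It is not the campaign's official scoring script: the function name `keyword_scores`, the matching rule (exact match after trimming whitespace and lowercasing), and the use of the balanced F-measure (F1) are all assumptions made for the example.

```python
def keyword_scores(predicted, reference):
    """Return (precision, recall, f_measure) for one document.

    Hypothetical helper: the matching rule below is an assumption,
    not the official DEFT scoring procedure.
    """
    # Assumption: two keywords match if they are equal after
    # trimming whitespace and lowercasing.
    pred = {k.strip().lower() for k in predicted}
    ref = {k.strip().lower() for k in reference}

    correct = len(pred & ref)  # keywords found in both sets

    # Precision: share of submitted keywords that are in the reference.
    precision = correct / len(pred) if pred else 0.0
    # Recall: share of reference keywords that were submitted.
    recall = correct / len(ref) if ref else 0.0

    # Balanced F-measure (harmonic mean of precision and recall).
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    submitted = ["linguistique", "Corpus", "syntaxe"]
    gold = ["corpus", "linguistique", "sémantique", "lexique"]
    print(keyword_scores(submitted, gold))
```

In this example, two of the three submitted keywords appear among the four reference keywords, so precision is 2/3, recall is 1/2, and the F-measure is 4/7. Scores over a whole corpus could then be averaged per document or pooled over all keywords; which of the two applies here is not stated in the text above.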
Reference: track 1 and 2 reference files (training and test corpora).