DEFT2011


Output formats

This page describes the expected output format for each task. Scripts to check that each output file is well formed will be provided later.
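Until the official scripts are available, a well-formedness check can be approximated with a standard XML parser. The following is only a minimal sketch in Python, not the organizers' script; the function name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_bytes):
    """Return (True, "") if the bytes parse as XML, else (False, parser message)."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        # The parser message includes the line and column of the first error.
        return False, str(err)
    return True, ""
```

This only tests XML syntax. It does not check that the tags and attributes are the ones requested for a given task.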

Reminder of the objectives of each task:

For each track, participants are allowed to submit up to 3 runs.

For each document to process (a newspaper extract in task 1, an abstract in task 2), participants may optionally give several results, each weighted with a confidence score. The confidence scores given for a document must sum to 1.

▸ File name format:

Task 1. Diachronic variation

For this task, whatever the chosen output format (w/ or w/o confidence score), participants must give the rank of each answer for each document (see examples below). Results will be evaluated with two methodologies:

▸ Output format w/o confidence score

We expect an XML file giving, for each portion studied, the estimated year of publication.

<?xml version="1.0" encoding="utf-8" ?>
<corpus>
 <portion id="1">
  <annee valeur="1879" rang="1" />
 </portion>
 <portion id="2">
  <annee valeur="1934" rang="1" />
 </portion>
</corpus>

Where "annee" stands for "year", "valeur" for "value" and "rang" for "rank".
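As an illustration, a file in this format could be produced as follows. This is a minimal sketch in Python that assumes the predictions are held in a dictionary mapping portion ids to years; the function name `task1_xml` and that input shape are assumptions, not part of the task definition.

```python
import xml.etree.ElementTree as ET


def task1_xml(predictions):
    """Build the Task 1 output (w/o confidence score) as a string.

    predictions: dict mapping a portion id to one estimated year
    (hypothetical input shape chosen for this sketch).
    """
    corpus = ET.Element("corpus")
    for portion_id, year in predictions.items():
        portion = ET.SubElement(corpus, "portion", {"id": str(portion_id)})
        # A single answer per portion, so its rank is always 1.
        ET.SubElement(portion, "annee", {"valeur": str(year), "rang": "1"})
    declaration = '<?xml version="1.0" encoding="utf-8" ?>\n'
    return declaration + ET.tostring(corpus, encoding="unicode")


xml_text = task1_xml({1: 1879, 2: 1934})
```

The resulting string is not indented like the example above, which does not affect its XML content.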

▸ Output format w/ confidence score

We expect an XML file giving, for each portion studied, all the estimated years of publication, each year weighted with a confidence score (the scores for a document must not sum to more than 1). Participants must give the rank of each result (the year of rank 1 is used to compute the final ranking).

<?xml version="1.0" encoding="utf-8" ?>
<corpus>
 <portion id="1">
  <annee valeur="1879" score="0.42" rang="1" />
  <annee valeur="1878" score="0.27" rang="2" />
  <annee valeur="1880" score="0.14" rang="3" />
  <annee valeur="1882" score="0.09" rang="4" />
  <annee valeur="1874" score="0.08" rang="5" />
 </portion>
 <portion id="2">
  <annee valeur="1931" score="0.41" rang="1" />
  <annee valeur="1934" score="0.41" rang="2" />
  <annee valeur="1943" score="0.18" rang="3" />
 </portion>
</corpus>

Where "annee" stands for "year", "valeur" for "value" and "rang" for "rank".
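The two constraints stated for this format (scores not summing to more than 1, and a rank on every answer) can be checked before submission. The sketch below is in Python and is not the official checking script. It assumes every "annee" element carries both "score" and "rang", and it further assumes that ranks should be the integers 1 to n with no repeats, which the task description does not state explicitly. The function name and the tolerance are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_task1_scores(xml_bytes, tolerance=1e-6):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    root = ET.fromstring(xml_bytes)  # raises ParseError if not well formed
    for portion in root.findall("portion"):
        pid = portion.get("id")
        answers = portion.findall("annee")
        # Constraint from the text: the scores must not sum to more than 1.
        total = sum(float(a.get("score")) for a in answers)
        if total > 1 + tolerance:
            problems.append(f"portion {pid}: scores sum to {total:.2f}")
        # Assumption of this sketch: ranks are exactly 1..n, each used once.
        ranks = sorted(int(a.get("rang")) for a in answers)
        if ranks != list(range(1, len(answers) + 1)):
            problems.append(f"portion {pid}: unexpected ranks {ranks}")
    return problems
```

The small tolerance avoids false alarms caused by floating-point rounding when the scores sum to exactly 1.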

 

Task 2. Abstract/article matching

For this task, whatever the chosen output format (w/ or w/o confidence score), giving the rank of each answer for a document is not necessary. Results will be computed as follows: every answer given (in the examples below, the id of an article associated with the abstract studied) will be taken into account in the evaluation, through the number of documents returned and the number correctly returned, both needed to compute recall and precision.
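The counting described above can be sketched as follows. This Python fragment is an unweighted illustration only: it assumes precision is the number of correct answers divided by the number of answers returned, and recall is the number of correct answers divided by the number of abstracts. The official evaluation may differ, in particular in how confidence scores are used. The function name and the input shapes are hypothetical.

```python
def precision_recall(answers, gold):
    """Unweighted precision and recall for abstract/article matching.

    answers: dict mapping an abstract file name to the set of article
             file names returned for it (hypothetical input shape).
    gold:    dict mapping an abstract file name to its correct article.
    """
    # Number of documents returned, over all abstracts.
    returned = sum(len(ids) for ids in answers.values())
    # Number of abstracts whose correct article is among the answers.
    correct = sum(1 for abstract, ids in answers.items()
                  if gold.get(abstract) in ids)
    precision = correct / returned if returned else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```

Under these assumptions, returning more candidates per abstract can raise recall but lowers precision.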

▸ Output format w/o confidence score

We expect an XML file giving, for each abstract studied (tag <resume fichier="name.res" />), the matching scientific article (tag <article fichier="name.art" />), with the pair enclosed between <doc> and </doc> tags. In this case, the association strength is maximal. If the given article matches the abstract, the participant obtains 100% of the points.

<?xml version="1.0" encoding="utf-8" ?>
<corpus>
 <doc>
  <resume fichier="001.res" />
  <article fichier="127.art" />
 </doc>
 <doc>
  <resume fichier="002.res" />
  <article fichier="246.art" />
 </doc>
</corpus>

Where "resume" stands for "abstract" and "fichier" for "file".

And for abstract/text pairing:

<?xml version="1.0" encoding="utf-8" ?>
<corpus>
 <doc>
  <resume fichier="001.res" />
  <texte fichier="199.txt" />
 </doc>
 <doc>
  <resume fichier="002.res" />
  <texte fichier="064.txt" />
 </doc>
</corpus>

Where "resume" stands for "abstract", "texte" for "text" and "fichier" for "file".

▸ Output format w/ confidence score

We expect an XML file giving, for each abstract studied (tag <resume fichier="name.res" />), all the scientific articles estimated to match it (tag <article fichier="name.art" score="score" />), with this group of tags enclosed between <doc> and </doc> tags. In this case, the association strength is the given confidence score. For each document, the confidence scores must sum to 1.

<?xml version="1.0" encoding="utf-8" ?>
<corpus>
 <doc>
  <resume fichier="001.res" />
  <article fichier="127.art" score="0.41" />
  <article fichier="199.art" score="0.31" />
  <article fichier="001.art" score="0.28" />
 </doc>
 <doc>
  <resume fichier="002.res" />
  <article fichier="246.art" score="0.49" />
  <article fichier="016.art" score="0.37" />
  <article fichier="177.art" score="0.14" />
 </doc>
</corpus>

Where "resume" stands for "abstract" and "fichier" for "file".

And for abstract/text pairing:

<?xml version="1.0" encoding="utf-8" ?>
<corpus>
 <doc>
  <resume fichier="001.res" />
  <texte fichier="127.txt" score="0.41" />
  <texte fichier="199.txt" score="0.31" />
  <texte fichier="001.txt" score="0.28" />
 </doc>
 <doc>
  <resume fichier="002.res" />
  <texte fichier="246.txt" score="0.49" />
  <texte fichier="016.txt" score="0.37" />
  <texte fichier="177.txt" score="0.14" />
 </doc>
</corpus>
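For the Task 2 format w/ confidence score, the requirement that the scores of each document sum to 1 can be checked before submission. The following Python sketch is not the official checking script; it assumes every "article" or "texte" element carries a "score" attribute, and the function name and tolerance are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_task2_scores(xml_bytes, tolerance=1e-6):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    root = ET.fromstring(xml_bytes)  # raises ParseError if not well formed
    for doc in root.findall("doc"):
        resume = doc.find("resume")
        name = resume.get("fichier") if resume is not None else "(missing resume)"
        # Handle both pairings: abstract/article and abstract/text.
        candidates = doc.findall("article") + doc.findall("texte")
        total = sum(float(c.get("score")) for c in candidates)
        if abs(total - 1.0) > tolerance:
            problems.append(f"{name}: scores sum to {total:.2f}, expected 1")
    return problems
```

Applied to the examples above, each document sums to 1 and the function returns an empty list.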