The 2019 edition of the DEFT challenge is dedicated to the analysis of clinical cases in French. It comprises three tasks on information retrieval and extraction. This is the first challenge in which clinical texts in French are to be processed.
What are clinical cases?
Clinical cases describe the clinical situations of real or fictitious patients. They are published in various sources (scientific, didactic, associative, legal...) and are de-identified. Their purpose is to present situations that are typical (as in didactic sources) or rare (as in scientific sources).
Global information on the corpus
The corpus used in this challenge is part of a larger corpus of clinical cases, which has more complete annotations and associated information (Grabar et al., 2018). For DEFT 2019, the organizers focused on clinical cases associated with keywords and discussions. These clinical cases relate to various medical specialties (cardiology, urology, oncology, obstetrics, pulmonology, gastro-enterology...). They have been published in different French-speaking countries (France, Belgium, Switzerland, Canada, African countries, tropical countries...).
The reference data are a consensus obtained from two independent annotations.
N. Grabar, V. Claveau, C. Dalloux. CAS: French Corpus with Clinical Cases. LOUHI 2018, p. 1-7
Access to data
Access to the data is granted only after all team members have signed the user agreement. Participants may take part in one or more tasks. By obtaining the data, participants commit to submitting results for at least one task.
Proposed tasks are:
- Task 1: Indexing of clinical cases
- Task 2: Semantic similarity between clinical cases and discussions
- Task 3: Information extraction
Task 1: Indexing of clinical cases
- Purpose: to identify, in the list of keywords, the keywords corresponding to a given clinical case/discussion pair
- Input: clinical case/discussion pairs, the expected number of keywords for each pair, and the whole set of keywords
- Output: keywords assigned to each clinical case/discussion pair
- Remarks: a given keyword may be associated with several clinical case/discussion pairs, and some keywords in the whole training (and test) keyword set are not associated with any clinical case/discussion pair. The keywords are defined and chosen by the authors.
- Evaluation: the main evaluation measure is Mean Average Precision (MAP); the second evaluation measure is Prec@N (precision at rank N), where N is the number of expected keywords. The organizers will normalize the keywords (inflection, affixation) to allow better comparison and evaluation.
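The two measures named for the indexing task can be sketched as follows. This is only an illustration using the standard definitions of average precision and precision at rank N; it is not the organizers' scoring script, it does not reproduce their keyword normalization, and the function names and toy keywords are assumptions made for the example.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked keyword list against its reference set."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    seen: Set[str] = set()
    for rank, keyword in enumerate(ranked, start=1):
        # Count each reference keyword once, at the first rank where it appears.
        if keyword in relevant and keyword not in seen:
            hits += 1
            precision_sum += hits / rank
        seen.add(keyword)
    return precision_sum / len(relevant)


def mean_average_precision(
    predictions: Dict[str, List[str]], references: Dict[str, Set[str]]
) -> float:
    """Mean of the average precision over all reference documents."""
    if not references:
        return 0.0
    scores = [
        average_precision(predictions.get(doc_id, []), relevant)
        for doc_id, relevant in references.items()
    ]
    return sum(scores) / len(scores)


def precision_at_n(ranked: List[str], relevant: Set[str]) -> float:
    """Precision at rank N, with N the number of expected keywords."""
    n = len(relevant)
    if n == 0:
        return 0.0
    return len(set(ranked[:n]) & relevant) / n


# Hypothetical toy data, not taken from the corpus.
references = {"case1": {"diabète", "insuline"}, "case2": {"fracture"}}
predictions = {
    "case1": ["diabète", "obésité", "insuline"],
    "case2": ["entorse", "fracture"],
}

map_score = mean_average_precision(predictions, references)
p_at_n_case1 = precision_at_n(predictions["case1"], references["case1"])
```

In this toy run, the first document has correct keywords at ranks 1 and 3 and the second at rank 2, so ranking order affects MAP even when the same keywords are returned.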
Task 2: Semantic similarity between clinical cases and discussions
- Purpose: to pair a given clinical case with the corresponding discussion
- Input: a set of clinical cases, a set of discussions
- Output: pairing between clinical cases and discussions
- Remarks: one discussion may be associated with more than one clinical case
- Evaluation: boolean
Task 3: Information extraction
- Purpose: to detect, in clinical cases, demographic and clinical information
Four types of information are targeted:
- the age of the patient concerned by the case, at the time of the last clinical event described, normalized to an integer (e.g., 0 for babies younger than 1 year, 1 for babies between 1 and 2 years, 20 for patients of about twenty, etc.).
- the gender of the patient concerned by the case. Two values are possible: female, male (there are no other possibilities).
- the reason for the appointment or hospitalization, for the last clinical event. This category usually covers pathologies, signs and symptoms, and sometimes accidents. Clinical follow-up is a continuation of preceding clinical events and is not considered a reason in its own right.
- the outcome, among five possible values: 1° recovery (the clinical problem described has been resolved and the patient has fully recovered), 2° improvement (the clinical condition has improved but full recovery cannot be concluded), 3° stable (either the condition remains stable, or it is impossible to determine whether there is an improvement or a worsening), 4° worsening (the clinical condition is getting worse), or 5° death (when the death is directly related to the clinical case).
- Input: a set of clinical cases
- Output: the values extracted for the four types of information targeted
- Remarks: When a document concerns several patients, the age and gender of each one must be identified (for instance, when a graft from one donor is given to two patients successively, the age and gender of both recipients must be identified). It is not necessary to link each age with the corresponding gender. When several ages are mentioned for a given patient (the current age and ages in their medical history), only the age related to the clinical case described must be extracted. A few documents do not allow all the categories to be determined, in which case the default value NUL is to be used.
- Evaluation: The values of age, gender and outcome will be evaluated through strict comparison (the extracted value must be identical to the reference value). It is not required to indicate the text spans that provide these values. The reason is evaluated by comparing the extracted text span with the reference text span and measuring the intersection rate between them.
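The evaluation described for the information extraction task can be illustrated with the sketch below. The exact formula of the intersection rate is not given here, so the Jaccard overlap on whitespace-separated tokens used below is an assumption, as are the function names and the example values. The strict comparison ignores the order of values, since a document may concern several patients.

```python
from typing import Iterable

NUL = "NUL"  # default value named in the task description


def strict_match(predicted: Iterable[str], reference: Iterable[str]) -> bool:
    """Strict comparison: same values on both sides, order ignored."""
    return sorted(predicted) == sorted(reference)


def token_overlap(predicted: str, reference: str) -> float:
    """Jaccard overlap between token sets.

    Assumption: this stands in for the 'intersection rate', whose exact
    definition is not specified in the task description.
    """
    pred_tokens = set(predicted.lower().split())
    ref_tokens = set(reference.lower().split())
    if not pred_tokens and not ref_tokens:
        return 1.0
    return len(pred_tokens & ref_tokens) / len(pred_tokens | ref_tokens)


# Hypothetical values, not taken from the corpus.
ages_ok = strict_match(["45", "52"], ["52", "45"])   # two patients, any order
gender_ok = strict_match(["female"], ["male"])       # mismatch
missing_ok = strict_match([NUL], [NUL])              # nothing to extract
reason_score = token_overlap("douleur abdominale aiguë", "douleur abdominale")
```

With these toy values, the two ages match regardless of order, the gender does not match, and the extracted reason shares two of three distinct tokens with the reference.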