Distillation Datasets
=====================

This section lists datasets for knowledge distillation in neural IR, where teacher model scores are used to train student rankers.

Pointwise Distillation
----------------------

Pointwise distillation datasets provide teacher-scored ``(query, document, similarity)`` triples: a single document per query together with a teacher similarity score.

.. dm:datasets:: ai.lighton.embeddings_pre_training ir

.. dm:datasets:: ai.lighton.embeddings_pre_training.denseon_lateon ir

Pairwise Distillation
---------------------

Pairwise distillation datasets contain triples of (query, positive document, negative document) with teacher model scores for each document.

Hofstaetter Neural Ranking KD
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Teacher scores for MS MARCO passage ranking from the ``neural-ranking-kd`` repository. Contains ~40M triples with BERT-based teacher scores in TSV format, with columns ``pos_score``, ``neg_score``, ``query_id``, ``pos_passage_id``, ``neg_passage_id``.

.. dm:datasets:: com.github.hofstaetter.distillation ir

Listwise Distillation
---------------------

Listwise distillation datasets contain ranked lists of documents for each query, produced by a teacher model.

.. dm:datasets:: com.github.webis-de.rank-distillm ir
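
Example: Reading Pairwise Teacher Scores
----------------------------------------

The pairwise TSV layout described under Hofstaetter Neural Ranking KD can be read with the standard library alone. The following is a minimal sketch, not the loader used by this project: ``ScoredTriple``, ``read_scored_triples`` and ``margin_mse`` are hypothetical names introduced for illustration, the sample rows are made up, and the file is assumed to have no header row. The margin-based mean squared error shown here is one common way to use such teacher scores; the dataset itself does not prescribe a loss.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple, Sequence


class ScoredTriple(NamedTuple):
    """One TSV row (hypothetical container, not part of any library)."""

    pos_score: float
    neg_score: float
    query_id: str
    pos_passage_id: str
    neg_passage_id: str

    @property
    def teacher_margin(self) -> float:
        # How much higher the teacher scores the positive than the negative.
        return self.pos_score - self.neg_score


def read_scored_triples(lines: Iterable[str]) -> Iterator[ScoredTriple]:
    """Parse tab-separated rows in the column order given in the text.

    Assumes no header row and exactly five columns per non-empty row.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        pos_score, neg_score, query_id, pos_id, neg_id = row
        yield ScoredTriple(
            float(pos_score), float(neg_score), query_id, pos_id, neg_id
        )


def margin_mse(
    student_margins: Sequence[float], teacher_margins: Sequence[float]
) -> float:
    """Mean squared error between student and teacher score margins."""
    if len(student_margins) != len(teacher_margins):
        raise ValueError("margin lists must have the same length")
    if not student_margins:
        raise ValueError("margin lists must not be empty")
    squared = [(s - t) ** 2 for s, t in zip(student_margins, teacher_margins)]
    return sum(squared) / len(squared)


# Made-up rows for illustration only; real files hold MS MARCO identifiers.
sample = "9.5\t2.0\tq1\tp10\tp11\n4.0\t5.5\tq2\tp20\tp21\n"
triples = list(read_scored_triples(io.StringIO(sample)))
teacher_margins = [t.teacher_margin for t in triples]

# Student margins would come from the model being trained; these are invented.
loss = margin_mse([7.0, -1.0], teacher_margins)
```

In practice the same function can be fed an open file handle instead of ``io.StringIO``, which keeps memory use flat when iterating over tens of millions of rows.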