Conversational IR Datasets
==========================
This section lists datasets for conversational information retrieval
and contextual query understanding tasks.
Conversational Search
---------------------
These datasets evaluate multi-turn conversational search systems where
users engage in conversations to satisfy complex information needs.
TREC CaST
~~~~~~~~~~
The `TREC Conversational Assistance Track `_ (CaST)
evaluates conversational information seeking over multi-turn dialogues.
Runs from 2019 to 2022 with evolving document collections across versions.
.. dm:datasets:: gov.nist.trec.cast ir
iKAT
~~~~
The `iKAT `_ (Interactive Knowledge
Assistance Track) datasets for conversational search and query rewriting,
using the ClueWeb22 document collection. Runs from 2023 to 2025.
.. dm:datasets:: com.github.ikat ir
Contextual Query Rewriting
--------------------------
These datasets contain conversational queries that need to be rewritten
to be self-contained (decontextualization), resolving coreferences and
ellipses from the conversation context.
CANARD
~~~~~~
Context-dependent Query Rewriting dataset for conversational question answering.
Contains queries from QuAC that have been manually rewritten to be self-contained.
.. dm:datasets:: com.github.aagohary.canard ir
Example:
.. code-block:: python
from datamaestro import prepare_dataset
canard = prepare_dataset("com.github.aagohary.canard.train")
for entry in canard.iter():
print(f"Original: {entry.source}")
print(f"Rewritten: {entry.rewrite}")
OrConvQA
~~~~~~~~
Open-Retrieval Conversational Question Answering dataset.
Contains multi-turn QA conversations with passage retrieval.
.. dm:datasets:: com.github.prdwb.orconvqa ir
QReCC
~~~~~
Question Rewriting in Conversational Context dataset.
Contains conversations with human rewrites of questions.
.. dm:datasets:: com.github.apple.ml-qrecc ir