Conversation API
This module provides data types for conversational information retrieval and query understanding tasks.
Core Data Classes
Entry types for conversation turns:
- class datamaestro_ir.data.conversation.base.RetrievedEntry(documents: List[str], relevant_documents: Dict[int, Tuple[int | None, int | None]] | None = None)
Bases: object
List of system-retrieved documents and their relevance
- documents: List[str]
List of retrieved documents
- relevant_documents: Dict[int, Tuple[int | None, int | None]] | None
Optional relevance status, keyed by integer, giving a (start, stop) position pair per entry; either position may be None
- class datamaestro_ir.data.conversation.base.DecontextualizedItem
Bases: object
A topic record with decontextualized versions of the topic
- abstractmethod get_decontextualized_query(mode=None) → str
Returns the decontextualized query
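To make the shape of the RetrievedEntry fields concrete, here is a minimal sketch built on a stand-in dataclass that mirrors the documented signature (it does not import datamaestro_ir). Reading the dictionary key as an index into `documents` and the tuple as character offsets is an assumption; this page only says "start/stop position". The names `RetrievedEntrySketch` and `relevant_spans` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class RetrievedEntrySketch:
    """Stand-in mirroring the documented RetrievedEntry fields (hypothetical)."""

    documents: List[str]
    relevant_documents: Optional[Dict[int, Tuple[Optional[int], Optional[int]]]] = None


def relevant_spans(entry: RetrievedEntrySketch) -> List[str]:
    """Return the relevant text of each flagged document.

    Assumption: keys index into `documents`, and (start, stop) are
    character offsets, with None meaning an open bound.
    """
    if entry.relevant_documents is None:
        return []
    return [
        entry.documents[ix][start:stop]
        for ix, (start, stop) in sorted(entry.relevant_documents.items())
    ]


entry = RetrievedEntrySketch(
    documents=["Paris is the capital of France.", "Bananas are yellow."],
    relevant_documents={0: (0, 5)},
)
spans = relevant_spans(entry)
```

Because the relevance mapping is optional, code that consumes such entries should handle the `None` case explicitly, as `relevant_spans` does.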
Conversation structures:
- class datamaestro_ir.data.conversation.base.ConversationEntry
Bases: TypedDict
- class datamaestro_ir.data.conversation.base.ConversationNode
Bases: object
- abstractmethod entry() → ConversationEntry
The current conversation entry
- abstractmethod history() → Sequence[ConversationEntry]
Preceding conversation entries, ordered from most recent to oldest
- class datamaestro_ir.data.conversation.base.ConversationTree
Bases: ABC
Represents a conversation tree
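The ordering contract of `history()` (most recent entry first) can be illustrated with a small sketch. The classes below are stand-ins written for this example, not the library's implementation: `NodeSketch` copies the two documented abstract methods, while `LinearNode`, its parent link, and the plain-dict entry type are assumptions.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Sequence

# Stand-in for ConversationEntry (a TypedDict in the library); its keys are
# not listed on this page, so a plain dict is used here.
Entry = Dict[str, str]


class NodeSketch(ABC):
    """Mirrors the two abstract methods documented for ConversationNode."""

    @abstractmethod
    def entry(self) -> Entry:
        """The current conversation entry."""

    @abstractmethod
    def history(self) -> Sequence[Entry]:
        """Preceding entries, most recent first."""


class LinearNode(NodeSketch):
    """Hypothetical node that keeps a link to the previous turn."""

    def __init__(self, entry: Entry, parent: Optional["LinearNode"] = None):
        self._entry = entry
        self._parent = parent

    def entry(self) -> Entry:
        return self._entry

    def history(self) -> List[Entry]:
        # Walk back through the parents: the first entry appended is the
        # most recent preceding turn, the last one is the oldest.
        entries: List[Entry] = []
        node = self._parent
        while node is not None:
            entries.append(node.entry())
            node = node._parent
        return entries


first = LinearNode({"text": "Who wrote Dune?"})
second = LinearNode({"text": "Frank Herbert."}, parent=first)
third = LinearNode({"text": "When was it published?"}, parent=second)
history_texts = [e["text"] for e in third.history()]
```

A consumer that needs chronological order would reverse the returned sequence.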
Conversational IR
- XPM Config datamaestro_ir.data.conversation.base.ConversationUserTopics(*, id, conversations)
Bases: Topics
Extract user topics from conversations
- id: str
The unique (sub-)dataset ID
- conversations: datamaestro_ir.data.conversation.base.ConversationDataset
Contextual Query Reformulation
Base class for conversation datasets:
- XPM Config datamaestro_ir.data.conversation.base.ConversationDataset(*, id)
Bases: Base, ABC
A dataset made of conversations
- id: str
The unique (sub-)dataset ID
CANARD Dataset
- XPM Config datamaestro_ir.data.conversation.canard.CanardDataset(*, id, path)
Bases: ConversationDataset, File
A dataset in the CANARD JSON format
The CANARD dataset is composed of
- id: str
The unique (sub-)dataset ID
- path: path
The path of the file
OrConvQA Dataset
- XPM Config datamaestro_ir.data.conversation.orconvqa.OrConvQADataset(*, id, path)
Bases: ConversationDataset, File
- id: str
The unique (sub-)dataset ID
- path: path
The path of the file
QReCC Dataset
- XPM Config datamaestro_ir.data.conversation.qrecc.QReCCDataset(*, id, path)
Bases: ConversationDataset, File
- id: str
The unique (sub-)dataset ID
- path: path
The path of the file
iKAT Dataset
- XPM Config datamaestro_ir.data.conversation.ikat.IkatConversations(*, id, path)
Bases: ConversationDataset, File
A dataset containing conversations from the IKAT project
- id: str
The unique (sub-)dataset ID
- path: path
The path of the file
CaST Dataset
- XPM Config datamaestro_ir.data.conversation.cast.CastConversations(*, id, path, year)
Bases: ConversationDataset, File
A dataset containing TREC CaST conversations (2019-2022).
Parses the official CaST topic JSON files and produces conversation trees compatible with the ConversationUserTopics extractor.
JSON format:
[{"number": N, "title": "...", "turn": [{"number": N, "raw_utterance": "...", ...}]}]
- id: str
The unique (sub-)dataset ID
- path: path
The path of the file
- year: int
CaST year (2019, 2020, 2021, or 2022)
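The JSON layout given for CastConversations can be read with the standard library alone. The sketch below is not the parser used by CastConversations: the sample data is invented for illustration, `iter_turns` is a hypothetical helper, and building turn identifiers as `<topic number>_<turn number>` is an assumption made for this example.

```python
import json
from typing import Any, Dict, Iterator, List, Tuple

# Invented sample following the documented layout:
# a list of topics, each with a "number", a "title", and a list of "turn"s.
SAMPLE = """
[
  {"number": 1, "title": "Example topic",
   "turn": [
     {"number": 1, "raw_utterance": "What is throat cancer?"},
     {"number": 2, "raw_utterance": "Is it treatable?"}
   ]}
]
"""


def iter_turns(topics: List[Dict[str, Any]]) -> Iterator[Tuple[str, str]]:
    """Yield (turn_id, raw_utterance) pairs in file order.

    Assumption: a turn is identified by "<topic number>_<turn number>".
    """
    for topic in topics:
        for turn in topic["turn"]:
            turn_id = f'{topic["number"]}_{turn["number"]}'
            yield turn_id, turn["raw_utterance"]


turns = list(iter_turns(json.loads(SAMPLE)))
```

Only the keys named in the format line are accessed here; the `...` in that line indicates that real topic files carry further per-turn fields, which this sketch ignores.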