SIGIR 2025, Padua, 13-18 July

Conversational Search: Towards Personalization and Evaluation

Zahra Abbasiantaeb - University of Amsterdam

DOI: https://doi.org/10.1145/3726302.3730126

Conversational information seeking (CIS) systems aim to understand a user's evolving information needs in the context of a conversation and respond effectively. These systems are especially valuable for users who may struggle with traditional interfaces, offering a more natural and accessible mode of interaction. To ensure that responses are grounded and accurate, existing methods break the task into several subtasks, namely dialogue context modeling, retrieval, and answer generation. The user's information need is often represented by either a single rewritten query or a single representation in the query embedding space. This leads to several limitations, especially when the information need cannot be answered using a single passage and requires complex reasoning over multiple facts from different sources. We address this limitation by proposing the MQ4CS model, which breaks the user's information need into multiple queries covering different aspects. In MQ4CS, retrieval is performed for each query, and rank list fusion is applied over the lists of documents retrieved for the individual queries.
Evaluation of responses generated by Retrieval-Augmented Generation (RAG) systems remains an open problem. Response quality is mainly assessed using surface-based QA metrics, human evaluation, or by instructing Large Language Models (LLMs). Since RAG-generated responses integrate both retrieved documents and the LLM's internal knowledge, traditional surface-based QA metrics are not effective for assessment. Furthermore, existing RAG benchmarks address more complex information needs than ad-hoc retrieval and QA, highlighting the necessity of measuring the completeness of generated responses. To better evaluate completeness and correctness, we propose a nugget-based evaluation pipeline called CONE-RAG that measures the precision and recall of key information nuggets in generated answers.
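The two ideas above, fusing per-query rank lists and scoring an answer by nugget precision and recall, can be illustrated with a minimal sketch. This is not the MQ4CS or CONE-RAG implementation. The abstract does not say which fusion method is used, so reciprocal rank fusion (RRF) with the commonly used constant k=60 is swapped in as an assumption. Nugget matching is likewise assumed to be exact set membership, which a real pipeline would likely replace with a semantic matcher. The function names `rrf_fuse` and `nugget_precision_recall` are hypothetical.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60):
    """Fuse ranked lists of document ids with reciprocal rank fusion.

    Assumption: RRF stands in for the unspecified fusion method.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks start at 1; each list contributes 1 / (k + rank) per document.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


def nugget_precision_recall(answer_nuggets, gold_nuggets):
    """Precision and recall of answer nuggets against gold nuggets.

    Assumption: nuggets match only when they are identical strings.
    """
    answer, gold = set(answer_nuggets), set(gold_nuggets)
    matched = answer & gold
    precision = len(matched) / len(answer) if answer else 0.0
    recall = len(matched) / len(gold) if gold else 0.0
    return precision, recall


# One ranked list per generated query (toy document ids).
fused = rrf_fuse([["d1", "d2", "d3"], ["d2", "d4"], ["d2", "d1"]])
precision, recall = nugget_precision_recall(["a", "b", "c"], ["a", "b", "d", "e"])
```

In this toy run, a document retrieved by several queries rises to the top of the fused list, and the answer is penalized on recall for the gold nuggets it misses and on precision for the nugget that has no gold counterpart.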
Personalization is an emerging key challenge in Conversational Search (CS). A personalized CS system must adapt its response based on the user's personal information and search history. Given the same question, a personalized CS system must respond differently to different users. To facilitate research on the development and evaluation of personalized CS systems, I have co-organized the iKAT track at TREC, where we released the iKAT 2023 and 2024 datasets.

Presented by:

Zahra Abbasiantaeb

University of Amsterdam
