Tutorials

SIGIR 2025 will feature a rich program of 14 tutorials offering in-depth introductions, advanced techniques, and hands-on explorations across topics in Information Retrieval, Recommendation, and Retrieval-Augmented Generation.




Half Day Tutorials


Conversational Search: From Fundamentals to Frontiers in the LLM Era

Organizers: Fengran Mo, Chuan Meng, Mohammad Aliannejadi and Jian-Yun Nie.

Overview: Conversational search enables multi-turn interactions between users and systems to fulfill users' complex information needs. During this interaction, the system should understand the user's search intent within the conversational context and return relevant information through a flexible, dialogue-based interface. Recent powerful large language models (LLMs), with capabilities in instruction following, content generation, and reasoning, have attracted significant attention and driven rapid advances, providing new opportunities and challenges for building intelligent conversational search systems. This tutorial connects the fundamentals of conversational search with the emerging topics reshaped by LLMs. It is designed for students, researchers, and practitioners from both academia and industry. Participants will gain a comprehensive understanding of both the core principles and cutting-edge developments driven by LLMs in conversational search, equipping them with the knowledge needed to contribute to the development of next-generation conversational search systems.



Efficient In-Memory Inverted Indexes: Theory and Practice

Organizers: Joel Mackenzie, Sean MacAvaney, Antonio Mallia and Michal Siedlaczek.

Overview: Inverted indexes are the backbone of most large-scale information retrieval systems. Although conceptually simple, high-performance inverted indexes require a deep understanding of low-level system optimizations, compression techniques, and traversal strategies. With the widespread adoption of in-memory search engines, the rise of learned sparse retrieval (LSR), and the increasing complexity of ranking pipelines, the design space for efficient indexing and retrieval systems has expanded significantly. This tutorial addresses a critical knowledge gap between textbook-style explanations and the advanced techniques required for efficient and optimized retrieval. It aims to equip researchers and practitioners with a comprehensive understanding of how modern in-memory search systems are designed, built, and optimized for high-performance retrieval across large-scale document collections. As part of this tutorial, participants will learn key theoretical concepts and how to apply them in practice using the open-source PISA search engine. They will work through a series of examples illustrating how to build and query an index, and how to compare performance and relevance across parameters such as compression techniques and retrieval algorithms. The knowledge and skills gained from this tutorial will serve as a basis for extending PISA with new state-of-the-art IR techniques and evaluating them in an academic setting.

Website: https://pisa-engine.github.io/sigir-2025.html
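As a warm-up for the concepts above, the sketch below builds a toy in-memory inverted index in pure Python. It is purely illustrative: it uses naive term-at-a-time accumulation and no compression, and does not reflect PISA's actual API; the tutorial covers the real document-at-a-time traversal and compression machinery.

```python
from collections import Counter, defaultdict

def build_index(docs):
    """Toy in-memory inverted index: term -> postings list of (doc_id, term_frequency)."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term, tf in sorted(Counter(text.lower().split()).items()):
            index[term].append((doc_id, tf))
    return index

def query_or(index, query, k=10):
    """Disjunctive (OR) query: accumulate a simple term-frequency score per
    document, term-at-a-time, and return the top-k documents."""
    scores = Counter()
    for term in query.lower().split():
        for doc_id, tf in index.get(term, []):
            scores[doc_id] += tf
    return scores.most_common(k)

docs = ["the quick brown fox", "the lazy dog", "quick quick dog"]
index = build_index(docs)
print(query_or(index, "quick dog"))  # doc 2 matches both query terms
```

Real engines replace the term-frequency score with a model such as BM25 and traverse postings document-at-a-time with dynamic pruning (e.g., WAND), which is exactly the design space the tutorial explores.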



R2LLMs: Retrieval and Ranking with LLMs

Organizers: Guido Zuccon, Shengyao Zhuang and Xueguang Ma.

Overview: Generative Large Language Models (LLMs) like GPT, Gemini, and Llama are transforming Information Retrieval, enabling new and more effective approaches to document retrieval and ranking. The switch from the previous generation of pre-trained language model backbones (e.g., BERT, T5) to new generative LLM backbones has required the field to adapt its training processes; it has also provided unprecedented capabilities and opportunities, stimulating research into zero-shot approaches, reasoning approaches, reinforcement-learning-based training, and multilingual and multimodal applications. This tutorial will provide a structured overview of LLM-based retrievers and rankers, covering fundamental architectures, training paradigms, real-world deployment considerations, and open challenges and research directions.

Website: https://ielab.io/tutorials/r2llms.html
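To make the zero-shot reranking idea concrete, here is a minimal sketch of how a listwise permutation prompt might be assembled for an LLM and its response parsed. The prompt wording and the `[i] > [j]` identifier syntax are illustrative assumptions, not the format of any specific system covered in the tutorial.

```python
import re

def listwise_rerank_prompt(query, passages):
    """Build a listwise reranking prompt asking an LLM to output a ranked
    permutation of passage identifiers (format is a hypothetical example)."""
    lines = ["Rank the following passages by relevance to the query.",
             f"Query: {query}", ""]
    lines += [f"[{i}] {p}" for i, p in enumerate(passages, 1)]
    lines += ["", "Answer with identifiers in order, e.g. [2] > [1] > [3]."]
    return "\n".join(lines)

def parse_permutation(response, n):
    """Extract a valid permutation from a response like '[2] > [1]':
    drop duplicates and out-of-range ids, append any ids the model omitted."""
    order = []
    for m in re.findall(r"\[(\d+)\]", response):
        i = int(m)
        if 1 <= i <= n and i not in order:
            order.append(i)
    return order + [i for i in range(1, n + 1) if i not in order]

prompt = listwise_rerank_prompt("what do foxes eat",
                                ["Dogs are loyal pets.", "Foxes eat small mammals."])
print(parse_permutation("[2] > [1]", 2))  # the model preferred passage 2
```

The defensive parsing step matters in practice: generative models do not always emit well-formed permutations, so rerankers typically repair or complete the output before applying it.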



Retrieval-Enhanced Machine Learning: Synthesis and Opportunities

Organizers: Fernando Diaz, Andrew Drozdov, To Eun Kim, Alireza Salemi and Hamed Zamani.

Overview: Retrieval-enhanced machine learning (REML) refers to the use of information retrieval methods to support reasoning and inference in machine learning tasks. Although relatively recent, these approaches can substantially improve model performance, including generalization, knowledge grounding, scalability, freshness, attribution, interpretability, and on-device learning. To date, despite being influenced by work in the information retrieval community, REML research has predominantly been presented at natural language processing (NLP) conferences. Our tutorial addresses this disconnect by introducing core REML concepts and synthesizing the literature from various domains in machine learning (ML), including but not limited to NLP. What is unique to our approach is the use of consistent notation, providing researchers with a unified and expandable framework.

Website: https://retrieval-enhanced-ml.github.io/sigir-2025.html



Query Understanding in LLM-based Conversational Information Seeking

Organizers: Yifei Yuan, Zahra Abbasiantaeb, Mohammad Aliannejadi and Yang Deng.

Overview: Query understanding in Conversational Information Seeking (CIS) involves accurately interpreting user intent through context-aware interactions. This includes resolving ambiguities, refining queries, and adapting to evolving information needs. Large Language Models (LLMs) enhance this process by interpreting nuanced language and adapting dynamically, improving the relevance and precision of search results in real time. In this tutorial, we explore advanced techniques to enhance query understanding in LLM-based CIS systems. We delve into LLM-driven methods for developing robust evaluation metrics to assess query understanding quality in multi-turn interactions, strategies for building more interactive systems, and applications like proactive query management and query reformulation. We also discuss key challenges in integrating LLMs for query understanding in conversational search systems and outline future research directions. Our goal is to deepen the audience's understanding of LLM-based conversational query understanding and inspire discussions to drive ongoing advancements in this field.



Navigating Large Language Models for Recommendation: From Architecture to Learning Paradigms and Deployment

Organizers: Xinyu Lin, Keqin Bao, Jizhi Zhang, Yang Zhang, Wenjie Wang and Fuli Feng.

Overview: Large Language Models (LLMs) are reshaping the landscape of recommender systems, giving rise to the emerging field of LLM4Rec, which attracts interest from both academia and industry. Unlike earlier approaches that simply borrowed model architectures or learning paradigms from language models, recent advances have led to a dedicated and evolving technical stack for LLM4Rec, spanning architecture design, pre-training and post-training strategies, inference techniques, and real-world deployment. This tutorial offers a systematic and in-depth overview of LLM4Rec through the lens of this technical stack. We will examine how LLMs are being adapted to recommendation tasks across different stages, empowering them with capabilities such as reasoning, planning, and in-context learning. Moreover, we will highlight practical challenges including complex user modeling, trustworthiness, and evaluation. Distilling insights from recent research and identifying open problems, this tutorial aims to equip participants with a comprehensive understanding of LLM4Rec and inspire continued innovation in this rapidly evolving field.

Website: https://generative-rec.github.io/tutorial-sigir25/



Theory and Toolkits for User Simulation in the Era of Generative AI: User Modeling, Synthetic Data Generation, and System Evaluation

Organizers: Krisztian Balog, Nolwenn Bernard, Saber Zerhoudi and Chengxiang Zhai.

Overview: Interactive AI systems, including search engines, recommender systems, conversational agents, and generative AI applications, are increasingly central to user experiences. However, rigorously evaluating their performance, training them effectively with interaction data, and modeling user behavior for personalization remain significant challenges, often difficult to address reproducibly and at scale. User simulation, which employs intelligent agents to mimic human interaction patterns, offers a powerful and versatile methodology to tackle these interconnected issues. This half-day tutorial provides a comprehensive overview of modern user simulation techniques for interactive AI systems. We will explore the theoretical foundations and practical applications of simulation for system evaluation, algorithm training, and user modeling, emphasizing the crucial connections between these uses. The tutorial covers key simulation methodologies, with a particular focus on recent advancements leveraging large language models, discussing both the opportunities they present and the open challenges they entail. Crucially, we will also provide practical guidance, highlighting relevant toolkits, libraries, and datasets available to researchers and practitioners.

Website: http://usersim.ai/sigir2025-tutorial



Psychological Aspects in Retrieval and Recommendation

Organizers: Markus Schedl, Elisabeth Lex and Marko Tkalcic.

Overview: Psychological processes play a critical role in shaping users' interactions with information retrieval (IR) and recommender systems (RS). Therefore, understanding human cognition, decision-making, and emotions is vital to enable user-centric retrieval and recommendation systems. Conversely, understanding whether these aspects are also present in the systems themselves (e.g., in training data, ranking models, or outputs), or even injecting them on purpose, can inform the development of psychology-inspired systems. The tutorial provides its attendees with an introduction to psychological concepts that are important in the ecosystem of search, retrieval, and recommendation. More precisely, we introduce cognitive architectures as computational frameworks that model human cognitive processes such as memory, learning, attention, and decision-making. Leveraging these architectures can improve IR and RS by making them more adaptive, interpretable, and user-centric. Subsequently, we discuss a mixture of well-studied and lesser-studied cognitive biases in the context of IR and RS, pertaining to both the system (training, model, and inference) and the user-system interactions. Afterwards, we introduce models of personality and affect and discuss how they can be used in IR and RS. Finally, we discuss opportunities and challenges of psychology-informed IR and RS. This interdisciplinary tutorial requires intermediate expertise in IR and RS but assumes no prior knowledge of psychology.

Website: https://github.com/aisocietylab/Psy-IR-RecSys-SIGIR25



Dynamic and Parametric Retrieval-Augmented Generation

Organizers: Weihang Su, Qingyao Ai, Jingtao Zhan, Qian Dong and Yiqun Liu.

Overview: Retrieval-Augmented Generation (RAG) has become a foundational paradigm for enhancing large language models (LLMs) with external knowledge, playing an important role in modern information retrieval and knowledge-intensive tasks. Standard RAG systems typically adopt a static retrieve-then-generate pipeline and rely on in-context knowledge injection, which can be suboptimal for complex tasks that require multihop reasoning, adaptive information access, and deeper integration of external knowledge. Motivated by these limitations, the research community has moved beyond static retrieval and in-context knowledge injection. Among the emerging directions, this tutorial delves into two rapidly growing and complementary research directions on RAG: Dynamic RAG and Parametric RAG. Dynamic RAG explores when and what to retrieve during the LLM's generation process, enabling real-time adaptation to its evolving information needs. Parametric RAG rethinks how the retrieved knowledge should be incorporated, moving from input-level to parameter-level knowledge injection for improved efficiency and effectiveness. This tutorial offers a comprehensive overview of recent advances in both directions. It also shares theoretical foundations and practical insights to support and inspire further research in RAG.

Website: https://sites.google.com/view/sigir2025-tutorial-dprag/home-page
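As a toy illustration of the "when to retrieve" question in Dynamic RAG, the sketch below triggers retrieval whenever the next-token distribution is high-entropy, a common uncertainty signal. The generation steps, threshold, and retriever are stand-ins for illustration, not any specific method from the tutorial.

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a next-token probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def generate_with_dynamic_retrieval(steps, retrieve, threshold=1.5):
    """Toy generation loop: each step yields (token, next-token probabilities).
    When the model's uncertainty exceeds the threshold, retrieve with the
    text so far and (conceptually) condition the remaining generation on it."""
    output, retrieved = [], []
    for token, probs in steps:
        if entropy(probs) > threshold:
            retrieved.append(retrieve(" ".join(output + [token])))
        output.append(token)
    return output, retrieved

# Simulated steps: a confident token, then an uncertain one that triggers retrieval.
steps = [("Paris", [0.95, 0.05]),
         ("hosted", [0.3, 0.25, 0.25, 0.2])]
tokens, ctx = generate_with_dynamic_retrieval(steps, retrieve=lambda q: f"docs for: {q}")
print(tokens, ctx)
```

In a real system the loop would run inside the decoder, and "what to retrieve" would itself be formulated from the model's evolving state rather than from the raw prefix, which is exactly the design space the tutorial maps out.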



Fairness in Information Retrieval from an Economic Perspective

Organizers: Chen Xu, Clara Rus, Yuanna Liu, Marleen de Jonge, Jun Xu and Maarten de Rijke.

Overview: Recently, fairness-aware information retrieval (IR) systems have received much attention, and numerous fairness metrics and algorithms have been proposed. The complexity of fairness and of IR systems makes it challenging to provide a systematic summary of the progress that has been made. This complexity calls for a more structured framework to navigate future fairness-aware IR research directions. The field of economics has long explored fairness, offering a strong theoretical and empirical foundation. Its system-oriented perspective enables the integration of IR fairness into a broader framework that considers societal and intertemporal trade-offs. In this tutorial, we first highlight that IR systems can be understood as a specialized economic market. Then, we re-organize fairness algorithms along three key economic dimensions: macro vs. micro, demand vs. supply, and short-term vs. long-term. This perspective lets us view most fairness categories in IR through an economic lens. Finally, we illustrate how this economic framework can be applied to various real-world IR applications and demonstrate its benefits in industrial scenarios. Unlike other fairness-aware tutorials, ours not only provides a new and clear perspective to re-frame fairness-aware IR but also inspires the use of economic tools to solve fairness problems in IR. We hope this tutorial provides a fresh, broad perspective on fairness in IR, highlighting open problems and future research directions.

Website: https://economic-fairness-ir.github.io/



Unveiling Knowledge Boundary of Large Language Models for Trustworthy Information Access

Organizers: Yang Deng, Moxin Li, Liang Pang, Wenxuan Zhang and Wai Lam.

Overview: Large Language Models (LLMs) have emerged as powerful tools for generating content and facilitating information seeking across diverse domains. While their integration into conversational systems opens new avenues for interactive information-seeking experiences, their effectiveness is constrained by their knowledge boundaries, i.e., the limits of what they know and their ability to provide reliable, truthful, and contextually appropriate information. Understanding these boundaries is essential for maximizing the utility of LLMs for real-time information seeking while ensuring their reliability and trustworthiness. In this tutorial, we will explore the taxonomy of knowledge boundaries in LLMs, addressing their handling of uncertainty, response calibration, and mitigation of unintended behaviors that can arise during interaction with users. We will also present advanced techniques for optimizing LLM behavior in generative information-seeking tasks, ensuring that models align with user expectations of accuracy and transparency. Attendees will gain insights into research trends and practical methods for enhancing the reliability and utility of LLMs for trustworthy information access.



Neural Lexical Search with Learned Sparse Retrieval

Organizers: Andrew Yates, Carlos Lassance, Cosimo Rulli, Eugene Yang, Sean MacAvaney, Siddharth Singh, Thong Nguyen and Yibin Lei.

Overview: Learned Sparse Retrieval (LSR) techniques use neural machinery to represent queries and documents as learned bags of words. In contrast with other neural retrieval techniques, such as generative retrieval and dense retrieval, LSR has been shown to be a remarkably robust, transferable, and efficient family of methods for retrieving high-quality search results. This half-day tutorial aims to provide an extensive overview of LSR, ranging from its fundamentals to the latest emerging techniques. By the end of the tutorial, attendees will be familiar with the important design decisions of an LSR system, know how to apply them to text and other modalities, and understand the latest techniques for retrieving with them efficiently.

Website: https://lsr-tutorial.github.io/
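The bag-of-words nature of LSR means that, once a model has assigned term weights, retrieval reduces to dot products evaluated over an ordinary inverted index. The sketch below illustrates this with hand-picked weights standing in for a learned encoder's output; it is a conceptual aid, not part of the tutorial's materials.

```python
from collections import defaultdict

def index_sparse_docs(doc_vectors):
    """Inverted index over learned sparse vectors: term -> [(doc_id, weight)].
    In LSR, the weights come from a neural encoder rather than term statistics."""
    index = defaultdict(list)
    for doc_id, vec in enumerate(doc_vectors):
        for term, w in vec.items():
            index[term].append((doc_id, w))
    return index

def score(index, query_vec):
    """Score each document by the dot product of query and document term
    weights, accumulated by walking the postings of the query's terms."""
    scores = defaultdict(float)
    for term, qw in query_vec.items():
        for doc_id, dw in index.get(term, []):
            scores[doc_id] += qw * dw
    return sorted(scores.items(), key=lambda kv: -kv[1])

docs = [{"fox": 1.2, "mammal": 0.4}, {"dog": 0.9, "loyal": 0.7}]
index = index_sparse_docs(docs)
print(score(index, {"fox": 1.0, "animal": 0.3}))  # only doc 0 shares a term
```

Because scoring has the same shape as classic lexical ranking, the efficient traversal and compression techniques built for inverted indexes carry over to LSR, which is one reason the family is so efficient in practice.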



Long Context vs. RAG: Strategies for Processing Long Documents in LLMs

Organizers: Xinze Li, Yushi Bai, Bowen Jin, Fengbin Zhu, Liangming Pan and Yixin Cao.

Overview: Large Language Models (LLMs) excel at zero- and few-shot learning but are restricted by the length of context windows when processing long documents. Two strategies have emerged to overcome this limitation: (1) Long Context (LC) methods, which extend or compress transformer architectures to input more text; and (2) Retrieval-Augmented Generation (RAG), which integrates external knowledge sources via embedding- or index-based retrieval. This half-day tutorial offers a unified, beginner-friendly introduction to both approaches. We first review transformer fundamentals—positional encoding, attention complexity, and common LC techniques. Next, we explain the classic RAG pipeline and recent RAG strategies, alongside evaluation metrics and benchmarks. We also analyze recent empirical studies to highlight strengths, limitations, and trade-offs of LC vs. RAG in terms of scalability, computational cost, and retrieval effectiveness. We conclude with best practices for real-world deployments, emerging hybrid architectures, and open research directions, equipping IR researchers and practitioners with actionable guidelines for processing long documents in LLMs.

Website: https://sites.google.com/view/sigir25-lc-vs-rag/
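The classic RAG pipeline described above can be sketched end-to-end in a few lines: chunk the document, embed the chunks, retrieve the most similar ones, and assemble the prompt. The embedding here is a stand-in bag-of-words counter and the chunking is deliberately naive; a real deployment would use a trained encoder and structure-aware splitting.

```python
import math
from collections import Counter

def chunk(text, size=8):
    """Naive fixed-size word chunking (production systems split on structure or semantics)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Stand-in embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(w * b[t] for t, w in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rag_prompt(question, document, k=1):
    """Retrieve-then-generate: rank chunks by similarity to the question,
    then place the top-k chunks in the prompt as grounding context."""
    q = embed(question)
    ranked = sorted(chunk(document), key=lambda c: cosine(q, embed(c)), reverse=True)
    return "Context:\n" + "\n".join(ranked[:k]) + f"\n\nQuestion: {question}\nAnswer:"

doc = ("penguins live in antarctica and eat fish daily "
       "foxes hunt rabbits in forests at night time")
print(rag_prompt("where do penguins live", doc))
```

The LC alternative skips the retrieval step entirely and feeds the whole document into an extended context window; the trade-offs between the two, in cost, scalability, and effectiveness, are the core subject of the tutorial.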



Full Day Tutorials


Information Retrieval in Finance: Industry and Academic Perspectives on Innovation

Organizers: Chung-Chi Chen, Yongjae Lee, Chanyeol Choi, Richard McCreadie, Javier Sanz-Cruzado and Alejandro Lopez-Lira.

Overview: Information retrieval (IR) plays a critical role in financial decision-making across investment research, trading, risk management, and reporting. With the rise of large language models (LLMs), IR systems have evolved to support more natural, context-aware workflows. In this tutorial, we survey recent advances in applying IR and LLM technologies in finance, covering agent-based simulations, investor recommender systems, retrieval-augmented research management, and LLM-driven portfolio construction. We highlight practical challenges and propose future research directions at the intersection of IR, LLMs, and financial innovation.

Website: https://sites.google.com/view/irfin/