
Tsinghua University wins the Test of Time Award, Shandong University receives an honorable mention: SIGIR 2024 awards announced

2024-07-18


Machine Heart Report

Editor: Xiaozhou, Chen Chen

Tsinghua University has achieved outstanding results.

The 47th ACM SIGIR Conference will be held in Washington, D.C., USA from July 14 to 18, 2024. The conference is the top academic conference in the field of information retrieval.

Just now, the conference announced the Best Paper Award, Best Paper Runner-up, Best Paper Honorable Mention Award, and the Test of Time Award.

Among them, a team from Tsinghua University, the Gaoling School of Artificial Intelligence at Renmin University of China, and Xiaohongshu won the Best Paper Award; researchers from the University of Glasgow and the University of Pisa won the runner-up; the Best Paper Honorable Mention Award went to researchers from Shandong University (Qingdao), Leiden University, and the University of Amsterdam; and the Test of Time Award went to researchers from Tsinghua University and the University of California, Santa Cruz.

Next, let’s take a look at the specific content of the winning papers.

Best Paper



Paper: Scaling Laws for Dense Retrieval

Paper authors: Yan Fang, Jingtao Zhan, Qingyao Ai, Jiaxin Mao, Weihang Su, Jia Chen, Yiqun Liu

Institutions: Tsinghua University, Gaoling School of Artificial Intelligence at Renmin University of China, Xiaohongshu

Paper link: https://dl.acm.org/doi/abs/10.1145/3626772.3657743

Paper introduction: Scaling laws have been observed across a wide range of tasks, especially in language generation. They show that the performance of large language models follows predictable patterns with respect to model and dataset size, which helps design training strategies effectively and efficiently, especially as large-scale training becomes increasingly resource-intensive. In dense retrieval, however, scaling laws have not yet been fully explored.

The study explored how scaling affects the performance of dense retrieval models. Specifically, the research team implemented dense retrieval models with different numbers of parameters and trained them on different amounts of annotated data. The study used contrastive entropy as the evaluation metric; unlike discrete ranking metrics, it is continuous and can therefore reflect model performance more precisely.



Experimental results show that the performance of dense retrieval models follows a precise power-law scaling with both model size and the amount of annotated data.
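The reported power-law relationship can be illustrated with a small curve-fitting sketch. The model sizes and contrastive-entropy values below are made-up illustrative numbers, not the paper's measurements; the point is only that a power law L = a·N^(−b) becomes a straight line in log-log space, so an ordinary least-squares fit recovers its coefficients:

```python
import numpy as np

# Hypothetical (model size in parameters, contrastive entropy) pairs,
# purely illustrative -- NOT the paper's actual measurements.
sizes = np.array([1e7, 3e7, 1e8, 3e8, 1e9])
losses = np.array([0.52, 0.41, 0.33, 0.26, 0.21])

# A power law L = a * N^(-b) is linear in log-log space:
# log L = log a - b * log N, so fit it with ordinary least squares.
slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
a, b = np.exp(intercept), -slope

def predict(n):
    """Predicted contrastive entropy for a model with n parameters."""
    return a * n ** (-b)
```

Once such a fit holds, it can be extrapolated to predict the loss of a larger model before training it, which is what makes scaling laws useful for budget planning.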

In addition, the study showed that these scaling laws can help optimize the training process, for example by solving resource allocation problems under a budget constraint.



This study significantly advances our understanding of the scaling behavior of dense retrieval models and provides meaningful guidance for future research.

Best Paper Runner-Up

The runner-up for this year's ACM SIGIR Best Paper Award went to the paper "A Reproducibility Study of PLAID". The authors of the paper include Sean MacAvaney from the University of Glasgow and Nicola Tonellotto from the University of Pisa.



Paper address: https://arxiv.org/pdf/2404.14989

Abstract: ColBERTv2's PLAID algorithm uses clustered term representations to retrieve and progressively prune documents to obtain final document scores. This paper reproduces the original work and fills in details missing from it. By studying the parameters introduced by PLAID, the researchers found that its Pareto frontier is formed by a balance among three parameters; deviating from the recommended settings can significantly increase latency without necessarily improving effectiveness.

Building on this finding, we compare PLAID to an important baseline missing from the paper: a re-ranking vocabulary system. We find that applying ColBERTv2 as a re-ranker on top of an initial pool of BM25 results provides a better efficiency-effectiveness trade-off in a low-latency setting. This work highlights the importance of carefully choosing relevant baselines when evaluating the efficiency of retrieval engines.
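The re-ranking baseline the authors advocate can be sketched generically: a first-stage retriever (e.g. BM25) supplies a candidate pool, and a ColBERT-style late-interaction scorer re-orders it. The function names and toy token embeddings below are illustrative assumptions, not the PLAID or ColBERTv2 codebase:

```python
import numpy as np

def maxsim_score(query_vecs, doc_vecs):
    """ColBERT-style late interaction: for each query token vector,
    take its maximum similarity over all document token vectors, then sum."""
    sims = query_vecs @ doc_vecs.T          # (n_query_tokens, n_doc_tokens)
    return float(sims.max(axis=1).sum())

def rerank(query_vecs, candidates, top_k=10):
    """Re-rank an initial candidate pool (e.g. BM25 results) by MaxSim.
    `candidates` maps doc id -> (n_doc_tokens, dim) token embedding matrix."""
    scored = [(doc_id, maxsim_score(query_vecs, d))
              for doc_id, d in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

In the low-latency setting the paper describes, only the small candidate pool is scored with the expensive late-interaction model, which is where the favorable efficiency-effectiveness trade-off comes from.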

Best Paper Honorable Mention Award

The Best Paper Honorable Mention Award of this conference went to researchers from Shandong University (Qingdao), Leiden University, and the University of Amsterdam. The winning paper is "Generative Retrieval as Multi-Vector Dense Retrieval".



Authors: Wu Shiguang, Wei Wenda, Zhang Mengqi, Chen Zhumin, Ma Jun, Ren Zhaochun, Maarten de Rijke, Ren Pengjie

Paper address: https://arxiv.org/pdf/2404.00684

Abstract: This paper demonstrates that generative retrieval and multi-vector dense retrieval share the same framework for measuring the relevance of a document to a query. Specifically, the authors examine the attention layer and prediction head of generative retrieval, revealing that generative retrieval can be understood as a special case of multi-vector dense retrieval: both methods compute relevance as the sum of products of query vectors and document vectors, weighted by an alignment matrix.

The researchers then explored how this framework can be applied to generative retrieval, using different strategies to compute document token vectors and alignment matrices. They also conducted experiments to verify their conclusions, showing that both paradigms exhibit commonality in term matching in their alignment matrices.

Test of Time Award

This year's ACM SIGIR Test of Time Award went to research on explainable recommendation published ten years ago at SIGIR 2014: the paper "Explicit Factor Models for Explainable Recommendation based on Phrase-level Sentiment Analysis".



Authors: Yongfeng Zhang, Guokun Lai, Min Zhang, Yi Zhang, Yiqun Liu, Shaoping Ma

Institutions: Tsinghua University, University of California, Santa Cruz

Paper link: https://www.cs.cmu.edu/~glai1/papers/yongfeng-guokun-sigir14.pdf

This study was the first to define the "explainable recommendation" problem and proposed a corresponding sentiment-analysis method to address it; it has since played a leading role in the field.

Abstract: Recommendation algorithms based on collaborative filtering (CF), such as latent factor model (LFM), perform well in terms of prediction accuracy. However, latent features make it difficult to explain recommendation results to users.

Fortunately, with the growing amount of online user reviews, the information available for training recommender systems is no longer limited to numerical star ratings or user/item features. By extracting users’ explicit opinions on various aspects of a product from reviews, we can gain a more detailed understanding of what users care about, which further reveals the possibility of making explainable recommendations.

This paper proposes EFM (Explicit Factor Model) to generate explainable recommendations while maintaining high prediction accuracy.

The researchers first extracted explicit product features and user opinions by performing phrase-level sentiment analysis on user reviews, and then generated recommendations and disrecommendations based on the specific product features a user is interested in, together with learned latent features. The model also produces intuitive feature-level explanations of why a given item is or is not recommended.
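As a toy illustration of feature-level explanation in the EFM spirit (not the actual model, which jointly factorizes the rating, user-feature, and item-feature matrices), one can score items by how well their per-feature quality matches a user's per-feature attention, then explain the pick by the top contributing feature. All numbers and feature names below are hypothetical:

```python
import numpy as np

features = ["battery", "screen", "price"]
X = np.array([0.9, 0.1, 0.6])        # hypothetical user attention per feature
Y = np.array([[0.8, 0.5, 0.2],       # item 0: quality on each feature
              [0.2, 0.9, 0.9]])      # item 1

# Score each item by attention-weighted feature quality, then explain
# the winning item via its single largest contribution.
scores = Y @ X
best = int(np.argmax(scores))
top_feature = features[int(np.argmax(X * Y[best]))]
```

Here the user who cares most about battery life is steered to item 0, and "battery" is surfaced as the reason, which is the kind of feature-level explanation the paper argues makes recommendations more persuasive.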

Offline experimental results on multiple real-world datasets show that the proposed framework outperforms competitive baseline algorithms in both rating prediction and top-K recommendation tasks. Online experiments show that detailed explanations make recommendations and non-recommendations more influential on users' purchasing behavior.

Young Scholar Award

The ACM SIGIR Young Scholar Award recognizes researchers who play an important role in information retrieval research, scholarly community building, and the promotion of academic equity. It is given to young researchers who received their doctoral degrees within the past 7 years. Qingyao Ai, an assistant professor in the Department of Computer Science at Tsinghua University, and Xiang Wang, a professor and doctoral supervisor at the School of Cyberspace Security and the School of Big Data at the University of Science and Technology of China, won the SIGIR 2024 Young Scholar Award.

Ai Qingyao

Ai Qingyao is an assistant professor in the Department of Computer Science at Tsinghua University. His main research areas are information retrieval, machine learning, and natural language processing. His work focuses on the study and design of intelligent information retrieval systems, including information representation learning, ranking optimization theory, and applications of large language models to Internet search, recommendation, and intelligent justice.

Wang Xiang

Wang Xiang is a professor and doctoral supervisor at the School of Cyberspace Security and the School of Big Data at the University of Science and Technology of China. His research interests include information retrieval, data mining, and trustworthy and explainable artificial intelligence, especially recommendation systems, graph learning, and social media analysis.