
ACL 2024 Awards: Huazhong University of Science and Technology Takes a Best Paper Award for Oracle Bone Script Deciphering; GloVe Wins the Test of Time Award

2024-08-15


Machine Heart Report

Synced Editorial Department

Contributors had much to celebrate at this year's ACL conference.

The six-day ACL 2024 is being held in Bangkok, Thailand.



ACL is a top international conference in the field of computational linguistics and natural language processing, organized by the Association for Computational Linguistics and held annually. ACL has always ranked first in academic influence in the field of NLP, and it is also a CCF-A recommended conference.

This year's conference is the 62nd ACL, and it received more than 400 cutting-edge works in the field of NLP. Yesterday afternoon, the conference announced its paper awards: 7 best paper awards (two of them not yet public), 1 best theme paper award, and 35 outstanding paper awards.

The conference also awarded three Resource Award papers, three Social Impact Award papers, and two Test of Time Awards.

In addition, the Lifetime Achievement Award of this conference was presented to Ralph Grishman, professor of computer science at New York University.

The following is the specific award information.

Best Paper



Paper 1: Mission: Impossible Language Models

  • Authors: Julie Kallini, Isabel Papadimitriou, Richard Futrell, Kyle Mahowald, Christopher Potts
  • Institutions: Stanford University, University of California, Irvine, University of Texas at Austin
  • Paper link: https://arxiv.org/abs/2401.06416

Abstract: Chomsky and others have argued that large language models (LLMs) are equally capable of learning languages that are possible and impossible for humans to learn. However, little published experimental evidence supports this claim.

The study developed a set of synthetic languages of varying complexity, each designed by systematically altering English data with unnatural word orders and grammatical rules, with the aim of synthesizing languages that would be impossible for humans to learn.

The study ran extensive experiments evaluating the ability of the GPT-2 small model to learn these "impossible languages", carrying out the evaluations at different stages throughout training to compare each language's learning trajectory. The core finding is that GPT-2 struggles to learn "impossible languages" compared to English, which challenges the claims of Chomsky et al.

More importantly, the authors hope their approach will open up a fruitful line of inquiry in which different LLM architectures are tested on a variety of "impossible languages", shedding light on how LLMs can serve as tools for cognitive and typological investigation.
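To make the setup concrete, here is a minimal Python sketch of how an "impossible language" corpus might be derived from English text. The two rules shown are illustrative stand-ins for the paper's controlled transformations (which include reversals and deterministic shuffles), not its exact procedure.

```python
import random

def make_impossible(sentence: str, rule: str = "reverse", seed: int = 0) -> str:
    """Perturb an English sentence's word order to simulate an 'impossible' language.

    Illustrative only: the paper applies a fixed family of systematic
    transformations to a full English corpus, not these exact two rules.
    """
    words = sentence.split()
    if rule == "reverse":
        # Fully reverse the word order.
        words = words[::-1]
    elif rule == "shuffle":
        # Deterministic shuffle keyed by sentence length, so the "language"
        # is consistent rather than random noise.
        rng = random.Random(seed + len(words))
        rng.shuffle(words)
    return " ".join(words)

print(make_impossible("the cat sat on the mat", "reverse"))  # mat the on sat cat the
```

A model is then trained from scratch on the transformed corpus, and its learning curve is compared against a model trained on the untransformed English.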



Paper 2: Why are Sensitive Functions Hard for Transformers?

  • Authors: Michael Hahn, Mark Rofin
  • Institution: Saarland University
  • Paper link: https://arxiv.org/abs/2402.09963

Abstract: Experimental studies have identified a range of learnability biases and limitations of transformers, such as persistent difficulty in learning simple formal languages like PARITY, and a bias toward low-degree functions. However, theoretical understanding remains limited, with existing expressivity theories either overestimating or underestimating realistic learning capabilities.

The study demonstrates that under the transformer architecture, the loss landscape is constrained by input space sensitivity: transformers whose outputs are sensitive to many parts of the input string are located at isolated points in parameter space, leading to a low-sensitivity bias in generalization.

The study shows, theoretically and experimentally, that this theory unifies a broad range of empirical observations about transformer learning abilities and biases, such as the generalization bias toward low sensitivity and low degree, and the difficulty of length generalization on PARITY. This suggests that understanding transformers' inductive biases requires studying not only their in-principle expressive power but also their loss landscape.
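The notion of sensitivity here can be made concrete: the average sensitivity of a Boolean function counts how many single-bit flips change its output, averaged over all inputs. PARITY is maximally sensitive (every flip changes the answer), which is what makes it pathological under this theory. The sketch below uses standard textbook examples, not code from the paper.

```python
from itertools import product

def avg_sensitivity(f, n: int) -> float:
    """Average number of single-bit flips that change f's output, over all n-bit inputs."""
    total = 0
    for bits in product([0, 1], repeat=n):
        for i in range(n):
            flipped = list(bits)
            flipped[i] ^= 1  # flip bit i
            if f(bits) != f(tuple(flipped)):
                total += 1
    return total / 2 ** n

parity = lambda bits: sum(bits) % 2                    # output flips with every bit
majority = lambda bits: int(sum(bits) > len(bits) / 2)  # only near-threshold bits matter

print(avg_sensitivity(parity, 7))    # 7.0
print(avg_sensitivity(majority, 7))  # 2.1875
```

The theory predicts that high-sensitivity functions like PARITY sit at isolated points in a transformer's parameter space, while low-sensitivity functions like majority are far easier to reach by gradient descent.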



Paper 3: Deciphering Oracle Bone Language with Diffusion Models

  • Authors: Haisu Guan, Huanxin Yang, Xinyu Wang, Shengwei Han, etc.
  • Institutions: Huazhong University of Science and Technology, University of Adelaide, Anyang Normal University, South China University of Technology
  • Paper link: https://arxiv.org/pdf/2406.00684

Abstract: Oracle Bone Script (OBS), which originated in China's Shang Dynasty about 3,000 years ago, is a cornerstone of language history, predating many established writing systems. Despite the discovery of thousands of inscriptions, a large number of oracle bones remain undeciphered, casting a veil of mystery over this ancient language. The emergence of modern AI technology has opened up new areas for oracle bone deciphering, challenging traditional NLP methods that rely heavily on large text corpora.

This paper introduces a new method using image generation technology to develop a diffusion model optimized for oracle bone script deciphering, Oracle Bone Script Decipher (OBSD). Using the conditional diffusion strategy, OBSD generates important clues for oracle bone script deciphering and opens up a new direction for AI-assisted analysis of ancient languages. To verify the effectiveness, the researchers conducted extensive experiments on the oracle bone script dataset, and the quantitative results proved the effectiveness of OBSD.



Paper 4: Causal Estimation of Memorisation Profiles

  • 作者:Pietro Lesci, Clara Meister, Thomas Hofmann, Andreas Vlachos, Tiago Pimentel
  • Institutions: University of Cambridge, ETH Zurich
  • Paper link: https://arxiv.org/pdf/2406.04327

Paper Introduction: Understanding memorisation in language models has practical and societal implications, such as studying models' training dynamics or preventing copyright infringement. Previous work has defined memorisation as the causal effect of "training on an instance" on "the model's ability to predict that instance". This definition relies on a counterfactual: the ability to observe what would have happened had the model not seen that instance. Existing methods struggle to provide computationally efficient and accurate estimates of such counterfactuals. Moreover, they typically estimate memorisation for a model architecture rather than for a specific model instance.

This paper fills an important gap by proposing a new, principled, and efficient method for estimating memorisation, based on the difference-in-differences design from econometrics. With this method, the researchers characterize a model's memorisation profile, i.e., the trend in its memorisation over the course of training, by observing the model's behavior on only a small subset of instances throughout training. In experiments with the Pythia model suite, they find that memorisation (i) is stronger and more persistent in larger models, (ii) is determined by data order and learning rate, and (iii) exhibits stable trends across model sizes, such that memorisation in larger models is predictable from smaller ones.
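A schematic of the difference-in-differences idea, assuming per-instance log-likelihoods are measured just before and just after the training step that presents the "treated" instances, while "control" instances are scheduled for a later step. The function name and toy numbers are illustrative, not the paper's estimator.

```python
def did_memorisation(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences estimate of memorisation.

    Illustrative sketch: the shift in log-likelihood for instances that were
    just trained on, minus the shift for instances not yet trained on
    (which captures background improvement from training in general).
    """
    mean = lambda xs: sum(xs) / len(xs)
    treated_shift = mean(treated_post) - mean(treated_pre)
    control_shift = mean(control_post) - mean(control_pre)  # background drift
    return treated_shift - control_shift

effect = did_memorisation(
    treated_pre=[-3.2, -3.0], treated_post=[-1.1, -0.9],
    control_pre=[-3.1, -2.9], control_post=[-2.6, -2.4],
)
print(round(effect, 3))  # 1.6
```

Because the control group absorbs the overall training trend, the residual 1.6 nats here would be read as the likelihood gain causally attributable to having seen the treated instances.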



Paper 5: Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model

  • Authors: Ahmet Üstün, Viraat Aryabumi, Zheng Xin Yong, Wei-Yin Ko, etc.
  • Institutions: Cohere, Brown University, etc.
  • Paper link: https://arxiv.org/pdf/2402.07827

Paper Description: Recent breakthroughs in large language models (LLMs) have centered on a handful of data-rich languages. How can these breakthroughs be extended to other languages? This study introduces Aya, a massively multilingual generative language model that follows instructions in 101 languages, of which more than 50% are considered low-resource. Aya outperforms mT0 and BLOOMZ on most tasks while covering twice as many languages.

In addition, the study introduces an extensive new evaluation suite that extends the state-of-the-art in multilingual evaluation to 99 languages. Finally, the study provides a detailed investigation of the optimal fine-tuning mixture composition, data pruning, and model toxicity, bias, and safety.



Paper 6: Semisupervised Neural Proto-Language Reconstruction

  • Authors: Liang Lu, Peirong Xie, David R. Mortensen
  • Institution: CMU, University of Southern California
  • Paper link: https://arxiv.org/pdf/2406.05930

Award citation: This groundbreaking research aims to semi-automate the task of proto-language reconstruction in historical linguistics, proposing a novel semi-supervised architecture. The method outperforms previous supervised approaches by incorporating a "proto-to-reflex" prediction process alongside "reflex-to-proto" reconstruction. The paper is a fine example of how modern computational models, such as neural encoder-decoders, can contribute to linguistics.



Paper 7: Natural Language Satisfiability: Exploring the Problem Distribution and Evaluating Transformer-based Language Models (unpublished)

  • Authors: Tharindu Madusanka, Ian Pratt-Hartmann, Riza Batista-Navarro

Award citation: This paper gives a lucid description of a synthetic evaluation dataset for logical reasoning. It is a welcome complement to the many reasoning datasets for which it is unclear what abilities are actually being measured. There are theoretical grounds to expect some subsets to be harder than others, and the paper verifies these expectations. Within each category, the authors take particular care to extract the genuinely challenging cases.

Test of Time Award

The ACL Test of Time Award is given to honor papers that have had a long-lasting impact on the field of natural language processing and computational linguistics. It is divided into two awards, one from 10 years ago (2014) and the other from 25 years ago (1999), with a maximum of two papers awarded each year.



Paper 1: GloVe: Global Vectors for Word Representation

  • Authors: Jeffrey Pennington, Richard Socher, Christopher D. Manning
  • Institution: Stanford University
  • Paper link: https://aclanthology.org/D14-1162.pdf

Paper introduction: Methods for learning vector-space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. This study analyzes and makes explicit the properties a model needs in order for such regularities to emerge in word vectors.

This study proposes a new global log-linear regression model, GloVe, which aims to learn vector representations of words. The model combines the advantages of global matrix decomposition and local context window methods.

GloVe achieves 75% accuracy on the word analogy task and outperforms related models on word similarity and named entity recognition tasks.

Award citation: Word embeddings were the cornerstone of deep learning methods for natural language processing (NLP) between 2013 and 2018, and they continue to exert a significant influence. They not only enhanced the performance of NLP tasks but also had a substantial impact on computational semantics, for example on word similarity and analogy. The two most influential word embedding methods are probably skip-gram/CBOW and GloVe. GloVe, the later of the two, has the relative advantage of conceptual simplicity: it optimizes vector-space similarity directly from the distributional statistics between words, rather than obtaining it indirectly as the parameters of a simplified language model.
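Concretely, GloVe's objective is a weighted least-squares regression on log co-occurrence counts. The sketch below scores that loss for given embeddings (using the paper's standard defaults x_max = 100, alpha = 0.75), leaving optimization aside.

```python
import numpy as np

def glove_loss(W, W_ctx, b, b_ctx, X, x_max=100.0, alpha=0.75):
    """GloVe's weighted least-squares objective over a co-occurrence matrix X:

        J = sum_ij f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2

    where the weighting f caps the influence of very frequent co-occurrences.
    """
    loss = 0.0
    for i, j in zip(*np.nonzero(X)):               # only observed co-occurrences
        f = min((X[i, j] / x_max) ** alpha, 1.0)   # weighting function
        err = W[i] @ W_ctx[j] + b[i] + b_ctx[j] - np.log(X[i, j])
        loss += f * err ** 2
    return loss

# Toy example: random embeddings scored against random counts.
rng = np.random.default_rng(0)
X = rng.integers(1, 50, size=(5, 5)).astype(float)
W, Wc = rng.normal(size=(5, 8)), rng.normal(size=(5, 8))
print(glove_loss(W, Wc, np.zeros(5), np.zeros(5), X))
```

Training then amounts to minimizing this sum by stochastic gradient descent over the nonzero entries of X, which is what makes the method a "global" counterpart to local context-window models.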





Paper 2: Measures of Distributional Similarity

  • Author: Lillian Lee
  • Institution: Cornell University
  • Paper link: https://aclanthology.org/P99-1004.pdf

Paper description: The author studies distributional similarity measures with the goal of improving probability estimates for unseen co-occurrence events. The contribution is threefold: an empirical comparison of a wide range of measures; a classification of similarity functions according to the information they incorporate; and the introduction of a novel function that is superior at evaluating potential proxy distributions.
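The novel function from this line of work is commonly identified with the α-skew divergence, an asymmetric, smoothed relative entropy that stays finite even when one distribution assigns zero probability to an event. A minimal sketch, assuming discrete distributions given as probability lists:

```python
import math

def skew_divergence(q, r, alpha=0.99):
    """alpha-skew divergence: KL(r || alpha*q + (1 - alpha)*r).

    Mixing a small fraction of r into q keeps the denominator positive
    wherever r has mass, so the divergence is always finite, unlike plain KL.
    """
    return sum(
        ri * math.log(ri / (alpha * qi + (1 - alpha) * ri))
        for qi, ri in zip(q, r)
        if ri > 0  # terms with r_i = 0 contribute nothing
    )

# Finite even though q gives the third event zero probability:
print(skew_divergence([0.5, 0.5, 0.0], [0.4, 0.3, 0.3]))
```

In the unseen-event setting, r is the distribution of a rare word and q a candidate proxy; lower skew divergence marks q as a better stand-in for smoothing r's co-occurrence estimates.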



Lifetime Achievement Award

ACL's Lifetime Achievement Award was given to Ralph Grishman. Ralph Grishman is a professor in the Department of Computer Science at New York University, focusing on research in the field of natural language processing (NLP). He is the founder of the Proteus Project, which has made significant contributions to information extraction (IE) and promoted the development of this field.



He also developed the Java Extraction Toolkit (JET), a widely used information extraction tool that provides a variety of language analysis components, such as sentence segmentation, named entity tagging, temporal expression tagging and normalization, part-of-speech tagging, partial parsing, and coreference analysis. These components can be combined into pipelines for different applications, either for interactive analysis of single sentences or for batch analysis of entire documents. In addition, JET provides simple tools for document annotation and display, and includes a complete process for the extraction of entities, relations, and events according to the ACE (Automatic Content Extraction) specification.

Professor Grishman's work covers several core issues in NLP and has had a profound impact on modern language processing technology.

35 Outstanding Papers

  • Paper 1: Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models
  • Authors: Zhengxin Zhang, Dan Zhao, Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Qing Li, Yong Jiang, Zhihao Jia
  • Institutions: CMU, Tsinghua University, Pengcheng Laboratory, etc.
  • Paper link: https://arxiv.org/pdf/2401.07159
  • Paper 2: L-Eval: Instituting Standardized Evaluation for Long Context Language Models
  • Authors: Chenxin An, Shansan Gong, Ming Zhong, Xingjian Zhao, Mukai Li, Jun Zhang, Lingpeng Kong, Xipeng Qiu
  • Institutions: Fudan University, University of Hong Kong, University of Illinois at Urbana-Champaign, Shanghai AI Lab
  • Paper link: https://arxiv.org/abs/2307.11088
  • Paper 3: Causal-Guided Active Learning for Debiasing Large Language Models
  • Paper link: https://openreview.net/forum?id=idp_1Q6F-lC
  • Paper 4: CausalGym: Benchmarking causal interpretability methods on linguistic tasks
  • Author: Aryaman Arora, Dan Jurafsky, Christopher Potts
  • Institution: Stanford University
  • Paper link: https://arxiv.org/abs/2402.12560
  • Paper 5: Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration
  • Authors: Shangbin Feng, Weijia Shi, Yike Wang, Wenxuan Ding, Vidhisha Balachandran, Yulia Tsvetkov
  • Institutions: University of Washington, University of California, Berkeley, Hong Kong University of Science and Technology, CMU
  • Paper link: https://arxiv.org/abs/2402.00367
  • Paper 6: Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?
  • Author: Marco Gaido, Sara Papi, Matteo Negri, Luisa Bentivogli
  • Institution: Bruno Kessler Foundation, Italy
  • Paper link: https://arxiv.org/abs/2402.12025
  • Paper 7: Must NLP be Extractive?
  • Author: Steven Bird
  • Institution: Charles Darwin University
  • Paper link: https://drive.google.com/file/d/1hvF7_WQrou6CWZydhymYFTYHnd3ZIljV/view
  • Paper 8: IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators
  • Author: Indraneil Paul, Goran Glavaš, Iryna Gurevych
  • Institution: Technical University of Darmstadt, etc.
  • Paper link: https://arxiv.org/abs/2403.03894
  • Paper 9: MultiLegalPile: A 689GB Multilingual Legal Corpus
  • Author: Matthias Stürmer, Veton Matoshi, etc.
  • Institutions: University of Bern, Stanford University, etc.
  • Paper link: https://arxiv.org/pdf/2306.02069
  • Paper 10: PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety
  • Authors: Zaibin Zhang, Yongting Zhang, Lijun Li, Hongzhi Gao, Lijun Wang, Huchuan Lu, Feng Zhao, Yu Qiao, Jing Shao
  • Institutions: Shanghai Artificial Intelligence Laboratory, Dalian University of Technology, University of Science and Technology of China
  • Paper link: https://arxiv.org/pdf/2401.11880
  • Paper 11: Can Large Language Models be Good Emotional Supporter? Mitigating Preference Bias on Emotional Support Conversation
  • Author: Dongjin Kang, Sunghwan Kim, etc.
  • Institution: Yonsei University, etc.
  • Paper link: https://arxiv.org/pdf/2402.13211
  • Paper 12: Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models
  • Author: Paul Röttger, Valentin Hofmann, et al.
  • Institutions: Bocconi University, Allen Institute for Artificial Intelligence, etc.
  • Paper link: https://arxiv.org/pdf/2402.16786
  • Paper 13: Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
  • Authors: Mosh Levy, Alon Jacoby, Yoav Goldberg
  • Institution: Bar-Ilan University, Allen Institute for Artificial Intelligence
  • Paper link: https://arxiv.org/pdf/2402.14848
  • Paper 14: Do Llamas Work in English? On the Latent Language of Multilingual Transformers
  • Author: Chris Wendler, Veniamin Veselovsky, etc.
  • Institution: EPFL
  • Paper link: https://arxiv.org/pdf/2402.10588
  • Paper 15: Getting Serious about Humor: Crafting Humor Datasets with Unfunny Large Language Models
  • Author: Zachary Horvitz, Jingru Chen, etc.
  • Institutions: Columbia University, EPFL
  • Paper link: https://arxiv.org/pdf/2403.00794
  • Paper 16: Estimating the Level of Dialectness Predicts Inter-annotator Agreement in Multi-dialect Arabic Datasets
  • Author: Amr Keleg, Walid Magdy, Sharon Goldwater
  • Institution: University of Edinburgh
  • Paper link: https://arxiv.org/pdf/2405.11282
  • Paper 17: G-DIG: Towards Gradient-based DIverse and hiGh-quality Instruction Data Selection for Machine Translation
  • Authors: Xingyuan Pan, Luyang Huang, Liyan Kang, Zhicheng Liu, Yu Lu, Shanbo Cheng
  • Institution: ByteDance Research
  • Paper link: https://arxiv.org/pdf/2405.12915
  • Paper 18: Media Framing: A typology and Survey of Computational Approaches Across Disciplines
  • Author: Yulia Otmakhova, Shima Khanehzar, Lea Frermann
  • Paper link: https://openreview.net/pdf?id=9AV_zM56pwj
  • Paper 19: SPZ: A Semantic Perturbation-based Data Augmentation Method with Zonal-Mixing for Alzheimer's Disease Detection
  • Author: FangFang Li, Cheng Huang, PuZhen Su, Jie Yin
  • Paper 20: Greed is All You Need: An Evaluation of Tokenizer Inference Methods
  • Authors: Omri Uzan, Craig W. Schmidt, Chris Tanner, Yuval Pinter
  • Institutions: Ben-Gurion University of the Negev, Massachusetts Institute of Technology
  • Paper link: https://arxiv.org/abs/2403.01289
  • Paper 21: Language Complexity and Speech Recognition Accuracy: Orthographic Complexity Hurts, Phonological Complexity Doesn't
  • Authors: Chihiro Taguchi, David Chiang
  • Institution: University of Notre Dame (USA)
  • Paper link: https://arxiv.org/abs/2406.09202
  • Paper 22: Steering Llama 2 via Contrastive Activation Addition
  • Institutions: Anthropic, Harvard University, University of Göttingen (Germany), Center for Human-Compatible AI
  • Authors: Nina Rimsky, Nick Gabrieli, Julian Schulz, Meg Tong, Evan J Hubinger, Alexander Matt Turner
  • Paper link: https://arxiv.org/abs/2312.06681
  • Paper 23: EconAgent: Large Language Model-Empowered Agents for Simulating Macroeconomic Activities
  • Institution: Tsinghua University-Shenzhen International Graduate School, Tsinghua University
  • Author: Nian Li, Chen Gao, Mingyu Li, Yong Li, Qingmin Liao
  • Paper link: https://arxiv.org/abs/2310.10436
  • Paper 24: M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models
  • Institutions: The Chinese University of Hong Kong, Huawei Noah's Ark Lab, Hong Kong University of Science and Technology
  • Authors: Wai-Chung Kwan, Xingshan Zeng, Yufei Wang, Yusen Sun, Liangyou Li, Lifeng Shang, Qun Liu, Kam-Fai Wong
  • Paper link: https://arxiv.org/abs/2310.19240
  • Paper 25: CHECKWHY: Causal Fact Verification via Argument Structure
  • Authors: Jiasheng Si, Yibo Zhao, Yingjie Zhu, Haiyang Zhu, Wenpeng Lu, Deyu Zhou
  • Paper 26: On Efficient and Statistical Quality Estimation for Data Annotation
  • Authors: Jan-Christoph Klie, Juan Haladjian, Marc Kirchner, Rahul Nair
  • Institutions: UKP Lab, TU Darmstadt, Apple
  • Paper link: https://arxiv.org/pdf/2405.11919
  • Paper 27: Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!
  • Authors: Zhanhui Zhou, Jie Liu, Zhichen Dong, Jiaheng Liu, Chao Yang, Wanli Ouyang, Yu Qiao
  • Institution: Shanghai Artificial Intelligence Laboratory
  • Paper link: https://arxiv.org/pdf/2402.12343
  • Paper 28: IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages
  • Author: Mohammed Safi Ur Rahman Khan, Priyam Mehta, Ananth Sankar, etc.
  • Institutions: Nilekani Centre at AI4Bharat, Indian Institute of Technology (Madras), Microsoft, etc.
  • Paper link: https://arxiv.org/pdf/2403.06350
  • Paper 29: MultiPICo: Multilingual Perspectivist Irony Corpus
  • Author: Silvia Casola, Simona Frenda, Soda Marem Lo, Erhan Sezerer, etc.
  • Institutions: University of Turin, aequa-tech, Amazon Development Center (Italy), etc.
  • Paper link: https://assets.amazon.science/08/83/9b686f424c89b08e8fa0a6e1d020/multipico-multilingual-perspectivist-irony-corpus.pdf
  • Paper 30: MMToM-QA: Multimodal Theory of Mind Question Answering
  • Author: Chuanyang Jin, Yutong Wu, Jing Cao, Jiannan Xiang, etc.
  • Institutions: New York University, Harvard University, MIT, University of California, San Diego, University of Virginia, Johns Hopkins University
  • Paper link: https://arxiv.org/pdf/2401.08743
  • Paper 31: MAP's not dead yet: Uncovering true language model modes by conditioning away degeneracy
  • Author: Davis Yoshida, Kartik Goyal, Kevin Gimpel
  • Institutions: Toyota Technological Institute at Chicago, Georgia Institute of Technology
  • Paper link: https://arxiv.org/pdf/2311.08817
  • Paper 32: NounAtlas: Filling the Gap in Nominal Semantic Role Labeling
  • Author: Roberto Navigli, Marco Lo Pinto, Pasquale Silvestri, etc.
  • Paper 33: The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation
  • Authors: Rongwu Xu, Brian S. Lin, Shujian Yang, Tianqi Zhang, etc.
  • Institutions: Tsinghua University, Shanghai Jiao Tong University, Stanford University, Nanyang Technological University
  • Paper link: https://arxiv.org/pdf/2312.09085
  • Paper 34: Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation
  • Author: Se Jin Park, Chae Won Kim, Hyeongseop Rha, Minsu Kim, etc.
  • Institution: Korea Advanced Institute of Science and Technology (KAIST)
  • Paper link: https://arxiv.org/pdf/2406.07867
  • Paper 35: Word Embeddings Are Steers for Language Models
  • Authors: Chi Han, Jialiang Xu, Manling Li, Yi Fung, Chenkai Sun, Nan Jiang, Tarek F. Abdelzaher, Heng Ji
  • Institution: University of Illinois at Urbana-Champaign
  • Paper link: https://arxiv.org/pdf/2305.12798

Best Theme Paper Award



Paper: OLMo: Accelerating the Science of Language Models

  • Author: Dirk Groeneveld, Iz Beltagy, etc.
  • Institutions: Allen Institute for Artificial Intelligence, University of Washington, etc.
  • Paper link: https://arxiv.org/pdf/2402.00838

Award citation: This work is an important step toward transparency and reproducibility in the training of large language models, something the community needs in order to make progress (or at least to enable researchers outside the industry giants to contribute to progress).

Resource Paper Award

3 papers won the Resource Paper Award.

Paper 1: Latxa: An Open Language Model and Evaluation Suite for Basque

  • Authors: Julen Etxaniz, Oscar Sainz, Naiara Perez, Itziar Aldabe, German Rigau, Eneko Agirre, Aitor Ormazabal, Mikel Artetxe, Aitor Soroa
  • Institution: University of the Basque Country, Spain
  • Link: https://arxiv.org/pdf/2403.20266

Award citation: This paper describes the corpus collection and dataset evaluation in detail. Although the study concerns the Basque language, its methodology can be extended to building large models for other low-resource languages.

Paper 2: Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

  • Institutions: Allen Institute for Artificial Intelligence, University of California, Berkeley, etc.
  • Author: Luca Soldaini, Rodney Kinney, etc.
  • Link: https://arxiv.org/abs/2402.00159

Award citation: This paper demonstrates the importance of data curation when preparing datasets for training large language models, offering very valuable insights to a large part of the community.

Paper 3: AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents

  • Institutions: Stony Brook University, State University of New York, Allen Institute for Artificial Intelligence, etc.
  • Author: Harsh Trivedi, Tushar Khot, etc.
  • Link: https://arxiv.org/abs/2407.18901

Award citation: This research is important and impressive work on building simulated interactive environments and their evaluation. It will encourage the community to produce more rigorous dynamic benchmarks.

Social Impact Award

3 papers won the Social Impact Award.

Paper 1: How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs

  • Author: Yi Zeng, Hongpeng Lin, Jingwen Zhang, Diyi Yang, etc.
  • Institutions: Virginia Tech, Renmin University of China, University of California, Davis, Stanford University
  • Paper link: https://arxiv.org/pdf/2401.06373

Award citation: This paper explores jailbreaking, an AI safety topic, through a method developed within social science research. The work is very interesting and has the potential for significant impact on the community.

Paper 2: DIALECTBENCH: A NLP Benchmark for Dialects, Varieties, and Closely-Related Languages

  • Author: Fahim Faisal, Orevaoghene Ahia, Aarohi Srivastava, Kabir Ahuja, etc.
  • Institutions: George Mason University, University of Washington, University of Notre Dame, RC Athena
  • Paper link: https://arxiv.org/pdf/2403.11009

Award citation: Dialect variation is an understudied phenomenon in NLP and AI, yet studying it is of great linguistic and social value and has important implications for applications. This paper proposes a very novel benchmark for studying the problem in the LLM era.

Paper 3: Having Beer after Prayer? Measuring Cultural Bias in Large Language Models

  • Author: Tarek Naous, Michael J. Ryan, Alan Ritter, Wei Xu
  • Institution: Georgia Institute of Technology
  • Paper link: https://arxiv.org/pdf/2305.14456

Award citation: This paper highlights an important issue in the LLM era: cultural bias. Studying the Arabic cultural and linguistic context, its results show that cultural differences must be considered when designing LLMs. The same study can be replicated in other cultures to assess whether they are similarly affected.