2024-08-15
Machine Heart Report
Synced Editorial Department
At this year's ACL, the award winners came away with plenty.
The six-day ACL 2024 is being held in Bangkok, Thailand.
ACL is a top international conference in the field of computational linguistics and natural language processing, organized by the Association for Computational Linguistics and held annually. ACL has always ranked first in academic influence in the field of NLP, and it is also a CCF-A recommended conference.
This year's ACL is the 62nd edition of the conference, and it received more than 400 cutting-edge works in the field of NLP. Yesterday afternoon, the conference announced its awards: 7 Best Paper Awards (two of them as yet unpublished), 1 Best Theme Paper Award, and 35 Outstanding Paper Awards.
The conference also presented three Resource Paper Awards, three Social Impact Awards, and two Test of Time Awards.
In addition, the Lifetime Achievement Award of this conference was presented to Ralph Grishman, professor of computer science at New York University.
The details of each award follow.
Best Paper
Paper 1: Mission: Impossible Language Models
Abstract: Chomsky and others have claimed that large language models (LLMs) are equally capable of learning languages that are possible and impossible for humans to learn. However, there is little published experimental evidence to support this claim.
To test it, the study developed a set of synthetic languages of varying complexity, each designed by systematically altering English data with unnatural word orders and grammar rules, with the aim of producing languages that are impossible for humans to learn.
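As a rough illustration of what such systematic alterations can look like, here is a minimal Python sketch of three perturbation styles in the spirit of the paper; the function names and details are simplified assumptions, not the paper's exact specification.

```python
import random

def reverse_perturbation(tokens):
    # "Impossible" word order: deterministically reverse every sentence.
    return tokens[::-1]

def hop_perturbation(tokens, k=4):
    # A counting-based rule of the kind natural grammars avoid:
    # move the first token k positions to the right.
    if len(tokens) <= k:
        return tokens
    return tokens[1:k + 1] + [tokens[0]] + tokens[k + 1:]

def shuffle_perturbation(tokens, seed=0):
    # Deterministic shuffle: a consistent but structureless word order.
    rng = random.Random(seed + len(tokens))
    out = list(tokens)
    rng.shuffle(out)
    return out

sentence = "the cat sat on the mat".split()
for perturb in (reverse_perturbation, hop_perturbation, shuffle_perturbation):
    print(perturb.__name__, "->", " ".join(perturb(sentence)))
```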
The study then ran extensive evaluations of the ability of the GPT-2 small model to learn these "impossible languages," repeating the evaluations at different stages throughout training to compare the learning process for each language. The core finding is that, compared with English, GPT-2 struggles to learn the impossible languages, which challenges the claim of Chomsky et al.
More importantly, the authors hope their approach will open up a productive line of inquiry in which different LLM architectures are tested on a variety of "impossible languages," to better understand how LLMs can be used as tools for cognitive and typological investigation.
Paper 2: Why are Sensitive Functions Hard for Transformers?
Abstract: Experimental studies have identified a range of learnability biases and limitations of transformers, such as a persistent difficulty in learning simple formal languages such as PARITY, and a bias towards low-degree functions. However, theoretical understanding remains limited, with existing expressiveness theory either over- or under-predicting realistic learning abilities.
The study demonstrates that under the transformer architecture, the loss landscape is constrained by input space sensitivity: transformers whose outputs are sensitive to many parts of the input string are located at isolated points in parameter space, leading to a low-sensitivity bias in generalization.
The study shows, theoretically and experimentally, that this theory unifies a wide range of empirical observations about transformers' learning abilities and biases, such as their generalization bias toward low sensitivity and low degree, and their difficulty with length generalization on PARITY. This suggests that understanding transformers' inductive biases requires studying not only their in-principle expressive power but also their loss landscape.
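To make "sensitivity" concrete: the average sensitivity of a Boolean function counts, on average, how many single-bit flips of the input change the output, and PARITY maximizes it. Below is a minimal Python sketch of the standard definition (illustrative only, not code from the paper).

```python
from itertools import product

def avg_sensitivity(f, n):
    # Average over all n-bit inputs of the number of positions whose
    # flip changes f's output; PARITY attains the maximum value, n.
    total = 0
    for bits in product([0, 1], repeat=n):
        for i in range(n):
            flipped = list(bits)
            flipped[i] ^= 1
            if f(bits) != f(tuple(flipped)):
                total += 1
    return total / 2 ** n

parity = lambda bits: sum(bits) % 2                  # sensitive everywhere
majority = lambda bits: int(sum(bits) > len(bits) / 2)

print(avg_sensitivity(parity, 7))    # 7.0: every single bit flip matters
print(avg_sensitivity(majority, 7))  # much lower average sensitivity
```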
Paper 3: Deciphering Oracle Bone Language with Diffusion Models
Abstract: Oracle Bone Script (OBS), which originated in China's Shang Dynasty about 3,000 years ago, is a cornerstone of language history, predating many established writing systems. Despite the discovery of thousands of inscriptions, a large number of oracle bones remain undeciphered, casting a veil of mystery over this ancient language. The emergence of modern AI technology has opened up new areas for oracle bone deciphering, challenging traditional NLP methods that rely heavily on large text corpora.
This paper introduces a new method that uses image generation technology to develop a diffusion model optimized for oracle bone script decipherment, Oracle Bone Script Decipher (OBSD). Using a conditional diffusion strategy, OBSD generates important clues for decipherment and opens up a new direction for AI-assisted analysis of ancient languages. To validate the approach, the researchers conducted extensive experiments on an oracle bone script dataset, and the quantitative results demonstrate OBSD's effectiveness.
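For readers unfamiliar with conditional diffusion, the sketch below shows generic DDPM-style ancestral sampling conditioned on an input image. This is the textbook recipe with a stub noise predictor, not OBSD's actual code or architecture.

```python
import numpy as np

def ddpm_sample(eps_model, cond, shape, T=1000, seed=0):
    # Standard DDPM ancestral sampling, conditioned on `cond`
    # (in OBSD's setting, the condition would be the undeciphered glyph image).
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, T)
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = rng.standard_normal(shape)            # start from pure Gaussian noise
    for t in reversed(range(T)):
        eps = eps_model(x, t, cond)           # predicted noise at step t
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

# Stub predictor: a real system would use a trained conditional U-Net.
dummy_eps = lambda x, t, cond: 0.1 * (x - cond)
glyph = np.zeros((32, 32))                    # hypothetical conditioning image
print(ddpm_sample(dummy_eps, glyph, (32, 32), T=50).shape)
```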
Paper 4: Causal Estimation of Memorisation Profiles
Paper Introduction: Understanding memorisation in language models has practical and social implications, such as studying models' training dynamics or preventing copyright infringement. Previous studies have defined memorisation as the causal effect of "training on an instance" on "the model's ability to predict that instance." This definition relies on a counterfactual: the ability to observe what would have happened had the model not seen that instance. Existing methods struggle to provide computationally efficient and accurate estimates of such counterfactuals. Moreover, these methods typically estimate the memorisation of a model architecture rather than that of a specific model instance.
This paper fills an important gap by proposing a new, principled, and efficient method for estimating memorisation, based on an econometric difference-in-differences design. With this method, the researchers characterize a model's memorisation profile, i.e., the trend of its memorisation over the course of training, by observing the model's behavior on only a small subset of instances throughout training. In experiments with the Pythia model suite, they find that memorisation (i) is stronger and more persistent in larger models, (ii) is determined by data order and learning rate, and (iii) exhibits stable trends across model sizes, so that memorisation in larger models can be predicted from smaller ones.
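A toy numerical illustration of the difference-in-differences idea (hypothetical numbers; the paper's actual design is more involved): compare how the model's log-likelihood on instances changes after the training step at which they were seen, against the change on untrained control instances over the same interval.

```python
import numpy as np

# Hypothetical per-instance log-likelihoods before and after training step t.
treated_pre,  treated_post = np.array([-3.2, -2.9, -3.5]), np.array([-1.1, -0.8, -1.4])
control_pre,  control_post = np.array([-3.1, -3.0, -3.4]), np.array([-2.7, -2.6, -3.0])

# DiD estimate of memorisation at step t: the extra improvement on
# instances actually trained on at t, beyond the general improvement trend.
did = (treated_post - treated_pre).mean() - (control_post - control_pre).mean()
print(f"memorisation estimate: {did:.2f} nats")  # ~ +1.70
```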
Paper 5: Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
Paper Description: Recent breakthroughs in large language models (LLMs) have centered on a small number of data-rich languages. How can this progress be extended beyond those languages? This study introduces Aya, a massively multilingual generative language model that follows instructions in 101 languages, more than 50% of which are considered low-resource. Aya outperforms mT0 and BLOOMZ on most tasks while covering twice as many languages.
In addition, the study introduces an extensive new evaluation suite that extends the state-of-the-art in multilingual evaluation to 99 languages. Finally, the study provides a detailed investigation of the optimal fine-tuning mixture composition, data pruning, and model toxicity, bias, and safety.
Paper 6: Semisupervised Neural Proto-Language Reconstruction
Reason for the award: This pioneering research aims to semi-automate the task of proto-language reconstruction in historical linguistics, proposing a novel semi-supervised architecture. The method outperforms previous supervised methods by incorporating a proto-to-daughter reflex-prediction process into the daughter-to-proto reconstruction. This paper is a fine example of how modern computational models, such as neural encoder-decoders, can contribute to linguistics.
Paper 7: Natural Language Satisfiability: Exploring the Problem Distribution and Evaluating Transformer-based Language Models (unpublished)
Reason for winning: This paper clearly describes a synthetic evaluation dataset for logical reasoning. It is a welcome complement to the many existing reasoning datasets for which it is unclear what abilities are actually being measured. There are theoretical reasons to expect some subsets to be harder than others, and the paper verifies these expectations. Within each category, the authors take special care to extract the genuinely challenging cases.
Test of Time Award
The ACL Test of Time Award honors papers that have had a long-lasting impact on the fields of natural language processing and computational linguistics. It comprises two awards, one for a paper from 10 years ago (2014) and one for a paper from 25 years ago (1999), with at most two papers honored each year.
Paper 1: GloVe: Global Vectors for Word Representation
Paper introduction: Methods for learning vector-space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. This study analyzes and makes explicit the properties a model needs in order for such regularities to emerge in word vectors.
The study proposes a new global log-bilinear regression model, GloVe, for learning vector representations of words. The model combines the advantages of global matrix factorization and local context window methods.
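Concretely, GloVe fits word-vector dot products to log co-occurrence counts with a weighted least-squares objective. The sketch below implements the published formula; the toy data and variable names are illustrative, not the reference implementation.

```python
import numpy as np

def glove_loss(W, W_ctx, b, b_ctx, X, x_max=100.0, alpha=0.75):
    # J = sum over (i, j) with X_ij > 0 of
    #     f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2
    i, j = np.nonzero(X)
    x = X[i, j]
    f = np.where(x < x_max, (x / x_max) ** alpha, 1.0)   # weighting function
    err = (W[i] * W_ctx[j]).sum(axis=1) + b[i] + b_ctx[j] - np.log(x)
    return (f * err ** 2).sum()

rng = np.random.default_rng(0)
V, d = 5, 8                                   # toy vocabulary size and dimension
X = rng.poisson(2.0, (V, V)).astype(float)    # toy co-occurrence counts
W, W_ctx = rng.normal(0, 0.1, (V, d)), rng.normal(0, 0.1, (V, d))
b, b_ctx = np.zeros(V), np.zeros(V)
print(glove_loss(W, W_ctx, b, b_ctx, X))
```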
GloVe achieves 75% accuracy on a word analogy task and outperforms related models on word similarity and named entity recognition tasks.
Reason for winning: Word embeddings were the cornerstone of deep learning methods for natural language processing (NLP) between 2013 and 2018, and they continue to have a significant impact. Beyond enhancing the performance of NLP tasks, they had substantial influence on computational semantics, for example on word similarity and analogy. The two most influential word embedding methods are probably skip-gram/CBOW and GloVe. GloVe was proposed after skip-gram, and its relative advantage lies in its conceptual simplicity: it directly optimizes vector-space similarity from the distributional statistics of words, rather than obtaining vectors indirectly as the parameters of a simplified language model.
Paper 2: Measures of Distributional Similarity
Paper description: The authors study distributional similarity measures with the goal of improving probability estimates for unseen co-occurrence events. Their contributions are threefold: an empirical comparison of a broad range of measures; a classification of similarity functions based on the information they incorporate; and the introduction of a novel function that is superior at evaluating potential proxy distributions.
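A few of the standard measures such a comparison covers, applied to co-occurrence distributions, plus the asymmetric skew-divergence family the paper introduces (the exact formulation and parameter choice here should be checked against the paper):

```python
import numpy as np

def kl(p, q, eps=1e-12):
    # KL divergence with smoothing to avoid log(0).
    return float((p * np.log((p + eps) / (q + eps))).sum())

def l1_distance(p, q):
    return float(np.abs(p - q).sum())

def jensen_shannon(p, q):
    # Symmetric, bounded divergence between two distributions.
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def skew_divergence(q, r, a=0.99):
    # Asymmetric skew divergence KL(r || a*q + (1-a)*r):
    # mixing a little of r into q keeps the divergence finite.
    return kl(r, a * q + (1.0 - a) * r)

# Toy conditional co-occurrence distributions P(context | word).
p = np.array([0.5, 0.3, 0.2, 0.0])
q = np.array([0.4, 0.3, 0.2, 0.1])
print(l1_distance(p, q), jensen_shannon(p, q), skew_divergence(p, q))
```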
Lifetime Achievement Award
ACL's Lifetime Achievement Award was given to Ralph Grishman. Ralph Grishman is a professor in the Department of Computer Science at New York University, focusing on research in the field of natural language processing (NLP). He is the founder of the Proteus Project, which has made significant contributions to information extraction (IE) and promoted the development of this field.
He also developed the Java Extraction Toolkit (JET), a widely used information extraction tool that provides a variety of language analysis components, such as sentence segmentation, named entity tagging, temporal expression tagging and normalization, part-of-speech tagging, partial parsing, and coreference analysis. These components can be combined into pipelines for different applications, either for interactive analysis of single sentences or for batch analysis of entire documents. In addition, JET provides simple tools for document annotation and display, and includes a complete process for the extraction of entities, relations, and events according to the ACE (Automatic Content Extraction) specification.
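To illustrate the pipeline style of composition described above, here is a schematic Python sketch of the idea; it is not JET's actual Java API, and all component names are hypothetical stand-ins.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    text: str
    annotations: dict = field(default_factory=dict)

def sentence_splitter(doc):
    # Toy splitter on periods; a real component handles abbreviations etc.
    doc.annotations["sentences"] = [s.strip() for s in doc.text.split(".") if s.strip()]
    return doc

def named_entity_tagger(doc):
    # Toy stand-in: tag capitalized tokens as candidate entities.
    doc.annotations["entities"] = [
        tok for s in doc.annotations["sentences"] for tok in s.split() if tok[0].isupper()
    ]
    return doc

def run_pipeline(doc, components):
    # JET-style composition: each component adds its annotations in turn.
    for component in components:
        doc = component(doc)
    return doc

doc = run_pipeline(Document("Ralph Grishman founded the Proteus Project. It studies IE."),
                   [sentence_splitter, named_entity_tagger])
print(doc.annotations["entities"])
```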
Professor Grishman's work covers several core issues in NLP and has had a profound impact on modern language processing technology.
35 Outstanding Papers
Best Theme Paper Award
Paper: OLMo: Accelerating the Science of Language Models
Reason for the award: This work is an important step toward transparency and reproducibility in the training of large language models, something the community needs in order to make progress (or, at least, to let researchers outside the industry giants contribute to that progress).
Resource Paper Award
3 papers won the Resource Paper Award.
Paper 1: Latxa: An Open Language Model and Evaluation Suite for Basque
Institution: University of the Basque Country, Spain
Reason for winning: This paper describes the corpus collection and dataset evaluation in detail. Although the study concerns the Basque language, its methodology can be extended to building large models for other low-resource languages.
Paper 2: Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
Reason for winning: This paper demonstrates the importance of data management when preparing datasets for training large language models, offering highly valuable insights to a large part of the community.
Paper 3: AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents
Reason for winning: This is important and impressive work on building simulated interactive environments and their evaluation. It should encourage the community to produce more rigorous, dynamic benchmarks.
Social Impact Award
3 papers won the Social Impact Award.
Paper 1: How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs
Reason for the award: This paper explores jailbreaking, an AI safety topic, using a method grounded in social science research on persuasion. The research is very interesting and has the potential to make a significant impact on the community.
Paper 2: DIALECTBENCH: A NLP Benchmark for Dialects, Varieties, and Closely-Related Languages
Reason for the award: Dialect variation is an understudied phenomenon in NLP and AI. However, its study is of great value from a linguistic and social perspective, and has important implications for applications. This paper proposes a very novel benchmark to study this problem in the LLM era.
Paper 3: Having Beer after Prayer? Measuring Cultural Bias in Large Language Models
Reason for the award: This paper highlights an important issue in the LLM era: cultural bias. It studies the Arabic cultural and linguistic context, and its results show that cultural differences must be taken into account when designing LLMs. The same study can be replicated for other cultures to assess whether they are similarly affected.