Real language is underspecified, vague, and ambiguous. Indeed, past work (Zipf, 1949; Piantadosi, 2012) has suggested that ambiguity may be an inextricable feature of natural language, resulting from competing communicative pressures. Resolving the meaning of language is a never-ending process of making inferences based on implicit knowledge. For example, we know that ``the girl saw the man with the telescope'' is ambiguous and could refer to two situations, while ``the girl saw the man with the hamburger'' is not, or that ``near'' refers to very different distances in ``the house near the airport'' and ``the ant near the crumb''. Being able to capture this kind of knowledge is central to building systems with a human-like understanding of language, as well as to providing a full account of natural language itself.
While underspecified, ambiguous, and implicit language rarely poses a problem for human speakers, it can challenge even the best models. For example, despite the major recent successes in NLP brought by large language models (LLMs), it is not clear that these models capture ambiguous language in a human-like fashion (Liu, 2023; Stengel-Eskin, 2023). The same has been argued for multimodal NLP: Pezzelle (2023), for example, showed that CLIPScore is sensitive to underspecified captions. Tackling these kinds of linguistic phenomena represents a new frontier in NLP research, enabled by major progress on more clear-cut tasks.
Past work on underspecified language has pursued several directions. Some semantic representations explicitly encode underspecification (Copestake, 2005; Bos, 2004).
Other work has begun to recognize that perfect annotator agreement is often unrealistic, especially when using categorical labels for tasks like natural language inference (Chen, 2020; Nie, 2020; Pavlick, 2019). This workshop hopes to attract work that embraces disagreement between annotators as a source of signal about underspecification and ambiguity.

In order to resolve the meaning of underspecified and ambiguous language, we often draw on additional modalities and on information acquired through embodied experience (Bisk, 2020). For example, ``the girl saw the man with the telescope'' becomes unambiguous if paired with an image of a man holding a telescope. In contrast, NLP typically considers language in isolation, removed from the context in which it naturally occurs. This workshop will highlight multimodal inputs, especially visual ones, as sources of information for resolving underspecification. These inputs can themselves pose additional challenges, e.g. through ambiguous images or videos (Bhattacharya, 2019; Sanders, 2022).
The goal of the third edition of the workshop is to continue to encourage progress on processing implicit, underspecified, and ambiguous language, with a strong focus on annotation ambiguity, multimodality, and pragmatics. As in the first two editions, we will accept theoretical and practical contributions (long, short, and non-archival) on all aspects of the workshop topic.
We welcome submissions related to, but not limited to, the following topics:
Please submit your papers at https://openreview.net/group?id=eacl.org/EACL/2024/Workshop/UnImplicit
Please use the EACL style templates.
If you have any questions, please email us at unimplicitworkshop -AT- gmail.com.