UnImplicit: The Third Workshop on
Understanding Implicit and Underspecified Language

at EACL 2024, Malta, March 21

If you have any questions please email us at unimplicitworkshop -AT- gmail.com


Real language is underspecified, vague, and ambiguous. Indeed, past work (Zipf, 1949; Piantadosi, 2012) has suggested that ambiguity may be an inextricable feature of natural language, resulting from competing communicative pressures. Resolving the meaning of language is a never-ending process of making inferences based on implicit knowledge. For example, we know that "the girl saw the man with the telescope" is ambiguous and could refer to two situations, while "the girl saw the man with the hamburger" is not, or that "near" in "the house near the airport" and "the ant near the crumb" does not refer to the same distance. Being able to capture this kind of knowledge is central to building systems with a human-like understanding of language, as well as to providing a full account of natural language itself.
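This structural ambiguity is easy to make concrete. As a minimal sketch (our illustration, not part of the workshop materials), a toy context-free grammar run through NLTK's chart parser yields exactly two parse trees for the telescope sentence, one per reading:

```python
# Minimal sketch: PP-attachment ambiguity in a toy CFG (illustrative only).
import nltk

grammar = nltk.CFG.fromstring("""
S  -> NP VP
NP -> Det N | NP PP
VP -> V NP | VP PP
PP -> P NP
Det -> 'the'
N  -> 'girl' | 'man' | 'telescope'
V  -> 'saw'
P  -> 'with'
""")

parser = nltk.ChartParser(grammar)
sentence = "the girl saw the man with the telescope".split()

# Prints two trees: one where the PP attaches to the VP (the girl uses
# the telescope) and one where it attaches to the NP (the man has it).
for tree in parser.parse(sentence):
    tree.pretty_print()
```

Note that swapping in 'hamburger' would not remove the second parse: the hamburger sentence is syntactically just as ambiguous, and it is world knowledge, not syntax, that rules out the instrument reading.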

While underspecified, ambiguous, and implicit language rarely poses a problem for language speakers, it can challenge even the best models. For example, despite recent major successes in NLP coming from large language models (LLMs), it is not clear that these models capture ambiguous language in a human-like fashion (Liu, 2023; Stengel-Eskin, 2023). The same has been argued for multimodal NLP: Pezzelle (2023), for example, showed that CLIPScore is sensitive to underspecified captions. Tackling these kinds of linguistic phenomena represents a new frontier in NLP research, enabled by major progress on more clear-cut tasks.

Past work on underspecified language has pursued several directions. Some semantic representation formalisms have sought to represent underspecification explicitly (Copestake, 2005; Bos, 2004).

Other work has begun to recognize that perfect annotator agreement is often unrealistic, especially when using categorical labels for tasks like natural language inference (Chen, 2020; Nie, 2020; Pavlick, 2019). This workshop hopes to attract work embracing disagreement between annotators as a source of signal about underspecification and ambiguity.
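One simple way to operationalize this view (our sketch, not a method from the cited papers) is to score each item by the entropy of its annotator label distribution, treating high-entropy items as candidates for genuine ambiguity rather than annotation noise:

```python
# Minimal sketch: per-item label entropy as an ambiguity signal.
# The annotation matrix below is hypothetical.
import numpy as np
from scipy.stats import entropy

# Rows are items, columns are annotators; NLI labels are
# 0 = entailment, 1 = neutral, 2 = contradiction.
annotations = np.array([
    [0, 0, 0, 0, 0],  # unanimous -> likely well-specified
    [0, 1, 1, 0, 1],  # two-way split -> possibly underspecified
    [0, 1, 2, 1, 2],  # three-way split -> strong ambiguity candidate
])

NUM_LABELS = 3
for i, row in enumerate(annotations):
    dist = np.bincount(row, minlength=NUM_LABELS) / len(row)
    print(f"item {i}: label distribution {dist}, entropy {entropy(dist):.2f} nats")
```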

In order to resolve the meaning of underspecified and ambiguous language, we often employ additional modalities and information acquired through embodied experience (Bisk, 2020). For example, "the girl saw the man with the telescope" becomes unambiguous if paired with an image of a man holding a telescope. In contrast, NLP typically considers language in isolation, removed from the context in which it naturally occurs. This workshop will highlight multimodal inputs, especially visual ones, as sources of information for resolving underspecification. These inputs can themselves pose additional challenges, e.g. through ambiguous images or videos (Bhattacharya, 2019; Sanders, 2022).
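As a concrete (and deliberately simplified) sketch of this idea, the snippet below scores the two readings of the telescope sentence against an image with an off-the-shelf CLIP model from Hugging Face Transformers; the image file and the two paraphrases are our hypothetical stand-ins:

```python
# Minimal sketch: using an image to resolve a PP-attachment ambiguity
# via image-text similarity. Illustrative only; "scene.jpg" is hypothetical.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

readings = [
    "a girl looking through a telescope at a man",    # instrument reading
    "a girl looking at a man who holds a telescope",  # modifier reading
]
image = Image.open("scene.jpg")

inputs = processor(text=readings, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)

# The paraphrase with the higher image-text similarity stands in for
# the resolved interpretation.
for reading, p in zip(readings, probs[0].tolist()):
    print(f"{p:.2f}  {reading}")
```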

The goal of the third edition of the workshop is to continue to encourage progress on processing implicit, underspecified, and ambiguous language, with a strong focus on annotation ambiguity, multimodality, and pragmatics. As in the first two editions, we welcome theoretical and practical contributions (long, short, and non-archival) on all aspects related to the workshop topic.

We welcome submissions related to, but not limited to, the following topics:

  • Creating corpora or new annotations for underspecified, vague, or ambiguous language
  • Studies of annotator disagreement
  • Methods of resolving underspecification, vagueness, or ambiguity
  • Studies of how multimodal settings interact with underspecification in language
  • Ambiguities in non-linguistic domains, like images or videos
  • Perspectives on the role of vagueness and ambiguity in NLP
    "What's the point?" How relevance shapes language learning and inference in humans and machines.
    logo Humans always use language to achieve some purpose. This talk starts from this central tenet of pragmatics and follows several threads that emerge from it. The first half focuses on humans. I begin with experiments that test how we judge the relevance or utility of a new piece of information. Then, I show how reasoning about utility, formalized in a Rational Speech Acts model, can help us draw inferences about the presuppositions of our conversational partners. The second half focuses on language models. How do these models that see only distributional information learn semantic relations such as entailment? I present theoretical results to support the claim that the key to this feat is the assumption that the underlying distribution is generated by rational pragmatic agents. But while this argument goes through in theory, next-word prediction may not be the fastest or most human-like way to learn language. I end by discussing how ongoing and future work inspired by RLHF aims to introduce notions of communicative success as a new learning signal for training more data-efficient and pragmatic language models.

Malihe Alikhani
Northeastern University
From Ambiguity to Clarity: Navigating Uncertainty in Human-Machine Conversations
This talk delves into the intricacies of uncertainty in human-machine dialogue, mainly focusing on the challenges and solutions related to ambiguities arising from impoverished contextual representations. We examine how linguistically informed context representations can mitigate data-related uncertainty in a deployed dialogue system similar to Alexa. We acknowledge that certain types of data-related uncertainty are unavoidable and investigate the capabilities of modern billion-scale language models in representing this form of uncertainty in conversations. Shifting our focus to epistemic uncertainty arising from misaligned background knowledge between humans and machines, we explore strategies for quantifying and reducing this form of uncertainty. Our discussion encompasses various facets of human-machine convergence, including lexical diversity, question generation, fairness, and pragmatics. By leveraging machine learning theory and cognitive science insights, we aim to quantify epistemic uncertainty and propose algorithms that improve grounding between humans and machines. This exploration sheds light on the theoretical underpinnings of uncertainty in dialogue systems and offers practical solutions for improving human-machine communication.

Benjamin Bergen
UC San Diego
Lexical and referential ambiguity in humans and language models
Linguistic ambiguity poses processing challenges to both humans and machines. In this talk, I begin with recent research from my lab aiming to understand why lexical ambiguity exists in the first place. Ambiguity may seem like poor design for a communication system; for instance, it's almost entirely absent from programming languages. We test and reject a well-known proposed explanation for lexical ambiguity: that homophones exist to increase efficiency for language production. We find that languages are in fact less ambiguous than one would expect by chance, and that if anything the amount and distribution of ambiguity in language benefits comprehenders rather than producers. Next, I discuss our research on how humans represent ambiguous words, finding that human meaning representations have both categorical and continuous aspects. This differentiates human lexical representations from those of Large Language Models, which are continuous. Finally, I discuss referential ambiguity effects: how humans resolve ambiguous pronouns in context. For instance, when presented with a sentence like "When the steel ball fell on the glass table, it broke," English speakers tend to judge that the table is the more likely referent of "it," likely due to world knowledge. These judgments are not fully explained by the predictions of pre-trained Large Language Models, which suggests that humans are relying on world-knowledge mechanisms not available through learning from distributional statistics of language alone.

Program

    09:30 Opening (Room: Bastion 1)
    09:45 Invited talk (Room: Bastion 1)
    Alex Warstadt
    10:45 Coffee Break
    11:15 In-person poster session (Room: Terrace Suite)
    Taking Action Towards Graceful Interaction: The Effects of Performing Actions on Modelling Policies for Instruction Clarification Requests
    Brielen Madureira, David Schlangen
    More Labels or Cases? Assessing Label Variation in Natural Language Inference
    Cornelia Gruber, Katharina Hechinger, Matthias Assenmacher, Göran Kauermann, Barbara Plank
    Resolving Transcription Ambiguity in Spanish: A Hybrid Acoustic-Lexical System for Punctuation Restoration
    Xiliang Zhu, Chia-Tien Chang, Shayna Gardiner, David Rossouw, Jonas Robertson
    Assessing the Significance of Encoded Information in Contextualized Representations to Word Sense Disambiguation
    Deniz Ekin Yavas
    Below the Sea (with the Sharks): Probing Textual Features of Implicit Sentiment in a Literary Case-study
    Yuri Bizzoni, Pascale Feldkamp
    Exposing propaganda: an analysis of stylistic cues comparing human annotations and machine classification
    Géraud Faye, Benjamin Icard, Morgane Casanova, Julien Chanson, François Maine, François Bancilhon, Guillaume Gadek, Guillaume Gravier, Paul Égré
    Different Tastes of Entities: Investigating Human Label Variation in Named Entity Annotations
    Siyao Peng, Zihang Sun, Sebastian Loftus, Barbara Plank
    Colour Me Uncertain: Representing Vagueness with Probabilistic Semantics
    Kin Chun Cheung, Guy Emerson
    Similarity-weighted Construction of Contextualized Commonsense Knowledge Graphs for Knowledge-intense Argumentation Tasks (non-archival)
    Moritz Plenz, Juri Opitz, Philipp Heinisch, Philipp Cimiano, Anette Frank
    Are You Serious? Handling Disagreement When Annotating Conspiracy Theory Texts
    Ashley Hemm, Sandra Kübler, Michelle Seelig, John Funchion, Manohar Narayanamurthi, Kamal Premaratne, Daniel Verdear, Stefan Wuchty
    Exploiting Large Language Models and Prompt Engineering Techniques to detect and classify implicit content in Italian political discourse
    Walter Paci
    A Taxonomy of Ambiguity Types for NLP
    Margaret Y. Li, Alisa Liu, Zhaofeng Wu, Noah A. Smith
    12:30 Lunch
    13:45 Oral Presentations
    Taking Action Towards Graceful Interaction: The Effects of Performing Actions on Modelling Policies for Instruction Clarification Requests
    Brielen Madureira, David Schlangen
    Different Tastes of Entities: Investigating Human Label Variation in Named Entity Annotations
    Siyao Peng, Zihang Sun, Sebastian Loftus, Barbara Plank
    Colour Me Uncertain: Representing Vagueness with Probabilistic Semantics
    Kin Chun Cheung, Guy Emerson
    14:30 Invited talk (Room: Bastion 1)
    Malihe Alikhani
    15:30 Coffee Break
    16:00 Invited talk (Room: Bastion 1)
    Benjamin Bergen
    17:00 (Official) closing (Room: Bastion 1)
