Publication:

To Know or Not To Know? : Analyzing Self-Consistency of Large Language Models under Ambiguity

Files

Sedova_2-1psffk7q1e0es9.pdf (Size: 1.25 MB)

Date

2024

Authors

Sedova, Anastasiia
Litschko, Robert
Frassinelli, Diego
Roth, Benjamin
Plank, Barbara

Editors

Al-Onaizan, Yaser
Bansal, Mohit
Chen, Yun-Nung

ISBN
979-8-89176-168-1

Bibliographic Data

Publisher
Association for Computational Linguistics

Open Access Publication
Open Access Bookpart

Publication Type
Contribution to a conference proceedings volume
Publication Status
Published

Published in

AL-ONAIZAN, Yaser, ed., Mohit BANSAL, ed., Yun-Nung CHEN, ed. Findings of the Association for Computational Linguistics : EMNLP 2024. Stroudsburg, PA, USA: Association for Computational Linguistics, 2024, pp. 17203-17217. ISBN 979-8-89176-168-1. Available at: doi: 10.18653/v1/2024.findings-emnlp.1003

Abstract

One of the major aspects contributing to the striking performance of large language models (LLMs) is the vast amount of factual knowledge accumulated during pre-training. Yet, many LLMs suffer from self-inconsistency, which raises doubts about their trustworthiness and reliability. This paper focuses on entity type ambiguity, analyzing the proficiency and consistency of state-of-the-art LLMs in applying factual knowledge when prompted with ambiguous entities. To do so, we propose an evaluation protocol that disentangles knowing from applying knowledge, and test state-of-the-art LLMs on 49 ambiguous entities. Our experiments reveal that LLMs struggle with choosing the correct entity reading, achieving an average accuracy of only 85%, and as low as 75% with underspecified prompts. The results also reveal systematic discrepancies in LLM behavior, showing that while the models may possess knowledge, they struggle to apply it consistently, exhibit biases toward preferred readings, and display self-inconsistencies. This highlights the need to address entity ambiguity in the future for more trustworthy LLMs.
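
The record does not spell out the evaluation protocol, but the abstract's idea of measuring self-consistency under underspecified prompts can be illustrated with a short Python sketch. Everything below (the `ask` callable, the prompts, the keyword-based reading detection) is a hypothetical assumption for illustration, not the paper's actual method: the model is queried repeatedly with a prompt that leaves the entity type open, and self-consistency is the fraction of answers agreeing on one reading of the ambiguous entity.

from collections import Counter

def reading_votes(ask, entity, readings, n_samples=10):
    # Underspecified prompt: the entity type is deliberately left open,
    # so the model must pick a reading on its own (hypothetical prompt).
    prompt = f"Tell me one fact about {entity}."
    votes = Counter()
    for _ in range(n_samples):
        answer = ask(prompt).lower()
        # Crude keyword matching stands in for whatever answer
        # classification the actual protocol would use.
        for reading, keywords in readings.items():
            if any(kw in answer for kw in keywords):
                votes[reading] += 1
                break
    return votes

def self_consistency(votes):
    # Fraction of samples agreeing with the majority reading;
    # 1.0 means the model always picks the same interpretation.
    total = sum(votes.values())
    return max(votes.values()) / total if total else 0.0

# Usage with any LLM wrapper, e.g. ask = lambda p: client.complete(p):
#   votes = reading_votes(ask, "Paris", {"city": ["france", "capital"],
#                                        "person": ["hilton", "actress"]})
#   print(self_consistency(votes))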

Subject (DDC)
400 Language, Linguistics

Conference

The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), Nov. 12-16, 2024, Miami, Florida, USA

Cite

ISO 690
SEDOVA, Anastasiia, Robert LITSCHKO, Diego FRASSINELLI, Benjamin ROTH, Barbara PLANK, 2024. To Know or Not To Know? : Analyzing Self-Consistency of Large Language Models under Ambiguity. The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP). Miami, Florida, USA, Nov. 12-16, 2024. In: AL-ONAIZAN, Yaser, ed., Mohit BANSAL, ed., Yun-Nung CHEN, ed. Findings of the Association for Computational Linguistics : EMNLP 2024. Stroudsburg, PA, USA: Association for Computational Linguistics, 2024, pp. 17203-17217. ISBN 979-8-89176-168-1. Available at: doi: 10.18653/v1/2024.findings-emnlp.1003
BibTeX
@inproceedings{Sedova2024Analy-72422,
  title={To Know or Not To Know? : Analyzing Self-Consistency of Large Language Models under Ambiguity},
  year={2024},
  doi={10.18653/v1/2024.findings-emnlp.1003},
  isbn={979-8-89176-168-1},
  address={Stroudsburg, PA, USA},
  publisher={Association for Computational Linguistics},
  booktitle={Findings of the Association for Computational Linguistics : EMNLP 2024},
  pages={17203--17217},
  editor={Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung},
  author={Sedova, Anastasiia and Litschko, Robert and Frassinelli, Diego and Roth, Benjamin and Plank, Barbara}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/72422">
    <dcterms:abstract>One of the major aspects contributing to the striking performance of large language models (LLMs) is the vast amount of factual knowledge accumulated during pre-training. Yet, many LLMs suffer from self-inconsistency, which raises doubts about their trustworthiness and reliability. This paper focuses on entity type ambiguity, analyzing the proficiency and consistency of state-of-the-art LLMs in applying factual knowledge when prompted with ambiguous entities. To do so, we propose an evaluation protocol that disentangles knowing from applying knowledge, and test state-of-the-art LLMs on 49 ambiguous entities. Our experiments reveal that LLMs struggle with choosing the correct entity reading, achieving an average accuracy of only 85%, and as low as 75% with underspecified prompts. The results also reveal systematic discrepancies in LLM behavior, showing that while the models may possess knowledge, they struggle to apply it consistently, exhibit biases toward preferred readings, and display self-inconsistencies. This highlights the need to address entity ambiguity in the future for more trustworthy LLMs.</dcterms:abstract>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2025-02-21T11:01:59Z</dc:date>
    <dc:creator>Roth, Benjamin</dc:creator>
    <dc:rights>terms-of-use</dc:rights>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
    <dc:creator>Plank, Barbara</dc:creator>
    <dc:creator>Sedova, Anastasiia</dc:creator>
    <dc:contributor>Sedova, Anastasiia</dc:contributor>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/72422/4/Sedova_2-1psffk7q1e0es9.pdf"/>
    <dc:contributor>Roth, Benjamin</dc:contributor>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/72422"/>
    <dc:contributor>Frassinelli, Diego</dc:contributor>
    <dc:creator>Frassinelli, Diego</dc:creator>
    <dcterms:title>To Know or Not To Know? : Analyzing Self-Consistency of Large Language Models under Ambiguity</dcterms:title>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
    <dc:contributor>Litschko, Robert</dc:contributor>
    <dcterms:issued>2024</dcterms:issued>
    <dc:language>eng</dc:language>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2025-02-21T11:01:59Z</dcterms:available>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/72422/4/Sedova_2-1psffk7q1e0es9.pdf"/>
    <dc:creator>Litschko, Robert</dc:creator>
    <dc:contributor>Plank, Barbara</dc:contributor>
  </rdf:Description>
</rdf:RDF>

University Bibliography
Yes