To Know or Not To Know? : Analyzing Self-Consistency of Large Language Models under Ambiguity

dc.contributor.author: Sedova, Anastasiia
dc.contributor.author: Litschko, Robert
dc.contributor.author: Frassinelli, Diego
dc.contributor.author: Roth, Benjamin
dc.contributor.author: Plank, Barbara
dc.date.accessioned: 2025-02-21T11:01:59Z
dc.date.available: 2025-02-21T11:01:59Z
dc.date.issued: 2024
dc.description.abstract: One of the major aspects contributing to the striking performance of large language models (LLMs) is the vast amount of factual knowledge accumulated during pre-training. Yet, many LLMs suffer from self-inconsistency, which raises doubts about their trustworthiness and reliability. This paper focuses on entity type ambiguity, analyzing the proficiency and consistency of state-of-the-art LLMs in applying factual knowledge when prompted with ambiguous entities. To do so, we propose an evaluation protocol that disentangles knowing from applying knowledge, and test state-of-the-art LLMs on 49 ambiguous entities. Our experiments reveal that LLMs struggle with choosing the correct entity reading, achieving an average accuracy of only 85%, and as low as 75% with underspecified prompts. The results also reveal systematic discrepancies in LLM behavior, showing that while the models may possess knowledge, they struggle to apply it consistently, exhibit biases toward preferred readings, and display self-inconsistencies. This highlights the need to address entity ambiguity in the future for more trustworthy LLMs.
dc.description.version: published (deu)
dc.identifier.doi: 10.18653/v1/2024.findings-emnlp.1003
dc.identifier.ppn: 1917888767
dc.identifier.uri: https://kops.uni-konstanz.de/handle/123456789/72422
dc.language.iso: eng
dc.rights: terms-of-use
dc.rights.uri: https://rightsstatements.org/page/InC/1.0/
dc.subject.ddc: 400
dc.title: To Know or Not To Know? : Analyzing Self-Consistency of Large Language Models under Ambiguity (eng)
dc.type: INPROCEEDINGS
dspace.entity.type: Publication
kops.citation.bibtex:
@inproceedings{Sedova2024Analy-72422,
  title={To Know or Not To Know? : Analyzing Self-Consistency of Large Language Models under Ambiguity},
  year={2024},
  doi={10.18653/v1/2024.findings-emnlp.1003},
  isbn={979-8-89176-168-1},
  address={Stroudsburg, PA, USA},
  publisher={Association for Computational Linguistics},
  booktitle={Findings of the Association for Computational Linguistics : EMNLP 2024},
  pages={17203--17217},
  editor={Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung},
  author={Sedova, Anastasiia and Litschko, Robert and Frassinelli, Diego and Roth, Benjamin and Plank, Barbara}
}
kops.citation.iso690: SEDOVA, Anastasiia, Robert LITSCHKO, Diego FRASSINELLI, Benjamin ROTH, Barbara PLANK, 2024. To Know or Not To Know? : Analyzing Self-Consistency of Large Language Models under Ambiguity. The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP). Miami, Florida, USA, Nov 12, 2024 - Nov 16, 2024. In: AL-ONAIZAN, Yaser, ed., Mohit BANSAL, ed., Yun-Nung CHEN, ed. Findings of the Association for Computational Linguistics : EMNLP 2024. Stroudsburg, PA, USA: Association for Computational Linguistics, 2024, pp. 17203-17217. ISBN 979-8-89176-168-1. Available at: doi: 10.18653/v1/2024.findings-emnlp.1003 (eng)
kops.citation.rdf:
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/72422">
    <dcterms:abstract>One of the major aspects contributing to the striking performance of large language models (LLMs) is the vast amount of factual knowledge accumulated during pre-training. Yet, many LLMs suffer from self-inconsistency, which raises doubts about their trustworthiness and reliability. This paper focuses on entity type ambiguity, analyzing the proficiency and consistency of state-of-the-art LLMs in applying factual knowledge when prompted with ambiguous entities. To do so, we propose an evaluation protocol that disentangles knowing from applying knowledge, and test state-of-the-art LLMs on 49 ambiguous entities. Our experiments reveal that LLMs struggle with choosing the correct entity reading, achieving an average accuracy of only 85%, and as low as 75% with underspecified prompts. The results also reveal systematic discrepancies in LLM behavior, showing that while the models may possess knowledge, they struggle to apply it consistently, exhibit biases toward preferred readings, and display self-inconsistencies. This highlights the need to address entity ambiguity in the future for more trustworthy LLMs.</dcterms:abstract>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2025-02-21T11:01:59Z</dc:date>
    <dc:creator>Roth, Benjamin</dc:creator>
    <dc:rights>terms-of-use</dc:rights>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
    <dc:creator>Plank, Barbara</dc:creator>
    <dc:creator>Sedova, Anastasiia</dc:creator>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:contributor>Sedova, Anastasiia</dc:contributor>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/72422/4/Sedova_2-1psffk7q1e0es9.pdf"/>
    <dc:contributor>Roth, Benjamin</dc:contributor>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/72422"/>
    <dc:contributor>Frassinelli, Diego</dc:contributor>
    <dc:creator>Frassinelli, Diego</dc:creator>
    <dcterms:title>To Know or Not To Know? : Analyzing Self-Consistency of Large Language Models under Ambiguity</dcterms:title>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
    <dc:contributor>Litschko, Robert</dc:contributor>
    <dcterms:issued>2024</dcterms:issued>
    <dc:language>eng</dc:language>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2025-02-21T11:01:59Z</dcterms:available>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/72422/4/Sedova_2-1psffk7q1e0es9.pdf"/>
    <dc:creator>Litschko, Robert</dc:creator>
    <dc:contributor>Plank, Barbara</dc:contributor>
  </rdf:Description>
</rdf:RDF>
kops.conferencefield: The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), Nov 12, 2024 - Nov 16, 2024, Miami, Florida, USA
kops.date.conferenceEnd: 2024-11-16
kops.date.conferenceStart: 2024-11-12
kops.description.openAccess: openaccessbookpart
kops.flag.knbibliography: true
kops.identifier.nbn: urn:nbn:de:bsz:352-2-1psffk7q1e0es9
kops.location.conference: Miami, Florida, USA
kops.sourcefield: AL-ONAIZAN, Yaser, ed., Mohit BANSAL, ed., Yun-Nung CHEN, ed. Findings of the Association for Computational Linguistics : EMNLP 2024. Stroudsburg, PA, USA: Association for Computational Linguistics, 2024, pp. 17203-17217. ISBN 979-8-89176-168-1. Available at: doi: 10.18653/v1/2024.findings-emnlp.1003
kops.sourcefield.plain: AL-ONAIZAN, Yaser, ed., Mohit BANSAL, ed., Yun-Nung CHEN, ed. Findings of the Association for Computational Linguistics : EMNLP 2024. Stroudsburg, PA, USA: Association for Computational Linguistics, 2024, pp. 17203-17217. ISBN 979-8-89176-168-1. Available at: doi: 10.18653/v1/2024.findings-emnlp.1003 (eng)
kops.title.conference: The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP)
relation.isAuthorOfPublication: bf0689a7-23f2-460a-8abb-42ea30bb2d29
relation.isAuthorOfPublication.latestForDiscovery: bf0689a7-23f2-460a-8abb-42ea30bb2d29
source.bibliographicInfo.fromPage: 17203
source.bibliographicInfo.toPage: 17217
source.contributor.editor: Al-Onaizan, Yaser
source.contributor.editor: Bansal, Mohit
source.contributor.editor: Chen, Yun-Nung
source.identifier.isbn: 979-8-89176-168-1
source.publisher: Association for Computational Linguistics
source.publisher.location: Stroudsburg, PA, USA
source.title: Findings of the Association for Computational Linguistics : EMNLP 2024

Files

Original bundle

Name: Sedova_2-1psffk7q1e0es9.pdf
Size: 1.25 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 3.96 KB
Format: Item-specific license agreed to upon submission
Description: license.txt