To Know or Not To Know? : Analyzing Self-Consistency of Large Language Models under Ambiguity

Sedova, Anastasiia; Litschko, Robert; Frassinelli, Diego; Roth, Benjamin; Plank, Barbara

doi:10.18653/v1/2024.findings-emnlp.1003

dc.contributor.author	Sedova, Anastasiia
dc.contributor.author	Litschko, Robert
dc.contributor.author	Frassinelli, Diego
dc.contributor.author	Roth, Benjamin
dc.contributor.author	Plank, Barbara
dc.date.accessioned	2025-02-21T11:01:59Z
dc.date.available	2025-02-21T11:01:59Z
dc.date.issued	2024
dc.description.abstract	One of the major aspects contributing to the striking performance of large language models (LLMs) is the vast amount of factual knowledge accumulated during pre-training. Yet, many LLMs suffer from self-inconsistency, which raises doubts about their trustworthiness and reliability. This paper focuses on entity type ambiguity, analyzing the proficiency and consistency of state-of-the-art LLMs in applying factual knowledge when prompted with ambiguous entities. To do so, we propose an evaluation protocol that disentangles knowing from applying knowledge, and test state-of-the-art LLMs on 49 ambiguous entities. Our experiments reveal that LLMs struggle with choosing the correct entity reading, achieving an average accuracy of only 85%, and as low as 75% with underspecified prompts. The results also reveal systematic discrepancies in LLM behavior, showing that while the models may possess knowledge, they struggle to apply it consistently, exhibit biases toward preferred readings, and display self-inconsistencies. This highlights the need to address entity ambiguity in the future for more trustworthy LLMs.
dc.description.version	published	deu
dc.identifier.doi	10.18653/v1/2024.findings-emnlp.1003
dc.identifier.ppn	1917888767
dc.identifier.uri	https://kops.uni-konstanz.de/handle/123456789/72422
dc.language.iso	eng
dc.rights	terms-of-use
dc.rights.uri	https://rightsstatements.org/page/InC/1.0/
dc.subject.ddc	400
dc.title	To Know or Not To Know? : Analyzing Self-Consistency of Large Language Models under Ambiguity	eng
dc.type	INPROCEEDINGS
dspace.entity.type	Publication
kops.citation.bibtex	@inproceedings{Sedova2024Analy-72422, title={To Know or Not To Know? : Analyzing Self-Consistency of Large Language Models under Ambiguity}, year={2024}, doi={10.18653/v1/2024.findings-emnlp.1003}, isbn={979-8-89176-168-1}, address={Stroudsburg, PA, USA}, publisher={Association for Computational Linguistics}, booktitle={Findings of the Association for Computational Linguistics : EMNLP 2024}, pages={17203--17217}, editor={Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung}, author={Sedova, Anastasiia and Litschko, Robert and Frassinelli, Diego and Roth, Benjamin and Plank, Barbara} }
kops.citation.iso690	SEDOVA, Anastasiia, Robert LITSCHKO, Diego FRASSINELLI, Benjamin ROTH, Barbara PLANK, 2024. To Know or Not To Know? : Analyzing Self-Consistency of Large Language Models under Ambiguity. The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP). Miami, Florida, USA, Nov 12, 2024 - Nov 16, 2024. In: AL-ONAIZAN, Yaser, ed., Mohit BANSAL, ed., Yun-Nung CHEN, ed. Findings of the Association for Computational Linguistics : EMNLP 2024. Stroudsburg, PA, USA: Association for Computational Linguistics, 2024, pp. 17203-17217. ISBN 979-8-89176-168-1. Available at: doi: 10.18653/v1/2024.findings-emnlp.1003	eng
kops.citation.rdf	<rdf:RDF xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bibo="http://purl.org/ontology/bibo/" xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:void="http://rdfs.org/ns/void#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/72422"> <dcterms:abstract>One of the major aspects contributing to the striking performance of large language models (LLMs) is the vast amount of factual knowledge accumulated during pre-training. Yet, many LLMs suffer from self-inconsistency, which raises doubts about their trustworthiness and reliability. This paper focuses on entity type ambiguity, analyzing the proficiency and consistency of state-of-the-art LLMs in applying factual knowledge when prompted with ambiguous entities. To do so, we propose an evaluation protocol that disentangles knowing from applying knowledge, and test state-of-the-art LLMs on 49 ambiguous entities. Our experiments reveal that LLMs struggle with choosing the correct entity reading, achieving an average accuracy of only 85%, and as low as 75% with underspecified prompts. The results also reveal systematic discrepancies in LLM behavior, showing that while the models may possess knowledge, they struggle to apply it consistently, exhibit biases toward preferred readings, and display self-inconsistencies. 
This highlights the need to address entity ambiguity in the future for more trustworthy LLMs.</dcterms:abstract> <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/> <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2025-02-21T11:01:59Z</dc:date> <dc:creator>Roth, Benjamin</dc:creator> <dc:rights>terms-of-use</dc:rights> <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/> <dc:creator>Plank, Barbara</dc:creator> <dc:creator>Sedova, Anastasiia</dc:creator> <foaf:homepage rdf:resource="http://localhost:8080/"/> <dc:contributor>Sedova, Anastasiia</dc:contributor> <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/72422/4/Sedova_2-1psffk7q1e0es9.pdf"/> <dc:contributor>Roth, Benjamin</dc:contributor> <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/> <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/72422"/> <dc:contributor>Frassinelli, Diego</dc:contributor> <dc:creator>Frassinelli, Diego</dc:creator> <dcterms:title>To Know or Not To Know? : Analyzing Self-Consistency of Large Language Models under Ambiguity</dcterms:title> <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/> <dc:contributor>Litschko, Robert</dc:contributor> <dcterms:issued>2024</dcterms:issued> <dc:language>eng</dc:language> <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2025-02-21T11:01:59Z</dcterms:available> <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/72422/4/Sedova_2-1psffk7q1e0es9.pdf"/> <dc:creator>Litschko, Robert</dc:creator> <dc:contributor>Plank, Barbara</dc:contributor> </rdf:Description> </rdf:RDF>
kops.conferencefield	The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), Nov 12, 2024 - Nov 16, 2024, Miami, Florida, USA	eng
kops.date.conferenceEnd	2024-11-16
kops.date.conferenceStart	2024-11-12
kops.description.openAccess	openaccessbookpart
kops.flag.knbibliography	true
kops.identifier.nbn	urn:nbn:de:bsz:352-2-1psffk7q1e0es9
kops.location.conference	Miami, Florida, USA
kops.sourcefield	AL-ONAIZAN, Yaser, ed., Mohit BANSAL, ed., Yun-Nung CHEN, ed. <i>Findings of the Association for Computational Linguistics : EMNLP 2024</i>. Stroudsburg, PA, USA: Association for Computational Linguistics, 2024, pp. 17203-17217. ISBN 979-8-89176-168-1. Available at: doi: 10.18653/v1/2024.findings-emnlp.1003	eng
kops.sourcefield.plain	AL-ONAIZAN, Yaser, ed., Mohit BANSAL, ed., Yun-Nung CHEN, ed. Findings of the Association for Computational Linguistics : EMNLP 2024. Stroudsburg, PA, USA: Association for Computational Linguistics, 2024, pp. 17203-17217. ISBN 979-8-89176-168-1. Available at: doi: 10.18653/v1/2024.findings-emnlp.1003	eng
kops.title.conference	The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP)
relation.isAuthorOfPublication	bf0689a7-23f2-460a-8abb-42ea30bb2d29
relation.isAuthorOfPublication.latestForDiscovery	bf0689a7-23f2-460a-8abb-42ea30bb2d29
source.bibliographicInfo.fromPage	17203
source.bibliographicInfo.toPage	17217
source.contributor.editor	Al-Onaizan, Yaser
source.contributor.editor	Bansal, Mohit
source.contributor.editor	Chen, Yun-Nung
source.identifier.isbn	979-8-89176-168-1
source.publisher	Association for Computational Linguistics
source.publisher.location	Stroudsburg, PA, USA
source.title	Findings of the Association for Computational Linguistics : EMNLP 2024

Files

Original bundle

Name: Sedova_2-1psffk7q1e0es9.pdf
Size: 1.25 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 3.96 KB
Format: Item-specific license agreed to upon submission
Description:

Collections

Linguistics: Publications