Publication:

LEXpander : Applying colexification networks to automated lexicon expansion

Files

DiNatale_2-1r66ziaw9clrb5.PDF (size: 1.07 MB, downloads: 26)

Date

2024

Authors

Di Natale, Anna
Garcia, David

Open access publication
Open Access Hybrid

Publication type
Journal article
Publication status
Published

Published in

Behavior Research Methods. Springer. 2024, 56(2), pp. 952-967. ISSN 1554-351X. eISSN 1554-3528. Available at: doi: 10.3758/s13428-023-02063-y

Abstract

Recent approaches to text analysis from social media and other corpora rely on word lists to detect topics, measure meaning, or select relevant documents. These lists are often generated by applying computational lexicon expansion methods to small, manually curated sets of seed words. Despite the wide use of this approach, we still lack an exhaustive comparative analysis of the performance of lexicon expansion methods and how they can be improved with additional linguistic data. In this work, we present LEXpander, a method for lexicon expansion that leverages novel data on colexification, i.e., semantic networks connecting words with multiple meanings according to shared senses. We evaluate LEXpander in a benchmark including widely used methods for lexicon expansion based on word embedding models and synonym networks. We find that LEXpander outperforms existing approaches in terms of both precision and the trade-off between precision and recall of generated word lists in a variety of tests. Our benchmark includes several linguistic categories, such as words relating to finance or to the concept of friendship, and sentiment variables in English and German. We also show that the expanded word lists constitute a high-performing text analysis method when applied to various English corpora. LEXpander thus offers a systematic, automated solution for expanding short lists of words into exhaustive and accurate word lists that can closely approximate word lists generated by experts in psychology and linguistics.
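
The general idea of network-based lexicon expansion and its evaluation by precision and recall can be illustrated with a minimal Python sketch. This is not the LEXpander implementation: the toy network, the one-step neighbour rule, the function names `expand` and `precision_recall`, and the reference list are all assumptions invented for illustration, and the published method may differ in every detail.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy network (invented data, not taken from the paper):
# an edge links two words assumed to share a sense.
TOY_NETWORK: Dict[str, Set[str]] = {
    "money": {"silver", "coin", "wealth"},
    "silver": {"money"},
    "coin": {"money"},
    "wealth": {"money", "fortune"},
    "fortune": {"wealth", "luck"},
    "luck": {"fortune"},
}


def expand(seeds: Set[str], network: Dict[str, Set[str]]) -> Set[str]:
    """Return the seed words plus every direct neighbour of a seed word."""
    expanded = set(seeds)
    for word in seeds:
        expanded |= network.get(word, set())
    return expanded


def precision_recall(predicted: Set[str], reference: Set[str]) -> Tuple[float, float]:
    """Compare an expanded word list against a reference (expert) word list."""
    hits = len(predicted & reference)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Expand a single seed word and score the result against an assumed reference list.
expanded = expand({"money"}, TOY_NETWORK)
precision, recall = precision_recall(
    expanded, {"money", "coin", "wealth", "fortune"}
)
print(sorted(expanded), precision, recall)
```

In this toy setting, a larger expansion radius would typically raise recall at the cost of precision, which is the kind of trade-off the abstract refers to.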

Subject (DDC)
004 Computer science

Keywords

Colexification networks, Lexicon expansion, Text analysis, Word embeddings

Related datasets in KOPS

Dataset
AnnaDiNatale/LEXpander: LEXpander
(V1.0.0, 2022) Di Natale, Anna; Garcia, David

Cite

ISO 690
DI NATALE, Anna, David GARCIA, 2024. LEXpander : Applying colexification networks to automated lexicon expansion. In: Behavior Research Methods. Springer. 2024, 56(2), pp. 952-967. ISSN 1554-351X. eISSN 1554-3528. Available at: doi: 10.3758/s13428-023-02063-y
BibTeX
@article{DiNatale2024LEXpa-66557,
  title={LEXpander : Applying colexification networks to automated lexicon expansion},
  year={2024},
  doi={10.3758/s13428-023-02063-y},
  number={2},
  volume={56},
  issn={1554-351X},
  journal={Behavior Research Methods},
  pages={952--967},
  author={Di Natale, Anna and Garcia, David}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/66557">
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/66557/1/DiNatale_2-1r66ziaw9clrb5.PDF"/>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/42"/>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/42"/>
    <dcterms:issued>2024</dcterms:issued>
    <dc:creator>Garcia, David</dc:creator>
    <dcterms:title>LEXpander : Applying colexification networks to automated lexicon expansion</dcterms:title>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/66557/1/DiNatale_2-1r66ziaw9clrb5.PDF"/>
    <dc:rights>Attribution 4.0 International</dc:rights>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2023-04-12T12:40:15Z</dcterms:available>
    <dcterms:abstract>Recent approaches to text analysis from social media and other corpora rely on word lists to detect topics, measure meaning, or to select relevant documents. These lists are often generated by applying computational lexicon expansion methods to small, manually curated sets of seed words. Despite the wide use of this approach, we still lack an exhaustive comparative analysis of the performance of lexicon expansion methods and how they can be improved with additional linguistic data. In this work, we present LEXpander, a method for lexicon expansion that leverages novel data on colexification, i.e., semantic networks connecting words with multiple meanings according to shared senses. We evaluate LEXpander in a benchmark including widely used methods for lexicon expansion based on word embedding models and synonym networks. We find that LEXpander outperforms existing approaches in terms of both precision and the trade-off between precision and recall of generated word lists in a variety of tests. Our benchmark includes several linguistic categories, as words relating to the financial area or to the concept of friendship, and sentiment variables in English and German. We also show that the expanded word lists constitute a high-performing text analysis method in application cases to various English corpora. This way, LEXpander poses a systematic automated solution to expand short lists of words into exhaustive and accurate word lists that can closely approximate word lists generated by experts in psychology and linguistics.</dcterms:abstract>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2023-04-12T12:40:15Z</dc:date>
    <dcterms:rights rdf:resource="http://creativecommons.org/licenses/by/4.0/"/>
    <dc:contributor>Garcia, David</dc:contributor>
    <dc:creator>Di Natale, Anna</dc:creator>
    <dc:language>eng</dc:language>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/66557"/>
    <dc:contributor>Di Natale, Anna</dc:contributor>
  </rdf:Description>
</rdf:RDF>

University bibliography
Yes
Peer-reviewed
Yes
Description of research data
The data and materials for all experiments