Publikation:

Word Embeddings for Entity-Annotated Texts

Lade...
Vorschaubild

Dateien

Zu diesem Dokument gibt es keine Dateien.

Datum

2019

Autor:innen

Almasian, Satya
Gertz, Michael

Herausgeber:innen

Kontakt

ISSN der Zeitschrift

Electronic ISSN

ISBN

Bibliografische Daten

Verlag

Schriftenreihe

Auflagebezeichnung

URI (zitierfähiger Link)
ArXiv-ID

Internationale Patentnummer

Angaben zur Forschungsförderung

Projekt

Open Access-Veröffentlichung
Core Facility der Universität Konstanz

Gesperrt bis

Titel in einer weiteren Sprache

Publikationstyp
Beitrag zu einem Konferenzband
Publikationsstatus
Published

Erschienen in

AZZOPARDI, Leif, ed., Benno STEIN, ed., Norberg FUHR, ed. and others. Advances in information retrieval : 41st European Conference on IR Research, ECIR 2019, Proceedings, Part I. Cham: Springer, 2019, pp. 307-322. Lecture Notes in Computer Science. 11437. ISSN 0302-9743. eISSN 1611-3349. ISBN 978-3-030-15711-1. Available under: doi: 10.1007/978-3-030-15712-8_20

Zusammenfassung

Learned vector representations of words are useful tools for many information retrieval and natural language processing tasks due to their ability to capture lexical semantics. However, while many such tasks involve or even rely on named entities as central components, popular word embedding models have so far failed to include entities as first-class citizens. While it seems intuitive that annotating named entities in the training corpus should result in more intelligent word features for downstream tasks, performance issues arise when popular embedding approaches are naïvely applied to entity annotated corpora. Not only are the resulting entity embeddings less useful than expected, but one also finds that the performance of the non-entity word embeddings degrades in comparison to those trained on the raw, unannotated corpus. In this paper, we investigate approaches to jointly train word and entity embeddings on a large corpus with automatically annotated and linked entities. We discuss two distinct approaches to the generation of such embeddings, namely the training of state-of-the-art embeddings on raw-text and annotated versions of the corpus, as well as node embeddings of a co-occurrence graph representation of the annotated corpus. We compare the performance of annotated embeddings and classical word embeddings on a variety of word similarity, analogy, and clustering evaluation tasks, and investigate their performance in entity-specific tasks. Our findings show that it takes more than training popular word embedding models on an annotated corpus to create entity embeddings with acceptable performance on common test cases. Based on these results, we discuss how and when node embeddings of the co-occurrence graph representation of the text can restore the performance.

Zusammenfassung in einer weiteren Sprache

Fachgebiet (DDC)
004 Informatik

Schlagwörter

Word embeddings; Entity embeddings; Entity graph

Konferenz

Advances in Information Retrieval : 41st European Conference on IR Research, ECIR 2019, 14. Apr. 2019 - 18. Apr. 2019, Cologne, Germany
Rezension
undefined / . - undefined, undefined

Forschungsvorhaben

Organisationseinheiten

Zeitschriftenheft

Zugehörige Datensätze in KOPS

Zitieren

ISO 690ALMASIAN, Satya, Andreas SPITZ, Michael GERTZ, 2019. Word Embeddings for Entity-Annotated Texts. Advances in Information Retrieval : 41st European Conference on IR Research, ECIR 2019. Cologne, Germany, 14. Apr. 2019 - 18. Apr. 2019. In: AZZOPARDI, Leif, ed., Benno STEIN, ed., Norberg FUHR, ed. and others. Advances in information retrieval : 41st European Conference on IR Research, ECIR 2019, Proceedings, Part I. Cham: Springer, 2019, pp. 307-322. Lecture Notes in Computer Science. 11437. ISSN 0302-9743. eISSN 1611-3349. ISBN 978-3-030-15711-1. Available under: doi: 10.1007/978-3-030-15712-8_20
BibTex
@inproceedings{Almasian2019Embed-53927,
  year={2019},
  doi={10.1007/978-3-030-15712-8_20},
  title={Word Embeddings for Entity-Annotated Texts},
  number={11437},
  isbn={978-3-030-15711-1},
  issn={0302-9743},
  publisher={Springer},
  address={Cham},
  series={Lecture Notes in Computer Science},
  booktitle={Advances in information retrieval : 41st European Conference on IR Research, ECIR 2019, Proceedings, Part I},
  pages={307--322},
  editor={Azzopardi, Leif and Stein, Benno and Fuhr, Norberg},
  author={Almasian, Satya and Spitz, Andreas and Gertz, Michael}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/53927">
    <dc:creator>Almasian, Satya</dc:creator>
    <dc:contributor>Gertz, Michael</dc:contributor>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <dc:rights>terms-of-use</dc:rights>
    <dc:creator>Gertz, Michael</dc:creator>
    <dcterms:issued>2019</dcterms:issued>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2021-06-09T11:21:03Z</dc:date>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dc:contributor>Almasian, Satya</dc:contributor>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:creator>Spitz, Andreas</dc:creator>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2021-06-09T11:21:03Z</dcterms:available>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/53927"/>
    <dcterms:title>Word Embeddings for Entity-Annotated Texts</dcterms:title>
    <dc:contributor>Spitz, Andreas</dc:contributor>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dc:language>eng</dc:language>
    <dcterms:abstract xml:lang="eng">Learned vector representations of words are useful tools for many information retrieval and natural language processing tasks due to their ability to capture lexical semantics. However, while many such tasks involve or even rely on named entities as central components, popular word embedding models have so far failed to include entities as first-class citizens. While it seems intuitive that annotating named entities in the training corpus should result in more intelligent word features for downstream tasks, performance issues arise when popular embedding approaches are naïvely applied to entity annotated corpora. Not only are the resulting entity embeddings less useful than expected, but one also finds that the performance of the non-entity word embeddings degrades in comparison to those trained on the raw, unannotated corpus. In this paper, we investigate approaches to jointly train word and entity embeddings on a large corpus with automatically annotated and linked entities. We discuss two distinct approaches to the generation of such embeddings, namely the training of state-of-the-art embeddings on raw-text and annotated versions of the corpus, as well as node embeddings of a co-occurrence graph representation of the annotated corpus. We compare the performance of annotated embeddings and classical word embeddings on a variety of word similarity, analogy, and clustering evaluation tasks, and investigate their performance in entity-specific tasks. Our findings show that it takes more than training popular word embedding models on an annotated corpus to create entity embeddings with acceptable performance on common test cases. Based on these results, we discuss how and when node embeddings of the co-occurrence graph representation of the text can restore the performance.</dcterms:abstract>
  </rdf:Description>
</rdf:RDF>

Interner Vermerk

xmlui.Submission.submit.DescribeStep.inputForms.label.kops_note_fromSubmitter

Kontakt
URL der Originalveröffentl.

Prüfdatum der URL

Prüfungsdatum der Dissertation

Finanzierungsart

Kommentar zur Publikation

Allianzlizenz
Corresponding Authors der Uni Konstanz vorhanden
Internationale Co-Autor:innen
Universitätsbibliographie
Nein
Begutachtet
Diese Publikation teilen