Publication: Quotebank: A Corpus of Quotations from a Decade of News
Abstract
We present Quotebank, an open corpus of 178 million quotations attributed to the speakers who uttered them, extracted from 162 million English news articles published between 2008 and 2020. In order to produce this Web-scale corpus, while at the same time benefiting from the performance of modern neural models, we introduce Quobert, a minimally supervised framework for extracting and attributing quotations from massive corpora. Quobert avoids the necessity of manually labeled input and instead exploits the redundancy of the corpus by bootstrapping from a single seed pattern to extract training data for fine-tuning a BERT-based model. Quobert is language- and corpus-agnostic and correctly attributes 86.9% of quotations in our experiments. Quotebank and Quobert are publicly available at https://doi.org/10.5281/zenodo.4277311.
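The bootstrapping idea described in the abstract (a single seed pattern, then the corpus's redundancy) can be illustrated with a minimal Python sketch. Everything concrete here is a hypothetical simplification rather than Quobert's actual method: the seed pattern `"<quotation>," <Name> said.`, the function names `seed_pairs` and `bootstrap_training_data`, and the exact-substring matching are all assumptions. The BERT fine-tuning step that consumes the resulting examples is omitted.

```python
import re
from collections import defaultdict

# HYPOTHETICAL seed pattern: '"<quotation>," <Capitalized Name> said.'
# Illustrative assumption only, not the seed pattern used by Quobert.
SEED = re.compile(
    r'"(?P<quote>[^"]+?),"\s+'
    r'(?P<speaker>[A-Z][a-z]+(?:\s[A-Z][a-z]+)*)\s+said\.'
)


def seed_pairs(sentences):
    """Step 1: apply the seed pattern to harvest (quotation -> speaker) pairs."""
    speakers = defaultdict(set)
    for sentence in sentences:
        for match in SEED.finditer(sentence):
            speakers[match.group("quote")].add(match.group("speaker"))
    # Keep only quotations the seed pattern attributes to exactly one speaker.
    return {q: next(iter(sp)) for q, sp in speakers.items() if len(sp) == 1}


def bootstrap_training_data(sentences):
    """Step 2: exploit redundancy. Every sentence containing an already
    attributed quotation becomes a weakly labeled (sentence, quote, speaker)
    example, even when its wording does not match the seed pattern."""
    pairs = seed_pairs(sentences)
    examples = []
    for sentence in sentences:
        for quote, speaker in pairs.items():
            if quote in sentence:
                examples.append((sentence, quote, speaker))
    return examples
```

For example, `bootstrap_training_data` turns `'"We will win this election," John Smith said.'` into a labeled example and also labels a differently phrased sentence such as `'John Smith told reporters: "We will win this election" on Monday.'` with the same speaker. Discarding quotations that the seed pattern assigns to more than one speaker is a design choice of this sketch: it gives up some recall in exchange for cleaner weak labels.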
Cite
ISO 690
VAUCHER, Timoté, Andreas SPITZ, Michele CATASTA, Robert WEST, 2021. Quotebank : A Corpus of Quotations from a Decade of News. WSDM '21 : The Fourteenth ACM International Conference on Web Search and Data Mining (Virtual Event), 8 March 2021 - 12 March 2021. In: LEWIN-EYTAN, Liane, ed., David CARMEL, ed., Elad YOM-TOV, ed. and others. WSDM '21 : Proceedings of the 14th ACM International Conference on Web Search and Data Mining. New York, NY: ACM, 2021, pp. 328-336. ISBN 978-1-4503-8297-7. Available under: doi: 10.1145/3437963.3441760
BibTeX
@inproceedings{Vaucher2021Quote-53926,
  year={2021},
  doi={10.1145/3437963.3441760},
  title={Quotebank : A Corpus of Quotations from a Decade of News},
  isbn={978-1-4503-8297-7},
  publisher={ACM},
  address={New York, NY},
  booktitle={WSDM '21 : Proceedings of the 14th ACM International Conference on Web Search and Data Mining},
  pages={328--336},
  editor={Lewin-Eytan, Liane and Carmel, David and Yom-Tov, Elad},
  author={Vaucher, Timoté and Spitz, Andreas and Catasta, Michele and West, Robert}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/53926">
    <dcterms:title>Quotebank : A Corpus of Quotations from a Decade of News</dcterms:title>
    <dc:creator>Catasta, Michele</dc:creator>
    <dcterms:issued>2021</dcterms:issued>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2021-06-09T11:00:10Z</dc:date>
    <dc:contributor>West, Robert</dc:contributor>
    <dc:creator>West, Robert</dc:creator>
    <dcterms:abstract xml:lang="eng">We present Quotebank, an open corpus of 178 million quotations attributed to the speakers who uttered them, extracted from 162 million English news articles published between 2008 and 2020. In order to produce this Web-scale corpus, while at the same time benefiting from the performance of modern neural models, we introduce Quobert, a minimally supervised framework for extracting and attributing quotations from massive corpora. Quobert avoids the necessity of manually labeled input and instead exploits the redundancy of the corpus by bootstrapping from a single seed pattern to extract training data for fine-tuning a BERT-based model. Quobert is language- and corpus agnostic and correctly attributes 86.9% of quotations in our experiments. Quotebank and Quobert are publicly available at https://doi.org/10.5281/zenodo.4277311.</dcterms:abstract>
    <dc:creator>Spitz, Andreas</dc:creator>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2021-06-09T11:00:10Z</dcterms:available>
    <dc:rights>terms-of-use</dc:rights>
    <dc:contributor>Vaucher, Timoté</dc:contributor>
    <dc:language>eng</dc:language>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dc:contributor>Catasta, Michele</dc:contributor>
    <dc:creator>Vaucher, Timoté</dc:creator>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:contributor>Spitz, Andreas</dc:contributor>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/53926"/>
  </rdf:Description>
</rdf:RDF>