Evaluation of header metadata extraction approaches and tools for scientific PDF documents

dc.contributor.author: Lipinski, Mario
dc.contributor.author: Yao, Kevin
dc.contributor.author: Breitinger, Corinna
dc.contributor.author: Beel, Joeran
dc.contributor.author: Gipp, Bela
dc.date.accessioned: 2015-05-13T09:26:12Z
dc.date.available: 2015-05-13T09:26:12Z
dc.date.issued: 2013 [eng]
dc.description.abstract: This paper evaluates the performance of tools for the extraction of metadata from scientific articles. Accurate metadata extraction is an important task for automating the management of digital libraries. This comparative study is a guide for developers looking to integrate the most suitable and effective metadata extraction tool into their software. We shed light on the strengths and weaknesses of seven tools in common use. In our evaluation using papers from the arXiv collection, GROBID delivered the best results, followed by Mendeley Desktop. SciPlore Xtract, PDFMeat, and SVMHeaderParse also delivered good results depending on the metadata type to be extracted. [eng]
dc.description.version: published
dc.identifier.doi: 10.1145/2467696.2467753 [eng]
dc.identifier.ppn: 444399178
dc.identifier.uri: http://kops.uni-konstanz.de/handle/123456789/30950
dc.language.iso: eng [eng]
dc.rights: terms-of-use
dc.rights.uri: https://rightsstatements.org/page/InC/1.0/
dc.subject: Information Retrieval, Metadata Extraction, Evaluation, PDF [eng]
dc.subject.ddc: 004 [eng]
dc.title: Evaluation of header metadata extraction approaches and tools for scientific PDF documents [eng]
dc.type: INPROCEEDINGS [eng]
dspace.entity.type: Publication
kops.citation.bibtex:
@inproceedings{Lipinski2013Evalu-30950,
  year={2013},
  doi={10.1145/2467696.2467753},
  title={Evaluation of header metadata extraction approaches and tools for scientific PDF documents},
  isbn={978-1-4503-2077-1},
  publisher={ACM},
  address={New York},
  booktitle={Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries},
  pages={385--386},
  editor={J. Stephen Downie},
  author={Lipinski, Mario and Yao, Kevin and Breitinger, Corinna and Beel, Joeran and Gipp, Bela}
}
kops.citation.iso690: LIPINSKI, Mario, Kevin YAO, Corinna BREITINGER, Joeran BEEL, Bela GIPP, 2013. Evaluation of header metadata extraction approaches and tools for scientific PDF documents. JCDL '13. Indianapolis, Jul 22, 2013 - Jul 26, 2013. In: J. STEPHEN DOWNIE, ed. Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries. New York: ACM, 2013, pp. 385-386. ISBN 978-1-4503-2077-1. Available under: doi: 10.1145/2467696.2467753 [eng]
kops.citation.rdf:
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/30950">
    <dc:contributor>Yao, Kevin</dc:contributor>
    <dcterms:abstract xml:lang="eng">This paper evaluates the performance of tools for the extraction of metadata from scientific articles. Accurate metadata extraction is an important task for automating the management of digital libraries. This comparative study is a guide for developers looking to integrate the most suitable and effective metadata extraction tool into their software. We shed light on the strengths and weaknesses of seven tools in common use. In our evaluation using papers from the arXiv collection, GROBID delivered the best results, followed by Mendeley Desktop. SciPlore Xtract, PDFMeat, and SVMHeaderParse also delivered good results depending on the metadata type to be extracted.</dcterms:abstract>
    <dcterms:issued>2013</dcterms:issued>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dc:rights>terms-of-use</dc:rights>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/30950/1/Lipinski_0-285622.pdf"/>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2015-05-13T09:26:12Z</dcterms:available>
    <dc:contributor>Beel, Joeran</dc:contributor>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dcterms:title>Evaluation of header metadata extraction approaches and tools for scientific PDF documents</dcterms:title>
    <bibo:uri rdf:resource="http://kops.uni-konstanz.de/handle/123456789/30950"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2015-05-13T09:26:12Z</dc:date>
    <dc:contributor>Gipp, Bela</dc:contributor>
    <dc:contributor>Breitinger, Corinna</dc:contributor>
    <dc:contributor>Lipinski, Mario</dc:contributor>
    <dc:creator>Breitinger, Corinna</dc:creator>
    <dc:creator>Gipp, Bela</dc:creator>
    <dc:language>eng</dc:language>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/30950/1/Lipinski_0-285622.pdf"/>
    <dc:creator>Beel, Joeran</dc:creator>
    <dc:creator>Yao, Kevin</dc:creator>
    <dc:creator>Lipinski, Mario</dc:creator>
  </rdf:Description>
</rdf:RDF>
kops.conferencefield: JCDL '13, Jul 22, 2013 - Jul 26, 2013, Indianapolis
kops.date.conferenceEnd: 2013-07-26 [eng]
kops.date.conferenceStart: 2013-07-22 [eng]
kops.description.openAccess: openaccessgreen
kops.flag.knbibliography: false
kops.identifier.nbn: urn:nbn:de:bsz:352-0-285622
kops.location.conference: Indianapolis [eng]
kops.sourcefield: J. STEPHEN DOWNIE, ed. <i>Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries</i>. New York: ACM, 2013, pp. 385-386. ISBN 978-1-4503-2077-1. Available under: doi: 10.1145/2467696.2467753 [deu]
kops.sourcefield.plain: J. STEPHEN DOWNIE, ed. Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries. New York: ACM, 2013, pp. 385-386. ISBN 978-1-4503-2077-1. Available under: doi: 10.1145/2467696.2467753 [eng]
kops.title.conference: JCDL '13 [eng]
relation.isAuthorOfPublication: ebdceabd-fdd9-44b2-b7bb-57ebea6b5574
relation.isAuthorOfPublication: 358ad52f-dab7-4582-bf8e-8adcf477a2d4
relation.isAuthorOfPublication.latestForDiscovery: ebdceabd-fdd9-44b2-b7bb-57ebea6b5574
source.bibliographicInfo.fromPage: 385 [eng]
source.bibliographicInfo.toPage: 386 [eng]
source.contributor.editor: J. Stephen Downie [eng]
source.identifier.isbn: 978-1-4503-2077-1 [eng]
source.publisher: ACM [eng]
source.publisher.location: New York [eng]
source.title: Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries [eng]
temp.internal.duplicates: No duplicates found. Last check: 08.04.2015 10:06:41

Files

Original bundle

Name: Lipinski_0-285622.pdf
Size: 690.58 KB
Format: Adobe Portable Document Format
Description: Lipinski_0-285622.pdf
Downloads: 1490