Evaluation of header metadata extraction approaches and tools for scientific PDF documents
| dc.contributor.author | Lipinski, Mario | |
| dc.contributor.author | Yao, Kevin | |
| dc.contributor.author | Breitinger, Corinna | |
| dc.contributor.author | Beel, Joeran | |
| dc.contributor.author | Gipp, Bela | |
| dc.date.accessioned | 2015-05-13T09:26:12Z | |
| dc.date.available | 2015-05-13T09:26:12Z | |
| dc.date.issued | 2013 | eng |
| dc.description.abstract | This paper evaluates the performance of tools for the extraction of metadata from scientific articles. Accurate metadata extraction is an important task for automating the management of digital libraries. This comparative study is a guide for developers looking to integrate the most suitable and effective metadata extraction tool into their software. We shed light on the strengths and weaknesses of seven tools in common use. In our evaluation using papers from the arXiv collection, GROBID delivered the best results, followed by Mendeley Desktop. SciPlore Xtract, PDFMeat, and SVMHeaderParse also delivered good results depending on the metadata type to be extracted. | eng |
| dc.description.version | published | |
| dc.identifier.doi | 10.1145/2467696.2467753 | eng |
| dc.identifier.ppn | 444399178 | |
| dc.identifier.uri | http://kops.uni-konstanz.de/handle/123456789/30950 | |
| dc.language.iso | eng | eng |
| dc.rights | terms-of-use | |
| dc.rights.uri | https://rightsstatements.org/page/InC/1.0/ | |
| dc.subject | Information Retrieval, Metadata Extraction, Evaluation, PDF | eng |
| dc.subject.ddc | 004 | eng |
| dc.title | Evaluation of header metadata extraction approaches and tools for scientific PDF documents | eng |
| dc.type | INPROCEEDINGS | eng |
| dspace.entity.type | Publication | |
| kops.citation.bibtex | @inproceedings{Lipinski2013Evalu-30950,
year={2013},
doi={10.1145/2467696.2467753},
title={Evaluation of header metadata extraction approaches and tools for scientific PDF documents},
isbn={978-1-4503-2077-1},
publisher={ACM},
address={New York},
booktitle={Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries},
pages={385--386},
editor={J. Stephen Downie},
author={Lipinski, Mario and Yao, Kevin and Breitinger, Corinna and Beel, Joeran and Gipp, Bela}
} | |
| kops.citation.iso690 | LIPINSKI, Mario, Kevin YAO, Corinna BREITINGER, Joeran BEEL, Bela GIPP, 2013. Evaluation of header metadata extraction approaches and tools for scientific PDF documents. JCDL '13. Indianapolis, 22. Juli 2013 - 26. Juli 2013. In: DOWNIE, J. Stephen, ed. Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries. New York: ACM, 2013, pp. 385-386. ISBN 978-1-4503-2077-1. Available under: doi: 10.1145/2467696.2467753 | deu |
| kops.citation.iso690 | LIPINSKI, Mario, Kevin YAO, Corinna BREITINGER, Joeran BEEL, Bela GIPP, 2013. Evaluation of header metadata extraction approaches and tools for scientific PDF documents. JCDL '13. Indianapolis, Jul 22, 2013 - Jul 26, 2013. In: DOWNIE, J. Stephen, ed. Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries. New York: ACM, 2013, pp. 385-386. ISBN 978-1-4503-2077-1. Available under: doi: 10.1145/2467696.2467753 | eng |
| kops.citation.rdf | <rdf:RDF
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:bibo="http://purl.org/ontology/bibo/"
xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:void="http://rdfs.org/ns/void#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#" >
<rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/30950">
<dc:contributor>Yao, Kevin</dc:contributor>
<dcterms:abstract xml:lang="eng">This paper evaluates the performance of tools for the extraction of metadata from scientific articles. Accurate metadata extraction is an important task for automating the management of digital libraries. This comparative study is a guide for developers looking to integrate the most suitable and effective metadata extraction tool into their software. We shed light on the strengths and weaknesses of seven tools in common use. In our evaluation using papers from the arXiv collection, GROBID delivered the best results, followed by Mendeley Desktop. SciPlore Xtract, PDFMeat, and SVMHeaderParse also delivered good results depending on the metadata type to be extracted.</dcterms:abstract>
<dcterms:issued>2013</dcterms:issued>
<dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
<dc:rights>terms-of-use</dc:rights>
<dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
<dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/30950/1/Lipinski_0-285622.pdf"/>
<dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2015-05-13T09:26:12Z</dcterms:available>
<dc:contributor>Beel, Joeran</dc:contributor>
<dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
<dcterms:title>Evaluation of header metadata extraction approaches and tools for scientific PDF documents</dcterms:title>
<bibo:uri rdf:resource="http://kops.uni-konstanz.de/handle/123456789/30950"/>
<dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2015-05-13T09:26:12Z</dc:date>
<dc:contributor>Gipp, Bela</dc:contributor>
<dc:contributor>Breitinger, Corinna</dc:contributor>
<dc:contributor>Lipinski, Mario</dc:contributor>
<dc:creator>Breitinger, Corinna</dc:creator>
<dc:creator>Gipp, Bela</dc:creator>
<dc:language>eng</dc:language>
<foaf:homepage rdf:resource="http://localhost:8080/"/>
<void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
<dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/30950/1/Lipinski_0-285622.pdf"/>
<dc:creator>Beel, Joeran</dc:creator>
<dc:creator>Yao, Kevin</dc:creator>
<dc:creator>Lipinski, Mario</dc:creator>
</rdf:Description>
</rdf:RDF> | |
| kops.conferencefield | JCDL '13, 22. Juli 2013 - 26. Juli 2013, Indianapolis | deu |
| kops.date.conferenceEnd | 2013-07-26 | eng |
| kops.date.conferenceStart | 2013-07-22 | eng |
| kops.description.openAccess | openaccessgreen | |
| kops.flag.knbibliography | false | |
| kops.identifier.nbn | urn:nbn:de:bsz:352-0-285622 | |
| kops.location.conference | Indianapolis | eng |
| kops.sourcefield | DOWNIE, J. Stephen, ed. <i>Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries</i>. New York: ACM, 2013, pp. 385-386. ISBN 978-1-4503-2077-1. Available under: doi: 10.1145/2467696.2467753 | deu |
| kops.sourcefield.plain | DOWNIE, J. Stephen, ed. Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries. New York: ACM, 2013, pp. 385-386. ISBN 978-1-4503-2077-1. Available under: doi: 10.1145/2467696.2467753 | deu |
| kops.sourcefield.plain | DOWNIE, J. Stephen, ed. Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries. New York: ACM, 2013, pp. 385-386. ISBN 978-1-4503-2077-1. Available under: doi: 10.1145/2467696.2467753 | eng |
| kops.title.conference | JCDL '13 | eng |
| relation.isAuthorOfPublication | ebdceabd-fdd9-44b2-b7bb-57ebea6b5574 | |
| relation.isAuthorOfPublication | 358ad52f-dab7-4582-bf8e-8adcf477a2d4 | |
| relation.isAuthorOfPublication.latestForDiscovery | ebdceabd-fdd9-44b2-b7bb-57ebea6b5574 | |
| source.bibliographicInfo.fromPage | 385 | eng |
| source.bibliographicInfo.toPage | 386 | eng |
| source.contributor.editor | J. Stephen Downie | eng |
| source.identifier.isbn | 978-1-4503-2077-1 | eng |
| source.publisher | ACM | eng |
| source.publisher.location | New York | eng |
| source.title | Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries | eng |
Files
Original bundle
- Name: Lipinski_0-285622.pdf
- Size: 690.58 KB
- Format: Adobe Portable Document Format
