KOPS - The Institutional Repository of the University of Konstanz

SciPlore Xtract : Extracting Titles from Scientific PDF Documents by Analyzing Style Information (Font Size)

Cite

Files for this resource

Checksum: MD5:a45e424798d147f0a797a2b30ac1f772

BEEL, Jöran, Bela GIPP, Ammar SHAKER, Nick FRIEDRICH, 2010. SciPlore Xtract : Extracting Titles from Scientific PDF Documents by Analyzing Style Information (Font Size). ECDL 2010. Glasgow, 6 Sep 2010 - 10 Sep 2010. In: Mounia LALMAS, ed. and others. Research and advanced technology for digital libraries : 14th European Conference, ECDL 2010, Glasgow, UK, September 6 - 10, 2010; proceedings. Berlin [u.a.]: Springer, pp. 413-416. ISBN 978-3-642-15463-8

@inproceedings{Beel2010SciPl-30892,
  title={SciPlore Xtract : Extracting Titles from Scientific PDF Documents by Analyzing Style Information (Font Size)},
  year={2010},
  doi={10.1007/978-3-642-15464-5_45},
  number={6273},
  isbn={978-3-642-15463-8},
  address={Berlin [u.a.]},
  publisher={Springer},
  series={Lecture Notes in Computer Science},
  booktitle={Research and advanced technology for digital libraries :14th European Conference, ECDL 2010, Glasgow, UK, September 6 - 10, 2010; proceedings},
  pages={413--416},
  editor={Mounia Lalmas},
  author={Beel, Jöran and Gipp, Bela and Shaker, Ammar and Friedrich, Nick}
}

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bibo="http://purl.org/ontology/bibo/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/rdf/resource/123456789/30892">
    <dc:contributor>Gipp, Bela</dc:contributor>
    <bibo:uri rdf:resource="http://kops.uni-konstanz.de/handle/123456789/30892"/>
    <dc:creator>Friedrich, Nick</dc:creator>
    <dc:creator>Beel, Jöran</dc:creator>
    <dc:contributor>Friedrich, Nick</dc:contributor>
    <dcterms:rights rdf:resource="http://nbn-resolving.de/urn:nbn:de:bsz:352-20150305140228786-3747162-5"/>
    <dc:creator>Gipp, Bela</dc:creator>
    <dc:creator>Shaker, Ammar</dc:creator>
    <dcterms:title>SciPlore Xtract : Extracting Titles from Scientific PDF Documents by Analyzing Style Information (Font Size)</dcterms:title>
    <dc:contributor>Beel, Jöran</dc:contributor>
    <dc:contributor>Shaker, Ammar</dc:contributor>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2015-05-06T09:18:48Z</dc:date>
    <dcterms:issued>2010</dcterms:issued>
    <dc:language>eng</dc:language>
    <dcterms:abstract xml:lang="eng">Extracting titles from a PDF’s full text is an important task in information retrieval to identify PDFs. Existing approaches apply complicated and expensive (in terms of calculating power) machine learning algorithms such as Support Vector Machines and Conditional Random Fields. In this paper we present a simple rule based heuristic, which considers style information (font size) to identify a PDF’s title. In a first experiment we show that this heuristic delivers better results (77.9% accuracy) than a support vector machine by CiteSeer (69.4% accuracy) in an ‘academic search engine’ scenario and better run times (8:19 minutes vs. 57:26 minutes).</dcterms:abstract>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2015-05-06T09:18:48Z</dcterms:available>
  </rdf:Description>
</rdf:RDF>
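
The abstract describes a rule-based heuristic that uses font size to find a PDF's title. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes the first page has already been parsed into (text, font size) pairs by some PDF library, and it takes the first run of consecutive lines set in the largest font as the title. The function name `guess_title`, the `Line` type, and the `tolerance` parameter are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical input type: one (text, font_size) pair per text line on page 1.
Line = Tuple[str, float]


def guess_title(lines: List[Line], tolerance: float = 0.1) -> str:
    """Return the first run of consecutive lines set in the largest font size.

    Illustrative sketch only; the paper's actual rules are not given in the
    abstract and may differ.
    """
    # Ignore empty lines so whitespace cannot become a title candidate.
    candidates = [(text.strip(), size) for text, size in lines if text.strip()]
    if not candidates:
        return ""

    largest = max(size for _, size in candidates)

    title_parts: List[str] = []
    for text, size in candidates:
        if abs(size - largest) <= tolerance:
            title_parts.append(text)
        elif title_parts:
            # The run of largest-font lines has ended; stop collecting.
            break
    return " ".join(title_parts)


if __name__ == "__main__":
    first_page = [
        ("Proceedings of ECDL 2010", 9.0),
        ("SciPlore Xtract: Extracting Titles", 18.0),
        ("from Scientific PDF Documents", 18.0),
        ("Jöran Beel, Bela Gipp", 11.0),
        ("Abstract", 12.0),
    ]
    print(guess_title(first_page))
```

A rule like this needs no training data and only one pass over the first page, which is consistent with the short run times the abstract reports. It would fail on documents where something other than the title (for example a journal banner) uses the largest font.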

File downloads since 06.05.2015

Beel_0-285747.pdf 137
