Part of Speech Based Term Weighting for Information Retrieval

dc.contributor.author: Lioma, Christina [deu]
dc.contributor.author: Blanco, Roi [deu]
dc.date.accessioned: 2011-03-23T09:58:43Z [deu]
dc.date.available: 2011-03-23T09:58:43Z [deu]
dc.date.issued: 2009
dc.description.abstract: Automatic language processing tools typically assign so-called 'weights' to terms, corresponding to each term's contribution to information content. Traditionally, term weights are computed from lexical statistics, e.g., term frequencies. We propose a new type of term weight that is computed from part of speech (POS) n-gram statistics. The proposed POS-based term weight represents how informative a term is in general, based on the 'POS contexts' in which it generally occurs in language. We suggest five different computations of POS-based term weights by extending existing statistical approximations of term information measures. We apply these POS-based term weights to information retrieval by integrating them into the model that matches documents to queries. Experiments with two TREC collections and 300 queries, using TF-IDF and BM25 as baselines, show that integrating our POS-based term weights into retrieval always leads to gains (up to +33.7% over the baseline). Additional experiments with a different retrieval model as baseline (Language Model with Dirichlet prior smoothing) and our best-performing POS-based term weight show consistent retrieval gains across the whole smoothing range of the baseline. [eng]
dc.description.version: published
dc.identifier.citation: Publ. in: Advances in information retrieval: 31st European Conference on IR Research, ECIR 2009, Toulouse, France, April 6 - 9, 2009; proceedings / Mohand Boughanem ... (eds.). (= LNCS ; 5478) Berlin: Springer, 2009, pp. 412-423 [deu]
dc.identifier.doi: 10.1007/978-3-642-00958-7_37
dc.identifier.uri: http://kops.uni-konstanz.de/handle/123456789/2664
dc.language.iso: eng [deu]
dc.legacy.dateIssued: 2010 [deu]
dc.rights: terms-of-use [deu]
dc.rights.uri: https://rightsstatements.org/page/InC/1.0/ [deu]
dc.subject.ddc: 400 [deu]
dc.title: Part of Speech Based Term Weighting for Information Retrieval [eng]
dc.type: INPROCEEDINGS [deu]
dspace.entity.type: Publication
kops.citation.bibtex:
@inproceedings{Lioma2009Speec-2664,
  year={2009},
  doi={10.1007/978-3-642-00958-7_37},
  title={Part of Speech Based Term Weighting for Information Retrieval},
  number={5478},
  isbn={978-3-642-00957-0},
  publisher={Springer},
  address={Berlin},
  series={Lecture Notes in Computer Science},
  booktitle={Advances in Information Retrieval},
  pages={412--423},
  editor={Boughanem, Mohand and Berrut, Catherine and Mothe, Josiane and Soule-Dupuy, Chantal},
  author={Lioma, Christina and Blanco, Roi}
}
kops.citation.iso690: LIOMA, Christina, Roi BLANCO, 2009. Part of Speech Based Term Weighting for Information Retrieval. In: BOUGHANEM, Mohand, ed., Catherine BERRUT, ed., Josiane MOTHE, ed., Chantal SOULE-DUPUY, ed. Advances in Information Retrieval. Berlin: Springer, 2009, pp. 412-423. Lecture Notes in Computer Science. 5478. ISBN 978-3-642-00957-0. Available under: doi: 10.1007/978-3-642-00958-7_37 [deu]
kops.citation.rdf:
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/2664">
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
    <dc:contributor>Lioma, Christina</dc:contributor>
    <dc:language>eng</dc:language>
    <dc:rights>terms-of-use</dc:rights>
    <dc:contributor>Blanco, Roi</dc:contributor>
    <dcterms:abstract xml:lang="eng">Automatic language processing tools typically assign so-called 'weights' to terms, corresponding to each term's contribution to information content. Traditionally, term weights are computed from lexical statistics, e.g., term frequencies. We propose a new type of term weight that is computed from part of speech (POS) n-gram statistics. The proposed POS-based term weight represents how informative a term is in general, based on the 'POS contexts' in which it generally occurs in language. We suggest five different computations of POS-based term weights by extending existing statistical approximations of term information measures. We apply these POS-based term weights to information retrieval by integrating them into the model that matches documents to queries. Experiments with two TREC collections and 300 queries, using TF-IDF and BM25 as baselines, show that integrating our POS-based term weights into retrieval always leads to gains (up to +33.7% over the baseline). Additional experiments with a different retrieval model as baseline (Language Model with Dirichlet prior smoothing) and our best-performing POS-based term weight show consistent retrieval gains across the whole smoothing range of the baseline.</dcterms:abstract>
    <dcterms:title>Part of Speech Based Term Weighting for Information Retrieval</dcterms:title>
    <dcterms:issued>2009</dcterms:issued>
    <dc:creator>Blanco, Roi</dc:creator>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dcterms:bibliographicCitation>Publ. in: Advances in information retrieval: 31st European Conference on IR Research, ECIR 2009, Toulouse, France, April 6 - 9, 2009; proceedings / Mohand Boughanem ... (eds.). (= LNCS ; 5478) Berlin: Springer, 2009, pp. 412-423</dcterms:bibliographicCitation>
    <bibo:uri rdf:resource="http://kops.uni-konstanz.de/handle/123456789/2664"/>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
    <dc:creator>Lioma, Christina</dc:creator>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2011-03-23T09:58:43Z</dc:date>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2011-03-23T09:58:43Z</dcterms:available>
  </rdf:Description>
</rdf:RDF>
kops.identifier.nbn: urn:nbn:de:bsz:352-opus-110483 [deu]
kops.opus.id: 11048 [deu]
kops.sourcefield: BOUGHANEM, Mohand, ed., Catherine BERRUT, ed., Josiane MOTHE, ed., Chantal SOULE-DUPUY, ed. <i>Advances in Information Retrieval</i>. Berlin: Springer, 2009, pp. 412-423. Lecture Notes in Computer Science. 5478. ISBN 978-3-642-00957-0. Available under: doi: 10.1007/978-3-642-00958-7_37 [deu]
kops.sourcefield.plain: BOUGHANEM, Mohand, ed., Catherine BERRUT, ed., Josiane MOTHE, ed., Chantal SOULE-DUPUY, ed. Advances in Information Retrieval. Berlin: Springer, 2009, pp. 412-423. Lecture Notes in Computer Science. 5478. ISBN 978-3-642-00957-0. Available under: doi: 10.1007/978-3-642-00958-7_37 [deu]
source.bibliographicInfo.fromPage: 412
source.bibliographicInfo.seriesNumber: 5478
source.bibliographicInfo.toPage: 423
source.contributor.editor: Boughanem, Mohand
source.contributor.editor: Berrut, Catherine
source.contributor.editor: Mothe, Josiane
source.contributor.editor: Soule-Dupuy, Chantal
source.identifier.isbn: 978-3-642-00957-0
source.publisher: Springer
source.publisher.location: Berlin
source.relation.ispartofseries: Lecture Notes in Computer Science
source.title: Advances in Information Retrieval
