Publikation:

Mixed monolingual homepage finding in 34 languages : the role of language script and search domain

Lade...
Vorschaubild

Dateien

Zu diesem Dokument gibt es keine Dateien.

Datum

2009

Autor:innen

Blanco, Roi
Lioma, Christina

Herausgeber:innen

Kontakt

ISSN der Zeitschrift

Electronic ISSN

ISBN

Bibliografische Daten

Verlag

Schriftenreihe

Auflagebezeichnung

ArXiv-ID

Internationale Patentnummer

Angaben zur Forschungsförderung

Projekt

Open Access-Veröffentlichung
Core Facility der Universität Konstanz

Gesperrt bis

Titel in einer weiteren Sprache

Publikationstyp
Zeitschriftenartikel
Publikationsstatus
Published

Erschienen in

Information retrieval. 2009, 12(3), pp. 324-351. ISSN 1386-4564. eISSN 1573-7659. Available under: doi: 10.1007/s10791-008-9082-8

Zusammenfassung

The information that is available or sought on the World Wide Web (Web) is increasingly multilingual. Information Retrieval systems, such as the freely available search engines on the Web, need to provide fair and equal access to this information, regardless of the language in which a query is written or where the query is posted from. In this work, we ask two questions: How do existing state of the art search engines deal with languages written in different alphabets (scripts)? Do local language-based search domains actually facilitate access to information? We conduct a thorough study on the effect of multilingual queries for homepage finding, where the aim of the retrieval system is to return only one document, namely the homepage described in the query. We evaluate the effect of multilingual queries in retrieval performance with regard to (i) the alphabet in which the queries are written (e.g., Latin, Russian, Arabic), and (ii) the language domain where the queries are posted (e.g., google.com, google.fr). We query four major freely available search engines with 764 queries in 34 different languages, and look for the correct homepage in the top retrieved results. In order to have fair multilingual experimental settings, we use an ontology that is comparable across languages and also representative of realistic Web searches: football premier leagues in different countries; the official team name represents our query, and the official team homepage represents the document to be retrieved. A series of thorough experiments involving over 10,000 runs, with queries both in their correct and in Latin characters, and also using both global-domain and local-domain searches, reveal that queries issued in the correct script of a language are more likely to be found and ranked in the top 3, while queries in non-Latin script languages which are however issued in Latin script are less likely to be found; also, queries issued to the correct local domain of a search engine, e.g., French queries to yahoo.fr, are likely to have better retrieval performance than queries issued to the global domain of a search engine. To our knowledge, this is the first Webretrieval study that uses such a wide range of languages.

Zusammenfassung in einer weiteren Sprache

Fachgebiet (DDC)
400 Sprachwissenschaft, Linguistik

Schlagwörter

multilingual information retrieval, web information retrieval, search engines and evaluation

Konferenz

Rezension
undefined / . - undefined, undefined

Forschungsvorhaben

Organisationseinheiten

Zeitschriftenheft

Zugehörige Datensätze in KOPS

Zitieren

ISO 690BLANCO, Roi, Christina LIOMA, 2009. Mixed monolingual homepage finding in 34 languages : the role of language script and search domain. In: Information retrieval. 2009, 12(3), pp. 324-351. ISSN 1386-4564. eISSN 1573-7659. Available under: doi: 10.1007/s10791-008-9082-8
BibTex
@article{Blanco2009Mixed-2720,
  year={2009},
  doi={10.1007/s10791-008-9082-8},
  title={Mixed monolingual homepage finding in 34 languages : the role of language script and search domain},
  number={3},
  volume={12},
  issn={1386-4564},
  journal={Information retrieval},
  pages={324--351},
  author={Blanco, Roi and Lioma, Christina}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/2720">
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <dc:rights>terms-of-use</dc:rights>
    <dcterms:title>Mixed monolingual homepage finding in 34 languages : the role of language script and search domain</dcterms:title>
    <dc:creator>Blanco, Roi</dc:creator>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2011-03-23T09:58:54Z</dcterms:available>
    <dc:language>eng</dc:language>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dcterms:abstract xml:lang="eng">The information that is available or sought on the World Wide Web (Web) is increasingly multilingual. Information Retrieval systems, such as the freely available search engines on the Web, need to provide fair and equal access to this information, regardless of the language in which a query is written or where the query is posted from. In this work, we ask two questions: How do existing state of the art search engines deal with languages written in different alphabets (scripts)? Do local language-based search domains actually facilitate access to information? We conduct a thorough study on the effect of multilingual queries for homepage finding, where the aim of the retrieval system is to return only one document, namely the homepage described in the query. We evaluate the effect of multilingual queries in retrieval performance with regard to (i) the alphabet in which the queries are written (e.g., Latin, Russian, Arabic), and (ii) the language domain where the queries are posted (e.g., google.com, google.fr). We query four major freely available search engines with 764 queries in 34 different languages, and look for the correct homepage in the top retrieved results. In order to have fair multilingual experimental settings, we use an ontology that is comparable across languages and also representative of realistic Web searches: football premier leagues in different countries; the official team name represents our query, and the official team homepage represents the document to be retrieved. A series of thorough experiments involving over 10,000 runs, with queries both in their correct and in Latin characters, and also using both global-domain and local-domain searches, reveal that queries issued in the correct script of a language are more likely to be found and ranked in the top 3, while queries in non-Latin script languages which are however issued in Latin script are less likely to be found; also, queries issued to the correct local domain of a search engine, e.g., French queries to yahoo.fr, are likely to have better retrieval performance than queries issued to the global domain of a search engine. To our knowledge, this is the first Webretrieval study that uses such a wide range of languages.</dcterms:abstract>
    <dcterms:bibliographicCitation>Publ. in: Information retrieval 12 (2009), 3, pp. 324-351</dcterms:bibliographicCitation>
    <dcterms:issued>2009</dcterms:issued>
    <dc:contributor>Lioma, Christina</dc:contributor>
    <dc:contributor>Blanco, Roi</dc:contributor>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/45"/>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <bibo:uri rdf:resource="http://kops.uni-konstanz.de/handle/123456789/2720"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2011-03-23T09:58:54Z</dc:date>
    <dc:creator>Lioma, Christina</dc:creator>
  </rdf:Description>
</rdf:RDF>

Interner Vermerk

xmlui.Submission.submit.DescribeStep.inputForms.label.kops_note_fromSubmitter

Kontakt
URL der Originalveröffentl.

Prüfdatum der URL

Prüfungsdatum der Dissertation

Finanzierungsart

Kommentar zur Publikation

Allianzlizenz
Corresponding Authors der Uni Konstanz vorhanden
Internationale Co-Autor:innen
Universitätsbibliographie
Begutachtet
Diese Publikation teilen