Publikation:

Citation-based plagiarism detection : practicability on a large-scale scientific corpus

Lade...
Vorschaubild

Dateien

Gipp_0-283268.pdf
Gipp_0-283268.pdfGröße: 816.32 KBDownloads: 834

Datum

2014

Herausgeber:innen

Kontakt

ISSN der Zeitschrift

Electronic ISSN

ISBN

Bibliografische Daten

Verlag

Schriftenreihe

Auflagebezeichnung

DOI (zitierfähiger Link)
ArXiv-ID

Internationale Patentnummer

Angaben zur Forschungsförderung

Projekt

Open Access-Veröffentlichung
Open Access Green
Core Facility der Universität Konstanz

Gesperrt bis

Titel in einer weiteren Sprache

Publikationstyp
Zeitschriftenartikel
Publikationsstatus
Published

Erschienen in

Journal of the Association for Information Science and Technology. 2014, 65(8), pp. 1527-1540. ISSN 2330-1635. eISSN 2330-1643. Available under: doi: 10.1002/asi.23228

Zusammenfassung

The automated detection of plagiarism is an information retrieval task of increasing importance as the volume of readily accessible information on the web expands. A major shortcoming of current automated plagiarism detection approaches is their dependence on high character-based similarity. As a result, heavily disguised plagiarism forms, such as paraphrases, translated plagiarism, or structural and idea plagiarism, remain undetected. A recently proposed language-independent approach to plagiarism detection, Citation-based Plagiarism Detection (CbPD), allows the detection of semantic similarity even in the absence of text overlap by analyzing the citation placement in a document's full text to determine similarity. This article evaluates the performance of CbPD in detecting plagiarism with various degrees of disguise in a collection of 185,000 biomedical articles. We benchmark CbPD against two character-based detection approaches using a ground truth approximated in a user study. Our evaluation shows that the citation-based approach achieves superior ranking performance for heavily disguised plagiarism forms. Additionally, we demonstrate CbPD to be computationally more efficient than character-based approaches. Finally, upon combining the citation-based with the traditional character-based document similarity visualization methods in a hybrid detection prototype, we observe a reduction in the required user effort for document verification.

Zusammenfassung in einer weiteren Sprache

Fachgebiet (DDC)
004 Informatik

Schlagwörter

information retrieval; full text searching; plagiarism

Konferenz

Rezension
undefined / . - undefined, undefined

Forschungsvorhaben

Organisationseinheiten

Zeitschriftenheft

Zugehörige Datensätze in KOPS

Zitieren

ISO 690GIPP, Bela, Norman MEUSCHKE, Corinna BREITINGER, 2014. Citation-based plagiarism detection : practicability on a large-scale scientific corpus. In: Journal of the Association for Information Science and Technology. 2014, 65(8), pp. 1527-1540. ISSN 2330-1635. eISSN 2330-1643. Available under: doi: 10.1002/asi.23228
BibTex
@article{Gipp2014Citat-30291,
  year={2014},
  doi={10.1002/asi.23228},
  title={Citation-based plagiarism detection : practicability on a large-scale scientific corpus},
  number={8},
  volume={65},
  issn={2330-1635},
  journal={Journal of the Association for Information Science and Technology},
  pages={1527--1540},
  author={Gipp, Bela and Meuschke, Norman and Breitinger, Corinna}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/30291">
    <dc:creator>Breitinger, Corinna</dc:creator>
    <dc:contributor>Gipp, Bela</dc:contributor>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/30291/1/Gipp_0-283268.pdf"/>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dc:creator>Gipp, Bela</dc:creator>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:rights>terms-of-use</dc:rights>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2015-03-16T10:34:12Z</dc:date>
    <dcterms:issued>2014</dcterms:issued>
    <dcterms:title>Citation-based plagiarism detection : practicability on a large-scale scientific corpus</dcterms:title>
    <dcterms:abstract xml:lang="eng">The automated detection of plagiarism is an information retrieval task of increasing importance as the volume of readily accessible information on the web expands. A major shortcoming of current automated plagiarism detection approaches is their dependence on high character-based similarity. As a result, heavily disguised plagiarism forms, such as paraphrases, translated plagiarism, or structural and idea plagiarism, remain undetected. A recently proposed language-independent approach to plagiarism detection, Citation-based Plagiarism Detection (CbPD), allows the detection of semantic similarity even in the absence of text overlap by analyzing the citation placement in a document's full text to determine similarity. This article evaluates the performance of CbPD in detecting plagiarism with various degrees of disguise in a collection of 185,000 biomedical articles. We benchmark CbPD against two character-based detection approaches using a ground truth approximated in a user study. Our evaluation shows that the citation-based approach achieves superior ranking performance for heavily disguised plagiarism forms. Additionally, we demonstrate CbPD to be computationally more efficient than character-based approaches. Finally, upon combining the citation-based with the traditional character-based document similarity visualization methods in a hybrid detection prototype, we observe a reduction in the required user effort for document verification.</dcterms:abstract>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <bibo:uri rdf:resource="http://kops.uni-konstanz.de/handle/123456789/30291"/>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dc:language>eng</dc:language>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/30291/1/Gipp_0-283268.pdf"/>
    <dc:contributor>Meuschke, Norman</dc:contributor>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <dc:contributor>Breitinger, Corinna</dc:contributor>
    <dc:creator>Meuschke, Norman</dc:creator>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2015-03-16T10:34:12Z</dcterms:available>
  </rdf:Description>
</rdf:RDF>

Interner Vermerk

xmlui.Submission.submit.DescribeStep.inputForms.label.kops_note_fromSubmitter

Kontakt
URL der Originalveröffentl.

Prüfdatum der URL

Prüfungsdatum der Dissertation

Finanzierungsart

Kommentar zur Publikation

Allianzlizenz
Corresponding Authors der Uni Konstanz vorhanden
Internationale Co-Autor:innen
Universitätsbibliographie
Nein
Begutachtet
Diese Publikation teilen