Type of Publication: | Journal article |
URI (citable link): | http://nbn-resolving.de/urn:nbn:de:bsz:352-0-283268 |
Author: | Gipp, Bela; Meuschke, Norman; Breitinger, Corinna |
Year of publication: | 2014 |
Published in: | Journal of the Association for Information Science and Technology ; 65 (2014), 8. - pp. 1527-1540. - ISSN 2330-1635. - eISSN 2330-1643 |
DOI (citable link): | https://dx.doi.org/10.1002/asi.23228 |
Summary: | The automated detection of plagiarism is an information retrieval task of increasing importance as the volume of readily accessible information on the web expands. A major shortcoming of current automated plagiarism detection approaches is their dependence on high character-based similarity. As a result, heavily disguised plagiarism forms, such as paraphrases, translated plagiarism, or structural and idea plagiarism, remain undetected. A recently proposed language-independent approach to plagiarism detection, Citation-based Plagiarism Detection (CbPD), allows the detection of semantic similarity even in the absence of text overlap by analyzing the citation placement in a document's full text to determine similarity. This article evaluates the performance of CbPD in detecting plagiarism with various degrees of disguise in a collection of 185,000 biomedical articles. We benchmark CbPD against two character-based detection approaches using a ground truth approximated in a user study. Our evaluation shows that the citation-based approach achieves superior ranking performance for heavily disguised plagiarism forms. Additionally, we demonstrate CbPD to be computationally more efficient than character-based approaches. Finally, upon combining the citation-based with the traditional character-based document similarity visualization methods in a hybrid detection prototype, we observe a reduction in the required user effort for document verification. |
Subject (DDC): | 004 Computer Science |
Keywords: | information retrieval; full text searching; plagiarism |
Link to License: | In Copyright |
GIPP, Bela, Norman MEUSCHKE, Corinna BREITINGER, 2014. Citation-based plagiarism detection : practicability on a large-scale scientific corpus. In: Journal of the Association for Information Science and Technology. 65(8), pp. 1527-1540. ISSN 2330-1635. eISSN 2330-1643. Available under: doi: 10.1002/asi.23228
@article{Gipp2014Citat-30291,
  title={Citation-based plagiarism detection : practicability on a large-scale scientific corpus},
  year={2014},
  doi={10.1002/asi.23228},
  number={8},
  volume={65},
  issn={2330-1635},
  journal={Journal of the Association for Information Science and Technology},
  pages={1527--1540},
  author={Gipp, Bela and Meuschke, Norman and Breitinger, Corinna}
}
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" >
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/rdf/resource/123456789/30291">
    <dc:contributor>Gipp, Bela</dc:contributor>
    <dcterms:title>Citation-based plagiarism detection : practicability on a large-scale scientific corpus</dcterms:title>
    <dc:creator>Meuschke, Norman</dc:creator>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2015-03-16T10:34:12Z</dcterms:available>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/rdf/resource/123456789/36"/>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/rdf/resource/123456789/36"/>
    <dc:creator>Gipp, Bela</dc:creator>
    <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/>
    <dc:contributor>Meuschke, Norman</dc:contributor>
    <dc:language>eng</dc:language>
    <bibo:uri rdf:resource="http://kops.uni-konstanz.de/handle/123456789/30291"/>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dc:creator>Breitinger, Corinna</dc:creator>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/30291/1/Gipp_0-283268.pdf"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2015-03-16T10:34:12Z</dc:date>
    <dc:rights>terms-of-use</dc:rights>
    <dcterms:abstract xml:lang="eng">The automated detection of plagiarism is an information retrieval task of increasing importance as the volume of readily accessible information on the web expands. A major shortcoming of current automated plagiarism detection approaches is their dependence on high character-based similarity. As a result, heavily disguised plagiarism forms, such as paraphrases, translated plagiarism, or structural and idea plagiarism, remain undetected. A recently proposed language-independent approach to plagiarism detection, Citation-based Plagiarism Detection (CbPD), allows the detection of semantic similarity even in the absence of text overlap by analyzing the citation placement in a document's full text to determine similarity. This article evaluates the performance of CbPD in detecting plagiarism with various degrees of disguise in a collection of 185,000 biomedical articles. We benchmark CbPD against two character-based detection approaches using a ground truth approximated in a user study. Our evaluation shows that the citation-based approach achieves superior ranking performance for heavily disguised plagiarism forms. Additionally, we demonstrate CbPD to be computationally more efficient than character-based approaches. Finally, upon combining the citation-based with the traditional character-based document similarity visualization methods in a hybrid detection prototype, we observe a reduction in the required user effort for document verification.</dcterms:abstract>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/30291/1/Gipp_0-283268.pdf"/>
    <dcterms:issued>2014</dcterms:issued>
    <dc:contributor>Breitinger, Corinna</dc:contributor>
    <foaf:homepage rdf:resource="http://localhost:8080/jspui"/>
  </rdf:Description>
</rdf:RDF>
Gipp_0-283268.pdf | 612 |