KOPS - The Institutional Repository of the University of Konstanz

Citation-based Plagiarism Detection : Applying Citation Pattern Analysis to Identify Currently Non-Machine-Detectable Disguised Plagiarism in Scientific Publications



GIPP, Bela, 2013. Citation-based Plagiarism Detection : Applying Citation Pattern Analysis to Identify Currently Non-Machine-Detectable Disguised Plagiarism in Scientific Publications [Dissertation]. Magdeburg: Otto-von-Guericke-Universität Magdeburg

@phdthesis{Gipp2013Citat-31399,
  title={Citation-based Plagiarism Detection : Applying Citation Pattern Analysis to Identify Currently Non-Machine-Detectable Disguised Plagiarism in Scientific Publications},
  year={2013},
  address={Magdeburg},
  school={Otto-von-Guericke-Universität Magdeburg},
  author={Gipp, Bela},
  note={The book version of the thesis is available from Springer Vieweg Research: http://dx.doi.org/10.1007/978-3-658-06394-8}
}

Author: Gipp, Bela. Year: 2013. Language: eng. Record timestamp: 2015-07-08T11:17:53Z

This doctoral thesis addresses a problem in information retrieval that has recently captured media attention: the software-based detection of disguised forms of plagiarism. State-of-the-art plagiarism detection approaches can identify copy & paste and, to some extent, lightly disguised plagiarism. However, even today’s best-performing systems cannot reliably identify more heavily disguised forms of plagiarism, including paraphrases, translated plagiarism, and idea plagiarism. As a result, a large percentage of disguised scientific plagiarism goes undetected. While the easily recognizable copy & paste-type plagiarism typically occurs among students and has no serious consequences for society, disguised plagiarism in the sciences can jeopardize patient safety, for example when medical studies are plagiarized by copying results without performing the corresponding experiments.

To address this weakness of plagiarism detection systems, this thesis introduces Citation-based Plagiarism Detection (CbPD). Unlike existing character-based approaches, which perform text comparisons, CbPD does not consider text similarity alone, but uses citation patterns within documents as a unique, language-independent "semantic fingerprint" to identify potentially suspicious similarity among texts. The idea for CbPD originated from the observation that plagiarists commonly disguise academic misconduct by paraphrasing copied text, but typically do not substitute or significantly rearrange the citations. Motivated by this observation, the author developed various CbPD algorithms tailored to the different forms of plagiarism, and implemented them in the first citation-based plagiarism detection prototype capable of detecting heavily disguised plagiarism.

The advantages of the CbPD approach were demonstrated in evaluations using three document collections. CbPD’s applicability for detecting strongly disguised plagiarism was first demonstrated using the plagiarized thesis of the former German Minister of Defense, K.-T. zu Guttenberg. While conventional approaches failed to detect a single instance of translated plagiarism in this thesis, CbPD identified 13 of the 16 translations. The effectiveness of the approach was further demonstrated when it was applied to other authors and plagiarism forms documented in the VroniPlag Wiki. The practicality of the CbPD approach was demonstrated by the successful identification of several plagiarism cases in the biomedical publication collection PubMed Central Open Access Subset. As a result of a user study utilizing the CbPD prototype, several plagiarism investigations have thus far been initiated. One medical study and a plagiarized medical case report have since been retracted.

The evaluation also showed that CbPD’s visualization of citation pattern similarities facilitates the verification of plagiarism. Additionally, CbPD was shown to be more computationally efficient than existing approaches and to produce significantly fewer false positives. CbPD is not a substitute for, but rather a complement to, existing approaches. Combining CbPD with current approaches into a hybrid system promises to ensure optimal detection of both short literal plagiarism and heavily disguised or translated plagiarism.
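The abstract does not spell out the CbPD algorithms, so the following is only a minimal sketch of the general idea of comparing citation order instead of text. It assumes each document has already been reduced to an ordered list of citation identifiers, and it uses a plain longest-common-subsequence measure as a stand-in. The function names, the normalization by the shorter sequence, and the sample data are all illustrative assumptions, not taken from the thesis.

```python
from typing import Sequence


def longest_common_citation_sequence(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two citation sequences.

    Assumption: citations are represented by identifiers that are equal
    whenever two documents cite the same source, regardless of the
    language of the surrounding text.
    """
    # Standard dynamic-programming LCS, keeping only the previous row.
    prev = [0] * (len(b) + 1)
    for ref_a in a:
        curr = [0]
        for j, ref_b in enumerate(b, start=1):
            if ref_a == ref_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def citation_pattern_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Shared citation order as a fraction of the shorter sequence (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return longest_common_citation_sequence(a, b) / min(len(a), len(b))


# Hypothetical citation sequences, in order of appearance in each document.
suspicious = ["smith2001", "lee1999", "kim2005", "garcia2010", "chen2008"]
source = ["smith2001", "lee1999", "patel2003", "kim2005", "chen2008"]
unrelated = ["wong2012", "smith2001", "olsen1995"]

score_vs_source = citation_pattern_similarity(suspicious, source)
score_vs_unrelated = citation_pattern_similarity(suspicious, unrelated)
```

In this toy data the suspicious document shares four citations in the same order with the source but only one with the unrelated document, so the first score is higher even though no text was compared. A real system would also need citation extraction and matching, plus thresholds chosen on real data.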
