Publikation: Reducing computational effort for plagiarism detection by using citation characteristics to limit retrieval space
Dateien
Datum
Autor:innen
Herausgeber:innen
ISSN der Zeitschrift
Electronic ISSN
ISBN
Bibliografische Daten
Verlag
Schriftenreihe
Auflagebezeichnung
DOI (zitierfähiger Link)
Internationale Patentnummer
Angaben zur Forschungsförderung
Projekt
Open Access-Veröffentlichung
Core Facility der Universität Konstanz
Titel in einer weiteren Sprache
Publikationstyp
Publikationsstatus
Erschienen in
Zusammenfassung
This paper proposes a hybrid approach to plagiarism detection in academic documents that integrates detection methods using citations, semantic argument structure, and semantic word similarity with character-based methods to achieve a higher detection performance for disguised plagiarism forms. Currently available software for plagiarism detection exclusively performs text string comparisons. These systems find copies, but fail to identify disguised plagiarism, such as paraphrases, translations, or idea plagiarism. Detection approaches that consider semantic similarity on word and sentence level exist and have consistently achieved higher detection accuracy for disguised plagiarism forms compared to character-based approaches. However, the high computational effort of these semantic approaches makes them infeasible for use in real-world plagiarism detection scenarios. The proposed hybrid approach uses citation-based methods as a preliminary heuristic to reduce the retrieval space with a relatively low loss in detection accuracy. This preliminary step can then be followed by a computationally more expensive semantic and character-based analysis. We show that such a hybrid approach allows semantic plagiarism detection to become feasible even on large collections for the first time.
Zusammenfassung in einer weiteren Sprache
Fachgebiet (DDC)
Schlagwörter
Konferenz
Rezension
Zitieren
ISO 690
MEUSCHKE, Norman, Bela GIPP, 2014. Reducing computational effort for plagiarism detection by using citation characteristics to limit retrieval space. 2014 IEEE/ACM Joint Conference on Digital Libraries (JCDL). London, 8. Sept. 2014 - 12. Sept. 2014. In: Proceedings of the 2014 IEEE/ACM Joint Conference on Digital Libraries (JCDL) : 8th - 12th September 2014, City University London, United Kingdom. Piscataway, NJ: IEEE, 2014, pp. 197-200. ISBN 978-1-4799-5569-5. Available under: doi: 10.1109/JCDL.2014.6970168BibTex
@inproceedings{Meuschke2014Reduc-30317, year={2014}, doi={10.1109/JCDL.2014.6970168}, title={Reducing computational effort for plagiarism detection by using citation characteristics to limit retrieval space}, isbn={978-1-4799-5569-5}, publisher={IEEE}, address={Piscataway, NJ}, booktitle={Proceedings of the 2014 IEEE/ACM Joint Conference on Digital Libraries (JCDL) : 8th - 12th September 2014, City University London, United Kingdom}, pages={197--200}, author={Meuschke, Norman and Gipp, Bela} }
RDF
<rdf:RDF xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bibo="http://purl.org/ontology/bibo/" xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:void="http://rdfs.org/ns/void#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/30317"> <foaf:homepage rdf:resource="http://localhost:8080/"/> <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2015-03-16T15:48:08Z</dc:date> <dc:creator>Meuschke, Norman</dc:creator> <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2015-03-16T15:48:08Z</dcterms:available> <dc:language>eng</dc:language> <dcterms:abstract xml:lang="eng">This paper proposes a hybrid approach to plagiarism detection in academic documents that integrates detection methods using citations, semantic argument structure, and semantic word similarity with character-based methods to achieve a higher detection performance for disguised plagiarism forms. Currently available software for plagiarism detection exclusively performs text string comparisons. These systems find copies, but fail to identify disguised plagiarism, such as paraphrases, translations, or idea plagiarism. Detection approaches that consider semantic similarity on word and sentence level exist and have consistently achieved higher detection accuracy for disguised plagiarism forms compared to character-based approaches. However, the high computational effort of these semantic approaches makes them infeasible for use in real-world plagiarism detection scenarios. The proposed hybrid approach uses citation-based methods as a preliminary heuristic to reduce the retrieval space with a relatively low loss in detection accuracy. This preliminary step can then be followed by a computationally more expensive semantic and character-based analysis. We show that such a hybrid approach allows semantic plagiarism detection to become feasible even on large collections for the first time.</dcterms:abstract> <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/> <dc:contributor>Gipp, Bela</dc:contributor> <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/> <dcterms:issued>2014</dcterms:issued> <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/> <bibo:uri rdf:resource="http://kops.uni-konstanz.de/handle/123456789/30317"/> <dc:creator>Gipp, Bela</dc:creator> <dc:contributor>Meuschke, Norman</dc:contributor> <dcterms:title>Reducing computational effort for plagiarism detection by using citation characteristics to limit retrieval space</dcterms:title> </rdf:Description> </rdf:RDF>