Publication:

Reducing computational effort for plagiarism detection by using citation characteristics to limit retrieval space

Files

There are no files associated with this document.

Date

2014

Publication type
Contribution to conference proceedings
Publication status
Published

Published in

Proceedings of the 2014 IEEE/ACM Joint Conference on Digital Libraries (JCDL) : 8th - 12th September 2014, City University London, United Kingdom. Piscataway, NJ: IEEE, 2014, pp. 197-200. ISBN 978-1-4799-5569-5. Available under: doi: 10.1109/JCDL.2014.6970168

Abstract

This paper proposes a hybrid approach to plagiarism detection in academic documents that integrates detection methods using citations, semantic argument structure, and semantic word similarity with character-based methods to achieve a higher detection performance for disguised plagiarism forms. Currently available software for plagiarism detection exclusively performs text string comparisons. These systems find copies, but fail to identify disguised plagiarism, such as paraphrases, translations, or idea plagiarism. Detection approaches that consider semantic similarity on word and sentence level exist and have consistently achieved higher detection accuracy for disguised plagiarism forms compared to character-based approaches. However, the high computational effort of these semantic approaches makes them infeasible for use in real-world plagiarism detection scenarios. The proposed hybrid approach uses citation-based methods as a preliminary heuristic to reduce the retrieval space with a relatively low loss in detection accuracy. This preliminary step can then be followed by a computationally more expensive semantic and character-based analysis. We show that such a hybrid approach allows semantic plagiarism detection to become feasible even on large collections for the first time.
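
The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which citation characteristics are used, so the filter below assumes a simple count of shared references, and a character 3-gram Jaccard score stands in for the more expensive semantic and character-based analysis. All function names, parameters, and the `min_shared` threshold are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def filter_by_shared_references(
    query_refs: Set[str],
    collection_refs: Dict[str, Set[str]],
    min_shared: int = 2,
) -> List[str]:
    """Cheap first stage: keep documents sharing enough references with the query.

    Assumption: a shared-reference count stands in for the citation
    characteristics mentioned in the abstract.
    """
    return [
        doc_id
        for doc_id, refs in collection_refs.items()
        if len(query_refs & refs) >= min_shared
    ]


def char_ngram_jaccard(a: str, b: str, n: int = 3) -> float:
    """Stand-in for the expensive second stage: Jaccard overlap of character n-grams."""
    grams_a = {a[i:i + n] for i in range(len(a) - n + 1)}
    grams_b = {b[i:i + n] for i in range(len(b) - n + 1)}
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0


def rank_candidates(
    query_text: str,
    query_refs: Set[str],
    texts: Dict[str, str],
    collection_refs: Dict[str, Set[str]],
    min_shared: int = 2,
) -> List[Tuple[str, float]]:
    """Run the costly comparison only on documents that pass the citation filter."""
    candidates = filter_by_shared_references(query_refs, collection_refs, min_shared)
    scored = [
        (doc_id, char_ngram_jaccard(query_text, texts[doc_id]))
        for doc_id in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In this sketch a document that shares fewer than `min_shared` references with the query is never passed to the second stage, even if its text is identical. That is the kind of accuracy loss the abstract accepts in exchange for a smaller retrieval space.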

Subject (DDC)
004 Computer science

Keywords

Citation Analysis, Disguised Plagiarism, Information Retrieval, Large Scale Collections, Plagiarism Detection, Semantic Analysis

Conference

2014 IEEE/ACM Joint Conference on Digital Libraries (JCDL), 8 Sept. 2014 - 12 Sept. 2014, London

Cite

ISO 690
MEUSCHKE, Norman, Bela GIPP, 2014. Reducing computational effort for plagiarism detection by using citation characteristics to limit retrieval space. 2014 IEEE/ACM Joint Conference on Digital Libraries (JCDL). London, 8 Sept. 2014 - 12 Sept. 2014. In: Proceedings of the 2014 IEEE/ACM Joint Conference on Digital Libraries (JCDL) : 8th - 12th September 2014, City University London, United Kingdom. Piscataway, NJ: IEEE, 2014, pp. 197-200. ISBN 978-1-4799-5569-5. Available under: doi: 10.1109/JCDL.2014.6970168
BibTeX
@inproceedings{Meuschke2014Reduc-30317,
  year={2014},
  doi={10.1109/JCDL.2014.6970168},
  title={Reducing computational effort for plagiarism detection by using citation characteristics to limit retrieval space},
  isbn={978-1-4799-5569-5},
  publisher={IEEE},
  address={Piscataway, NJ},
  booktitle={Proceedings of the 2014 IEEE/ACM Joint Conference on Digital Libraries (JCDL) : 8th - 12th September 2014, City University London, United Kingdom},
  pages={197--200},
  author={Meuschke, Norman and Gipp, Bela}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/30317">
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2015-03-16T15:48:08Z</dc:date>
    <dc:creator>Meuschke, Norman</dc:creator>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2015-03-16T15:48:08Z</dcterms:available>
    <dc:language>eng</dc:language>
    <dcterms:abstract xml:lang="eng">This paper proposes a hybrid approach to plagiarism detection in academic documents that integrates detection methods using citations, semantic argument structure, and semantic word similarity with character-based methods to achieve a higher detection performance for disguised plagiarism forms. Currently available software for plagiarism detection exclusively performs text string comparisons. These systems find copies, but fail to identify disguised plagiarism, such as paraphrases, translations, or idea plagiarism. Detection approaches that consider semantic similarity on word and sentence level exist and have consistently achieved higher detection accuracy for disguised plagiarism forms compared to character-based approaches. However, the high computational effort of these semantic approaches makes them infeasible for use in real-world plagiarism detection scenarios. The proposed hybrid approach uses citation-based methods as a preliminary heuristic to reduce the retrieval space with a relatively low loss in detection accuracy. This preliminary step can then be followed by a computationally more expensive semantic and character-based analysis. We show that such a hybrid approach allows semantic plagiarism detection to become feasible even on large collections for the first time.</dcterms:abstract>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dc:contributor>Gipp, Bela</dc:contributor>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dcterms:issued>2014</dcterms:issued>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <bibo:uri rdf:resource="http://kops.uni-konstanz.de/handle/123456789/30317"/>
    <dc:creator>Gipp, Bela</dc:creator>
    <dc:contributor>Meuschke, Norman</dc:contributor>
    <dcterms:title>Reducing computational effort for plagiarism detection by using citation characteristics to limit retrieval space</dcterms:title>
  </rdf:Description>
</rdf:RDF>

University bibliography
No