Improving Academic Plagiarism Detection for STEM Documents by Analyzing Mathematical Content and Citations
Date
2019
Publication type
Contribution to a conference collection
Publication status
Published
Published in
2019 ACM/IEEE Joint Conference on Digital Libraries : JCDL 2019 : proceedings : 2-6 June 2019, Urbana-Champaign, Illinois / Bonn, Maria et al. (ed.). - Piscataway, NJ : IEEE, 2019. - pp. 120-129. - ISBN 978-1-72811-547-4
Abstract
Identifying academic plagiarism is a pressing task for educational and research institutions, publishers, and funding agencies. Current plagiarism detection systems reliably find instances of copied and moderately reworded text. However, reliably detecting concealed plagiarism, such as strong paraphrases, translations, and the reuse of nontextual content and ideas, is an open research problem. In this paper, we extend our prior research on analyzing mathematical content and academic citations. Both are promising approaches for improving the detection of concealed academic plagiarism, primarily in Science, Technology, Engineering and Mathematics (STEM). We make the following contributions: i) We present a two-stage detection process that combines similarity assessments of mathematical content, academic citations, and text. ii) We introduce new similarity measures that consider the order of mathematical features and outperform the measures in our prior research. iii) We compare the effectiveness of the math-based, citation-based, and text-based detection approaches using confirmed cases of academic plagiarism. iv) We demonstrate that the combined analysis of math-based and citation-based content features allows identifying potentially suspicious cases in a collection of 102K STEM documents. Overall, we show that analyzing the similarity of mathematical content and academic citations is a striking supplement to conventional text-based detection approaches for academic literature in the STEM disciplines. The data and code of our study are openly available at https://purl.org/hybridPD.
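The abstract mentions a two-stage detection process and similarity measures that take the order of mathematical features into account, but it does not specify them. The following Python sketch is only an illustration of that general idea under stated assumptions, not the authors' implementation: it assumes each document is reduced to a sequence of mathematical identifiers, uses a Jaccard set overlap as a hypothetical first-stage candidate filter, and uses a normalized longest common subsequence as a hypothetical order-aware score in the second stage. All function names, the threshold, and the sample data are invented for this example; the measures actually used in the study are documented in the paper and at https://purl.org/hybridPD.

```python
from typing import Dict, List, Sequence, Tuple


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Order-insensitive overlap of two identifier collections (assumed stage-1 filter)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two identifier sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def order_aware_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Assumed order-aware score: LCS length normalized by the shorter sequence."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / min(len(a), len(b))


def two_stage_detection(
    query: Sequence[str],
    collection: Dict[str, List[str]],
    candidate_threshold: float = 0.3,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[str, float]]:
    """Stage 1 keeps documents with enough shared identifiers; stage 2 ranks them by order."""
    candidates = {
        doc_id: ids
        for doc_id, ids in collection.items()
        if jaccard(query, ids) >= candidate_threshold
    }
    scored = [
        (doc_id, order_aware_similarity(query, ids))
        for doc_id, ids in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Invented toy data: identifier sequences extracted from three documents.
    query_doc = ["E", "m", "c", "x", "y"]
    corpus = {
        "A": ["E", "m", "c", "y", "x"],
        "B": ["c", "m", "E"],
        "C": ["a", "b"],
    }
    for doc_id, score in two_stage_detection(query_doc, corpus):
        print(doc_id, round(score, 3))
```

In this toy setup, document C shares no identifiers with the query and is dropped in the first stage, while A and B are ranked by how well the order of their shared identifiers matches. A cheap set-based filter followed by a more expensive sequence comparison is a common design for large collections, but the split shown here is an assumption.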
Subject (DDC)
004 Computer Science
Conference
2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL), Jun 2, 2019 - Jun 6, 2019, Urbana-Champaign, Illinois
Cite This
ISO 690
MEUSCHKE, Norman, Vincent STANGE, Moritz SCHUBOTZ, Michael KRAMER, Bela GIPP, 2019. Improving Academic Plagiarism Detection for STEM Documents by Analyzing Mathematical Content and Citations. 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL). Urbana-Champaign, Illinois, Jun 2, 2019 - Jun 6, 2019. In: BONN, Maria, ed. and others. 2019 ACM/IEEE Joint Conference on Digital Libraries : JCDL 2019 : proceedings : 2-6 June 2019, Urbana-Champaign, Illinois. Piscataway, NJ: IEEE, pp. 120-129. ISBN 978-1-72811-547-4. Available under: doi: 10.1109/JCDL.2019.00026
BibTeX
@inproceedings{Meuschke2019-06-27T16:07:47ZImpro-50945,
  year={2019},
  doi={10.1109/JCDL.2019.00026},
  title={Improving Academic Plagiarism Detection for STEM Documents by Analyzing Mathematical Content and Citations},
  isbn={978-1-72811-547-4},
  publisher={IEEE},
  address={Piscataway, NJ},
  booktitle={2019 ACM/IEEE Joint Conference on Digital Libraries : JCDL 2019 : proceedings : 2-6 June 2019, Urbana-Champaign, Illinois},
  pages={120--129},
  editor={Bonn, Maria},
  author={Meuschke, Norman and Stange, Vincent and Schubotz, Moritz and Kramer, Michael and Gipp, Bela}
}
RDF
<rdf:RDF xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bibo="http://purl.org/ontology/bibo/" xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:void="http://rdfs.org/ns/void#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/50945"> <foaf:homepage rdf:resource="http://localhost:8080/"/> <dc:language>eng</dc:language> <dc:creator>Meuschke, Norman</dc:creator> <dcterms:title>Improving Academic Plagiarism Detection for STEM Documents by Analyzing Mathematical Content and Citations</dcterms:title> <dc:creator>Kramer, Michael</dc:creator> <dc:contributor>Stange, Vincent</dc:contributor> <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/50945"/> <dcterms:issued>2019-06-27T16:07:47Z</dcterms:issued> <dc:contributor>Kramer, Michael</dc:contributor> <dc:contributor>Meuschke, Norman</dc:contributor> <dc:creator>Stange, Vincent</dc:creator> <dc:creator>Schubotz, Moritz</dc:creator> <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/> <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2020-09-22T08:24:30Z</dc:date> <dc:contributor>Schubotz, Moritz</dc:contributor> <dcterms:abstract xml:lang="eng">Identifying academic plagiarism is a pressing task for educational and research institutions, publishers, and funding agencies. Current plagiarism detection systems reliably find instances of copied and moderately reworded text. However, reliably detecting concealed plagiarism, such as strong paraphrases, translations, and the reuse of nontextual content and ideas is an open research problem. In this paper, we extend our prior research on analyzing mathematical content and academic citations. 
Both are promising approaches for improving the detection of concealed academic plagiarism primarily in Science, Technology, Engineering and Mathematics (STEM). We make the following contributions: i) We present a two-stage detection process that combines similarity assessments of mathematical content, academic citations, and text. ii) We introduce new similarity measures that consider the order of mathematical features and outperform the measures in our prior research. iii) We compare the effectiveness of the math-based, citation-based, and text-based detection approaches using confirmed cases of academic plagiarism. iv) We demonstrate that the combined analysis of math-based and citation-based content features allows identifying potentially suspicious cases in a collection of 102K STEM documents. Overall, we show that analyzing the similarity of mathematical content and academic citations is a striking supplement for conventional text-based detection approaches for academic literature in the STEM disciplines. The data and code of our study are openly available at https://purl.org/hybridPD.</dcterms:abstract> <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/> <dc:contributor>Gipp, Bela</dc:contributor> <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2020-09-22T08:24:30Z</dcterms:available> <dc:creator>Gipp, Bela</dc:creator> <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/> </rdf:Description> </rdf:RDF>
Bibliography of Konstanz
No