Improving Academic Plagiarism Detection for STEM Documents by Analyzing Mathematical Content and Citations

dc.contributor.author: Meuschke, Norman
dc.contributor.author: Stange, Vincent
dc.contributor.author: Schubotz, Moritz
dc.contributor.author: Kramer, Michael
dc.contributor.author: Gipp, Bela
dc.date.accessioned: 2020-09-22T08:24:30Z
dc.date.available: 2020-09-22T08:24:30Z
dc.date.issued: 2019-06-27T16:07:47Z [eng]
dc.description.abstract: Identifying academic plagiarism is a pressing task for educational and research institutions, publishers, and funding agencies. Current plagiarism detection systems reliably find instances of copied and moderately reworded text. However, reliably detecting concealed plagiarism, such as strong paraphrases, translations, and the reuse of nontextual content and ideas is an open research problem. In this paper, we extend our prior research on analyzing mathematical content and academic citations. Both are promising approaches for improving the detection of concealed academic plagiarism primarily in Science, Technology, Engineering and Mathematics (STEM). We make the following contributions: i) We present a two-stage detection process that combines similarity assessments of mathematical content, academic citations, and text. ii) We introduce new similarity measures that consider the order of mathematical features and outperform the measures in our prior research. iii) We compare the effectiveness of the math-based, citation-based, and text-based detection approaches using confirmed cases of academic plagiarism. iv) We demonstrate that the combined analysis of math-based and citation-based content features allows identifying potentially suspicious cases in a collection of 102K STEM documents. Overall, we show that analyzing the similarity of mathematical content and academic citations is a striking supplement for conventional text-based detection approaches for academic literature in the STEM disciplines. The data and code of our study are openly available at https://purl.org/hybridPD. [eng]
dc.description.version: published [de]
dc.identifier.arxiv: 1906.11761 [eng]
dc.identifier.doi: 10.1109/JCDL.2019.00026 [eng]
dc.identifier.uri: https://kops.uni-konstanz.de/handle/123456789/50945
dc.language.iso: eng [eng]
dc.subject.ddc: 004 [eng]
dc.title: Improving Academic Plagiarism Detection for STEM Documents by Analyzing Mathematical Content and Citations [eng]
dc.type: INPROCEEDINGS [de]
dspace.entity.type: Publication
kops.citation.bibtex
@inproceedings{Meuschke2019-06-27T16:07:47ZImpro-50945,
  year={2019},
  doi={10.1109/JCDL.2019.00026},
  title={Improving Academic Plagiarism Detection for STEM Documents by Analyzing Mathematical Content and Citations},
  isbn={978-1-72811-547-4},
  publisher={IEEE},
  address={Piscataway, NJ},
  booktitle={2019 ACM/IEEE Joint Conference on Digital Libraries : JCDL 2019 : proceedings : 2-6 June 2019, Urbana-Champaign, Illinois},
  pages={120--129},
  editor={Bonn, Maria},
  author={Meuschke, Norman and Stange, Vincent and Schubotz, Moritz and Kramer, Michael and Gipp, Bela}
}
kops.citation.iso690: MEUSCHKE, Norman, Vincent STANGE, Moritz SCHUBOTZ, Michael KRAMER, Bela GIPP, 2019. Improving Academic Plagiarism Detection for STEM Documents by Analyzing Mathematical Content and Citations. 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL). Urbana-Champaign, Illinois, Jun 2, 2019 - Jun 6, 2019. In: BONN, Maria, ed. and others. 2019 ACM/IEEE Joint Conference on Digital Libraries : JCDL 2019 : proceedings : 2-6 June 2019, Urbana-Champaign, Illinois. Piscataway, NJ: IEEE, 2019, pp. 120-129. ISBN 978-1-72811-547-4. Available under: doi: 10.1109/JCDL.2019.00026 [deu]
kops.citation.iso690: MEUSCHKE, Norman, Vincent STANGE, Moritz SCHUBOTZ, Michael KRAMER, Bela GIPP, 2019. Improving Academic Plagiarism Detection for STEM Documents by Analyzing Mathematical Content and Citations. 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL). Urbana-Champaign, Illinois, Jun 2, 2019 - Jun 6, 2019. In: BONN, Maria, ed. and others. 2019 ACM/IEEE Joint Conference on Digital Libraries : JCDL 2019 : proceedings : 2-6 June 2019, Urbana-Champaign, Illinois. Piscataway, NJ: IEEE, 2019, pp. 120-129. ISBN 978-1-72811-547-4. Available under: doi: 10.1109/JCDL.2019.00026 [eng]
kops.citation.rdf
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/50945">
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:language>eng</dc:language>
    <dc:creator>Meuschke, Norman</dc:creator>
    <dcterms:title>Improving Academic Plagiarism Detection for STEM Documents by Analyzing Mathematical Content and Citations</dcterms:title>
    <dc:creator>Kramer, Michael</dc:creator>
    <dc:contributor>Stange, Vincent</dc:contributor>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/50945"/>
    <dcterms:issued>2019-06-27T16:07:47Z</dcterms:issued>
    <dc:contributor>Kramer, Michael</dc:contributor>
    <dc:contributor>Meuschke, Norman</dc:contributor>
    <dc:creator>Stange, Vincent</dc:creator>
    <dc:creator>Schubotz, Moritz</dc:creator>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2020-09-22T08:24:30Z</dc:date>
    <dc:contributor>Schubotz, Moritz</dc:contributor>
    <dcterms:abstract xml:lang="eng">Identifying academic plagiarism is a pressing task for educational and research institutions, publishers, and funding agencies. Current plagiarism detection systems reliably find instances of copied and moderately reworded text. However, reliably detecting concealed plagiarism, such as strong paraphrases, translations, and the reuse of nontextual content and ideas is an open research problem. In this paper, we extend our prior research on analyzing mathematical content and academic citations. Both are promising approaches for improving the detection of concealed academic plagiarism primarily in Science, Technology, Engineering and Mathematics (STEM). We make the following contributions: i) We present a two-stage detection process that combines similarity assessments of mathematical content, academic citations, and text. ii) We introduce new similarity measures that consider the order of mathematical features and outperform the measures in our prior research. iii) We compare the effectiveness of the math-based, citation-based, and text-based detection approaches using confirmed cases of academic plagiarism. iv) We demonstrate that the combined analysis of math-based and citation-based content features allows identifying potentially suspicious cases in a collection of 102K STEM documents. Overall, we show that analyzing the similarity of mathematical content and academic citations is a striking supplement for conventional text-based detection approaches for academic literature in the STEM disciplines. The data and code of our study are openly available at https://purl.org/hybridPD.</dcterms:abstract>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dc:contributor>Gipp, Bela</dc:contributor>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2020-09-22T08:24:30Z</dcterms:available>
    <dc:creator>Gipp, Bela</dc:creator>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
  </rdf:Description>
</rdf:RDF>
kops.conferencefield: 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL), Jun 2, 2019 - Jun 6, 2019, Urbana-Champaign, Illinois [deu]
kops.date.conferenceEnd: 2019-06-06 [eng]
kops.date.conferenceStart: 2019-06-02 [eng]
kops.flag.knbibliography: false
kops.location.conference: Urbana-Champaign, Illinois [eng]
kops.sourcefield: BONN, Maria, ed. and others. <i>2019 ACM/IEEE Joint Conference on Digital Libraries : JCDL 2019 : proceedings : 2-6 June 2019, Urbana-Champaign, Illinois</i>. Piscataway, NJ: IEEE, 2019, pp. 120-129. ISBN 978-1-72811-547-4. Available under: doi: 10.1109/JCDL.2019.00026 [deu]
kops.sourcefield.plain: BONN, Maria, ed. and others. 2019 ACM/IEEE Joint Conference on Digital Libraries : JCDL 2019 : proceedings : 2-6 June 2019, Urbana-Champaign, Illinois. Piscataway, NJ: IEEE, 2019, pp. 120-129. ISBN 978-1-72811-547-4. Available under: doi: 10.1109/JCDL.2019.00026 [deu]
kops.sourcefield.plain: BONN, Maria, ed. and others. 2019 ACM/IEEE Joint Conference on Digital Libraries : JCDL 2019 : proceedings : 2-6 June 2019, Urbana-Champaign, Illinois. Piscataway, NJ: IEEE, 2019, pp. 120-129. ISBN 978-1-72811-547-4. Available under: doi: 10.1109/JCDL.2019.00026 [eng]
kops.title.conference: 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL) [eng]
relation.isAuthorOfPublication: e3f81adb-a670-4c4c-bade-6781b8f996b0
relation.isAuthorOfPublication: ca8bde37-42cc-450d-af71-d88f15f7f7c2
relation.isAuthorOfPublication: 63951a1b-b477-40b3-acbb-e5d19d711255
relation.isAuthorOfPublication: b2f34194-94d0-4e3c-a6ac-b9cc19ddfc15
relation.isAuthorOfPublication: 358ad52f-dab7-4582-bf8e-8adcf477a2d4
relation.isAuthorOfPublication.latestForDiscovery: e3f81adb-a670-4c4c-bade-6781b8f996b0
source.bibliographicInfo.fromPage: 120 [eng]
source.bibliographicInfo.toPage: 129 [eng]
source.contributor.editor: Bonn, Maria
source.flag.etalEditor: true [eng]
source.identifier.isbn: 978-1-72811-547-4 [eng]
source.publisher: IEEE [eng]
source.publisher.location: Piscataway, NJ [eng]
source.title: 2019 ACM/IEEE Joint Conference on Digital Libraries : JCDL 2019 : proceedings : 2-6 June 2019, Urbana-Champaign, Illinois [eng]
