Publication:

Discovery and recognition of formula concepts using machine learning

Files

Scharpf_2-ff4tuct3qxt3.pdf (size: 4.81 MB, downloads: 1)

Date

2023

Funding information

Deutsche Forschungsgemeinschaft (DFG): 350192710
Deutsche Forschungsgemeinschaft (DFG): 437179652

Open access publication
Open Access Hybrid

Publication type
Journal article
Publication status
Published

Published in

Scientometrics. Springer. 2023, 128(9), pp. 4971-5025. ISSN 0138-9130. eISSN 1588-2861. Available at: doi: 10.1007/s11192-023-04667-9

Abstract

Citation-based Information Retrieval (IR) methods for scientific documents have proven effective for IR applications, such as Plagiarism Detection or Literature Recommender Systems in academic disciplines that use many references. In science, technology, engineering, and mathematics, researchers often employ mathematical concepts through formula notation to refer to prior knowledge. Our long-term goal is to generalize citation-based IR methods and apply this generalized method to both classical references and mathematical concepts. In this paper, we suggest how mathematical formulas could be cited and define a Formula Concept Retrieval task with two subtasks: Formula Concept Discovery (FCD) and Formula Concept Recognition (FCR). While FCD aims at the definition and exploration of a ‘Formula Concept’ that names bundled equivalent representations of a formula, FCR is designed to match a given formula to a prior assigned unique mathematical concept identifier. We present machine learning-based approaches to address the FCD and FCR tasks. We then evaluate these approaches on a standardized test collection (NTCIR arXiv dataset). Our FCD approach yields a precision of 68% for retrieving equivalent representations of frequent formulas and a recall of 72% for extracting the formula name from the surrounding text. FCD and FCR enable the citation of formulas within mathematical documents and facilitate semantic search and question answering, as well as document similarity assessments for plagiarism detection or recommender systems.

Subject area (DDC)
510 Mathematics

Keywords

Mathematical information retrieval, Machine learning, Wikidata

Cite

ISO 690
SCHARPF, Philipp, Moritz SCHUBOTZ, Howard S. COHL, Corinna BREITINGER, Bela GIPP, 2023. Discovery and recognition of formula concepts using machine learning. In: Scientometrics. Springer. 2023, 128(9), pp. 4971-5025. ISSN 0138-9130. eISSN 1588-2861. Available at: doi: 10.1007/s11192-023-04667-9
BibTeX
@article{Scharpf2023-09Disco-72154,
  title={Discovery and recognition of formula concepts using machine learning},
  year={2023},
  doi={10.1007/s11192-023-04667-9},
  number={9},
  volume={128},
  issn={0138-9130},
  journal={Scientometrics},
  pages={4971--5025},
  author={Scharpf, Philipp and Schubotz, Moritz and Cohl, Howard S. and Breitinger, Corinna and Gipp, Bela}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/72154">
    <dc:contributor>Breitinger, Corinna</dc:contributor>
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dc:language>eng</dc:language>
    <dc:rights>Attribution 4.0 International</dc:rights>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/72154/1/Scharpf_2-ff4tuct3qxt3.pdf"/>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2025-01-31T11:24:08Z</dcterms:available>
    <dc:creator>Gipp, Bela</dc:creator>
    <dcterms:rights rdf:resource="http://creativecommons.org/licenses/by/4.0/"/>
    <dc:creator>Schubotz, Moritz</dc:creator>
    <dcterms:title>Discovery and recognition of formula concepts using machine learning</dcterms:title>
    <dc:creator>Breitinger, Corinna</dc:creator>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/72154/1/Scharpf_2-ff4tuct3qxt3.pdf"/>
    <dcterms:issued>2023-09</dcterms:issued>
    <dcterms:abstract>Citation-based Information Retrieval (IR) methods for scientific documents have proven effective for IR applications, such as Plagiarism Detection or Literature Recommender Systems in academic disciplines that use many references. In science, technology, engineering, and mathematics, researchers often employ mathematical concepts through formula notation to refer to prior knowledge. Our long-term goal is to generalize citation-based IR methods and apply this generalized method to both classical references and mathematical concepts. In this paper, we suggest how mathematical formulas could be cited and define a Formula Concept Retrieval task with two subtasks: Formula Concept Discovery (FCD) and Formula Concept Recognition (FCR). While FCD aims at the definition and exploration of a ‘Formula Concept’ that names bundled equivalent representations of a formula, FCR is designed to match a given formula to a prior assigned unique mathematical concept identifier. We present machine learning-based approaches to address the FCD and FCR tasks. We then evaluate these approaches on a standardized test collection (NTCIR arXiv dataset). Our FCD approach yields a precision of 68% for retrieving equivalent representations of frequent formulas and a recall of 72% for extracting the formula name from the surrounding text. FCD and FCR enable the citation of formulas within mathematical documents and facilitate semantic search and question answering, as well as document similarity assessments for plagiarism detection or recommender systems.</dcterms:abstract>
    <dc:creator>Cohl, Howard S.</dc:creator>
    <dc:contributor>Scharpf, Philipp</dc:contributor>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/39"/>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/39"/>
    <dc:contributor>Gipp, Bela</dc:contributor>
    <dc:contributor>Cohl, Howard S.</dc:contributor>
    <dc:contributor>Schubotz, Moritz</dc:contributor>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2025-01-31T11:24:08Z</dc:date>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/72154"/>
    <dc:creator>Scharpf, Philipp</dc:creator>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
  </rdf:Description>
</rdf:RDF>

University bibliography
No
Peer-reviewed
Yes