Multi-task learning for pKa prediction
Abstract
Many properties of a compound depend directly on the dissociation constants of its acidic and basic groups. Significant effort has been invested in computational models to predict these constants. For linear regression models, compounds are often divided into chemically motivated classes, with a separate model for each class. However, sometimes too few measurements are available for a class to build a reasonable model, e.g., when investigating a new compound series. We show that, if data for related classes are available, multi-task learning can improve predictions by utilizing data from these other classes. We investigate the performance of linear Gaussian process regression models (single-task, pooling, and multi-task models) in the low sample size regime, using a published data set (n = 698, mostly monoprotic, in aqueous solution) divided beforehand into 15 classes. A multi-task regression model using the intrinsic model of co-regionalization and incomplete Cholesky decomposition performed best in 85% of all experiments. The presented approach can be applied to estimate other molecular properties where few measurements are available.
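To illustrate the idea behind the intrinsic model of co-regionalization named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a linear kernel on descriptor vectors and a task-similarity matrix B that is fixed by hand, whereas the abstract refers to a model in which this structure is learned. The linear system is solved with an ordinary (complete) Cholesky factorization, which is not the incomplete Cholesky decomposition mentioned in the abstract. All function names, the noise level, and the toy data are hypothetical.

```python
import math

def linear_kernel(x, z):
    """Linear kernel k(x, z) = <x, z> on descriptor vectors."""
    return sum(a * b for a, b in zip(x, z))

def icm_covariance(X1, tasks1, X2, tasks2, B):
    """ICM covariance: entry for (x, task s) and (z, task t) is B[s][t] * k(x, z)."""
    return [[B[s][t] * linear_kernel(x, z) for z, t in zip(X2, tasks2)]
            for x, s in zip(X1, tasks1)]

def cholesky(A):
    """Complete Cholesky factor L (lower triangular) with A = L L^T."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def cholesky_solve(L, b):
    """Solve (L L^T) x = b by forward and backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def icm_gp_predict(X, tasks, y, X_new, tasks_new, B, noise=0.1):
    """Predictive mean of a zero-mean GP with ICM covariance and Gaussian noise."""
    K = icm_covariance(X, tasks, X, tasks, B)
    for i in range(len(K)):
        K[i][i] += noise  # assumed noise variance, added to the diagonal
    alpha = cholesky_solve(cholesky(K), y)
    K_star = icm_covariance(X_new, tasks_new, X, tasks, B)
    return [sum(k * a for k, a in zip(row, alpha)) for row in K_star]

# Hypothetical toy data: descriptor vectors are [bias, feature].
# Task 0 (a data-rich class) has four measurements, task 1 (a new class) has one.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 1.0]]
tasks = [0, 0, 0, 0, 1]
y = [1.0, 3.0, 5.0, 7.0, 3.4]
B = [[1.0, 0.9], [0.9, 1.0]]  # assumed similarity between the two classes

pred = icm_gp_predict(X, tasks, y, [[1.0, 2.5]], [1], B)
print(pred)
```

Setting the off-diagonal entries of B to zero removes all sharing between classes, so each class is predicted from its own measurements only (the single-task case). Setting every entry of B to one treats all measurements as a single class (the pooling case).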
Cite
ISO 690
SKOLIDIS, Grigorios, Katja HANSEN, Guido SANGUINETTI, Matthias RUPP, 2012. Multi-task learning for pKa prediction. In: Journal of Computer-Aided Molecular Design. Springer. 2012, 26(7), pp. 883-895. ISSN 0920-654X. eISSN 1573-4951. Available under: doi: 10.1007/s10822-012-9582-x
BibTeX
@article{Skolidis2012-07Multi-52163,
  year={2012},
  doi={10.1007/s10822-012-9582-x},
  title={Multi-task learning for pK$_a$ prediction},
  number={7},
  volume={26},
  issn={0920-654X},
  journal={Journal of Computer-Aided Molecular Design},
  pages={883--895},
  author={Skolidis, Grigorios and Hansen, Katja and Sanguinetti, Guido and Rupp, Matthias}
}
RDF
<rdf:RDF xmlns:dcterms="http://purl.org/dc/terms/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:bibo="http://purl.org/ontology/bibo/" xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:void="http://rdfs.org/ns/void#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/52163"> <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2020-12-17T14:24:15Z</dcterms:available> <dc:contributor>Rupp, Matthias</dc:contributor> <dcterms:abstract xml:lang="eng">Many compound properties depend directly on the dissociation constants of its acidic and basic groups. Significant effort has been invested in computational models to predict these constants. For linear regression models, compounds are often divided into chemically motivated classes, with a separate model for each class. However, sometimes too few measurements are available for a class to build a reasonable model, e.g., when investigating a new compound series. If data for related classes are available, we show that multi-task learning can be used to improve predictions by utilizing data from these other classes. We investigate performance of linear Gaussian process regression models (single task, pooling, and multi-task models) in the low sample size regime, using a published data set (n = 698, mostly monoprotic, in aqueous solution) divided beforehand into 15 classes. A multi-task regression model using the intrinsic model of co-regionalization and incomplete Cholesky decomposition performed best in 85 % of all experiments. 
The presented approach can be applied to estimate other molecular properties where few measurements are available.</dcterms:abstract> <foaf:homepage rdf:resource="http://localhost:8080/"/> <dc:creator>Hansen, Katja</dc:creator> <dcterms:title>Multi-task learning for pK<sub>a</sub> prediction</dcterms:title> <dc:creator>Sanguinetti, Guido</dc:creator> <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/52163"/> <dc:creator>Skolidis, Grigorios</dc:creator> <dc:creator>Rupp, Matthias</dc:creator> <dc:contributor>Sanguinetti, Guido</dc:contributor> <dc:language>eng</dc:language> <dcterms:rights rdf:resource="https://rightsstatements.org/page/InC/1.0/"/> <dc:rights>terms-of-use</dc:rights> <dc:contributor>Skolidis, Grigorios</dc:contributor> <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/> <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/> <dc:contributor>Hansen, Katja</dc:contributor> <dcterms:issued>2012-07</dcterms:issued> <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2020-12-17T14:24:15Z</dc:date> <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/> </rdf:Description> </rdf:RDF>