Person: Meinl, Thorsten
A KNIME-Based Analysis of the Zebrafish Photomotor Response Clusters the Phenotypes of 14 Classes of Neuroactive Molecules
2016-06-01, Copmans, Daniëlle, Meinl, Thorsten, Dietz, Christian, van Leeuwen, Matthijs, Ortmann, Julia, Berthold, Michael R., de Witte, Peter A. M.
Recently, the photomotor response (PMR) of zebrafish embryos was reported as a robust behavior that is useful for high-throughput neuroactive drug discovery and mechanism prediction. Given the complexity of the PMR, there is a need for rapid and easy analysis of the behavioral data. In this study, we developed an automated analysis workflow using the KNIME Analytics Platform and made it freely accessible. This workflow allows us to simultaneously calculate a behavioral fingerprint for all analyzed compounds and to further process the data. Furthermore, to characterize the potential of PMR for mechanism prediction, we performed PMR analysis of 767 neuroactive compounds covering 14 different receptor classes using the KNIME workflow. We observed a true positive rate of 25% and a false negative rate of 75% under our screening conditions. Among the true positives, all receptor classes were represented, thereby confirming the utility of the PMR assay to identify a broad range of neuroactive molecules. By hierarchical clustering of the behavioral fingerprints, different phenotypical clusters were observed that suggest the utility of PMR for mechanism prediction for adrenergics, dopaminergics, serotonergics, metabotropic glutamatergics, opioids, and ion channel ligands.
Get your chemistry right with KNIME
2013, Meinl, Thorsten, Landrum, Gregory
Integrated data analysis with KNIME
2012, Meinl, Thorsten, Jagla, Bernd, Berthold, Michael R.
Maximum-Score Diversity Selection
2010, Meinl, Thorsten
This thesis discusses the problem of Maximum-Score Diversity Selection (MSDS). Pure diversity selection, as it is often performed e.g. in early drug discovery, is the selection of a subset of available objects that is as diverse as possible. MSDS adds a second objective, which additionally tries to maximize the "score" of the subset, usually the sum of the scores of all elements in the subset. Thus, this problem is a classical multi-objective optimization problem, since both objectives -- maximizing score and maximizing diversity -- tend to conflict with each other. In this thesis several methods are presented, developed, and evaluated to efficiently solve this special multi-objective optimization problem. After a more detailed discussion of the application of MSDS in drug discovery, the question of suitable definitions of diversity is considered. This is essential for later application domains, where users have only a vague feeling of diversity. Then the Maximum-Score Diversity Selection problem is formalized and shown to be an NP-hard optimization problem. Therefore, no exact solution can be computed efficiently for all but the smallest cases. After putting MSDS into the context of multi-objective optimization, the usage of evolutionary algorithms -- specifically genetic algorithms -- for solving the problem is evaluated. This also includes the presentation of novel genetic operators for evolving subsets or combinations of objects. However, being a universal tool, genetic algorithms may not be the best technique for the actual problem. Hence, several problem-specific heuristics are discussed: two of them are motivated by the transformation of MSDS into a graph-theoretic problem used in the NP-hardness proof, and a novel heuristic method is introduced, known as Score Erosion.
The comparison of all approaches on various synthetic and real-world datasets reveals that all heuristics find solutions of similar quality, given the right measures of diversity, with Score Erosion being the fastest of all presented algorithms as a result of its linear time complexity. It is also investigated how the structure of the search space influences the results and whether the application of MSDS pays off in practice.
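To make the score-versus-diversity trade-off described in the abstract concrete, the following is a minimal greedy sketch. It is not the Score Erosion algorithm from the thesis; the function names, the distance measure, and the 0.5/0.5 weighting are illustrative assumptions only. The idea shown is the generic one: repeatedly pick the candidate that maximizes a weighted combination of its own score and its minimum distance to the already selected subset.

```python
# Illustrative greedy sketch of score-plus-diversity subset selection.
# NOTE: this is NOT the Score Erosion algorithm from the thesis; all
# names, the distance measure, and the weighting are assumptions.

def select_subset(items, scores, distance, k, alpha=0.5):
    """Greedily pick k items balancing score and diversity.

    items    -- list of candidate objects
    scores   -- dict mapping item -> score (higher is better)
    distance -- symmetric function d(a, b) >= 0
    alpha    -- weight of score vs. diversity, in [0, 1]
    """
    # Seed with the highest-scoring item.
    selected = [max(items, key=lambda x: scores[x])]
    remaining = [x for x in items if x != selected[0]]
    while len(selected) < k and remaining:
        # Diversity contribution: minimum distance to the chosen set.
        def gain(x):
            div = min(distance(x, s) for s in selected)
            return alpha * scores[x] + (1 - alpha) * div
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage: items are numbers, distance is the absolute difference.
items = [1, 2, 3, 10, 11, 20]
scores = {x: float(x) for x in items}
picked = select_subset(items, scores, lambda a, b: abs(a - b), k=3)
```

Because both objectives enter a single weighted gain, such a greedy pass returns one compromise solution per weighting rather than the full Pareto front that the multi-objective methods in the thesis explore.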
Screening Chemicals for Receptor-Mediated Toxicological and Pharmacological Endpoints: Using Public Data to Build Screening Tools within a KNIME Workflow
2015, Steinmetz, Fabian P., Mellor, Claire L., Meinl, Thorsten, Cronin, Mark T. D.
Assessing compounds for their pharmacological and toxicological properties is of great importance for industry and regulatory agencies. In this study an approach using open source software and open access databases to build screening tools for receptor-mediated effects is presented. The retinoic acid receptor (RAR), as a pharmacologically and toxicologically relevant target, was chosen for this study. RAR agonists are used in the treatment of a number of dermal conditions and specific types of cancer, such as acute promyelocytic leukemia. However, when administered chronically, there is strong evidence that RAR agonists cause hepatosteatosis and liver injury. After compiling information on ligand-protein interactions, common substructures and physico-chemical properties of ligands were identified manually and coded into SMARTS strings. Based on these SMARTS strings and calculated physico-chemical features, a rule-based screening workflow was built within the KNIME platform. The workflow was evaluated on two datasets: one with RAR agonists exclusively and another large, chemically diverse dataset containing only a few RAR agonists. Possible modifications and applications of screening workflows, dependent on their purpose, are presented.
Flexible and transparent computational workflows for the prediction of target organ toxicity
2013, Richarz, Andrea-Nicole, Enoch, Steven J., Hewitt, Mark, Madden, Judith C., Przybylak, Katarzyna, Yang, Chihae, Berthold, Michael R., Meinl, Thorsten, Ohl, Peter, Cronin, Mark T. D.
In silico modeling of target organ toxicity has been held back in part by an inability to capture all relevant information in a meaningful reductionist approach. It has also been considered at times too simplistic, using data of often variable quality and seldom allowing the user to assess the relevance to the intended use. The purpose of this study was to develop a novel computational toxicology workflow system that allows users greater control over, and understanding of, the target organ toxicity prediction. The workflows were built on the KNIME open-access platform, which allows pipelining via a graphical user interface. Various building blocks, known as nodes, were incorporated to access chemical inventories and/or databases, to profile structures and calculate properties, and to report prediction results. The "basic" user sees a web interface, whilst a "trained" user can go behind this to interrogate the nodes and, if required, link to additional data sources or investigate and update the models. The workflow was developed to address in particular the prediction of target organ toxicity of cosmetic ingredients. It comprises an inventory of over 4,400 unique chemical structures (cosmetic ingredients and related substances). The database contains repeat dose toxicity data for over 1,100 compounds including NOEL values. Thus, a user is able to search for similar compounds in the inventory file or database. The compound is then profiled using relevant structural alerts and chemotypes, currently comprising 108 alerts for protein reactivity, 85 for DNA binding, 32 for phospholipidosis, and 16 for other liver toxicity endpoints. The workflows are flexible and transparent, and they are successful in guiding a user through the process of making a prediction of target organ toxicity. Supported by the EU FP7 COSMOS Project.
Workflow tools for managing biological and chemical data
2012, Meinl, Thorsten, Wiswedel, Bernd, Berthold, Michael R.
KNIME-CDK: Workflow-driven cheminformatics
2013, Beisken, Stephan, Meinl, Thorsten, Wiswedel, Bernd, Figueiredo, Luis F. de, Berthold, Michael R., Steinbeck, Christoph
Cheminformaticians have to routinely process and analyse libraries of small molecules. Among other things, that includes the standardization of molecules, calculation of various descriptors, visualisation of molecular structures, and downstream analysis. For this purpose, scientific workflow platforms such as the Konstanz Information Miner can be used if provided with the right plug-in. A workflow-based cheminformatics tool provides the advantage of ease-of-use and interoperability between complementary cheminformatics packages within the same framework, hence facilitating the analysis process.
KNIME-CDK comprises functions for molecule conversion to/from common formats, generation of signatures, fingerprints, and molecular properties. It is based on the Chemistry Development Toolkit and uses the Chemical Markup Language for persistence. A comparison with the cheminformatics plug-in RDKit shows that KNIME-CDK supports a similar range of chemical classes and adds new functionality to the framework. We describe the design and integration of the plug-in, and demonstrate the usage of the nodes on ChEBI, a library of small molecules of biological interest.
KNIME-CDK is an open-source plug-in for the Konstanz Information Miner, a free workflow platform. KNIME-CDK is built on top of the open-source Chemistry Development Toolkit and allows for efficient cross-vendor structural cheminformatics. Its ease-of-use and modularity enable researchers to automate routine tasks and data analysis, bringing complementary cheminformatics functionality to the workflow environment.
What's new in KNIME?
2012, Meinl, Thorsten
Maximum-Score Diversity Selection for Early Drug Discovery
2011-02-28, Meinl, Thorsten, Ostermann, Claude, Berthold, Michael R.
Diversity selection is a common task in early drug discovery. One drawback of current approaches is that usually only the structural diversity is taken into account and activity information is ignored. In this article we present a modified version of diversity selection - which we term "Maximum-Score Diversity Selection" - that additionally takes the estimated or predicted activities of the molecules into account. We show that finding an optimal solution to this problem is computationally very expensive (it is NP-hard) and therefore heuristic approaches are needed.
After a discussion of existing approaches, we present our new method, which is computationally far more efficient but at the same time produces comparable results. We conclude by validating these theoretical differences on several datasets.