Although the basic principles promoting laboratory animal welfare were developed almost 50 years ago, only now, with a better understanding of toxicological mechanisms and new biological technologies, are tools at hand to implement these principles efficiently in toxicology.
For most of the important toxicological endpoints, in vitro approaches have been or are being developed. Once a test has been developed and appears promising, the final step in assessing its reliability and relevance is usually a multi-laboratory validation study. However, validation, although it has proven to be an appropriate tool, still requires optimization and further development.
In the first part of this thesis, the field of evidence-based medicine in clinical diagnostics was identified as a topic closely related to toxicology and validation, whose methodologies are worthwhile to explore and, where indicated, to adapt. Essentially, two concepts, prevalence and the imperfection of the reference standard, were transferred to the field of toxicology. Considering typical toxicological situations, several ways to implement these concepts, especially in the assessment of toxicological tests, were explored. For example, prevalence, a key determinant of the practical value of a diagnostic measure for a disease in clinical diagnostics, was considered in the context of a toxicological health effect, and the impact of the imperfection of the reference standard, usually an in vivo experiment, on the prevalence of a toxicological effect was modelled.
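
The role of these two concepts can be illustrated with the standard relations from diagnostic test assessment. The following is a minimal sketch, not the models used in the thesis: the function names and all numerical values are assumptions chosen for illustration. It shows how the predictive values of a test depend on prevalence (Bayes' theorem), and how an imperfect reference standard shifts the prevalence that is actually observed.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) of a test for a given prevalence of the effect.

    Assumes 0 < prevalence < 1 and a test that is not entirely
    uninformative, so that both denominators are non-zero.
    """
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    true_neg = (1.0 - prevalence) * specificity
    false_neg = prevalence * (1.0 - sensitivity)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


def apparent_prevalence(true_prevalence, ref_sensitivity, ref_specificity):
    """Prevalence observed when the reference standard itself misclassifies."""
    return (true_prevalence * ref_sensitivity
            + (1.0 - true_prevalence) * (1.0 - ref_specificity))


if __name__ == "__main__":
    # Illustrative values only (assumed, not taken from the thesis data).
    ppv, npv = predictive_values(prevalence=0.10, sensitivity=0.90, specificity=0.90)
    print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
    observed = apparent_prevalence(0.10, ref_sensitivity=0.90, ref_specificity=0.90)
    print(f"apparent prevalence = {observed:.2f}")
```

With these assumed values, a test with 90% sensitivity and specificity reaches a positive predictive value of only about 0.5 at a prevalence of 10%, which is why prevalence matters for the practical value of a test.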

Prevalence according to the reference standard was evaluated in detail for the skin irritation potential of chemical substances. Two databases containing in vivo data from the rabbit experiment required by regulators were analysed with a focus on different prevalence-related aspects. The prevalence according to regulatory classification schemes was described in detail: fewer than 10% of more than 3000 substances showed an irritation potential requiring a hazard label. Analysis of the within-experiment reproducibility of the rabbit experiment revealed that it is only a minor source of variability. The prevalence was combined with the results of modelling the within-test variability in order to evaluate the utility of the approach and to compare two classification schemes.

A large-scale international validation study of novel pyrogenicity tests was guided with regard to design and biometry and was thoroughly evaluated, with a focus on the chosen reference standard. In this study, six in vitro tests based on the human fever reaction were validated against the pharmacopoeial rabbit pyrogen test. A concentration of 0.5 endotoxin units/ml, which was retrospectively derived from rabbit data, was defined as the threshold level that novel pyrogen tests would have to meet. Making use of this threshold, a very challenging study was designed to evaluate the performance of the novel tests at low but crucial contamination levels. In the validation phase, four tests performed reliably, with a predictive capacity outperforming that of the rabbit test, for which corresponding results were modelled for the chosen study design.

The evaluation of the rabbit pyrogen test initiated by the validation study triggered an in-depth analysis. Based on a probabilistic model of the rabbit fever reaction, this analysis focused on the international harmonization of pharmacopoeial approaches and on modelling new experimental designs for the rabbit test. A comparison of three pharmacopoeial tests clearly demonstrated the need for harmonization. Probability curves for pyrogenic classification differed substantially between pharmacopoeias, and modelling the corresponding animal consumption likewise revealed substantial differences between the pharmacopoeial tests. Using the rabbit model, two new test designs were developed. Animal consumption could be reduced by 30% while maintaining the safety levels of current pharmacopoeial tests when prevalence information on pyrogenicity is taken into account.

In summary, several studies implementing different principles of evidence-based medicine demonstrated the usefulness of these principles for in vitro toxicology. Prevalence is a most important piece of information for a comprehensive test assessment. Similarly, a detailed evaluation of reference tests in validation studies is mandatory to allow an optimal relevance assessment of both the reference test and the in vitro test. Introducing and employing these approaches systematically in the field of in vitro toxicology would constitute a first step towards an evidence-based toxicology.

