Diversity selection is a common task in early drug discovery. One drawback of current approaches is that they usually consider only structural diversity and ignore activity information. In this article we present a modified version of diversity selection, which we term "Maximum-Score Diversity Selection", that additionally takes the estimated or predicted activities of the molecules into account. We show that finding an optimal solution to this problem is NP-hard and thus computationally very expensive, so heuristic approaches are needed. After a discussion of existing approaches, we present our new method, which is computationally far more efficient while producing comparable results. We conclude by validating these theoretical differences empirically on several datasets.

