Accurate and Interpretable Computational Modeling of Chemical Mutagenicity

We describe a method for modeling chemical mutagenicity in terms of simple rules based on molecular features. A classification model was built using RuleFit, a rule-based ensemble method developed by Friedman and Popescu. We show that its performance compares favorably with that of methods reported in the literature, as measured by cross-validation and by testing on external test sets. All data sets used are publicly available. The method automatically generated transparent rules, expressed in terms of molecular structure, that agree well with known toxicology. Although we have focused on chemical mutagenicity to demonstrate the method, we anticipate that it may be more generally useful for modeling other molecular properties, such as other types of chemical toxicity.
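To illustrate why a rule ensemble is interpretable, the sketch below shows only the scoring form of a RuleFit-style model: an intercept plus the weights of every rule that fires, where each rule is a conjunction of simple conditions on single features. It is not the authors' implementation. In full RuleFit, rules are extracted from an ensemble of decision trees, linear terms may also be included, and the weights are fit with a lasso-type penalty; none of that fitting is shown here. The feature names (`aromatic_nitro`, `aromatic_amine`, `mol_weight`), the weights, and the helper names are hypothetical placeholders chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A molecule is represented as a dict of named numeric features.
# Feature names used below are hypothetical, not the paper's descriptors.
Features = Dict[str, float]


@dataclass
class Rule:
    """A conjunction of threshold conditions on individual features."""
    conditions: List[Tuple[str, str, float]]  # (feature, op, threshold); op is "<=" or ">"
    weight: float  # hypothetical coefficient; RuleFit would learn this from data

    def applies(self, x: Features) -> bool:
        """Return True only if every condition holds (missing features count as 0)."""
        for name, op, threshold in self.conditions:
            value = x.get(name, 0.0)
            if op == "<=":
                if not value <= threshold:
                    return False
            elif op == ">":
                if not value > threshold:
                    return False
            else:
                raise ValueError(f"unsupported operator: {op}")
        return True


def rule_ensemble_score(x: Features, intercept: float, rules: List[Rule]) -> float:
    """RuleFit-style linear score: intercept plus weights of the rules that fire."""
    return intercept + sum(r.weight for r in rules if r.applies(x))


def probability(score: float) -> float:
    """Map the score to a class probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical rules and weights, for illustration only.
rules = [
    Rule([("aromatic_nitro", ">", 0.5)], 1.5),
    Rule([("aromatic_amine", ">", 0.5), ("mol_weight", "<=", 500.0)], 1.0),
]
mol = {"aromatic_nitro": 1.0, "mol_weight": 250.0}
score = rule_ensemble_score(mol, -1.0, rules)  # only the first rule fires
print(score, probability(score))
```

Because each prediction decomposes into the specific rules that fired, a chemist can read off which structural conditions drove a "mutagenic" call, which is the kind of transparency the abstract describes.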


James J. Langham, Ajay N. Jain
September 5, 2008
