In Silico Screening: selection of ligands
Neural Network Ligand Selection
Targets are first docked with 10,000 compounds randomly selected from the ZINC database. The docking results serve as the training set for a back-propagation neural network (1), which evaluates nine curated characteristics of each compound together with its calculated binding energy and learns to distinguish binding from non-binding compounds. The trained network can then predict the calculated binding energy of an undocked compound from its characteristics alone. Libraries of commercially available or FDA-approved drug compounds are screened with the neural net to select a set of probable ligands, and the several thousand selected compounds are then docked using DOCK5.2 on a BlueGene supercomputer.
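
The dock, train, predict, select loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program from the thesis (1): the network architecture (one hidden layer of four tanh units), the learning rate, the library and sample sizes, and the `mock_dock` function that stands in for a real docking run are all hypothetical choices made so that the example runs on its own.

```python
import math
import random

def forward(w1, w2, x):
    """Return the hidden activations and the network output for one compound."""
    hidden = [math.tanh(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in w1]
    return hidden, sum(w * h for w, h in zip(w2, hidden)) + w2[-1]

def train(features, energies, n_hidden=4, epochs=100, lr=0.01, seed=0):
    """Fit a one-hidden-layer network by stochastic back-propagation."""
    rng = random.Random(seed)
    mean = sum(energies) / len(energies)
    sd = math.sqrt(sum((e - mean) ** 2 for e in energies) / len(energies))
    n_in = len(features[0])
    # Each weight row stores its bias as the last element.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, energy in zip(features, energies):
            hidden, out = forward(w1, w2, x)
            err = out - (energy - mean) / sd  # error on the standardized energy
            for j, h in enumerate(hidden):
                grad_h = err * w2[j] * (1.0 - h * h)  # error sent back through tanh
                w2[j] -= lr * err * h
                for i, v in enumerate(x):
                    w1[j][i] -= lr * grad_h * v
                w1[j][-1] -= lr * grad_h
            w2[-1] -= lr * err
    return w1, w2, mean, sd

def predict(model, x):
    """Predict a docking energy (original units) from a compound's descriptors."""
    w1, w2, mean, sd = model
    return forward(w1, w2, x)[1] * sd + mean

def mock_dock(x, rng):
    """Hypothetical stand-in for a docking run; NOT the DOCK scoring function."""
    return -35.0 - 20.0 * (x[0] - 0.5) + 10.0 * (x[1] - 0.5) + rng.gauss(0.0, 1.0)

rng = random.Random(1)
# Synthetic library: nine descriptors per compound (assumed scaled to [0, 1]).
library = [[rng.random() for _ in range(9)] for _ in range(2000)]

# 1. Dock a small random sample and train on the results.
sample = rng.sample(range(len(library)), 200)
train_x = [library[i] for i in sample]
train_y = [mock_dock(x, rng) for x in train_x]
model = train(train_x, train_y)

# 2. Rank the whole library by predicted energy (most negative first).
ranked = sorted(range(len(library)), key=lambda i: predict(model, library[i]))

# 3. Keep the top predictions; only these would go on to real docking.
selected = ranked[:50]
```

Standardizing the energies before training keeps the gradient steps small, which is a convenience of this sketch rather than a documented feature of the original program.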
The neural network has improved screening throughput significantly. For example, determining the top 1000 of 2 million randomly chosen ligands by computing the actual docking energy for every ligand takes approximately 1000 times longer than docking 1000 random ligands, training a neural network on the results, using it to screen all 3 million compounds in ZINC, and docking the 1000 ligands that the network predicts to be the best binders. Direct comparison of the results of the two methods shows that, for the three proteins tested, 60% of the best ligands identified by docking were also identified by the trained neural network, over a range of sample sizes from 1000 to 20,000 compounds. This represents a three-fold improvement over a random selection of compounds (see graph).

Preliminary testing with these three proteins has shown that, to minimize the RMS error between the docking energies computed by DOCK5 and those predicted by the neural network, the training set should include at least 1000 randomly chosen ligands. Increasing the training set from about 2000 to 60,000 compounds does not improve performance significantly, and the minimum error is found with a training set of about 1500 (see graph). Additional targets are being analyzed in this way to determine whether this small training set size remains optimal across different target proteins.
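
The throughput figure and the overlap measure quoted above can be reproduced with a short calculation. The sketch below assumes that docking dominates the cost, so that training the network and scoring the library with it are negligible by comparison; the function names and the toy energies are illustrative and do not come from the original program.

```python
def docking_run_ratio(library_size, n_train, n_selected):
    """Docking runs needed for exhaustive docking, divided by the runs needed
    when only a training sample and the network's top picks are docked."""
    return library_size / (n_train + n_selected)

def top_k_overlap(docked, predicted, k):
    """Fraction of the k best compounds by docking energy that are also
    among the k best by predicted energy (lower energy is better)."""
    best_docked = set(sorted(docked, key=docked.get)[:k])
    best_predicted = set(sorted(predicted, key=predicted.get)[:k])
    return len(best_docked & best_predicted) / k

# 2 million ligands docked directly vs. 1000 training + 1000 selected ligands.
ratio = docking_run_ratio(2_000_000, 1000, 1000)  # 1000.0

# Toy energies in kJ/mol for four hypothetical compounds.
docked = {"a": -60.0, "b": -50.0, "c": -40.0, "d": -30.0}
predicted = {"a": -55.0, "b": -35.0, "c": -45.0, "d": -30.0}
overlap = top_k_overlap(docked, predicted, k=2)  # 0.5
```

Under that assumption, the ratio of docking runs is 2,000,000 / (1000 + 1000) = 1000, which matches the approximate figure given in the text.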
The average binding energies in the table below link to graphs that show the improvement in binding energies when the compound library is pre-screened with the neural network. These data represent the proof-of-principle test of the neural network program for selecting a set of compounds enriched for probable ligands. The test targets are a soluble toxin, Phospholipase C; an intracellular enzyme in folate biosynthesis, 7,8-dihydroneopterin aldolase (DHNA); and an intracellular, rate-limiting enzyme in CoA biosynthesis, phosphopantetheine adenylyltransferase (PPAT).
| Protein target (functional role) | PDB | Random ligands: calculated binding energy, mean +/- SD (kJ/mol) | Selected ligands: calculated binding energy, mean +/- SD (kJ/mol) |
|---|---|---|---|
| Phospholipase C | 1AH7 | -34.20 +/- 6.41 | -61.21 +/- 13.97 |
| 7,8-Dihydroneopterin aldolase | 1DHN | -31.99 +/- 4.84 | -48.72 +/- 6.21 |
| Phosphopantetheine adenylyltransferase | 1TFU | -38.57 +/- 5.33 | -56.49 +/- 5.49 |
1. Margaryan GR (2006) An Artificial Neural Network Approach to Prescreening of Large Chemical Datasets, Master's Thesis, University of Chicago.