Article Info
Comparison of Similarity Method to Improve Retrieval Performance for Chemical Data
Suhaila Zainudin, Nevy Rahmi Nurjana
dx.doi.org/10.17576/apjitm-2018-0701-08
Abstract
Drug discovery is the process through which new drugs are discovered. One of the most common techniques in drug discovery is similarity searching, used in virtual screening, which compares the molecular structures in a chemical database using established similarity methods. The objective of this study is to measure the structural similarity within a chemical dataset using the Mean Pairwise Similarity (MPS) calculation and to determine the best coefficient for similarity searching, using the ECFP2 fingerprint as the molecular descriptor and three similarity coefficients: Tanimoto, Soergel and Euclidean. The results indicate that the Tanimoto and Soergel coefficients perform better than the Euclidean coefficient. For future work, different combinations of fingerprints, such as Daylight, BCI, Unity MDL, and similarity coefficients can be studied further.
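As an illustration of the quantities named in the abstract, the following is a minimal sketch in Python, not the authors' implementation. It assumes each fingerprint is a binary bit vector represented as a set of on-bit indices, uses the common binary forms of the three coefficients (the exact formulations used in the study may differ), and computes MPS as the average Tanimoto similarity over all unordered pairs. The function names and the toy fingerprints are hypothetical.

```python
from itertools import combinations
from math import sqrt


def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity: shared on-bits divided by on-bits in either fingerprint."""
    common = len(a & b)
    union = len(a) + len(b) - common
    # Assumption: two empty fingerprints are treated as having similarity 0.
    return common / union if union else 0.0


def soergel(a: set, b: set) -> float:
    """Soergel distance; for binary fingerprints it is the complement of Tanimoto."""
    return 1.0 - tanimoto(a, b)


def euclidean(a: set, b: set) -> float:
    """Euclidean distance between binary vectors: sqrt of the number of differing bits."""
    return sqrt(len(a ^ b))


def mean_pairwise_similarity(fingerprints: list) -> float:
    """Average Tanimoto similarity over every unordered pair in the dataset."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        raise ValueError("at least two fingerprints are required")
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Toy fingerprints (hypothetical on-bit indices, not real ECFP2 output).
fp_a = {1, 2, 3}
fp_b = {2, 3, 4}
fp_c = {1, 2, 3}

print(tanimoto(fp_a, fp_b))
print(soergel(fp_a, fp_b))
print(euclidean(fp_a, fp_b))
print(mean_pairwise_similarity([fp_a, fp_b, fp_c]))
```

A high MPS suggests a structurally homogeneous dataset and a low MPS a diverse one. Tanimoto is a similarity while Soergel and Euclidean, as written here, are distances, so a ranking for retrieval would sort descending on the first and ascending on the other two.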
Keywords
mean pairwise similarity; virtual screening; similarity searching; retrieval; chemoinformatics
Area
Data Mining and Optimization