Protocols Used in QSAR
QSAR stands for quantitative structure-activity relationship. It describes how a chemical structure can be quantitatively correlated with a specific process, such as a biological activity, and is frequently used in designing medications and drugs. The methodology helps guide chemical synthesis: a candidate molecule can have many thousands of possible structural variants, and scientists might otherwise waste a lot of time testing every one. Statistics play an important part in the methodology of QSAR.
Multiple Linear Regression
Multiple linear regression analysis is used when several variables are all thought to relate to the activity of one specific series of chemical structures. An equation is developed that, for example, can show a relationship between reducing tumor size and the presence of a hydrophobic group on the fourth position of a phenyl ring. This analysis condenses the results for many different compounds into a correlation that can be interpreted using only the few most important parameters.
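As a rough illustration of the idea, the sketch below fits such an equation with ordinary least squares in Python using NumPy. The descriptor values, the coefficients and the activities are all made up for the example (the activities are generated from a known equation so the fit can be checked), and the helper name `predict` is hypothetical.

```python
import numpy as np

# Hypothetical descriptors for six analogues. Columns: a hydrophobicity
# constant and an electronic constant for the substituent (made-up values).
X = np.array([
    [0.00, 0.00],
    [0.56, -0.17],
    [0.71, 0.23],
    [0.86, 0.23],
    [-0.02, -0.27],
    [0.88, 0.54],
])

# Made-up activities, generated from a known linear equation (no noise)
# so that the regression result can be verified.
true_coef = np.array([1.2, -0.5])
true_intercept = 4.0
y = X @ true_coef + true_intercept

# Append a column of ones so the last fitted coefficient is the intercept.
A = np.column_stack([X, np.ones(len(X))])
coef, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)

def predict(descriptors):
    """Predicted activity for one compound from its descriptor values."""
    return float(np.append(descriptors, 1.0) @ coef)

print("fitted coefficients (hydrophobic, electronic, intercept):", coef)
print("predicted activity for a new analogue:", predict([0.5, 0.1]))
```

With real data the activities would carry experimental noise, and the size and sign of each fitted coefficient is what indicates which parameters matter most.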
Pattern Recognition
Pattern recognition is used to define parameters that emerge when specific chemical structures are clustered together. There are several types of pattern recognition analysis, including principal component analysis (PCA), computer automated structure evaluation (CASE) and automated data analysis by pattern recognition techniques (ADAPT). Pattern recognition statistics work from the original data and correlate the different structures with the biological results along different dimensions, with the most significant variation captured in the first calculated component.
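Of the techniques named above, principal component analysis is the simplest to sketch. The example below is a minimal version in Python with NumPy, assuming a small table of descriptors (rows are compounds, columns are descriptors) filled with random placeholder numbers. The function name `pca` is hypothetical, and the data are only mean-centered here; scaling each descriptor to unit variance first is a common additional step.

```python
import numpy as np

def pca(X, n_components):
    """Principal component analysis of a descriptor table via the SVD.

    Returns the compound scores, the component loadings and the fraction
    of total variance explained by each kept component.
    """
    Xc = X - X.mean(axis=0)                     # center each descriptor
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S ** 2 / np.sum(S ** 2)         # singular values are sorted
    return scores, Vt[:n_components], explained[:n_components]

rng = np.random.default_rng(0)
# Placeholder data: 8 compounds x 4 descriptors (not real measurements).
X = rng.normal(size=(8, 4))
# Make two descriptors strongly correlated so one component dominates.
X[:, 1] = 2.0 * X[:, 0] + 0.1 * X[:, 1]

scores, loadings, explained = pca(X, 2)
print("variance explained by the first two components:", explained)
```

Compounds that sit close together in the score plot of the first two components are the ones the analysis treats as structurally similar, and those clusters can then be compared against the biological results.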
Comparative Molecular Field Analysis
Comparative molecular field analysis (CoMFA) pairs partial least squares analysis with cross validation to create predictions of biological activity. In this methodology, the scientist assigns specific rules for aligning the molecular structures. Each molecule is fit into a grid, and interaction values are then calculated between a probe atom and the molecule at each grid point. Many different equations can result. Unlike other types of regression, this produces equations in which there are many more parameters than compounds.
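The workflow can be sketched in Python with NumPy under some heavy simplifications. The sketch assumes the molecules are already aligned, uses only a Coulomb-style electrostatic term for a probe of charge +1 (a steric term is left out), and uses randomly generated coordinates, charges and activities as placeholders. The function names (`coulomb_field`, `pls1_fit`, `pls1_predict`, `loo_q2`) are hypothetical, and the partial least squares routine is a basic single-response version rather than the code of any particular package.

```python
import numpy as np

def coulomb_field(coords, charges, grid, probe_charge=1.0, min_dist=1.0):
    """Probe-molecule interaction value at every grid point."""
    d = np.linalg.norm(grid[:, None, :] - coords[None, :, :], axis=2)
    d = np.maximum(d, min_dist)            # avoid blow-up next to an atom
    return probe_charge * (charges[None, :] / d).sum(axis=1)

def pls1_fit(X, y, n_components):
    """Basic single-response partial least squares (NIPALS-style)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # nothing left to explain
            break
        w = w / norm
        t = Xk @ w                         # latent variable (scores)
        tt = t @ t
        p = Xk.T @ t / tt
        c = yk @ t / tt
        Xk = Xk - np.outer(t, p)           # deflate descriptors
        yk = yk - c * t                    # deflate response
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)    # one coefficient per grid point
    return B, x_mean, y_mean

def pls1_predict(model, X):
    B, x_mean, y_mean = model
    return (X - x_mean) @ B + y_mean

def loo_q2(X, y, n_components):
    """Leave-one-out cross-validated q2 = 1 - PRESS / total sum of squares."""
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        model = pls1_fit(X[keep], y[keep], n_components)
        press += (y[i] - pls1_predict(model, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(1)
axis = np.linspace(-3.0, 3.0, 3)
grid = np.array(np.meshgrid(axis, axis, axis)).reshape(3, -1).T  # 27 points

# Ten made-up, pre-aligned "molecules" with five atoms each.
fields = []
for _ in range(10):
    coords = rng.uniform(-2.0, 2.0, size=(5, 3))
    charges = rng.uniform(-0.5, 0.5, size=5)
    fields.append(coulomb_field(coords, charges, grid))
X = np.array(fields)                       # 10 compounds x 27 field values
# Placeholder activities loosely tied to two grid points, plus noise.
y = 2.0 * X[:, 0] - X[:, 13] + 0.05 * rng.normal(size=10)

model = pls1_fit(X, y, n_components=2)
q2 = loo_q2(X, y, n_components=2)
print("field values per compound:", X.shape[1], "compounds:", X.shape[0])
print("cross-validated q2:", q2)
```

The table has more columns (grid points) than rows (compounds), which is why ordinary regression cannot be used and why the cross-validated q2, rather than the fit to the training data, is the figure to watch.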
Apex-3D
Apex-3D automatically selects the best alignment and conformation for the structures, based on the biological response observed in previous experiments. It can uncover the effects of different binding orientations, of antagonist versus agonist activity and of different receptors. It generates 2D and 3D topographical matrices and can produce information for pairs of molecules rather than single molecules.
Genetic Function Approximation
Genetic function approximation is used when there are few samples and many variables under investigation, and when the data sets do not have linear relationships. It rates a set of candidate models calculated from the raw data, from best-rated to worst-rated, and builds progressively better models by replacing the worst-rated ones. The resulting set of fits is then provided to the scientist, who selects the final model. The similarities between the models are studied to provide information on the structural and biological correlations.
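The replace-the-worst loop can be sketched in Python with NumPy. This is a simplified illustration, not a reproduction of any published implementation: each model is just a subset of descriptors fitted with linear terms (no nonlinear basis functions), the rating is a simple penalised least-squares error standing in for a proper lack-of-fit score, and the data, the function names (`score`, `evolve`) and all settings are made up.

```python
import numpy as np

def score(X, y, subset, penalty=1.0):
    """Penalised least-squares error of a linear model (lower is better)."""
    cols = sorted(subset)
    A = np.column_stack([X[:, cols], np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    lse = np.mean((A @ coef - y) ** 2)
    denom = 1.0 - penalty * (len(cols) + 1) / len(y)   # punish extra terms
    return lse / denom ** 2 if denom > 0 else np.inf

def evolve(X, y, pop_size=20, n_terms=3, generations=200, seed=0):
    """Evolve a population of descriptor subsets; returns models best first."""
    rng = np.random.default_rng(seed)
    n_desc = X.shape[1]
    population = [
        frozenset(rng.choice(n_desc, n_terms, replace=False).tolist())
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        scores = [score(X, y, m) for m in population]
        order = np.argsort(scores)
        # Crossover: draw two parents from the better-rated half and build
        # a child from the pooled descriptors of both.
        a, b = rng.choice(order[: pop_size // 2], 2, replace=False)
        pool = sorted(population[a] | population[b])
        child = set(rng.choice(pool, n_terms, replace=False).tolist())
        # Mutation: occasionally swap one descriptor for a random one.
        if rng.random() < 0.3:
            child.remove(int(rng.choice(sorted(child))))
            child.add(int(rng.integers(n_desc)))
        child = frozenset(child)
        # Replace the worst-rated model only if the child is new and better.
        worst = order[-1]
        if child not in population and score(X, y, child) < scores[worst]:
            population[worst] = child
    return sorted(population, key=lambda m: score(X, y, m))

rng = np.random.default_rng(42)
# Placeholder data: 15 samples and 10 candidate descriptors.
X = rng.normal(size=(15, 10))
y = 1.5 * X[:, 2] - 2.0 * X[:, 5] + 0.8 * X[:, 7] + 0.05 * rng.normal(size=15)

models = evolve(X, y)
for m in models[:3]:
    print(sorted(m), score(X, y, m))
```

The function returns the whole ranked population rather than a single winner, which mirrors the step where the scientist looks across the top-rated models, notes which descriptors keep reappearing and picks the final model.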