Artificial Intelligence



Artificial Intelligence (AI) and Cheminformatics

 Artificial intelligence generally refers to technology that extracts patterns and knowledge from data and uses them for learning and inference, as exemplified in recent years by large language models such as ChatGPT. The field that applies AI to chemistry is called cheminformatics. Its application to the materials field in particular is known as materials informatics, and its application to process chemistry is called process informatics. Cheminformatics has become an indispensable technology in modern chemical research, solving a wide range of chemical problems with models that learn from many types of chemical information and turn it into usable knowledge.

What is cheminformatics?

 “Cheminformatics” (also spelled “chemoinformatics”) is a blend of the words “chemistry” and “informatics” and was proposed by F. K. Brown in 1998. It is a research field that aims to accelerate research and development in chemistry by applying machine learning (AI). Typical applications include:

  • Molecular design of functional molecules
  • Creation of synthetic pathways
  • Prediction of compound properties and interactions (structure-activity relationships)
  • Classification of chemical structures and reactions
  • Protein function prediction
  • Reduction of the number of experiments (experimental design)

Representative methods used in cheminformatics

Multivariate analysis (regression analysis)

 Multivariate analysis is a method for predicting numerical values by constructing a model equation from multivariate data obtained experimentally or computationally. For example, if you plot the daily maximum temperature as x against ice cream sales at a convenience store as y, you would expect a roughly linear relationship to emerge. A straight line characterized by two parameters, w and b, is written as $y = wx + b$.
This equation is called a model (or mathematical model); w is the regression coefficient (weight), and b is the bias (intercept). Constructing a mathematical model in this way to predict one dependent variable y from a single explanatory variable x is called simple regression analysis.
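A minimal sketch of simple regression in Python with NumPy, assuming ordinary least squares as the fitting criterion: the slope w is the covariance of x and y divided by the variance of x, and the bias b places the line through the point of means. The temperature and sales figures are made-up illustration data, and `fit_simple_regression` is a hypothetical helper name.

```python
import numpy as np

def fit_simple_regression(x, y):
    """Ordinary least-squares estimates of w and b for the model y = w*x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x
    w = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The least-squares line always passes through (x_mean, y_mean)
    b = y_mean - w * x_mean
    return w, b

# Hypothetical data: daily maximum temperature (deg C) and ice cream sales (units)
temps = [22, 25, 27, 30, 33]
sales = [110, 130, 145, 160, 185]

w, b = fit_simple_regression(temps, sales)
print(f"w = {w:.2f}, b = {b:.2f}")
print("predicted sales at 28 deg C:", round(w * 28 + b, 1))
```

The same estimates can be obtained with `np.polyfit(temps, sales, 1)`; the explicit formulas are written out here only to show where w and b come from.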

 The same approach can be extended to build a model with multiple explanatory variables; this is called multiple regression analysis. In multiple regression, if two explanatory variables x1 and x2 are strongly correlated with each other, multicollinearity occurs and the estimated regression coefficients become unstable and unreliable. Typical countermeasures are principal component regression (PCR), which first converts the explanatory variables into uncorrelated principal components by principal component analysis (PCA) and then performs regression on those components, and partial least squares (PLS) regression, which instead extracts latent components chosen to maximize their covariance with y and regresses on them.
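The idea of running PCA first and then regressing on the component scores (principal component regression) can be sketched in NumPy as below. This is an illustrative sketch on synthetic data, not a reference implementation; the function name `pcr_fit` is hypothetical. PLS differs in that its components are also chosen using y; scikit-learn's `PLSRegression` class is one ready-made implementation of it.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal component regression: PCA on X, then least squares on the scores."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Principal directions are the right singular vectors of the centred data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T          # shape (n_features, n_components)
    T = Xc @ V                       # principal component scores (uncorrelated)
    # Ordinary least squares of centred y on the scores
    gamma, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    coef = V @ gamma                 # map back to coefficients on the original x's
    intercept = y_mean - x_mean @ coef
    return coef, intercept

# Synthetic data in which x2 is almost a copy of x1 (strong multicollinearity)
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.01, size=50)
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=50)

coef, intercept = pcr_fit(X, y, n_components=1)
print("PCR coefficients:", coef.round(2), "intercept:", round(intercept, 2))

# Plain multiple regression on the same data, for comparison:
# the individual coefficients are poorly determined when x1 and x2 are nearly identical
ols, *_ = np.linalg.lstsq(np.column_stack([np.ones(50), X]), y, rcond=None)
print("OLS coefficients:", ols[1:].round(2))
```

Keeping a single component here spreads the effect evenly over x1 and x2 (roughly 1.5 each), whereas the plain least-squares split between the two correlated variables can vary widely with small changes in the noise.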

Classification

 While regression analysis aims to predict numerical values, classification determines which group (class) a data point belongs to. A representative method is the k-nearest neighbor method, which assigns a new data point to the class that is most common among the k training samples closest to it, on the assumption that data with similar attributes belong to the same group. Another method is the support vector machine (SVM), which separates two classes with the boundary that maximizes the margin, that is, the distance to the nearest training samples of each class. Classification can also be performed with decision trees, which classify data through a tree structure of conditional branches. Yet another option is the random forest method, which trains many decision trees, each on a random subset of the data and features, and takes a majority vote of their predictions.
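As a concrete illustration of the k-nearest neighbor method, the sketch below classifies a query point by majority vote among its k closest training samples, using Euclidean distance. It uses only the Python standard library; the two-descriptor data and the "active"/"inactive" labels are hypothetical, and `knn_predict` is an illustrative name.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Euclidean distance from the query to every training sample
    dists = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    dists.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in dists[:k]]
    # Most common label among the k neighbours; on a tie, Counter keeps the
    # label encountered first, i.e. the one belonging to the closer neighbour
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical compounds described by two numerical descriptors
train_X = [(0.5, 1.2), (0.8, 1.0), (1.0, 1.5), (3.5, 3.0), (4.0, 3.2), (3.8, 2.7)]
train_y = ["active", "active", "active", "inactive", "inactive", "inactive"]

print(knn_predict(train_X, train_y, (0.9, 1.1)))
print(knn_predict(train_X, train_y, (3.6, 2.9)))
```

In practice the descriptors are usually standardized first, because Euclidean distance is dominated by whichever descriptor has the largest numerical range.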

Neural networks

 A neural network is a mathematical model loosely inspired by the way neurons in the human brain work. By inserting an intermediate (hidden) layer with nonlinear activation functions between the input layer and the output layer, it can represent complex, nonlinear decision boundaries. It can perform both classification and regression. Deep learning refers to neural networks with multiple intermediate layers.
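Why an intermediate layer matters can be seen with the XOR problem, whose two classes cannot be separated by any single straight line. The NumPy sketch below uses a step activation and hand-picked weights (an assumption for illustration; real networks learn their weights, typically by backpropagation with differentiable activations such as sigmoid or ReLU). The two hidden units compute OR and AND, and the output unit combines them into XOR.

```python
import numpy as np

def step(z):
    """Threshold activation: 1 where z > 0, else 0."""
    return (z > 0).astype(int)

def forward(X, W1, b1, w2, b2):
    """Input layer -> one intermediate (hidden) layer -> single output unit."""
    h = step(X @ W1 + b1)      # hidden-layer activations, shape (n_samples, 2)
    return step(h @ w2 + b2)   # output, shape (n_samples,)

# XOR inputs: the output should be 1 only when exactly one input is 1
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hand-picked (not learned) weights; column j holds the input weights of hidden unit j
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])    # hidden unit 1 acts as OR, hidden unit 2 as AND
w2 = np.array([1.0, -2.0])     # output fires for "OR but not AND"
b2 = -0.5

print(forward(X, W1, b1, w2, b2))
```

Removing the hidden layer leaves a single linear threshold unit, which cannot reproduce this output for any choice of weights.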


Inquiries and quotes regarding artificial intelligence

By designing a computational strategy suited to each individual task, a wide variety of problems can be approached theoretically. Please feel free to contact us.
