Identification of Polygonatum odoratum based on support vector machine
Zhong Li1, Jie Zheng2, Qin Long1, Yi Li1, Huaying Zhou3, Tasi Liu4, Bin Han1
1 Department of Traditional Chinese Medicine Resources, College of Traditional Chinese Medicine, Guangdong Pharmaceutical University, Guangzhou, China
2 Department of Pharmaceutical Engineering, College of Chemical Engineering and Light Industry, Guangdong University of Technology, Guangzhou, China
3 Department of Computer Science, College of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou, China
4 Department of Traditional Chinese Medicine Resources, College of Traditional Chinese Medicine, Hunan University of Chinese Medicine, Changsha, China
Date of Submission: 27-Sep-2019
Date of Decision: 31-Oct-2019
Date of Acceptance: 21-Apr-2020
Date of Web Publication: 20-Oct-2020
College of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou 510006
Source of Support: None, Conflict of Interest: None
Abstract
Background: The dried rhizome of Polygonatum odoratum (Mill.) Druce has been widely used in traditional medicinal preparations in China, Japan, and Korea. In China, it is distributed in provinces such as Hunan, Guangdong, and Liaoning, and its quality differs from habitat to habitat. In addition, P. odoratum has several adulterants, such as Polygonatum inflatum Kom., Polygonatum prattii Baker, and Polygonatum cyrtonema Hua, whose morphological traits and chemical composition closely resemble those of P. odoratum. As a result, adulterants may often be used in place of P. odoratum in clinical practice. Objectives: We aimed to establish a reliable and accurate support vector machine (SVM) classification model to distinguish P. odoratum from different habitats and to identify its adulterants. Materials and Methods: The ultraviolet (UV) absorption spectra of water extracts of the rhizomes of 162 samples were measured by UV-visible spectrophotometry. The samples comprised P. odoratum from Hunan, Guangdong, Heilongjiang, Yunnan, and Liaoning Provinces and the adulterant species P. inflatum, P. prattii, P. cyrtonema, and Disporopsis pernyi (Hua) Diels. The UV absorption data were preprocessed, and an SVM model was then established to classify the samples by habitat and species. Results: The SVM model achieved a prediction accuracy of 100%, correctly identifying five habitats and four adulterants of P. odoratum. SVM classification of preprocessed UV spectra may therefore be useful for the accurate identification of P. odoratum. Conclusion: The SVM model appears promising for identifying herbs with multiple habitats and their adulterants.
Keywords: Adulterants, identification, Polygonatum odoratum, support vector machine, ultraviolet
How to cite this article:
Li Z, Zheng J, Long Q, Li Y, Zhou H, Liu T, Han B. Identification of Polygonatum odoratum based on support vector machine. Phcog Mag 2020;16:538-42
How to cite this URL:
Li Z, Zheng J, Long Q, Li Y, Zhou H, Liu T, Han B. Identification of Polygonatum odoratum based on support vector machine. Phcog Mag [serial online] 2020 [cited 2021 Jul 25];16:538-42. Available from: http://www.phcog.com/text.asp?2020/16/71/538/298653
- In this study, the ultraviolet (UV) spectral data of water extracts of Polygonatum odoratum were optimized in R and used to establish a support vector machine (SVM) model that identifies and classifies P. odoratum from different habitats and its various adulterants. The prediction accuracy of the SVM model was 100%, indicating that SVM classification of preprocessed UV spectra can be used to identify P. odoratum.
Abbreviations used: P.: Polygonatum; UV: Ultraviolet; SVM: Support vector machine; TCM: Traditional Chinese medicine; Fig: Figure.
Introduction
Polygonatum odoratum (Mill.) Druce, native to many parts of the world, belongs to the Liliaceae family. It has been widely used in traditional Chinese medicine (TCM) preparations to treat diabetes, tonify Qi, and clear heat. P. odoratum is distributed in many provinces of China, such as Hunan, Guangdong, and Liaoning, and its quality differs by habitat. In addition, its adulterants closely resemble P. odoratum in morphological traits and chemical composition. For example, the rhizome of Polygonatum inflatum is used as P. odoratum in Northeast China, Polygonatum prattii is mistaken for P. odoratum in Sichuan and Yunnan Provinces, and Polygonatum cyrtonema is mixed with P. odoratum for medicinal use in other parts of China. As a result, adulterants of P. odoratum have been wrongly used in its place in China for thousands of years. However, only P. odoratum is included in the “Pharmacopoeia of The People's Republic of China” as an authentic Chinese medicinal herb. Distinguishing species of the same genus, Polygonatum, is a great challenge because their dried, sliced rhizomes look very similar, and so far there is no effective method to distinguish P. odoratum from its adulterants. Nevertheless, each species must be identified reliably and accurately before it is used as medicine, because misidentified P. odoratum could have serious consequences for patients.
In recent years, many new technologies and methods have been developed to identify and characterize Chinese herbal medicines. Among them, the support vector machine (SVM) is a well-known classification paradigm in machine learning: a supervised learning model with associated learning algorithms that analyze data for classification and regression. Owing to its strong learning ability, SVM has become a focus of research and has been widely used to classify TCM and to elucidate structure-activity relationships.
We combined ultraviolet (UV) spectrometry with SVM to identify P. odoratum samples from different habitats (Hunan, Guangdong, Heilongjiang, Yunnan, and Liaoning Provinces) and to distinguish them from related species (P. inflatum, P. prattii, P. cyrtonema, and Disporopsis pernyi (Hua) Diels). The aim of this study was to develop a fast identification model based on SVM.
Materials and Methods
Samples were collected from Hunan, Guangdong, Heilongjiang, Yunnan, and Liaoning Provinces in China. [Table 1] shows information regarding each sample. All these samples were authenticated by Associate Professor Zhong Li (College of TCM, Guangdong Pharmaceutical University, Guangzhou, China).
Table 1: Sample information for Polygonatum odoratum and its adulterants
Preparation of water extract
The coarse powder of each sample (2.0 g, accurately weighed) was placed in a 50 mL volumetric flask with 20 mL of distilled water, ultrasonicated for 20 min at room temperature, and filtered through 20 μm quantitative filter paper. From this stock solution, 5 mL of the filtrate was transferred to a 100 mL volumetric flask, diluted to the mark with distilled water, and mixed well. This working solution was used for the absorbance measurements.
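The dilution arithmetic can be checked with a short calculation. This is a sketch assuming that concentrations are expressed as crude-drug equivalent (grams of powder per mL) and that the 20 mL of water is the full extraction volume; the paper does not report a final concentration, and the helper names below are hypothetical.

```python
# Crude-drug-equivalent concentrations for the extract preparation.
# Assumption: all 2.0 g of powder is counted as extracted into 20 mL of water.

def crude_drug_concentration(mass_g: float, solvent_ml: float) -> float:
    """Stock concentration in g of crude drug per mL of extract."""
    return mass_g / solvent_ml

def dilute(conc: float, aliquot_ml: float, final_ml: float) -> float:
    """Concentration after diluting an aliquot to a final volume."""
    return conc * aliquot_ml / final_ml

stock = crude_drug_concentration(2.0, 20.0)               # stock solution
working = dilute(stock, aliquot_ml=5.0, final_ml=100.0)   # 20-fold dilution
print(f"stock: {stock * 1000:.0f} mg/mL, working: {working * 1000:.0f} mg/mL")
```

Under these assumptions the 5 mL to 100 mL step is a 20-fold dilution, so the working solution corresponds to 5 mg of crude drug per mL.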
Ultraviolet absorption spectroscopy
The absorbance of each sample was measured from 200 to 400 nm with a UV-visible spectrophotometer at a sampling interval of 1 nm. Because absorbance magnitudes differ across wavelengths, wavelengths with larger values would otherwise dominate the SVM classification; the absorbance at each wavelength therefore has to be centered and standardized so that all variables enter the classifier on the same scale. The spectral data were centered and standardized with the scale function of R. The spectra also contain redundant information that can introduce errors into the SVM model, so the wavelength range was optimized as well: using an SVM-based variable selection method, the absorbances at all wavelengths from 200 to 400 nm were ranked by how closely they relate to the SVM classification, and the absorbances at the optimal wavelengths were selected to build the classifier.
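For readers working outside R, the column-wise centering and standardization that scale() performs with its default arguments (subtract each column mean, then divide by the column's sample standard deviation) can be sketched as follows. The absorbance values are hypothetical; each row stands for one sample and each column for one wavelength.

```python
from statistics import mean, stdev

def scale_columns(matrix):
    """Center and standardize each column (wavelength), mirroring R's scale()
    defaults: subtract the column mean, divide by the sample SD (n - 1)."""
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    stats = [(mean(col), stdev(col)) for col in columns]
    return [[(row[j] - stats[j][0]) / stats[j][1] for j in range(n_cols)]
            for row in matrix]

# Hypothetical absorbances: 3 samples (rows) x 2 wavelengths (columns).
absorbance = [[0.80, 0.10],
              [1.00, 0.20],
              [1.20, 0.30]]
scaled = scale_columns(absorbance)
for row in scaled:
    print([round(v, 3) for v in row])
```

After scaling, every wavelength has mean 0 and standard deviation 1, so a wavelength with large raw absorbance no longer dominates distance-based kernels such as the RBF kernel.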
Creating the support vector machine model
From all samples, 80% were randomly selected as the training set and the remaining 20% as the testing set. The training set was used to build the SVM model, and the testing set was used to validate it. The SVM classifier was built with the LIBSVM interface provided by the e1071 package in R.
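The e1071 package wraps LIBSVM, and the e1071 documentation states that for more than two classes LIBSVM uses a one-against-one approach: k(k-1)/2 binary classifiers are trained and the class is chosen by voting. The Python sketch below illustrates only that voting step. The binary SVMs are replaced by a hypothetical nearest-centre rule, and treating the five habitats and four adulterant species named in the Introduction as nine separate classes is an assumption of this sketch.

```python
from collections import Counter
from itertools import combinations

def one_vs_one_predict(x, classes, binary_decision):
    """Majority vote over all k(k-1)/2 pairwise classifiers.

    binary_decision(a, b, x) returns a or b; it stands in for a trained
    two-class SVM (a hypothetical placeholder in this sketch).
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[binary_decision(a, b, x)] += 1
    best = max(votes.values())
    # Ties go to the class listed first.
    return next(c for c in classes if votes[c] == best)

# Nine groups: five habitats of P. odoratum and four adulterant species.
groups = ["Hunan", "Guangdong", "Heilongjiang", "Yunnan", "Liaoning",
          "P. inflatum", "P. prattii", "P. cyrtonema", "D. pernyi"]
n_pairs = len(list(combinations(groups, 2)))  # 9 * 8 / 2 pairwise classifiers

# Toy stand-in for the binary SVMs: each group gets a hypothetical 1-D
# "centre", and each pairwise classifier picks the nearer centre.
centre = {g: float(i) for i, g in enumerate(groups)}

def nearer(a, b, x):
    return a if abs(x - centre[a]) <= abs(x - centre[b]) else b

print(n_pairs, one_vs_one_predict(3.2, groups, nearer))
```

With nine classes, the sketch trains (conceptually) 36 pairwise classifiers, and a sample is assigned to the class that wins the most pairwise contests.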
Results
Ultraviolet spectral absorption of Polygonatum odoratum from different habitats and different species
As shown in [Figure 1], the UV absorption spectra of water extracts of P. odoratum from different regions are very similar, so the region of origin cannot easily be determined from the spectra alone. The UV spectra of P. odoratum and related species are also very similar [Figure 2]. Traditional methods such as tasting or visual inspection likewise fail to differentiate P. odoratum from its adulterants, which has caused considerable confusion for thousands of years. An advanced method is therefore needed to identify P. odoratum and to improve its production standards and quality.
Figure 1: Ultraviolet spectral absorption of Polygonatum odoratum from different producing areas
Figure 2: Ultraviolet spectral absorption of different Polygonatum species
Selection of ultraviolet wavelength
An SVM-based variable selection algorithm was used to rank the absorbances at all wavelengths by importance; [Table 2] shows the ranking. AvgRank is the ranking index: the smaller its value, the more closely that wavelength is related to the classification. In this study, the absorbance data at the 40 top-ranked wavelengths were used for SVM modeling.
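The paper does not state how AvgRank is computed. One plausible reading, assumed here, is that each wavelength is ranked in several resampled SVM runs (for example, by SVM recursive feature elimination) and its ranks are averaged; the wavelengths with the smallest average rank are kept. The sketch below shows only that aggregation step, and the ranks are hypothetical.

```python
from statistics import mean

def top_k_by_avg_rank(ranks_per_run, k):
    """Return the k wavelengths with the smallest average rank.

    ranks_per_run maps wavelength (nm) -> list of ranks, one per run
    (1 = most important). A smaller AvgRank means a more important wavelength.
    """
    avg_rank = {wl: mean(r) for wl, r in ranks_per_run.items()}
    # Sort by average rank; ties are broken by the shorter wavelength.
    ordered = sorted(avg_rank, key=lambda wl: (avg_rank[wl], wl))
    return ordered[:k], avg_rank

# Hypothetical ranks from three runs for four wavelengths.
ranks = {210: [1, 2, 1], 254: [3, 1, 2], 280: [2, 3, 4], 330: [4, 4, 3]}
selected, avg = top_k_by_avg_rank(ranks, k=2)
print(selected)
```

In the study the same idea would be applied to all 201 wavelengths between 200 and 400 nm with k = 40.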
A total of 162 samples from nine groups (P. odoratum from five habitats and four adulterant species) were randomly divided into a training set and a testing set with the sample function in R [Table 3].
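A random 80/20 split of 162 samples yields 130 training and 32 testing samples, the sizes reported in the Results. The Python sketch below reproduces only the arithmetic and the procedure; Python's random module is a different generator from R's sample(), and the seed is a hypothetical choice, so the specific samples drawn will not match the study's split.

```python
import random

def train_test_split(n_samples, train_fraction=0.8, seed=2019):
    """Randomly split sample indices into training and testing sets."""
    rng = random.Random(seed)  # fixed (hypothetical) seed for reproducibility
    n_train = round(n_samples * train_fraction)
    train = sorted(rng.sample(range(n_samples), n_train))
    train_set = set(train)
    test = [i for i in range(n_samples) if i not in train_set]
    return train, test

train, test = train_test_split(162)
print(len(train), len(test))
```

A purely random split does not guarantee that every one of the nine groups appears in the testing set; sampling 80% within each group (a stratified split) is a common alternative when group sizes are small.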
Selection of the radial basis function kernel parameter γ and error warning factor C
We screened values of the radial basis function (RBF) kernel parameter γ while building the SVM model. As γ increased from 0.125 to 16, the accuracy on the training set was unaffected, whereas the accuracy on the testing set decreased [Table 4]. This is because γ controls the width of the radial basis function (a larger γ gives a narrower kernel), which governs the generalizability of the SVM. On this basis, γ = 0.125 was selected.
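The effect of γ can be seen directly from the RBF kernel, which e1071 defines as exp(-γ|u - v|²). In the sketch below (two hypothetical standardized spectra of three wavelengths each), a larger γ drives the similarity between distinct spectra toward zero, so each training spectrum ends up resembling only itself. Such a model can still fit the training set perfectly while generalizing worse, which is consistent with the trend reported in [Table 4].

```python
import math

def rbf_kernel(u, v, gamma):
    """Radial basis function kernel exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

# Two hypothetical standardized spectra (3 wavelengths each).
u = [0.5, -0.2, 1.0]
v = [0.0, 0.3, 0.6]
for gamma in (0.125, 1, 16):
    print(gamma, round(rbf_kernel(u, v, gamma), 4))
```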
Table 4: Predictive ability of the support vector machine for different values of the radial basis function kernel parameter γ
Similarly, the error warning factor C (the penalty parameter of the SVM) was screened to optimize the model; [Table 5] shows the results. A smaller C imposes a smaller penalty on misclassified training samples, which allows a larger training error. Because the structural risk is bounded by the empirical risk plus a confidence term, a large training error may increase the structural risk and worsen the generalizability of the model. The value of C therefore has a strong influence on generalizability. As shown in [Table 5], the classification accuracy was stable when C was between 2 and 16, so C = 2 was selected for this model.
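The role of C can be read from the standard soft-margin formulation that LIBSVM solves for C-classification. This is general SVM background rather than a detail reported in the paper. Here φ is the feature map induced by the kernel, ξᵢ are slack variables measuring training errors, and, in Vapnik's structural risk minimization, the true risk R is bounded by the empirical risk plus a confidence term that depends on the VC dimension h and the sample size n (shown schematically).

```latex
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\;
  \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
  y_i\bigl(\mathbf{w}^{\top}\phi(\mathbf{x}_i) + b\bigr) \ge 1 - \xi_i,
  \qquad \xi_i \ge 0,

R(f) \;\le\; R_{\mathrm{emp}}(f) + \Phi\!\left(\frac{h}{n}\right).
```

A small C weights the slack term lightly, so more training errors are tolerated and R_emp grows; a large C forces the margin to accommodate nearly every training sample.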
Table 5: Predictive ability of the support vector machine for different values of the error warning factor C
Identification results by support vector machine
The final SVM classifier was built with the optimized values of the RBF kernel parameter γ and the error warning factor C obtained above. [Table 6] shows the SVM predictions for the 130 training samples; the prediction accuracy was 100%. With the same optimized parameters, all 32 samples of the testing set were then predicted, and the identification accuracy was also 100% [Table 7].
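The reported accuracy is the share of samples whose predicted group matches the true group (130 of 130 for training and 32 of 32 for testing). A minimal sketch with hypothetical labels shows the calculation; counting (true, predicted) pairs additionally makes any misassignment between groups visible.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def confusion_counts(y_true, y_pred):
    """Count (true, predicted) label pairs; off-diagonal pairs are errors."""
    counts = {}
    for t, p in zip(y_true, y_pred):
        counts[(t, p)] = counts.get((t, p), 0) + 1
    return counts

# Hypothetical labels for a small testing set.
truth = ["Hunan", "Yunnan", "P. prattii", "D. pernyi"]
pred = ["Hunan", "Yunnan", "P. prattii", "P. cyrtonema"]
print(accuracy(truth, pred))  # 3 of 4 correct
print(confusion_counts(truth, pred))
```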
Table 6: Prediction results for the training samples by support vector machine
Table 7: Prediction results for the testing samples by support vector machine
Discussion
UV absorption spectroscopy is commonly used to identify compound structures and to determine composition. Because the chemical components contained in TCMs differ in their degree of unsaturation, their absorption curves differ in peak position, peak shape, and peak intensity. UV absorption spectroscopy is therefore a simple and effective method for distinguishing distantly related Chinese herbal medicines, but it is not effective for identifying adulterants. SVM can address this limitation more effectively: based on the UV spectral data of P. odoratum samples, SVM successfully distinguished P. odoratum from different habitats and from its adulterants.
Accurate identification and classification of TCMs is a common challenge. Most of the data obtained are unlabeled, which complicates identification and classification. How to use these data effectively and improve the accuracy of modeling techniques is therefore an urgent question.
In this study, R was used to screen for the best spectral pretreatment. The absorbance data at the 40 top-ranked wavelengths proved sufficient for modeling and analyzing the herbal samples. The established SVM classifier achieved a prediction accuracy of 100% on both the training and testing sets. These results suggest that SVM offers fast learning, high accuracy, and good generalizability, which can help address the quality problem of TCMs originating from different habitats and provide a new method for the effective identification of complex TCMs.
Financial support and sponsorship
This work was supported by Guangdong Science and Technology Department Project (Grant No. 2016A020226018), Ministry of National Science and Technology Support Program Project of China (No. 2011BA101B09), and Central support for a local college project (Grant No. 51348000).
Conflicts of interest
There are no conflicts of interest.
References
Chinese Pharmacopoeia Commission. Pharmacopoeia of The People's Republic of China (Part 2). Beijing: Chemical Industry Press; 2015.
Tahir M, Jan B, Hayat M, Shah SU, Amin M. Efficient computational model for classification of protein localization images using extended threshold adjacency statistics and support vector machines. Comput Methods Programs Biomed 2018;157:205-15.
Chen C, Chen LX, Zou XY, Cai PX. Predicting protein structural class based on multi-features fusion. J Theor Biol 2008;253:388-92.
Jun WY, Yue Y, Yu ZJ, Song LX, Jiang WY, Tao ZW. Geographical origin discrimination of herba epimedii by near infrared spectroscopy. Lishizhen Med Mater Medica Res 2017;28:1902-5.
Ruiz IL, Gómez-Nieto MÁ. Advantages of relative versus absolute data for the development of quantitative structure-activity relationship classification models. J Chem Inf Model 2017:2776-88.
Luque Ruiz I, Gómez-Nieto MÁ. Robust QSAR prediction models for volume of distribution at steady state in humans using relative distance measurements. SAR QSAR Environ Res 2018;29:529-50.
Tierney L. The R Statistical Computing Environment. In: Statistical Challenges in Modern Astronomy V. New York: Springer; 2012.
Zhang C, Shen T, Liu F, He Y. Identification of coffee varieties using laser-induced breakdown spectroscopy and chemometrics. Sensors (Basel) 2017;18:95.
Meyer D, Dimitriadou E, Hornik K, Weingessel A, Leisch F. Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien; 2019.
[Figure 1], [Figure 2]
[Table 1], [Table 2], [Table 3], [Table 4], [Table 5], [Table 6], [Table 7]