
Selection of breast features for young women in northwestern China based on the random forest algorithm

Jie Zhou*, Qian Mao, Jun Zhang, Newman Lau, Jianming Chen
Textile Research Journal

In research on breast morphology, numerous breast features are measured, yet only a few parameters are adopted for classification. How to extract the key variables from the multi-dimensional features in a rational way is therefore a central issue. This study aimed to simplify dimensionality reduction and thereby improve the objectivity and interpretability of the selected breast features. Because the random forest (RF) algorithm can quantify feature importance during training, it was adopted in this paper to determine the optimal breast features for classification and recognition. First, the anthropometric data of 360 females from northwestern China aged 19 to 27 years were measured by non-contact three-dimensional body scanning and by contact manual measurement. Then, k-means clustering was applied to categorize breast shapes, and the RF algorithm was used to quantify and rank the importance of 25 breast features. Finally, to verify the effectiveness of the RF algorithm for breast feature selection, the t-distributed stochastic neighbor embedding method was adopted to visualize the distribution of breast shape clusters in two dimensions, and four neural networks were used to recognize the breast morphology. The results demonstrate that a reduced set of breast features can effectively increase the accuracy of breast shape classification and recognition. The best performance is obtained when the number of breast features is 13; in this case, the average Hamming loss of the four neural networks is the smallest (0.1136). The bust circumference and the horizontal curve of the breasts across the bust points are found to be the most important of the 25 breast features.
The importance of the breast curve features is higher than that of the breast cross-sectional features, while the breast positioning features have the lowest importance. The RF algorithm is also verified to be more effective than traditional dimensionality reduction methods, such as principal component analysis, hierarchical clustering, and recursive feature elimination. The approach developed in this paper can be generalized to dimensionality reduction for other types of body morphology.
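
The workflow summarized above (clustering, RF importance ranking, then recognition on a reduced feature set scored by Hamming loss) can be illustrated with a minimal Python sketch using scikit-learn. Everything specific here is an assumption made for illustration: the data are synthetic blobs rather than the measured anthropometric data, the number of clusters (4) is not taken from the paper, and a single small `MLPClassifier` stands in for the four neural networks, whose architectures are not described in this abstract. The sketch is not the authors' implementation and will not reproduce the reported 0.1136.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import hamming_loss
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the measurements: 360 subjects x 25 features.
# (Assumption: the real anthropometric data are not available here.)
X, _ = make_blobs(n_samples=360, n_features=25, centers=4, random_state=0)
X = StandardScaler().fit_transform(X)

# Step 1: k-means assigns a shape cluster to each subject.
# n_clusters=4 is an assumed value for illustration only.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Step 2: a random forest trained on the cluster labels scores each feature;
# sorting the scores in descending order gives the importance ranking.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, labels)
ranking = np.argsort(forest.feature_importances_)[::-1]


def loss_with_top_k(k):
    """Hamming loss of one small neural network using the k top-ranked features."""
    cols = ranking[:k]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X[:, cols], labels, test_size=0.3, random_state=0
    )
    # Hypothetical stand-in for the paper's four networks.
    net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
    net.fit(X_tr, y_tr)
    # For single-label multiclass targets this is the misclassification rate.
    return hamming_loss(y_te, net.predict(X_te))


# Step 3: compare a few subset sizes (13 is the size the paper reports as best).
losses = {k: loss_with_top_k(k) for k in (5, 13, 25)}
print(losses)
```

For brevity, the ranking is computed on all samples before the train/test split; a stricter evaluation would rank the features on the training portion only, so that the held-out samples do not influence the selection.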