
Does Kmeans Normalize Features Automatically In Sklearn

I was wondering if KMeans automatically normalizes the features before doing clustering. There seems to be no option to provide an input to ask for normalization.

Solution 1:

Data preprocessing (normalization, binning, weighting, etc.) is a separate step from applying a machine learning algorithm, so KMeans does not normalize the features for you. Use sklearn.preprocessing for data preprocessing. Preprocessors can also be chained, so the data passes through several of them in sequence.

As for k-means, centering the mean alone is often not enough. You should also equalize the variance across features: k-means is sensitive to variance in the data, and features with larger variance have more influence on the result. So for k-means, I would recommend using StandardScaler for data preprocessing.
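To illustrate the point about variance, here is a minimal sketch rather than a definitive recipe. The toy data, the `agreement` helper, and all variable names are made up for this example: one feature carries the group structure on a small scale, and the other is pure noise on a large scale. KMeans is run once on the raw data and once after StandardScaler, chained with `make_pipeline`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical toy data: two groups separated along feature 0 (small scale),
# while feature 1 is pure noise on a much larger scale.
n = 100
informative = np.concatenate([rng.normal(0.0, 0.1, n), rng.normal(1.0, 0.1, n)])
noise = rng.normal(0.0, 1000.0, 2 * n)
X = np.column_stack([informative, noise])
true_labels = np.repeat([0, 1], n)


def agreement(labels, truth):
    """Fraction of points matching the truth, allowing for swapped cluster ids."""
    acc = np.mean(labels == truth)
    return max(acc, 1.0 - acc)


# KMeans on the raw features: the large-variance noise feature dominates
# the Euclidean distances, so the split mostly follows the noise.
raw_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# StandardScaler followed by KMeans: each feature is rescaled to zero mean
# and unit variance before clustering.
scaled_model = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=2, n_init=10, random_state=0),
)
scaled_labels = scaled_model.fit_predict(X)

raw_score = agreement(raw_labels, true_labels)
scaled_score = agreement(scaled_labels, true_labels)
print("agreement without scaling:", raw_score)
print("agreement with scaling:   ", scaled_score)
```

On data like this, the scaled pipeline should recover the two groups far better than the raw run. Keeping the scaler and the clusterer in one pipeline also means the same scaling is applied when you later call `predict` on new data.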

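Also keep in mind that k-means results depend on the initial centroids, which in turn can depend on the order of the observations. It is worth running the algorithm several times, shuffling the data in between, then averaging the resulting cluster centers (after matching clusters across runs, since cluster labels are arbitrary) and using those averaged centers as starting points for a final run. In sklearn, the n_init parameter of KMeans already reruns the algorithm with different initial centroids and keeps the result with the lowest inertia.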
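As a rough sketch of the "run it several times" advice, the example below uses KMeans's own `n_init` parameter instead of the manual shuffle-and-average procedure described above, so it is a substitute technique, not the same one. The blob data and variable names are assumptions made up for the illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)

# Hypothetical toy data: four Gaussian blobs in 2-D, 50 points each.
blob_centers = np.array([[0, 0], [5, 0], [0, 5], [5, 5]], dtype=float)
X = np.vstack([c + rng.normal(0.0, 1.0, size=(50, 2)) for c in blob_centers])

# Ten separate runs, each with a single random initialization.
# The final inertia (within-cluster sum of squares) can differ between runs.
single_run_inertias = [
    KMeans(n_clusters=4, init="random", n_init=1, random_state=seed).fit(X).inertia_
    for seed in range(10)
]

# One call with n_init=10: KMeans tries ten initializations internally
# and keeps the run with the lowest inertia.
best_of_ten = KMeans(n_clusters=4, init="random", n_init=10, random_state=0).fit(X)

print("single runs, min inertia:", min(single_run_inertias))
print("single runs, max inertia:", max(single_run_inertias))
print("n_init=10 inertia:       ", best_of_ten.inertia_)
```

If the minimum and maximum of the single runs differ noticeably, that spread is the sensitivity to initialization. Here `init="random"` is chosen only to make that spread easier to see; the default `init="k-means++"` usually gives more stable starting points.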
