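I. Key KMeans attribute: inertia_
The inertia_ attribute reports the total within-cluster sum of squared distances, that is, the sum over all samples of the squared distance from each sample to its nearest cluster centroid.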
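For reference, this quantity is commonly written as below. The symbols are notation introduced here for illustration and do not come from the original text: x_i is the i-th sample, mu_j is the centroid of cluster j, n is the number of samples, and k is the number of clusters.

```latex
\text{inertia} = \sum_{i=1}^{n} \min_{j \in \{1,\dots,k\}} \lVert x_i - \mu_j \rVert^2
```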
II. Try running the following code in a code cell:
(1) Import the required modules and libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
plt.style.use('ggplot')
(2) Build a synthetic dataset
# Generate a 500 x 2 dataset drawn around 4 centers, so the data has 4 true labels
X, y = make_blobs(n_samples=500, n_features=2, centers=4, random_state=1)
(3) Use labels_ to view the cluster assigned to each sample
n_clusters = 3  # number of clusters to fit
cluster = KMeans(n_clusters=n_clusters, random_state=0).fit(X)
y_pred = cluster.labels_
y_pred
Output: array([0, 0, 2, 1, 2, 1, 2, 2, 2, 2, 0, 0, 2, 1, 2, 0, 2, 0, 1, 2, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 0, 1, 2, 0, 2, 0, 2, 2, 0, 2, 2, 2, 1, 2…])
(4) Use cluster_centers_ to view the centroid coordinates
centroid = cluster.cluster_centers_
centroid  # coordinates of the 3 cluster centroids
Output: array([[-7.09306648, -8.10994454], [-1.54234022, 4.43517599], [-8.0862351 , -3.5179868 ]])
(5) Use inertia_ to view the total sum of squared distances
inertia = cluster.inertia_
inertia  # total within-cluster sum of squared distances
Output: 1903.4503741659223
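To make the meaning of inertia_ concrete, the value can be recomputed by hand from labels_ and cluster_centers_. The following is a minimal sketch that rebuilds the same data and model as the steps above. The variable names assigned_centroids, squared_distances, and manual_inertia are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Same synthetic data and model settings as in the steps above
X, y = make_blobs(n_samples=500, n_features=2, centers=4, random_state=1)
cluster = KMeans(n_clusters=3, random_state=0).fit(X)

# Look up, for every sample, the centroid of the cluster it was assigned to
assigned_centroids = cluster.cluster_centers_[cluster.labels_]

# Squared Euclidean distance from each sample to its assigned centroid
squared_distances = ((X - assigned_centroids) ** 2).sum(axis=1)

# Summing over all samples should reproduce inertia_ up to rounding
manual_inertia = squared_distances.sum()
print(manual_inertia, cluster.inertia_)
```

The two printed numbers should agree up to floating-point rounding. The exact value may differ from the output shown above depending on the scikit-learn version and its default initialization settings.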
What happens to the inertia if we change our guess for the number of clusters to 4?
n_clusters = 4
cluster_ = KMeans(n_clusters=n_clusters, random_state=0).fit(X)
inertia_ = cluster_.inertia_
inertia_
Output: 908.3855684760613
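And with 5 clusters:
n_clusters = 5
cluster_ = KMeans(n_clusters=n_clusters, random_state=0).fit(X)
inertia_ = cluster_.inertia_
inertia_
Output: 811.0841324482415
The inertia drops sharply when going from 3 to 4 clusters (1903.45 to 908.39) but only slightly when going from 4 to 5 (908.39 to 811.08).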
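Rather than refitting by hand for each guess, the same comparison can be run in a loop. This is a minimal sketch, assuming the same data as above. The range of 1 to 8 clusters and the names k_values and inertias are choices made here for illustration, not part of the original walkthrough.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Same synthetic data as in the steps above
X, y = make_blobs(n_samples=500, n_features=2, centers=4, random_state=1)

k_values = list(range(1, 9))  # candidate cluster counts, chosen for illustration
inertias = []
for k in k_values:
    # Fit one model per candidate cluster count and record its inertia
    model = KMeans(n_clusters=k, random_state=0).fit(X)
    inertias.append(model.inertia_)

for k, value in zip(k_values, inertias):
    print(f"k={k}: inertia={value:.2f}")
```

Because adding clusters generally lowers the inertia, the smallest value alone does not identify a good cluster count. Plotting inertias against k_values with plt.plot and looking for the point where the curve flattens is one common way to read the result.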