6. Algorithm Practice: The KMeans Attribute inertia_ - Section 2: Understanding the Nature of Clustering Through a Random Dataset

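Code exercise

I. An important KMeans attribute: inertia_

The inertia_ attribute holds the total sum of squared distances: the squared distance from each sample to its nearest cluster centroid, summed over all samples.

II. Try running the following code in the code box:

(1) Import the required modules and libraries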

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 
from sklearn.datasets import make_blobs 
from sklearn.cluster import KMeans
plt.style.use('ggplot')

(2) Build a dataset

# Generate a 500 x 2 dataset drawn around 4 centers, so the dataset has 4 labels
X, y = make_blobs(n_samples=500,
                  n_features=2,centers=4,random_state=1)
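
Before clustering, it can help to confirm what make_blobs returned. The following is a minimal sketch that reuses the same make_blobs call as above. The variable name counts is introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Same call as in the exercise: 500 samples, 2 features, 4 centers
X, y = make_blobs(n_samples=500, n_features=2, centers=4, random_state=1)

print(X.shape)       # (500, 2): 500 samples with 2 features each
print(np.unique(y))  # [0 1 2 3]: one true label per center

# With an integer n_samples, the samples are split evenly across the centers
counts = np.bincount(y)
print(counts)        # [125 125 125 125]
```

Note that y holds the true labels used to generate the data. KMeans never sees y; it only receives X.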

(3) Use labels_ to view the cluster assigned to each sample

n_clusters = 3  # set the number of clusters
cluster = KMeans(n_clusters=n_clusters, random_state=0).fit(X)
y_pred = cluster.labels_
y_pred

Output: array([0, 0, 2, 1, 2, 1, 2, 2, 2, 2, 0, 0, 2, 1, 2, 0, 2, 0, 1, 2, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 0, 1, 2, 0, 2, 0, 2, 2, 0, 2, 2, 2, 1, 2…])
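
A long label array is hard to read directly, so one option is to count how many samples fell into each cluster. This is a sketch under one assumption: n_init=10 is passed explicitly (it is not in the exercise code) so that the behavior does not depend on the default of the installed scikit-learn version. The name sizes is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, y = make_blobs(n_samples=500, n_features=2, centers=4, random_state=1)

# n_init=10 is an assumption added here, not part of the original exercise
cluster = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
y_pred = cluster.labels_

# Number of samples assigned to each of the 3 clusters
sizes = np.bincount(y_pred, minlength=3)
print(sizes)
```

The cluster numbers 0, 1, 2 are arbitrary identifiers. They need not match the true labels in y, and they may be permuted between runs or library versions.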

(4) Use cluster_centers_ to view the centroid coordinates

centroid = cluster.cluster_centers_
centroid  # coordinates of the 3 cluster centroids

Output: array([[-7.09306648, -8.10994454], [-1.54234022, 4.43517599], [-8.0862351 , -3.5179868 ]])
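
To see how the centroids relate to the labels, the sketch below measures the Euclidean distance from every sample to every centroid and picks the nearest one. The names diffs, dists, nearest and agreement are introduced for illustration, and n_init=10 is an assumption that is not in the exercise code.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, y = make_blobs(n_samples=500, n_features=2, centers=4, random_state=1)
cluster = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)  # n_init is an assumption
centroid = cluster.cluster_centers_

# Broadcast to shape (500, 3, 2): every sample minus every centroid
diffs = X[:, np.newaxis, :] - centroid[np.newaxis, :, :]
# Euclidean distance from each sample to each centroid, shape (500, 3)
dists = np.linalg.norm(diffs, axis=2)

# Index of the closest centroid for each sample
nearest = dists.argmin(axis=1)

# Fraction of samples whose nearest centroid matches labels_
agreement = np.mean(nearest == cluster.labels_)
print(agreement)
```

On a converged fit, the nearest-centroid index should agree with labels_ for essentially every sample, which is the assignment rule that K-means applies.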

(5) Use inertia_ to view the total sum of squared distances

inertia = cluster.inertia_
inertia  # the total sum of squared distances

Output: 1903.4503741659223
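
The value of inertia_ can be reproduced by hand from labels_ and cluster_centers_, which makes the definition concrete. This is a minimal sketch, not the library's internal implementation. The names assigned and manual_inertia are hypothetical, and n_init=10 is an assumption that is not in the exercise code.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, y = make_blobs(n_samples=500, n_features=2, centers=4, random_state=1)
cluster = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)  # n_init is an assumption

# For each sample, look up the centroid of the cluster it was assigned to
assigned = cluster.cluster_centers_[cluster.labels_]  # shape (500, 2)

# Sum of squared Euclidean distances from samples to their assigned centroids
manual_inertia = np.sum((X - assigned) ** 2)

print(manual_inertia, cluster.inertia_)
print(np.isclose(manual_inertia, cluster.inertia_))
```

The two numbers should agree up to floating-point rounding. The exact value may differ from the output shown above depending on the scikit-learn version and the n_init setting.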

What happens to the inertia if we change the guessed number of clusters to 4?

n_clusters=4

cluster_ = KMeans(n_clusters=4, random_state=0).fit(X)
inertia_ = cluster_.inertia_
inertia_

Output: 908.3855684760613

n_clusters=5

cluster_ = KMeans(n_clusters=5, random_state=0).fit(X)
inertia_ = cluster_.inertia_
inertia_

Output: 811.0841324482415
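
In the outputs above, the inertia drops sharply from 3 to 4 clusters and only slightly from 4 to 5. Rather than refitting by hand for each value, the comparison can be sketched as a loop. The range of k values, the dictionary name inertias, and n_init=10 are assumptions made for this illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, y = make_blobs(n_samples=500, n_features=2, centers=4, random_state=1)

# Fit KMeans for several candidate cluster counts and record inertia_ for each
inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)  # n_init is an assumption
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 2))
```

Adding clusters generally keeps lowering the inertia, so a small inertia alone does not show that a given number of clusters is the right one. A common heuristic is to look for the value of k after which the decrease levels off.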
