Machine Learning | Algorithm Models - Clustering: The FCM Fuzzy Clustering Algorithm

1. Principles of FCM Fuzzy Clustering

The fuzzy c-means (FCM) algorithm brings ideas from fuzzy set theory into clustering. Compared with the hard assignments of k-means, FCM produces softer, more flexible results. In most data sets the objects cannot be split into clearly separated clusters, so forcing each object into exactly one cluster is rather rigid and does not match how people actually perceive the data. Instead, each object is given a weight for every cluster, indicating the degree to which the object belongs to that cluster. Probability-based methods can produce such weights as well, but it is sometimes difficult to settle on an appropriate statistical model; in those cases FCM, which is soft in a natural, non-probabilistic way, is a good choice. A minimal illustration of the difference between hard and soft assignment is sketched below.
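As a quick sketch (not from the original post; the point and centers are made up for illustration), the snippet below contrasts a k-means-style hard label with FCM-style membership degrees for a single point, using the same membership formula and fuzzifier m = 2 as the program in section 3:

    import numpy as np

    point = np.array([5.0, 3.0])                    # a single sample
    centers = np.array([[4.0, 3.0], [7.0, 3.0]])    # two hypothetical cluster centers
    m = 2.0                                         # fuzzifier, same value as in the program below

    d = np.linalg.norm(point - centers, axis=1)     # distance to each center: [1.0, 2.0]
    hard_label = int(np.argmin(d))                  # k-means style: all-or-nothing -> 0

    # FCM membership: u_j = 1 / sum_c (d_j / d_c)^(2/(m-1))
    u = 1.0 / np.sum((d[:, None] / d[None, :]) ** (2.0 / (m - 1)), axis=1)

    print(hard_label)   # 0
    print(u)            # [0.8, 0.2]: the point mostly, but not entirely, belongs to cluster 0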

2. FCM Fuzzy Clustering Workflow

[The original post embedded four figures here illustrating the FCM iteration; the images are not reproduced.]
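For reference, the iteration those figures described is the standard FCM formulation, which the program in the next section implements directly. With n samples x_i, k cluster centers c_j, fuzzifier m > 1, and membership degrees u_{ij} (each row of the membership matrix summing to 1), FCM minimizes

    J_m = \sum_{i=1}^{n} \sum_{j=1}^{k} u_{ij}^{m} \, \lVert x_i - c_j \rVert^2 ,
    \qquad \text{subject to } \sum_{j=1}^{k} u_{ij} = 1 ,

by alternating two update steps until the maximum number of iterations (or a convergence tolerance) is reached:

    c_j = \frac{\sum_{i=1}^{n} u_{ij}^{m} \, x_i}{\sum_{i=1}^{n} u_{ij}^{m}} ,
    \qquad
    u_{ij} = \left[ \sum_{c=1}^{k} \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_c \rVert} \right)^{2/(m-1)} \right]^{-1} .

The membership matrix U is initialized randomly with each row normalized to sum to 1, the centers are recomputed from U, and U is then recomputed from the centers.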

3. FCM Fuzzy Clustering Program

    from pylab import *
    from numpy import *
    import pandas as pd
    import numpy as np
    import operator
    import math
    import matplotlib.pyplot as plt
    import random

    # The data are stored in a .csv file (the iris data set)
    df_full = pd.read_csv("iris.csv")
    columns = list(df_full.columns)
    features = columns[:len(columns) - 1]
    # class_labels = list(df_full[columns[-1]])
    df = df_full[features]

    # Number of feature dimensions
    num_attr = len(df.columns)
    # Number of clusters
    k = 3
    # Maximum number of iterations
    MAX_ITER = 100
    # Number of samples (rows)
    n = len(df)
    # Fuzzifier m
    m = 2.00


    # Initialize the membership matrix U with random rows that sum to 1
    def initializeMembershipMatrix():
        membership_mat = list()
        for i in range(n):
            random_num_list = [random.random() for i in range(k)]
            summation = sum(random_num_list)
            temp_list = [x / summation for x in random_num_list]  # normalize each row first
            membership_mat.append(temp_list)
        return membership_mat


    # Compute the cluster centers from the current memberships
    def calculateClusterCenter(membership_mat):
        cluster_mem_val = zip(*membership_mat)
        cluster_centers = list()
        cluster_mem_val_list = list(cluster_mem_val)
        for j in range(k):
            x = cluster_mem_val_list[j]
            xraised = [e ** m for e in x]
            denominator = sum(xraised)
            temp_num = list()
            for i in range(n):
                data_point = list(df.iloc[i])
                prod = [xraised[i] * val for val in data_point]
                temp_num.append(prod)
            numerator = map(sum, zip(*temp_num))
            center = [z / denominator for z in numerator]  # one value per dimension
            cluster_centers.append(center)
        return cluster_centers


    # Update the membership degrees from the current centers
    def updateMembershipValue(membership_mat, cluster_centers):
        p = float(2 / (m - 1))
        data = []
        for i in range(n):
            x = list(df.iloc[i])  # one row of the data set
            data.append(x)
            distances = [np.linalg.norm(list(map(operator.sub, x, cluster_centers[j]))) for j in range(k)]
            for j in range(k):
                den = sum([math.pow(float(distances[j] / distances[c]), p) for c in range(k)])
                membership_mat[i][j] = float(1 / den)
        return membership_mat, data


    # Harden the result: assign each sample to the cluster with the highest membership
    def getClusters(membership_mat):
        cluster_labels = list()
        for i in range(n):
            max_val, idx = max((val, idx) for (idx, val) in enumerate(membership_mat[i]))
            cluster_labels.append(idx)
        return cluster_labels


    # Main routine: alternate the center and membership updates
    def fuzzyCMeansClustering():
        membership_mat = initializeMembershipMatrix()
        curr = 0
        while curr <= MAX_ITER:  # maximum number of iterations
            cluster_centers = calculateClusterCenter(membership_mat)
            membership_mat, data = updateMembershipValue(membership_mat, cluster_centers)
            cluster_labels = getClusters(membership_mat)
            curr += 1
        print(membership_mat)
        return cluster_labels, cluster_centers, data, membership_mat


    # Xie-Beni cluster validity index: compactness / (n * minimum center separation)
    def xie_beni(membership_mat, center, data):
        sum_cluster_distance = 0
        min_cluster_center_distance = inf
        for i in range(k):
            for j in range(n):
                sum_cluster_distance = sum_cluster_distance + membership_mat[j][i] ** 2 * sum(
                    power(data[j, :] - center[i, :], 2))  # within-cluster compactness
        for i in range(k - 1):
            for j in range(i + 1, k):
                cluster_center_distance = sum(power(center[i, :] - center[j, :], 2))  # squared distance between centers
                if cluster_center_distance < min_cluster_center_distance:
                    min_cluster_center_distance = cluster_center_distance
        return sum_cluster_distance / (n * min_cluster_center_distance)


    labels, centers, data, membership = fuzzyCMeansClustering()
    print(labels)
    print(centers)
    center_array = array(centers)
    label = array(labels)
    datas = array(data)
    # Xie-Beni cluster validity
    print("Cluster validity:", xie_beni(membership, center_array, datas))
    xlim(0, 10)
    ylim(0, 10)
    # Scatter plot of the first two features, colored by cluster
    fig = plt.gcf()
    fig.set_size_inches(16.5, 12.5)
    f1 = plt.figure(1)
    plt.scatter(datas[nonzero(label == 0), 0], datas[nonzero(label == 0), 1], marker='o', color='r', label='0', s=10)
    plt.scatter(datas[nonzero(label == 1), 0], datas[nonzero(label == 1), 1], marker='+', color='b', label='1', s=10)
    plt.scatter(datas[nonzero(label == 2), 0], datas[nonzero(label == 2), 1], marker='*', color='g', label='2', s=10)
    plt.scatter(center_array[:, 0], center_array[:, 1], marker='x', color='m', s=30)
    plt.show()
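The Xie-Beni value printed above is the ratio of within-cluster compactness to n times the smallest squared distance between any two centers; smaller values indicate tighter, better-separated clusters. As an optional cross-check (not part of the original post, and assuming the scikit-fuzzy package is installed), the same clustering can be run with its cmeans routine and the resulting centers compared with those from the hand-written loop above; note that its fpc output is the fuzzy partition coefficient, a different validity measure from Xie-Beni.

    import numpy as np
    import pandas as pd
    import skfuzzy as fuzz   # pip install scikit-fuzzy

    df = pd.read_csv("iris.csv").iloc[:, :-1]   # drop the label column, as above
    X = df.to_numpy().T                         # skfuzzy expects shape (n_features, n_samples)

    # c=3 clusters, fuzzifier m=2, stop when the membership change drops below 1e-5
    cntr, u, u0, d, jm, p, fpc = fuzz.cluster.cmeans(X, c=3, m=2.0, error=1e-5, maxiter=1000, seed=0)

    print(cntr)                           # cluster centers, shape (3, 4); compare with `centers` above
    print(fpc)                            # fuzzy partition coefficient in (0, 1]
    hard_labels = np.argmax(u, axis=0)    # harden memberships, analogous to getClusters()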
