
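A spectral clustering algorithm based on an adaptive similarity matrix
Cite this article: WANG Beibei, YANG Ming, YAN Huichao, SUN Xiaoxian. A spectral clustering algorithm based on an adaptive similarity matrix[J]. Hebei Journal of Industrial Science & Technology, 2018, 35(2): 77-83.
Authors: WANG Beibei  YANG Ming  YAN Huichao  SUN Xiaoxian
Affiliations: School of Science, North University of China; Faculty of Science and Technology, Communication University of China
Funding: National Natural Science Foundation of China (61601412, 61571404, 61471325); Natural Science Foundation of Shanxi Province (2015021099)
Abstract: To eliminate the influence of fluctuations in the scale parameter of the Gaussian kernel function when constructing the similarity matrix for spectral clustering, an adaptive similarity matrix is constructed and applied to the spectral clustering algorithm. Distances between data points in the adaptive similarity matrix are measured with a geodesic distance algorithm: the distance between two nearby points approximates their Euclidean distance, while for two points that are far apart, the k nearest neighbors of each data point are first found by Euclidean distance and the geodesic distances between neighbors are then accumulated, which yields the shortest distance between each pair of data points. The local density between two points is expressed through shared nearest neighbors, which better characterizes the intrinsic structure of the data set. Experiments were carried out on 5 artificial data sets and 5 real data sets from the widely used UCI repository. The results show that the proposed algorithm achieves higher clustering accuracy than the comparison algorithms and adapts well to data with complex distributions. The research provides ideas and methods for data mining and machine learning.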
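The geodesic distance step described in the abstract can be sketched as a shortest-path search over a k-nearest-neighbor graph. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function names `knn_graph` and `geodesic_distances`, the undirected graph, and the use of Dijkstra's algorithm are all choices made here, since the abstract does not specify them.

```python
import heapq
import math


def knn_graph(points, k):
    """Link each point to its k nearest neighbours by Euclidean distance."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            # Assumption: edges are stored in both directions (undirected graph).
            graph[i][j] = d
            graph[j][i] = d
    return graph


def geodesic_distances(points, k):
    """Accumulate neighbour-to-neighbour distances along shortest graph paths."""
    graph = knn_graph(points, k)
    n = len(points)
    result = []
    for source in range(n):
        dist = [math.inf] * n
        dist[source] = 0.0
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale queue entry, a shorter path was already found
            for v, w in graph[u].items():
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(heap, (d + w, v))
        result.append(dist)
    return result
```

For points laid out along a U shape, the two ends of the U are close in Euclidean terms but far apart along the graph, while adjacent points keep their Euclidean distance. Pairs in disconnected graph components are left at infinity in this sketch.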

Keywords: applied mathematics  similarity matrix  spectral clustering  density  geodesic distance
Received: 2017/10/31
Revised: 2017/12/28

A spectral clustering algorithm based on adaptive similarity matrix
WANG Beibei, YANG Ming, YAN Huichao and SUN Xiaoxian. A spectral clustering algorithm based on adaptive similarity matrix[J]. Hebei Journal of Industrial Science & Technology, 2018, 35(2): 77-83.
Authors: WANG Beibei  YANG Ming  YAN Huichao and SUN Xiaoxian
Abstract: To eliminate the effect of fluctuations in the scale parameter of the Gaussian kernel function when constructing the similarity matrix for spectral clustering, an adaptive similarity matrix is constructed and applied to the spectral clustering algorithm. Distances between data points in the adaptive similarity matrix are measured by geodesic distance. The distance between two nearby points is approximately equal to their Euclidean distance, while for two points that are far apart, the k nearest neighbors of each data point are first obtained by Euclidean distance and the geodesic distances between neighbors are then accumulated, which gives the shortest distance between each pair of data points. The local density between two points is defined through their shared nearest neighbors, which better reflects the intrinsic structure of the data set. Experiments on five artificial data sets and five real UCI data sets show that the proposed method achieves higher clustering accuracy than the comparison algorithms and adapts well to data with complex distributions. The research provides ideas and methods for data mining and machine learning.
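The shared-neighbor notion of local density mentioned in the abstract can be illustrated by counting how many nearest neighbors two points have in common. This is a rough sketch only: the abstract does not give the paper's similarity formula, so `illustrative_similarity` below is a hypothetical weighting invented for this example, and the helper names `k_nearest` and `shared_neighbour_counts` are likewise assumptions.

```python
import math


def k_nearest(points, k):
    """Return, for each point, the set of indices of its k nearest neighbours."""
    neighbours = []
    for i, p in enumerate(points):
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbours.append(set(order[:k]))
    return neighbours


def shared_neighbour_counts(points, k):
    """Count the nearest neighbours that each pair of points has in common."""
    nn = k_nearest(points, k)
    n = len(points)
    return [
        [len(nn[i] & nn[j]) if i != j else 0 for j in range(n)]
        for i in range(n)
    ]


def illustrative_similarity(distance, shared):
    """Hypothetical weighting, NOT the paper's formula.

    The shared-neighbour count takes the place of a hand-tuned global
    scale parameter: more shared neighbours means slower decay.
    """
    return math.exp(-distance ** 2 / (shared + 1))
```

With two well-separated groups of points, pairs inside a group share neighbors and pairs across groups share none, so the illustrative weighting gives within-group pairs a higher similarity at equal distance without any manually chosen scale parameter.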
Keywords: applied mathematics  similarity matrix  spectral clustering  density  geodesic distance
This article is indexed in CNKI and other databases.