Ruby简单实现Kmeans聚类算法
http://fuliang.iteye.com/blog/891991
K-means是一个简单容易实现的聚类算法,我们以对一个图片的颜色的RGB值进行聚类为例,
实现这个算法。
K-means算法是一个EM的迭代过程:
1.随机选择k个作为聚类中心
2.E step:
对每一个点,计算它到每一个聚类中心的距离,把这个点分配到最近的聚类中心代表的
聚类中。
3.M step:
重新计算每个聚类的中心:每个聚类中心为该聚类所有点的均值。
重复2~3直到达到最大的迭代次数或者聚类不再发生变化。
Ruby代码
#!/usr/bin/ruby # autor: fuliang http://fuliang.iteye.com/ class RGB attr_accessor :r , :g , :b def initialize(r=0,g=0,b=0) @r , @g , @b = r,g,b end def +(rgb) @r += rgb.r @g += rgb.g @b += rgb.b self end def /(n) @r /= n @g /= n @b /= n self end end def random_k_centers(instances,k) rand_indxes = (0...instances.size).sort_by{rand}[0...k] instances.values_at(*rand_indxes) end def distance(ins1,ins2) #采用余弦值,因为255,255,255和200,200,200颜色基本类似 dot_product = ins1.r * ins2.r + ins1.g * ins2.g + ins1.b * ins2.b mod1 = Math.sqrt(ins1.r * ins1.r + ins1.g * ins1.g + ins1.b * ins1.b) mod2 = Math.sqrt(ins2.r * ins2.r + ins2.g * ins2.g + ins2.b * ins2.b) return 1 - dot_product / (mod1 * mod2) end def k_means_cluster(instances,k,max_iter=100) random_centers = random_k_centers(instances,k) p random_centers instance_cluster_map = {} change_count = 0 clusters = [] 0.upto(max_iter) do |iter_num| clusters = [] puts "iterate #{iter_num} ..." change_count = 0 # E-step instances. each do |instance| min_distance = 1.0/0 min_indx = 0 random_centers.each_with_index do |center,index| current_distance = distance(center,instance) if min_distance > current_distance then min_indx = index min_distance = current_distance end end if instance_cluster_map[instance] != min_indx then #trace the change change_count += 1 instance_cluster_map[instance] = min_indx end clusters[min_indx] ||= [] clusters[min_indx] << instance end puts "change_count=#{change_count}" break if change_count.zero? #M-step clusters.each_with_index do |cluster,index| center = RGB. new cluster. each do |instance| center += instance end center /= cluster.size random_centers[index] = center # update center end end return clusters end instances = [] File .open( "rgbs.txt" ).each_line do |line| line.split(/\t/). each do | rgb | r,g,b = rgb.split(/,/).collect{|e| e.to_i} instances << RGB. new (r,g,b) end end clusters = k_means_cluster(instances,5,100) k_candidates = [] clusters. each do |cluster| sum = cluster.inject(RGB. new ) {|sum,ins| sum + ins} candidate = sum / cluster.size k_candidates << candidate end p k_candidates
可以使用聚类算法对这个图片进行有损压缩。
查看更多关于Ruby简单实现Kmeans聚类算法的详细内容...
声明:本文来自网络,不代表【好得很程序员自学网】立场,转载请注明出处:http://www.haodehen.cn/did42000