Clusterer
The k-means algorithm clusters n objects, based on their attributes, into k partitions, with k < n.
More about the k-means algorithm: en.wikipedia.org/wiki/K-means_algorithm
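Build a new clusterer using the data examples found in data_set. Items are grouped into number_of_clusters clusters. If data_set contains fewer distinct items than number_of_clusters, the number of clusters is reduced to the number of distinct items found.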
# File lib/ai4r/clusterers/k_means.rb, line 52
def build(data_set, number_of_clusters)
  @data_set = data_set
  @number_of_clusters = number_of_clusters
  @iterations = 0

  calc_initial_centroids
  while(not stop_criteria_met)
    calculate_membership_clusters
    recompute_centroids
  end

  return self
end
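The loop in build alternates between assigning items to clusters and recomputing centroids. The following is a minimal, self-contained sketch of that idea in plain Ruby, not the Ai4r implementation. The names squared_distance and sketch_k_means are hypothetical, the initial centroids are passed in explicitly so the run is deterministic, and the stop rule (centroids unchanged or max_iterations reached) is an assumption, since stop_criteria_met is not shown here.

```ruby
# Squared Euclidean distance between two numeric vectors.
def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Hypothetical sketch of a k-means loop on arrays of numbers.
# Assumed stop rule: centroids no longer change, or max_iterations is reached.
def sketch_k_means(items, centroids, max_iterations = 100)
  clusters = []
  max_iterations.times do
    # Membership step: put every item in the cluster of its nearest centroid.
    clusters = Array.new(centroids.length) { [] }
    items.each do |item|
      nearest = (0...centroids.length).min_by do |i|
        squared_distance(item, centroids[i])
      end
      clusters[nearest] << item
    end

    # Centroid step: each centroid becomes the mean of its cluster.
    # An empty cluster keeps its previous centroid.
    new_centroids = clusters.each_with_index.map do |cluster, i|
      next centroids[i] if cluster.empty?
      cluster.transpose.map { |column| column.sum.to_f / cluster.length }
    end

    break if new_centroids == centroids
    centroids = new_centroids
  end
  [centroids, clusters]
end

items = [[1, 1], [1, 2], [2, 1], [8, 8], [9, 8], [8, 9]]
final_centroids, final_clusters = sketch_k_means(items, [[1, 1], [8, 8]])
```

With these six points and the two starting centroids, the first three items end up in one cluster and the last three in the other.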
This function calculates the distance between two instances. By default, it returns the squared Euclidean distance. You can provide a more convenient distance implementation by:
1- Overriding this method
2- Providing a closure to the :distance_function parameter
# File lib/ai4r/clusterers/k_means.rb, line 81
def distance(a, b)
  return @distance_function.call(a, b) if @distance_function
  return euclidean_distance(a, b)
end
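Both ways of replacing the distance can be illustrated with a small stand-in class. This is a sketch under stated assumptions: SketchClusterer, its constructor, and ChebyshevClusterer are hypothetical and are not part of Ai4r. Only the dispatch inside distance mirrors the listing above, and the default is assumed to be the squared Euclidean distance over numeric attributes.

```ruby
# Hypothetical stand-in that mirrors only the dispatch in `distance` above.
class SketchClusterer
  def initialize(distance_function = nil)
    @distance_function = distance_function
  end

  # Use the supplied closure if there is one, otherwise the default.
  def distance(a, b)
    return @distance_function.call(a, b) if @distance_function
    euclidean_distance(a, b)
  end

  # Assumed default: squared Euclidean distance.
  def euclidean_distance(a, b)
    a.zip(b).sum { |x, y| (x - y)**2 }
  end
end

# Option 1: override the method in a subclass (Chebyshev distance here).
class ChebyshevClusterer < SketchClusterer
  def distance(a, b)
    a.zip(b).map { |x, y| (x - y).abs }.max
  end
end

# Option 2: provide a closure (Manhattan distance here).
manhattan = lambda { |a, b| a.zip(b).sum { |x, y| (x - y).abs } }

default_distance   = SketchClusterer.new.distance([0, 0], [3, 4])
manhattan_distance = SketchClusterer.new(manhattan).distance([0, 0], [3, 4])
chebyshev_distance = ChebyshevClusterer.new.distance([0, 0], [3, 4])
```

A closure is usually the lighter option when only the metric changes. A subclass makes more sense when other behavior needs to change along with it.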
# File lib/ai4r/clusterers/k_means.rb, line 88
def calc_initial_centroids
  @centroids = []
  tried_indexes = []
  while @centroids.length < @number_of_clusters &&
        tried_indexes.length < @data_set.data_items.length
    random_index = rand(@data_set.data_items.length)
    if !tried_indexes.include?(random_index)
      tried_indexes << random_index
      if !@centroids.include? @data_set.data_items[random_index]
        @centroids << @data_set.data_items[random_index]
      end
    end
  end
  @number_of_clusters = @centroids.length
end
# File lib/ai4r/clusterers/k_means.rb, line 109
def calculate_membership_clusters
  @clusters = Array.new(@number_of_clusters) do
    Ai4r::Data::DataSet.new :data_labels => @data_set.data_labels
  end
  @data_set.data_items.each do |data_item|
    @clusters[eval(data_item)] << data_item
  end
end