Ai4r::Data::Proximity

This module provides classical distance functions.

Public Class Methods

euclidean_distance(a, b)

Euclidean distance, or L2 norm. Parameters a and b are vectors with continuous attributes. Euclidean distance tends to form hyperspherical clusters (Clustering, Xu and Wunsch, 2009). Translations and rotations do not distort distance relations (Duda et al., 2001). If attributes are measured in different units, attributes with larger values and variance will dominate the metric.

# File lib/ai4r/data/proximity.rb, line 36
def self.euclidean_distance(a, b)
  Math.sqrt(squared_euclidean_distance(a, b))
end
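
A minimal usage sketch follows. So that it runs without the ai4r gem, it copies the two method bodies shown on this page into a hypothetical module named ProximitySketch; with the gem loaded, the same calls would presumably go to Ai4r::Data::Proximity. The second call illustrates the unit caveat above with made-up values (height in metres, income in dollars).

```ruby
# ProximitySketch is a hypothetical stand-in, not part of ai4r.
# It mirrors the method bodies listed on this page.
module ProximitySketch
  def self.squared_euclidean_distance(a, b)
    sum = 0.0
    a.each_with_index { |item_a, i| sum += (item_a - b[i])**2 }
    sum
  end

  def self.euclidean_distance(a, b)
    Math.sqrt(squared_euclidean_distance(a, b))
  end
end

# Classic 3-4-5 right triangle.
puts ProximitySketch.euclidean_distance([0, 0], [3, 4])  # prints 5.0

# Illustrative data only: [height in metres, income in dollars].
person_a = [1.60, 30_000]
person_b = [1.90, 30_100]

# The income difference (100) swamps the height difference (0.3),
# so the result is roughly 100.
puts ProximitySketch.euclidean_distance(person_a, person_b)
```

Rescaling or standardizing attributes before calling the method is one common way to keep a single attribute from dominating.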
hamming_distance(a,b)

The Hamming distance between two attribute vectors of equal length is the number of positions at which the corresponding attributes differ. This distance function is frequently used with binary attributes, though it can be used with other discrete attributes.

# File lib/ai4r/data/proximity.rb, line 69
def self.hamming_distance(a,b)
  count = 0
  a.each_index do |i|
    count += 1 if a[i] != b[i]
  end
  return count
end
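
A short usage sketch, assuming a hypothetical module ProximitySketch that copies the method body above so the example runs without the ai4r gem. It uses one binary vector pair and one pair of other discrete attributes.

```ruby
# ProximitySketch is a hypothetical stand-in, not part of ai4r.
module ProximitySketch
  def self.hamming_distance(a, b)
    count = 0
    a.each_index { |i| count += 1 if a[i] != b[i] }
    count
  end
end

# Binary attributes: positions 1 and 2 differ.
puts ProximitySketch.hamming_distance([1, 0, 1, 1], [1, 1, 0, 1])  # prints 2

# Other discrete attributes: only the size differs.
puts ProximitySketch.hamming_distance(%w[red small round], %w[red large round])  # prints 1
```

The loop iterates over the indices of a only. If b is shorter, each missing position compares against nil and counts as a difference; if b is longer, its extra items are ignored. Passing vectors of equal length avoids both cases.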
manhattan_distance(a, b)

City block, Manhattan distance, or L1 norm. Parameters a and b are vectors with continuous attributes.

# File lib/ai4r/data/proximity.rb, line 43
def self.manhattan_distance(a, b)
  sum = 0.0
  a.each_with_index do |item_a, i|
    item_b = b[i]
    sum += (item_a - item_b).abs
  end
  return sum
end
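
A short usage sketch, assuming a hypothetical module ProximitySketch that copies the method body above so the example runs without the ai4r gem.

```ruby
# ProximitySketch is a hypothetical stand-in, not part of ai4r.
module ProximitySketch
  def self.manhattan_distance(a, b)
    sum = 0.0
    a.each_with_index { |item_a, i| sum += (item_a - b[i]).abs }
    sum
  end
end

# |1 - 4| + |2 - 0| + |3 - 3| = 3 + 2 + 0
puts ProximitySketch.manhattan_distance([1, 2, 3], [4, 0, 3])  # prints 5.0
```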
simple_matching_distance(a,b)

The “Simple matching” distance between two attribute sets is derived from the number of values present in both sets. If sets a and b have lengths da and db then:

S = 2/(da + db) * Number of values present on both sets
D = 1.0/S - 1

Some considerations:

  • a and b must not include repeated items

  • all attributes are treated equally

# File lib/ai4r/data/proximity.rb, line 88
def self.simple_matching_distance(a,b)
  similarity = 0.0
  a.each {|item| similarity += 2 if b.include?(item)}
  similarity /= (a.length + b.length)
  return 1.0/similarity - 1
end
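
The formula above can be traced with a small sketch. It assumes a hypothetical module ProximitySketch that copies the method body above so the example runs without the ai4r gem, and it covers identical, partially overlapping, and disjoint sets.

```ruby
# ProximitySketch is a hypothetical stand-in, not part of ai4r.
module ProximitySketch
  def self.simple_matching_distance(a, b)
    similarity = 0.0
    a.each { |item| similarity += 2 if b.include?(item) }
    similarity /= (a.length + b.length)
    1.0 / similarity - 1
  end
end

# Identical sets: S = 2/6 * 3 = 1, so D = 1/1 - 1.
puts ProximitySketch.simple_matching_distance(%w[a b c], %w[a b c])  # prints 0.0

# Two shared values out of three each: S = 2/6 * 2 = 2/3, so D is about 0.5.
puts ProximitySketch.simple_matching_distance(%w[a b c], %w[b c d])

# No shared values: S = 0, and the Float division 1.0 / 0.0 yields Infinity.
puts ProximitySketch.simple_matching_distance(%w[a b], %w[c d])  # prints Infinity
```

Callers that feed this distance into further arithmetic may want to handle the Infinity result for disjoint sets explicitly.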
squared_euclidean_distance(a, b)

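This is a computationally cheaper replacement for Euclidean distance when only the relative order of distances matters: it omits the square root, which does not change that order. Parameters a and b are vectors with continuous attributes.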

# File lib/ai4r/data/proximity.rb, line 18
def self.squared_euclidean_distance(a, b)
  sum = 0.0
  a.each_with_index do |item_a, i|
    item_b = b[i]
    sum += (item_a - item_b)**2
  end
  return sum
end
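
A short sketch of a typical use: picking the nearest point without taking any square roots. It assumes a hypothetical module ProximitySketch that copies the method body above so the example runs without the ai4r gem; the query and points are made-up values.

```ruby
# ProximitySketch is a hypothetical stand-in, not part of ai4r.
module ProximitySketch
  def self.squared_euclidean_distance(a, b)
    sum = 0.0
    a.each_with_index { |item_a, i| sum += (item_a - b[i])**2 }
    sum
  end
end

query  = [0, 0]
points = [[3, 4], [1, 1], [6, 8]]

# 3**2 + 4**2 = 25, the square of the Euclidean distance 5.
puts ProximitySketch.squared_euclidean_distance(query, [3, 4])  # prints 25.0

# Squared distances are 25, 2 and 100, so [1, 1] is nearest,
# the same answer the Euclidean distance would give.
nearest = points.min_by { |point| ProximitySketch.squared_euclidean_distance(query, point) }
p nearest  # prints [1, 1]
```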
sup_distance(a, b)

Sup distance, or L-infinity norm: the largest absolute difference between corresponding attributes. Parameters a and b are vectors with continuous attributes.

# File lib/ai4r/data/proximity.rb, line 54
def self.sup_distance(a, b)
  distance = 0.0
  a.each_with_index do |item_a, i|
    item_b = b[i]
    diff = (item_a - item_b).abs
    distance = diff if diff > distance
  end
  return distance
end
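
A short usage sketch, assuming a hypothetical module ProximitySketch that copies the method body above so the example runs without the ai4r gem.

```ruby
# ProximitySketch is a hypothetical stand-in, not part of ai4r.
module ProximitySketch
  def self.sup_distance(a, b)
    distance = 0.0
    a.each_with_index do |item_a, i|
      diff = (item_a - b[i]).abs
      distance = diff if diff > distance
    end
    distance
  end
end

# Per-attribute differences are 3.0, 4.0 and 0.0; the largest one is returned.
puts ProximitySketch.sup_distance([1.0, 5.0, 2.0], [4.0, 1.0, 2.0])  # prints 4.0
```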
