WebRobots

Public Class Methods

new(user_agent, options = nil)

Creates a WebRobots object for a robot named user_agent, accepting an optional options hash.

  • :http_get => a custom method, proc, or any object that responds to .call(uri), used to fetch robots.txt. It must return the response body on success, return an empty string if the resource is not found, and return nil or raise an error on any other failure. Redirects must be handled within this proc (see the sketch after this list).

  • :crawl_delay => determines how Crawl-delay directives are handled. With :sleep, the default, WebRobots sleeps as demanded whenever allowed?(url) or disallowed?(url) is called. With :ignore, WebRobots does nothing. A custom method, proc, or any object that responds to .call(delay, last_checked_at) is called in place of the built-in sleep (see the sketch after the source listing below).
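
As an illustration of the :http_get contract, here is a minimal sketch of a handler built on Net::HTTP. The name fetch_robots_txt and the five-redirect limit are choices made for this example, not part of the library.

require 'webrobots'
require 'net/http'

# Return the body on success, "" on 404, nil on any other failure;
# redirects are followed here, as the contract requires.
fetch_robots_txt = lambda { |uri|
  target = URI(uri.to_s)
  5.times do
    response = Net::HTTP.get_response(target)
    case response
    when Net::HTTPSuccess     then return response.body
    when Net::HTTPNotFound    then return ''
    when Net::HTTPRedirection then target = URI(response['location'])
    else                           return nil
    end
  end
  nil # give up after too many redirects
}

robots = WebRobots.new('MyBot/1.0', :http_get => fetch_robots_txt)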

# File lib/webrobots.rb, line 28
def initialize(user_agent, options = nil)
  @user_agent = user_agent

  options ||= {}
  @http_get = options[:http_get] || method(:http_get)
  crawl_delay_handler =
    case value = options[:crawl_delay] || :sleep
    when :ignore
      nil
    when :sleep
      method(:crawl_delay_handler)
    else
      if value.respond_to?(:call)
        value
      else
        raise ArgumentError, "invalid Crawl-delay handler: #{value.inspect}"
      end
    end

  @parser = RobotsTxt::Parser.new(user_agent, crawl_delay_handler)
  @parser_mutex = Mutex.new

  @robotstxt = create_cache()
end
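
As a sketch of the custom :crawl_delay form, the handler below receives the delay demanded by robots.txt (in seconds) and the time of the previous check. Sleeping out the remaining delay, skipping the first call, and capping the wait at ten seconds are all choices of this example, not library behavior.

capped_delay = lambda { |delay, last_checked_at|
  # last_checked_at is assumed to be nil before the first check.
  next unless last_checked_at
  wait = delay - (Time.now - last_checked_at)
  sleep([wait, 10].min) if wait > 0
}

robots = WebRobots.new('MyBot/1.0', :crawl_delay => capped_delay)
robots.allowed?('http://example.com/')  # a query like this triggers the handler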
