How to get all links on a page with Hpricot

I couldn’t find a good, simple explanation of this online; I had to look at some examples and figure it out, so I thought I’d post it for myself and others. Here is how to get all links on a page using Hpricot:


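  require 'hpricot'

  # Collect the href attribute of every <a> element in an Hpricot document.
  # Anchors without an href (e.g. <a name="top">) contribute nil.
  def get_links(doc)
    urls = []
    unfiltered_links = (doc/"a")   # all <a> elements on the page
    unfiltered_links.each { |alink|
      urls << alink.attributes['href']
    }
    return urls
  end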

3 Responses to “How to get all links on a page with Hpricot”

  1. devJ March 25, 2008 at 4:11 pm

    Do you have any examples of finding all forms (GET and POST) in a web page?

  2. p3t0r April 12, 2008 at 3:42 am

    If you use the Enumerable#map method (or its alias, collect), the code becomes much simpler:

    def get_links(doc)
      (doc/"a").map { |alink| alink.attributes['href'] }
    end

  3. p3t0r April 12, 2008 at 3:42 am

    I meant to say ‘Enumerable#map’.
