How to get all links on a page with Hpricot

I couldn’t find a good, simple explanation of this online. I had to look at some examples and figure it out, so I thought I’d post this for myself and others. Here is how to get all links on a page using Hpricot, given an already parsed document:


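  # Returns the href attribute of every <a> element in a parsed Hpricot document.
  def get_links(doc)
    urls = []
    unfiltered_links = (doc/"a")   # all anchor elements on the page
    unfiltered_links.each { |alink|
      urls << alink.attributes['href']
    }
    return urls
  end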
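
One thing the version above does not handle is anchors without a usable href (named anchors, for example) or the same URL appearing more than once. Here is a rough sketch of a variant that filters those out. The name get_unique_links is made up for this example, and I'm assuming a missing href comes back as either nil or an empty string, so the code checks for both.

```ruby
# Sketch only: get_unique_links is a hypothetical helper, not part of Hpricot.
# It expects a parsed Hpricot document, the same as get_links.
def get_unique_links(doc)
  # Collect the href of every <a> element.
  hrefs = (doc/"a").map { |alink| alink.attributes['href'] }
  # Assumption: a missing href shows up as nil or "", so drop both.
  usable = hrefs.reject { |href| href.nil? || href.empty? }
  # Remove repeated URLs, keeping first-seen order.
  usable.uniq
end

# Assumed usage (needs the hpricot gem installed):
#   require 'hpricot'
#   doc = Hpricot('<a href="/a">A</a> <a name="top">top</a> <a href="/a">again</a>')
#   get_unique_links(doc)
```

Relative links are returned exactly as they appear in the page, so they would still need to be resolved against the page URL if you want absolute addresses.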

3 thoughts on “How to get all links on a page with Hpricot”

  1. p3t0r

    If you use the Enumerable#map (or collect) method, the code would be much simpler:

    def get_links(doc)
      (doc/"a").map { |alink| alink.attributes['href'] }
    end
