I couldn't find a good, simple explanation of this online, so I had to work it out from a few examples. I'm posting it here for myself and others. Here is how to get all the links on a page using Hpricot:
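def get_links(doc)
  urls = []
  # doc/"a" returns every <a> element in the parsed document
  unfiltered_links = (doc/"a")
  unfiltered_links.each { |alink|
    # Collect the href attribute of each link
    urls << alink.attributes['href']
  }
  return urls
end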
Do you have any examples of finding all forms (GET and POST) in a web page?
If you use the Enumerable#map (or collect) method, the code becomes much simpler:
def get_links(doc)
  (doc/"a").map { |alink| alink.attributes['href'] }
end
I meant to say 'Enumerable#map'.
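For anyone who wants to check that the each-based and map-based versions of get_links return the same thing, here is a rough sketch that runs without Hpricot installed. FakeElement and FakeDoc are hypothetical stand-ins made up for this example (they are not part of Hpricot); they only imitate the two things the code above relies on, namely doc/"a" returning a list of elements and each element exposing an attributes hash. The method names get_links_each and get_links_map are also just labels for this comparison.

```ruby
# Hypothetical stand-in for a parsed element: a tag name plus an attributes hash.
FakeElement = Struct.new(:name, :attributes)

# Hypothetical stand-in for a parsed document. It only supports doc/"tag".
class FakeDoc
  def initialize(elements)
    @elements = elements
  end

  # Imitates the doc/"a" search by returning elements whose tag name matches.
  def /(selector)
    @elements.select { |el| el.name == selector }
  end
end

# Version from the post: build the array by hand.
def get_links_each(doc)
  urls = []
  (doc/"a").each { |alink| urls << alink.attributes['href'] }
  urls
end

# Version from the comment: let map build the array.
def get_links_map(doc)
  (doc/"a").map { |alink| alink.attributes['href'] }
end

doc = FakeDoc.new([
  FakeElement.new('a', { 'href' => 'http://example.com/' }),
  FakeElement.new('p', {}),
  FakeElement.new('a', { 'href' => '/about' }),
  FakeElement.new('a', {}) # an anchor without href yields nil
])

puts get_links_each(doc) == get_links_map(doc)
puts get_links_map(doc).compact.inspect
```

One thing this makes visible: an anchor with no href comes back as nil in both versions, so calling compact on the result is a reasonable way to drop those entries.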