Category Archives: Geek

XPath tip for Hpricot – no tbody

When using a Firefox plugin to get the XPath of an element, be aware that Firefox inserts tbody tags into the document tree it builds from the page, even when the HTML source has none. So, if you get an XPath like

/html/body/div[@id='content']/table/tbody/tr/td[1]/div/table/tbody/tr/td[1]/table/tbody/tr[2]/td[3]

Do a View Source in Firefox and check whether the tbody tags are actually in the source. If they aren't, remove the tbody steps from the XPath; the modified XPath will then work with Hpricot.
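If you do this a lot, the edit is easy to automate. Here's a minimal sketch of a helper that strips the tbody steps with a regular expression. `strip_tbody` is my own hypothetical name, not part of Hpricot. It assumes the copied XPath uses plain `/tbody` or `/tbody[n]` steps, and it only makes sense when View Source shows no tbody tags in the real page.

```ruby
# Hypothetical helper (not part of Hpricot): drop every tbody step,
# with or without a positional predicate like tbody[1], from an XPath
# copied out of Firefox. Only use it when View Source shows no tbody tags.
def strip_tbody(xpath)
  # Match "/tbody" or "/tbody[n]" only when it is a whole step,
  # i.e. followed by another "/" or by the end of the string.
  xpath.gsub(%r{/tbody(?:\[\d+\])?(?=/|\z)}, '')
end

xpath = "/html/body/div[@id='content']/table/tbody/tr/td[1]/div/table/tbody/tr/td[1]/table/tbody/tr[2]/td[3]"
puts strip_tbody(xpath)
# prints /html/body/div[@id='content']/table/tr/td[1]/div/table/tr/td[1]/table/tr[2]/td[3]
```

You'd then hand the cleaned path to Hpricot, e.g. `doc.at(strip_tbody(xpath))`.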

Geek Crafts

Haven’t fully finished tweaking the theme yet, but check out my latest site: GeekCrafts.com. I’ve got a great writer named Shayne who’s gonna be writing for it. We kick it off in earnest this weekend.


How to get all links on a page with Hpricot

I couldn’t find a good, simple explanation of this online; I had to look at some examples and figure it out, so I thought I’d post this for myself and others. Here is how to get all links on a page using Hpricot:


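  require 'hpricot'

  # Return the href of every <a> element in an Hpricot document.
  def get_links(doc)
    urls = []
    (doc/"a").each do |alink|          # doc/"a" searches for all <a> elements
      href = alink.attributes['href']
      urls << href if href             # skip anchors with no href, e.g. <a name="top">
    end
    urls
  end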
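The hrefs that come back are exactly what's in the page, so many of them will be relative ("other.html", "/about"). Here's a minimal sketch of a follow-up step that turns them into absolute URLs with Ruby's standard `URI.join`. `absolute_links` is a hypothetical helper of my own, not part of Hpricot, and it assumes you know the URL the page was fetched from.

```ruby
require 'uri'

# Hypothetical helper (not part of Hpricot): resolve the raw href values
# collected by get_links against the URL of the page they came from.
def absolute_links(page_url, hrefs)
  hrefs.compact.map { |href|
    begin
      URI.join(page_url, href.strip).to_s
    rescue URI::Error
      nil # skip hrefs that aren't valid URIs
    end
  }.compact.uniq # drop failures and duplicate links
end

p absolute_links("http://example.com/dir/page.html",
                 ["other.html", "/root", "http://foo.com/x", "other.html"])
# prints ["http://example.com/dir/other.html", "http://example.com/root", "http://foo.com/x"]
```

In practice you'd call it as `absolute_links(page_url, get_links(doc))`.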