How to query Google with Hpricot from your command line

So I found myself the other day sitting in front of my command line, about to look something up on Google. No big deal: the Google search box is part of every major browser, and plenty of other OS-integrated search options are available as well. But I'm old school, and the command line is where I spend most of my time, which means switching to the web browser is still a context switch. Mac OS X became so appealing exactly because it is basically Unix with a command-line shell and a nice GUI in front of it, not a GUI instead of a shell.

I put together a little Ruby script (called ‘g’) to run some of the most basic Google queries from the command line, like:

$ g ruby command line tools <return>
 -> shows an index page of the results found, for further queries

or a direct jump to the “I’m Feeling Lucky” result:

$ g :lucky ruby -python google <return>
 -> opens the browser directly on the first link found

I use ':' to modify the default behaviour of the script. Normally you would do this with options, but I preferred ':' over '-' so that the dash stays reserved for the query itself. :lucky simply overrides the default behaviour here. The next modifiers to implement were :count and :fight, like:

$ g :count your search here
about 1,330,000,000 results for <your search here> (0.09 seconds) 

$ g :fight "left" "right"

:count is obvious and :fight of course got its inspiration from the awesom
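Splitting the ':' modifiers from the query terms could look roughly like the sketch below. This is not the code from the actual script; parse_args is a hypothetical helper name, and it assumes that every argument starting with ':' is a modifier and everything else belongs to the query.

```ruby
# Hypothetical helper, not taken from the original script: separate
# ':' modifiers from the query terms.
def parse_args(argv)
  # Anything starting with ':' is treated as a modifier (an assumption).
  modifiers, terms = argv.partition { |arg| arg.start_with?(":") }
  # Strip the leading ':' and turn the rest into a symbol, ":lucky" -> :lucky.
  [modifiers.map { |m| m[1..-1].to_sym }, terms]
end

mods, terms = parse_args(%w[:lucky ruby -python google])
p mods   # the modifiers, as symbols
p terms  # the remaining query terms, dashes untouched
```

Because only ':' is special, a term like -python passes straight through to the query.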

I used Hpricot because I have gotten used to it and it does everything I need in an elegant and concise way. The script does NOT use the Google Ajax Search API; it scrapes its results from the normal HTML response page. I simply didn’t want to make my simple script dependent on signing up for a Google API key.

With the right stuff in place,

require 'rubygems'
require 'cgi'
require 'open-uri'
require 'hpricot'

the :lucky screen scraping for example basically boils down to:

q = %w{meine kleine suchanfrage}.map { |w| CGI.escape(w) }.join("+")
url = "{q}"
doc = Hpricot(open(url).read)
lucky_url = (doc/"div[@class='g'] a").first["href"]
system "open", lucky_url  # pass the URL as its own argument; '...' would not interpolate #{lucky_url}

and you can easily spot a problem here: system 'open ...' is hardly cross-platform, but on Mac OS X it opens the default browser with the given URL. To give users a chance to customize things a little, I put the settings in a hash of default values, which is overwritten at startup by values loaded from a preference file in the user's home directory. My built-in default values are:

C = {
    :count  => 4,                       # number of results shown
    :indent => 8, :tw => 70,            # indentation and description width
    :goog   => "", # where to ask?
    :open   => "system 'open ${url}'",  # loads into HTTP browser on Mac OS X
}

and the user preferences are loaded from: ~/.g
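Overriding the defaults at startup might be sketched as follows. The format of ~/.g is not shown here, so this example simply assumes it is a small YAML mapping such as "count: 10"; load_settings and DEFAULTS are hypothetical names, and DEFAULTS is only a subset of the hash above.

```ruby
require 'yaml'

# Hypothetical subset of the built-in defaults.
DEFAULTS = { :count => 4, :indent => 8, :tw => 70 }

# Merge the defaults with the user's preferences.
# Assumption: the preference file is a YAML mapping, e.g. "count: 10".
def load_settings(defaults, path = File.expand_path("~/.g"))
  # No preference file: fall back to a copy of the defaults.
  return defaults.dup unless File.exist?(path)
  prefs = YAML.load_file(path) || {}
  # Ignore files that do not contain a mapping.
  return defaults.dup unless prefs.is_a?(Hash)
  # YAML keys arrive as strings; convert them to symbols to match the defaults.
  overrides = {}
  prefs.each { |key, value| overrides[key.to_s.to_sym] = value }
  defaults.merge(overrides)
end
```

Calling load_settings(DEFAULTS) once at startup would then yield the effective settings, with any key missing from ~/.g keeping its built-in value.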