Focus on scRUBYt! v0.4.11, the powerful web scraping tool

scRUBYt! is a simple but powerful web scraping toolkit written in Ruby. Its purpose is to free you from the drudgery of web page crawling and of looking up HTML tags, attributes, XPaths, form names and other typical low-level web scraping details. It figures these out from examples that you copy and paste from the web page or straight from Firebug.
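To give a feel for what "figuring out XPaths from your examples" can mean, here is a minimal sketch that uses only Ruby's bundled REXML library. It is not scRUBYt!'s implementation and does not use its API: the sample page and the helper names `xpath_for_example` and `generalize` are assumptions made up for illustration. The sketch locates the element whose text matches a pasted example, derives that element's XPath, then drops the trailing position index so the path matches every sibling item.

```ruby
require 'rexml/document'

# Hypothetical, well-formed sample page (not from the scRUBYt! docs).
PAGE = <<~HTML
  <html>
    <body>
      <h1>Books</h1>
      <ul>
        <li>Programming Ruby</li>
        <li>The Ruby Way</li>
        <li>Agile Web Development</li>
      </ul>
    </body>
  </html>
HTML

# Find the first element whose own text equals the pasted example
# and return its absolute XPath (nil if nothing matches).
def xpath_for_example(doc, example)
  node = REXML::XPath.match(doc, '//*').find do |el|
    el.text.to_s.strip == example
  end
  node && node.xpath
end

# Drop a trailing positional index such as "[2]" so the path
# selects all sibling elements of the same name.
def generalize(xpath)
  xpath.sub(/\[\d+\]\z/, '')
end

doc      = REXML::Document.new(PAGE)
specific = xpath_for_example(doc, 'The Ruby Way')
general  = generalize(specific)
titles   = REXML::XPath.match(doc, general).map { |el| el.text.strip }

puts specific
puts general
puts titles
```

A real scraper has to cope with malformed HTML and with several elements sharing the same text, which this sketch ignores; it only shows the example-to-pattern step in isolation.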

scRUBYt! has only two dependencies, hpricot and mechanize (plus, optionally, FireWatir for AJAX scraping).

Changes:

  • [NEW] possibility to use FireWatir as the agent for scraping (credit: Glenn Gillen, Glenn Gillen and... did I mention Glenn already?)
  • [FIX] navigation doesn’t crash if a 404/500 is returned (credit: Glenn Gillen)
  • [NEW] navigation action: click_by_xpath to click arbitrary elements
  • [MOD] dropped dependencies: RubyInline, ParseTree, Ruby2Ruby (hooray for win32 users)
  • [NEW] scraping through frames (e.g. Google Analytics)
  • [MOD] exporting temporarily doesn’t work; for now, the generated XPaths are printed to the screen
  • [MOD] possibility to wait after clicking a link or filling in a text field (to be able to scrape content inserted via AJAX)
  • [NEW] possibility to fetch from a string, by specifying nil as the URL and passing the HTML string with the :html option
  • [FIX] FireWatir slowness (credit: jak4)
  • [FIX] many bug fixes and stability fixes

scRUBYt! is free, open source software licensed under the GNU General Public License, version 2. scRUBYt! is developed by Peter Szinek, Glenn Gillen and a team of core contributors.

Post scriptum

Compliance Mandates

  • Application Scanner:

    PCI/DSS 6.3, SOX A12.4, GLBA 16 CFR 314.4(b) and (2), HIPAA 164.308(a)(1)(i), FISMA RA-5, SA-11, SI-2, ISO 27001/27002 12.6, 15.2.2

