Link Scraping from any URL
December 5, 2008
To scrape the links from any URL, you need:
1. Ruby 1.8.6 installed.
2. The mechanize gem installed: gem install mechanize (How to install gems?)
Ruby Code:
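require 'rubygems'
require 'mechanize'

# WWW::Mechanize is the class name in the mechanize releases of this era;
# mechanize 1.0 and later use Mechanize.new instead.
agent = WWW::Mechanize.new
url = "http://google.com"
links = []

page = agent.get(url)
page.links.each do |link|
  href = link.uri
  next if href.nil? # skip anchors that have no href
  if href.class == URI::Generic
    # Relative link: keep only site-relative paths (starting with "/")
    # and prefix them with the base URL.
    links << url + href.to_s if href.to_s[0, 1] == "/"
  else
    # Absolute link (URI::HTTP, URI::HTTPS, ...): keep it as is.
    links << href.to_s
  end
end

# Everything is stored as a string, so uniq! removes true duplicates.
links.uniq!
puts links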
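The script above builds absolute links by string concatenation, which only handles hrefs that start with "/". As a rough sketch of an alternative, Ruby's standard uri library can resolve any relative href against the page URL with URI.join. The helper name absolute_links and the sample hrefs are made up for illustration and are not part of the original script; in the real script the hrefs would come from page.links.

```ruby
require 'uri'

# Hypothetical helper (not from the original post): resolve each href
# against the page URL and return the unique absolute http(s) URLs as strings.
def absolute_links(base_url, hrefs)
  hrefs.map do |href|
    begin
      URI.join(base_url, href)
    rescue URI::Error
      nil # skip hrefs that cannot be parsed as URIs
    end
  end.compact.select { |uri| uri.is_a?(URI::HTTP) }.map { |uri| uri.to_s }.uniq
end

# Sample hrefs, invented for this example.
sample = ["/about", "images/logo.png", "http://example.com/",
          "/about", "mailto:someone@example.com"]
puts absolute_links("http://example.org/docs/index.html", sample)
```

Unlike the concatenation approach, this also resolves hrefs such as images/logo.png relative to the current page path, and it drops non-HTTP links such as mailto: addresses.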
Comments are always welcome
regards,
P.Raveendran