Link Scraping from any URL

1. Install Ruby 1.8.6.

2. Install the mechanize gem: gem install mechanize

Ruby Code:

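require 'rubygems'
require 'mechanize'

# WWW::Mechanize is the class name in the Mechanize releases of the Ruby 1.8.6 era;
# newer releases of the gem call it Mechanize.
agent = WWW::Mechanize.new
url = "http://google.com"

@first = []

page = agent.get(url)
page.links.each do |link|
  href = link.uri
  if href.class == URI::Generic
    # Relative link: prefix the base URL when the path starts with "/"
    @first << url + href.to_s if href.to_s[0, 1] == "/"
  else
    # Absolute link (for example URI::HTTP): store it as a string so uniq! compares like with like
    @first << href.to_s
  end
end
@first.uniq!
puts @first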
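
The script above builds absolute links by gluing the base URL onto hrefs that start with "/", so relative links such as "images" are skipped. As a rough sketch of one alternative, Ruby's standard URI.join can resolve any href against the base URL. The helper name absolute_links and the sample hrefs are made up for illustration and are not part of the original script; in practice the hrefs would come from page.links.

```ruby
require 'uri'

# Hypothetical helper (not in the original script): resolve each href
# against the base URL and return the unique absolute URLs as strings.
def absolute_links(base_url, hrefs)
  hrefs.map do |href|
    begin
      URI.join(base_url, href.to_s).to_s
    rescue URI::Error
      nil # skip hrefs that cannot be parsed as URIs
    end
  end.compact.uniq
end

# Sample hrefs are assumptions for illustration only.
sample_hrefs = ["/intl/en/about.html", "images", "http://example.com/x", "/intl/en/about.html"]
puts absolute_links("http://google.com", sample_hrefs)
```

With Mechanize, something like absolute_links(url, page.links.map { |l| l.href }) should work in place of the loop, assuming every link exposes its raw href.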

Comments are always welcome.

Regards,

P.Raveendran
