Nokogiri – Rubygem

Sample code to save a website's HTML to a file on the local machine.

Benefit: we can scrape content from the target website.

Code

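require 'open-uri'
require 'nokogiri'

link = "http://google.com"

# URI.open is needed on Ruby 3.0+; older versions also accept open(link)
doc = Nokogiri::HTML(URI.open(link))

# The block form closes the file automatically once writing is done
File.open("test.html", "w") do |file|
  file.puts doc.to_html
end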

Link Scraping from any URL

1. Install Ruby (1.8.6 was used here).

2. Install the mechanize gem: gem install mechanize

Ruby Code:

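require 'rubygems' # only needed on Ruby 1.8
require 'mechanize'

# Current releases use Mechanize.new; old releases used WWW::Mechanize.new
agent = Mechanize.new
url = "http://google.com"

links = []

page = agent.get(url)
page.links.each do |link|
  href = link.uri
  next if href.nil? # anchor without a usable href
  if href.instance_of?(URI::Generic)
    # No recognized scheme, typically a relative link: keep root-relative
    # paths and prefix them with the site URL to make them absolute
    links << url + href.to_s if href.to_s.start_with?("/")
  else
    # URI::HTTP, URI::HTTPS, etc. are already absolute
    links << href.to_s
  end
end
links.uniq!
puts links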
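
The loop above only keeps relative links that begin with "/", so links such as "guide.html" or "../faq" are dropped. As a rough sketch of one alternative, Ruby's standard URI.join can resolve any relative reference against the page URL. The absolute_links helper, the base URL, and the sample hrefs below are hypothetical and only illustrate the idea; they are not part of the original script.

```ruby
require 'uri'

# Hypothetical helper: resolves each href against base_url and returns
# the unique absolute URLs. Hrefs that are not valid URIs are skipped.
def absolute_links(base_url, hrefs)
  resolved = hrefs.map do |href|
    begin
      URI.join(base_url, href).to_s
    rescue URI::InvalidURIError
      nil
    end
  end
  resolved.compact.uniq
end

base = "http://example.com/docs/index.html" # assumed page URL
hrefs = ["/about", "guide.html", "../faq", "http://example.org/", "/about"]

puts absolute_links(base, hrefs)
```

With Mechanize, the base would be page.uri.to_s and the hrefs would come from page.links, so every collected entry ends up as an absolute URL string and uniq can remove duplicates reliably.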

Comments are always welcome

regards,

P.Raveendran