I am trying to scrape a website using Nokogiri and download the documents that are posted on it. I can scrape other websites, such as the Matatiela website, and get the documents from them. But when I try to scrape the Mbhashe website, I can't get the documents because I first have to trigger the onclick event to reach each document.
The problem is that I don't know how to trigger the onclick event. I tried this code, which I worked on with a friend, but it didn't work:
if url.include?('http://www.alfredduma.gov.za/bids-tender-notices/')
  # Strip the JavaScript wrapper from the onclick attribute to leave the bare URL
  file = anchor['onclick'].to_s.gsub("location.href=", "").gsub(";return false;", "").gsub("'", "")
  f = mech.get(file)
  # Take the quoted file name from the Content-Disposition header
  fileName = f.header['content-disposition']
  fileName = fileName.match('"(.*?)"').andand[1].to_s
  fileName = municipalityName + " -" + fileName.gsub("_", " ")
  downld(municipalityName, file, filepath, fileName, provinceName)
end
Below is code similar to what I used to scrape the Matatiela website, but it does not work on the Mbhashe website: it does not return anything. Can you please help?
["https://www.mbhashemun.gov.za/procurement/tenders/","div.tb > div.tbrow > a","http://www.mbhashemun.gov.za","Mbhashe municipality","Eastern Cape"]
My function gets the CSS selector from this array.
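As an illustration only, a row like the one above could be unpacked into named fields as sketched here. The `SiteConfig` struct and its field names are hypothetical, and the meaning assigned to each position is an assumption read off the sample values rather than taken from the actual scraper.

```ruby
# Hypothetical names for the five positions of a site row; the meaning of
# each position is inferred from the sample values, not confirmed.
SiteConfig = Struct.new(:list_url, :link_selector, :base_url,
                        :municipality_name, :province_name)

row = ["https://www.mbhashemun.gov.za/procurement/tenders/",
       "div.tb > div.tbrow > a",
       "http://www.mbhashemun.gov.za",
       "Mbhashe municipality",
       "Eastern Cape"]

# Splat the array so each element lands in the matching struct field
site = SiteConfig.new(*row)

puts site.link_selector      # the selector that would be passed to Nokogiri
puts site.municipality_name
```

Named fields make it harder to mix up the listing URL (https) with the base URL (http) when building links.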
if baseurl.include?('https://www.mbhashemun.gov.za/procurement/tenders/')
  puts "downloading from mbhashemun"
  parenturl = anchor['href']
  puts parenturl
  puts baseurl
  tenderurl = parenturl
  begin
    if tenderurl.include?('http://www.mbhashemun.gov.za/web/2018/11/upgrade-and-maintenance-of-data-centre-for-a-period-of-three-03-years/')
      puts "the document is currently not available"
    else
      puts tenderurl
      # Fetch the tender page and look for the download link on it
      passingparentUrl = HTTParty.get(tenderurl)
      parsedparentUrl = Nokogiri::HTML(passingparentUrl.body)
      downloadtenderurl = parsedparentUrl.at_css('div.media div.media-body > div.wpfilebase-attachment > div.wpfilebase-rightcol > div.wpfilebase-filetitle > a')[:href]
      puts downloadtenderurl
      bean = downloadtenderurl
      puts bean
      # Use the last path segment of the download URL as the file name
      filename = bean.split('/').last
      puts filename
      if File.exist?(File.join('public/uploads', filename))
        puts "the file exists in the upload folder and in the database already"
      else
        # Save the response to disk instead of parsing it
        mech.pluggable_parser.default = Mechanize::Download
        mech.get(bean).save(File.join('public/uploads', filename))
        Tender.create municipality_name: municipalityName, tender_description: filename, tender_document: filename, provincename: provinceName
      end
    end
  rescue StandardError => e
    puts e
  end
end
The code is supposed to go through the website, download the documents, and save them in the public/uploads folder of the app.
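Nokogiri only parses the HTML and does not run JavaScript, so an onclick handler cannot be triggered from it. What can be done is to read the handler text and pull the URL out of it, which is what the first snippet attempts. Here is a minimal sketch of that idea using only the standard library, assuming the attribute has the form `location.href='...';return false;`. The helper names `url_from_onclick` and `filename_from_disposition` and the sample values are hypothetical.

```ruby
require 'uri'

# Pull the target URL out of an onclick handler such as
#   location.href='/files/tender.pdf';return false;
# and resolve it against the site's base URL.
# Returns nil when the handler does not match the assumed pattern.
def url_from_onclick(onclick, base_url)
  match = onclick.to_s.match(/location\.href\s*=\s*['"]([^'"]+)['"]/)
  return nil unless match

  URI.join(base_url, match[1]).to_s
end

# Extract the quoted file name from a Content-Disposition header value,
# e.g. 'attachment; filename="Tender_01.pdf"'. Returns nil if absent.
def filename_from_disposition(header)
  match = header.to_s.match(/filename="([^"]*)"/)
  match && match[1]
end

# With Nokogiri, the first argument would be anchor['onclick'].
onclick = "location.href='/wp-content/uploads/tender_01.pdf';return false;"
puts url_from_onclick(onclick, 'http://www.example.com/')
puts filename_from_disposition('attachment; filename="tender_01.pdf"')
```

If the page builds the link inside a separate script rather than in the attribute itself, this returns nil, and a tool that drives a real browser would probably be needed instead.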