Friday, 6 November 2020

How can I scrape a website with an onclick event listener using Nokogiri?

I am trying to scrape a website using Nokogiri and download the documents that are posted on it. I can scrape other websites, such as the Matatiela Website, and get the documents from them. But when I try to scrape the Mbhashe Website, I can't get the documents, because I first have to trigger the onclick event in order to reach each document.

The problem is that I don't know how to trigger the onclick event. I tried the following code, which I worked on with a friend, but it didn't work:

if url.include?('http://www.alfredduma.gov.za/bids-tender-notices/')
   # Strip the JavaScript wrapper from the onclick attribute to leave the bare URL
   file = anchor['onclick'].to_s.gsub("location.href=","").gsub(";return false;","").gsub("'","")
   f = mech.get(file)
   # Take the quoted file name from the Content-Disposition header
   fileName = f.header['content-disposition']
   fileName = fileName[/"(.*?)"/, 1].to_s
   fileName = municipalityName + " -" + fileName.gsub("_"," ")
   downld(municipalityName,file,filepath,fileName,provinceName)
end

Below is code similar to what I used to scrape the Matatiela website, but it does not work on the Mbhashe website: it does not return anything. Can you please help?

["https://www.mbhashemun.gov.za/procurement/tenders/","div.tb > div.tbrow > a","http://www.mbhashemun.gov.za","Mbhashe municipality","Eastern Cape"]

My function gets the CSS selector from this array.
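To make the layout of that array explicit, here is a minimal sketch that gives each position a name and resolves a link against the base URL. The `SiteConfig` struct, the `absolute_url` helper, and the sample path are hypothetical names for illustration, not part of the app.

```ruby
require 'uri'

# Hypothetical wrapper that names the five positions of the config array.
SiteConfig = Struct.new(:list_url, :selector, :base_url, :municipality, :province)

entry = ["https://www.mbhashemun.gov.za/procurement/tenders/",
         "div.tb > div.tbrow > a",
         "http://www.mbhashemun.gov.za",
         "Mbhashe municipality",
         "Eastern Cape"]
site = SiteConfig.new(*entry)

# Resolve an href found on the page against the site's base URL.
# An href that is already absolute is returned unchanged.
def absolute_url(base_url, href)
  URI.join(base_url, href).to_s
end

puts site.selector
# The path below is a made-up example, not a real tender page.
puts absolute_url(site.base_url, "/web/example-tender/")
```

With this shape, `site.selector` would be the string handed to Nokogiri's `css`, and each `anchor['href']` could be passed through `absolute_url` before it is fetched.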

if baseurl.include?('https://www.mbhashemun.gov.za/procurement/tenders/')
  puts "downloading from mbhashemun"
  parenturl = anchor['href']
  puts parenturl
  puts baseurl
  tenderurl = parenturl
  begin
    if tenderurl.include?('http://www.mbhashemun.gov.za/web/2018/11/upgrade-and-maintenance-of-data-centre-for-a-period-of-three-03-years/')
      puts "the document is currently not available"
    else
      puts tenderurl
      # Fetch the tender page and parse its HTML
      passingparentUrl = HTTParty.get(tenderurl)
      parsedparentUrl = Nokogiri::HTML(passingparentUrl.body)
      # at_css returns nil when nothing matches; [:href] then raises and ends up in the rescue below
      downloadtenderurl = parsedparentUrl.at_css('div.media div.media-body > div.wpfilebase-attachment > div.wpfilebase-rightcol > div.wpfilebase-filetitle > a')[:href]
      puts downloadtenderurl
      bean = downloadtenderurl
      puts bean
      # Use the last path segment as the file name
      myfunction = bean.split('/').last
      puts myfunction
      if File.exist?(File.join('public/uploads', myfunction))
        puts "the file exists in the upload folder and in the database already"
      else
        # Save the response to disk instead of parsing it
        mech.pluggable_parser.default = Mechanize::Download
        mech.get(bean).save(File.join('public/uploads', myfunction))
        Tender.create municipality_name: municipalityName, tender_description: myfunction, tender_document: myfunction, provincename: provinceName
      end
    end
  rescue StandardError => e
    puts e
  end
end
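The "skip if already saved" step can be sketched separately from the scraping. This is only an illustration using the standard library: `target_path` and `save_once` are hypothetical helpers, and the block passed to `save_once` stands in for whatever fetches the bytes (Mechanize in the code above).

```ruby
require 'fileutils'
require 'uri'

# Build the local path from the last segment of the download URL.
def target_path(download_url, upload_dir)
  name = File.basename(URI.parse(download_url).path)
  File.join(upload_dir, name)
end

# Write the file only when it is not there yet.
# The block supplies the file contents and runs only when a write is needed.
# Returns the saved path, or nil when the file already existed.
def save_once(download_url, upload_dir)
  path = target_path(download_url, upload_dir)
  return nil if File.exist?(path)

  FileUtils.mkdir_p(upload_dir)
  File.binwrite(path, yield)
  path
end

# Possible use with the variables from the code above (not run here):
#   saved = save_once(bean, 'public/uploads') { mech.get(bean).body }
#   Tender.create(...) if saved
```

Returning nil for an existing file lets the caller decide whether to create the database record, so the file check and the record creation stay in step.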

The code is supposed to go through the website, download the documents, and save them in the app's public/uploads folder.
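On the onclick part: Nokogiri only parses HTML and does not run JavaScript, so the click itself cannot be triggered from it. The attribute is still readable as a plain string, though. Below is a minimal sketch that assumes the handler has the `location.href='...';return false;` shape that the gsub chain in the first snippet expects. `url_from_onclick` and `filename_from_disposition` are hypothetical helper names, and the example.com URL is a placeholder.

```ruby
# Pull the target URL out of an onclick handler such as
#   location.href='http://example.com/file.pdf';return false;
# Returns nil when the attribute is missing or has a different shape.
def url_from_onclick(onclick)
  match = onclick.to_s.match(/location\.href\s*=\s*['"]([^'"]+)['"]/)
  match && match[1]
end

# Extract the file name from a Content-Disposition header value,
# with or without surrounding quotes. Returns nil when there is none.
def filename_from_disposition(header)
  match = header.to_s.match(/filename="?([^";]+)"?/)
  match && match[1]
end

sample = "location.href='http://example.com/docs/tender.pdf';return false;"
puts url_from_onclick(sample)
puts filename_from_disposition('attachment; filename="Bid_Notice_01.pdf"')
```

In the scraper this would be called as `url_from_onclick(anchor['onclick'])`, and a nil result would signal that the handler does something other than a simple redirect. If the page builds its links in JavaScript in some other way, this approach does not apply and a browser-driving tool would be needed instead.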
