jquery - Executing JavaScript code to accept terms and open next page
I am trying to crawl a JavaScript-rendered website that requires clicking an 'accept terms' button to enter. Using Scrapy and Splash, I have tried to execute JavaScript code through both Splash endpoints, 'render.html' and 'execute'. In both cases the output is the start page. Why doesn't this work as expected?
url = the start page with the "accept terms" button.
url/index.aspx = the page I want to render.
Using render.html:
    yield scrapy.Request('url', self.parse, meta={
        'splash': {
            'endpoint': 'render.html',
            'args': {
                'js_source': 'document.getElementById("acceptterms").click();',
                'html': 1,
                'wait': 0.5,
            },
        },
    })
Or using execute and Lua:
    lua_source_string = """
    function main(splash)
      splash:go("url/index.aspx")
      splash:wait(0.5)
      splash:runjs("document.getElementById('acceptterms').click();")
      return splash:html()
    end
    """
    yield scrapy.Request('url', self.parse, meta={
        'splash': {
            'endpoint': 'execute',
            'args': {'lua_source': lua_source_string},
        },
    })
In both cases only the 'url' page is rendered.
If I follow the example at http://blog.scrapinghub.com/2015/03/02/handling-javascript-in-scrapy-with-splash/ and use the following Lua string with jQuery like so:
    lua_source_string = """
    function main(splash)
      splash:autoload("https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js")
      splash:go("url/index.aspx")
      splash:wait(0.5)
      splash:runjs("$('#acceptterms').click();")
      return splash:html()
    end
    """
Or using jQuery's trigger('click') like so:
    lua_source_string = """
    function main(splash)
      splash:autoload("i/am/restricted/to/only/two/links/see/above/jquery.min.js")
      splash:go("url/index.aspx")
      splash:wait(0.5)
      splash:runjs("$('#acceptterms').trigger('click');")
      return splash:html()
    end
    """
I get the same results. The rendered page is still 'url'.
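One way to confirm which page actually came back is to check whether the button is still present in the returned HTML. The following is a minimal sketch using only the Python standard library; `IdFinder` and `still_on_start_page` are hypothetical helper names, and it assumes the target page does not itself contain an element with the id `acceptterms`.

```python
from html.parser import HTMLParser


class IdFinder(HTMLParser):
    """Records whether any tag carrying the target id attribute was seen."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if dict(attrs).get("id") == self.target_id:
            self.found = True


def still_on_start_page(html, button_id="acceptterms"):
    """Return True if the 'accept terms' button is still in the HTML."""
    finder = IdFinder(button_id)
    finder.feed(html)
    return finder.found
```

In the spider's `parse` callback this could be called on the rendered HTML of the response to log whether the click had any effect.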
I had the same problem. I suggest you use this workaround:
    function setup_casperjs(splash)
      -- Preload CasperJS client utils.
      -- The __utils__ object is compatible with CasperJS.
      splash:autoload("https://raw.githubusercontent.com/n1k0/casperjs/master/modules/clientutils.js")
      splash:autoload([[
        window.__utils__ = new ClientUtils({});
      ]])
    end

    function main(splash)
      setup_casperjs(splash)
      assert(splash:go(splash.args.url))
      assert(splash:runjs("__utils__.click('#acceptterms')"))
      splash:wait(0.5)
      return splash:html()
    end
See https://github.com/scrapinghub/splash/issues/200#issuecomment-112552839 for a more detailed explanation.
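To wire this workaround into the Scrapy request from the question, the Lua script can be kept in a triple-quoted Python string, which avoids escaping quotes and keeps the `--` comments on their own lines. This is only a sketch: `splash_execute_meta` is a hypothetical helper, `'url'` is the placeholder from the question, and it assumes the CasperJS module exposes its constructor as `ClientUtils`. The `url` argument is passed explicitly so that `splash.args.url` is set; the Scrapy middleware may already fill it in from the request.

```python
# Lua script for Splash's 'execute' endpoint, based on the workaround above.
CLICK_SCRIPT = """
function setup_casperjs(splash)
  -- Preload CasperJS client utils so __utils__ exists in the page.
  splash:autoload("https://raw.githubusercontent.com/n1k0/casperjs/master/modules/clientutils.js")
  splash:autoload([[
    window.__utils__ = new ClientUtils({});
  ]])
end

function main(splash)
  setup_casperjs(splash)
  assert(splash:go(splash.args.url))
  assert(splash:runjs("__utils__.click('#acceptterms')"))
  splash:wait(0.5)  -- wait after the click so the next page can load
  return splash:html()
end
"""


def splash_execute_meta(lua_source, url):
    """Build the request meta for the 'execute' endpoint (hypothetical helper)."""
    return {
        "splash": {
            "endpoint": "execute",
            "args": {"lua_source": lua_source, "url": url},
        }
    }


meta = splash_execute_meta(CLICK_SCRIPT, "url")
# In a spider: yield scrapy.Request("url", self.parse, meta=meta)
```

Note that, unlike the scripts in the question, this one waits after the click rather than before it, so the HTML is read only once the navigation has had time to happen.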