jquery - Get the FULL HTML content of a web page (including JavaScript content)


After hours of trying and reading, I'm a bit lost on the subject in the title.

My problem: I am trying to get the full HTML content (including HTML appended/added by JavaScript) of a single web page. What I have tried:

  • I used jsoup, but had to change because jsoup doesn't handle JavaScript content.
  • I used HtmlUnit, but got many errors while loading the targeted web page (CSS errors, RuntimeError, EcmaError, etc.).
  • I used Chrome's basic "save as" feature to save the full content of the web page, then used the jsoup library to find the content I wanted. This way I do get the content I want.

So now my question is: how can I imitate the "save as" function of a browser, or, more generally, how can I get the full HTML content first and then use jsoup to scan the static, final HTML content?

Thanks a lot for any advice!

I managed to do what I wanted to. I'll try to explain it for those who need help!


So! The process is composed of 2 steps:

  • First, get the final HTML content (including HTML generated by JavaScript, etc.) as if you were visiting the web page, and save it to a file, file.html.
  • Then, use the jsoup library to get the wanted content from the saved file, file.html.

1 - Get the HTML content and save it

For this step, you need to download PhantomJS and use it to get the content. Here is the code to target the page. Change mytargetedpage.com to the URL of the page you want, and mysavefile.html to the name of the file to save.

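    var page = require('webpage').create();
    var fs = require('fs');

    // The callback runs once the page has finished loading.
    page.open('http://mytargetedpage.com', function (status) {
        if (status === 'success') {
            // page.content is the page's current HTML, including
            // elements that scripts have added so far.
            fs.write('mysavefile.html', page.content, 'w');
        }
        phantom.exit();
    });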

As you can see, the saved file has the same content as what you load in the browser.
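If you would rather drive this step from the same Java program that later runs jsoup, the script can be launched as an external process. This is only a sketch under some assumptions: the `phantomjs` executable is on your PATH, the script above has been saved to a file whose name you pass as the first argument, and the file it writes is passed as the second. `PhantomRunner`, `buildCommand` and `looksSaved` are names made up for this example; they are not part of PhantomJS or jsoup.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class PhantomRunner {
    // Builds the command line: <phantomjs binary> <script file>.
    static List<String> buildCommand(String phantomBinary, String scriptPath) {
        return Arrays.asList(phantomBinary, scriptPath);
    }

    // Returns true if the saved page exists as a regular, non-empty file.
    static boolean looksSaved(Path savedFile) {
        File file = savedFile.toFile();
        return file.isFile() && file.length() > 0;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: PhantomRunner <script.js> <saved.html>");
            return;
        }
        // Assumption: the "phantomjs" executable can be found on the PATH.
        Process process = new ProcessBuilder(buildCommand("phantomjs", args[0]))
                .inheritIO()   // show PhantomJS output in this console
                .start();
        int exitCode = process.waitFor();

        Path saved = Paths.get(args[1]);
        if (exitCode != 0 || !looksSaved(saved)) {
            throw new IllegalStateException("PhantomJS did not produce " + saved);
        }
        System.out.println("Saved " + saved.toFile().length() + " bytes to " + saved);
    }
}
```

Checking that the file exists and is non-empty before moving on to step 2 avoids parsing a page that was never saved.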

2 - Extract the content you want

Now, use Java and the jsoup library to get the specific content. In this example, I want to get this part of the web page:

    /* html content */
    <span class="my class" data="data1"></span>
    /* html content */
    <span class="my class" data="data2"></span>
    /* html content */

To do this, this code works fine (don't forget to edit thepathtoyoursavedfile.html):

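    import java.io.File;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class ExtractData {
        public static void main(String[] args) throws Exception {
            // Jsoup.connect() only fetches http(s) URLs, so the saved file
            // is parsed directly. The UTF-8 charset is an assumption.
            File input = new File("thepathtoyoursavedfile.html");
            Document document = Jsoup.parse(input, "UTF-8");

            Elements spanList = document.select("span");
            for (Element span : spanList) {
                // Keep only the spans whose class attribute is exactly "my class".
                if (span.attr("class").equals("my class")) {
                    String data = span.attr("data");
                    System.out.println("data : " + data);
                }
            }
        }
    }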
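One thing to keep in mind with the loop above: an HTML class attribute is a whitespace-separated list of class names, so "my class" really means the two classes "my" and "class". Comparing the whole attribute to an exact string stops matching as soon as the page reorders the names or adds another one. Here is a small sketch of a more tolerant check using only the JDK; `ClassTokens`, `tokens` and `hasAllClasses` are names made up for this example. With jsoup itself, `Element.hasClass(...)` can be used for the same kind of per-class test.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ClassTokens {
    // Splits a class attribute value into its whitespace-separated tokens.
    static Set<String> tokens(String classAttribute) {
        String trimmed = classAttribute.trim();
        if (trimmed.isEmpty()) {
            return new HashSet<>();
        }
        return new HashSet<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // True if every wanted class name appears among the attribute's tokens.
    static boolean hasAllClasses(String classAttribute, String... wanted) {
        return tokens(classAttribute).containsAll(Arrays.asList(wanted));
    }

    public static void main(String[] args) {
        System.out.println(hasAllClasses("my class", "my", "class"));        // true
        System.out.println(hasAllClasses("class my extra", "my", "class"));  // true
        System.out.println(hasAllClasses("my", "my", "class"));              // false
    }
}
```

In the extraction loop, the condition could then become something like `ClassTokens.hasAllClasses(span.attr("class"), "my", "class")`.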

Enjoy!

