Making AJAX calls with Python
I am trying to get the value of the href attribute of an anchor element on a web page, using a self-made Python script. However, most of the contents of the div element inside which the anchor element sits are received by the web page via jQuery AJAX calls when the page loads, and this div element contains about 90% of the page's content. How can I get the contents of the div element and the value of the href attribute of the anchor element?
Later, after getting the value of the href attribute, I want the contents of the web page that the link points to. Unfortunately, that call is also made via AJAX (jQuery): when I click the link in a web browser, the address in the address bar does not change, which means the contents of the linked page are received and loaded into the same page (inside the above-mentioned div element).
After this, I am using BeautifulSoup to parse the web page. So, how can I do this in Python? What sort of modules do I need to use, and what is the general pseudo-code required?
By the way, the anchor element has an onclick event handler that triggers the corresponding jQuery function, which loads the contents into the div element inside the web page.
Moreover, the anchor element is not associated with an id, in case that matters for the solution.
You'd want to use a headless web browser for this. Take a look at Ghost.py or phantompy.
Since I realized phantompy is no longer being actively developed, here's an example using Ghost.py.
I created an HTML page that starts out blank; JavaScript then adds a link to a div.
```html
<html>
  <body>
    <div id="links">
      <!-- links go here -->
    </div>
  </body>
  <script type="text/javascript">
    var div = document.getElementById('links');
    var link = document.createElement('a');
    link.innerHTML = 'duckduckgo';
    link.setAttribute('href', 'http://duckduckgo.com');
    div.appendChild(link);
  </script>
</html>
```
So if you scrape this page right now with Beautiful Soup using soup.find_all('a'), you wouldn't get any links, because there aren't any yet.
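To make that failure concrete, here is a minimal check (a sketch, assuming the static HTML above exactly as served, before any JavaScript has run) showing that Beautiful Soup sees an empty div:

```python
from bs4 import BeautifulSoup

# The page exactly as served, before the script has run.
static_html = """
<html><body>
  <div id="links"><!-- links go here --></div>
</body></html>
"""

soup = BeautifulSoup(static_html, 'html.parser')
# The anchor only exists after the JavaScript executes, so this is empty.
print(soup.find_all('a'))  # -> []
```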
But we can use a headless browser to render the content for us:
```python
>>> from ghost import Ghost
>>> from bs4 import BeautifulSoup
>>>
>>> ghost = Ghost()
>>>
>>> ghost.open('http://localhost:8000')
>>>
>>> soup = BeautifulSoup(ghost.content)
>>> soup.find_all('a')
[<a href="http://duckduckgo.com">duckduckgo</a>]
```
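Once the rendered HTML is in hand, pulling out the href and then fetching the linked page is ordinary BeautifulSoup work; note that the anchor can be found by tag, so no id is needed. A sketch, using a hard-coded string standing in for what the headless browser would return:

```python
from bs4 import BeautifulSoup

# Stand-in for the rendered markup a headless browser would produce.
rendered = '<div id="links"><a href="http://duckduckgo.com">duckduckgo</a></div>'

soup = BeautifulSoup(rendered, 'html.parser')
link = soup.find('a')      # locate by tag; the anchor has no id
href = link['href']
print(href)                # -> http://duckduckgo.com

# Fetching the page the link points to could then be, e.g.:
# import urllib.request
# html = urllib.request.urlopen(href).read()
```

If the target page also builds its content with AJAX, you would open that URL through the headless browser as well, rather than with urllib.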
If you need to click the link and have that change the content on the page, you can do that too. Check out the sample use case on the project's website.