python 3.x - Wrapping Selenium Driver (and Other Blocking Calls) with asyncio's run_in_executor


I'm experimenting with my first small scraper in Python, and I want to use asyncio to fetch multiple websites simultaneously. I've written a function that works with aiohttp, but since aiohttp.request() does not execute JavaScript, it isn't ideal for scraping dynamic web pages. This motivates trying to use Selenium with PhantomJS as a headless browser.

There are a couple of snippets of code demonstrating the use of BaseEventLoop.run_in_executor, such as here, but the documentation is sparse and my copy-and-paste mojo is not strong enough.

If someone would be kind enough to expand on the use of asyncio to wrap blocking calls in general, or explain what's going on in this specific case, I'd appreciate it! Here is what I've knocked up so far:

    @asyncio.coroutine
    def fetch_page_pjs(self, url):
        '''
        (self, string, int) -> None
        performs async website content retrieval
        '''
        loop = asyncio.get_event_loop()
        try:
            future = loop.run_in_executor(None, self.driver.get, url)
            print(url)
            response = yield from future
            print(response)
            if response.status == 200:
                body = BeautifulSoup(self.driver.page_source)
                self.results.append((url, body))
            else:
                self.results.append((url, ''))
        except:
            self.results.append((url, ''))

response comes back as None. Why?

This is not an asyncio or run_in_executor issue. The Selenium API simply cannot be used this way. First, driver.get doesn't return anything; see the docs for Selenium. Second, it is not possible to get status codes with Selenium directly; see this Stack Overflow question.
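To make the first point concrete, here is a minimal sketch of why the awaited value is None. The future returned by run_in_executor resolves to whatever the wrapped callable returns, so a callable with no return value gives None. FakeDriver is a hypothetical stand-in invented for this example, not the Selenium API, and the sketch uses the newer async/await syntax rather than the decorator style from the question.

```python
import asyncio


class FakeDriver:
    """Hypothetical stand-in for a Selenium driver; not the real API."""

    def __init__(self):
        self.page_source = ""

    def get(self, url):
        # Like driver.get, this loads a page as a side effect and has
        # no return statement, so calling it returns None.
        self.page_source = "<html><body>content of %s</body></html>" % url


async def load(driver, url):
    loop = asyncio.get_running_loop()
    # The awaited future resolves to whatever driver.get returned.
    result = await loop.run_in_executor(None, driver.get, url)
    return result, driver.page_source


driver = FakeDriver()
result, source = asyncio.run(load(driver, "http://example.com"))
print(result)  # None
print(source)
```

The page content has to be read from the driver afterwards (here page_source), not from the value the future resolves to.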

This code worked for me:

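    @asyncio.coroutine
    def fetch_page_pjs(self, url):
        '''
        (self, string, int) -> None
        performs async website content retrieval
        '''
        loop = asyncio.get_event_loop()
        try:
            # Run the blocking driver.get call in the default executor.
            future = loop.run_in_executor(None, self.driver.get, url)
            print(url)
            # driver.get returns nothing, so just wait for it to finish.
            yield from future
            # Read the loaded page from the driver itself.
            body = BeautifulSoup(self.driver.page_source)
            self.results.append((url, body))
        except Exception:
            self.results.append((url, ''))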
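As for wrapping blocking calls in general, the pattern might be sketched as follows. This is an illustration under assumptions, not a drop-in scraper: blocking_fetch is a hypothetical function that just sleeps in place of a real browser call, the URLs are placeholders, and the pool size of 4 is arbitrary. It uses async def and await, which replaced the @asyncio.coroutine / yield from style from Python 3.5 onwards.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_fetch(url):
    """Hypothetical blocking call; sleeps instead of driving a browser."""
    time.sleep(0.2)
    return "body of " + url


async def fetch_all(urls):
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Each call runs in a worker thread, so the event loop stays free.
        futures = [loop.run_in_executor(pool, blocking_fetch, u) for u in urls]
        # gather returns the results in the same order as the inputs.
        return await asyncio.gather(*futures)


urls = ["http://a.example", "http://b.example", "http://c.example"]
start = time.monotonic()
results = asyncio.run(fetch_all(urls))
elapsed = time.monotonic() - start
print(results)
print("elapsed: %.2fs" % elapsed)
```

Because the three sleeps run in separate threads, the total time should be close to one sleep rather than three. One caveat for the Selenium case: the snippet above shares a single self.driver, and concurrent get calls against one browser instance would overwrite each other's page, so giving each worker its own driver is probably the safer design.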
