python - Properly watch websites for updates
I wrote a script that I'm using to push updates to Pushbullet channels whenever a new Nexus factory image is released. A separate channel exists for each of the first 11 devices on that page, and I'm using a rather convoluted script to watch for updates. The full setup is here (specifically this script), but I'll briefly summarize the script below. My question is this: this is surely not the correct way of doing this, as it's susceptible to multiple points of failure. What would be a better method of doing this? I would prefer to stick with Python, but I'm open to other languages if they would be simpler/better.
(This question was prompted by the fact that I updated my Apache 2.4 config tonight, which apparently triggered a slight change in the output of the local files watched by urlwatch, and all 11 channels got an erroneous update pushed to them.)
Basic script functionality (some nonessential parts are not included):
- Create a dictionary associating each device codename with its full model name
- Get the existing Nexus factory images page using requests
- Make a bs4 object from the source code
- For each of the 11 devices in the dictionary (loop), do the following:
  - Open/create a page in my public web directory for the device
  - Write the source of the page, filtered using bs4: str(soup.select("h2#" + dev + " ~ table")[0])
  - Call urlwatch on the page to check for updates, and save the output to a temp file
  - If the temp file size is > 0, the page has changed, so push an update to the appropriate channel
  - Remove the webpage and the temp file
A thought I had while typing this question: would a possible solution be to save each current version string (for example: 5.1.0 (lmy47i)) in a pickled variable, and then, if urlwatch detects a difference, compare the new version string to the pickled one and push only if they're different? Could I also throw some regex matching in to ensure that the new format matches the old format and just has updated data, at least as a temporary measure to try to prevent future false alarms?
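The check described above could be sketched roughly as follows. The pattern is only a guess at the version format based on the single example given (a dotted version number followed by a build ID in parentheses), and `should_push` is a hypothetical helper name; how the old string is stored (pickle or otherwise) is left out.

```python
import re

# Assumed format: dotted Android version, a space, then a build ID in
# parentheses, e.g. "5.1.0 (lmy47i)". This pattern is an assumption based on
# one example, not something taken from the page itself.
VERSION_RE = re.compile(r"^\d+(?:\.\d+){1,2} \([a-z0-9]+\)$", re.IGNORECASE)


def should_push(old_version, new_version):
    """Return True only if new_version is well formed and differs from old_version."""
    if not VERSION_RE.match(new_version):
        # Malformed output (e.g. stray HTML after a config change):
        # treat it as a false alarm rather than an update.
        return False
    return new_version != old_version
```

With this in place, a change in the watched file that does not produce a well-formed, different version string would simply be ignored instead of being pushed to a channel.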
Scraping is inherently fragile, but if they don't change the source format it should be pretty straightforward in this case. You should parse the webpage into a data structure. Using bs4 is fine for this. The end result should be a Python dictionary:
{
    'mantaray': {
        '4.2.2 (jdq39)': {'link': 'https://...'},
        '4.3 (jwr66y)': {'link': 'https://...'},
    },
    ...
}
Save this structure with json.dumps. Every time you parse the page, you can generate a similar data structure and compare it to the one you have on disk (update the saved one each time after you're done).
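As a minimal sketch of the save-and-reload step, assuming the parsed page is already a plain dictionary of the shape shown above: `load_snapshot` and `save_snapshot` are hypothetical helper names, and the file path is whatever you choose.

```python
import json
from pathlib import Path


def load_snapshot(path):
    """Return the previously saved structure, or an empty dict on the first run."""
    p = Path(path)
    if not p.exists():
        return {}
    with p.open(encoding="utf-8") as f:
        return json.load(f)


def save_snapshot(path, data):
    """Overwrite the saved structure with the freshly parsed one."""
    with Path(path).open("w", encoding="utf-8") as f:
        # sort_keys keeps the file stable between runs, which makes it easy to diff by eye
        json.dump(data, f, indent=2, sort_keys=True)
```

Returning an empty dictionary on the first run means every version looks new the first time, so you may want to save a snapshot once without pushing anything.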
Then the only part left is comparing the data structures. You can iterate over all the models and check that each version present in the current version of the page also exists in the previous version. If it does not, you have a new version.
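The comparison might look something like this. It assumes both structures are dictionaries keyed by device codename and then by version string, as in the example above; `find_new_versions` is a hypothetical name.

```python
def find_new_versions(previous, current):
    """Map each device codename to the versions in `current` that `previous` lacks."""
    new = {}
    for device, versions in current.items():
        known = previous.get(device, {})  # a device not seen before has no known versions
        added = [version for version in versions if version not in known]
        if added:
            new[device] = added
    return new
```

Only devices that appear in the result need a push, so an unchanged page yields an empty dictionary and nothing is sent.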
You can also potentially generate an easy-to-use API using https://www.kimonolabs.com/ instead of doing the parsing yourself.