For Loop to Pass a Variable through a URL in Python
I'm new to Python and trying to learn on my own by doing some simple web scraping of football stats.
I have been successful in getting data from a single page at a time, but have not been able to figure out how to add a loop to my code to scrape multiple pages at once (or multiple positions/years/conferences, for that matter).
I have searched a fair amount on this and other websites but can't seem to get it right.
Here's my code:
    import csv
    import requests
    from BeautifulSoup import BeautifulSoup

    url = 'http://www.nfl.com/stats/categorystats?seasonType=REG&d-447263-n=1&d-447263-o=2&d-447263-p=1&d-447263-s=PASSING_YARDS&tabSeq=0&season=2014&Submit=Go&experience=&archive=false&statisticCategory=PASSING&conference=null&qualified=false'
    response = requests.get(url)
    html = response.content
    soup = BeautifulSoup(html)
    table = soup.find('table', attrs={'class': 'data-table1'})

    list_of_rows = []
    for row in table.findAll('tr'):
        list_of_cells = []
        for cell in row.findAll('td'):
            text = cell.text.replace("'", '')
            list_of_cells.append(text)
        list_of_rows.append(list_of_cells)

    #for line in list_of_rows:
    #    print ', '.join(line)

    outfile = open("./2014.csv", "wb")
    writer = csv.writer(outfile)
    writer.writerow(["Rk", "Player", "Team", "Pos", "Comp", "Att", "Pct", "Att/G", "Yds", "Avg", "Yds/G", "TD", "Int", "1st", "1st%", "Lng", "20+", "40+", "Sck", "Rate"])
    writer.writerows(list_of_rows)
    outfile.close()
Here's my attempt at adding a variable to the URL and building a loop:
    import csv
    import requests
    from BeautifulSoup import BeautifulSoup

    pagelist = ["1", "2", "3"]
    x = 0
    while (x < 500):
        url = "http://www.nfl.com/stats/categorystats?seasonType=REG&d-447263-n=1&d-447263-o=2&d-447263-p="+str(x)).read(),'html'+"&d-447263-s=RUSHING_ATTEMPTS_PER_GAME_AVG&tabSeq=0&season=2014&Submit=Go&experience=&archive=false&statisticCategory=RUSHING&conference=null&qualified=false"
        response = requests.get(url)
        html = response.content
        soup = BeautifulSoup(html)
        table = soup.find('table', attrs={'class': 'data-table1'})
        list_of_rows = []
        for row in table.findAll('tr'):
            list_of_cells = []
            for cell in row.findAll('td'):
                text = cell.text.replace("'", '')
                list_of_cells.append(text)
            list_of_rows.append(list_of_cells)
        #for line in list_of_rows:
        #    print ', '.join(line)
        outfile = open("./2014.csv", "wb")
        writer = csv.writer(outfile)
        writer.writerow(["Rk", "Player", "Team", "Pos", "Att", "Att/G", "Yds", "Avg", "Yds/G", "TD", "Long", "1st", "1st%", "20+", "40+", "FUM"])
        writer.writerows(list_of_rows)
        x = x + 0
    outfile.close()
Thanks in advance.
Here's my revised code; it seems to be deleting each page as it writes to the csv file.
    import csv
    import requests
    from BeautifulSoup import BeautifulSoup

    url_template = 'http://www.nfl.com/stats/categorystats?tabSeq=0&season=2014&seasonType=REG&experience=&Submit=Go&archive=false&d-447263-p=%s&conference=null&statisticCategory=PASSING&qualified=false'

    for p in ['1', '2', '3']:
        url = url_template % p
        response = requests.get(url)
        html = response.content
        soup = BeautifulSoup(html)
        table = soup.find('table', attrs={'class': 'data-table1'})
        list_of_rows = []
        for row in table.findAll('tr'):
            list_of_cells = []
            for cell in row.findAll('td'):
                text = cell.text.replace("'", '')
                list_of_cells.append(text)
            list_of_rows.append(list_of_cells)
        #for line in list_of_rows:
        #    print ', '.join(line)
        outfile = open("./2014passing.csv", "wb")
        writer = csv.writer(outfile)
        writer.writerow(["Rk", "Player", "Team", "Pos", "Comp", "Att", "Pct", "Att/G", "Yds", "Avg", "Yds/G", "TD", "Int", "1st", "1st%", "Lng", "20+", "40+", "Sck", "Rate"])
        writer.writerows(list_of_rows)
        outfile.close()
Assuming you want to change the page number, you can use string formatting:
    url_template = 'http://www.nfl.com/stats/categorystats?seasonType=REG&d-447263-n=1&d-447263-o=2&d-447263-p=%s&d-447263-s=PASSING_YARDS&tabSeq=0&season=2014&Submit=Go&experience=&archive=false&statisticCategory=PASSING&conference=null&qualified=false'

    for page in [1, 2, 3]:
        url = url_template % page
        response = requests.get(url)

        # the rest of the processing code can go here

        outfile = open("./2014.csv", "ab")
        writer = csv.writer(outfile)
        writer.writerow(...)
        writer.writerows(list_of_rows)
        outfile.close()
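As a side note (not part of the original answer), query strings like this can also be assembled with the standard library's urllib.parse.urlencode, which handles escaping for you. A minimal Python 3 sketch, showing only a few of the original parameters for brevity:

```python
from urllib.parse import urlencode

BASE = 'http://www.nfl.com/stats/categorystats'

def build_url(page):
    # A trimmed-down set of the query parameters from the question;
    # 'd-447263-p' is the page-number parameter being varied.
    params = {
        'seasonType': 'REG',
        'season': 2014,
        'statisticCategory': 'PASSING',
        'd-447263-p': page,
    }
    return BASE + '?' + urlencode(params)

urls = [build_url(page) for page in (1, 2, 3)]
```

Each entry in `urls` is a complete URL differing only in the page number.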
Note that you should open the file in append mode ("ab") instead of write mode ("wb"), as the latter overwrites the existing contents, which is what you've experienced. Using append mode, the new contents are written at the end of the file.
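To illustrate the difference with a standalone sketch (in Python 3, where csv files are opened in text mode with newline='' rather than the old "wb"/"ab"): write the header once, then append one chunk of rows per simulated page, and earlier pages survive.

```python
import csv

# Write the header once; 'w' truncates the file (like the old 'wb').
with open('demo.csv', 'w', newline='') as f:
    csv.writer(f).writerow(['page', 'value'])

# Append each simulated "page" of rows; 'a' (like the old 'ab')
# adds to the end instead of overwriting what is already there.
for page in [1, 2, 3]:
    with open('demo.csv', 'a', newline='') as f:
        csv.writer(f).writerow([page, page * 10])

# Read everything back: the header plus one row per page.
with open('demo.csv', newline='') as f:
    rows = list(csv.reader(f))
```

Had the loop opened the file with 'w' each time, only the last page's row would remain.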
This is outside the scope of your question, and more of a friendly code improvement suggestion, but the script would become easier to reason about if you split it into smaller functions that each do one thing, e.g. fetch the data from the site, write the csv, etc.
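A rough sketch of that suggestion (hypothetical function names, with the network fetch stubbed out so only the structure is shown; a real version would do the requests/BeautifulSoup work inside fetch_page):

```python
import csv

def fetch_page(page):
    """Fetch and parse one results page. Stubbed here with fake rows;
    the real version would build the URL, call requests.get(), and
    extract the table cells with BeautifulSoup."""
    return [[str(page), 'Player %d' % page, '100']]

def scrape_pages(pages):
    """Collect the rows from every page into a single list."""
    rows = []
    for page in pages:
        rows.extend(fetch_page(page))
    return rows

def write_csv(path, header, rows):
    """Write the header and all rows in one pass, so there is no
    need to juggle write vs. append modes inside a loop."""
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)

def main():
    rows = scrape_pages([1, 2, 3])
    write_csv('stats.csv', ['Rk', 'Player', 'Yds'], rows)

main()
```

Because all pages are gathered before anything is written, the overwrite-per-iteration problem disappears entirely.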