For Loop to Pass a Variable through a URL in Python
I'm new to Python and trying to learn on my own by doing some simple web scraping of football stats.
I have been successful in getting data from a single page at a time, but have not been able to figure out how to add a loop to my code to scrape multiple pages at once (or multiple positions/years/conferences, for that matter).
I have searched a fair amount on this and other websites but can't seem to get it right.
Here's my code:
    import csv
    import requests
    from BeautifulSoup import BeautifulSoup

    url = 'http://www.nfl.com/stats/categorystats?seasonType=REG&d-447263-n=1&d-447263-o=2&d-447263-p=1&d-447263-s=PASSING_YARDS&tabSeq=0&season=2014&Submit=Go&experience=&archive=false&statisticCategory=PASSING&conference=null&qualified=false'
    response = requests.get(url)
    html = response.content
    soup = BeautifulSoup(html)
    table = soup.find('table', attrs={'class': 'data-table1'})

    list_of_rows = []
    for row in table.findAll('tr'):
        list_of_cells = []
        for cell in row.findAll('td'):
            text = cell.text.replace("'", '')
            list_of_cells.append(text)
        list_of_rows.append(list_of_cells)

    #for line in list_of_rows:
    #    print ', '.join(line)

    outfile = open("./2014.csv", "wb")
    writer = csv.writer(outfile)
    writer.writerow(["Rk", "Player", "Team", "Pos", "Comp", "Att", "Pct", "Att/G", "Yds", "Avg", "Yds/G", "TD", "Int", "1st", "1st%", "Lng", "20+", "40+", "Sck", "Rate"])
    writer.writerows(list_of_rows)
    outfile.close()
Here's my attempt at adding a variable to the URL and building a loop:
    import csv
    import requests
    from BeautifulSoup import BeautifulSoup

    pagelist = ["1", "2", "3"]
    x = 0
    while (x < 500):
        url = "http://www.nfl.com/stats/categorystats?seasonType=REG&d-447263-n=1&d-447263-o=2&d-447263-p="+str(x)).read(),'html'+"&d-447263-s=RUSHING_ATTEMPTS_PER_GAME_AVG&tabSeq=0&season=2014&Submit=Go&experience=&archive=false&statisticCategory=RUSHING&conference=null&qualified=false"
        response = requests.get(url)
        html = response.content
        soup = BeautifulSoup(html)
        table = soup.find('table', attrs={'class': 'data-table1'})
        list_of_rows = []
        for row in table.findAll('tr'):
            list_of_cells = []
            for cell in row.findAll('td'):
                text = cell.text.replace("'", '')
                list_of_cells.append(text)
            list_of_rows.append(list_of_cells)
        #for line in list_of_rows:
        #    print ', '.join(line)
        outfile = open("./2014.csv", "wb")
        writer = csv.writer(outfile)
        writer.writerow(["Rk", "Player", "Team", "Pos", "Att", "Att/G", "Yds", "Avg", "Yds/G", "TD", "Long", "1st", "1st%", "20+", "40+", "FUM"])
        writer.writerows(list_of_rows)
        x = x + 0
    outfile.close()
Thanks in advance.
Here's my revised code; it seems to be deleting each page as it writes to the csv file.
    import csv
    import requests
    from BeautifulSoup import BeautifulSoup

    url_template = 'http://www.nfl.com/stats/categorystats?tabSeq=0&season=2014&seasonType=REG&experience=&Submit=Go&archive=false&d-447263-p=%s&conference=null&statisticCategory=PASSING&qualified=false'

    for p in ['1', '2', '3']:
        url = url_template % p
        response = requests.get(url)
        html = response.content
        soup = BeautifulSoup(html)
        table = soup.find('table', attrs={'class': 'data-table1'})
        list_of_rows = []
        for row in table.findAll('tr'):
            list_of_cells = []
            for cell in row.findAll('td'):
                text = cell.text.replace("'", '')
                list_of_cells.append(text)
            list_of_rows.append(list_of_cells)
        #for line in list_of_rows:
        #    print ', '.join(line)
        outfile = open("./2014passing.csv", "wb")
        writer = csv.writer(outfile)
        writer.writerow(["Rk", "Player", "Team", "Pos", "Comp", "Att", "Pct", "Att/G", "Yds", "Avg", "Yds/G", "TD", "Int", "1st", "1st%", "Lng", "20+", "40+", "Sck", "Rate"])
        writer.writerows(list_of_rows)
        outfile.close()
Assuming you want to change the page number, you can use string formatting:
    url_template = 'http://www.nfl.com/stats/categorystats?seasonType=REG&d-447263-n=1&d-447263-o=2&d-447263-p=%s&d-447263-s=PASSING_YARDS&tabSeq=0&season=2014&Submit=Go&experience=&archive=false&statisticCategory=PASSING&conference=null&qualified=false'

    for page in [1, 2, 3]:
        url = url_template % page
        response = requests.get(url)

        # the rest of the processing code can go here

        outfile = open("./2014.csv", "ab")
        writer = csv.writer(outfile)
        writer.writerow(...)
        writer.writerows(list_of_rows)
        outfile.close()
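As a side note (not part of the original answer), query strings like this can also be assembled with the standard library's urllib.parse.urlencode, which handles escaping for you. A minimal Python 3 sketch, showing only a few of the original parameters for brevity:

```python
from urllib.parse import urlencode

BASE = 'http://www.nfl.com/stats/categorystats'

def build_url(page):
    # A trimmed-down set of the query parameters from the question;
    # 'd-447263-p' is the page-number parameter being varied.
    params = {
        'seasonType': 'REG',
        'season': 2014,
        'statisticCategory': 'PASSING',
        'd-447263-p': page,
    }
    return BASE + '?' + urlencode(params)

urls = [build_url(page) for page in (1, 2, 3)]
```

Each entry in `urls` is a complete URL differing only in the page number.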
Note that you should open the file in append mode ("ab") instead of write mode ("wb"), as the latter overwrites the existing contents, which is what you've experienced. Using append mode, the new contents are written at the end of the file.
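To illustrate the difference with a standalone sketch (in Python 3, where csv files are opened in text mode with newline='' rather than the old "wb"/"ab"): write the header once, then append one chunk of rows per simulated page, and earlier pages survive.

```python
import csv

# Write the header once; 'w' truncates the file (like the old 'wb').
with open('demo.csv', 'w', newline='') as f:
    csv.writer(f).writerow(['page', 'value'])

# Append each simulated "page" of rows; 'a' (like the old 'ab')
# adds to the end instead of overwriting what is already there.
for page in [1, 2, 3]:
    with open('demo.csv', 'a', newline='') as f:
        csv.writer(f).writerow([page, page * 10])

# Read everything back: the header plus one row per page.
with open('demo.csv', newline='') as f:
    rows = list(csv.reader(f))
```

Had the loop opened the file with 'w' each time, only the last page's row would remain.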
This is outside the scope of your question, and more of a friendly code improvement suggestion, but the script would become easier to reason about if you split it into smaller functions that each do one thing, e.g. fetch the data from the site, write the csv, etc.
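A rough sketch of that suggestion (hypothetical function names, with the network fetch stubbed out so only the structure is shown; a real version would do the requests/BeautifulSoup work inside fetch_page):

```python
import csv

def fetch_page(page):
    """Fetch and parse one results page. Stubbed here with fake rows;
    the real version would build the URL, call requests.get(), and
    extract the table cells with BeautifulSoup."""
    return [[str(page), 'Player %d' % page, '100']]

def scrape_pages(pages):
    """Collect the rows from every page into a single list."""
    rows = []
    for page in pages:
        rows.extend(fetch_page(page))
    return rows

def write_csv(path, header, rows):
    """Write the header and all rows in one pass, so there is no
    need to juggle write vs. append modes inside a loop."""
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)

def main():
    rows = scrape_pages([1, 2, 3])
    write_csv('stats.csv', ['Rk', 'Player', 'Yds'], rows)

main()
```

Because all pages are gathered before anything is written, the overwrite-per-iteration problem disappears entirely.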