Python: randomly sample rows from a file based on times in columns


This is a bit complex, so I'd appreciate any help! I'm trying to randomly sample rows from a .csv file. Essentially, I want a resulting file of unique locations (locations are specified by the Easting and Northing columns of the data file, below). I want to randomly pull 1 location per 12-hour period per SessionDate in the file. The 12-hour periods are between 0631 and 1829 hours, and between 1830 and 0630 hours, based on the Start: and End: columns in the data file below. If any 2 locations are within 6 hours of each other (based on their Start: time), that location is tossed and a new location is randomly drawn, and sampling continues until no new locations can be drawn (i.e., sampling without replacement). I have been trying to do this in Python, but my experience is limited. I tried first putting each row into a dictionary, and then each row into a list, as follows:
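One way to turn the 12-hour rule into code is to map each row's SessionDate and Start: time to a period key. This is only a sketch under assumptions, not part of the original post: the function name `period_key` is hypothetical, dates are assumed to look like `27-Apr-07` and times like `18:00` or `0:25`, and a night start after midnight (00:00 to 06:30) is assumed to count toward the previous date's night period.

```python
from datetime import datetime, time, timedelta

DAY_START = time(6, 31)     # day period: 06:31 to 18:29
NIGHT_START = time(18, 30)  # night period: 18:30 to 06:30 (spans midnight)


def period_key(session_date: str, start: str):
    """Return (date, "day" | "night") for a row's SessionDate and Start: time.

    Hypothetical helper. Assumes early-morning starts (00:00-06:30) belong
    to the night period that began on the previous date.
    """
    dt = datetime.strptime(f"{session_date} {start}", "%d-%b-%y %H:%M")
    t = dt.time()
    if DAY_START <= t < NIGHT_START:
        return (dt.date(), "day")
    if t >= NIGHT_START:
        return (dt.date(), "night")
    # 00:00-06:30: credit the row to the previous evening's night period
    return (dt.date() - timedelta(days=1), "night")
```

With this key, "one location per 12-hour period" becomes "at most one selected row per distinct `period_key` value".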

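    import random  # for the sampling step
    import csv

    # Read each row of the file into a list of fields (one list per row)
    rows = []
    with open('file.csv', newline='') as f:
        for row in csv.reader(f):
            rows.append(row)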

I'm unsure where to go from here: how do I sample these lists the way I need to, and then write them to an output file containing only the 'unique' locations?
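For the final step, here is a minimal sketch of writing only the first row seen for each (Easting, Northing) pair, using the standard `csv` module. The function `write_unique_locations` is hypothetical, and it assumes each row is a list whose last two fields are Easting and Northing:

```python
import csv
import io


def write_unique_locations(rows, out, header=None):
    """Write rows to the file-like `out`, keeping the first row per location.

    Hypothetical helper; assumes row[-2] is Easting and row[-1] is Northing.
    Returns the number of data rows written.
    """
    writer = csv.writer(out)
    if header is not None:
        writer.writerow(header)
    seen = set()
    kept = 0
    for row in rows:
        loc = (row[-2], row[-1])  # (Easting, Northing)
        if loc in seen:
            continue  # location already written: skip duplicates
        seen.add(loc)
        writer.writerow(row)
        kept += 1
    return kept
```

To write a real file, open it with `open('unique.csv', 'w', newline='')` and pass the file object as `out`; the output filename is an assumption.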

Here are the top few lines of my data file:

    SessionDate  Start:  End:   Easting  Northing
    27-Apr-07    18:00   21:45  174739   9785206
    28-Apr-07    18:00   21:30  171984   9784738
    28-Apr-07    18:00   21:30  171984   9784738
    28-Apr-07    18:00   21:30  171984   9784738
    28-Apr-07    18:00   21:30  171984   9784738
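This sample is whitespace-separated, while the comma-based reading code above expects commas. A minimal sketch for parsing whitespace-separated text like this into one dict per row, keyed by the header names, assuming no field contains spaces; `parse_rows` and the `start_dt` key are hypothetical names:

```python
from datetime import datetime


def parse_rows(text):
    """Parse whitespace-separated lines into dicts keyed by header names.

    Assumes the first non-blank line is the header and no field has spaces.
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, body = lines[0], lines[1:]
    records = []
    for fields in body:
        rec = dict(zip(header, fields))
        # Combine date and start time into one datetime for later comparisons
        rec["start_dt"] = datetime.strptime(
            f"{rec['SessionDate']} {rec['Start:']}", "%d-%b-%y %H:%M"
        )
        records.append(rec)
    return records
```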

It gets a bit complicated because some of the observations span midnight, so they may be on different dates but still be within 6 hours of each other (which is why I have this criterion), for example:

    SessionDate  Start:  End:   Easting  Northing
    27-Apr-07    22:30   23:25  171984   9784738
    28-Apr-07    0:25    1:30   174739   9785206
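Comparing full datetimes rather than bare clock times handles the midnight case: the two rows above are only 1 hour 55 minutes apart even though they fall on different dates. A minimal sketch; the helper names are hypothetical, and treating a gap of exactly 6 hours as "within" (`<=`) is an assumption:

```python
from datetime import datetime, timedelta


def start_time(session_date, start):
    """Full start datetime, so rows on different dates compare correctly."""
    return datetime.strptime(f"{session_date} {start}", "%d-%b-%y %H:%M")


def within_six_hours(a, b):
    # abs() of a timedelta gives the gap regardless of which row came first;
    # a gap of exactly 6 hours counts as "within" (assumption)
    return abs(a - b) <= timedelta(hours=6)
```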

Here's a solution. I made a few changes to the data (to the location columns, to make it easier to eyeball the results). I create a dict of dates pointing to a dict of locations, which in turn points to a list of selected rows.

    from datetime import datetime, timedelta

    data = """SessionDate Start:  End:    Easting Northing
    27-Apr-07   18:00   21:45   a  1
    27-Apr-07   18:00   21:30   g  2
    28-Apr-07   18:00   21:30   b  2
    28-Apr-07   18:00   21:30   b  2
    28-Apr-07   18:00   21:30   b  2
    29-Apr-07   8:00    11:30   c  3
    29-Apr-07   20:00   21:30   c  3
    29-Apr-07   20:00   21:30   c  3
    30-Apr-07   8:00    10:30   d  4
    30-Apr-07   16:00   17:30   e  5
    30-Apr-07   14:00   21:30   f  6
    30-Apr-07   18:00   21:30   f  6
    """

    # selected: start datetime -> {location: [rows at that location]}
    selected = {}
    for line in data.split("\n"):
        line = line.strip()
        if not line or line.startswith("SessionDate"):
            continue  # skip blank lines and the header

        tmp = line.split()
        curr_dt = datetime.strptime(" ".join([tmp[0], tmp[1]]), "%d-%b-%y %H:%M")
        loc = (tmp[-2], tmp[-1])  # (Easting, Northing)

        found = False
        for dt in selected:
            # absolute gap between this row's start and an already selected start
            if abs(curr_dt - dt) <= timedelta(hours=12):
                # same 12-hour window: keep the row only for a new location
                if loc not in selected[dt]:
                    selected[dt].setdefault(loc, []).append(tmp)
                found = True
        if not found:
            # no selected start within 12 hours: open a new window
            selected.setdefault(curr_dt, {})[loc] = [tmp]

    # if the output needs to be sorted
    rows = sorted(x for k in selected for l in selected[k] for x in selected[k][l])
    for row in rows:
        print(" ".join(row))
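The code above never calls `random`, so it keeps rows in file order rather than drawing them at random. A randomized variant of the rules in the question might look like the sketch below; it is an illustration under assumptions, not the answerer's method. It shuffles the rows once, then accepts each row only if its 12-hour period and its location are still unused and its start is more than 6 hours from every accepted start. Because these constraints only tighten as rows are accepted, a row rejected once could never be accepted later, so a single pass over the shuffled rows is equivalent to "toss and redraw until nothing is left". The name `sample_locations`, the `seed` parameter, and the column order are hypothetical.

```python
import random
from datetime import datetime, timedelta


def sample_locations(rows, seed=None):
    """Randomly pick at most one row per 12-hour period, without replacement.

    Assumptions: rows are [SessionDate, Start:, End:, Easting, Northing]
    string lists; each location may be picked once overall; a candidate is
    rejected if its start is within 6 hours (<=) of an already picked start.
    """
    rng = random.Random(seed)
    pool = list(rows)
    rng.shuffle(pool)  # random draw order; each row is considered once

    picked, picked_times = [], []
    used_periods, used_locs = set(), set()
    for row in pool:
        start = datetime.strptime(f"{row[0]} {row[1]}", "%d-%b-%y %H:%M")
        # 12-hour period: day = 06:31-18:29; night = 18:30-06:30, with
        # early-morning starts credited to the previous date's night
        minutes = start.hour * 60 + start.minute
        if 6 * 60 + 31 <= minutes < 18 * 60 + 30:
            period = (start.date(), "day")
        elif minutes >= 18 * 60 + 30:
            period = (start.date(), "night")
        else:
            period = (start.date() - timedelta(days=1), "night")
        loc = (row[3], row[4])  # (Easting, Northing)
        if period in used_periods or loc in used_locs:
            continue
        if any(abs(start - t) <= timedelta(hours=6) for t in picked_times):
            continue  # too close in time to an earlier pick: toss it
        picked.append(row)
        picked_times.append(start)
        used_periods.add(period)
        used_locs.add(loc)
    return picked
```

Passing a fixed `seed` makes a run reproducible; omit it for a fresh random draw each time.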
