python - How to sort by two lines and print the line with the highest value


I have an input file:

    1a  traes_1as_6052071d9.1   99.01   101 99.0
    1a  traes_1ds_6ba87d1da.1   96.04   101 99.0
    1a  traes_1bs_480915ad0.1   94.06   101 99.0
    1b  traes_1as_49d585ba6.2   99.01   101 72.0
    1b  traes_1bs_47f027bbe.2   98.02   101 89.0
    1b  traes_1ds_3f816b920.1   97.03   101 92.0
    1c  traes_1as_3451447e0.1   99.01   101 97.0
    1c  traes_1bs_9f243cea6.2   92.93   99  97.0
    1c  traes_1ds_2a6443f45.1   89.90   99  97.0

I need to:

  1. group the lines by line[0] and iterate within each group,
  2. sort each group by line[4] from lowest to highest and take the line with the highest value,
  3. if several lines tie on line[4], choose the one with the highest value in line[2].

Required output:

    1a  traes_1as_6052071d9.1   99.01   101 99.0
    1b  traes_1ds_3f816b920.1   97.03   101 92.0
    1c  traes_1as_3451447e0.1   99.01   101 97.0

This is my attempt, which only goes by the highest line[4]:

    import csv
    from itertools import groupby
    from operator import itemgetter

    with open('my_file', 'rb') as f1:
        with open('out_file', 'wb') as f2:
            reader = csv.reader(f1, delimiter='\t')
            writer1 = csv.writer(f2, delimiter='\t')
            for group, rows in groupby(reader, itemgetter(0)):
                seen = set()
                rows = sorted(rows, key=lambda r: float(r[4]))
                for row in rows:
                    max(rows, key=lambda r: float(r[4]))
                    writer1.writerow(row)

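Just have the key function passed to max return a tuple of (r[4], r[2]). Tuples compare element by element, so r[2] is only consulted when two lines tie on r[4].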
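To see the tie-breaking in isolation, here is a minimal sketch using the three `1a` rows from the question, already split into fields. The names `rows` and `best` are only for illustration.

```python
# The three "1a" rows from the question, as lists of string fields.
rows = [
    ['1a', 'traes_1as_6052071d9.1', '99.01', '101', '99.0'],
    ['1a', 'traes_1ds_6ba87d1da.1', '96.04', '101', '99.0'],
    ['1a', 'traes_1bs_480915ad0.1', '94.06', '101', '99.0'],
]

# All three rows tie on r[4] (99.0), so the comparison falls through
# to the second tuple element, r[2].
best = max(rows, key=lambda r: (float(r[4]), float(r[2])))
print(best[1], best[2])  # prints: traes_1as_6052071d9.1 99.01
```

The fields are converted with float() so that they compare numerically rather than as strings.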

A slightly simplified example (without the output file):

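    import csv
    from itertools import groupby
    from operator import itemgetter

    with open('data.txt', 'rb') as f1:
        reader = csv.reader(f1, delimiter='\t')
        for group, rows in groupby(reader, itemgetter(0)):
            # highest r[4] wins; r[2] breaks ties
            best = max(rows, key=lambda r: (float(r[4]), float(r[2])))
            print best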
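The snippets in this post are written for Python 2 (print statement, binary-mode files). As a rough sketch of the same idea in Python 3 that also writes the result out, something like the following should work. The helper name `best_per_group` and the in-memory `sample` and `out` buffers are assumptions made for the demonstration, standing in for the real input and output files, and the input is assumed to be tab-delimited as in the question's code.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def best_per_group(lines, delimiter='\t'):
    """For each run of rows sharing column 0, yield the row with the
    highest column 4, using column 2 to break ties."""
    reader = csv.reader(lines, delimiter=delimiter)
    # groupby only merges adjacent rows, so the input must already be
    # grouped by column 0 (as it is in the question's file).
    for _, rows in groupby(reader, key=itemgetter(0)):
        yield max(rows, key=lambda r: (float(r[4]), float(r[2])))


# Hypothetical in-memory stand-in for the tab-delimited input file.
sample = io.StringIO(
    "1a\ttraes_1as_6052071d9.1\t99.01\t101\t99.0\n"
    "1a\ttraes_1ds_6ba87d1da.1\t96.04\t101\t99.0\n"
    "1a\ttraes_1bs_480915ad0.1\t94.06\t101\t99.0\n"
    "1b\ttraes_1as_49d585ba6.2\t99.01\t101\t72.0\n"
    "1b\ttraes_1bs_47f027bbe.2\t98.02\t101\t89.0\n"
    "1b\ttraes_1ds_3f816b920.1\t97.03\t101\t92.0\n"
    "1c\ttraes_1as_3451447e0.1\t99.01\t101\t97.0\n"
    "1c\ttraes_1bs_9f243cea6.2\t92.93\t99\t97.0\n"
    "1c\ttraes_1ds_2a6443f45.1\t89.90\t99\t97.0\n"
)

# Stand-in for the output file; one winning row per group is written.
out = io.StringIO()
writer = csv.writer(out, delimiter='\t', lineterminator='\n')
writer.writerows(best_per_group(sample))
print(out.getvalue(), end='')
```

With real files in Python 3, the csv module expects text-mode files opened with `newline=''` rather than `'rb'` and `'wb'`. If the input were not already grouped by the first column, it would need to be sorted on that column before being passed to groupby.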
