python - How to sort by two columns and print the line with the highest value
I have an input file:
1a traes_1as_6052071d9.1 99.01 101 99.0
1a traes_1ds_6ba87d1da.1 96.04 101 99.0
1a traes_1bs_480915ad0.1 94.06 101 99.0
1b traes_1as_49d585ba6.2 99.01 101 72.0
1b traes_1bs_47f027bbe.2 98.02 101 89.0
1b traes_1ds_3f816b920.1 97.03 101 92.0
1c traes_1as_3451447e0.1 99.01 101 97.0
1c traes_1bs_9f243cea6.2 92.93 99 97.0
1c traes_1ds_2a6443f45.1 89.90 99 97.0
I need to:
- group the rows by line[0] and iterate inside each group,
- sort by line[4] from lowest to highest and take the row with the highest value,
- if several rows have the same line[4], choose the one that has the highest value in line[2].
Required output:
1a traes_1as_6052071d9.1 99.01 101 99.0
1b traes_1ds_3f816b920.1 97.03 101 92.0
1c traes_1as_3451447e0.1 99.01 101 97.0
This is my attempt, which only goes by the highest line[4]:
import csv
from itertools import groupby
from operator import itemgetter

with open('my_file', 'rb') as f1:
    with open('out_file', 'wb') as f2:
        reader = csv.reader(f1, delimiter='\t')
        writer1 = csv.writer(f2, delimiter='\t')
        for group, rows in groupby(reader, itemgetter(0)):
            seen = set()
            rows = sorted(rows, key=lambda r: float(r[4]))
            for row in rows:
                max(rows, key=lambda r: float(r[4]))
                writer1.writerow(row)
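Just have the key function for max return a tuple of (r[4], r[2]). Tuples compare element by element, so line[2] is only consulted when line[4] is tied.
A slightly simplified example (without the output file):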
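import csv
from itertools import groupby
from operator import itemgetter

with open('data.txt', 'rb') as f1:
    reader = csv.reader(f1, delimiter='\t')
    for group, rows in groupby(reader, itemgetter(0)):
        # compare on column 4 first, then break ties on column 2
        best = max(rows, key=lambda r: (float(r[4]), float(r[2])))
        print best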
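In case it helps, here is a self-contained sketch of the same idea written for Python 3. It is an assumption-laden variant, not the code from the question: the sample rows are held in an in-memory string (SAMPLE) instead of a file, and best_rows is a hypothetical helper name. It also relies on the input already being ordered by the first column, since groupby only merges adjacent rows.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Rows copied from the question, tab-separated; this string stands in for the input file.
SAMPLE = (
    "1a\ttraes_1as_6052071d9.1\t99.01\t101\t99.0\n"
    "1a\ttraes_1ds_6ba87d1da.1\t96.04\t101\t99.0\n"
    "1a\ttraes_1bs_480915ad0.1\t94.06\t101\t99.0\n"
    "1b\ttraes_1as_49d585ba6.2\t99.01\t101\t72.0\n"
    "1b\ttraes_1bs_47f027bbe.2\t98.02\t101\t89.0\n"
    "1b\ttraes_1ds_3f816b920.1\t97.03\t101\t92.0\n"
    "1c\ttraes_1as_3451447e0.1\t99.01\t101\t97.0\n"
    "1c\ttraes_1bs_9f243cea6.2\t92.93\t99\t97.0\n"
    "1c\ttraes_1ds_2a6443f45.1\t89.90\t99\t97.0\n"
)


def best_rows(lines):
    """Yield one row per group: highest column 4, ties broken by column 2."""
    reader = csv.reader(lines, delimiter="\t")
    # groupby merges only adjacent rows that share the same key (column 0).
    for _, rows in groupby(reader, key=itemgetter(0)):
        # Tuples compare left to right, so column 2 matters only on a tie.
        yield max(rows, key=lambda r: (float(r[4]), float(r[2])))


result = list(best_rows(io.StringIO(SAMPLE)))
for row in result:
    print("\t".join(row))
```

To read from a real file under Python 3, open it in text mode with newline='' rather than 'rb' and pass the file object to best_rows in place of the StringIO.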