python - Extracting time series for best performing agents in a pandas DataFrame
I apologise if this question is basic. I've gone through the documentation and am having trouble figuring out a simple, canonical way to deal with my problem, although that is probably because I am new to pandas.
I have a DataFrame df representing time series data (where time is measured in steps) of the scores of a series of systems, parametrised by the hyperparameters parama and paramb.
A synthetic sample of the data follows:
| parama | paramb | score | step |
----------------------------------
| a      | c      | .8    | 10   |
| b      | e      | .2    | 10   |
| a      | f      | .1    | 40   |
| c      | c      | .9    | 10   |
| b      | e      | .3    | 20   |
| b      | c      | .3    | 10   |
| a      | c      | .7    | 20   |
| c      | f      | .4    | 60   |
| ...    | ...    | ...   | ...  |
I want to do the following things:

1. Find the top-scoring models

For each model (i.e. each possible pair of hyperparameters in the dataset), find the row with the top score. I am doing this with:
df.groupby([df.parama, df.paramb]).score.max()
2. Extract the time series for the top scorers

I want the time series for the top k models, ranked by the maxima obtained in the previous step.
Obviously I can manually pick the top-scoring models from the previous step and then build k DataFrames using k separate queries, but this feels slow and inelegant. I'm assuming there's a smarter way of doing this.
Again, apologies if this is simple. Any assistance in solving this problem in a clever way, rather than the brute-force way I have in mind, would be appreciated.
To answer question one, first create a unique set of models, initialize an empty dictionary keyed by the parameter pair of each model, and then fill the dictionary with the row index of each model's maximum score:
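import pandas as pd

# Unique (parama, paramb) pairs; list() keeps this working where zip is lazy
pairs = pd.Series(list(zip(df.parama, df.paramb))).unique()
models = {(a, b): None for a, b in pairs}
for a, b in pairs:
    # Row index of the best score for this pair of hyperparameters
    models[(a, b)] = df.loc[((df.parama == a) & (df.paramb == b)), 'score'].idxmax()

>>> models
{('a', 'c'): 0, ('a', 'f'): 2, ('b', 'c'): 5, ('b', 'e'): 4, ('c', 'c'): 3, ('c', 'f'): 7}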
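The loop above can probably be collapsed into a single groupby call. The following is a minimal sketch rather than a definitive implementation: it rebuilds the sample rows from the question so that it runs on its own, and the names best_idx and best_rows are introduced here purely for illustration.

```python
import pandas as pd

# Sample rows copied from the table in the question.
df = pd.DataFrame({
    "parama": ["a", "b", "a", "c", "b", "b", "a", "c"],
    "paramb": ["c", "e", "f", "c", "e", "c", "c", "f"],
    "score": [0.8, 0.2, 0.1, 0.9, 0.3, 0.3, 0.7, 0.4],
    "step": [10, 10, 40, 10, 20, 10, 20, 60],
})

# Row label of the best score for each (parama, paramb) pair.
best_idx = df.groupby(["parama", "paramb"])["score"].idxmax()

# Same mapping as the dictionary filled by the explicit loop.
models = best_idx.to_dict()

# The full rows (including step) that hold each model's best score.
best_rows = df.loc[best_idx]
```

Since idxmax returns row labels, df.loc[best_idx] recovers the whole row for each model, which the plain score.max() call from the question does not.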
To get the top k models with the highest scores:
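k = 5  # number of top models to return
# models maps each pair to the row index of its best score,
# so look the score up before ranking (otherwise the sort is by row index)
m = [(df.loc[idx, 'score'], pair) for pair, idx in models.items()]
m.sort(reverse=True)
top_models = [pair for _, pair in m[:k]]

>>> top_models
[('c', 'c'), ('a', 'c'), ('c', 'f'), ('b', 'e'), ('b', 'c')]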
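For question two, one way to pull the time series of the top k models in a single pass, rather than with k separate queries, might look like the sketch below. It is only an illustration under stated assumptions: the sample rows are copied from the question, k = 3 is an arbitrary choice, and row_keys and top_series are names made up for this example.

```python
import pandas as pd

# Sample rows copied from the table in the question.
df = pd.DataFrame({
    "parama": ["a", "b", "a", "c", "b", "b", "a", "c"],
    "paramb": ["c", "e", "f", "c", "e", "c", "c", "f"],
    "score": [0.8, 0.2, 0.1, 0.9, 0.3, 0.3, 0.7, 0.4],
    "step": [10, 10, 40, 10, 20, 10, 20, 60],
})

k = 3  # arbitrary choice for this illustration

# Best score per model, then the k models with the largest best score.
best = df.groupby(["parama", "paramb"])["score"].max()
top_models = best.nlargest(k).index.tolist()  # list of (parama, paramb) tuples

# Label every row with its (parama, paramb) pair and keep the rows
# whose pair is among the top k models.
row_keys = pd.MultiIndex.from_frame(df[["parama", "paramb"]])
top_series = df[row_keys.isin(top_models)].sort_values(["parama", "paramb", "step"])
```

The result is one DataFrame holding every step of the selected models. Grouping top_series by parama and paramb again would give one time series per model if separate objects are needed.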