python - Extracting time series for best performing agents in a pandas DataFrame

I apologise if this question is basic. I've gone through the documentation and am having trouble figuring out a simple, canonical way to deal with this problem, although that may be because I am new to pandas.

I have a DataFrame df representing time series data (where time is measured in steps) of the scores of a series of systems, parametrised by the hyperparameters parama and paramb.

A synthetic sample of the data follows:

    | parama | paramb | score | step |
    ----------------------------------
    |   a    |    c   |   .8  |  10  |
    |   b    |    e   |   .2  |  10  |
    |   a    |    f   |   .1  |  40  |
    |   c    |    c   |   .9  |  10  |
    |   b    |    e   |   .3  |  20  |
    |   b    |    c   |   .3  |  10  |
    |   a    |    c   |   .7  |  20  |
    |   c    |    f   |   .4  |  60  |
    |  ...   |   ...  |  ...  |  ... |

I want to do the following things:

1. Find the top scoring models

For each model (i.e. each possible pair of hyperparameters in the dataset), find the row with the top score. I am doing this with:

    df.groupby([df.parama, df.paramb]).score.max()

2. Extract the time series of the top scorers

I want the time series of the top k models, as obtained from the maxima in the previous step.

Obviously I can manually pick the top scoring models from the previous step and then build k DataFrames using k separate queries, but this feels slow and inelegant. I'm assuming there's a smarter way of doing this.

Again, apologies if this is simple. Any assistance in solving this problem in a clever way, rather than the brute-force way I have in mind, would be appreciated.

To answer question one, first create a unique set of models, initialize an empty dictionary keyed by the parameter pair of each model, and then fill the dictionary with the index of the row holding the maximum score:

    import pandas as pd

    # Unique (parama, paramb) pairs, one per model
    pairs = pd.Series(list(zip(df.parama, df.paramb))).unique()
    models = {(a, b): None for a, b in pairs}
    for a, b in pairs:
        # Row label of the highest score for this model
        models[(a, b)] = df.loc[(df.parama == a) & (df.paramb == b), 'score'].idxmax()

    >>> models
    {('a', 'c'): 0,
     ('a', 'f'): 2,
     ('b', 'c'): 5,
     ('b', 'e'): 4,
     ('c', 'c'): 3,
     ('c', 'f'): 7}
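The loop above can probably be replaced by a single groupby. The following is only a sketch, assuming the eight sample rows from the question make up the whole DataFrame; the names best_idx and best_rows are made up for illustration.

```python
import pandas as pd

# Sample rows from the question (assumed here to be the complete data)
df = pd.DataFrame({
    "parama": ["a", "b", "a", "c", "b", "b", "a", "c"],
    "paramb": ["c", "e", "f", "c", "e", "c", "c", "f"],
    "score":  [0.8, 0.2, 0.1, 0.9, 0.3, 0.3, 0.7, 0.4],
    "step":   [10, 10, 40, 10, 20, 10, 20, 60],
})

# Row label of the best score for every (parama, paramb) pair
best_idx = df.groupby(["parama", "paramb"])["score"].idxmax()

# Same mapping as the loop-built dictionary: {(parama, paramb): row label}
models = best_idx.to_dict()

# Full rows (including the step) at which each model reaches its top score
best_rows = df.loc[best_idx]
```

Here best_rows keeps the step column too, which the plain score.max() from the question drops.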

To get the top k models with the highest scores:

    k = 5  # number of top models to return
    # Pair each model with the score at its best row, then sort by that score
    m = [(df.loc[idx, 'score'], pair) for pair, idx in models.items()]
    m.sort(reverse=True)
    top_models = [pair for _, pair in m[:k]]

    >>> top_models
    [('c', 'c'), ('a', 'c'), ('c', 'f'), ('b', 'e'), ('b', 'c')]
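For question two (the time series of the top k models), one possible approach is to filter the original DataFrame by the top pairs in a single step instead of running k separate queries. This is a sketch under assumptions: the eight sample rows are treated as the full data, k = 2 is an arbitrary choice, and top_pairs, top_series and series_by_model are illustrative names.

```python
import pandas as pd

# Sample rows from the question (assumed here to be the complete data)
df = pd.DataFrame({
    "parama": ["a", "b", "a", "c", "b", "b", "a", "c"],
    "paramb": ["c", "e", "f", "c", "e", "c", "c", "f"],
    "score":  [0.8, 0.2, 0.1, 0.9, 0.3, 0.3, 0.7, 0.4],
    "step":   [10, 10, 40, 10, 20, 10, 20, 60],
})

k = 2  # arbitrary number of top models to keep

# Best score per model, then the k pairs with the largest best score
best = df.groupby(["parama", "paramb"])["score"].max()
top_pairs = list(best.nlargest(k).index)  # list of (parama, paramb) tuples

# Boolean mask: does each row belong to one of the top k models?
row_pairs = pd.MultiIndex.from_frame(df[["parama", "paramb"]])
mask = row_pairs.isin(top_pairs)

# All rows of the top k models, ordered as time series
top_series = df[mask].sort_values(["parama", "paramb", "step"])

# Optional: one score-vs-step Series per top model
series_by_model = {
    pair: group.set_index("step")["score"]
    for pair, group in top_series.groupby(["parama", "paramb"])
}
```

With this, top_series holds every step of the selected models in one DataFrame, and series_by_model[('a', 'c')] gives the individual series for one of them.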
