python - Select rows in one DataFrame based on rows in another
Let's assume I have a large pandas DataFrame dfbig with columns param1, param2, ..., paramn, score, step, and a smaller DataFrame dfsmall with columns param1, param2, ..., paramn (i.e. missing the score and step columns).
I want to select all the rows of dfbig for which the values of columns param1, param2, ..., paramn match those of some row in dfsmall. Is there a clean way of doing this in pandas?
Edit: To give an example, consider this DataFrame dfbig:
arch | layers | score | time
a    | 1      | 0.3   | 10
a    | 1      | 0.6   | 20
a    | 1      | 0.7   | 30
a    | 2      | 0.4   | 10
a    | 2      | 0.5   | 20
a    | 2      | 0.6   | 30
b    | 1      | 0.1   | 10
b    | 1      | 0.2   | 20
b    | 1      | 0.7   | 30
b    | 2      | 0.7   | 10
b    | 2      | 0.8   | 20
b    | 2      | 0.8   | 30

Let's imagine that a model is specified by a pair (arch, layers). I want to query dfbig and get the time series of scores over time for the best-performing models for arch a and arch b.
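Following EdChum's answer below, I take it that the best solution is to do this procedurally:

    modelcolumns = [col for col in dfbig.columns if col not in ["time", "score"]]
    # Best score per arch
    groupedbest = dfbig.groupby("arch").score.max()
    # Merge back onto dfbig to recover the layers value of each best model
    dfsmall = pd.merge(pd.DataFrame(groupedbest).reset_index(), dfbig)[modelcolumns].drop_duplicates()
    dfbest = pd.merge(dfsmall, dfbig)

which yields: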
arch | layers | score | time
a    | 1      | 0.3   | 10
a    | 1      | 0.6   | 20
a    | 1      | 0.7   | 30
b    | 2      | 0.7   | 10
b    | 2      | 0.8   | 20
b    | 2      | 0.8   | 30

If there's a better way to do this, I'm happy to hear it.
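One possible simplification, sketched on the example data: `idxmax` returns the index label of the top-scoring row within each arch, so the model columns can be read straight off those rows. The construction of dfbig and the name bestidx are my own, not part of the question. If two models in the same arch tie on the top score, `idxmax` keeps only the first.

```python
import pandas as pd

# Example data from the question, rebuilt by hand (an assumption about its layout)
dfbig = pd.DataFrame({
    "arch":   ["a"] * 6 + ["b"] * 6,
    "layers": [1, 1, 1, 2, 2, 2] * 2,
    "score":  [0.3, 0.6, 0.7, 0.4, 0.5, 0.6, 0.1, 0.2, 0.7, 0.7, 0.8, 0.8],
    "time":   [10, 20, 30] * 4,
})

modelcolumns = [col for col in dfbig.columns if col not in ["time", "score"]]

# Index label of the top-scoring row within each arch (first row wins on ties)
bestidx = dfbig.groupby("arch")["score"].idxmax()

# The model columns of those rows identify the best model per arch
dfsmall = dfbig.loc[bestidx, modelcolumns].reset_index(drop=True)

# Inner merge on the shared columns keeps every time step of those models
dfbest = pd.merge(dfsmall, dfbig)
print(dfbest)
```

The final merge is the same step as in the procedure above; only the way dfsmall is built differs.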
If I understand your question correctly, you should be able to just call merge on dfbig and pass dfsmall. It will look for matches in any aligned columns and return only those rows.
Example:
In [71]:
import numpy as np
import pandas as pd

dfbig = pd.DataFrame({'a':np.arange(100), 'b':np.arange(100), 'c':np.arange(100)})
dfsmall = pd.DataFrame({'a':[3,4,5,6]})
dfbig.merge(dfsmall)

Out[71]:
   a  b  c
0  3  3  3
1  4  4  4
2  5  5  5
3  6  6  6
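Because merge with no arguments joins on every column name the two frames share, it can help to spell the keys out and to drop repeated key rows from dfsmall first. Otherwise a row that appears twice in dfsmall returns its matches from dfbig twice. A minimal sketch, where the helper name select_matching and the toy frames are hypothetical:

```python
import pandas as pd

def select_matching(big, small):
    """Return the rows of big whose shared-column values appear in small."""
    # Join keys: every column name the two frames have in common
    keys = [c for c in big.columns if c in small.columns]
    # Drop repeated key rows so each row of big is returned at most once
    lookup = small[keys].drop_duplicates()
    return big.merge(lookup, on=keys, how="inner")

dfbig = pd.DataFrame({"a": range(10), "b": range(10), "c": range(10)})
dfsmall = pd.DataFrame({"a": [3, 4, 4, 6]})  # note the repeated 4
result = select_matching(dfbig, dfsmall)
print(result)
```

Note that the result gets a fresh 0..n-1 index rather than keeping the index of dfbig.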