python - Select rows in one DataFrame based on rows in another
Let's assume I have a large pandas DataFrame dfbig with columns param1, param2, ..., paramn, score, step, and a smaller DataFrame dfsmall with columns param1, param2, ..., paramn (i.e. missing the score and step columns).
I want to select all the rows of dfbig for which the values of columns param1, param2, ..., paramn match those of some row in dfsmall. Is there a clean way of doing this in pandas?
Edit: To give an example, consider this DataFrame dfbig:
arch | layers | score | time
a    | 1      | 0.3   | 10
a    | 1      | 0.6   | 20
a    | 1      | 0.7   | 30
a    | 2      | 0.4   | 10
a    | 2      | 0.5   | 20
a    | 2      | 0.6   | 30
b    | 1      | 0.1   | 10
b    | 1      | 0.2   | 20
b    | 1      | 0.7   | 30
b    | 2      | 0.7   | 10
b    | 2      | 0.8   | 20
b    | 2      | 0.8   | 30
Let's imagine a model is specified by a pair (arch, layers). I want to query dfbig and get the time series of scores over time for the best-performing model of arch a and of arch b.
Following EdChum's answer below, I take it that the best solution is to do this procedurally:
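    import pandas as pd

    # Columns that identify a model (everything except the measurements)
    modelcolumns = [col for col in dfbig.columns if col not in ["time", "score"]]
    # idxmax keeps the row label of each arch's best score, so layers can be
    # looked up; score.max() alone would return only arch and score
    bestidx = dfbig.groupby("arch")["score"].idxmax()
    dfsmall = dfbig.loc[bestidx, modelcolumns]
    dfbest = pd.merge(dfsmall, dfbig)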
Which yields:
arch | layers | score | time
a    | 1      | 0.3   | 10
a    | 1      | 0.6   | 20
a    | 1      | 0.7   | 30
b    | 2      | 0.7   | 10
b    | 2      | 0.8   | 20
b    | 2      | 0.8   | 30
If there's a better way to do this, I'm happy to hear it.
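For anyone who wants to run the example, here is a self-contained sketch that builds the dfbig table shown above and applies the procedure. One assumption is labeled in the code: the best row per arch is located with idxmax, because a plain groupby("arch").score.max() returns only arch and score and so cannot recover layers.

```python
import pandas as pd

# The example table from the question
dfbig = pd.DataFrame({
    "arch": ["a"] * 6 + ["b"] * 6,
    "layers": [1, 1, 1, 2, 2, 2] * 2,
    "score": [0.3, 0.6, 0.7, 0.4, 0.5, 0.6, 0.1, 0.2, 0.7, 0.7, 0.8, 0.8],
    "time": [10, 20, 30] * 4,
})

# Columns that identify a model: everything except the measurements
modelcolumns = [col for col in dfbig.columns if col not in ["time", "score"]]

# Assumption: "best model" means the (arch, layers) pair that reaches the
# highest score within each arch; idxmax returns the row label of that score.
bestidx = dfbig.groupby("arch")["score"].idxmax()
dfsmall = dfbig.loc[bestidx, modelcolumns]

# merge joins on the columns the two frames share (arch and layers here)
dfbest = pd.merge(dfsmall, dfbig)
print(dfbest)
```

With this data, dfsmall ends up holding the pairs (a, 1) and (b, 2), and dfbest holds the six rows of dfbig that belong to those two models.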
If I understand your question correctly, you should be able to just call merge on dfbig and pass dfsmall. This looks for matches in the columns the two DataFrames have in common and returns only the matching rows.
Example:
    In [71]:
    dfbig = pd.DataFrame({'a':np.arange(100), 'b':np.arange(100), 'c':np.arange(100)})
    dfsmall = pd.DataFrame({'a':[3,4,5,6]})
    dfbig.merge(dfsmall)

    Out[71]:
       a  b  c
    0  3  3  3
    1  4  4  4
    2  5  5  5
    3  6  6  6
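The same idea carries over to several key columns. The sketch below uses made-up data with the param1/param2 naming from the question (the values and the `keys` list are illustrative assumptions). It also shows one thing to watch for: if dfsmall contains repeated rows, a plain merge returns each matching dfbig row once per repeat, so dropping duplicates first may be what you want.

```python
import pandas as pd

# Hypothetical data, only for illustration
dfbig = pd.DataFrame({
    "param1": [1, 1, 2, 2, 3],
    "param2": ["x", "y", "x", "y", "x"],
    "score": [0.1, 0.2, 0.3, 0.4, 0.5],
    "step": [10, 20, 10, 20, 10],
})
dfsmall = pd.DataFrame({
    "param1": [1, 2, 2],
    "param2": ["y", "x", "x"],  # (2, "x") is listed twice on purpose
})

keys = ["param1", "param2"]

# Plain inner merge: the repeated (2, "x") row matches twice
naive = dfbig.merge(dfsmall, on=keys, how="inner")

# De-duplicating the key rows first keeps each dfbig row at most once
selected = dfbig.merge(dfsmall[keys].drop_duplicates(), on=keys, how="inner")

print(len(naive), len(selected))
print(selected)
```

Passing `on=keys` explicitly is optional here, since merge defaults to the shared column names, but it makes the matching columns visible to the reader. Note that merge returns a new DataFrame with a fresh index rather than a view into dfbig.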