python - group-by + case when Equivalent -
want select:
select user_id, max(case when value > 0 timestamp else 0 end) max_timestamp_when_value_is_positive df group user_id what right way aggregate?
groupped = raw_data.groupby('user_id') res = groupped.agg({<how-to-do-described-aggregation?>}) update explanation , example.
in [2]: df = pd.dataframe({'user_id': [1, 1, 1, 2, 2, 3, 3, 3, 3], 'timestamp': [100, 200, 300, 10, 110, 10, 110, 210, 250], 'value': [0, 1, 0, 0, 0, 0, 10, 0, 1]}) in [3]: groupped = df.groupby('user_id') in [4]: res = groupped.agg({'timestamp': [min, max], 'value': lambda x: sum(x > 0), <described-magic>}) in [5]: res out[5]: timestamp value <...magic...> min max <lambda> user_id 1 100 300 1 200 2 10 110 0 0 3 10 250 2 210 magic want.
create new column positive_value_timestamp as
df['positive_value_timestamp'] = df.timestamp * df.value.apply(lambda x: 1 if x > 0 else 0) when grouping, take max of column
res = df.groupby('user_id').agg( { 'timestamp': [min, max], 'value': sum, 'positive_value_timestamp': max })
Comments
Post a Comment