data.table - R Data Aggregation With WHERE Clause on Group
As an example, I have the data.table shown below. I want a simple aggregation, b=sum(b). For c, I want the value of c from the record where b is at its maximum. The desired output is shown below (dt.aggr). This leads to a few questions:
1) Is there a way to do this in data.table?
2) Is there a simpler way to do it in plyr?
3) In plyr, the output object got changed from a data.table to a data.frame. Can I avoid this behavior?
library(plyr)
library(data.table)
dt <- data.table(a=c('a', 'a', 'a', 'b', 'b'), b=c(1, 2, 3, 4, 5), c=c('m', 'n', 'p', 'q', 'r'))
dt
#    a b c
# 1: a 1 m
# 2: a 2 n
# 3: a 3 p
# 4: b 4 q
# 5: b 5 r
dt.split <- split(dt, dt$a)
# aggregate each piece, then let ldply() stack the pieces
dt.aggr <- ldply(lapply(dt.split, FUN=function(dt){
  dt[, .(b=sum(b), c=dt[b==max(b), c]), by=.(a)]
}), .id='a')
dt.aggr
#   a b c
# 1 a 6 p
# 2 b 9 r
class(dt.aggr)
# [1] "data.frame"
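For question 3, one option (a suggestion, not part of the original answer) is to swap plyr's ldply() for data.table's own rbindlist(), which stacks a list of data.tables into a single data.table. The sketch below also assumes a data.table version recent enough to support split()'s by= argument; the function argument name `d` is just for illustration.

```r
library(data.table)

dt <- data.table(a = c('a', 'a', 'a', 'b', 'b'),
                 b = c(1, 2, 3, 4, 5),
                 c = c('m', 'n', 'p', 'q', 'r'))

# Split into a list of data.tables, one per value of 'a'
dt.split <- split(dt, by = "a")

# Aggregate each piece (same "c where b == max(b)" logic as the question),
# then stack the pieces with rbindlist(), which returns a data.table
dt.aggr <- rbindlist(lapply(dt.split, function(d) {
  d[, .(b = sum(b), c = c[b == max(b)]), by = a]
}))

print(dt.aggr)
class(dt.aggr)
```

If the ldply() result is already in hand, setDT(dt.aggr) converts that data.frame to a data.table by reference instead.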
This is a simple operation within the scope of data.table:
dt[, .(b = sum(b), c = c[which.max(b)]), by = a]
#    a b c
# 1: a 6 p
# 2: b 9 r
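A small self-contained sketch can confirm that the grouped call reproduces the desired output from the question (the name `res` is only for illustration). Inside `.( )`, the `b` in `which.max(b)` still refers to each group's original b column, not to the freshly computed sum, because `.( )` is just `list( )` evaluated in the group's scope. Since everything happens inside `dt[...]`, the result also stays a data.table.

```r
library(data.table)

dt <- data.table(a = c('a', 'a', 'a', 'b', 'b'),
                 b = c(1, 2, 3, 4, 5),
                 c = c('m', 'n', 'p', 'q', 'r'))

# Per group of 'a': total of b, plus the c value from the row where b is largest.
# 'b' inside which.max(b) is the group's original column, not the new sum.
res <- dt[, .(b = sum(b), c = c[which.max(b)]), by = a]
print(res)

# Compare with the desired output: a = a/b, b = 6/9, c = p/r
stopifnot(identical(res$b, c(6, 9)), identical(res$c, c("p", "r")))
class(res)
```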
A similar option would be
dt[order(b), .(b = sum(b), c = c[.N]), by = a]