R: How to calculate count by group, then keep only one per group
Say I have a data.frame, data:
data <- data.frame(val = c(rep(6, 10), rep(7, 15), rep(8, 20), rep(9, 25),
                           rep(10, 100), rep(11, 20), rep(12, 15), rep(13, 10)))
data$plus <- data$val + 100
My goal is to create a new data.frame that has the frequency of each val, along with the associated plus value.
My current strategy is to create a table (called table), merge the frequencies back in, and then keep the first observation within each group:
table <- table(data$val)
df1 <- data.frame(val = as.integer(names(table)[1:length(table)]), n = table[1:length(table)])
df2 <- merge(data, df1)
df3 <- do.call(rbind, by(df2, list(df2$val), FUN = function(x) head(x, 1)))
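For comparison, here is a sketch of the same table-then-merge idea in base R that replaces the do.call(rbind, by(...)) step with duplicated(). It assumes the example data from the question; the intermediate names freq, merged and result are only illustrative.

```r
# Example data from the question
data <- data.frame(val = c(rep(6, 10), rep(7, 15), rep(8, 20), rep(9, 25),
                           rep(10, 100), rep(11, 20), rep(12, 15), rep(13, 10)))
data$plus <- data$val + 100

# Frequency table as a data.frame; responseName names the count column "n"
freq <- as.data.frame(table(val = data$val), responseName = "n")
# table() stores the values as factor levels, so convert them back to numbers
freq$val <- as.numeric(as.character(freq$val))

# Attach the counts to every row, then keep the first row for each val
merged <- merge(data, freq, by = "val")
result <- merged[!duplicated(merged$val), ]
rownames(result) <- NULL
result
```

Because merge() sorts by the key by default, the rows come out in increasing order of val.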
This works, but it seems clunky.
In Stata, for example, this takes less and simpler code, something like:
bys val plus: egen max = _n
bys val plus: gen first = _n==1
keep if first==1
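A rough base R analogue of those Stata steps could use ave() for the per-group count (similar to Stata's _N under by) and duplicated() for the "first observation in each group" flag. This is a sketch assuming the same example data; the columns n and first are illustrative names.

```r
# Example data from the question
data <- data.frame(val = c(rep(6, 10), rep(7, 15), rep(8, 20), rep(9, 25),
                           rep(10, 100), rep(11, 20), rep(12, 15), rep(13, 10)))
data$plus <- data$val + 100

# Count rows per (val, plus) group and repeat that count on every row,
# roughly what `by val plus: gen n = _N` does in Stata
data$n <- ave(data$val, data$val, data$plus, FUN = length)

# Flag the first row of each (val, plus) group, like `gen first = _n == 1`
data$first <- !duplicated(data[c("val", "plus")])

# Keep only the flagged rows, like `keep if first == 1`
result <- data[data$first, c("val", "plus", "n")]
result
```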
Is there a way to simplify this or make the R code more elegant?
Here's an approach using "data.table":
library(data.table)
as.data.table(data)[, n := .N, by = val][, .SD[1], by = val]
#    val plus   n
# 1:   6  106  10
# 2:   7  107  15
# 3:   8  108  20
# 4:   9  109  25
# 5:  10  110 100
# 6:  11  111  20
# 7:  12  112  15
# 8:  13  113  10

## or (@ricardosaporta)
as.data.table(data)[, list(.N, plus = plus[1]), by = val]

## or (@davidarenburg)
unique(as.data.table(data)[, n := .N, by = val], by = "val")
With "dplyr", you can try:
library(dplyr)
data %>% group_by(val) %>% mutate(n = n()) %>% slice(1)
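dplyr also has count() and add_count(), which combine the grouping and counting into one call. A minimal sketch, assuming dplyr 1.0 or later and the example data from the question; counts and firsts are illustrative names.

```r
library(dplyr)

# Example data from the question
data <- data.frame(val = c(rep(6, 10), rep(7, 15), rep(8, 20), rep(9, 25),
                           rep(10, 100), rep(11, 20), rep(12, 15), rep(13, 10)))
data$plus <- data$val + 100

# One row per (val, plus) combination, with the group size in column n
counts <- data %>% count(val, plus)

# Alternatively: add the group size to every row, then keep the first row per val
firsts <- data %>% add_count(val) %>% distinct(val, .keep_all = TRUE)
```

count() collapses the data directly, while add_count() plus distinct() keeps every original column of the first row, which is closer to the question's "keep the first observation" approach.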
In base R, I guess you can try something like:
do.call(rbind, lapply(split(data, data$val), function(x) cbind(x, n = nrow(x))[1, ]))
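Another base R option, when a collapsed table of counts is all that's needed, is aggregate() with a formula. The helper column of 1s added through transform() is an assumption of this sketch, not part of the answer above.

```r
# Example data from the question
data <- data.frame(val = c(rep(6, 10), rep(7, 15), rep(8, 20), rep(9, 25),
                           rep(10, 100), rep(11, 20), rep(12, 15), rep(13, 10)))
data$plus <- data$val + 100

# Add a column of 1s, then sum it within each (val, plus) group
counts <- aggregate(n ~ val + plus, data = transform(data, n = 1), FUN = sum)
counts
```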