r - How to calculate count by group, then keep only one row per group


Say I have a data.frame, data:

data <- data.frame(val = c(rep(6, 10), rep(7, 15), rep(8, 20), rep(9, 25),
                           rep(10, 100), rep(11, 20), rep(12, 15), rep(13, 10)))
data$plus <- data$val + 100

My goal is to create a new data.frame that has the frequency of each val, along with the associated plus value.

My current strategy is to create a table (called table), then merge in the frequencies, and then keep the first observation within each group:

table <- table(data$val)
df1 <- data.frame(val = as.integer(names(table)[1:length(table)]), n = table[1:length(table)])
df2 <- merge(data, df1)
df3 <- do.call(rbind, by(df2, list(df2$val), FUN = function(x) head(x, 1)))

This works, but it seems clunky.

In Stata, for example, this would take less and simpler code. Something like:

bys val plus: egen max = _n
bys val plus: gen first = _n==1
keep if first==1

Is there a way to simplify this R code or make it more elegant?

Here's an approach using "data.table":

library(data.table)

# Add the group size as n, then keep the first row of each group
as.data.table(data)[, n := .N, by = val][, .SD[1], by = val]
#    val plus   n
# 1:   6  106  10
# 2:   7  107  15
# 3:   8  108  20
# 4:   9  109  25
# 5:  10  110 100
# 6:  11  111  20
# 7:  12  112  15
# 8:  13  113  10

## Or (@RicardoSaporta)
as.data.table(data)[, list(.N, plus = plus[1]), by = val]

## Or (@DavidArenburg)
unique(as.data.table(data)[, n := .N, by = val], by = "val")

With "dplyr", you can try:

library(dplyr)

data %>%
  group_by(val) %>%
  mutate(n = n()) %>%
  slice(1)
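Since plus is fully determined by val in this example, counting over both columns should give the same one-row-per-group result. Here is a minimal sketch with dplyr's count(), assuming every val maps to exactly one plus value (if it did not, you would get one row per distinct val/plus pair instead). The name res is just a placeholder for illustration.

```r
library(dplyr)

# Rebuild the example data from the question
data <- data.frame(val = rep(6:13, times = c(10, 15, 20, 25, 100, 20, 15, 10)))
data$plus <- data$val + 100

# count() groups by the listed columns and stores each group's size in column n
res <- count(data, val, plus)
res
```

This avoids the separate mutate() and slice() steps, at the cost of relying on that one-to-one assumption.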

In base R, I guess you can try something like:

do.call(rbind, lapply(split(data, data$val),
                      function(x) cbind(x, n = nrow(x))[1, ]))
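Another base R option might be aggregate(). The sketch below adds a helper column of ones and sums it within each val/plus combination. It assumes, as in the example data, that each val has a single plus value; res and the helper column n are names picked for illustration.

```r
# Rebuild the example data from the question
data <- data.frame(val = rep(6:13, times = c(10, 15, 20, 25, 100, 20, 15, 10)))
data$plus <- data$val + 100

# Add a column of ones, then sum it within each (val, plus) combination
res <- aggregate(n ~ val + plus, data = cbind(data, n = 1L), FUN = sum)
res
```

The result is a plain data.frame with columns val, plus and n, so no extra packages are needed.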
