dataframe - Extract and compare column data by date in R


I am using the Kaggle bike sharing data set. I want to write a script that compares my predicted values to the training data set. The comparison should be of the mean count for each month of each year.

The training data set, which I call df, looks like this:

                 datetime count
    1 2011-01-01 00:00:00    16
    2 2011-01-11 01:00:00    40
    3 2011-02-01 02:00:00    32
    4 2011-02-11 03:00:00    13
    5 2011-03-21 04:00:00     1
    6 2011-03-11 05:00:00     1

My predicted values, which I call sub, look like this:

                 datetime count
    1 2011-01-01 00:00:00    42
    2 2011-01-11 01:00:00    33
    3 2011-02-01 02:00:00    33
    4 2011-02-11 05:00:00    36
    5 2011-03-21 06:00:00    57
    6 2011-03-11 07:00:00   129

I have isolated the month and year using the lubridate package and concatenated them into a new month-year column. I then split the data frame on that new column and use lapply to find the mean of each piece.

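    library(lubridate)

    # Build a "month-year" factor from the datetime column
    df$monyear <- interaction(
        month(ymd_hms(df$datetime)),
        year(ymd_hms(df$datetime)),
        sep = "-")

    # Split by month-year, then take column means within each piece
    s <- split(df, df$monyear)
    x <- lapply(s, function(x) colMeans(x[, c("count", "count")], na.rm = TRUE))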

But this gives me the average for each month-year combination nested in a list, which is not easy to compare. What I would like instead is:

      year-month train-mean sub-mean diff
    1 2011-01    28         37.5     9.5
    2 2011-02    22.5       34.5     12
    3 2011-03    1          93       92

Is there a better way to do this?

Something like this. For each of your data sets:

    library(dplyr)

    dftrain %>% group_by(monyear) %>% summarize(mc=mean(count)) -> xtrain
    dftest %>% group_by(monyear) %>% summarize(mc=mean(count)) -> xtest
    merged <- merge(xtrain, xtest, by="monyear")
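For reference, here is a minimal end-to-end sketch of the same group, summarise and merge idea using only base R, so it runs without dplyr or lubridate. It is not the code from the answer above: `monthly_mean` is a hypothetical helper, the column names `year_month`, `train_mean` and `sub_mean` are made up for illustration, and it assumes the `datetime` column holds strings in `YYYY-MM-DD HH:MM:SS` form. The data frames contain only the six sample rows shown in the question.

```r
# Sample rows copied from the question (not the full Kaggle data).
df <- data.frame(
  datetime = c("2011-01-01 00:00:00", "2011-01-11 01:00:00",
               "2011-02-01 02:00:00", "2011-02-11 03:00:00",
               "2011-03-21 04:00:00", "2011-03-11 05:00:00"),
  count = c(16, 40, 32, 13, 1, 1)
)
sub <- data.frame(
  datetime = c("2011-01-01 00:00:00", "2011-01-11 01:00:00",
               "2011-02-01 02:00:00", "2011-02-11 05:00:00",
               "2011-03-21 06:00:00", "2011-03-11 07:00:00"),
  count = c(42, 33, 33, 36, 57, 129)
)

# Hypothetical helper: mean of `count` per "YYYY-MM" key,
# with the mean column renamed to `value_name`.
monthly_mean <- function(d, value_name) {
  d$year_month <- format(as.POSIXct(d$datetime, tz = "UTC"), "%Y-%m")
  out <- aggregate(count ~ year_month, data = d, FUN = mean)
  names(out)[names(out) == "count"] <- value_name
  out
}

# One row per month-year, with both means side by side.
merged <- merge(monthly_mean(df, "train_mean"),
                monthly_mean(sub, "sub_mean"),
                by = "year_month")
merged$diff <- merged$sub_mean - merged$train_mean
print(merged)
```

On these sample rows the result has one row per month, and the means and differences agree with the table the question asks for. Note that `merge` keeps only the month-year keys present in both data frames by default; pass `all = TRUE` if months missing from one side should be kept as NA.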
