dataframe - Extract and compare column data by date in R
I am using the Kaggle bike sharing data set. I want to write a script that compares my predicted values to the training data set. The comparison would be of the mean count for each month of each year.
The training data set, which I call df, looks like this:
             datetime count
1 2011-01-01 00:00:00    16
2 2011-01-11 01:00:00    40
3 2011-02-01 02:00:00    32
4 2011-02-11 03:00:00    13
5 2011-03-21 04:00:00     1
6 2011-03-11 05:00:00     1
My predicted values, which I call sub, look like this:
             datetime count
1 2011-01-01 00:00:00    42
2 2011-01-11 01:00:00    33
3 2011-02-01 02:00:00    33
4 2011-02-11 05:00:00    36
5 2011-03-21 06:00:00    57
6 2011-03-11 07:00:00   129
I have isolated the month and year using the lubridate package and concatenated them into a new month-year column. I then split the data on that new column and used lapply to find the mean of each piece.
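library(lubridate)

# Build a month-year key from the timestamp
df$monyear <- interaction(
  month(ymd_hms(df$datetime)),
  year(ymd_hms(df$datetime)),
  sep = "-")

# Split by month-year, then average count within each piece
# (drop = FALSE keeps a one-column data frame so colMeans works)
s <- split(df, df$monyear)
x <- lapply(s, function(d) colMeans(d[, "count", drop = FALSE], na.rm = TRUE))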
But this gives me the average for each month-year combination nested in a list, which is not easy to compare. What I would like instead is:
  year-month train-mean sub-mean diff
1    2011-01         28     37.5  9.5
2    2011-02       22.5     34.5   12
3    2011-03          1       93   92
Is there a better way to do this?
Something like this, for each of your data sets:
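library(dplyr)

# Mean count per month-year, once for each data set
dftrain %>% group_by(monyear) %>% summarize(mc = mean(count)) -> xtrain
dftest %>% group_by(monyear) %>% summarize(mc = mean(count)) -> xtest

# Join the two summaries on the month-year key
merged <- merge(xtrain, xtest, by = "monyear")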
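To see the whole thing end to end, here is a minimal sketch run on the six sample rows from the question. It is not the answer's exact code: it swaps dplyr for base R's aggregate so it runs without extra packages, and the helper monthly_mean and the column names yearmonth, train_mean and sub_mean are made up for illustration. It also assumes the datetime strings parse as "YYYY-MM-DD HH:MM:SS".

```r
# Sample rows copied from the question
df <- data.frame(
  datetime = c("2011-01-01 00:00:00", "2011-01-11 01:00:00",
               "2011-02-01 02:00:00", "2011-02-11 03:00:00",
               "2011-03-21 04:00:00", "2011-03-11 05:00:00"),
  count = c(16, 40, 32, 13, 1, 1),
  stringsAsFactors = FALSE
)
sub <- data.frame(
  datetime = c("2011-01-01 00:00:00", "2011-01-11 01:00:00",
               "2011-02-01 02:00:00", "2011-02-11 05:00:00",
               "2011-03-21 06:00:00", "2011-03-11 07:00:00"),
  count = c(42, 33, 33, 36, 57, 129),
  stringsAsFactors = FALSE
)

# Hypothetical helper: mean count per year-month, with the mean
# column renamed so the two summaries can sit side by side
monthly_mean <- function(d, value_name) {
  d$yearmonth <- format(as.POSIXct(d$datetime, tz = "UTC"), "%Y-%m")
  out <- aggregate(count ~ yearmonth, data = d, FUN = mean)
  names(out)[names(out) == "count"] <- value_name
  out
}

# Join the two summaries on the year-month key and take the difference
merged <- merge(monthly_mean(df, "train_mean"),
                monthly_mean(sub, "sub_mean"),
                by = "yearmonth")
merged$diff <- merged$sub_mean - merged$train_mean
print(merged)
```

On the sample rows this gives one row per month with diff values of 9.5, 12 and 92, matching the table asked for in the question. Giving each mean column its own name before merging avoids the mc.x / mc.y suffixes that merge adds when both inputs share a column name.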