r - Create a character vector column of predefined text and bind it to existing dataframe using rbind or bind_rows

Good day,

I present two [likely] puny problems for your excellent review.

Problem #1

I have a relatively tidy df (dat) with dim 10299 x 563. The 563 variables are common to both datasets [that created] dat: 'subject' (numeric), 'label' (numeric), and 3:563 (variable names from a text file). Observations 1:2947 are from the 'test' dataset, whereas observations 2948:10299 are from the 'training' dataset.

I'd like to insert a column (header = 'type') into dat, with rows 1:2947 comprised of the string test and rows 2948:10299 of the string train, so that I can group later on by dataset or use other similar aggregate functions in dplyr/tidyr.

I created a test df (testdf = 1:10299; dim(testdf) = 10299 x 1) and then:

    testdat[1:2947 , "type"] <- c("test")
    testdat[2948:10299, "type"] <- c("train")
    > head(ds, 2);tail(ds, 2)
      x1.10299 type
    1        1 test
    2        2 test
          x1.10299  type
    10298    10298 train
    10299    10299 train

So I don't want the x1.10299 column to be there.

Questions:

Is there a better, more expedient way to create a column that has what I'm looking for, based upon the use case above?
What is the best way to insert that column into 'dat' so that I can use it later for grouping with dplyr?
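One possible approach to both questions, sketched on toy data: rep() can build the whole character column in a single step, and the result can be assigned straight onto the existing data frame, so no helper column is needed. The toy data frame, its column names, and the row counts below are assumptions for illustration only (the real split would be 2947 and 7352).

```r
library(dplyr)

# Toy stand-in for 'dat': 5 "test" rows followed by 7 "train" rows
# (hypothetical sizes; the real split is 2947 and 7352).
n_test  <- 5
n_train <- 7
dat <- data.frame(subject = rep(1:3, length.out = n_test + n_train),
                  value   = seq_len(n_test + n_train))

# rep() with a 'times' vector repeats each label the given number of times,
# which builds the whole character column in one step.
dat$type <- rep(c("test", "train"), times = c(n_test, n_train))

# The new column can then be used for grouping in dplyr.
counts <- dat %>%
  group_by(type) %>%
  summarise(n = n(), mean_value = mean(value))
print(counts)
```

Because the labels are positional, this only works when the row order (test rows first, then train rows) is known and has not been changed.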

Problem #2

The way I arrived at my [nearly] tidy df (dat) above was to take two dfs (test and train) of dim 2947 x 563 and 7352 x 563, respectively, and rbind them together.

I can confirm that all of the variable names are present after my binding effort by doing this:

    test.names <- names(test)
    train.names <- names(train)
    identical(test.names, train.names)
    > TRUE

What is interesting, and of primary concern, is what happens if I try to use the bind_rows function from 'dplyr' to perform the same binding exercise:

    dat <- bind_rows(test, train)

It returns a dataframe that apparently keeps all of the observations (x: 10299), but the variable count is reduced from 563 to 470!

Question:

Does anyone know why my variables are being chopped?
Is this the best way to combine two dfs of the same structure for later slicing/dicing with dplyr/tidyr?

Thank you for your time and consideration of these matters.

Sample test/train dfs for your review (the leftmost number in each row is the df index):

test df: test[1:10, 1:5]

       subject labels tbodyacc-mean()-x tbodyacc-mean()-y tbodyacc-mean()-z
    1        2      5         0.2571778       -0.02328523       -0.01465376
    2        2      5         0.2860267       -0.01316336       -0.11908252
    3        2      5         0.2754848       -0.02605042       -0.11815167
    4        2      5         0.2702982       -0.03261387       -0.11752018
    5        2      5         0.2748330       -0.02784779       -0.12952716
    6        2      5         0.2792199       -0.01862040       -0.11390197
    7        2      5         0.2797459       -0.01827103       -0.10399988
    8        2      5         0.2746005       -0.02503513       -0.11683085
    9        2      5         0.2725287       -0.02095401       -0.11447249
    10       2      5         0.2757457       -0.01037199       -0.09977589

train df: train[1:10, 1:5]

       subject label tbodyacc-mean()-x tbodyacc-mean()-y tbodyacc-mean()-z
    1        1     5         0.2885845      -0.020294171        -0.1329051
    2        1     5         0.2784188      -0.016410568        -0.1235202
    3        1     5         0.2796531      -0.019467156        -0.1134617
    4        1     5         0.2791739      -0.026200646        -0.1232826
    5        1     5         0.2766288      -0.016569655        -0.1153619
    6        1     5         0.2771988      -0.010097850        -0.1051373
    7        1     5         0.2794539      -0.019640776        -0.1100221
    8        1     5         0.2774325      -0.030488303        -0.1253604
    9        1     5         0.2772934      -0.021750698        -0.1207508
    10       1     5         0.2805857      -0.009960298        -0.1060652

Actual code (ignore the function calls; I'm doing most of the testing via the console).

The data set I'm using with this code: http://archive.ics.uci.edu/ml/machine-learning-databases/00240/

    run_analysis <- function () {
        # vars are available for use throughout the function and should be preserved
        vars <- read.table("features.txt", header = FALSE, sep = "")
        lookup_table <- data.frame(activitynum = c(1,2,3,4,5,6),
                                   activity_label = c("walking", "walking_up",
                                                      "walking_down", "sitting",
                                                      "standing", "laying"))
        test <- test_read_process(vars, lookup_table)
        train <- train_read_process(vars, lookup_table)
    }

    test_read_process <- function(vars, lookup_table) {
        # read in the 3 documents for cbinding later
        test.sub <- read.table("test/subject_test.txt", header = FALSE)
        test.labels <- read.table("test/y_test.txt", header = FALSE)
        test.obs <- read.table("test/X_test.txt", header = FALSE, sep = "")

        # cbind the cols and set the remaining colnames to the var names in vars
        test.dat <- cbind(test.sub, test.labels, test.obs)
        colnames(test.dat) <- c("subject", "labels", as.character(vars[,2]))

        # use lookup_table to set the "test_labels" string values that correspond
        # to the integer ids
        #test.lookup <- merge(test, lookup_table, by.x = "labels",
        #               by.y ="activitynum", all.x = TRUE)

        # remove temporary symbols from globalenv/memory
        rm(test.sub, test.labels, test.obs)

        # return
        return(test.dat)
    }

    train_read_process <- function(vars, lookup_table) {
        # read in the 3 documents for cbinding
        train.sub <- read.table("train/subject_train.txt", header = FALSE)
        train.labels <- read.table("train/y_train.txt", header = FALSE)
        train.obs <- read.table("train/X_train.txt", header = FALSE, sep = "")

        # cbind the cols and set the remaining colnames to the var names in vars
        train.dat <- cbind(train.sub, train.labels, train.obs)
        colnames(train.dat) <- c("subject", "label", as.character(vars[,2]))

        # clean temporary symbols from globalenv/memory
        rm(train.sub, train.labels, train.obs, vars)

        return(train.dat)
    }

The problem you're facing stems from the fact that you have duplicated names in the variable list you're using to create your data frame objects. If you ensure that the column names are unique and shared between the objects, the code will run. I've included a working example based on the code you used above (with fixes and various edits noted in the comments):

    vars <- read.table(file="features.txt", header=FALSE, stringsAsFactors=FALSE)

    ##  frs: source of the original problem:
    duplicated(vars[,2])
    vars[317:340,2]
    duplicated(vars[317:340,2])
    vars[396:419,2]

    ##  frs: edited the following to account for both the data and variable
    ##    issues:
    test_read_process <- function() {
      # read in the 3 documents for cbinding later
      test.sub <- read.table("test/subject_test.txt", header = FALSE)
      test.labels <- read.table("test/y_test.txt", header = FALSE)
      test.obs <- read.table("test/X_test.txt", header = FALSE, sep = "")

      # cbind the cols and set unique placeholder colnames (v1, v2, ...)
      test.dat <- cbind(test.sub, test.labels, test.obs)
      #colnames(test.dat) <- c("subject", "labels", as.character(vars[,2]))
      colnames(test.dat) <- c("subject", "labels", paste0("v", 1:nrow(vars)))

      return(test.dat)
    }

    train_read_process <- function() {
      # read in the 3 documents for cbinding
      train.sub <- read.table("train/subject_train.txt", header = FALSE)
      train.labels <- read.table("train/y_train.txt", header = FALSE)
      train.obs <- read.table("train/X_train.txt", header = FALSE, sep = "")

      # cbind the cols and set unique placeholder colnames (v1, v2, ...)
      train.dat <- cbind(train.sub, train.labels, train.obs)
      #colnames(train.dat) <- c("subject", "labels", as.character(vars[,2]))
      colnames(train.dat) <- c("subject", "labels", paste0("v", 1:nrow(vars)))

      return(train.dat)
    }

    test_df <- test_read_process()
    train_df <- train_read_process()

    identical(names(test_df), names(train_df))

    library("dplyr")

    ## frs: these could be piped, but I've kept them separate for clarity:
    train_df %>%
      mutate(test="train") ->
      train_df

    test_df %>%
      mutate(test="test") ->
      test_df

    test_df %>%
      bind_rows(train_df) ->
      out_df

    head(out_df)
    out_df

    ##  frs: you can set the column names to those of the original
    ##    variable list, but you still have duplicates to deal with:
    names(out_df) <- c("subject", "labels", as.character(vars[,2]), "test")

    duplicated(names(out_df))
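As a smaller illustration of the same idea, here is a minimal sketch on made-up data. It shows one way to spot repeated names, one possible way to make them unique while keeping them readable (base R's make.unique(), which is an alternative to the placeholder names used above, not something the original code does), and how bind_rows() can add the source label itself through its .id argument. The feature names and toy data frames are hypothetical stand-ins, not values from the real data set.

```r
library(dplyr)

# Hypothetical feature names containing a repeat, standing in for vars[, 2].
feature_names <- c("mean-x", "mean-y", "bands-energy", "bands-energy")

# Count the repeated names and list which ones they are.
n_dups    <- sum(duplicated(feature_names))
dup_names <- unique(feature_names[duplicated(feature_names)])

# make.unique() appends .1, .2, ... to repeats so every name is distinct.
unique_names <- make.unique(feature_names)

# Toy stand-ins for the test and train data frames (2 and 3 rows, 4 columns).
test_toy  <- as.data.frame(matrix(1:8,  nrow = 2))
train_toy <- as.data.frame(matrix(9:20, nrow = 3))
names(test_toy)  <- unique_names
names(train_toy) <- unique_names

# Naming the inputs and passing .id records each row's source in 'type',
# so no separate mutate() step is needed before binding.
out_toy <- bind_rows(test = test_toy, train = train_toy, .id = "type")
print(out_toy)
```

With unique names on both inputs, the combined result keeps every column, and the type column is ready for group_by().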
