regex - Extract value from character array in R
I have the following character array:
    head(rest, n=20)
            [,1] [,2]  [,3]
       [1,] ""   ""    ""
       [2,] ""   ""    ""
       [3,] "B"  "-1"  "-tv"
       [4,] ""   ""    ""
       [5,] ""   ""    ""
       [6,] "A"  ""    ""
       [7,] ""   ""    ""
      ...
    [2893,] ""   ""    ""
    [2894,] ""   ""    ""
    [2895,] ""   ""    ""
    [2896,] "ST" ""    ""
    [2897,] "2"  "-th" ""
    [2898,] "1"  ""    ""
I want to extract the capital letters, numbers and lower-case letters while keeping the index values.
I can find the positions of the capital letters like this:

    grep("[A-Z]", rest, perl=TRUE)

and their values like this:

    grep("[A-Z]", rest, perl=TRUE, value=TRUE)

but I can't figure out how to return the values while keeping the index.
I think this might be what you're looking for (using your example data):
    rest <- matrix(c('','','','','','','B','-1','-tv','','','','','','','A','','','','','','','','','','','','','','','ST','','','2','-th','','1','',''),13,byrow=T);
    pat <- c('[A-Z]','[0-9]','[a-z]'); # capitals -> house, digits -> floor, lowercase -> side
    name <- c('house','floor','side');
    res <- setNames(as.data.frame(lapply(pat,function(x) {
        i <- grep(x,rest); # column-major (linear) indices of the matching cells
        x <- rep('',nrow(rest)); # one empty string per row
        x[(i-1)%%nrow(rest)+1] <- rest[i]; # convert each linear index to its row number
        x;
    }),stringsAsFactors=F),name);
    res;
    ##    house floor side
    ## 1
    ## 2
    ## 3      B    -1  -tv
    ## 4
    ## 5
    ## 6      A
    ## 7
    ## 8
    ## 9
    ## 10
    ## 11    ST
    ## 12           2  -th
    ## 13           1
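If what you need is each match's row and column next to its value (rather than one column per pattern), a different approach from the one above is to put the logical result of `grepl()` back into matrix shape and ask `which()` for array indices. A minimal sketch, run here on the 13-row example with the capitals B, A and ST; the helper name `matches_with_index` is hypothetical:

```r
# Hypothetical helper: row, column and value of every cell of m matching pat.
matches_with_index <- function(m, pat) {
  hit <- matrix(grepl(pat, m), nrow = nrow(m))  # grepl() drops the dims, so restore them
  idx <- which(hit, arr.ind = TRUE)             # two-column matrix with columns "row" and "col"
  data.frame(idx, value = m[idx], stringsAsFactors = FALSE)  # matrix indexing pulls the values
}

rest <- matrix(c('','','','','','','B','-1','-tv','','','','','','','A','','','','','','','','','','','','','','','ST','','','2','-th','','1','',''), 13, byrow = TRUE)
caps <- matches_with_index(rest, '[A-Z]')
caps
```

For `'[A-Z]'` this gives rows 3, 6 and 11, all in column 1, with values B, A and ST. Unlike the per-row layout above, it keeps one line per match, so two matches in the same row would both survive.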
Actually, that's not a great demo because of the dearth of populated cells, so here's a demo with randomized data:
    set.seed(9);
    r <- 12; c <- 3; n <- 5;
    rest <- matrix(sample(c(rstr(n,charset=LETTERS,lmin=1,lmax=3),rstr(n,charset=letters,lmin=1,lmax=3),rstr(n,charset=0:9,lmin=1,lmax=3),rep('',r*c-n*3))),r);
    rest;
    ##       [,1]  [,2] [,3]
    ##  [1,] "AN"  ""   ""
    ##  [2,] "895" ""   ""
    ##  [3,] "698" ""   ""
    ##  [4,] "zd"  ""   "32"
    ##  [5,] ""    ""   ""
    ##  [6,] "CK"  ""   ""
    ##  [7,] ""    ""   ""
    ##  [8,] "JWZ" ""   "r"
    ##  [9,] "1"   "j"  "IX"
    ## [10,] ""    ""   "ZFM"
    ## [11,] "k"   "d"  ""
    ## [12,] ""    ""   "252"
    pat <- c('[A-Z]','[0-9]','[a-z]');
    name <- c('house','floor','side');
    res <- setNames(as.data.frame(lapply(pat,function(x) { i <- grep(x,rest); x <- rep('',r); x[(i-1)%%r+1] <- rest[i]; x; }),stringsAsFactors=F),name);
    res;
    ##    house floor side
    ## 1     AN
    ## 2          895
    ## 3          698
    ## 4           32   zd
    ## 5
    ## 6     CK
    ## 7
    ## 8    JWZ          r
    ## 9     IX     1    j
    ## 10   ZFM
    ## 11                d
    ## 12         252
Note that I used a little function I wrote called rstr() to produce random string values. It's not relevant to the question, so I haven't posted it, but if you want I can provide it in my answer.
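Since `rstr()` isn't shown, here is a hypothetical stand-in that accepts the same arguments as the calls above (`n`, `charset`, `lmin`, `lmax`) so the randomized demo can at least be run. It is a guess at the behaviour, not the author's function, and it will not reproduce the exact values printed above for `set.seed(9)`:

```r
# Hypothetical stand-in for the unposted rstr(): n random strings, each of a
# random length between lmin and lmax, drawn with replacement from charset.
rstr <- function(n, charset, lmin = 1, lmax = 3) {
  charset <- as.character(charset)  # allows charset = 0:9 as well as letters
  # sample.int() avoids the sample(k) gotcha when lmin == lmax
  lens <- lmin + sample.int(lmax - lmin + 1, n, replace = TRUE) - 1
  vapply(lens,
         function(len) paste(sample(charset, len, replace = TRUE), collapse = ''),
         character(1))
}

set.seed(9)
rstr(5, charset = LETTERS)
```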
By chance, in row 11 there's a collision between 2 side values. You specified in the comments that this can't happen in your actual data, but you can see from the output that the code handles this case gracefully; it ends up keeping the rightmost value in the row.
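The "rightmost value wins" behaviour comes from how R assigns through repeated indices: the assignments happen in order, so the last one sticks. Because `grep()` on a matrix returns column-major indices, a match in a later column comes later in `i`. A small sketch isolating the row 11 values (`rest11` is just an illustrative name):

```r
rest11 <- matrix(c('k', 'd', ''), nrow = 1)   # row 11 of the random demo, on its own
i <- grep('[a-z]', rest11)                     # column-major indices: 1, then 2
x <- rep('', nrow(rest11))
x[(i - 1) %% nrow(rest11) + 1] <- rest11[i]    # both indices map to row 1; 'd' is written last
x
```

Here `x` ends up as `"d"`, matching the side value kept for row 11 in the demo.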
The new requirement of moving single-letter lowercase strings from the third column to the first, concatenating them with the existing value in the first column, can be satisfied thusly (continuing from the second demo):
    res$house <- ifelse(nchar(res$side)==1,paste0(res$house,res$side),res$house);
    res$side <- ifelse(nchar(res$side)==1,'',res$side);
    res;
    ##    house floor side
    ## 1     AN
    ## 2          895
    ## 3          698
    ## 4           32   zd
    ## 5
    ## 6     CK
    ## 7
    ## 8   JWZr
    ## 9    IXj     1
    ## 10   ZFM
    ## 11     d
    ## 12         252
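An equivalent way to write this step computes the single-letter mask once and only touches the affected rows. The sketch below runs on a small hand-built stand-in for `res` (assumed values mirroring rows 4, 8, 9 and 11 of the demo); like the `ifelse()` version it checks only the length of `side`, relying on that column holding nothing but `[a-z]` matches:

```r
res <- data.frame(
  house = c('',   'JWZ', 'IX', ''),
  floor = c('32', '',    '1',  ''),
  side  = c('zd', 'r',   'j',  'd'),
  stringsAsFactors = FALSE
)
one <- nchar(res$side) == 1                               # rows whose side value is a single character
res$house[one] <- paste0(res$house[one], res$side[one])   # append it to house
res$side[one] <- ''                                       # and clear it from side
res
```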