Extracting a specific format substring (ID, code) from a string using R

Suppose I have a data frame composed of tweets harvested using Twitter. I want to extract a substring that is a unique ID for my data set and is contained in each tweet. The IDs all have the same form: 3-4 uppercase letters, followed by a hyphen, followed by a 6-digit number. Examples are: yld-000123, ylsl-000323, ylp-000135. I only need the ID and can drop everything else in each tweet.

Here are 2 examples of the tweets I'm working with:

  st1="elijo entertimer, ylc-000354, como ganador para  http://t.co/jcldk8d796 #younglionsco #fantasylions"
  st2="elijo #aesetrennomelesubo, ylsl-000169, como ganador para  http://t.co/wppm7x5ecn #younglionsco #fantasylions"
  tweets=c(st1,st2)

The result I need is "ylc-000354" "ylsl-000169". The ID is not always between commas.

An approach using gsub:

  gsub('.*[^[:alpha:]]([[:alpha:]]+-\\d+).*','\\1',tweets)
  #[1] "ylc-000354"  "ylsl-000169"
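Two things to keep in mind with the gsub approach: when a tweet contains no ID, gsub returns the whole tweet unchanged, and the pattern needs a non-letter character before the ID, so an ID at the very start of a tweet is not captured cleanly. As a possible alternative (a sketch, not part of the original answer), regexpr and regmatches can pull out the match directly and leave NA where nothing is found. The names id_pattern, m and ids are only illustrative, and the pattern assumes the "3-4 letters, hyphen, 6 digits" form described in the question:

```r
# Sample tweets from the question (lowercase, as given above).
st1 <- "elijo entertimer, ylc-000354, como ganador para  http://t.co/jcldk8d796 #younglionsco #fantasylions"
st2 <- "elijo #aesetrennomelesubo, ylsl-000169, como ganador para  http://t.co/wppm7x5ecn #younglionsco #fantasylions"
tweets <- c(st1, st2)

# Assumed ID form: 3-4 letters, a hyphen, exactly 6 digits.
# \\b anchors the match on word boundaries, so longer letter or digit runs are rejected.
id_pattern <- "\\b[[:alpha:]]{3,4}-[0-9]{6}\\b"

# regexpr() gives the position of the first match in each string (-1 if none).
m <- regexpr(id_pattern, tweets, perl = TRUE)

# Pre-fill with NA so tweets without an ID stay aligned with their rows.
ids <- rep(NA_character_, length(tweets))
ids[m != -1] <- regmatches(tweets, m)

ids
#[1] "ylc-000354"  "ylsl-000169"
```

The pattern uses [[:alpha:]] rather than [A-Z] because the sample strings here are lowercase; if the real tweets keep the uppercase IDs, [A-Z]{3,4} would be a stricter choice.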
