Extracting a substring with a specific format (ID, code) from a string using R


Suppose I have a data frame composed of tweets harvested from Twitter. I want to extract a substring that is a unique ID in my data set and is contained in each tweet. The IDs all have the same form: 3-4 letters (uppercase in the original data), followed by a hyphen, followed by a 6-digit number. Examples are: yld-000123, ylsl-000323, ylp-000135. I only need the ID and can drop everything else in each tweet.

Here are 2 examples of the tweets I'm working with:

  st1 <- "elijo entertimer, ylc-000354, como ganador para  http://t.co/jcldk8d796 #younglionsco #fantasylions"
  st2 <- "elijo #aesetrennomelesubo, ylsl-000169, como ganador para  http://t.co/wppm7x5ecn #younglionsco #fantasylions"
  tweets <- c(st1, st2)

The result I need is "ylc-000354" "ylsl-000169". The ID is not necessarily between commas.

An approach using gsub:

  gsub('.*[^[:alpha:]]([[:alpha:]]+-\\d+).*', '\\1', tweets)
  #[1] "ylc-000354"  "ylsl-000169"
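Note that gsub returns the input unchanged when the pattern does not match, and the pattern above requires a non-letter character before the ID, so a tweet without an ID (or with the ID at the very start) comes back whole. As a possible alternative, here is a minimal sketch that pulls out only the matched text with regexpr and regmatches and yields NA when no ID is found. The names id_pattern and ids are illustrative, not from the original answer, and the pattern assumes the 3-4 letters, hyphen, 6 digits format described in the question:

```r
# The two example tweets from the question
tweets <- c(
  "elijo entertimer, ylc-000354, como ganador para  http://t.co/jcldk8d796 #younglionsco #fantasylions",
  "elijo #aesetrennomelesubo, ylsl-000169, como ganador para  http://t.co/wppm7x5ecn #younglionsco #fantasylions"
)

# Assumed ID format: 3-4 letters, a hyphen, exactly 6 digits,
# delimited by word boundaries (\\b needs perl = TRUE here)
id_pattern <- "\\b[[:alpha:]]{3,4}-[0-9]{6}\\b"

# regexpr() finds the first match in each string (-1 means no match)
m <- regexpr(id_pattern, tweets, perl = TRUE)

# regmatches() returns only the matched substrings, so fill a
# result vector that stays NA for tweets without an ID
ids <- rep(NA_character_, length(tweets))
ids[m != -1] <- regmatches(tweets, m)

ids
# ids is c("ylc-000354", "ylsl-000169")
```

Because the result has one element per tweet, it can be assigned directly to a new column of the data frame. If a tweet could contain several IDs, gregexpr could be used in place of regexpr to collect all of them.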
