R - Removing loops in RecordLinkage output


I am using the RecordLinkage package in R to deduplicate a dataset. The deduplicated output from the RecordLinkage package has loops in it.

For example:

Table rlinkage

    id  name          id2  name2
    1   jane johnson  5    jane johnson
    5   jane johnson  17   jane johnson

I am trying to make a table that lists each id together with all the other id numbers in its loop of records.

For example:

    id1  id2  id3  name
    1    5    17   jane johnson

or

    name          ids
    jane johnson  1,5,17

Is this possible in R? I have tried using the sqldf package to join the dataset onto itself multiple times to try and get all the ids on the same line.

For example:

    library(sqldf)

    # Self-join: pair each row with every row that shares either id
    rlinkage2 <- sqldf('select a.id,
                               a.id2,
                               b.id as id3,
                               b.id2 as id4
                        from rlinkage a
                        left join rlinkage b
                          on a.id = b.id
                          or a.id = b.id2
                          or a.id2 = b.id
                          or a.id2 = b.id2')

This creates a messy dataset and does not put all of the ids on the same line unless I join the table rlinkage many times. Is there a better way to do this?

1) sqldf: Using sqldf, union the two sets of columns and then use group_concat:

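    library(sqldf)

    # Stack (id, name) and (id2, name2), then collapse the distinct ids per name
    sqldf("select name, group_concat(distinct id) ids from (
             select id, name from rlinkage
             union
             select id2 id, name2 name from rlinkage
           ) group by name")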

Giving:

              name    ids
    1 jane johnson 1,5,17

2) rbind/aggregate: Using plain R:

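    # Stack columns 3:4 under columns 1:2, renaming them to match
    long <- rbind(rlinkage[1:2], setNames(rlinkage[3:4], names(rlinkage)[1:2]))
    # Collapse the unique ids for each name into one comma-separated string
    aggregate(id ~ name, long, function(x) toString(unique(x)))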

Giving:

              name       id
    1 jane johnson 1, 5, 17

Note: The data used was:

    lines <- "id,name,id2,name2
    1,jane johnson,5,jane johnson
    5,jane johnson,17,jane johnson"

    rlinkage <- read.csv(text = lines, as.is = TRUE)
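Both approaches above group by name, so they rely on every record in a loop carrying an identical name. If that does not hold, one possible alternative is to treat each (id, id2) pair as a link and group ids that are connected through any chain of links. The following is a minimal base R sketch under that assumption; `link_groups` is a hypothetical helper written for this illustration, not a function from RecordLinkage or sqldf.

```r
# Sample data matching the note above.
rlinkage <- data.frame(
  id    = c(1, 5),
  name  = c("jane johnson", "jane johnson"),
  id2   = c(5, 17),
  name2 = c("jane johnson", "jane johnson"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: label every id with the smallest id it is
# connected to through any chain of (from, to) pairs.
link_groups <- function(from, to) {
  ids <- sort(unique(c(from, to)))
  label <- setNames(ids, ids)        # every id starts in its own group
  repeat {
    changed <- FALSE
    for (k in seq_along(from)) {
      a <- as.character(from[k])
      b <- as.character(to[k])
      low <- min(label[[a]], label[[b]])
      # Give both ends of the pair the smaller of their two labels
      if (label[[a]] != low || label[[b]] != low) {
        label[[a]] <- low
        label[[b]] <- low
        changed <- TRUE
      }
    }
    if (!changed) break              # stop once a full pass changes nothing
  }
  data.frame(id = ids, group = unname(label))
}

groups <- link_groups(rlinkage$id, rlinkage$id2)

# One row per group, with its ids collapsed into a single string
result <- aggregate(id ~ group, groups, toString)
print(result)
```

On the sample data this should put ids 1, 5 and 17 into a single group. The repeated passes are simple rather than fast; for a large number of pairs, a graph library's connected-components routine would likely be a better fit.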
