How to sum probe counts within subgroups of a data frame in R
I am a biochemistry student working with breast cancer data (copy number probes). I have the following data frame of primary breast tumors:
patient chrom    start      end probecount
      1     1    51599    62640          8
      1     1    88466 16022503       8676
      1     2     2785   285255        186
      1     2   290880  4178544       2903
...
      2     1    51599  4098530       1282
      2     1  4101675 46753618      25229
      2     2     2785 36178040      25931
      2     2 36185342 36192717         21
...
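To make the example reproducible, the eight rows shown above can be entered as a data frame. This is a minimal sketch: it includes only the rows listed, the rows elided by "..." are not reconstructed, and the name mydf is borrowed from the answers below.

```r
# Only the eight rows shown in the question; the elided "..." rows are omitted
mydf <- data.frame(
  patient    = c(1, 1, 1, 1, 2, 2, 2, 2),
  chrom      = c(1, 1, 2, 2, 1, 1, 2, 2),
  start      = c(51599, 88466, 2785, 290880, 51599, 4101675, 2785, 36185342),
  end        = c(62640, 16022503, 285255, 4178544, 4098530, 46753618, 36178040, 36192717),
  probecount = c(8, 8676, 186, 2903, 1282, 25229, 25931, 21)
)

# Inspect column types and the first values
str(mydf)
```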
I would like to add a sixth column, total, containing the sum of probecount for each chromosome of each patient:
patient chrom    start      end probecount total
      1     1    51599    62640          8  8684
      1     1    88466 16022503       8676  8684
      1     2     2785   285255        186  3089
      1     2   290880  4178544       2903  3089
...
      2     1    51599  4098530       1282 26511
      2     1  4101675 46753618      25229 26511
      2     2     2785 36178040      25931 25952
      2     2 36185342 36192717         21 25952
...
There must be a simple function for this. Maybe aggregate? I would appreciate it if you could give me a hint. Thank you!
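Since aggregate() is mentioned: base R can also do this without extra packages. This sketch is not taken from the answers below. ave() applies a function within each group and returns a vector as long as its input, which matches the "total repeated on every row" layout directly; aggregate() instead returns one row per group, which then has to be merged back. The names totals, total_agg and merged are hypothetical, chosen for illustration.

```r
# Sample rows from the question (the elided "..." rows are not included)
mydf <- data.frame(
  patient    = c(1, 1, 1, 1, 2, 2, 2, 2),
  chrom      = c(1, 1, 2, 2, 1, 1, 2, 2),
  start      = c(51599, 88466, 2785, 290880, 51599, 4101675, 2785, 36185342),
  end        = c(62640, 16022503, 285255, 4178544, 4098530, 46753618, 36178040, 36192717),
  probecount = c(8, 8676, 186, 2903, 1282, 25229, 25931, 21)
)

# Option 1: ave() sums probecount within each patient/chrom group and
# returns one value per row, so it can be assigned directly as a column
mydf$total <- ave(mydf$probecount, mydf$patient, mydf$chrom, FUN = sum)

# Option 2: aggregate() returns one row per patient/chrom combination
# (total_agg is an illustrative column name)
totals <- aggregate(probecount ~ patient + chrom, data = mydf, FUN = sum)
names(totals)[names(totals) == "probecount"] <- "total_agg"

# ...which can then be joined back onto the original rows by the keys
merged <- merge(mydf, totals, by = c("patient", "chrom"))

print(mydf)
```

Note that merge() sorts its result by the key columns by default, so if the original row order matters, ave() is the simpler choice.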
One way is to use the dplyr package, as follows. I assume your data frame is called mydf.
library(dplyr)

# Group by patient and chromosome, then add each group's probe total to every row
group_by(mydf, patient, chrom) %>%
  mutate(whatever = sum(probecount))

#Source: local data frame [8 x 6]
#Groups: patient, chrom
#
#  patient chrom    start      end probecount whatever
#1       1     1    51599    62640          8     8684
#2       1     1    88466 16022503       8676     8684
#3       1     2     2785   285255        186     3089
#4       1     2   290880  4178544       2903     3089
#5       2     1    51599  4098530       1282    26511
#6       2     1  4101675 46753618      25229    26511
#7       2     2     2785 36178040      25931    25952
#8       2     2 36185342 36192717         21    25952
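If you want one row per patient and chromosome instead of the total repeated on every row, summarise() is the collapsing counterpart of mutate(). A sketch, assuming dplyr 1.0.0 or later (needed for the .groups argument); the names with_total, per_group and total are illustrative, not from the answer above.

```r
library(dplyr)

# Sample rows from the question (the elided "..." rows are not included)
mydf <- data.frame(
  patient    = c(1, 1, 1, 1, 2, 2, 2, 2),
  chrom      = c(1, 1, 2, 2, 1, 1, 2, 2),
  start      = c(51599, 88466, 2785, 290880, 51599, 4101675, 2785, 36185342),
  end        = c(62640, 16022503, 285255, 4178544, 4098530, 46753618, 36178040, 36192717),
  probecount = c(8, 8676, 186, 2903, 1282, 25229, 25931, 21)
)

# mutate(): keeps all 8 rows and repeats each group's total;
# ungroup() removes the grouping so later operations act on the whole table
with_total <- mydf %>%
  group_by(patient, chrom) %>%
  mutate(total = sum(probecount)) %>%
  ungroup()

# summarise(): collapses to one row per patient/chrom group
per_group <- mydf %>%
  group_by(patient, chrom) %>%
  summarise(total = sum(probecount), .groups = "drop")

print(per_group)
```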
If your data is large, you may want to use data.table.
library(data.table)

# setDT() converts mydf to a data.table in place; := adds the column by
# reference within each patient/chrom group, and the trailing [] prints it
setDT(mydf)[, whatever := sum(probecount), by = list(patient, chrom)][]

#   patient chrom    start      end probecount whatever
#1:       1     1    51599    62640          8     8684
#2:       1     1    88466 16022503       8676     8684
#3:       1     2     2785   285255        186     3089
#4:       1     2   290880  4178544       2903     3089
#5:       2     1    51599  4098530       1282    26511
#6:       2     1  4101675 46753618      25229    26511
#7:       2     2     2785 36178040      25931    25952
#8:       2     2 36185342 36192717         21    25952
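Because setDT() converts mydf in place and := modifies by reference, the original object itself is changed. If you would rather leave the data frame untouched, a sketch using as.data.table(), which makes a copy, could look like this; .() is data.table's shorthand for list() inside [...]. The names mydt, per_group and total are hypothetical.

```r
library(data.table)

# Sample rows from the question (the elided "..." rows are not included)
mydf <- data.frame(
  patient    = c(1, 1, 1, 1, 2, 2, 2, 2),
  chrom      = c(1, 1, 2, 2, 1, 1, 2, 2),
  start      = c(51599, 88466, 2785, 290880, 51599, 4101675, 2785, 36185342),
  end        = c(62640, 16022503, 285255, 4178544, 4098530, 46753618, 36178040, 36192717),
  probecount = c(8, 8676, 186, 2903, 1282, 25229, 25931, 21)
)

# as.data.table() copies, so mydf stays a plain data.frame
mydt <- as.data.table(mydf)

# := adds the column by reference to mydt only;
# by = .(patient, chrom) is equivalent to by = list(patient, chrom)
mydt[, total := sum(probecount), by = .(patient, chrom)]

# One row per group instead (groups appear in order of first occurrence)
per_group <- mydt[, .(total = sum(probecount)), by = .(patient, chrom)]

print(mydt)
```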