r - Size of nested vs. unnested (tidy) data.frame?


This question uses a data.frame that contains list-columns (nested). It had me wondering why/if there's an advantage to working this way. I assumed you would want to minimize the amount of memory each table uses... but when I checked I was surprised:

Compare table sizes for nested vs. tidy format:

1. Generate nested/tidy versions of a 2-col and a 5-col data.frame:

    library(pryr)
    library(dplyr)
    library(tidyr)
    library(ggvis)

    n <- 1:1e6

    df <- data_frame(id = n,
                     vars = lapply(n, function(x) sample(letters, sample(1:26, 1))))
    dfu <- df %>% unnest(vars)

    df_morecols <- data_frame(id = n, other1 = n, other2 = n, other3 = n,
                     vars = lapply(n, function(x) sample(letters, sample(1:26, 1))))
    dfu_morecols <- df_morecols %>% unnest(vars)

They look like:

    head(df)
    #> Source: local data frame [6 x 2]
    #>   id      vars
    #> 1  1 <chr[16]>
    #> 2  2  <chr[4]>
    #> 3  3 <chr[26]>
    #> 4  4  <chr[9]>
    #> 5  5 <chr[11]>
    #> 6  6 <chr[18]>

    head(dfu)
    #> Source: local data frame [6 x 2]
    #>   id vars
    #> 1  1    k
    #> 2  1    d
    #> 3  1    s
    #> 4  1    j
    #> 5  1    m
    #> 6  1    t

    head(df_morecols)
    #> Source: local data frame [6 x 5]
    #>   id other1 other2 other3      vars
    #> 1  1      1      1      1  <chr[4]>
    #> 2  2      2      2      2 <chr[22]>
    #> 3  3      3      3      3 <chr[24]>
    #> 4  4      4      4      4  <chr[6]>
    #> 5  5      5      5      5 <chr[15]>
    #> 6  6      6      6      6 <chr[11]>

    head(dfu_morecols)
    #> Source: local data frame [6 x 5]
    #>   id other1 other2 other3 vars
    #> 1  1      1      1      1    r
    #> 2  1      1      1      1    p
    #> 3  1      1      1      1    s
    #> 4  1      1      1      1    w
    #> 5  2      2      2      2    l
    #> 6  2      2      2      2    j

2. Calculate object sizes and col sizes

From lapply(list(df, dfu, df_morecols, dfu_morecols), object_size):

170 MB vs. 162 MB for the nested vs. tidy 2-col df
170 MB vs. 324 MB for the nested vs. tidy 5-col df
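
As a rough sanity check on the two tidy figures, here is a back-of-envelope sketch. It assumes a 64-bit R build, 4 bytes per integer element and 8 bytes per character element (a pointer to a shared string), ignores vector headers, and uses the expected row count rather than the actual random one. The helper `tidy_size_mb` is a name made up for this illustration.

```r
# Back-of-envelope estimate of the tidy frames (assumptions: 64-bit R,
# 4 bytes per integer element, 8 bytes per character element, vector
# headers ignored, sizes in units of 1e6 bytes).
n_ids    <- 1e6
mean_len <- mean(1:26)        # expected letters per id: 13.5
n_rows   <- n_ids * mean_len  # expected rows after unnest(): 1.35e7

bytes_int <- 4  # one integer element
bytes_chr <- 8  # one character element (pointer to a shared string)

# Hypothetical helper: estimated size in MB of a tidy frame with
# `n_int_cols` integer columns plus the single character column `vars`.
tidy_size_mb <- function(n_int_cols) {
  (n_int_cols * bytes_int + bytes_chr) * n_rows / 1e6
}

tidy_size_mb(1)  # 2-col tidy frame (id + vars)            -> 162
tidy_size_mb(4)  # 5-col tidy frame (id, other1..3 + vars) -> 324
```

Under these assumptions the estimates land on the reported 162 MB and 324 MB, which suggests the tidy sizes are dominated by row count times per-element width. The nested sizes are not covered by this sketch.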

    # c() on data frames concatenates their columns into one list
    col_sizes <- sapply(c(df, dfu, df_morecols, dfu_morecols), object_size)
    col_names <- names(col_sizes)
    parent_obj <- c(rep(c('df', 'dfu'), each = 2),
                    rep(c('df_morecols', 'dfu_morecols'), each = 5))
    res <- data_frame(parent_obj, col_names, col_sizes) %>%
      unite(elementof, parent_obj, col_names, remove = FALSE)

3. Plot column sizes coloured by parent object:

    res %>%
      ggvis(y = ~elementof, x = ~0, x2 = ~col_sizes, fill = ~parent_obj) %>%
      layer_rects(height = band())

(Image: plot of column sizes)

Questions:

  • What explains the smaller footprint of the tidy 2-col df compared to the nested one?
  • Why doesn't this effect change for the 5-col df?
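
One way to probe the first question is to measure how many extra bytes a list-column pays per element. The sketch below is only an illustration under stated assumptions: it uses integer payloads instead of letters so that base R's `object.size()` is not affected by string sharing, and it uses 1e4 elements rather than 1e6. The names `nested_col`, `flat_col` and `overhead_per_element` are made up for this example.

```r
# Sketch: per-element overhead of a list-column versus one flat vector.
# Assumption: integer payloads stand in for the letters in the question.
set.seed(42)
n    <- 1e4
lens <- sample(1:26, n, replace = TRUE)

nested_col <- lapply(lens, function(k) sample.int(26L, k))  # one small vector per id
flat_col   <- unlist(nested_col)                            # same values in one vector

nested_bytes <- as.numeric(object.size(nested_col))
flat_bytes   <- as.numeric(object.size(flat_col))

# Extra bytes paid per list element (vector header, list pointer, padding)
overhead_per_element <- (nested_bytes - flat_bytes) / n

print(c(nested = nested_bytes, flat = flat_bytes,
        per_element = overhead_per_element))
```

In the nested layout this overhead is paid once per id. In the tidy layout the cost is instead the repetition of `id` (and of any other column) on every row, so the balance between the two may shift as columns are added.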

