Call data.frame columns inside of R functions?
What is the proper way to do this?
I have a function that works great on its own given a series of inputs, and I'd like to use this function on a large dataset rather than on singular values, by looping through the data row by row. I have tried to update the function to call data.frame columns rather than vector values, but have been unsuccessful.
A simple example of this is:
Let's say I have a data.frame with 4 columns: data$id, data$height, data$weight, data$gender. I want to write a function that will loop over each row (using apply) and calculate BMI (kg/m^2). I know this would be easy with dplyr, but I would like to learn how to do it without resorting to external packages, and I can't find a clear answer on how to reference the columns within the function.
I apologize in advance if this is a duplicate. I've been searching Stack Overflow pretty thoroughly in hopes of finding an existing example.
I think this is what you're looking for. The easiest way to refer to columns of a data frame functionally is to use quoted column names. In principle, what you're doing is this:
data[, "weight"] / data[, "height"]^2
But inside a function you might want to let the user specify that the height or weight column is named differently, so you can write your function like this:
add_bmi = function(data, height_col = "height", weight_col = "weight") {
  # BMI = weight (kg) / height (m)^2, computed from the named columns
  data$bmi = data[, weight_col] / data[, height_col]^2
  return(data)
}
This function will assume the columns to use are named "height" and "weight" by default, but the user can specify other names if necessary. You could write a similar solution using column indices instead, but using names tends to be easier to debug.
Functions this simple are rarely useful. If you're calculating BMI for a lot of datasets, it may be worth keeping this function around, but since it is a one-liner in base R you probably don't need it.
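As a minimal sketch of how the defaults can be overridden, assume a made-up data frame called people whose columns are named ht_m and wt_kg (hypothetical names; heights in metres, weights in kilograms). The function here is a version of add_bmi that computes weight / height^2:

```r
# Hypothetical example data: heights in metres, weights in kilograms
people <- data.frame(
  id    = 1:3,
  ht_m  = c(1.60, 1.75, 2.00),
  wt_kg = c(64, 70, 100)
)

# Column names are passed as strings, with "height" and "weight" as defaults
add_bmi <- function(data, height_col = "height", weight_col = "weight") {
  data$bmi <- data[, weight_col] / data[, height_col]^2
  data
}

# Override both defaults because this data frame uses different names
people <- add_bmi(people, height_col = "ht_m", weight_col = "wt_kg")
people$bmi
```

Because the whole column is divided at once, no explicit loop over rows is needed.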
my_data$bmi = with(my_data, weight / height^2)
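For comparison with the row-by-row apply mentioned in the question, here is a rough sketch on a made-up my_data (the values are invented for illustration). It assumes only the numeric columns are handed to apply, since apply converts a data frame to a matrix first and mixed column types would turn everything into character:

```r
# Hypothetical example data
my_data <- data.frame(
  id     = 1:3,
  height = c(1.60, 1.75, 2.00),
  weight = c(64, 70, 100)
)

# Vectorised: operates on whole columns at once
bmi_vec <- with(my_data, weight / height^2)

# Row-wise: apply() over the numeric columns only (MARGIN = 1 means rows)
bmi_row <- apply(
  my_data[, c("height", "weight")],
  1,
  function(row) row[["weight"]] / row[["height"]]^2
)

# Both approaches give the same numbers
all.equal(unname(bmi_row), bmi_vec)
```

The vectorised form is shorter and avoids the matrix conversion, which is why the one-liner is usually preferable.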
One note is that using column names stored in variables means you can't use $. This is the price you pay for making things more programmatic, and it's a good habit to form for such applications. See fortunes::fortune(343):
Sooner or later most R beginners are bitten by this all too convenient shortcut. As an R newbie, think of R as your bank account: overuse of $-extraction can lead to undesirable consequences. It's best to acquire the '[[' and '[' habit early.
-- Peter Ehlers (about the use of $-extraction), R-help (March 2013)
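A small sketch of why $ does not work with a column name stored in a variable, using a made-up data frame df and a hypothetical variable named col:

```r
# Hypothetical example data
df <- data.frame(height = c(1.60, 1.75), weight = c(64, 70))
col <- "weight"

# `$` takes its argument literally: it looks for a column named "col"
res_dollar <- df$col

# `[[` and `[` evaluate the variable, so they find the "weight" column
res_double <- df[[col]]   # the column as a plain vector
res_matrix <- df[, col]   # the same values via matrix-style indexing
res_single <- df[col]     # a one-column data frame

is.null(res_dollar)
res_double
```

Here df$col is NULL because there is no column literally called "col", while df[[col]] returns the weights.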
For fancier usage like dplyr, where you don't have to quote column names and such (and can evaluate expressions), the lazyeval package makes things relatively painless and has nice vignettes.
The base function with can be used for some lazy evaluation, e.g.,
with(mtcars, plot(disp, mpg))
# with() is nice here; the equivalent without it is:
plot(mtcars$disp, mtcars$mpg)
But with is best used interactively and in straightforward scripts. If you are writing programmatic production code (e.g., your own R package), it's safer to avoid non-standard evaluation. See, for example, the warning in ?subset, another base R function that uses non-standard evaluation.
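As a rough illustration of the programmatic alternative, here is a sketch of a helper that takes column names as strings instead of relying on non-standard evaluation. The name col_cor is hypothetical, and cor is used only as a stand-in for whatever computation is needed:

```r
# Hypothetical helper: a standard-evaluation counterpart to
# with(mtcars, cor(disp, mpg)), taking column names as strings
col_cor <- function(data, x_col, y_col) {
  # Fail early with a clear error if a column name is misspelled
  stopifnot(x_col %in% names(data), y_col %in% names(data))
  cor(data[[x_col]], data[[y_col]])
}

programmatic <- col_cor(mtcars, "disp", "mpg")
interactive_style <- with(mtcars, cor(disp, mpg))

# Both routes compute the same correlation
all.equal(programmatic, interactive_style)
```

The string-based version is more verbose, but the column names can come from variables or function arguments, and nothing depends on how an unquoted name happens to be looked up.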