How to get the Gini coefficient using random forests in the caret R package?
I'm trying to understand the difference between the random forest implementation in the randomForest package and the one in the caret package.
For example, this specifies 2000 trees with mtry = 2 in randomForest, and shows the Gini coefficient for each predictor:
    library(randomForest)
    library(dplyr)  # provides %>%, add_rownames() and rename()

    rf1 <- randomForest(Species ~ ., data = iris, ntree = 2000, mtry = 2, importance = TRUE)

    # Mean decrease in Gini impurity per predictor, largest first
    data.frame(rf = sort(importance(rf1)[, "MeanDecreaseGini"], decreasing = TRUE)) %>%
      add_rownames() %>%
      rename(predictor = rowname)
    #      predictor       rf
    # 1  Petal.Width 45.57974
    # 2 Petal.Length 41.61171
    # 3 Sepal.Length  9.59369
    # 4  Sepal.Width  2.47010
I'm trying to get the same info in caret, but I don't know how to specify the number of trees, or how to get the Gini coefficient:
    library(caret)

    rf2 <- train(Species ~ ., data = iris, method = "rf", metric = "Kappa",
                 tuneGrid = data.frame(mtry = 2))
    varImp(rf2)  # not the Gini coefficient
    #              Overall
    # Petal.Length 100.000
    # Petal.Width   99.307
    # Sepal.Width    0.431
    # Sepal.Length   0.000
Also, the confusion matrix of rf1 has errors, and that of rf2 doesn't. What parameter is causing this difference?
    # rf1 confusion matrix:
    #            setosa versicolor virginica class.error
    # setosa         50          0         0        0.00
    # versicolor      0         47         3        0.06
    # virginica       0          4        46        0.08

    table(predict(rf2, iris), iris$Species)
    #            setosa versicolor virginica
    # setosa         50          0         0
    # versicolor      0         50         0
    # virginica       0          0        50
This is quick and dirty. I know this isn't the right way to test the performance of a classifier, but I don't understand the difference in results.
This might answer your question - see the 2nd post in:
"caret: using random forest and include cross-validation"
randomForest samples with replacement, i.e. it bootstraps. If you use method = "rf" in caret, you need to specify trControl in caret::train(). If you want caret to use the same resampling scheme, i.e. the bootstrap with out-of-bag estimates, set trControl = trainControl(method = "oob"). trControl is a list of values that defines how the function acts; its method can be set to "cv" for cross-validation, "repeatedcv" for repeated cross-validation, etc. See the caret package documentation for more info.
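As a rough sketch of the above, and not necessarily the only way to do it: the call below asks caret for out-of-bag estimates, passes ntree through the `...` argument of train() (which forwards it to randomForest()), and reads the Gini-based importance from the underlying randomForest object stored in finalModel. The seed value 42 and the name gini are arbitrary choices made for this illustration.

```r
library(randomForest)
library(caret)

set.seed(42)  # arbitrary seed, used only to make the run reproducible

rf2 <- train(
  Species ~ ., data = iris,
  method    = "rf",
  metric    = "Kappa",
  tuneGrid  = data.frame(mtry = 2),
  trControl = trainControl(method = "oob"),  # out-of-bag estimates
  ntree     = 2000                           # forwarded via ... to randomForest()
)

# The fitted randomForest object lives in rf2$finalModel
gini <- importance(rf2$finalModel)[, "MeanDecreaseGini"]
sort(gini, decreasing = TRUE)

# Out-of-bag confusion matrix, in the same layout as the one printed for rf1
rf2$finalModel$confusion
```

varImp() rescales importance to a 0-100 range by default, which is why its output does not look like the raw MeanDecreaseGini values.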
You should then get the same result as with randomForest directly; remember to set the seeds properly.
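On the confusion matrices specifically, here is a small sketch of what probably explains the gap. The matrix printed with a randomForest object is built from out-of-bag predictions, whereas predict(model, iris) scores rows that the trees were trained on. The seed and the names oob_tab and train_tab are assumptions made for this example.

```r
library(randomForest)

set.seed(1)  # arbitrary seed; set it before every fit you want to compare
rf1 <- randomForest(Species ~ ., data = iris, ntree = 2000, mtry = 2)

# Without newdata, predict() returns the out-of-bag predictions,
# which are what rf1$confusion is computed from
oob_tab <- table(observed = iris$Species, predicted = predict(rf1))

# With newdata = iris, every tree votes, including trees that saw the row in training
train_tab <- table(observed = iris$Species, predicted = predict(rf1, iris))

oob_tab
train_tab
```

The first table should match the counts in rf1$confusion, while the second is a resubstitution estimate and will usually look far more optimistic.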