How to get the Gini coefficient using random forests in the caret R package?


I'm trying to understand the difference between the random forest implementation in the randomForest package and the one in the caret package.

For example, this specifies 2000 trees with mtry = 2 in randomForest and shows the Gini coefficient (the mean decrease in Gini) for each predictor:

```r
library(randomForest)
library(dplyr)  # provides %>%; rename() is not exported by tidyr, which the original loaded

rf1 <- randomForest(Species ~ ., data = iris,
                    ntree = 2000, mtry = 2,
                    importance = TRUE)

# Gini importance per predictor, sorted, with row names moved into a column
data.frame(rf = sort(importance(rf1)[, "MeanDecreaseGini"], decreasing = TRUE)) %>%
  tibble::rownames_to_column("predictor")
#      predictor       rf
# 1  Petal.Width 45.57974
# 2 Petal.Length 41.61171
# 3 Sepal.Length  9.59369
# 4  Sepal.Width  2.47010
```

I'm trying to get the same information from caret, but I don't know how to specify the number of trees or how to get the Gini coefficient:

```r
library(caret)

rf2 <- train(Species ~ ., data = iris, method = "rf",
             metric = "Kappa",
             tuneGrid = data.frame(mtry = 2))
varImp(rf2)  # not the Gini coefficient
#              Overall
# Petal.Length 100.000
# Petal.Width   99.307
# Sepal.Width    0.431
# Sepal.Length   0.000
```
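caret's `method = "rf"` fits its models with `randomForest::randomForest()`, and, as far as I know, `train()` passes any argument it does not recognise (here `ntree`) on to that function through `...`. A minimal sketch under that assumption; the seed value is arbitrary. `rf2$finalModel` is the forest refit on the full data, so `importance()` can be called on it directly, and `varImp()` appears to return that same Gini measure rescaled to 0-100, which would explain the Overall column above.

```r
library(caret)         # train(), varImp(); method = "rf" wraps randomForest::randomForest()
library(randomForest)  # importance()

set.seed(2000)  # arbitrary seed, for reproducibility only
rf2 <- train(Species ~ ., data = iris, method = "rf",
             metric   = "Kappa",
             tuneGrid = data.frame(mtry = 2),
             ntree    = 2000)  # not a train() argument: passed on to randomForest()

# The forest refit on all of iris is an ordinary randomForest object
rf2$finalModel$ntree

# Raw mean decrease in Gini, the same column rf1 reports
gini <- sort(importance(rf2$finalModel)[, "MeanDecreaseGini"], decreasing = TRUE)
gini

# varImp() rescales importances to 0-100 by default; scale = FALSE keeps raw values
varImp(rf2)
varImp(rf2, scale = FALSE)
```

With caret's default resampling (25 bootstrap resamples) this grows 2000 trees per resample plus the final fit, so expect it to take a few seconds.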

Also, the confusion matrix of rf1 has errors, but that of rf2 doesn't. Which parameter is causing the difference?

```r
# rf1 confusion matrix:
#            setosa versicolor virginica class.error
# setosa         50          0         0        0.00
# versicolor      0         47         3        0.06
# virginica       0          4        46        0.08

table(predict(rf2, iris), iris$Species)
#             setosa versicolor virginica
#  setosa         50          0         0
#  versicolor      0         50         0
#  virginica       0          0        50
```

This is quick and dirty. I know it isn't the right way to test the performance of a classifier, but I don't understand the difference in the results.
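One plausible explanation, which does not involve any parameter: the two tables measure different things. rf1's printed confusion matrix uses out-of-bag (OOB) predictions, where each flower is classified only by trees whose bootstrap sample left it out, while `table(predict(rf2, iris), iris$Species)` classifies the training rows with every tree (resubstitution), which a random forest typically fits almost perfectly. A sketch comparing both on rf1 (seed arbitrary):

```r
library(randomForest)

set.seed(2000)  # arbitrary seed
rf1 <- randomForest(Species ~ ., data = iris, ntree = 2000, mtry = 2, importance = TRUE)

# OOB confusion matrix (rows = actual, columns = predicted), as printed for rf1
rf1$confusion

# predict() without newdata returns those same OOB predictions
oob_pred <- predict(rf1)
table(predicted = oob_pred, actual = iris$Species)

# Predicting the training rows with all trees: what the rf2 table above did
resub_pred <- predict(rf1, newdata = iris)
table(predicted = resub_pred, actual = iris$Species)
```

If the resubstitution table comes out (near) perfect while the OOB one shows a few versicolor/virginica mix-ups, the rf2 table says nothing about caret's model being better.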

This might answer your question (see the 2nd post):

caret: using random forest and include cross-validation

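randomForest samples with replacement: each tree is grown on a bootstrap sample, and the rows left out of it give the out-of-bag (OOB) error estimate. If you use method = "rf" in caret, you need to specify trControl in caret::train(). If you want the same resampling method as randomForest, i.e. the bootstrap with its OOB estimate, set trControl = trainControl(method = "oob"). trainControl() returns a list of values that defines how train() behaves; you can set method = "cv" for cross-validation, "repeatedcv" for repeated cross-validation, etc. See the caret package documentation for more info.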
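The resampling options above, sketched as trainControl() objects. The method names "oob", "boot", "cv" and "repeatedcv" are caret's own; the fold and repeat counts are arbitrary examples. Only the OOB variant is fitted here, since it mirrors what randomForest reports:

```r
library(caret)

# Resampling schemes for train(); counts are illustrative choices
ctrl_oob  <- trainControl(method = "oob")                 # out-of-bag estimate, as randomForest reports
ctrl_boot <- trainControl(method = "boot", number = 25)   # caret's default: 25 bootstrap resamples
ctrl_cv   <- trainControl(method = "cv", number = 10)     # 10-fold cross-validation
ctrl_rcv  <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

set.seed(2000)  # arbitrary seed
rf_oob <- train(Species ~ ., data = iris, method = "rf",
                tuneGrid  = data.frame(mtry = 2),
                trControl = ctrl_oob,
                ntree     = 2000)  # forwarded to randomForest()

rf_oob$results     # OOB Accuracy and Kappa for mtry = 2
rf_oob$finalModel  # prints the OOB error rate and confusion matrix, like rf1
```

Swapping `ctrl_oob` for `ctrl_cv` or `ctrl_rcv` changes only how performance is estimated; the final forest is still refit on all rows.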

You should then get essentially the same results as with randomForest; remember to set the seed properly (set.seed()) before each fit.
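On seeds: randomForest draws its bootstrap samples and candidate split variables from R's random number generator, so calling set.seed() with the same value immediately before each fit should reproduce the forest exactly. A small check (seed values arbitrary); a caret::train() call should be reproducible the same way as long as it is not run in parallel:

```r
library(randomForest)

# Same seed immediately before each fit -> identical forests
set.seed(42)
fit_a <- randomForest(Species ~ ., data = iris, ntree = 2000, mtry = 2)
set.seed(42)
fit_b <- randomForest(Species ~ ., data = iris, ntree = 2000, mtry = 2)

identical(fit_a$err.rate, fit_b$err.rate)
identical(importance(fit_a), importance(fit_b))

# A different seed gives a slightly different forest and Gini values
set.seed(43)
fit_c <- randomForest(Species ~ ., data = iris, ntree = 2000, mtry = 2)
rbind(seed42 = importance(fit_a)[, "MeanDecreaseGini"],
      seed43 = importance(fit_c)[, "MeanDecreaseGini"])
```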

