randomForest {RFO} | R Documentation |
randomForest
implements Breiman's random forest algorithm (based on
Breiman and Cutler's original Fortran code) for classification.
## S3 method for class 'formula'
randomForest(formula, data = NULL, ..., subset, na.action = na.fail)

## Default S3 method:
randomForest(x, y, ntree = 500,
             mtry = floor(sqrt(ncol(x))),
             replace = TRUE, classwt = NULL, cutoff,
             sampsize = if (replace) nrow(x) else ceiling(.632*nrow(x)),
             nodesize = if (!is.null(y) && !is.factor(y)) 5 else 1,
             maxnodes = NULL, na.action = na.fail,
             internal = FALSE, ...)

## S3 method for class 'randomForest'
print(x, ...)
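The default expressions for mtry, sampsize and nodesize in the signature can be evaluated by hand to see what a call would use. The following is a minimal base R sketch on the built-in iris data; sampsize_default is a hypothetical helper written only for this illustration and is not part of the package.

```r
# Predictors and response taken from the built-in iris data set.
x <- iris[, 1:4]
y <- iris$Species

# Default mtry: floor of the square root of the number of predictors.
mtry_default <- floor(sqrt(ncol(x)))   # 4 predictors -> 2

# Hypothetical helper mirroring the default sampsize expression:
# all rows when sampling with replacement, about 63.2% of them otherwise.
sampsize_default <- function(x, replace = TRUE) {
  if (replace) nrow(x) else ceiling(.632 * nrow(x))
}
sampsize_default(x, replace = TRUE)    # 150
sampsize_default(x, replace = FALSE)   # 95

# Default nodesize: 1 for a factor response, 5 for a non-factor response.
nodesize_default <- if (!is.null(y) && !is.factor(y)) 5 else 1   # 1
```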
data |
an optional data frame containing the variables in the model.
By default the variables are taken from the environment which
|
subset |
an index vector indicating which rows should be used. (NOTE: If given, this argument must be named.) |
na.action |
A function to specify the action to be taken if NAs are found. (NOTE: If given, this argument must be named.) |
x, formula |
a data frame or a matrix of predictors, or a formula
describing the model to be fitted (for the
|
y |
A response vector of |
ntree |
Number of trees to grow. This should not be set to too small a number, to ensure that every input row gets predicted at least a few times. |
mtry |
Number of variables randomly sampled as candidates at each split. |
replace |
Should sampling of cases be done with or without replacement? |
classwt |
Priors of the classes. Need not add up to one. |
cutoff |
A vector of length equal to number of classes. The ‘winning’ class for an observation is the one with the maximum ratio of proportion of votes to cutoff. Default is 1/k where k is the number of classes (i.e., majority vote wins). |
sampsize |
Size(s) of sample to draw. |
nodesize |
Minimum size of terminal nodes. Setting this number larger causes smaller trees to be grown (and thus take less time). |
maxnodes |
Maximum number of terminal nodes trees in the forest
can have. If not given, trees are grown to the maximum possible
(subject to limits by |
internal |
For internal testing only. |
... |
optional parameters to be passed to the low level function
|
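The cutoff rule described above (the winning class maximizes the ratio of vote proportion to cutoff) can be sketched in base R. The vote proportions and the helper winning_class below are hypothetical, made up for illustration; they are not output from, or part of, the package.

```r
# Hypothetical helper: pick the class with the largest votes/cutoff ratio.
winning_class <- function(votes, cutoff) {
  names(votes)[which.max(votes / cutoff)]
}

# Made-up vote proportions for one observation over three classes.
votes <- c(setosa = 0.30, versicolor = 0.45, virginica = 0.25)
k <- length(votes)

# Default cutoff 1/k: the class with the most votes wins.
winning_class(votes, rep(1 / k, k))       # "versicolor"

# A higher cutoff for versicolor makes it harder for that class to win:
# the ratios become 1.5, 0.75 and 1.25.
winning_class(votes, c(0.2, 0.6, 0.2))    # "setosa"
```

Lowering the cutoff for a class has the opposite effect and lets it win with a smaller share of the votes.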
An object of class randomForest, which is a list with the
following components:
call |
the original call to |
type |
|
classes |
the classes of the target. |
ntree |
number of trees grown. |
mtry |
number of predictors sampled for splitting at each node. |
forest |
a list that contains the entire forest. |
cutoff |
the cutoff vector used to build the model. |
ncat |
the number of levels of the attributes. |
attr.names |
the names of the attributes. |
xlevels |
the levels of the attributes. |
For large data sets, especially those with a large number of variables,
calling randomForest via the formula interface is not advised:
there may be too much overhead in handling the formula.
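One way to avoid the formula interface is to split the data frame into a predictor table and a response vector and pass them as x and y. This is a minimal sketch on the built-in iris data; the fit itself is guarded so the snippet also runs when no randomForest function is loaded.

```r
# Separate predictors and response instead of writing Species ~ .
x <- iris[, names(iris) != "Species"]
y <- iris$Species

dim(x)        # 150 rows, 4 predictor columns
is.factor(y)  # TRUE, so this is a classification problem

# Fit through the default (x, y) interface, assuming the package that
# provides randomForest has already been loaded in this session.
if (exists("randomForest", mode = "function")) {
  fit <- randomForest(x, y)
  print(fit)
}
```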
Lei Zhang lei.c.zhang@oracle.com, Andy Liaw andy_liaw@merck.com and Matthew Wiener matthew_wiener@merck.com, based on original Fortran code by Leo Breiman and Adele Cutler.
Breiman, L. (2001), Random Forests, Machine Learning 45(1), 5-32.
Breiman, L. (2002), “Manual On Setting Up, Using, And Understanding Random Forests V3.1”, http://oz.berkeley.edu/users/breiman/Using_random_forests_V3.1.pdf.
## Classification:
##data(iris)
set.seed(71)
iris.rf <- randomForest(Species ~ ., data=iris)
print(iris.rf)

## "x" can be a matrix instead of a data frame:
set.seed(17)
x <- matrix(runif(5e2), 100)
y <- gl(2, 50)
(myrf <- randomForest(x, y))
(predict(myrf, x))

## Grow no more than 4 terminal nodes per tree:
rf <- randomForest(Species ~ ., data=iris, maxnodes=4, ntree=30)