A Random Forest (RF) generates an ensemble of decision trees during training.
Each tree is the result of applying CART to a random selection of attributes/features at each node,
and of using a random subset of the original input data, chosen with replacement (bootstrapping; bagging = bootstrap aggregation).
Response variables are obtained over the ensemble by voting (simple majority, for classification) or averaging (for regression).
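The two aggregation rules above can be sketched in a few lines; the per-tree outputs here are made-up placeholder values, not results from real trees:

```python
from collections import Counter
from statistics import mean

# Hypothetical per-tree outputs for a single query point.
tree_votes = ["spam", "ham", "spam", "spam", "ham"]   # classification trees
tree_values = [3.1, 2.8, 3.4, 3.0, 2.9]               # regression trees

# Classification: simple majority vote over the ensemble.
majority = Counter(tree_votes).most_common(1)[0][0]

# Regression: average the trees' numeric predictions.
average = mean(tree_values)

print(majority)  # -> spam
print(average)   # -> 3.04
```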
Input data: N training cases, each with M variables
n out of N samples are chosen with replacement (bootstrapping).
The remaining samples are used to estimate the error of the tree (out-of-bag samples).
m << M variables are used to determine the decision at each node of the tree.
Each tree is fully grown and not pruned
Output of the ensemble: aggregation of the output of the trees
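The bootstrap/out-of-bag split described in the steps above can be sketched with the standard library (N = 100 and the seed are illustrative choices):

```python
import random

random.seed(0)

N = 100                    # total training cases
data = list(range(N))

# Bootstrap: draw N samples *with replacement* from the N cases.
bootstrap = [random.choice(data) for _ in range(N)]

# Out-of-bag: the cases never drawn into this bootstrap sample;
# they serve as a held-out set to estimate this tree's error.
oob = [x for x in data if x not in set(bootstrap)]

# On average about 36.8% of cases end up out of bag,
# the limit of (1 - 1/N)^N as N grows.
print(len(set(bootstrap)), len(oob))
```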
Advantages
Robust to overfitting compared with a single decision tree
Handles many input variables and provides estimates of variable importance
Out-of-bag samples give an error estimate without a separate test set
Disadvantages
Less interpretable than a single decision tree
Slower to train and predict than a single tree, with a larger memory footprint
Summary of Steps in Random Forest with Bootstrapping
Example
Assume you have a dataset with 100 data points. To build a Random Forest with 10 trees:
1. Draw 10 bootstrap samples, each of 100 points chosen with replacement from the original data.
2. Grow one unpruned tree per sample, selecting m << M random variables at each node.
3. Use each tree's out-of-bag points (those not drawn into its sample) to estimate its error.
4. To predict, aggregate the 10 trees' outputs by majority vote (classification) or averaging (regression).
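The 100-point, 10-tree example can be sketched end to end. This is a minimal illustration, not real CART: each "tree" is replaced by a one-threshold stump, and the toy one-dimensional data and labels are invented for the demo:

```python
import random
from collections import Counter

random.seed(42)

# Toy dataset: 100 one-dimensional points, label 1 if x > 50.
X = list(range(100))
y = [1 if x > 50 else 0 for x in X]

def fit_stump(xs, ys):
    """Stand-in for CART: one threshold at the midpoint of the class means."""
    ones = [x for x, label in zip(xs, ys) if label == 1]
    zeros = [x for x, label in zip(xs, ys) if label == 0]
    return (sum(ones) / len(ones) + sum(zeros) / len(zeros)) / 2

# Build 10 "trees", each fit on its own bootstrap sample
# (100 indices drawn with replacement).
thresholds = []
for _ in range(10):
    idx = [random.randrange(len(X)) for _ in range(len(X))]
    thresholds.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))

def predict(x):
    # Aggregate the ensemble by simple majority vote.
    votes = [1 if x > t else 0 for t in thresholds]
    return Counter(votes).most_common(1)[0][0]

print(predict(80), predict(20))  # -> 1 0
```

Because every stump's threshold lands near 50, the ensemble agrees on points far from the boundary; the interesting averaging effect of bagging shows up on points close to it.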