Random Forest

Random forest (RF) is a machine learning algorithm that combines the output of multiple decision trees to reach a single result.
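For illustration, a forest can be trained and evaluated in a few lines; a minimal sketch assuming scikit-learn is available (the dataset and parameter values are illustrative):

  from sklearn.datasets import load_iris
  from sklearn.ensemble import RandomForestClassifier
  from sklearn.model_selection import train_test_split

  X, y = load_iris(return_X_y=True)
  X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

  # 100 decision trees; each tree votes and the majority decides the class
  rf = RandomForestClassifier(n_estimators=100, random_state=0)
  rf.fit(X_train, y_train)
  print(rf.score(X_test, y_test))  # accuracy on held-out data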

Introduction

Leo Breiman continued working on decision trees (DT), and around the
year 2000 he demonstrated that regression results and classification
accuracy can be improved by using ensembles of trees, where each tree
is grown in a “random” fashion.

Advantages

  1. No pruning needed
  2. High accuracy
  3. Provides variable importance (see the sketch after this list)
  4. Less prone to overfitting than a single tree; not very sensitive to outliers
  5. Fast and easy to implement
  6. Easily parallelized
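Variable importance and parallel training (points 3 and 6) are exposed directly in common implementations; a sketch with scikit-learn (illustrative parameter values):

  from sklearn.datasets import load_iris
  from sklearn.ensemble import RandomForestClassifier

  X, y = load_iris(return_X_y=True)
  # n_jobs=-1 grows the trees in parallel on all available cores
  rf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0).fit(X, y)
  # Impurity-based importance: one score per input variable, summing to 1
  print(rf.feature_importances_)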

Disadvantages

  1. Cannot predict beyond the range of the training data (illustrated in the sketch after this list)
  2. Smooths extreme values (underestimates high values, overestimates low values)
  3. More difficult to visualize or interpret than a single tree
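Points 1 and 2 can be demonstrated with a small regression experiment; a sketch on synthetic data (all values illustrative):

  import numpy as np
  from sklearn.ensemble import RandomForestRegressor

  # Train on y = x over the range [0, 10]
  X = np.linspace(0, 10, 200).reshape(-1, 1)
  y = X.ravel()
  rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

  # Predictions are averages of training targets stored in the leaves,
  # so they cannot leave the observed range: x = 100 still predicts ~10
  print(rf.predict([[5.0], [100.0]]))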

Explanation

Figure: Random forest algorithm (source: Javatpoint)

Random forest algorithm:

  • Input data: N training cases, each with M variables
  • n out of N samples are chosen with replacement (bootstrapping)
  • The remaining samples are used to estimate the error of the tree (out-of-bag samples)
  • m << M variables are used to determine the decision at each node of the tree
  • Each tree is fully grown and not pruned
  • Output of the ensemble: aggregation of the outputs of the trees (see the sketch below)
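These steps can be condensed into a short sketch; the helper fit_forest below is hypothetical, builds on scikit-learn's DecisionTreeClassifier, and reduces the out-of-bag bookkeeping to a per-tree error estimate:

  import numpy as np
  from sklearn.tree import DecisionTreeClassifier

  def fit_forest(X, y, n_trees=100, seed=0):
      """Grow an ensemble of randomized trees (simplified sketch)."""
      rng = np.random.default_rng(seed)
      N, M = X.shape
      m = max(1, int(np.sqrt(M)))                # m << M variables tried at each node
      forest, oob_errors = [], []
      for _ in range(n_trees):
          idx = rng.integers(0, N, size=N)       # n = N samples drawn with replacement
          oob = np.setdiff1d(np.arange(N), idx)  # the rest: out-of-bag samples
          tree = DecisionTreeClassifier(max_features=m)  # fully grown, not pruned
          tree.fit(X[idx], y[idx])
          if oob.size:                           # estimate this tree's error on OOB data
              oob_errors.append(1 - tree.score(X[oob], y[oob]))
          forest.append(tree)
      return forest, float(np.mean(oob_errors))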


Response variables:

  • Majority vote, for classification
  • Average, for regression
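Both aggregation rules can be written against the hypothetical fit_forest ensemble sketched above, assuming integer class labels:

  import numpy as np

  def predict_majority(forest, X):
      # Classification: each tree votes; the most frequent class label wins
      votes = np.stack([tree.predict(X) for tree in forest])  # (n_trees, n_samples)
      majority = lambda v: np.bincount(v.astype(int)).argmax()
      return np.apply_along_axis(majority, 0, votes)

  def predict_average(forest, X):
      # Regression: average the individual tree predictions
      return np.mean([tree.predict(X) for tree in forest], axis=0)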
