CART

Classification and Regression Trees

Introduction

CART is one of the most popular decision tree (DT) methods because

  • it can perform both classification (predicting discrete categories) and regression (predicting continuous values) tasks,
  • it is analytically rigorous.

Using a DT requires a clear definition of the following three elements:

  • A way to select a split at every intermediate node,
  • A rule for determining whether a node is a terminal one,
  • A rule for assigning a value to each terminal node.
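
The three elements can be read as three pluggable functions in a recursive tree builder. The sketch below is a minimal illustration under that assumption, not a reference implementation of CART. The names build_tree, choose_split, is_terminal, and leaf_value are hypothetical, the data is made up, and the median split in the usage part is only a placeholder, not CART's actual split criterion.

```python
from statistics import mean, median


def build_tree(rows, choose_split, is_terminal, leaf_value, depth=0):
    """Recursively build a binary tree from (x, y) rows with one numeric feature.

    choose_split(rows) -> threshold or None   (element 1: split selection)
    is_terminal(rows, depth) -> bool          (element 2: terminal-node rule)
    leaf_value(rows) -> prediction            (element 3: value of a terminal node)
    """
    if is_terminal(rows, depth):
        return {"leaf": leaf_value(rows)}
    threshold = choose_split(rows)
    if threshold is None:
        return {"leaf": leaf_value(rows)}
    left = [r for r in rows if r[0] <= threshold]
    right = [r for r in rows if r[0] > threshold]
    if not left or not right:  # the split separated nothing, so stop here
        return {"leaf": leaf_value(rows)}
    return {
        "threshold": threshold,
        "left": build_tree(left, choose_split, is_terminal, leaf_value, depth + 1),
        "right": build_tree(right, choose_split, is_terminal, leaf_value, depth + 1),
    }


# Placeholder choices for the three elements (assumptions for illustration only).
rows = [(1.0, 10.0), (2.0, 12.0), (8.0, 30.0), (9.0, 34.0)]
tree = build_tree(
    rows,
    choose_split=lambda rs: median(x for x, _ in rs),
    is_terminal=lambda rs, depth: depth >= 1 or len(rs) < 2,
    leaf_value=lambda rs: mean(y for _, y in rs),
)
print(tree)
```

Passing the three rules in as arguments keeps the recursion itself unchanged when a different split criterion or stopping rule is swapped in.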

Explanation

  • CART works by recursively splitting the data at each internal node based on a single feature that best separates the data points into distinct classes (classification) or reduces the variance in the target variable (regression).
  • This process continues until a stopping criterion is met, such as reaching a maximum depth, achieving sufficient purity within a node (in the extreme case, all data points belong to the same class), or having too few data points remaining for a meaningful split.
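
As a rough sketch of how a split on a single numeric feature could be chosen, the code below scores every candidate threshold by the size-weighted impurity of the two resulting groups. It assumes Gini impurity for classification (a common choice for CART, not stated above) and variance for regression. The function names and the toy data are hypothetical, and thresholds are taken from observed values for simplicity, whereas some implementations use midpoints between adjacent values.

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def variance(values):
    """Population variance of the target values in a node."""
    n = len(values)
    mu = sum(values) / n
    return sum((v - mu) ** 2 for v in values) / n


def best_threshold(xs, ys, impurity):
    """Return (threshold, weighted impurity) of the best split x <= threshold."""
    best = (None, float("inf"))
    # Skip the largest value: it would leave the right-hand group empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * impurity(left) + len(right) * impurity(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best


xs = [1, 2, 3, 10, 11, 12]
class_split = best_threshold(xs, ["a", "a", "a", "b", "b", "b"], gini)
value_split = best_threshold(xs, [5.0, 5.0, 5.0, 9.0, 9.0, 9.0], variance)
print(class_split, value_split)
```

In this toy data both criteria pick the threshold 3, because it separates the two groups perfectly and leaves zero impurity on each side.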

Advantages of CART:

  • Interpretability: Decision trees are easy to understand and visualize, making it clear how the model arrives at its predictions.
  • Can handle both categorical and numerical features: CART doesn't require feature scaling or complex transformations, making it applicable to a wider range of data types.
  • Robust to outliers: Because a split depends on the ordering of feature values rather than their magnitude, CART is less sensitive to outliers in the features than some other algorithms.

Disadvantages of CART:

  • Prone to overfitting: Without proper controls, CART trees can become overly complex and memorize the training data, leading to poor performance on unseen data.
  • Variable importance can be unstable: The importance of features in the tree can be sensitive to small changes in the data.

Things to remember:

1. During splitting at intermediate nodes, an object goes to the left if the condition is met; otherwise it goes to the right. [For continuous data, the condition has the form x <= threshold; for nominal data, x ∈ {A, B, C, D}.]

2. The split is always binary.

3. An attribute can be used multiple times, at different nodes of the same tree.

4. In decision trees, the value stored in a terminal node depends on whether the target the tree predicts is categorical or continuous:

  • for categorical data, the terminal node value reflects the most common class label (mode),
  • for continuous data, it represents the average target variable value (mean) within that group.
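
The routing rule from point 1 and the terminal-node values from point 4 can be sketched as two small functions. This is an illustrative sketch only: the names goes_left and leaf_value, the convention of passing a nominal condition as a Python set, and the sample values are all assumptions made for the example.

```python
from collections import Counter
from statistics import mean


def goes_left(value, condition):
    """Return True if the object goes to the left child.

    A set is treated as a nominal condition (membership test);
    anything else is treated as a numeric threshold (value <= threshold).
    """
    if isinstance(condition, (set, frozenset)):
        return value in condition
    return value <= condition


def leaf_value(targets, task):
    """Most common class for classification, mean target for regression."""
    if task == "classification":
        return Counter(targets).most_common(1)[0][0]
    return mean(targets)


print(goes_left(3.2, 5.0))                       # True: 3.2 <= 5.0, goes left
print(goes_left("E", {"A", "B", "C", "D"}))      # False: "E" is not in the set, goes right
print(leaf_value(["yes", "no", "yes"], "classification"))  # yes
print(leaf_value([2.0, 4.0, 6.0], "regression"))           # 4.0
```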

 

Examples
