An impure node is a node that contains data points with mixed class labels, meaning the data at that node has not yet been perfectly separated into distinct categories.
In decision tree algorithms, quantifying impurity is a crucial step for deciding how to split the nodes of the tree. Impurity measures help identify the best feature and the best threshold for splitting the data at each node. The most common impurity measures are Gini impurity, entropy (the basis of information gain), and mean squared error (for regression trees). Below, I'll explain each of these impurity measures and how they are used in decision trees.
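To make the split-selection idea concrete, here is a minimal sketch in Python with NumPy. The helper names (gini, weighted_gini, split_impurity) are illustrative, not from any particular library; the sketch scores each candidate threshold by the size-weighted Gini impurity of the two child nodes it would create and keeps the threshold with the lowest score. The formulas themselves are listed just below.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a node: 1 - sum(p_i^2) over the classes present."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def weighted_gini(labels, class_weights):
    """Class-weighted Gini: 1 - sum((w_i * n_i / t_c)^2), with t_c = sum(w_i * n_i)."""
    classes, counts = np.unique(labels, return_counts=True)
    w = np.array([class_weights[c] for c in classes])
    tc = np.sum(w * counts)
    return 1.0 - np.sum((w * counts / tc) ** 2)

def split_impurity(labels, feature_values, threshold):
    """Size-weighted Gini impurity of the two children produced by a split."""
    left = labels[feature_values <= threshold]
    right = labels[feature_values > threshold]
    n = len(labels)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

# Toy data: one numeric feature, binary labels.
feature = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
labels  = np.array([0,   0,   0,   1,   1,   1])

# Candidate thresholds are the midpoints between consecutive feature values.
thresholds = (feature[:-1] + feature[1:]) / 2
best = min(thresholds, key=lambda t: split_impurity(labels, feature, t))
print(best)  # 3.5 -> both children are pure, weighted impurity 0.0
```

For reference, this weighted-Gini splitting criterion is what scikit-learn's DecisionTreeClassifier uses by default (criterion='gini').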
Impurity can be quantified using:
Gini impurity (Gini coefficient): 1 - Σ pi^2, where pi is the proportion of samples in the node that belong to class i
Weighted Gini = 1 - Σ (wi * ni / tc)^2
where: