Random Forest

What

Random Forest is Decision Tree plus Bagging, with one extra twist. A single Decision Tree has access to all the given features and is responsible for predicting the output on its own. Bagging (Bootstrap Aggregating) instead trains multiple Trees, each on a bootstrap sample of the data drawn with replacement, and combines them through a vote for the final answer. Random Forest takes this a step further: instead of letting every Decision Tree see all the features, it randomly selects M features for each Tree, and each Tree is again trained on its own bootstrap sample of the data. The final decision is still the aggregated vote across all Trees. Both the bootstrap sampling and the random feature selection introduce stochasticity, which decorrelates the Trees and improves generalization.
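To make the mechanism concrete, here is a minimal from-scratch sketch that follows the per-Tree feature selection described above. The helper names (train_forest, forest_predict) and the hyperparameter values are illustrative, not from any particular library; only the DecisionTreeClassifier base learner comes from scikit-learn.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_forest(X, y, n_trees=50, m_features=3, seed=0):
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    forest = []
    for _ in range(n_trees):
        # Bagging: draw a bootstrap sample of the rows (with replacement).
        rows = rng.integers(0, n_samples, size=n_samples)
        # Random Forest twist: this Tree only sees M randomly chosen features.
        cols = rng.choice(n_features, size=m_features, replace=False)
        tree = DecisionTreeClassifier(random_state=0)
        tree.fit(X[np.ix_(rows, cols)], y[rows])
        forest.append((tree, cols))
    return forest

def forest_predict(forest, X):
    # Each Tree votes on its own feature subset; the majority class wins.
    votes = np.array([tree.predict(X[:, cols]) for tree, cols in forest])
    return np.array([np.bincount(v.astype(int)).argmax() for v in votes.T])

Usage on a toy dataset:

from sklearn.datasets import make_classification
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
forest = train_forest(X[:400], y[:400])
print("test accuracy:", (forest_predict(forest, X[400:]) == y[400:]).mean())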

The main goal of Bagging is to reduce variance in the output. Boosting, in comparison, is another ensemble technique that aims at reducing bias. Empirically, Boosting tends to give higher test accuracy.
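This claim is easy to probe with scikit-learn's standard ensemble estimators; the dataset, split, and hyperparameters below are arbitrary choices for illustration, not a benchmark.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_train, X_test, y_train, y_test = train_test_split(
    *load_breast_cancer(return_X_y=True), test_size=0.3, random_state=0)

# Bagging averages many deep (low-bias, high-variance) trees to cut variance.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0)
# Boosting adds many shallow (high-bias) trees sequentially to cut bias.
boosting = GradientBoostingClassifier(n_estimators=100, random_state=0)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    model.fit(X_train, y_train)
    print(name, "test accuracy:", model.score(X_test, y_test))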

How

 
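In practice one rarely builds the forest by hand; the sketch below shows the usual scikit-learn workflow. n_estimators is the number of Trees and max_features plays the role of M, although scikit-learn redraws the feature subset at every split rather than once per Tree. The dataset and hyperparameter values are placeholders.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    *load_iris(return_X_y=True), test_size=0.3, random_state=0)

model = RandomForestClassifier(
    n_estimators=100,     # number of Trees to train and aggregate
    max_features="sqrt",  # how many features each split may consider (the "M")
    random_state=0,
)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
print("feature importances:", model.feature_importances_)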

Advantages

Random Forest deals with high-dimensional data very well, since each Tree only works with a small random subset of the features.

Generalizes well to unseen data, due to the random feature selection and bootstrap sampling.

Caveats

 
