Variable Importance in Random Forests

When designing classifiers, success depends largely on selecting suitable spectral descriptors. Descriptors can be selected by many techniques, but random forests provide built-in support for identifying the relevant variables. While a random forest is trained, the algorithm tracks how often each descriptor is used by the trees of the forest and how many training data points are affected by each decision within a tree.
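
The bookkeeping described above can be illustrated with a minimal Python sketch. It assumes that each use of a descriptor is weighted by the number of training points reaching the corresponding split; the exact formula used by the software is not stated here, and the forest layout and descriptor names below are hypothetical.

```python
from collections import defaultdict

def raw_importance(forest):
    """Sum, per descriptor, the number of training points reaching each split that uses it."""
    totals = defaultdict(float)
    for tree in forest:
        for descriptor, n_samples in tree:
            # Assumption: a split affecting more training points counts more.
            totals[descriptor] += n_samples
    return dict(totals)

# Hypothetical forest: each tree is a list of (descriptor name, training points at the split).
forest = [
    [("band_550nm", 100), ("band_680nm", 60), ("band_550nm", 40)],
    [("band_680nm", 100), ("ratio_red_nir", 30)],
]

importance = raw_importance(forest)
print(importance)
```

A descriptor that is used often and near the root of the trees (where many training points pass through) accumulates a large total, while a descriptor used only in small branches stays low.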

This information is compiled into a single number that reflects the importance of a variable. The variable importance is calculated separately for each class. In addition, an overall importance across all classes is calculated by taking, for each descriptor, the maximum over all classes. The results are displayed in both tabular and graphical form and can be used to prune the list of descriptors.
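
As a small illustration of the overall importance, the following Python sketch takes the maximum per descriptor over all classes. The class names, descriptor names, and values are made up for the example, and the function name is hypothetical.

```python
def overall_importance(per_class):
    """Return, for each descriptor, the maximum importance over all classes."""
    # Assumption: every class reports a value for the same set of descriptors.
    descriptors = next(iter(per_class.values())).keys()
    return {d: max(scores[d] for scores in per_class.values()) for d in descriptors}

# Hypothetical per-class importances.
per_class = {
    "apple":      {"d1": 1.0, "d2": 0.2, "d3": 0.05},
    "background": {"d1": 0.3, "d2": 1.0, "d3": 0.1},
}

overall = overall_importance(per_class)
print(overall)
```

Taking the maximum means that a descriptor is rated as important overall as soon as it matters for at least one class, even if it is irrelevant for all others.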

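Please note that the variable importance is a relative measure that is scaled to a maximum of 1.0 for each class. The most important descriptor of a class therefore always has a value of 1.0, regardless of how well that class is actually classified, so the variable importance has to be judged in combination with the classification results.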
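
The scaling can be sketched in a few lines of Python. This is only an illustration under the assumption that the raw values of a class are divided by the largest raw value of that class; the raw numbers and the function name are hypothetical.

```python
def scale_to_unit_max(raw):
    """Scale the raw importances of one class so that the largest value becomes 1.0."""
    peak = max(raw.values())
    if peak == 0:
        # No descriptor was used at all; avoid dividing by zero.
        return {d: 0.0 for d in raw}
    return {d: value / peak for d, value in raw.items()}

# Hypothetical raw importances for one class.
raw_apple = {"d1": 140.0, "d2": 160.0, "d3": 40.0}

scaled = scale_to_unit_max(raw_apple)
print(scaled)
```

Because each class is scaled by its own maximum, a value of 1.0 in one class is not comparable in absolute terms to a value of 1.0 in another class.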

The following example shows the importance of the variables used to detect an apple. One can clearly see that only 10 of the 111 descriptors are actually necessary to detect apples successfully.

How To:
  1. After calculating the random forest classifier, switch to the Variable Importance tab.
  2. Browse through the variable importance (VIP) plots of all classes to get an overview.
  3. Switch to the total variable importance (class 0).
  4. Move the red horizontal cursor line in the VIP chart so that only the descriptors with high VIP values are selected (indicated by red lines in the graph).
  5. Click the "replace" button to copy the selected descriptors to the descriptor list at the top left (hint: you can save the reduced set of spectral descriptors by clicking the "save" button at the top left of the window).
  6. Retrain the classifier by clicking the "Calculate" button.
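
The selection made with the cursor line in step 4 corresponds to a simple threshold on the overall importance. The following Python sketch shows the idea outside the program; the descriptor names, values, and threshold are hypothetical, and whether the program treats a value exactly on the cursor line as selected is an assumption.

```python
def select_descriptors(importance, threshold):
    """Keep the descriptors whose importance lies at or above the threshold."""
    # Assumption: values exactly on the threshold count as selected.
    return [d for d, value in importance.items() if value >= threshold]

# Hypothetical overall importances (class 0).
overall = {"d1": 1.0, "d2": 0.62, "d3": 0.08, "d4": 0.31, "d5": 0.02}

selected = select_descriptors(overall, threshold=0.3)
print(selected)
```

After pruning, the classifier has to be retrained on the reduced descriptor set (step 6), and the classification results should be compared with those of the full set to confirm that nothing essential was removed.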