I had a previous blog post on Data Exploration using Oracle Data Miner 11gR2. This blog post builds on the steps illustrated in that blog post.
After we have explored the data we can identify some attributes/features that have just one value, or mainly one value. In most of these cases we know that these attributes will not contribute to the model build process.
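To make the "one value or mainly one value" check concrete, here is a minimal Python sketch of how such attributes could be flagged outside of Oracle Data Miner. The column names, the toy data and the 0.9 threshold are all made up for illustration, and this is not how Oracle Data Miner itself does it.

```python
from collections import Counter


def dominant_value_share(values):
    """Return (most common value, fraction of rows that hold it)."""
    counts = Counter(values)
    value, count = counts.most_common(1)[0]
    return value, count / len(values)


def flag_low_variation(columns, threshold=0.9):
    """Return the names of columns whose most common value
    covers at least `threshold` of the rows."""
    flagged = []
    for name, values in columns.items():
        _, share = dominant_value_share(values)
        if share >= threshold:
            flagged.append(name)
    return flagged


# Hypothetical toy data: names and values are assumptions, not the real data set.
columns = {
    "FLAG_A": [1] * 20,           # a single value in every row
    "FLAG_B": [0] * 19 + [1],     # mainly one value
    "AGE": list(range(20, 40)),   # every row is different
}

flagged = flag_low_variation(columns)
print(flagged)
```

The threshold is a judgement call. A higher value only flags attributes that are almost constant, a lower value flags more attributes for review.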
In our example data set we have a small number of attributes. So it is easy to work through the data and get a good understanding of some of the underlying information that exists in the data. Some of these were pointed out in my previous blog post.
The reality is that our data sets can have a large number of attributes/features. It can be very difficult, or nearly impossible, to work through all of them and decide which attributes are worth keeping in the data set and which do not contribute and should be removed.
Also, as our data evolves over time, the importance of the attributes will change, with some becoming less important and some becoming more important.
The Attribute Importance node in Oracle Data Miner allows us to automate this work, and can save us many hours or even days on this task.
The Attribute Importance node uses the Minimum Description Length algorithm.
The following steps build on the work in my previous post and show how we can perform Attribute Importance on our data.
1. In the Component Palette, select Filter Columns from the Transforms list
2. Click on the workflow beside the data node.
3. Link the Data Node to the Filter Columns node. Right-click on the data node, select Connect, move the mouse to the Filter Columns node and click. The link will be created.
4. Now we can configure the Attribute Importance settings. Click on the Filter Columns node. In the Property Inspector, click on the Filters tab.
- Click on the Attribute Importance Checkbox
- Set the Target Attribute from the drop down list. In our data set this is Affinity Card
5. Right click the Filter Columns node and select Run from the menu
After everything has run, we get the little green box with the tick mark on the Filter Columns node. To view the results, right-click on the Filter Columns node and select View Data from the menu. We get the attributes listed in order of importance, along with their Importance measure.
We see that a number of attributes have a zero value. The algorithm has worked out that these attributes would not be used in the model build step. If we look back at the previous blog post, some of the attributes we identified there are also listed here with a zero value.
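To get a feel for what a ranked list with zero values means, here is a small Python sketch that ranks attributes against a target. Note that it does not implement Oracle's Minimum Description Length algorithm. It swaps in mutual information, a simpler measure that also scores an attribute as zero when it tells us nothing about the target. The attribute names and data are made up for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length
    sequences of categorical values."""
    n = len(xs)
    px = Counter(xs)            # counts of each attribute value
    py = Counter(ys)            # counts of each target value
    pxy = Counter(zip(xs, ys))  # counts of each (attribute, target) pair
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def rank_attributes(columns, target):
    """Return (name, score) pairs, highest score first."""
    scores = {name: mutual_information(values, target)
              for name, values in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data standing in for a target and three attributes.
target = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "ATTR_A": [0, 0, 0, 0, 1, 1, 1, 1],  # moves exactly with the target
    "ATTR_B": [0, 1, 0, 1, 0, 1, 0, 1],  # unrelated to the target
    "ATTR_C": [7, 7, 7, 7, 7, 7, 7, 7],  # a single value in every row
}

ranked = rank_attributes(columns, target)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

In this toy example the single-valued attribute and the unrelated attribute both score zero, which mirrors the kind of attributes that show up at the bottom of the list in Oracle Data Miner.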