Thursday, November 3, 2011

ODM 11.2 Data Dictionary Views.

The Oracle 11.2 database contains the following Oracle Data Mining views. These allow you to query the database for the metadata relating to what Data Mining Models you have, what the configurations area and what data is involved.

ALL_MINING_MODELS

Describes the high level information about the data mining models in the database.  Related views include DBA_MINING_MODELS and USER_MINING_MODELS.

Attribute Data Type Description
OWNER Varchar2(30) NN Owner of the mining model
MODEL_NAME Varchar2(30) NN Name of the mining model
MINING_FUNCTION Varchar2(30) What data mining function to use
CLASSIFICATION
REGRESSION
CLUSTERING
FEATURE_EXTRACTION
ASSOCIATION_RULES
ATTRIBUTE_IMPORTANCE
ALGORITHM Varchar2(30) Algorithm used by the model
NAIVE_BAYES
ADAPTIVE_BAYES_NETWORK
DECISION_TREE
SUPPORT_VECTOR_MACHINES
KMEANS
O_CLUSTER
NONNEGATIVE_MATRIX_FACTOR
GENERALIZED_LINEAR_MODEL
APRIORI_ASSOCIATION_RULES
MINIMUM_DESCRIPTION_LENGTH
CREATION_DATE Date NN Date model was created
BUILD_DURATION Number Time in seconds for the model build process
MODEL_SIZE Number Size of model in MBytes
COMMENTS Varchar2(4000)  
Lets query the my DMUSER2 data mining schema. This was created during a previous post where we exported some ODM models from schema and loaded them into DMUSER2 schema

SELECT model_name, 
       mining_function,
       algorithm,
       build_duration,
       model_size
FROM ALL_MINING_MODELS;

MODEL_NAME     MINING_FUNCTION  ALGORITHM                      BUILD_DURATION MODEL_SIZE
-------------  ---------------- -------------------------- -------------- ----------
CLAS_SVM_1_6   CLASSIFICATION    SUPPORT_VECTOR_MACHINES                     3      .1515
CLAS_DT_1_6    CLASSIFICATION    DECISION_TREE                               2      .0842
CLAS_GLM_1_6   CLASSIFICATION    GENERALIZED_LINEAR_MODEL                    3      .0877
CLAS_NB_1_6    CLASSIFICATION    NAIVE_BAYES                                 2      .0459

ALL_MINING_MODEL_ATTRIBUTES

Describes the attributes of the data mining models.  Related views are DBA_MINING_MODEL_ATTRIBUTES and USER_MINING_MODEL_ATTRIBUTES.

Attribute Data Type Description
OWNER Varchar2(30) NN Owner of the mining model
MODEL_NAME Varchar2(30) NN Name of the mining mode
ATTRIBUTE_NAME Varchar2(30) NN Name of the attribute
ATTRIBUTE_TYPE Varchar2(11) Logical type of attribute
NUMERICAL – numeric data
CATEGORICAL – character data
DATA_TYPE Varchar2(12) Data type of attribute
DATA_LENGTH Number Length of data type
DATA_PRECISION Number Precision of a fixed point number
DATA_SCALE Number Scale of the fixed point number
USAGE_TYPE Varchar2(8) Indicated if the attribute was used to create the model (ACTIVE) or not (INACTIVE)
TARGET Varchar2(3) Indicates if the attribute is the target

If we take one of our data mining models that was listed about and select what attributes are used by that model;

SELECT attribute_name,
       attribute_type,
       usage_type,
       target
from all_mining_model_attributes
where model_name = 'CLAS_DT_1_6';

ATTRIBUTE_NAME                 ATTRIBUTE_T USAGE_TY TAR
------------------------------ ----------- -------- ---
AGE                            NUMERICAL   ACTIVE   NO
CUST_MARITAL_STATUS            CATEGORICAL ACTIVE   NO
EDUCATION                      CATEGORICAL ACTIVE   NO
HOUSEHOLD_SIZE                 CATEGORICAL ACTIVE   NO
OCCUPATION                     CATEGORICAL ACTIVE   NO
YRS_RESIDENCE                  NUMERICAL   ACTIVE   NO
Y_BOX_GAMES                    NUMERICAL   ACTIVE   NO
AFFINITY_CARD                  CATEGORICAL ACTIVE   YES

The first thing to note here is that all the attributes are listed as ACTIVE. This is the default and will be the case for all attributes for all the algorithms, so we can ignore this attribute in our queries, but it is good to check just in case.

The second thing to note is for the last row we have the AFFINITY_CARD has a target attribute value of YES. This is the target attributes used by the classification algorithm.

ALL_MINING_MODEL_SETTINGS

Describes the setting of the data mining models. The settings associated with a model are algorithm dependent. The Setting values can be provided as input to the model build process. Alternatively, separate settings table can used.  If no setting values are defined of provided, then the algorithm will use its default settings.

Attribute Data Type Description
OWNER Varchar2(30) NN Owner of the mining model
MODEL_NAME Varchar2(30) NN Name of the mining model
SETTING_NAME Varchar2(30) NN Name of the Setting
SETTING_VALUE Varchar2(4000) Value of the Setting
SETTING_TYPE Varchar2(7) Indicates whether the default value (DEFAULT) or a user specified value (INPUT) is used by the model

Lets take our previous example of the 'CLAS_DT_1_6' model and query the database to see what the setting are.

column setting_value format a30
select setting_name, 
       setting_value,
       setting_type
from all_mining_model_settings
where model_name = 'CLAS_DT_1_6';

SETTING_NAME            SETTING_VALUE                SETTING
----------------------- ---------------------------- -------
ALGO_NAME               ALGO_DECISION_TREE           INPUT
PREP_AUTO               ON                           INPUT
TREE_TERM_MINPCT_NODE   .05                          INPUT
TREE_TERM_MINREC_SPLIT  20                           INPUT
TREE_IMPURITY_METRIC    TREE_IMPURITY_GINI           INPUT
CLAS_COST_TABLE_NAME    ODMR$15_42_50_762000JERWZYK  INPUT
TREE_TERM_MINPCT_SPLIT  .1                           INPUT
TREE_TERM_MAX_DEPTH     7                            INPUT
TREE_TERM_MINREC_NODE   10                           INPUT

No comments:

Post a Comment