Monday, April 1, 2019

Data Normalization in Oracle Data Mining

Normalization is the process of scaling continuous values down to a specific range, often between zero and one. Normalization transforms each numerical value by subtracting a number, called the shift, and dividing the result by another number called the scale. The normalization techniques include:
  • Min-Max Normalization : There is where the normalization is based on the using the minimum value for the shift and the (maximum-minimum) for the scale.
  • Scale Normalization : This is where the normalization is based on zero being used for the shift and the value calculated using max[abs(max), abs(min)] being used for the scale
  • Z-Score Normalization : This is where the normalization is based on using the mean value for the shift and the standard deviation for the scale.
When using Automatic Data Processing the normalization functions are used. But sometimes you may want to process the data is a more explicit manner. To do so you can use the various normalization function. To use these there is a three stage process. The first stage involves the creation of a table that will contain the normalization transformation data. The second stage applies the normalization procedures to your data source, defines the normalization required and inserts the required transformation data  into the table create during the first stage. The third stage involves the defining of a view that applies the normalization transformations to your data source and displays the output via a database view. The following example illustrates how you can normalize the AGE and YRS_RESIDENCE attributes. The input data source will be the view that was created as the output of the previous transformation (MINING_DATA_V_2). This is passed on the original MINING_DATA_BUILD_V data set. The final output from this transformation step and all the other data transformation steps is MINING_DATA_READY_V.

BEGIN
   -- Clean-up : Drop the previously created tables
   BEGIN
      execute immediate 'drop table TRANSFORM_NORMALIZE';
   EXCEPTION
      WHEN others THEN
         null;
   END;

   -- Stage 1 : Create the table for the transformations
   -- Perform normalization for: AGE and YRS_RESIDENCE
   dbms_data_mining_transform.CREATE_NORM_LIN (
      norm_table_name => 'MINING_DATA_NORMALIZE');       

   -- Step 2 : Insert the normalization data into the table
   dbms_data_mining_transform.INSERT_NORM_LIN_MINMAX (
      norm_table_name => 'MINING_DATA_NORMALIZE',
      data_table_name => 'MINING_DATA_V_2',
      exclude_list    => DBMS_DATA_MINING_TRANSFORM.COLUMN_LIST (
                         'affinity_card',
                         'bookkeeping_application',
                         'bulk_pack_diskettes',
                         'cust_id',
                         'flat_panel_monitor',
                         'home_theater_package',
                         'os_doc_set_kanji',
                         'printer_supplies',
                         'y_box_games'));

   -- Stage 3 : Create the view with the transformed data
   DBMS_DATA_MINING_TRANSFORM.XFORM_NORM_LIN (
      norm_table_name => 'MINING_DATA_NORMALIZE',
      data_table_name => 'MINING_DATA_V_2',
      xform_view_name => 'MINING_DATA_READY_V');

END;
/
 
The above example performs normalization based on the Minimum-Maximum values of the variables/columns. The other normalization functions are:

INSERT_NORM_LIN_SCALEInserts linear scale normalization definitions in a transformation definition table.
INSERT_NORM_LIN_ZSCOREInserts linear zscore normalization definitions in a transformation definition table.

No comments:

Post a Comment