Thursday, August 22, 2013

The 2013 Gartner Hype Cycle

The 2013 Gartner Hype Cycle is out and it can be interesting to compare the new graph with the ones from previous years. Particularly for my interests in Data Science, Big Data, Data Mining, Predictive Analytics and of course the Oracle Database.

Gartner Hype Cycles 2013

from 2012

from 2011

from 2010

from 2009

Tuesday, August 20, 2013

Schema Table Filtering for Oracle Data Miner

If you have been using Oracle Data Miner, that is part of SQL Developer 4 or SQL Developer 3, you will notice that your schema can get filled up with various tables that are created by your workflows. The following image gives an example.

image

These tables can include details of the various algorithms used and their settings, sample tables that were created using the various nodes, etc.  Basically they contain all the information that was setup by each node. Not every node in your workflow will create a table, but a lot do in particular if you have set the Cache or Sample in the Properties tab.

In most cases you do not need to be aware of or use most of these tables.

So How do I hide them, so that my schema table listing only shows me the main tables in my schema ?  By main tables, I mean the tables that you would expect to have in your schema before you started using Oracle Data Miner.

The answer to this question is to apply filters to your tables in SQL Developer. To do this go to your schema in the Connections tab. Expand to get the full list of schema objects and then right click on Tables. You should get a menu like the following.

image

Select Apply Filter from the menu and the Apply Filter window will open. Here you can create filters to apply to the tables in your schema.

To restrict Oracle Data Miner related table you will need to exclude tables that begin with, DM$ and ODMR$. The following image shows these filters.

image

When these filters are applied we only get our schema tables.

image

There are two additional filters you may want to consider. The first of these is for the tables that begin with OUTPUT. These are tables that are created when you build a node sends the outputs from running a model to a table, or some other scenario where the output is sent to a table. In reality this is bad naming and we should use a name that is more meaningful, and reflects the contents of the table. But sometimes you just want to spool the outputs to a table and the name is not important. I have an additional filter to not show these tables (see below).

With SQL Developer 4, Oracle Data Miner seems to generate IOTs, as we can see in the above image. Again another filter can be created to exclude these from the list.

Here is the full list of filters.

image

Tuesday, August 6, 2013

Depreciated ODM features in 12c

With the release of the Oracle 12c Database there has been some changes to the Advanced Analytics options. Most of the new features both in the database and in the ODM tool have been documented in previous blog posts.
But what has been removed from the Advanced Analytics Option and what is not longer supported.
The first of these is the Java API that Oracle supplied many, many years ago. They have been saying for a few years now and since the release of 11.2g that these Java APIs are no longer supported. Again the documentation states this and the demo scripts are not included in the latest SQL Developer 4.  Instead of using the Java APIs you can using the in-database SQL functions and procedures.
One of the in-database DM algorithms was the Adaptive Bayes Network (ABN). Although this was de-supported in 11.2g of the database is was still in the database. This was to give customers who were still using it time to migrate to using the other algorithms. In 12c the ABN algorithm is not in the the database.  Before you upgrade your 11.2.x Oracle database to 12c you will need to drop any ABN models that you have in your database

Thursday, July 25, 2013

12c New Data Mining functions

With the release of Oracle 12c we get new functions/procedures and some updated ones for Oracle Data Miner that is part of the Advanced Analytics option.

The following are the new functions/procedures and the functions/procedures that have been updated in 12c, with a link to the 12c Documentation that explains what they do.

  • CLUSTER_DETAILS is a new function that predicts cluster membership for each row. It can use a pre-defined clustering model or perform dynamic clustering. The function returns an XML string that describes the predicted cluster or a specified cluster.

  • CLUSTER_DISTANCE is a new function that predicts cluster membership for each row. It can use a pre-defined clustering model or perform dynamic clustering. The function returns the raw distance between each row and the centroid of either the predicted cluster or a specified.

  • CLUSTER_ID has been enhanced so that it can either use a pre-defined clustering model or perform dynamic clustering.

  • CLUSTER_PROBABILITY has been enhanced so that it can either use a pre-defined clustering model or perform dynamic clustering. The data type of the return value has been changed from NUMBER to BINARY_DOUBLE.

  • CLUSTER_SET has been enhanced so that it can either use a pre-defined clustering model or perform dynamic clustering. The data type of the returned probability has been changed from NUMBER to BINARY_DOUBLE

  • FEATURE_DETAILS is a new function that predicts feature matches for each row. It can use a pre-defined feature extraction model or perform dynamic feature extraction. The function returns an XML string that describes the predicted feature or a specified feature.

  • FEATURE_ID has been enhanced so that it can either use a pre-defined feature extraction model or perform dynamic feature extraction.

  • FEATURE_SET has been enhanced so that it can either use a pre-defined feature extraction model or perform dynamic feature extraction. The data type of the returned probability has been changed from NUMBER to BINARY_DOUBLE.

  • FEATURE_VALUE has been enhanced so that it can either use a pre-defined feature extraction model or perform dynamic feature extraction. The data type of the return value has been changed from NUMBER to BINARY_DOUBLE.

  • PREDICTION has been enhanced so that it can either use a pre-defined predictive model or perform dynamic prediction.

  • PREDICTION_BOUNDS now returns the upper and lower bounds of the prediction as the BINARY_DOUBLE data type. It previously returned these values as the NUMBER data type.

  • PREDICTION_COST has been enhanced so that it can either use a pre-defined predictive model or perform dynamic prediction. The data type of the returned cost has been changed from NUMBER to BINARY_DOUBLE.

  • PREDICTION_DETAILS has been enhanced so that it can either use a pre-defined predictive model or perform dynamic prediction.

  • PREDICTION_PROBABILITY has been enhanced so that it can either use a pre-defined predictive model or perform dynamic prediction. The data type of the returned probability has been changed from NUMBER to BINARY_DOUBLE.

  • PREDICTION_SET has been enhanced so that it can either use a pre-defined predictive model or perform dynamic prediction. The data type of the returned probability has been changed from NUMBER to BINARY_DOUBLE.

Tuesday, July 23, 2013

Oracle Data Miner New Features (SQL Dev 4)

With the release of the new Oracle 12c database and SQL Developer 4 we have a range of Oracle Data Miner new features . Some of these are embedded into the database and are only available in 12c. Check out my previous blog post on these new features.

In this blog post I will look at the new Oracle Data Miner features that come with the ODM tool in SQL Developer4.

The new features of the Oracle Data Miner tool can be grouped into 2 categories. The first category contains the new features that are available to all user of the tool (11.2g and 12c). The second category contains the new features that are only available in 12c. The new features of each of these categories will be explained below.

Category 1 – Common new features for 11.2g and 12c Database users

There is a new View Data feature that allows you to drill down to view the customer object and to view nested tables.

A new Graph Node that allows you to create graphs such as line, bar, scatter and boxplots for data at any stage of a workflow. You can specify any of the attributes from the data source for the graphs. You don’t seem to be limited to the number of graphs you can create.

image

A new SQL Node. This is welcome addition, as there has been many times that I’ve need to write some SQL or PL/SQL to do a specific piece of processing on the data that was not available with the other nodes. There are 2 important elements to this SQL node really. The first is that you can write SQL and PL/SQL code to do whatever processing you want to do. But you can only do it on the Data node you are connected to.

image

The second is that you can use it to call some ORE code. This allows you to use the power of R and extensive range of packages that are available to expand the analytic functionality that is available in the database. If there is some particular function that you cannot do in Oracle and it is available in R, you can now embed this function/code as an ORE object in the database. You can then called using SQL. 

WARNING: this particular feature will only work if you have ORE installed on your 11.2.0.3g or 12.1c database

New Model Build Node features, include node level text specifications for text transformations, displays the heuristic rules responsible for excluding predictor columns and being able to control the amount of classification and regression test results that are generated.  I’ll be covering these in later blog posts.

New Workflow SQL Script Deployment features. Up to now the workflow SQL script, I found to be of limited use. The development team have put a lot of work into generating a proper script that can be used by developers and DBA. But there are some limitations still. You can use the script will run the workflow automatically in the database without having the use the ODM tool. But it can only be run the in the schema that the workflow was generated. You will still have to do a lot of coding (although a lot less than you used to) to get your ODM models and workflows to run in another schema or database.

image

This will output the script to a file buried deep somewhere inside you SQL  Developer directory.  Unfortunately in the EA1 release, the size of this location field is small and scrolling has not been enabled. So you cannot (currently) scroll to the end of the field to see the actual location.  You can edit this location to have a different shorter location.

image

Maybe this will be fixed for the official release.

Category 2 – New features for 12c Database users.

Now for the new features that are only visible when you are running ODM / SQL Dev 4 against a 12c database. No configuration changes are needed. The ODM tool checks to see what version of the database you are logging into. It will then present the available features based on the version of the database.

New Predictive Query nodes allows you to build a node based on the new non-transient feature in 12c called Predictive Queries (PQs). In SQL Developer we get 3 addition types of Predictive Queries. These can be used for Anomaly Detection, Clustering and Feature Extraction

image

It is important to remember that underlying model produced by these PQs to not exist in the database after the query has executed. The model is created, used on the data and then the model deleted.

The Clustering node has the new algorithm Expectation Maximization in addition to the existing algorithms of K-Means and O-Cluster.

image

The Feature Extraction node has the new algorithm called Principal Component Analysis in addition to the existing Non-Negative Matrix Factorization algorithm.

image

Text Transformations are now built into the model build nodes. These text transformations will be part of the Automatic Data Processing steps for the model build nodes. This is illustrated in the above images.

The Generalized Linear Model that is part of the Classification Node has a Feature Selection option in the Algorithm Settings. The default setting is Ridge Regression. Now there is an additional option of using Feature Selection.

image

Prediction Result Explanations gives the scoring details used to to explain why the prediction was made.

 

Look out for blog post on each of these new features.

Friday, July 19, 2013

Oracle 12c Books

Oracle 12c is only a few days old and there are books available on Amazon. The carousels below list some of the books available on Amazon.com and Amazon.co.uk




Thursday, July 18, 2013

Upgrading your ODM Repository for SQL Dev 4

For those users of Oracle Data Miner (ODM) that is part of SQL Developer, now that Oracle have finally released SQL Developer 4, you might want to upgrade to this new release. There are a lot of new features. Some of these are available for 11.2g and 12.1c databases and some are only available for 12.1c users.

I will have another blog post soon on the new Oracle Data Miner (ODM) features that are available in SQL Developer 4.

The instructions given below are what I did to upgrade so that I could use the new ODM tool/SQL Developer 4.

Step 1 – Install SQL Developer 4 : I have another blog post on what this involves, so check it out and complete the steps before you continue with the result of the steps below.

Step 2 – Make ODM Visible : After SQL Developer 4 opens you should see all your migrated connections. To make ODM visible you need to click on the Tools menu, select Oracle Data Miner and then Make Visible. This will open a number of tabs on the left hand side of SQL Developer. These will include Data Miner (connections), Workflow Structure and Workflow Jobs.

image

Step 3 – Open an ODM Connection : Take one your ODM connections and double click on it. SQL Developer 4 / ODM will check what versions of the ODM repository exists in your database. If this is your first time connecting from SQL Developer 4, you will be told that you will need to upgrade your repository

SNAGHTML19755c5

Step 4 – Upgrade the ODM Repository : Select the Yes button on the Upgrade Repository window. You will then be asked for the SYS password. If you do not have access to this you can talk nicely to your DBA and ask them to enter the password for you.

SNAGHTML198e42e

You may or may not get a warning message like the following. Just click OK to continue.

SNAGHTML199f5cb

Step 5 – Start the Repository Upgrade : When the Migrate Data Miner Repository window opens, just click the Start button. 

SNAGHTML19b0a35

This might be a good time to go off an make yourself a coffee. The upgrade process tool approx. 8 minutes on my laptop. If you were running this on a server located somewhere then the script will take a little bit longer to run!

The progress bar will let you know how things are progressing. It also gives some messages to let you known at what stage of the process it is at.

SNAGHTML19f591f

Step 6 – All finished : When the Repository Migration has finished you will get a window with a message saying Task Successfully Complete. Click on the Close button to close this window.

SNAGHTML1a0ffe9

Step 7 – Open an Existing Workflow : Just to make sure that everything has worked with the install and ODM Repository migration, open one of your existing workflows. If it opens then everything should be OK.

When you open the workflow, the new Workflow Editor tab opens on the right hand side of SQL Developer. This seems to have replaced the Component Palette we had with the pervious version of the ODM tool. Expand the headings under the Workflow Editor to see the different nodes that are available. Most of these are the same but we have 2 new nodes under the Data section. These are Graph and SQL Query. I’ll have more on these in another post or posts.

image

Wednesday, July 17, 2013

Auto-Starting your pluggables in 12c

After installing 12c you get your container database and a pluggable. But the problem that most people have is that when they restart their server or in my case my VMs the container database gets started but the pluggable database does not automatically start. This means that you have to manually go in an start it. But this is a pain. Surely there is an easy way to get your pluggable databases to start. You would have though that Oracle would have some easy way of doing this.  If there is, I haven’t found it yet.

But I have come across how to automatically start your 12c pluggable databases, using a trigger.

CREATE or REPLACE trigger OPEN_ALL_PLUGGABLES
   after startup
   on  database
BEGIN
   execute immediate 'alter pluggable database all open';
END open_all_pdbs;

Let us test this out.  I’ve started my VirtualBox VM that has 12c installed on Windows 7. Here is the code that I ran to verify that the container has been started and the pluggable is in MOUNTED mode.

C:\Users\oracle>sqlplus / as sysdba

SQL*Plus: Release 12.1.0.1.0 Production on Wed Jul 17 15:27:35 2013

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing opt
ions

SQL> select name,DB_UNIQUE_NAME from v$database;

NAME      DB_UNIQUE_NAME
--------- ------------------------------
ORCL      orcl

SQL> SELECT v.name, v.open_mode, NVL(v.restricted, 'n/a') "RESTRICTED", d.status

  2  FROM v$pdbs v, dba_pdbs d
  3  WHERE v.guid = d.guid
  4  ORDER BY v.create_scn;

NAME                           OPEN_MODE  RES STATUS
------------------------------ ---------- --- -------------
PDB$SEED                       READ ONLY  NO  NORMAL
PDB12C                         MOUNTED    n/a NORMAL

SQL>

Next we will create the procedure (given above).

image

To test the automatic starting of the pluggables, we need to shut down the container database, by issuing the shutdown command.

SQL> shutdown
Database closed.
Database dismounted.
ORACLE instance shut down.


SQL> select name,DB_UNIQUE_NAME from v$database;
select name,DB_UNIQUE_NAME from v$database
*
ERROR at line 1:
ORA-01034: ORACLE not available
Process ID: 0
Session ID: 0 Serial number: 0

This shows us that the container database is shutdown.

Now we can start the container and test to see if the pluggable database is started automatically by the trigger.

SQL> startup
ORACLE instance started.

Total System Global Area  855982080 bytes
Fixed Size                  2408408 bytes
Variable Size             562036776 bytes
Database Buffers          285212672 bytes
Redo Buffers                6324224 bytes
Database mounted.
Database opened.
SQL>

SQL> select name,DB_UNIQUE_NAME from v$database;

NAME      DB_UNIQUE_NAME
--------- ------------------------------
ORCL      orcl

SQL> select status from v$instance;

STATUS
------------
OPEN

SQL> SELECT v.name, v.open_mode, NVL(v.restricted, 'n/a') "RESTRICTED", d.status

  2  FROM v$pdbs v, dba_pdbs d
  3  WHERE v.guid = d.guid
  4  ORDER BY v.create_scn;

NAME                           OPEN_MODE  RES STATUS
------------------------------ ---------- --- -------------
PDB$SEED                       READ ONLY  NO  NORMAL
PDB12C                         READ WRITE NO  NORMAL

SQL>

We can see that the pluggable was started.

Tuesday, July 16, 2013

Installing & Setting up SQL Developer 4

The EA1 (early adopter) release of SQL Developer is now available. The main reason that I’m interested in this tools is that it has the upgraded Oracle Data Mining workflow tool. I’ve been using SQL Developer for a long, long time.  I was lucky enough to see a demo of it before it was ever released, back ……(well a long, long time ago) when Barry McGillin gave a demo of what they called Project Raptor, to a small group of (12) Oracle users in the Oracle East Point office, Dublin, Ireland. Barry was one of a couple of developers who were developing Project Raptor.
The EA1 release of SQL Developer 4 comes without the JDK install. For SQL Developer 4 you will need to install JDK 1.7.  There is a link from the SQL Developer 4 download page.
image
After installing JDK 1.7 or maybe you have it installed already, you are ready to setup SQL Developer 4. The following instructions are for installing SQL Developer 4 on Windows.
After downloading it from the download page, all you have to do is to unzip the download. There is no install program. You are almost ready to start using SQL Developer.
There are 2 types of setup for SQL Developer. The first is where you have not used SQL Developer before. Point 1 below shows what is involved with this scenario.  Point 2 below shows what is involved if you have used previous releases of SQL Developer.
0.   Common steps to installing and setting up SQL Developer
  • Unzip the SQL Developer 4 download file to a location where you want the software to be located.
  • Go down the directories to where the sqldeveloper.exe is located.
  • Create a shortcut on your desktop for this file.
  • Double click on the shortcut on your desktop
  • Enter the location where JDK 1.7 was installed
    • C:\Program Files\Java\jdk1.7.0_25
  • SQL Developer will start
image
1.   Scenario: Env. that has not used SQL Dev before
image
  • Double click on the connection to open the SQL Worksheet
  • Finally enjoy 12c Smile
2. Scenario: Previous releases of SQL Developer exist
  • When asked about importing preferences from your previous SQL Developer installation, say Yes. This will take the connections from the most recent version of SQL Developer that you have installed. If you want to change this click on the button and select the version from the list
  • The install will progress updating everything and pull in your connects.
  • When finished SQL Developer 4 will open
  • But before you get going you should test that your connections work. An easy way of doing this is to use the pingall command. Open a SQL worksheet, connect to one of your schemas (this will test that your connection works), type pingall and press F5. This will test all of your connections and tell you which ones are currently working and which connections are not (you will see a –1ms).
  • You can now enjoy SQL Developer 4.
During the install of SQL Developer 4 I had an error. After inserting the directory for Java, the progress bar of the loading window got to about 1cm, displaying Registering Extensions above it, and then the loading window closed. SQL Dev 4 did not open.  After various attempts at investigating the problem, it looks like the directory created in AppData (Windows 7) was corrupted in some way. The solution to this problem is to rename or remove the directory.
\AppData\Roaming\SQL Developer\system4.0.0.12.27
When you have renamed or removed this directory, try starting SQL Dev 4 again. Everything should work now. Well it did for me.
Many thanks to Turloch in Oracle for his help.

Monday, July 15, 2013

Installing Oracle 12c on Windows 7 64bit

Here are the steps I when through to install Oracle 12.1c on Windows 7 64 bit.

  • Unzip the two 12c downloads files into the same directory. I called this directory database

image

  • Go down a couple of levels in the database directory until you come to the directory that contains setup.exe. Double click on this to start the installer.
  • Step 1 – Configure Security Updates:  Un-tick the tick-box and click the Next button. A warning message will appear. You can click on the Yes button to proceed.
  • Step 2 – Software Update : select the Skip Software Updates option and then click the Next button.
  • Step 3 – Installation Option : select the Create and Configure a Database option and then click the Next button.
  • Step 4 – System Class: Select the Server Class option and then click the Next button
  • Step 5 – Grid Installation Options : Select the Single Instance Database Installation option and then click the next button.
  • Step 6 – Install Types : Select the Typical install option and then click the Next button.
  • Step 7 - Installation Location : Select the Use Windows Built-in Account option and then click the Next button. An warning message appears. Click the Yes button.
  • Step 8 – Typical Installation.  Set Global Database Name to cdb12c for the container database name. Set the Administrative password for the container database. Set the name of the pluggable database that will be created. Set this to pdb12c. Or you can accept the default names. Then click the Next button. If you get a warning message saying the password does not conform to the recommended standards, you can click the Yes button to ignore this warning and proceed.
  • Step 9 – Prerequisite Checks : the install will check to see that you have enough space and necessary permissions etc.
  • Step 10 – Summary – You should now be ready to start the install. Click the Install button.

You can now sit back, relax and watch the installation of 12.1c complete.

You may get some Windows Security Alert windows pop up. Just click on the Allow Access button.

image

Then the Database Configuration Assistant will start. This step might take a while to complete.

image

When everything is done you will get something like the following

image

Now you are almost ready to start using your Pluggable 12c database on windows. The final two steps that you need to do is to add an entry to your tnsnames.ora file. You can manually do this if you know what you are doing or you can select Net Configuration Assistant under the Oracle –Ora12cDB Home 1 section of the windows menu. The second thing you need to do is to create a new user/schema.

Check out my previous blog post called ‘My first steps with 12c’ for how to do these last two steps. The ‘My fist steps with 12c’ post was based on installing 12c on Linux 6.

Friday, July 12, 2013

Oracle 12c Advanced Analytics Option new features

With the release of Oracle 12c (finally) now have a lot of learning to do. Oracle 12c is a different beast to what we have been used to up to now.

As part of the 12c there are a number of new in-database Advanced Analytics features. These are separate to the Advanced Analytics new features that come as part of the Oracle Data Miner tool, that is part of SQL Developer.

This post will only look at the new features that are part of the 12c Database. The new in-Database Advanced Analytics features include:

  • Using Decisions Trees for Text analysis is now possible. Up to now (11.2g) when you wanted to do text classification you had to exclude Decision Trees from the process. This was because the Decision Trees algorithm could not support nested data.
  • Additionally for text mining some of the text processing has been moved from having a separate step, to being part of the some of the algorithms.
  • A number of additional features are available for Clustering. These include a cluster distance (from the centroid) and details functions.
  • There is a new clustering algorithm (in addition to the K-Means and O-Cluster algorithms), called Expectation Maximization algorithm. This creates a density model that can be give better results when data from different domains are combined for clustering. This algorithm will also determine the optimal number of clusters.
  • There are two new Feature Extraction methods that are scalable for high dimensional data, large number of records, for both structured and unstructured. This can be used to reduce the number of dimensions to use as input to the data mining algorithms. The first of these is called Singular Value Decomposition (SVD) and is widely used in text mining. The second method can be considered a special scoring method of SVD is called Principal Component Analysis (PCA). With this method it produces projections that are scaled with the data variance.
  • A new feature of the GLM algorithm is that it will perform a feature section step. This is used to reduce the number of predictors used by the algorithm and allow for faster builds. This will makes the outputs more understandable and model more transparent. This feature is not default so you will need to set this on if you want to use it with the GLM algorithm.
  • In previous versions of the database, there could be some performance issues that relate to the data types used. In 12c these has been addressed for BINARY_DOUBLE and BINARY_FLOAT. So if you are using these data types you should now see faster scoring of the data in 12c
  • There is new in-database feature called Predictive Queries. This allows on-the-fly models that are temporary models that are formed as part of an analytics clause. These models cannot be tuned and you cannot see the details of the model produced. They are formed for the query and do not exist afterwards.
  • SELECT cust_id, age, pred_age, age-pred_age age_diff, pred_det FROM
    (SELECT cust_id, age, pred_age, pred_det,
    RANK() OVER (ORDER BY ABS(age-pred_age) DESC) rnk FROM
    (SELECT cust_id, age,
    PREDICTION(FOR age USING *) OVER () pred_age,
    PREDICTION_DETAILS(FOR age ABS USING *) OVER () pred_det
    FROM mining_data_apply_v))
    WHERE rnk <= 5;


  • There is a new function called PREDICTION_DETAILS. This allows you to see what the algorithm used to make the prediction. For example if we want to score a customer to see if they will churn, we can use the PREDICTION and PREDITION_PROBABILITY functions to do this and to see how how strong this prediction is. With PREDICTION_DETAILS we can now see what attributes and values the algorithm used to make that particular prediction. The output is in XML format.


  • image



These are the new in-database Advanced Analytics (Data Mining) features. Apart from the new algorithms or changes to them, most of the other changes gives greater transparency into what the algorithms/models are doing. This is good as it allows us to better understand and see what is happening.



The rest of the new Advanced Analytics Option new features will be part of Oracle Data Miner tool in SQL Developer 4. My next blog post will cover the new features in SQL Developer 4.



I haven’t mentioned anything about ORE. The reason for that is that it comes as a separate install and its current version 1.3 works the same in 11.2.0.3g as well as 12c. I’ve had some previous blog posts on this and you can check out the ORE website on OTN.

DBMS_PREDICTIVE_ANALYTICS & Profile

In this blog post I will look at the PROFILE procedure that is part of the DBMS_PREDICTIVE_ANALYTICS package. The PROFILE procedure generates rules that identify the records that have the same target value.

Like the EXPLAIN procedure, the PROFILE procedure only works with classification type of problems. What the PROFILE procedure does is it works out some rules that determine a particular target value. For example, what rules determine if a customer will take up an affinity card and the rules for those who do not take up an affinity card. So you will need a pre-labelled data set with the value of the target attribute already determined.

Oracle does not tell us what algorithm that they use to calculate these rules, but they are similar to the rules that are produced by some of the classification algorithms that are in the database (and can be used by ODM).

The syntax of the PROFILE procedure is

DBMS_PREDICTIVE_ANALYTICS.PROFILE (
     data_table_name           IN VARCHAR2,
     target_column_name        IN VARCHAR2,
     result_table_name         IN VARCHAR2,
     data_schema_name          IN VARCHAR2 DEFAULT NULL);

Where

Parameter Name Description
data_table_name Name of the table that contains the data that you want to analyze.
target_column_name The name of the target attribute.
result_table_name The name of the table that will contain the results. This table should not exist in your schema, otherwise an error will occur
data_schema_name The name of the schema where the table containing the input data is located. This is probably in your current schema, so you can leave this parameter NULL.

The PROFILE procedure will produce an output table called ‘result_table_name) in your schema and this table will contain 3 attributes.

PROFILE_ID This is the PK/unique identifier for the profile/rule
RECORD_COUNT This is the number of records that are described by the profile/rule
DESCRIPTION This is the profile rule and it is in XML format and has the following XSD

<xs:element name="SimpleRule">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="PREDICATE"/>
      <xs:element ref="ScoreDistribution" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:string" use="optional"/>
    <xs:attribute name="score" type="xs:string" use="required"/>
    <xs:attribute name="recordCount" type="NUMBER" use="optional"/>
  </xs:complexType>
</xs:element>

Using the examples I have used in my previous blog posts, the following illustrates how to use the PROFILE procedure.

BEGIN
   DBMS_PREDICTIVE_ANALYTICS.PROFILE(
      DATA_TABLE_NAME    => 'mining_data_build_V',
      TARGET_COLUMN_NAME => 'affinity_card',
      RESULT_TABLE_NAME  => 'PA_PROFILE');
END;

image

image

 

NOTE: For the above examples I used and 11.2.0.3 database.