Brendan Tierney - Oralytics Blog: 12c

Showing posts with label 12c. Show all posts

Monday, August 7, 2017

Auto enabling APPROX_* function in the Oracle Database

With the releases of 12.1 and 12.2 of Oracle Database we have seen some new functions that perform approximate calculations. These include:

APPROX_COUNT_DISTINCT
APPROX_COUNT_DISTINCT_DETAIL
APPROX_COUNT_DISTINCT_AGG
APPROX_MEDIAN
APPROX_PERCENTILE
APPROX_PERCENTILE_DETAIL
APPROX_PERCENTILE_AGG

These functions can be used when approximate answers can be used instead of the exact answer. Yes can have many scenarios for these and particularly as we move into the big data world, the ability to process our data quickly is slightly more important and exact numbers. For example, is there really a difference between 40% of our customers being of type X versus 41%. The real answer to this is, 'It Depends!', but for a lot of analytical and advanced analytical methods this difference doesn't really make a difference.

There are various reports of performance improvement of anything from 6x to 50x with the response times of the queries that are using these functions, instead of using the more traditional functions.

If you are a BI or big data analyst and you have build lots of code and queries using the more traditional functions. But what if you now want to use the newer functions. Does this mean you have go and modify all the code you have written over the years? you can imagine getting approval to do this!

The simple answer to this question is 'No'. No you don't have to change any code, but with some parameter changes for the DB or your session you can tell the database to automatically switch from using the traditional functions (count, etc) to the newer more optimised and significantly faster APPROX_* functions.

So how can you do this magic?

First let us see what the current settings values are:

SELECT name, value 
FROM   v$ses_optimizer_env 
WHERE  sid = sys_context('USERENV','SID') 
AND    name like '%approx%';

NewImage

Now let us run a query to test what happens using the default settings (on a table I have with 10,500 records).

set auto trace on

select count(distinct cust_id) from test_inmemory;

COUNT(DISTINCTCUST_ID)
----------------------
		  1500


Execution Plan
----------------------------------------------------------
Plan hash value: 2131129625

--------------------------------------------------------------------------------------
| Id  | Operation	     | Name	     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |		     |	   1 |	  13 |	  70   (2)| 00:00:01 |
|   1 |  SORT AGGREGATE      |		     |	   1 |	  13 |		  |	     |
|   2 |   VIEW		     | VW_DAG_0      |	1500 | 19500 |	  70   (2)| 00:00:01 |
|   3 |    HASH GROUP BY     |		     |	1500 |	7500 |	  70   (2)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| TEST_INMEMORY | 10500 | 52500 |	  69   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Let us now set the automatic usage of the APPROX_* function.

alter session set approx_for_aggregation = TRUE;

SQL> select count(distinct cust_id) from test_inmemory;

COUNT(DISTINCTCUST_ID)
----------------------
		  1495


Execution Plan
----------------------------------------------------------
Plan hash value: 1029766195

---------------------------------------------------------------------------------------
| Id  | Operation	      | Name	      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      | 	      |     1 |     5 |    69	(0)| 00:00:01 |
|   1 |  SORT AGGREGATE APPROX| 	      |     1 |     5 | 	   |	      |
|   2 |   TABLE ACCESS FULL   | TEST_INMEMORY | 10500 | 52500 |    69	(0)| 00:00:01 |
---------------------------------------------------------------------------------------

We can see above that the APPROX_* equivalent function was used, and slightly less work. But we only used this on a very small table.

The full list of session level settings is:

alter session set approx_for_aggregation = TRUE;
alter session set approx_for_aggregation = FALSE;

alter session set approx_for_count_distinct = TRUE;
alter session set approx_for_count_distinct = FALSE;

alter session set approx_for_percentile = 'PERCENTILE_CONT DETERMINISTIC';
alter session set approx_for_percentile = PERCENTILE_DISC;
alter session set approx_for_percentile = NONE;

Or at a system wide level:

alter system set approx_for_aggregation = TRUE;
alter system set approx_for_aggregation = FALSE;

alter system set approx_for_count_distinct = TRUE;
alter system set approx_for_count_distinct = FALSE;

alter system set approx_for_percentile = 'PERCENTILE_CONT DETERMINISTIC';
alter system set approx_for_percentile = PERCENTILE_DISC;
alter system set approx_for_percentile = NONE;

And to reset back to the default settings:

alter system reset approx_for_aggregation;
alter system reset approx_for_count_distinct;
alter system reset approx_for_percentile;

Friday, April 21, 2017

Setting up Oracle Database on Docker

A couple of days ago it was announced that several Oracle images were available on the Docker Store.

This is by far the easiest Oracle Database install I have every done !

You simply have no excuse now for not installing and using an Oracle Database. Just go and do it now!

The following steps outlines what I did you get an Oracle 12.1c Database.

1. Download and Install Docker

There isn't much to say here. Just go to the Docker website, select the version docker for your OS, and just install it.

You will probably need to create an account with Docker.

After Docker is installed it will automatically start and and will be placed in your system tray etc so that it will automatically start each time you restart your laptop/PC.

2. Adjust the memory allocation

From the system tray open the Docker application. In the Advanced section allocate a bit more memory. This will just make things run a bit smoother. Be a bit careful on how much to allocate.

In the General section check the tick-box for automatically backing up Docker VMs. This is assuming you have back-ups setup, for example with Time Machine or something similar.

3. Download & Edit the Oracle Docker environment File

On the Oracle Database download Docker webpage, click on the the Get Content button.

NewImage

You will have to enter some details like your name, company, job title and phone number, then click on the check-box, before clicking on the Get Content button. All of this is necessary for the Oracle License agreement.

The next screen lists the Docker Services and Partner Services that you have signed up for.

NewImage

Click on the Setup button to go to the webpage that contains some of the setup instructions.

NewImage

The first thing you need to do is to copy the sample Environment File. Create a new file on your laptop/desktop and paste the environment file contents into the file. There are a few edits you need to make to this file. The following is the edited/modified Environment file that I created and used. The changes are for DB_SID, DB_PASSWD and DB_DOMAIN.

####################################################################
## Copyright(c) Oracle Corporation 1998,2016. All rights reserved.##
##                                                                ##
##                   Docker OL7 db12c dat file                    ##
##                                                                ##
####################################################################

##------------------------------------------------------------------
## Specify the basic DB parameters
##------------------------------------------------------------------

## db sid (name)
## default : ORCL
## cannot be longer than 8 characters

DB_SID=ORCL

## db passwd
## default : Oracle

DB_PASSWD=oracle

## db domain
## default : localdomain

DB_DOMAIN=localdomain

## db bundle
## default : basic
## valid : basic / high / extreme
## (high and extreme are only available for enterprise edition)

DB_BUNDLE=basic

## end

I called this file 'docker_ora_db.txt'

4. Download and Configure Oracle Database for Docker

The following command will download and configure the docker image

$ docker run -d --env-file ./docker_ora_db.txt -p 1527:1521 -p 5507:5500 -it --name dockerDB121 --shm-size="8g" store/oracle/database-enterprise:12.1.0.2

This command will create a container called 'dockerDB121'. The 121 at the end indicate the version number of the Oracle Database. If you end up with a number of containers containing different versions of the Oracle Database then you need some way of distinguishing them.

Take note of the port mapping in the above command, as you will need this information later.

When you run this command, the docker image will be downloaded from the docker website, will be unzipped and the container setup and ready to run.

NewImage

5. Log-in and Finish the configuration

Although the docker container has been setup, there is still a database configuration to complete. The following images shows that the new containers is there.

NewImage

To complete the Database setup, you will need to log into the Docker container.

docker exec -it dockerDB121 /bin/bash

Then run the Oracle Database setup and startup script (as the root user).

/bin/bash /home/oracle/setup/dockerInit.sh

This script can take a few minutes to run. On my laptop it took about 2 minutes.

When this is finished the terminal session will open as this script goes into a look.

To run any other commands in the container you will need to open another terminal session and connect to the Docker container. So go open one now.

6. Log into the Database in Docker

In a new terminal window, connect to the Docker container and then switch to the oracle user.

su - oracle

Check that the Oracle Database processes are running (ps -ef) and then connect as SYSDBA.

sqlplus / as sysdba

Let's check out the Database.

SQL> select name,DB_UNIQUE_NAME from v$database;

NAME	  DB_UNIQUE_NAME
--------- ------------------------------
ORCL	  ORCL


SQL> SELECT v.name, v.open_mode, NVL(v.restricted, 'n/a') "RESTRICTED", d.status
     FROM v$pdbs v, dba_pdbs d
     WHERE v.guid = d.guid
     ORDER BY v.create_scn;


NAME			       OPEN_MODE  RES STATUS
------------------------------ ---------- --- ---------
PDB$SEED		       READ ONLY  NO  NORMAL
PDB1			       READ WRITE NO  NORMAL

And the tnsnames.ora file contains the following:

ORCL =   (DESCRIPTION = 
    (ADDRESS = (PROTOCOL = TCP)(HOST = 0.0.0.0)(PORT = 1521))
     (CONNECT_DATA = 
      (SERVER = DEDICATED)
       (SERVICE_NAME = ORCL.localdomain)     )   )

PDB1 =   (DESCRIPTION =
     (ADDRESS = (PROTOCOL = TCP)(HOST = 0.0.0.0)(PORT = 1521))
      (CONNECT_DATA =
       (SERVER = DEDICATED)
       (SERVICE_NAME = PDB1.localdomain)     )   )

You are now up an running with an Docker container running an Oracle 12.1 Databases.

7. Configure SQL Developer (on Client) to

access the Oracle Database on Docker

You can not use your client tools to connect to the Oracle Database in a Docker Container. Here is a connection setup in SQL Developer.

NewImage

Remember that port number mapping I mentioned in step 4 above. See in this SQL Developer connection that the port number is 1527.

Thats it. How easy is that. You now have a fully configured Oracle 12.1c Enterprise Edition Database to play with, to have fun and to explore all the wonderful features of the Oracle Database.

Monday, May 30, 2016

PREDICTION_DETAILS function in Oracle

When building predictive models the data scientist can spend a large amount of time examining the models produced and how they work and perform on their hold out sample data sets. They do this to understand is the model gives a good general representation of the data and can identify/predict many different scenarios. When the "best" model has been selected then this is typically deployed is some sort of reporting environment, where a list is produced. This is typical deployment method but is far from being ideal. A more ideal deployment method is that the predictive models are build into the everyday applications that the company uses. For example, it is build into the call centre application, so that the staff have live and real-time feedback and predictions as they are talking to the customer.

But what kind of live and real-time feedback and predictions are possible. Again if we look at what is traditionally done in these applications they will get a predicted outcome (will they be a good customer or a bad customer) or some indication of their value (maybe lifetime value, possible claim payout value) etc.

But can we get anymore information? Information like what was reason for the prediction. This is sometimes called prediction insight. Can we get some details of what the prediction model used to decide on the predicted value. In more predictive analytics products this is not possible, as all you are told is the final out come.

What would be useful is to know some of the thinking that the predictive model used to make its thinking. The reasons when one customer may be a "bad customer" might be different to that of another customer. Knowing this kind of information can be very useful to the staff who are dealing with the customers. For those who design the workflows etc can then build more advanced workflows to support the staff when dealing with the customers.

Oracle as a unique feature that allows us to see some of the details that the prediction model used to make the prediction. This functions (based on using the Oracle Advanced Analytics option and Oracle Data Mining to build your predictive model) is called PREDICTION_DETAILS.

When you go to use PREDICTION_DETAILS you need to be careful as it will work differently in the 11.2g and 12c versions of the Oracle Database (Enterprise Editions). In Oracle Database 11.2g the PREDICTION_DETAILS function would only work for Decision Tree models. But in 12c (and above) it has been opened to include details for models created using all the classification algorithms, all the regression algorithms and also for anomaly detection.

The following gives an example of using the PREDICTION_DETAILS function.

select cust_id, 
       prediction(clas_svm_1_27 using *) pred_value,
       prediction_probability(clas_svm_1_27 using *) pred_prob,
       prediction_details(clas_svm_1_27 using *) pred_details
from mining_data_apply_v;

The PREDICTION_DETAILS function produces its output in XML, and this consists of the attributes used and their values that determined why a record had the predicted value. The following gives some examples of the XML produced for some of the records.

NewImage

I've used this particular function in lots of my projects and particularly when building the applications for a particular business unit. Oracle too has build this functionality into many of their applications. The images below are from the HCM application where you can examine the details why an employee may or may not leave/churn. You can when perform real-time what-if analysis by changing some of attribute values to see if the predicted out come changes.

NewImage

Monday, July 13, 2015

V506 of Oracle OBIEE SampleApp Virtual Machine

A few days ago Oracle released the latest version of the Virtual Machine for OBIEE SampleApp. The current version has a number of new features and new product versions (see below).

To get this latest version go to the following link to download the VM files and to install. As always this is a beast of a VM and you should only consider the install and setup if you have the space and in particular you have 16G RAM.

Oracle Business Intelligence Enterprise Edition Samples on OTN.

NewImage

v506 New Features

DB 12.1.0.2 with the In-Memory option
Load and process JSON data
Integrates with Big Data SQL
Connects to Impala
Session tracking in UT
Exalytics Aggregation Functions
Lots of new Visualizations
Custom Style Features
Hierarchical Session Variables
etc.

Software on V506

Oracle Enterprise Linux 6.5 x64
OBIEE 11.1.1.9GA two distinct OBIEE instances, Essbase 11.1.2.4, updated BIMAD
Oracle MapViewer11.1.1.9.1
Oracle BICS Data Sync v1
Oracle Database 12c IMDB 12.1.0.2, PDB Install, AWM 12.1.0.2a, APEX 4.2.6 & ORDS 2.0.1, ODM, Oracle Spatial and Graph
ORE 1.4.1 & R 3.1.1
ENDECA 3.1, Server 7.6.1, Studio 3.1, Provisioning Services
Cloudera CDH 5.1.2, Oracle BigData SQL, Oracle BigData Connectors
Plug and Play Companions : EPM 11.1.2.3, BIApps Demos
Utils: Start scripts, MapBuilder, SQLDev 4.1

Friday, May 8, 2015

Loading JSON data into Oracle using ROracle and jsonlite

In this post I want to show you one way of taking a JSON file of data and loading it into your Oracle schema using ROracle. The JSON data will then be used to create a table in your schema. Yes you could use other methods to connect to the database and to create the table. But ROracle is by far the fastest method of connecting, selecting and processing data.

1. Necessary R Packages

You will need two R library. The first of these is the ROracle package. This gives us all the connection and data processing commands to work with the Oracle database. The second package is the jsonlite R package. This package allows us to open, read and process a file that has JSON data.

> install.package("ROracle")

> install.package("jsonlite")

After you have installed the packages you can now load them into your R environment so that you can use them in your current session.

> library(ROracle)

> library(jsonlite)

Depending on your version of R you may get some working messages about the libraries being built under a different version of R. Then again maybe you won't get these :-)

2. Open & Read the JSON file in R
Now you are ready to name and open the file that contains your JSON data. In my case the file is called 'demo_json_data.json'

> jsonFile <- "c:/app/demo_json_data.json"

> jsonData <- fromJSON(jsonFile)

We now have the JSON data loaded into R. We can now look at the attributes of each JSON record and the number of records that was in the JSON file.

> names(jsonData$items)

[1] "cust_id" "cust_gender" "age"

[4] "cust_marital_status" "country_name" "cust_income_level"

[7] "education" "occupation" "household_size"

[10] "yrs_residence" "affinity_card" "bulk_pack_diskettes"

[13] "flat_panel_monitor" "home_theater_package" "bookkeeping_application"

[16] "printer_supplies" "y_box_games" "os_doc_set_kanji"

> nrow(jsonData$items)

[1] 1500

As you can see the records are grouped under a higher label of 'items'. You might want to extract these records into a new data frame.

> data <- jsonData$items

Now we have our data ready in a data frame and we can use this data frame to create a table and insert the data.

3. Create the connection to the Oracle Schema

I have a previous post on connecting to an Oracle Schema using ROracle. That was connecting to an 11g Oracle Database.

JSON is a new feature in Oracle 12c and the connection details are a little bit different because we are now having to deal with connection to a pluggable database. The following illustrates connecting to a 12c database and assumes you have Oracle Client already installed and configured with your tnsnames.ora entry.

# Create the connection string

> host <- "localhost"

> port <- 1521

> service <- "pdb12c"

> connect.string <- paste(

"(DESCRIPTION=",

"(ADDRESS=(PROTOCOL=tcp)(HOST=", host, ")(PORT=", port, "))",

"(CONNECT_DATA=(SERVICE_NAME=", service, ")))", sep = "")

> con2 <- dbConnect(drv, username = "dmuser", password = "dmuser",dbname=connect.string)

> 4. Create the table in your Oracle Schema

At this point we have our connection to our Oracle Schema setup and connected, we have read in the JSON file and we have the JSON data in a data frame. We are now ready to push the JSON data to a table in our schema.

> dbWriteTable(con, "JSON_DATA", overwrite=TRUE, value=data)

Job done :-)

The table JSON_DATA has been created and the data is stored in the table in typical table attributes and rows format.

One thing to watch our for with the above command is with the overwrite=TRUE parameter setting. This replaces a table if it already exists. So your old data will be gone. Be careful.

5. View and Query the data using SQL

When you now log into your schema in the 12c Database, you can now query the data in the JSON_DATA table. (Yes I know it is not in JSON format in this table).

How did I get/generate my JSON data?

I generated the JSON file using a table that I already had in one of my schemas. This table is part of the sample data set that is built on top of the Oracle sample schemas.

The image below shows the steps involved in generating the data in JSON format. I used SQL Developer and set the SQLFORMAT to be JSON. I then ran the query to select the data. You will need to run this as a script. Then copy the JSON data and paste it into a file.

NewImage

The SQL FORMAT command sets the output format for a query back to the default query output format that we are well use to.

A nice little JSON viewer can be found at http://jsonviewer.stack.hu/

Copy and paste your JSON data into this and you can view the structure of the data. Check it out.

Monday, May 4, 2015

Oracle Data Miner (ODM 4.1) New Features

With the release of SQL Developer 4.1 we also get a number of new features with Oracle Data Miner (ODMr). These include:

Data Source node can now include data sources that contain JSON data, generating JSON schema and has a JSON viewer
Create Table can now create data in JSON
JSON Query Node allows you to view, query and process JSON data, combine it with relational data, generate sub-group by, and nested columns to be part of input to algorithms
New PL/SQL APIs for managing Data Miner projects and workflows. This includes run, cancel, rename, delete, import and export of workflows using PL/SQL.
New ODMr Repository views that allows us to query and monitor our workflows.
Transformation Node now allows you different ways of handling NULLS.
Transformation Node now allows us to create Custom Bins, define bin labels and bin values
Overall Workflow and ODMr environment improvements to allow for greater efficiency in workflow behaviour and interactions with the database. So using ODMr should feel quicker and more responsive.

What out for the Gotchas: Although support for JSON has been added to ODMr, as outlined above, you are still a bit limited to what else you can do with your JSON data. Based on the documentation you can use JSON data in the Association and Classification build nodes.

I'm not sure about the other nodes and this will need a bit of investigation to see what nodes can and cannot use JSON data. I'm sure this will all be sorted out in the next release.

Keep an eye out for some blog posts over the coming weeks on how to explore and use these new features of Oracle Data Miner.

Thursday, April 30, 2015

Viewing Models Details for Decision Trees using SQL

When you are working with and developing Decision Trees by far the easiest way to visualise these is by using the Oracle Data Miner (ODMr) tool that is part of SQL Developer.
Developing your Decision Tree models using the ODMr allows you to explore the decision tree produced, to drill in on each of the nodes of the tree and to see all the statistics etc that relate to each node and branch of the tree.
But when you are working with the DBMS_DATA_MINING PL/SQL package and with the SQL commands for Oracle Data Mining you don't have the same luxury of the graphical tool that we have in ODMr. For example here is an image of part of a Decision Tree I have and was developed using ODMr.
Blog dt 1

What if we are not using the ODMr tool? In that case you will be using SQL and PL/SQL. When using these you do not have luxury of viewing the Decision Tree.
So what can you see of the Decision Tree? Most of the model details can be used by a variety of functions that can apply the model to your data. I've covered many of these over the years on this blog.
For most of the data mining algorithms there is a PL/SQL function available in the DBMS_DATA_MINING package that allows you to see inside the models to find out the settings, rules, etc. Most of these packages have a name something like GET_MODEL_DETAILS_XXXX, where XXXX is the name of the algorithm. For example GET_MODEL_DETAILS_NB will get the details of a Naive Bayes model. But when you look through the list there doesn't seem to be one for Decision Trees.
Actually there is and it is called GET_MODEL_DETAILS_XML. This function takes one parameter, the name of the Decision Tree model and produces an XML formatted output that contains the attributes used by the model, the overall model settings, then for each node and branch the attributes and the values used and the other statistical measures required for each node/branch.
The following SQL uses this PL/SQL function to get the Decision Tree details for model called CLAS_DT_1_59.
SELECT dbms_data_mining.get_model_details_xml('CLAS_DT_1_59')
FROM dual;

If you are using SQL Developer you will need to double click on the output column and click on the pencil icon to view the full listing.
Blog dt 2

Nothing too fancy like what we get in ODMr, but it is something that we can work with.
If you examine the XML output you will see references to PMML. This refers to the Predictive Model Markup Language (PMML) and this is defined by the Data Mining Group (www.dmg.org). I will discuss the PMML in another blog post and how you can use it with Oracle Data Mining.

Friday, April 24, 2015

Changing REVERSE Transformations in Oracle Data Miner

In my previous blog post I showed you how you can have a look at the transformations that the Automatic Data Preparation (ADP) feature of Oracle Data Mining produces. I also gave some example of the different types of ADF that are performed for different algorithms.

One of the features of the transformations produced is that it will generate a REVERSE_EXPRESSION. This will take the scored results and apply the inverse of the transformation that was performed when the data was being prepared for input to the algorithm.

Somethings you may want to have the scored data returned in a slightly different ways or labeled in a slightly different way.

In this blog post I will show you how to define an alternative REVERSE_EXPRESSION for an attribute.

The function we need to use for this is the ALTER_REVERSE_EXPRESSION procedure that is part of the DBMS_DATA_MINING package.

When we score data for a typical classification problem we typically use 0 (zero) and 1 to be the target variable values. But what if we wanted the output from our classification model to label the scored data slighted differently.

In this case we can use the ALTER_REVERSE_EXPRESSION procedure to define the new values. What if we wanted the zero to be labeled as NO and the 1 as YES. In this case we can use the following.

BEGIN

dbms_data_mining.alter_reverse_expression(

model_name => 'CLAS_NB_1_59',

expression => 'decode(affinity_card, ''1'', ''YES'', ''NO'')',

attribute_name => 'AFFINITY_CARD');

END;

When we view the transformations for our data mining model we can now see the transformation.

Now when we score our data the predicted target variable will now have our newly defined values.

SELECT cust_id,

PREDICTION(CLAS_NB_1_59 USING *) PRED

FROM mining_data_apply_v

FETHC FIRST 5 ROWS ONLY;

Blog dat trans 4

You can see that this is a very powerful feature and allows use to turn the scored data values is a different way to make them more useful. This is particularly the case as we work towards a more Automatic type of Predictive Analytics.

Saturday, April 18, 2015

ODM : View Transformations generated by Automatic Data Prepreparation

A very powerful feature of Oracle Data Mining and one that I think does not get enough notice is called Automatic Data Preparation.

Data Preparation is one of the most time consuming, repetitive and boring parts of the work that a Data Miner or Data Scientist performs as part of their daily tasks. Apart from gathering the data, integrating the data, getting the data into the required formation the most interesting part of the work is with feature engineering.

Then you have all the other boring data preparation tasks of how to handle missing data, type conversion, binning, normalization, outlier treatment etc.

With Automatic Data Preparation (ADP) in Oracle Data Mining you can let Oracle work all of these things out for you and to perform all the necessary coding and to store all of this coding as part of the in-database data mining model.

This is Fantastic. This ADP feature can same you hours and in some cases days of effort.

But (there is always a but :-) ) what if you are a bit unsure if the transformations that are being performed are exactly what you would wanted. Maybe you would like to see what Oracle is doing and depending on this you can do it a different way.

The first step is to examine the transformations that are generated by stored as part of the in-database data mining model. The DBMS_DATA_MINING package has a function called GET_MODEL_TRANSFORMATIONS. When you query this function, passing in the name of the data mining model, you will get returned the list of transformations that have been applied to each model.

In the following example a GLM model was created using the Oracle Data Miner tool (that is part of SQL Developer). When you use Oracle Data Miner, ADP is automatically turned on.

The following query calls the GET_MODEL_TRANSFORMATIONS function with the data mining model called CLAS_GLM_1_59/.

SELECT * FROM TABLE(DBMS_DATA_MINING.GET_MODEL_TRANSFORMATIONS('CLAS_GLM_1_59'));

The following image contains the output generated by this query.

Blog dat trans 1

When you look at the data under the EXPRESSION column we get to see what the ADP did to the data. In most of the cases there are just some simple data clean-up being performed and formatting for getting the data ready for input into the algorithm.

If we now look at the Naive Bayes model for the same data set we get a very different sent of transformations being listed under the EXPRESSION column.

SELECT * FROM TABLE(DBMS_DATA_MINING.GET_MODEL_TRANSFORMATIONS('CLAS_NB_1_59'));

Blog dat trans 2

Now we get to see some of the data binning that ADP performs and is required for input to the Naive Bayes algorithm. You will also notices that we also have some transformations in the REVERSE_EXPRESSION column. These are the inverse or reverse of the transformation that was generated in the EXPRESSION column.

I will let you explore the data transformations that are produced by ADP for the SVM and Decision Tree algorithms.

I will show you how you change the reverse expression in my next blog post, as there are times when you might want the data to be presented slightly differently after the model has been run to score your data.

To get more details of what Automatic Data Preparation is performed for each data mining algorithm you can check out this link in the 11g documentaion. This section seems to be missing from the online 12c documentation.

Tuesday, March 31, 2015

OTech article on Predictive Queries

Last week the Spring 2015 edition of OTech Magazine was published.

Check out the link to the it here.

I was lucky to have an article accepted and published in this edition and the topic of the article was on Predictive Queries.

I've given a presentation on Predictive Queries at a few Oracle User Group conferences over the past 6 months or so, and this article covers what I talk about in that presentation.

The article covers what Predictive Queries are about and goes through some example of how you can use them. Again I give some of these examples in my presentation.

Now is your chance to try out Predictive Queries using the examples in the articles.

Recently I recorded a very short video with Bob Hubbard of OTN on this topic as part of his 2 Minute Tech Tips. Check out my blog post about this video and view the video.

Friday, February 13, 2015

My OTN 2 Minute Tech Tip: Predictive Queries

A few days ago I recorded a 2-minute tech tip with Bob Hubbard of OTN.

My topic was on Predictive Queries which are a new feature in the Oracle 12c Database.

The challenge was to talk about the topic within 2 minutes. That is a lot harder than you time. Believe me.

Check out the video on the Bobs OTN 2-Minute Tech Tip channel or click on the link below.

It was fun doing this and hopefully I get a chance to do another video with Bob.

Here is a screen capture of when things were being recorded.

OTN tech tip

Wednesday, November 12, 2014

Approximate Count Distinct (12.1.0.2 new feature)

With the release of the Oracle Database 12.1.0.2 there was a number of new features and options. Most of the publicity has been around the in-Memory option. But there was lots of other features for the DBA and a few for the developer.

One of the new SQL functions is the APPROX_COUNT_DISTINCT(). This function is different to the tradition count distinct, COUNT(DISTINCT expression), in that is performs an approximate count distinct. The theory is that this approximate count is a lot more efficient than performing the full count distinct.

The APPROX_COUNT_DISTINCT() function is really only suitable when you are processing very large volumes of data and when the data set contains a large number of distinct values.

The general syntax of the function is:

... APPROX_COUNT_DISTINCT(expression) ...

and returns a Number.

The function returns the approximate number of records that contain distinct value for the expression.

SELECT approx_count_distinct(cust_id)

FROM mining_data_build_v;

The APPROX_COUNT_DISTINCT() function ignores records that contain a null value for the expression. Plus is performs less work on the sorting and aggregations. Just run and Explain Plan and you can see the differences.

In some of the material from Oracle the APPROX_COUNT_DISTINCT() function can be 5x to 50x++ times faster. But it depends on the number of distinct values and the complexity of the SQL query.

As the result / returned value from the function may not be 100% accurate, Oracle says that the functions has an accuracy of >97% (with 95% confidence).

The function cannot be used on the following data types: BFILE, BLOB, CLOB, LONG, LONG RAW and NCLOB

Wednesday, October 29, 2014

Something new in 12c: FETCH FIRST x ROWS

In this post I want to show some example of using a new feature in 12c for selecting the first X number of records from the results set of a query.

See the bottom of this post for the background and some of the reasons for this post.

Before we had the 12c Database if we only wanted to see a subset or the initial set of records from the results of a query we could add something like the following to our query

...

AND ROWNUM <= 5;

The could use the pseudo column ROWNUM to restrict the number of records that would be displayed. This was particularly useful when the results many 10s, 100s, or millions of records. It allowed us to quickly see a subset and to see if the results where what we expected.

In my book (Predictive Analytics Using Oracle Data Miner) I had lots of examples of using ROWNUM.

What I wasn't aware of when I was writing my book was that there was a new way of doing this in 12c. We now have something like the following:

...

FETCH FIRST x ROWS ONLY;

There is an example:

SELECT * FROM mining_data_build_v

FETCH FIRST 10 ROWS ONLY;

Fetch first 1

There are a number of different ways you can use the row limiting feature. Here is the syntax for it:

[ OFFSET offset { ROW | ROWS } ]

[ FETCH { FIRST | NEXT } [ { rowcount | percent PERCENT } ]

{ ROW | ROWS } { ONLY | WITH TIES } ]

In most cases you will probably use the number of rows. But there many be cases where you might what to use the PERCENT. In previous versions of the database you would have used SAMPLE to bring back a certain percentage of records.

select CUST_GENDER from mining_data_build_v

FETCH FIRST 2 PERCENT ROWS ONLY;

This will set the first 2 percent of the records.

You can also decide from what point in the result set you want the records to be displayed from. In the previous examples above the results displayed will befing with the first records. In the following example the results set will be processed to record 60 and then the first 5 records will be selected and displayed. This will be records 61, 62, 63, 64 and 65. So the first record processed will be the OFFSET record + 1.

select CUST_GENDER from mining_data_build_v

OFFSET 60 ROWS FETCH FIRST 5 ROWS ONLY;

Similar to the PERCENT example above you can use the OFFSET value, for example.

select CUST_GENDER from mining_data_build_v

OFFSET 60 ROWS FETCH FIRST 2 PERCENT ROWS ONLY;

This query will go to records 61 and return the next 2 percent of the records.

The background to this post

There are a number of reasons that I really love attending Oracle User Group conferences. One of the challenges I set myself is to go to presentations on topics that I think I know or know very well. I can list many, many reasons for this but there are 2 main points. The first is that you are getting someone elses perspective on the topic and hence you might learn something new or understand it better. The second is that you might actually learn something new, like some new command, parameter setting or something else like that.

At Oracle Open World recently I attended the EMEA 12 things about 12c set of presentations that Debra Lilly arranged during the User Group Forum on the Sunday. During these session Alex Nuijten gave an overview of some 12c new SQL features. One of these was the command FETCH FIRST x ROWS. This blog post illustrates some of the different ways of using this command.

Friday, October 10, 2014

Installing Oracle 12.1.0.2 on Windows 64bit

The following steps are what I did for installing 12.1.0.2 on Windows.

1. Download the Oracle installation ZIP files from the Oracle Downloads page.

DB Install 15

2. Unzip the two 12c downloads files into the same directory.

3. Go to the newly created directory (it is probably called 'database') and you will find a file called setup.exe. Double click on this file.

DB Install 1

After a couple of seconds you will see the Oracle Database 12c splash screen.

DB Install 2

4. Step 1 : Configure Security Updates : Un-tick the tick-box and click the Next button. A warning message will appear. You can click on the Yes button to proceed.

DB Install 3

5. Step 2 : Installation Options : select the Create and Configure a Database option and then click the Next button.

DB Install 4

6. Step 3 : System Class : Select the Server Class option and then click the Next button.

DB Install 5

7. Step 4 : Grid Installation Options : Select the Single Instance Database Installation option and then click the next button.

DB Install 6

8. Step 5 : Select Install Type : Select the Typical install option and then click the Next button.

DB Install 7

9. Step 6 : Oracle Home User Selection : Select the Use Windows Built-in Account option and then click the Next button. A warning message appears. Click the Yes button.

DB Install 8

10. Step 7 : Typical Install Configuration : Set Global Database Name to cdb12c for the container database name. Set the Administrative password for the container database. Set the name of the pluggable database that will be created. Set this to pdb12c. Or you can accept the default names. Then click the Next button. If you get a warning message saying the password does not conform to the recommended standards, you can click the Yes button to ignore this warning and proceed.

DB Install 9

11. Step 8 : Prerequisite Checks : the install will check to see that you have enough space and necessary permissions etc.

12. Step 9 : Summary : When the prerequisite checks (like checking you have enough space and privileges) are finished you will get a window like the following.

DB Install 10

13. Step 10 : Install : You are now ready to start the install process. To do this click on the Install button in the Summary screen.

DB Install 11

You can now sit back, relax and watch the installation of 12.1.0.2c (with the in-memory option) complete.

You may get some Windows Security Alert windows pop up. Just click on the Allow Access button.

Then the Database Configuration Assistant will start. This step might take a while to complete.

DB Install 12

When everything is done you will get something like the following.

DB Install 13

Congratulations you now have Oracle Database 12.1.0.2c installed.

But you are not finished yet!!!

14. Add entry to TNSNAMES.ORA : you will need to add an entry to your tnsnames.ora file for the pluggable database. There is be an entry for the container DB but not for the pluggable. Here is what I added to my tnsnames.ora.

DB Install 14

The last step you need to do is to tell the container database to start up the pluggables when you reboot the server/laptop/PC/VM. To do this you will need to create the following trigger in the container DB.

sqlplus / as sysdba

CREATE or REPLACE trigger OPEN_ALL_PLUGGABLES

after startup

on database

BEGIN

execute immediate 'alter pluggable database all open';

END open_all_pdbs;

Restart your database or machine and you plug gage DB 'pdb12c' will no automatically start.

You are all finished now :-)

Enjoy :-)

Tuesday, September 9, 2014

ORE now available for Multitenant (PDB) version of 12c

Oracle has released an update to their Oracle R Enterprise software. We now have ORE 1.4.1 and this seems to have been released on the past day or so.

Here are the links to the important stuff:

ORE 1.4.1 Release Note

ORE 1.4.1 User Guide

ORE 1.4.1 Installation Guide

ORE 1.4.1 Download page

One of the main features of this new release is that it now supports the multi tenant option of the 12c database. Up to now if you wanted to use ORE and 12c then you needed to do a traditional install of the database. That means you would be just installing a single instance of the 12c database with no CDB or PDB.

With ORE 1.4.1 you can now install ORE into a PDB. It needs to be one of your current PDBs and should not be installed into the root PDB, otherwise it will not work. Check out the installation instructions using the links above.

As with all new releases there are a lot of bug fixes and perhaps some new ones too :-)

Monday, April 14, 2014

Oracle R Enterprise and Oracle 12c

A few of weeks ago we had the release of Oracle R Enterprise (ORE).

There has been some posts on the R/ORE on the Oracle discussion forums about installing ORE on Oracle 12c.

It turns out that the only way to install ORE on an Oracle 12c database is if you do a traditional install. What this means is that you do not have a CDB and PDBs configuration of Oracle 12c.

I'll assume that Oracle are currently working on this particular issue, as you can imagine that that there is considerable amount of complexity in getting ORE to work with the PDBs.

If you are not using Oracle 12c then you are OK, as long as you are using 11.2.0.3 or 11.2.0.4 versions of the database. If you are using a lower version of the 11.2 database then you need to apply a patch to allow ORE to run.

As they say I'm sure it will be "fixed in the next release" :-)

Tuesday, August 20, 2013

Schema Table Filtering for Oracle Data Miner

If you have been using Oracle Data Miner, that is part of SQL Developer 4 or SQL Developer 3, you will notice that your schema can get filled up with various tables that are created by your workflows. The following image gives an example.

These tables can include details of the various algorithms used and their settings, sample tables that were created using the various nodes, etc. Basically they contain all the information that was setup by each node. Not every node in your workflow will create a table, but a lot do in particular if you have set the Cache or Sample in the Properties tab.

In most cases you do not need to be aware of or use most of these tables.

So How do I hide them, so that my schema table listing only shows me the main tables in my schema ? By main tables, I mean the tables that you would expect to have in your schema before you started using Oracle Data Miner.

The answer to this question is to apply filters to your tables in SQL Developer. To do this go to your schema in the Connections tab. Expand to get the full list of schema objects and then right click on Tables. You should get a menu like the following.

Select Apply Filter from the menu and the Apply Filter window will open. Here you can create filters to apply to the tables in your schema.

To restrict Oracle Data Miner related table you will need to exclude tables that begin with, DM$ and ODMR$. The following image shows these filters.

When these filters are applied we only get our schema tables.

There are two additional filters you may want to consider. The first of these is for the tables that begin with OUTPUT. These are tables that are created when you build a node sends the outputs from running a model to a table, or some other scenario where the output is sent to a table. In reality this is bad naming and we should use a name that is more meaningful, and reflects the contents of the table. But sometimes you just want to spool the outputs to a table and the name is not important. I have an additional filter to not show these tables (see below).

With SQL Developer 4, Oracle Data Miner seems to generate IOTs, as we can see in the above image. Again another filter can be created to exclude these from the list.

Here is the full list of filters.

Tuesday, August 6, 2013

Depreciated ODM features in 12c

With the release of the Oracle 12c Database there has been some changes to the Advanced Analytics options. Most of the new features both in the database and in the ODM tool have been documented in previous blog posts.
But what has been removed from the Advanced Analytics Option and what is not longer supported.
The first of these is the Java API that Oracle supplied many, many years ago. They have been saying for a few years now and since the release of 11.2g that these Java APIs are no longer supported. Again the documentation states this and the demo scripts are not included in the latest SQL Developer 4. Instead of using the Java APIs you can using the in-database SQL functions and procedures.
One of the in-database DM algorithms was the Adaptive Bayes Network (ABN). Although this was de-supported in 11.2g of the database is was still in the database. This was to give customers who were still using it time to migrate to using the other algorithms. In 12c the ABN algorithm is not in the the database. Before you upgrade your 11.2.x Oracle database to 12c you will need to drop any ABN models that you have in your database

Thursday, July 25, 2013

12c New Data Mining functions

With the release of Oracle 12c we get new functions/procedures and some updated ones for Oracle Data Miner that is part of the Advanced Analytics option.

The following are the new functions/procedures and the functions/procedures that have been updated in 12c, with a link to the 12c Documentation that explains what they do.

CLUSTER_DETAILS is a new function that predicts cluster membership for each row. It can use a pre-defined clustering model or perform dynamic clustering. The function returns an XML string that describes the predicted cluster or a specified cluster.
CLUSTER_DISTANCE is a new function that predicts cluster membership for each row. It can use a pre-defined clustering model or perform dynamic clustering. The function returns the raw distance between each row and the centroid of either the predicted cluster or a specified.
CLUSTER_ID has been enhanced so that it can either use a pre-defined clustering model or perform dynamic clustering.
CLUSTER_PROBABILITY has been enhanced so that it can either use a pre-defined clustering model or perform dynamic clustering. The data type of the return value has been changed from NUMBER to BINARY_DOUBLE.
CLUSTER_SET has been enhanced so that it can either use a pre-defined clustering model or perform dynamic clustering. The data type of the returned probability has been changed from NUMBER to BINARY_DOUBLE
FEATURE_DETAILS is a new function that predicts feature matches for each row. It can use a pre-defined feature extraction model or perform dynamic feature extraction. The function returns an XML string that describes the predicted feature or a specified feature.
FEATURE_ID has been enhanced so that it can either use a pre-defined feature extraction model or perform dynamic feature extraction.
FEATURE_SET has been enhanced so that it can either use a pre-defined feature extraction model or perform dynamic feature extraction. The data type of the returned probability has been changed from NUMBER to BINARY_DOUBLE.
FEATURE_VALUE has been enhanced so that it can either use a pre-defined feature extraction model or perform dynamic feature extraction. The data type of the return value has been changed from NUMBER to BINARY_DOUBLE.
PREDICTION has been enhanced so that it can either use a pre-defined predictive model or perform dynamic prediction.
PREDICTION_BOUNDS now returns the upper and lower bounds of the prediction as the BINARY_DOUBLE data type. It previously returned these values as the NUMBER data type.
PREDICTION_COST has been enhanced so that it can either use a pre-defined predictive model or perform dynamic prediction. The data type of the returned cost has been changed from NUMBER to BINARY_DOUBLE.
PREDICTION_DETAILS has been enhanced so that it can either use a pre-defined predictive model or perform dynamic prediction.
PREDICTION_PROBABILITY has been enhanced so that it can either use a pre-defined predictive model or perform dynamic prediction. The data type of the returned probability has been changed from NUMBER to BINARY_DOUBLE.
PREDICTION_SET has been enhanced so that it can either use a pre-defined predictive model or perform dynamic prediction. The data type of the returned probability has been changed from NUMBER to BINARY_DOUBLE.

Tuesday, July 23, 2013

Oracle Data Miner New Features (SQL Dev 4)

With the release of the new Oracle 12c database and SQL Developer 4 we have a range of Oracle Data Miner new features . Some of these are embedded into the database and are only available in 12c. Check out my previous blog post on these new features.

In this blog post I will look at the new Oracle Data Miner features that come with the ODM tool in SQL Developer4.

The new features of the Oracle Data Miner tool can be grouped into 2 categories. The first category contains the new features that are available to all user of the tool (11.2g and 12c). The second category contains the new features that are only available in 12c. The new features of each of these categories will be explained below.

Category 1 – Common new features for 11.2g and 12c Database users

There is a new View Data feature that allows you to drill down to view the customer object and to view nested tables.

A new Graph Node that allows you to create graphs such as line, bar, scatter and boxplots for data at any stage of a workflow. You can specify any of the attributes from the data source for the graphs. You don’t seem to be limited to the number of graphs you can create.

A new SQL Node. This is welcome addition, as there has been many times that I’ve need to write some SQL or PL/SQL to do a specific piece of processing on the data that was not available with the other nodes. There are 2 important elements to this SQL node really. The first is that you can write SQL and PL/SQL code to do whatever processing you want to do. But you can only do it on the Data node you are connected to.

The second is that you can use it to call some ORE code. This allows you to use the power of R and extensive range of packages that are available to expand the analytic functionality that is available in the database. If there is some particular function that you cannot do in Oracle and it is available in R, you can now embed this function/code as an ORE object in the database. You can then called using SQL.

WARNING: this particular feature will only work if you have ORE installed on your 11.2.0.3g or 12.1c database

New Model Build Node features, include node level text specifications for text transformations, displays the heuristic rules responsible for excluding predictor columns and being able to control the amount of classification and regression test results that are generated. I’ll be covering these in later blog posts.

New Workflow SQL Script Deployment features. Up to now the workflow SQL script, I found to be of limited use. The development team have put a lot of work into generating a proper script that can be used by developers and DBA. But there are some limitations still. You can use the script will run the workflow automatically in the database without having the use the ODM tool. But it can only be run the in the schema that the workflow was generated. You will still have to do a lot of coding (although a lot less than you used to) to get your ODM models and workflows to run in another schema or database.

This will output the script to a file buried deep somewhere inside you SQL Developer directory. Unfortunately in the EA1 release, the size of this location field is small and scrolling has not been enabled. So you cannot (currently) scroll to the end of the field to see the actual location. You can edit this location to have a different shorter location.

Maybe this will be fixed for the official release.

Category 2 – New features for 12c Database users.

Now for the new features that are only visible when you are running ODM / SQL Dev 4 against a 12c database. No configuration changes are needed. The ODM tool checks to see what version of the database you are logging into. It will then present the available features based on the version of the database.

New Predictive Query nodes allows you to build a node based on the new non-transient feature in 12c called Predictive Queries (PQs). In SQL Developer we get 3 addition types of Predictive Queries. These can be used for Anomaly Detection, Clustering and Feature Extraction

It is important to remember that underlying model produced by these PQs to not exist in the database after the query has executed. The model is created, used on the data and then the model deleted.

The Clustering node has the new algorithm Expectation Maximization in addition to the existing algorithms of K-Means and O-Cluster.

The Feature Extraction node has the new algorithm called Principal Component Analysis in addition to the existing Non-Negative Matrix Factorization algorithm.

Text Transformations are now built into the model build nodes. These text transformations will be part of the Automatic Data Processing steps for the model build nodes. This is illustrated in the above images.

The Generalized Linear Model that is part of the Classification Node has a Feature Selection option in the Algorithm Settings. The default setting is Ridge Regression. Now there is an additional option of using Feature Selection.

Prediction Result Explanations gives the scoring details used to to explain why the prediction was made.

Look out for blog post on each of these new features.

Pages

Monday, August 7, 2017

Friday, April 21, 2017

Monday, May 30, 2016

Monday, July 13, 2015

Friday, May 8, 2015

Monday, May 4, 2015

Thursday, April 30, 2015

Friday, April 24, 2015

Saturday, April 18, 2015

Tuesday, March 31, 2015

Friday, February 13, 2015

Wednesday, November 12, 2014

Wednesday, October 29, 2014

Friday, October 10, 2014

Tuesday, September 9, 2014

Monday, April 14, 2014

Tuesday, August 20, 2013

Tuesday, August 6, 2013

Thursday, July 25, 2013

Tuesday, July 23, 2013