Thursday, April 30, 2015
Viewing Models Details for Decision Trees using SQL
Developing your Decision Tree models using the ODMr allows you to explore the decision tree produced, to drill in on each of the nodes of the tree and to see all the statistics etc that relate to each node and branch of the tree.
But when you are working with the DBMS_DATA_MINING PL/SQL package and with the SQL commands for Oracle Data Mining you don't have the same luxury of the graphical tool that we have in ODMr. For example here is an image of part of a Decision Tree I have and was developed using ODMr.
What if we are not using the ODMr tool? In that case you will be using SQL and PL/SQL. When using these you do not have luxury of viewing the Decision Tree.
So what can you see of the Decision Tree? Most of the model details can be used by a variety of functions that can apply the model to your data. I've covered many of these over the years on this blog.
For most of the data mining algorithms there is a PL/SQL function available in the DBMS_DATA_MINING package that allows you to see inside the models to find out the settings, rules, etc. Most of these packages have a name something like GET_MODEL_DETAILS_XXXX, where XXXX is the name of the algorithm. For example GET_MODEL_DETAILS_NB will get the details of a Naive Bayes model. But when you look through the list there doesn't seem to be one for Decision Trees.
Actually there is and it is called GET_MODEL_DETAILS_XML. This function takes one parameter, the name of the Decision Tree model and produces an XML formatted output that contains the attributes used by the model, the overall model settings, then for each node and branch the attributes and the values used and the other statistical measures required for each node/branch.
The following SQL uses this PL/SQL function to get the Decision Tree details for model called CLAS_DT_1_59.
SELECT dbms_data_mining.get_model_details_xml('CLAS_DT_1_59')
FROM dual;
If you are using SQL Developer you will need to double click on the output column and click on the pencil icon to view the full listing.
Nothing too fancy like what we get in ODMr, but it is something that we can work with.
If you examine the XML output you will see references to PMML. This refers to the Predictive Model Markup Language (PMML) and this is defined by the Data Mining Group (www.dmg.org). I will discuss the PMML in another blog post and how you can use it with Oracle Data Mining.
Friday, April 24, 2015
Changing REVERSE Transformations in Oracle Data Miner
In my previous blog post I showed you how you can have a look at the transformations that the Automatic Data Preparation (ADP) feature of Oracle Data Mining produces. I also gave some example of the different types of ADF that are performed for different algorithms.
One of the features of the transformations produced is that it will generate a REVERSE_EXPRESSION. This will take the scored results and apply the inverse of the transformation that was performed when the data was being prepared for input to the algorithm.
Somethings you may want to have the scored data returned in a slightly different ways or labeled in a slightly different way.
In this blog post I will show you how to define an alternative REVERSE_EXPRESSION for an attribute.
The function we need to use for this is the ALTER_REVERSE_EXPRESSION procedure that is part of the DBMS_DATA_MINING package.
When we score data for a typical classification problem we typically use 0 (zero) and 1 to be the target variable values. But what if we wanted the output from our classification model to label the scored data slighted differently.
In this case we can use the ALTER_REVERSE_EXPRESSION procedure to define the new values. What if we wanted the zero to be labeled as NO and the 1 as YES. In this case we can use the following.
BEGIN dbms_data_mining.alter_reverse_expression( model_name => 'CLAS_NB_1_59', expression => 'decode(affinity_card, ''1'', ''YES'', ''NO'')', attribute_name => 'AFFINITY_CARD'); END;
When we view the transformations for our data mining model we can now see the transformation.
Now when we score our data the predicted target variable will now have our newly defined values.
SELECT cust_id, PREDICTION(CLAS_NB_1_59 USING *) PRED FROM mining_data_apply_v FETHC FIRST 5 ROWS ONLY;
You can see that this is a very powerful feature and allows use to turn the scored data values is a different way to make them more useful. This is particularly the case as we work towards a more Automatic type of Predictive Analytics.
Saturday, April 18, 2015
ODM : View Transformations generated by Automatic Data Prepreparation
A very powerful feature of Oracle Data Mining and one that I think does not get enough notice is called Automatic Data Preparation.
Data Preparation is one of the most time consuming, repetitive and boring parts of the work that a Data Miner or Data Scientist performs as part of their daily tasks. Apart from gathering the data, integrating the data, getting the data into the required formation the most interesting part of the work is with feature engineering.
Then you have all the other boring data preparation tasks of how to handle missing data, type conversion, binning, normalization, outlier treatment etc.
With Automatic Data Preparation (ADP) in Oracle Data Mining you can let Oracle work all of these things out for you and to perform all the necessary coding and to store all of this coding as part of the in-database data mining model.
This is Fantastic. This ADP feature can same you hours and in some cases days of effort.
But (there is always a but :-) ) what if you are a bit unsure if the transformations that are being performed are exactly what you would wanted. Maybe you would like to see what Oracle is doing and depending on this you can do it a different way.
The first step is to examine the transformations that are generated by stored as part of the in-database data mining model. The DBMS_DATA_MINING package has a function called GET_MODEL_TRANSFORMATIONS. When you query this function, passing in the name of the data mining model, you will get returned the list of transformations that have been applied to each model.
In the following example a GLM model was created using the Oracle Data Miner tool (that is part of SQL Developer). When you use Oracle Data Miner, ADP is automatically turned on.
The following query calls the GET_MODEL_TRANSFORMATIONS function with the data mining model called CLAS_GLM_1_59/.
SELECT * FROM TABLE(DBMS_DATA_MINING.GET_MODEL_TRANSFORMATIONS('CLAS_GLM_1_59'));
The following image contains the output generated by this query.
When you look at the data under the EXPRESSION column we get to see what the ADP did to the data. In most of the cases there are just some simple data clean-up being performed and formatting for getting the data ready for input into the algorithm.
If we now look at the Naive Bayes model for the same data set we get a very different sent of transformations being listed under the EXPRESSION column.
SELECT * FROM TABLE(DBMS_DATA_MINING.GET_MODEL_TRANSFORMATIONS('CLAS_NB_1_59'));
Now we get to see some of the data binning that ADP performs and is required for input to the Naive Bayes algorithm. You will also notices that we also have some transformations in the REVERSE_EXPRESSION column. These are the inverse or reverse of the transformation that was generated in the EXPRESSION column.
I will let you explore the data transformations that are produced by ADP for the SVM and Decision Tree algorithms.
I will show you how you change the reverse expression in my next blog post, as there are times when you might want the data to be presented slightly differently after the model has been run to score your data.
To get more details of what Automatic Data Preparation is performed for each data mining algorithm you can check out this link in the 11g documentaion. This section seems to be missing from the online 12c documentation.
Friday, April 3, 2015
Oracle Magazine - Fall 1987
This was a bumper issue in comparison to the very first edition.
Other articles included:
- Oracle (International) User Week was held during week 27th September. This coincided with Oracle's 10th anniversary, and have over 1000 attendees. That is a bit of difference in numbers that Oracle Open World now gets!
- Oracle post record Revenues and Earnings. Fiscal year 1996 had revenue of $55.4 million and fiscal year 1997 had revenue of $131.3 million, with fourth quarter revenue of $50 million.
- AHOLD, a leading Dutch supermarket chain is using the Oracle database to automate 500 of its Albert Heijn (NL) supermarkets.
- Finnish Defense Ministry deploys the Oracle database system.
- SQL*Menu is released. Its main was to all application designers can unify applications build with SQL*Forms, SQL*Report, SQL*Plus and other applications.
- Loews Anatole Hotel in Dallas had recently checked in over 3,100 guests and checked out 2,900 guest in one day. All of this processing what done using an Oracle Database .
- Over 120 competitions in 16 days. More than 2,600 athletes from 51 countries. An estimated 1.6 million spectators. Volunteers, scorers and worldwide media Organising and managing the 1998 Winter Olympics in Calgary was all done using an Oracle database.
- There was an article on writing SQL by Richard Finkelstein, using some of the set operators and how to update data using a nested query.
My Oracle Magazine Collection can be found here. You will find links to my blog posts on previous editions and a PDF for the very first Oracle Magazine from June 1987.
Tuesday, March 31, 2015
OTech article on Predictive Queries
Last week the Spring 2015 edition of OTech Magazine was published.
Check out the link to the it here.
I was lucky to have an article accepted and published in this edition and the topic of the article was on Predictive Queries.
I've given a presentation on Predictive Queries at a few Oracle User Group conferences over the past 6 months or so, and this article covers what I talk about in that presentation.
The article covers what Predictive Queries are about and goes through some example of how you can use them. Again I give some of these examples in my presentation.
Now is your chance to try out Predictive Queries using the examples in the articles.
Recently I recorded a very short video with Bob Hubbard of OTN on this topic as part of his 2 Minute Tech Tips. Check out my blog post about this video and view the video.
Thursday, March 12, 2015
Automatic Analytics is So main stream. Not something new.
Everyone is doing advanced analytics. Right? Hmm
Everyone is talking about advanced analytics? Yes that is true.
Everyone is an expert in advanced analytics? This is so not true. Watch out for these Great Pretenders. You know what I mean! You know who I mean! Maybe you know some of them already? If not, watch out for these Great Pretenders!!!
Some people are going around talking about data mining, predictive analytics, advanced analytics, machine learning etc as if this is some new topic. Well it isn't. It isn't anything new and most of the techniques have been about for 10, 20, 30+ years.
Some people are saying you should only use language X or tool Y because. Everything else is basically rubbish.
What we do have is a wider understanding of how to use these techniques on our various data sources.
What we have is a lot more tools that allow us to perform these tasks a lot easier, at greater speed, with more functionality and without the need to fully understand the hard core maths that is going on behind the scenes.
What we have is a lot more languages to perform these tasks and to support the vast amount of work that goes into understanding the data and preparing the data.
Someone thing for all of us to watch out for, when we ready about these topics, is what kind of problem area they are addressing. The following table illustrates the three main types or categories of Analytics. These categories are Descriptive Analytics, Predictive Analytics and Prescriptive Analytics. I think most people would agree that the Descriptive and Predictive Analytics categories are very mature at this stage. With Predictive Analytics we are perhaps still evolving in this category and a lot more work needs to be done before this this become wide spread.
Some people talk as if Predictive Analytics is some new and exciting topic. But isn't all that new. It was been around for the past 30+ years. If you go back over the Gartner Hype Cycle that comes out every September, Predictive Analytics is no longer being shown on this graph. The last time it appeared on the Gartner Hype Cycle was back in 2013 and it was positioned on the far right of the graph in the section called Plateau of Productivity.
So Predictive Analytics is very mature and main stream. Part of the reason that it is main stream is that Predictive Analytics has allowed for a new category of Analytics to evolve and this is Automatic Analytics.
Automatic Analytics is where Advanced and Predictive Analytics has been build into our day to day applications that are used to run our business. We do not need the hard core type of data scientists to perform various analytic on our data. Instead these task, once they have been defined, can then be added to our applications to process, evaluate and make decisions all automatically. This is were we need the data scientists to be able to communicate with the business and be able to work with them to solve real world business projects. This is a different type of data scientist to the "hard" core data scientist who delves into the various statistical methods, machine learning methods, data management methods, etc.
The following table extends the table given above to include Automatic Analytics, and is my own take on how and where Automatic Analytics fits.
Every time we get an insurance quote, health insurance quote, get a "random" call from our Telco offering a free upgrade, get our loyalty card statements, get a loan from the bank, look at or buy a book on Amazon, etc. the list could go on and on, but these are all examples of how predictive analytics has been automated into our everyday business application.
But this is nothing new. When I first got into data mining/predictive analytics over 16 years ago, it was considered a common thing that certain types of companies did. What has happened in the time since and particularly in the past few years is that a lot more people are seeing the value in using it.
Before I finish off this post we can have a quick look at what Oracle has been doing in this area. They have their Advanced Analytics Option and Real-Time Decisions tools to all data scientists do their magic. But over the past X years (nobody can give me an exact number) they have been very, very active in building in lots and lots of predictive analytics into their various business applications, particularly with into with Fusion Apps and BI Apps.
A recent quote from Oracle highlights their aim with this,
" ... products designed to close the gap between data scientists and businesses."
Now with Oracle making a big push to the cloud, they are busy adding in more and more Automatic (Predictive) Analytics into their Cloud Applications. What we need from Oracle is a clearer identification of where they have done this. Plus with the migration of their Apps to the cloud, their Advanced Analytics Option is a core part of their Cloud platform. As they upgrade or add new features into their Cloud Apps, you will now be able to get the benefit of these Automatic (Predictive) Analytics as they come available.
Monday, March 9, 2015
OUG Ireland 2015 is next week
The annual OUG Ireland conference is on next week on Thursday 19th March.
If you haven't already signed up for the conference this is only a few days left to do so. Click here to go to the registrations pages.
Also don't forget to sign Maria Colgan's one day seminar on the Oracle 12c In-Memory Option.
As always there is a very full agenda with 7 streams, 47 presentations and several keynote presentations.
I'll be a draw for a copy of my book and I'll be giving away a few Oracle Press goodies too. Check out this blog post for the details and rules of the book draw.
The following are the presentations I'm planning on attending (so you know where to find me)
Time | Presenter | Topic |
---|---|---|
09:10-09:30 | Debra Lilley | OUG Ireland Welcome, Introduction and Opening |
09:30-10:10 | Jon Paul (Oracle) | Opening Keynote by Jon Paul from Oracle |
10:15-11:00 | Oralce Presentation | Oracle Big Data Strategy |
11:00-11:25 | Exhibition Hall | |
11:25-12:10 | Antony Heljula | Real Business Value Using Predictive BI (I've seen this before but I worked with Antony on some of what he will be talking about) |
12:15-13:00 | Roel Hartman & Brendan Tierney | What Are They Thinking? With Oracle Application Express & Oracle Data Mining. (we gave this presentation at Oracle Open World back in September 2014) |
12:15-13:00 | Gurcan Orhan | How to handle Dev, Test & Prod with ODI |
13:00-14:00 | Lunch (and then freaking out before I give my second presentation) |
|
14:00-14:45 | Brendan Tierney | Predictive Queries in Oracle 12c Database (I suppose I have to turn up to my own presentation) |
14:50-15:35 | Roel Hartman | Hidden APEX 5 Gems Revealed (APEX 5 is due out any day now) |
15:35-16:00 | Exhibition Hall & Coffee (and then freaking out before I give my third presentation) |
|
16:00-16:45 | Brendan Tierney | Running R in your Oracle Database using Oracle R Enterprise (This presentation generally runs for 50 minutes) |
16:50-17:35 | Maria Colgan | BI, Dev & Tech Closing Keynote: Oracle Database In-Memory-The next big thing |
17:35-18:35 | Event Social i.e. free drink :-) |
As you can see it is going to be a busy, busy day.
I would love to attend lots of others, but being able to be in multiple places at the same time is not one of them.
NOTE:The User Group has a rule that a presenter can have a max of 2 presentations. Unfortunately we had to break this rule a week out from the conference, due to some cancellations. And that is why I've ended with 2.5 presentations.
Friday, March 6, 2015
RIP SQL*Plus & hello SQL Command Line
Over the past couple of months Oracle has been releasing some EA (Early Adopter) versions of a new tool that is currently called SQL Command Line.
The team behind this new tool is the SQL Developer development team and they have been working on creating a new command line SQL tool that is based on some of the technology that is included in SQL Developer.
SQL Command Line in an stand alone tool and all you need to do is to download and un-zip the tile.
What I want to show in this blog post is some of new features that are available and that I have found particularly useful. But before we get onto those commands let us first have a look at how you can get setup and running with SQL Command Line.
Download & Setup
The current download of SQL Command Line can be found under the SQL Developer 4.1 EA Download page. I'm assuming when 4.1 is formally released the download for SQL Command line will be on the main SQL Developer Download web page.
After you have downloaded the file, all you need to do is to unzip the file and then copy the unzipped directory to where you want the software to be located on your client.
Now you are ready to get started with using SQL Command Line.
Connecting to your Oracle Schema
(That) Jeff Smith and Barry McGillin have a couple of good blog posts on the different connection methods and some setup or configuration you might need to consider. Check out these links for more details.
For me I did not have to do any additional setup or configuration. I was able to use the TNS Names and the EZConnect methods without any problems.
The following how to connect to my (DMUSER) schema using the EZConnect method. With this method we pass in the username, password, the host name, port number and the service name. Just like this> sql dmuser/dmuser@localhost:1521/pdb12c
We can not have a look at the JDBC connection details.
SQL> show jdbc
-- Database Info --
Database Product Name: Oracle
Database Product Version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options
Database Major Version: 12
Database Minor Version: 1
-- Driver Info --
Driver Name: Oracle JDBC driver
Driver Version: 12.1.0.2.0
Driver Major Version: 12
Driver Minor Version: 1
Driver URL: jdbc:oracle:thin:@localhost:1521/pdb12c
SQL>
If we have a TNSNAMES.ORA file on our computer and the directory that it is in, is on the search PATH, then we can use the service names defined in the TNSNAMES.ORA file. The following example shows you how to use this in two ways. The first shows how to enter all the details when you are starting SQL CL and the other is when SQL CL prompts you for each parameter.
> sql dmuser/dmuser@pdb12c
and when we are prompted to enter the parameters, we get the following.
> sql
SQLcl: Release 4.1.0 Beta on Thu Mar 05 15:16:12 2015
Copyright (c) 1982, 2015, Oracle. All rights reserved.
SQLcl: Release 4.1.0 Beta on Thu Mar 05 15:16:14 2015
Copyright (c) 1982, 2015, Oracle. All rights reserved.
Username? (''?) dmuser
Password? (**********?) ******
Database? (''?) pdb12c
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
SQL>
As you can see these work in the same way as when we use SQL*Plus.
Now that you are connected to your schema, what else can you do? The following sections are some useful commands. Commands & Help
The following list of commands is by no means a complete list of commands available in SQL Command Line. Theoretically everything you can currently do in SQL*Plus you can also do in SQL Command Line (theoretically) But the commands I give examples of below are some of my favourites (so far).
You can get the list of commands by typing help at the SQL prompt.
SQL> help
Then to get help on a specific command you can just add the command after the help.
SQL> help cd
CD
---
Changes path to look for script at after startup.
(show SQLPATH shows the full search path currently:
- CD current directory setting set by last cd command
- baseURL (url for subscripts)
- topURL (top most url when starting script)
- Last Node opened (i.e. file in worksheet)
- Where last script started
- Last opened on sqlplus path related file chooser
- SQLPATH setting
- "." if in SQLDeveloper UI (included in SQLPATH in command line (sdsql))
).
SQL>
Some work is still needed on the help documentation and what is listed for each command, as the current version is missing some important details.
Alais
This is by far my favourite new feature. This allows us to take some of our most common SQL statements and to create a shortcut for it.
Very soon I will not be using Oracle SQL but I will be using My SQL, as I will have created my own personalised version of SQL.
To list what aliases you have defined in your schema you can type
SQL > alais
Oracle will have a few aliases already defined in SQL CL. By having a look at some of these you can see some of what you want they can do and get ideas for what you might want to do with them. To list the contents of an alias, you can use the following command.
alias list {alias name}
for example
SQL > alias list tables
This command lists the query that is used for the 'tables' alias that comes with SQL CL.
I use Oracle Data Miner a lot and when you use this tool it can create a number of tables with a variety of names in your schema. Most of these you will never need to look at. So what I do is create an alias that excludes these from the list of tables in my schema.
SQL> alias tables2=select table_name from user_tables where table_name not like 'ODMR$%' and table_name not like 'DM$%' and table_name not like 'SYS_IOT%';
So now all I need to do to list my important data only tables (and exclude all the Oracle Data Miner tables) I can run my alias 'table2'.
SQL> tables2
You will quickly build up a suite of commands using aliases.
info and >info+
info and info+ are the new commands to replace the DESC command.
The difference between info and info+ is that info+ gives you some statistical information about the table and the attributes in the table. This is illustrated in the following examples.
Example using 'info'
Example using 'info+'
CTAS & DDL
If you want to get the DDL script to create a copy of a table you have two options open to you. The first of these is the DDL command. This creates a DDL statement based on the meta data for the table, just like in the following
An alternative to this is to use the CTAS command that will give a slightly different output to DDL command. With the CTAS we also get the CREATE TABLE .. AS SELECT ...
History
In SQL*Plus we had a limited scroll through our previous commands. The same kind of scrolling is available in SQL CL, but we can get to see all our previous commands using the 'history' command. The following illustrates how you can list all you previous commands, I'm sure it is limited to a certain number or will be otherwise it will become a very long, long list.
SQL> history
To find out how often each command has been run you can run
SQL> history usage
and to find out how long the query took to run the last time it was run
SQL> history time
There are lots more that I could show, but this post is way, way to long as it is. What I suggest you do is go and download SQL CL (Command Line) and start using it today.
Tuesday, March 3, 2015
Book give away at OUG Ireland
The annual Oracle User Group in Ireland conference is on the 19th March in Croke Park.
I'll be giving 2 presentations, with one each on the Development and Business Analytics tracks. Here are the details of these presentations.
Time | Room | Presentation Title / Topic |
---|---|---|
14:00-14:45 | InterConnect 681 | Predictive Queries in Oracle 12c |
16:00-16:45 | Davin Suite | Running R in the Database using Oracle R Enterprise |
I will be giving away a copy of my book to one luck person :-)
How will this book give away work?
During both of my presentations I will pass around a "hat" for you to put your name or business card into. Then at end of my last presentation we will draw one name out of the hat.
But you have to be in the room to collect the book. If you are not there then I will draw out another name (and so on) until the winner is in the room.
So by attending both of my presentations you are doubling your chances of winning my book.
(Maybe this is an attempt by me to have a good attendance at my last presentation)
Plus I might have a few other Oracle Press goodies to give away too.
Wednesday, February 25, 2015
US President talks about Data Science
Check out the video of US President talking about Data Science and the first Chief Data Scientist of the USA talks about his mission.
Sunday, February 22, 2015
Oracle ACEs at OUG Ireland 2015
The annual Oracle User Group in Ireland Conference will be on Thursday 19th March. This year the conference will be held in the Croke Park conference centre. This conference centre is only a short taxi ride from Dublin Airport and Dublin City Centre.
If you are planning a hotel stay for the conference I would recommend staying in a hotel in the city centre and get a taxi to/from the conference venue.
We have a large number of Oracle ACEs presenting at the conference. The following table lists the ACEs, their twitter handle and their website.
Make sure you check out the full agenda for the conference by clicking on the following image. Plus there is a full day session on Friday 20th March with Maria Colgan on the Oracle In-Memory option.
Friday, February 13, 2015
My OTN 2 Minute Tech Tip: Predictive Queries
A few days ago I recorded a 2-minute tech tip with Bob Hubbard of OTN.
My topic was on Predictive Queries which are a new feature in the Oracle 12c Database.
The challenge was to talk about the topic within 2 minutes. That is a lot harder than you time. Believe me.
Check out the video on the Bobs OTN 2-Minute Tech Tip channel or click on the link below.
It was fun doing this and hopefully I get a chance to do another video with Bob.
Here is a screen capture of when things were being recorded.