Showing posts with label Oracle Analytics Option. Show all posts

Tuesday, January 12, 2016

ORE video : Demo Code Part 3

The following is the third set of demo code from my video on using R in the Oracle Database. Check out the video before using the following code. The blog post for the video will be updated to contain links to all blog posts that have the various demo code.

The following code illustrates some simple examples of using Oracle R Enterprise. In these examples you will see how to connect to the Oracle Database, how to query and process some of the tables and views in the Oracle Database, how to check that you are working with objects in the database, and how to move data to the database and query it.

> library(ORE)
> # ore.connect(user="rquser", sid="orcl", host="localhost", password="rquser", port=1521, all=TRUE);
> ore.connect(user="dmuser", sid="orcl", host="localhost", password="dmuser", port=1521, all=FALSE);
> # Test the connection
> ore.is.connected()
 [1] TRUE
> # List all the tables and views
> ore.ls()
character(0)
> # Use ore.sync with no arguments to sync all the tables and views in the schema
> ore.sync()
> ore.ls()
 [1] "DEMO_R_APPLY_RESULT"      "DEMO_R_TABLE"             "INSUR_CUST_LTV_SAMPLE"    "MINING_DATA_APPLY"       
 [5] "MINING_DATA_APPLY_V"      "MINING_DATA_BUILD_V"      "MINING_DATA_TEST_V"       "MINING_DATA_TEXT_APPLY_V"
 [9] "MINING_DATA_TEXT_BUILD_V" "MINING_DATA_TEXT_TEST_V" 
> # Disconnect and reconnect with no metadata sync
> ore.disconnect()
> ore.connect(user="dmuser", sid="orcl", host="localhost", password="dmuser", port=1521, all=FALSE);
> ore.sync(table = c("MINING_DATA_BUILD_V", "MINING_DATA_TEST_V", "INSUR_CUST_LTV_SAMPLE"))
> ore.ls()
 [1] "INSUR_CUST_LTV_SAMPLE" "MINING_DATA_BUILD_V"   "MINING_DATA_TEST_V"   
> # Check for the existence of a table or view
> ore.exists("MINING_DATA_BUILD_V")
 [1] TRUE
> # list the objects in the DMUSER schema
> ore.ls("DMUSER")
 [1] "INSUR_CUST_LTV_SAMPLE" "MINING_DATA_BUILD_V"   "MINING_DATA_TEST_V" 
> #
> # Load data from a file into a new table
> ore.exists("DEMO_R_TABLE")
 [1] TRUE
> ore.drop(table='DEMO_R_TABLE')
> ore.ls()
 [1] "INSUR_CUST_LTV_SAMPLE" "MINING_DATA_BUILD_V"   "MINING_DATA_TEST_V"   
> titanic <- read.table("c:/R/titanic2.txt", header=T, sep="\t")
> ore.create(titanic, table="DEMO_R_TABLE")
> tData <- ore.get("DEMO_R_TABLE")
> head(tData)
                 NAME PCLASS AGE    SEX SURVIVED
1 Fynney, Mr Joseph J    2nd  35   male        0
2      Gale, Mr Harry    2nd  35   male        0
3   Gale, Mr Shadrach    2nd  38   male        0
4 Garside, Miss Ethel    2nd  24 female        1
5  Gaskell, Mr Alfred    2nd  16   male        0
6  Gavey, Mr Lawrence    2nd  26   male        0
> # Use ORE to pull data from the Database to local R
> # ore.pull copies the database data into a local R data frame
> mdbv <- ore.get("MINING_DATA_BUILD_V")
> mdbv_data <- ore.pull(mdbv)
Warning message:ORE object has no unique key - using random order 
> head(mdbv_data,3)
  CUST_ID CUST_GENDER AGE CUST_MARITAL_STATUS             COUNTRY_NAME    CUST_INCOME_LEVEL EDUCATION OCCUPATION
1  101501           F  41              NeverM United States of America J: 190,000 - 249,999   Masters      Prof.
2  101502           M  27              NeverM United States of America I: 170,000 - 189,999     Bach.      Sales
3  101503           F  20              NeverM United States of America H: 150,000 - 169,999   HS-grad    Cleric.
  HOUSEHOLD_SIZE YRS_RESIDENCE AFFINITY_CARD BULK_PACK_DISKETTES FLAT_PANEL_MONITOR HOME_THEATER_PACKAGE
1              2             4             0                   1                  1                    1
2              2             3             0                   1                  1                    0
3              2             2             0                   1                  0                    0
  BOOKKEEPING_APPLICATION PRINTER_SUPPLIES Y_BOX_GAMES OS_DOC_SET_KANJI
1                       1                1           0                0
2                       1                1           1                0
3                       1                1           1                0
> class(mdbv_data)
[1] "data.frame"
> summary(mdbv_data)
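
Once data has been pulled into a local data.frame, ordinary R functions apply to it. The following is a minimal sketch, assuming a local data frame shaped like the DEMO_R_TABLE data. The six rows are copied from the head(tData) output above, and the object names (titanic_local, survival_by_sex, mean_age) are made up for illustration.

```r
# Six rows copied from the head(tData) output above; in practice this
# data frame would come from ore.pull() rather than being typed in.
titanic_local <- data.frame(
  NAME = c("Fynney, Mr Joseph J", "Gale, Mr Harry", "Gale, Mr Shadrach",
           "Garside, Miss Ethel", "Gaskell, Mr Alfred", "Gavey, Mr Lawrence"),
  PCLASS = rep("2nd", 6),
  AGE = c(35, 35, 38, 24, 16, 26),
  SEX = c("male", "male", "male", "female", "male", "male"),
  SURVIVED = c(0, 0, 0, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Proportion of survivors per sex, computed entirely in local R memory
survival_by_sex <- tapply(titanic_local$SURVIVED, titanic_local$SEX, mean)
print(survival_by_sex)

# Mean age of the six sampled passengers
mean_age <- mean(titanic_local$AGE)
print(mean_age)
```

With only six rows the numbers mean very little; the point is that nothing here touches the database once the data is local.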

Thursday, April 30, 2015

Viewing Models Details for Decision Trees using SQL

When you are working with and developing Decision Trees by far the easiest way to visualise these is by using the Oracle Data Miner (ODMr) tool that is part of SQL Developer.
Developing your Decision Tree models using the ODMr allows you to explore the decision tree produced, to drill in on each of the nodes of the tree and to see all the statistics etc that relate to each node and branch of the tree.
But when you are working with the DBMS_DATA_MINING PL/SQL package and with the SQL commands for Oracle Data Mining you don't have the same luxury of the graphical tool that we have in ODMr. For example, here is an image of part of a Decision Tree that I developed using ODMr.
[Image: part of a Decision Tree displayed in ODMr]
What if we are not using the ODMr tool? In that case you will be using SQL and PL/SQL. When using these you do not have the luxury of viewing the Decision Tree.
So what can you see of the Decision Tree? Most of the model details can be used by a variety of functions that can apply the model to your data. I've covered many of these over the years on this blog.
For most of the data mining algorithms there is a PL/SQL function available in the DBMS_DATA_MINING package that allows you to see inside the models to find out the settings, rules, etc. Most of these functions have a name something like GET_MODEL_DETAILS_XXXX, where XXXX is the name of the algorithm. For example, GET_MODEL_DETAILS_NB will get the details of a Naive Bayes model. But when you look through the list there doesn't seem to be one for Decision Trees.
Actually there is and it is called GET_MODEL_DETAILS_XML. This function takes one parameter, the name of the Decision Tree model and produces an XML formatted output that contains the attributes used by the model, the overall model settings, then for each node and branch the attributes and the values used and the other statistical measures required for each node/branch.
The following SQL uses this PL/SQL function to get the Decision Tree details for a model called CLAS_DT_1_59.
SELECT dbms_data_mining.get_model_details_xml('CLAS_DT_1_59')
FROM dual;

If you are using SQL Developer you will need to double click on the output column and click on the pencil icon to view the full listing.
[Image: the XML output displayed in SQL Developer]
Nothing too fancy like what we get in ODMr, but it is something that we can work with.
If you examine the XML output you will see references to PMML. This refers to the Predictive Model Markup Language (PMML) and this is defined by the Data Mining Group (www.dmg.org). I will discuss the PMML in another blog post and how you can use it with Oracle Data Mining.

Monday, September 15, 2014

Oracle Advanced Analytics sessions at OOW14

With Oracle Open World just a few days away now, I was going through the list of presentations that are focused on using the Oracle Advanced Analytics Option. These will cover Oracle Data Miner and Oracle R Enterprise.

So I've decided to share this list with you :-) and hopefully I will get to see you at some or all of these sessions.

Date | Time | Location | Presentation Title
Sunday 28th Sept. | 9:00-9:45 | Moscone South Room 304 | What Are They Thinking? With Oracle Application Express and Oracle Data Miner [UGF2861]. (This is my presentation with Roel Hartman.)
Tuesday 30th Sept. | 17:00-17:45 | Intercontinental - Grand Ballroom C | Advanced Predictive Analytics for Database Developers on Oracle [CON7977]
Tuesday 30th Sept. | 18:00-18:45 | Moscone South - 303 | Oracle’s Big Data Management System [MTE9350]
Wednesday 1st Oct. | 10:15-11:00 | Moscone South - 301 | Big Data and Predictive Analytics: Fiserv Data Mining Case Study [CON8631]
Wednesday 1st Oct. | 10:30-10:50 | Big Data Theater, Moscone South, Big Data Showcase | Big Data: Maximize the Business Impact with Oracle Advanced Analytics [THT10395]
Wednesday 1st Oct. | 11:30-12:15 | Moscone South - 300 | A Perfect Storm: Oracle Big Data Science for Enterprise R and SAS Users [CON8331]
Wednesday 1st Oct. | 12:45-13:30 | Moscone West - 3002 | Predictive Analytics with Oracle Data Mining [CON8596]
Wednesday 1st Oct. | 14:00-14:45 | Moscone South - 308 | Developing Relevant Dining Visits with Oracle Advanced Analytics at Olive Garden [CON2898]

If I have missed any sessions then do please let me know and I can update the list above.

Friday, August 8, 2014

my Oracle Data Miner Book

Some of you may be aware that I have been writing a book on Oracle Data Miner. Actually the book covers the Oracle Data Miner GUI that is part of SQL Developer, the SQL and PL/SQL functions, procedures and packages that form the Oracle Data Mining option in the database, and lots of other topics for the DBA, Developer and BI/DW people.
Today is a big day for this book as it is officially released and available for purchase. See below for some links to where you can buy the book in print and e-book formats. It has been published by McGraw-Hill/Oracle Press.
The book is aimed at a variety of people and the aim of the book is to introduce them to using the Oracle Data Miner tool and how to perform various data mining and predictive analytics tasks using SQL and PL/SQL.
The book will not teach you how each of the data mining algorithms works. There is a bit of an assumption that you know a bit about these already. There are lots of books and resources available that cover that material. You can look on my book as a getting started / how-to type of book.
Below are the images of the front cover and the back cover.
[Images: book front cover and book back cover]
For more details of the book and for some updates keep an eye on my ODM Book page. On this page I'm adding a FAQ section. This will be based on questions that I receive about the book.
If you buy the book then I hope you will find it helpful. If you are going to attend one of my presentations at an Oracle User Group meeting then bring the book along and I can sign it for you. Alternatively if you are at Oracle Open World 2014, come along to the Oracle Press Book Store, as I will be there to sign books on Wednesday 1st October between 13:00 and 13:30.
Where can you buy my Oracle Data Miner book (print and e-book)?
You can buy the book from the McGraw-Hill/Oracle Press website and from Amazon. Each site will offer discounts so check out which one is the best for you.
McGraw-Hill/Oracle Press
For USA locations (enter promo code Tierney to save 20% and free delivery) www.mhprofessional.com
For UK & Ireland locations (enter promo code Tierney to save 20% and free delivery) www.mcgraw-hill.co.uk/tpr
Amazon
Click here to buy it on www.amazon.com
Click here to buy it on www.amazon.co.uk

Thursday, July 17, 2014

OTN Latin America (North) Tour 2014

For a few years now I (and I'm sure you have too) have heard about and followed the various Oracle User Group tours that OTN arranges/facilitates. A tour consists of a number of Oracle User Groups in a region coordinating together to have their conferences organised so that they can get speakers from across the world to come and present.

For most presenters it involves lots of travel. So instead of them doing all that travelling to present at one conference, they can now extend their travels a little and present in a number of countries. Most of the speakers are Oracle ACE Directors and OTN is very generous with their support in that they pay for all the flights, transportation and hotels. Without the generous support of OTN these tours and perhaps many of the conferences would not take place.

With envy I used to follow the various speakers on Twitter as they talked about their travels from country to country and their experiences of meeting the people and exploring the various countries. Yes, their time in each country seemed to be limited, but they always got to see and do so much.

Earlier this year there was a call for presentations for the various OTN Tours in 2014. I submitted 3 presentations that covered the Oracle Advanced Analytics Option (Oracle Data Mining and Oracle R Enterprise). I thought I didn't stand a chance given the speakers that have participated in previous years.

A couple of weeks ago I received an email saying that I had been accepted onto the OTN Latin America (North) Tour. So you can imagine my excitement. The full OTN Tour North leg covers a number of countries across Central and South America and runs over a 2 week period. Unfortunately I'm not able to be away for that long, so I was accepted for the conferences in the first week of the tour. This will include Panama, Costa Rica and Mexico :-)

Some of you might think this is a bit of a jolly and a holiday. What I've discovered over the past week or more is that it will be far from that. There is a lot of work in preparing the presentations, giving the presentations, setting up live demos between presentations, various meetings with people at the conferences, etc. Then there is all the travel, all the airports, all the airport transfers, all the overnights in hotels. Over the course of 7 days I will be staying in 6 different hotels.

I have spent the last week just trying to arrange my flights and hotels. This also involved trying to coordinate with other speakers so that we can travel together as much as possible.

Here are the dates and the presentations that I will be giving at these conferences:

4th August : Panama (in Panama City)

     10:00-11:00 : Getting Started with Oracle Data Miner & Predictive Analytics

     11:00-12:00 : Combining the Power of R and in-Database Data Mining. Running R in the Database. Seriously!

     13:00-13:40 : Sentiment Analysis Using Oracle Data Mining

6th August : Costa Rica (in San Carlos)

     10:00-11:00 : Getting Started with Oracle Data Miner & Predictive Analytics

     13:00-14:00 : Sentiment Analysis Using Oracle Data Mining

     16:00-17:00 : Combining the Power of R and in-Database Data Mining. Running R in the Database. Seriously!

8th August : Mexico (in Mexico City)

     14:00-15:00 : Getting Started with Oracle Data Miner & Predictive Analytics

     15:00-16:00 : Combining the Power of R and in-Database Data Mining. Running R in the Database. Seriously!

When the agendas for the conferences are available I will have another blog post with their details.

If you are at one of these conferences do please say hello :-)

I've finally booked all my flights and hotels. Many thanks to my fellow ACE Director presenters for your research and sharing of travel plans. It looks like there will be a group of us all travelling together.

Now the next challenge is to prepare the presentations and live demos (yes live demos).

I hope to blog about each of the conferences and my travels to/from each country. It really depends on what time I will have and access to the internet. Perhaps this is something I will try to do on my various plane flights or waiting at the airports. So watch out for these :-)


Updated with some stats on my travels

My travel plans for the OTN Latin America tour of user group conferences involve:

  • 12,200 flying miles,
  • 29.75 hours of flying time,
  • way too many hours hanging around in airports
  • over 8 days
  • staying in 6 hotels
  • plus 1 overnight flight,
  • giving 8 hours of presentations in 3 countries

Why do we do this? Because we love sharing with the Oracle User Groups around the world. I'm only doing 1 week of the tour. Some people are doing 2 weeks :-(

Monday, June 2, 2014

ore.parallel

In ORE there are a number of ways to get your R scripts to run in parallel in the database. One way is to enable the Parallel option in ORE. This is what will be shown in this post. There are other methods of running various ORE commands/scripts in parallel. With these the scripts are divided out and several parallel R processes are started on the server.

But what if you want to use the database parallel feature on some of your other ORE commands?

Why would you want to do this?

Well, the main answer is that you might want to use the parallel option of the database for the creation of objects (tables etc) and for selecting and manipulating the data in the database.

How can you enable your ORE connection to use the in-database parallel feature?

ORE 1.4 has a new option that enables the parallel option for your ORE connection in the database. This option is called ore.parallel.

When you enable or set the ore.parallel option, it seems to be the equivalent of running the following:

ALTER SESSION ENABLE PARALLEL DDL;

ALTER SESSION ENABLE PARALLEL DML;

ALTER SESSION ENABLE PARALLEL QUERY;

The exact details are a little unclear, but it seems to be equivalent to the above commands.

The following commands illustrate some options for using the ore.parallel option.

> #

> # Check to see if the ore.parallel is enabled for your ORE connection

> options("ore.parallel")

$ore.parallel

NULL

The NULL returned value tells us that your ORE connection does not have the Parallel option enabled. If the schema had Parallel enabled by default then we would have a response of TRUE.

The following command turns on the Parallel option for your ORE connection / schema.

> options("ore.parallel" = TRUE)

> options("ore.parallel")

$ore.parallel

[1] TRUE

When the Parallel option is enabled (TRUE above) the database will use the degree of parallelism that is set as default for the schema or the degree of parallelism that is defined for the table when it is being used in your ORE commands.

You can change the degree of parallelism by passing the required degree as a value to the ore.parallel option. In the following, the degree of parallelism is set to 8. We then ask ORE what the degree is set to and it tells us that it is 8. So it was set correctly.

> options("ore.parallel" = 8)

> options("ore.parallel")

$ore.parallel

[1] 8
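
Because ore.parallel is set through R's standard options() mechanism, the usual base R idioms for reading, changing and restoring an option should apply to it. The sketch below uses only base R, so it runs without ORE loaded (in which case the option simply has no effect). The degree of 4 is an arbitrary illustrative value and the variable names are made up.

```r
# options() returns the previous values, so they can be restored later
old <- options(ore.parallel = 4)

# getOption() reads a single option; the default is returned when it is unset
degree <- getOption("ore.parallel", default = FALSE)
print(degree)

# Restore whatever was set before (NULL, i.e. unset, in a fresh session)
options(old)
restored <- getOption("ore.parallel")
print(restored)
```

Saving and restoring the old value this way keeps a script from leaving the session with a different parallel setting than it started with.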

Monday, May 26, 2014

Oracle R Enterprise (ORE) Tasks for the Oracle DBA

In previous posts I gave the steps required to install Oracle R Enterprise on your Database server and your client machine.

One of the steps that I gave was the initial set of Database privileges that the DBA needed to give to the RQUSER. The RQUSER is a little bit like the SCOTT/TIGER schema in the Oracle Database. Setting up the RQUSER as part of the installation process allows you to test that you can connect to the database using ORE and that you can issue some ORE commands.

After the initial testing of the ORE install you might consider locking this RQUSER schema or dropping it from the Database.

So when a new ORE user wants access to the database, what steps does the DBA have to perform?

  1. Create a new schema for the user
  2. Grant the new schema the standard set of privileges to connect to the DB, create objects, etc.
  3. Create any data sets in their schema
  4. Create any views to data that exists in other schemas (and grant the necessary privileges, etc.)

Now we get onto the ORE specific privileges. The following are the minimum required for your user to be able to connect to their Oracle schema using ORE.

GRANT CREATE TABLE TO RQUSER;

GRANT CREATE PROCEDURE TO RQUSER;

GRANT CREATE VIEW TO RQUSER;

GRANT CREATE MINING MODEL TO RQUSER;

In most cases the first 3 privileges (TABLE, PROCEDURE and VIEW) will be standard for most schemas that you will set up. So in reality the only command or extra privilege that you will need to execute is:

GRANT CREATE MINING MODEL TO RQUSER;

This command will allow the user to connect to their Oracle schema using ORE, but what it will not allow them to do is to create any embedded R. These are R scripts that are stored in the database and can be called in their R/ORE scripts or by using the SQL API to R (I'll have more blog posts on these soon). To allow the user to create and use embedded R the DBA will also have to grant the following privilege as SYS:

GRANT RQADMIN to RQUSER;

To summarise the DBA will have to grant the following to each schema that wants to use the full power of ORE.

GRANT CREATE MINING MODEL TO RQUSER;

GRANT RQADMIN to RQUSER;

A note of Warning: Be careful what schemas you grant the RQADMIN privilege to. It is a powerful privilege and opens the database to the powerful features of R. So, following the typical DBA best practice for granting privileges, the DBA should grant the RQADMIN privilege only to the people who require it.
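
If you have several schemas to set up, the statements above can be generated rather than typed by hand. The following is a minimal sketch in base R; ore_grants is a hypothetical helper name, not part of ORE, and it only builds the text of the GRANT statements listed in this post so that a DBA can review and run them.

```r
# Hypothetical helper: build the GRANT statements from this post for one schema
ore_grants <- function(schema, embedded_r = FALSE) {
  privileges <- c("CREATE TABLE", "CREATE PROCEDURE",
                  "CREATE VIEW", "CREATE MINING MODEL")
  statements <- sprintf("GRANT %s TO %s;", privileges, schema)
  if (embedded_r) {
    # RQADMIN is only needed for embedded R and has to be granted as SYS
    statements <- c(statements, sprintf("GRANT RQADMIN TO %s;", schema))
  }
  statements
}

# Print the statements for a schema named DMUSER so they can be reviewed
# before being run in SQL*Plus or SQL Developer
writeLines(ore_grants("DMUSER", embedded_r = TRUE))
```

Keeping embedded_r as an explicit argument that defaults to FALSE matches the warning above: RQADMIN is something you opt into per schema.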

Tuesday, April 29, 2014

Installing ORE - Part C - Issue installing ORE on Windows Server

In my previous two blog posts (Part-A and Part-B) I detailed 4 steps for how you can install ORE on your servers and on your client machines.

I also mentioned a possible issue you may encounter if you try to install ORE on a Windows server. This blog post will look at this issue and how you can work around it and get ORE installed.

The problem occurred when I went to install the ORE Supporting packages.

I was prompted to install these into a new library directory. If you get this error message then something is wrong and you should not proceed with installing these packages. If you do proceed and install them in a new library directory then they will not be seen by ORE and the database (as they were not installed in the $ORACLE_HOME/R/library) and when you go to run ORE from within R you will get errors like the following

package ‘Cairo’ successfully unpacked and MD5 sums checked

package ‘DBI’ successfully unpacked and MD5 sums checked

package ‘png’ successfully unpacked and MD5 sums checked

Warning: cannot remove prior installation of package ‘png’

package ‘ROracle’ successfully unpacked and MD5 sums checked

Warning: cannot remove prior installation of package ‘ROracle’

If I try the ore.connect I get the following errors.

ore.connect(user="RQUSER", sid="orcl", host="localhost", password="RQUSER", port=1521, all=TRUE)

Loading required package: ROracle

Error in .ore.oracleQuerySetup() :

ORACLE connection requires ROracle package

In addition: Warning message:

In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, : there is no package called ‘ROracle’


To overcome this ORE install issue, close down your R Gui and edit the Rprofile file, which is located in the R\etc directory (for example C:\Program Files\R\R-3.0.1\etc). Add the following lines:

# Add $ORACLE_HOME/R/library to .libPaths() for ORE packages

.libPaths("C:/app/oracle/product/11.2.0/dbhome_1/R/library")

The above line tells R to include the R library directory in the Oracle home as part of its search path. You may need to change the directory above to point to your Oracle home. When you next start the R Gui the path above will be included. Now you can install the packages and then import the packages. This time they will be installed in the $ORACLE_HOME/R/library.
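
A quick way to confirm the effect of the Rprofile change is to inspect .libPaths() in a new session. The following base R sketch demonstrates the mechanism with a temporary directory standing in for $ORACLE_HOME/R/library (the name demo_lib is hypothetical). Note that .libPaths() silently ignores directories that do not exist, so a typo in the path will not raise an error.

```r
# Temporary directory standing in for the ORE library under the Oracle home
demo_lib <- file.path(tempdir(), "ore_library_demo")
dir.create(demo_lib, showWarnings = FALSE)

old_paths <- .libPaths()                 # remember the current search path
.libPaths(c(demo_lib, old_paths))        # put the new library first

# The new directory should now be listed among the library paths
added <- normalizePath(demo_lib, winslash = "/") %in% .libPaths()
print(.libPaths())
print(added)

# A directory that does not exist is dropped without any error
.libPaths(c(file.path(tempdir(), "no_such_library"), old_paths))
missing_kept <- any(grepl("no_such_library", .libPaths(), fixed = TRUE))
print(missing_kept)

.libPaths(old_paths)                     # restore the original search path
```

If the Oracle home library does not show up in .libPaths() after restarting R, checking that the directory in the Rprofile actually exists is a sensible first step.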

When you open the R Gui and run the command to load the ORE package and to connect to your ORE schema you should not receive any error messages.

> library(ORE)

> ore.connect(user="RQUSER", sid="orcl", host="localhost", password="RQUSER", port=1521, all=TRUE)


Now you should have ORE installed and working on your Windows server.

Thursday, April 24, 2014

Installing ORE - Part B

This is the second part of a two part blog post on installing ORE.

In reality there are 3 blog posts on installing ORE. The third and next blog post will be on a particular issue you might encounter on a Windows server and how you can overcome the issue.

In the previous blog post I outlined the steps needed to install ORE on the database server and on the client machine. Click here to go to this post.

In this blog post I will show you how to setup a schema for ORE and how to get connected to the schema using ORE.


Step 3 : Setting up your Schema to use ORE / Tasks for your DBA

On the server, where you unzipped the ORE download, you will find a demo_user.bat script (or something similar, like demo_user.sh on Linux).

After the script has performed some checks, you will be asked if you want to create a demo schema. Enter yes for this task to be completed and the RQUSER schema will be created in your database. Then enter the password for the RQUSER.

The RQUSER has a small set of system privileges that allow it to connect to and perform some functions on the database. These include:

GRANT CREATE TABLE TO RQUSER;

GRANT CREATE PROCEDURE TO RQUSER;

GRANT CREATE VIEW TO RQUSER;

GRANT CREATE MINING MODEL TO RQUSER;


NOTE: If you cannot connect to the database using the RQUSER and the password you set, then you might need to also grant CONNECT and RESOURCE to it too.

For every schema that you want to access using ORE you will need to grant the above to them.

In addition to these grants, if you want a schema to be able to create and drop R scripts in the database then you will need to grant it the additional role of RQADMIN.

sqlplus / AS SYSDBA

GRANT RQADMIN to RQUSER;


NB: You will need to grant RQADMIN to any schema where you want to use the embedded ORE in the database.


Step 4 : Connecting to the Database

If you have completed all of the above steps you are now ready to use ORE to connect to your database. The following is an example of the ore.connect command that you can use. It assumes the RQUSER has the password RQUSER, and that the database is on the local machine (localhost). Replace localhost with the host name of your database server and also change the SID to that of your database.

ore.connect(user="rquser", sid="orcl", host="localhost", password="rquser", port=1521, all=TRUE);

If you get no errors and you get the R prompt back then you are connected to the RQUSER schema in your database.

To test that the connection was made you can run the following ORE command and then list the tables in the schema.

> ore.is.connected()

[1] TRUE

> ore.ls()

character(0)

The output of the last line above tells us that we do not have any tables in our RQUSER schema. I will have more blog posts on how you can use ORE and perform various ORE analytics in future posts.
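
The ore.connect example above has the password typed directly into the command. One possible alternative, sketched below, is to read the connection details from environment variables and only attempt the connection when ORE is available. The environment variable names (ORE_USER, ORE_PASSWORD, ORE_SID, ORE_HOST, ORE_PORT) are made up for this sketch; only the ore.connect arguments already shown in this post are used.

```r
# Hypothetical environment variable names; set them outside of R.
# The fallback values are the ones used in this post.
conn_args <- list(
  user     = Sys.getenv("ORE_USER", "rquser"),
  password = Sys.getenv("ORE_PASSWORD"),
  sid      = Sys.getenv("ORE_SID", "orcl"),
  host     = Sys.getenv("ORE_HOST", "localhost"),
  port     = as.numeric(Sys.getenv("ORE_PORT", "1521")),
  all      = TRUE
)

# Only attempt the connection when ORE is installed and a password was supplied
if (requireNamespace("ORE", quietly = TRUE) && nzchar(conn_args$password)) {
  library(ORE)
  do.call(ore.connect, conn_args)
  print(ore.is.connected())
} else {
  message("ORE not installed or ORE_PASSWORD not set; skipping ore.connect()")
}
```

On a machine without ORE the script just prints the message and carries on, which makes it easier to share the same script between people who have different setups.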

There are a series of demonstrations that come with ORE. To access these type in the following command which will list the available ORE demos.

> demo(package="ORE")

The following command illustrates how you can run the ORE demo called basic.

> demo(basic, package="ORE")

Also check out the Part C blog post on how to resolve a potential install issue on a Windows server.

Tuesday, April 22, 2014

Installing ORE - Part A

This blog post will look at how you can go about installing ORE in your environment.

The install involves 4 steps. The first step is the install on the Oracle Database server. The second step involves the install on your client machine. The third step involves creating a schema for ORE. The fourth step is connecting to the database using ORE.

In this Part A blog post I will cover the first two steps in this process. The other steps will be covered in another blog post.

NB : At the time of writing this blog post ORE 1.4 cannot be installed on a 12c database if it has a CDB/PDB configuration. If you want to use ORE with 12c then you need to do a traditional install that does not create a CDB with a PDB. The ORE team are working hard on this and I'm sure it will be available in the next release (or two or ...) of ORE.

Step 1 : Installing ORE on the Database Server

Before you begin looking at ORE you need to ensure that you have the correct version of the database. If you have version 11.2.0.3 or 11.2.0.4 then you can go ahead and perform the installation below. But if you have 11.2.0.1 or 11.2.0.2 then you will need to apply a patch to your database. See my note above about 12c.

Download the Oracle R Distribution from their website. Download here.

Although you can use the standard version of R, Oracle R Distribution comes with some highly tuned packages. If you are going to use the standard R download then you will need to ensure that you download the correct version. ORE 1.4 will require R version 3.0.1. Yes this is not the current version of R.
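
If you are unsure which version of R a machine is running, you can check from the R prompt before going any further. This is a small base R sketch; the required version is the one stated above for ORE 1.4, and the variable names are made up.

```r
required <- "3.0.1"            # version this post gives for ORE 1.4
running  <- getRversion()      # version of the current R session

# numeric_version objects can be compared directly with a version string
version_ok <- running == required

if (version_ok) {
  message("R ", as.character(running), " matches the version required by ORE 1.4")
} else {
  message("R ", as.character(running), " is running; this post says ORE 1.4 requires R ", required)
}
```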

Accept the defaults during the installation of ROracle, and within a minute or two ROracle will be installed.

Download the Oracle R Enterprise software. Download here. This will include the Server and Supporting downloads.

Uncompress the downloaded ORE files and go to the server directory. Here you will find the install.bat (or other similar name for your platform).

Make sure your ORACLE_HOME and ORACLE_SID environment variables are set.

A number of environment settings and environment variables are checked. When prompted accept the defaults.

When prompted for the password for the RQSYS user, enter an appropriate password and take careful note of it.

Now go back to the Oracle download page for ORE and download the supporting packages. Unzip the downloaded file and note the directory the files were unzipped into. You can now install them in R: open R and run the following commands, changing the directory to where the files are located on your server.

install.packages("C:/app/supporting/ROracle_1.1-11.zip", repos=NULL)

install.packages("C:/app/supporting/DBI_0.2-7.zip", repos=NULL)

install.packages("C:/app/supporting/png_0.1-7.zip", repos=NULL)

install.packages("C:/app/supporting/cairo_1.5-5.zip", repos=NULL)


Or you can use the R Gui to import these packages
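
Before running the install.packages commands above, it can help to confirm that the zip files are where you expect them to be. The following is a minimal base R sketch; check_supporting is a hypothetical helper name, and the directory and file names are the ones used in this post, so change them to suit your server.

```r
supporting_dir <- "C:/app/supporting"   # directory used in this post; change to suit
zip_files <- c("ROracle_1.1-11.zip", "DBI_0.2-7.zip",
               "png_0.1-7.zip", "cairo_1.5-5.zip")

# Hypothetical helper: report which of the expected files are present
check_supporting <- function(dir, files) {
  paths <- file.path(dir, files)
  data.frame(file = files, found = file.exists(paths), stringsAsFactors = FALSE)
}

status <- check_supporting(supporting_dir, zip_files)
print(status)

# List anything that is missing so the path can be corrected before installing
if (!all(status$found)) {
  message("Missing: ", paste(status$file[!status$found], collapse = ", "))
}
```

The sketch only checks for the files and does not install anything, so it is safe to run repeatedly while you sort out the directory.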

WARNING: If you are installing on a Windows server you may encounter some issues when importing these packages. I will have a separate blog post on this soon.

NB: The ORE installation instructions make reference to Cairo_1.5-2.zip. This is incorrect. ORE 1.4 comes with Cairo_1.5-5.zip.

At this point, assuming you didn't have any errors, you now have ORE installed on your server.


Step 2 : Installing ORE on the Client

Download the Oracle R Distribution from their website. Download here.

NOTE: If your database and client are on the one machine then there is no need to install ROracle again.

The client install is much simpler and less involved. After you have installed ROracle the next step is to install the client packages for ORE. These can be downloaded from here.

After you have unzipped the file you can use the import packages from zip feature of the R Gui tool or using RStudio. Then import the supporting packages that you also installed as part of the server install.

Now you can install the supporting packages. Unzip them and then use the R Gui or RStudio to import them. These supporting packages can be downloaded from here.

That should be the client R software and ORE packages installed on your client machine. The next step is to test a connection to your Oracle database using ORE. Before you can do that you will need to set up a schema in the database to use R and also grant the necessary privileges to your other schemas that you want to access using R.


Check out my next blog post (Installing ORE - Part B) for Steps 3 and 4.

Also check out the Part C blog post on how to resolve a potential install issue on a Windows server.

Monday, April 14, 2014

Oracle Advanced Analytics and Oracle Fusion Apps

At a recent Oracle User Group conference, I was part of a round table discussion on Apps and BI. Unfortunately most of the questions were focused on Apps and the new Fusion Applications from Oracle. When I mentioned that there was data mining functionality (using the Oracle Advanced Analytics Option) built into the Fusion Apps, it seemed to come as a surprise to the Apps people. They were not aware of this built-in functionality and capabilities. Well, Oracle Data Mining and Oracle Advanced Analytics have been built into the following Oracle Fusion Applications.
  • Oracle Fusion HCM Workforce Predictions
  • Oracle Fusion CRM Sales Prediction Engine
  • Oracle Spend Classification
  • Oracle Sales Prospector
  • Oracle Adaptive Access Manager
Oracle Data Mining and Oracle Advanced Analytics are also being used in the following applications:
  • Oracle Airline Data Model
  • Oracle Communications Data Model
  • Oracle Retail Data Model
  • Oracle Security Governor for Healthcare
I intend to submit a presentation on this topic to future Oracle User Group conferences as a way of spreading the Advanced Analytics message within the Oracle user community. If you would like me to present on this topic at your conference or SIG, drop me an email and we can make the necessary arrangements :-)

Sunday, April 6, 2014

The ORE Packages

If you are interested in using ORE, or just want to get an idea of what ORE gives you that does not already exist in one of the other R packages, then the table below lists the packages that come as part of ORE.

Before you can use them you will need to load them into your workspace. To do this you can issue the following command from the R prompt or from the prompt in RStudio.

> library(ORE)

RStudio is my preferred R interface and is widely used around the world.

ORE Installed Packages    Description
ORE                       Oracle R Enterprise
OREbase                   ORE - base
OREdm                     The ORE functions that use the in-database Oracle Data Miner algorithms
OREeda                    The ORE functions used for exploratory data analysis
OREgraphics               The ORE functions used for graphics
OREpredict                The ORE functions used for model predictions
OREstats                  The ORE stats functions
ORExml                    The ORE functions that convert R objects to XML
DBI                       R Database Interface
ROracle                   OCI based Oracle database interface for R
XML                       Tools for parsing and generating XML within R and S-Plus.
bitops                    Functions for Bitwise operations
png                       Read and write PNG images

In addition to these core ORE packages, ORE also depends on a number of standard R packages. The following table lists the R packages that are used by the ORE packages, so make sure you have them installed. They should have come with your installation of R, but if something has happened then you can download them again.

R Packages used by ORE    Description
base                      The R Base Package
boot                      Bootstrap Functions (originally by Angelo Canty for S)
class                     Functions for Classification
cluster                   Cluster Analysis Extended Rousseeuw et al
codetools                 Code Analysis Tools for R
compiler                  The R Compiler Package
datasets                  The R Datasets Package
foreign                   Read Data Stored by Minitab, S, SAS, SPSS, Stata, Systat, dBase, ...
graphics                  The R Graphics Package
grDevices                 The R Graphics Devices and Support for Colours and Fonts
grid                      The Grid Graphics Package
KernSmooth                Functions for kernel smoothing for Wand & Jones (1995)
lattice                   Lattice Graphics
MASS                      Support Functions and Datasets for Venables and Ripley's MASS
Matrix                    Sparse and Dense Matrix Classes and Methods
methods                   Formal Methods and Classes
mgcv                      GAMs with GCV/AIC/REML smoothness estimation and GAMMs by PQL
nlme                      Linear and Nonlinear Mixed Effects Models


I've been using R a lot over the past few years and I've had a number of projects involving R, particularly over the past 12 months. I just found out that I will now have another short duration R project in May and June.

So watch out for lots more blog posts on R and ORE. Plus the usual blog posts on using Oracle Data Mining. ORE and Oracle Data Mining are very closely linked.

Wednesday, March 26, 2014

Predicting using ORE package

In a previous post I gave an overview of the various in-database data mining algorithms that you can use in your Oracle R Enterprise scripts.

To create data mining models based on those algorithms you need to use the ore.odm functions.

After you have developed and tested your models you will select one of these to score your new data.

How can you do this using ORE? There is a suite of ORE functions called ore.predict that you can use to apply your data mining model to score or label new data.

The following table lists the ore.predict functions:

ORE Predict Function      Description
ore.predict-glm           Generalized linear model
ore.predict-kmeans        k-Means clustering model
ore.predict-lm            Linear regression model
ore.predict-matrix        A matrix with no more than 1000 rows
ore.predict-multinom      Multinomial log-linear model
ore.predict-nnet          Neural network models
ore.predict-ore.model     An Oracle R Enterprise model
ore.predict-prcomp        Principal components analysis on a matrix
ore.predict-princomp      Principal components analysis on a numeric matrix
ore.predict-rpart         Recursive partitioning and regression tree model


As you will see from the above table, there are more ore.predict functions than there are ore.odm functions. The reason for this is that ORE comes with some additional data mining algorithms, which are in addition to the subset of Oracle Data Mining algorithms that it uses. These include ore.glm, ore.lm, ore.neural and ore.stepwise.

You also need to watch out for the data mining algorithms that are not used in prediction. These include the Minimum Description Length, Apriori and Non-Negative Matrix Factorization.

Remember that these ore.predict functions are run inside the Oracle Database. No data is extracted to the data analyst's laptop or desktop. All the data stays in the database. The ORE functions are run in the database on the data in the database.

Thursday, March 20, 2014

Issues with using latest release of ODM

The title of this blog post makes it sound more dramatic than it actually is.

The reason for this blog post is down to me receiving a recent comment on the blog, plus having received numerous emails and a recent OTN Discussion Forum topic for Oracle Data Mining.

The main thing that they have in common is that if I use the latest version of Oracle Data Mining (ODM) it tells me that I need to upgrade my ODM Repository. What impact will this have?

The ODM Repository stores lots of information about the workflows you create using the (free) Oracle Data Mining tool that comes as part of SQL Developer. Yes, you do have to pay for the OAA option, so is it really free? Well, some parts are, like the Explore node and the Graph node.

If you download and want to use the latest version of the ODM tool or you want to try it out before rolling it out to others then you will need to upgrade your ODM repository.

And this is the problem that people are facing.

If you upgrade, then the ODM Repository is updated to work with the latest version of the ODM tool. But what happens to everyone else who is using the previous release of the tool? The answer to that is they can no longer use ODM against their database.

Why is that? Well the version of the tool is tied to a version of the Repository. If you upgrade to the newer tool and repository then your older versions of the ODM tool no longer work.

The result of all of this is that you cannot have a mixture of versions of the ODM tool (SQL Developer) being used in your team/company.

There is a very simple solution to all of this. Everyone uses the same version of the ODM tool (i.e. the same version of SQL Developer). For example your team might be using SQL Dev 4 that was released last December. But in early March there was a new patch release 4.1. In order to use this new version of the tool all of your team needs to start using it at the same time. The first person to use it will be prompted to migrate the ODM repository. This is automatically done once you enter the password for SYS.

But in some teams this is not possible, because you want to try out the tool to see that it works correctly before getting others to use it. The way around this is to have a separate database and use it for your testing. You can easily copy across your workflows and ODM objects to the test database.

This might not be possible for everyone, so what can you do? One way is to create a Virtual Machine and try it out on your own desktop.

The answer to this problem is not ideal, but hopefully you have a better idea of why things are happening this way and what you can or cannot do about it.

Like I said at the top of this blog post, the title is a bit more dramatic than is really the case :-)


My next blog post will be on another question I've been asked a few times and this is 'When I go to use the ODM tool it tells me that the Oracle Text feature of Oracle needs to be enabled'

Sunday, March 16, 2014

ORE 1.4 New Parallel feature

Oracle R Enterprise (ORE) 1.4 has just been released and can be downloaded from here. Remember there is a client and server side install required, and ORE 1.4 is certified against R 3.0.1 and the Oracle R Distribution.


One of the interesting new features is the PARALLEL option. You can set this to significantly improve the performance of your R server side code by using the PARALLEL database option. You can set the degree of PARALLEL at a global level in your code by using the ore.parallel setting.

The default setting for this ore.parallel setting is FALSE or 1. Otherwise it must be set to 2 or more to enable the Parallel database option.

Alternatively you can set the ore.parallel setting to TRUE to use the default degree of parallelism that is set for the database object, or set it to NULL to use the default database setting.

You will also be able to set the degree of parallel (DOP) using the parallel enabled functions ore.groupApply, ore.rowApply and ore.indexApply.

They have also made available, or as they say exposed, some more of the in-database Oracle Data Mining algorithms. These include the ODM algorithms for Association rules (ore.odmAssocRules), the feature extraction algorithm called Non-Negative Matrix Factorization (NMF) (ore.odmNMF) and the ODM Clustering algorithm O-Cluster (ore.odmOC).

Watch out for some blog posts on these over the coming weeks.


Check out the OTN page for the R Technologies from Oracle


Wednesday, March 12, 2014

ODM: Changing the bar chart format in Explore Node

In Oracle Data Miner you can use the Explore Node to gather an initial set of statistics for your dataset. As part of this you will also get a bar chart that shows the distributions of the values contained within each attribute. The following example shows the default layout of the bar charts.

[Image: default bar chart layout in the Explore Node]

These graphs are very useful for presenting the initial data exploration results to your business users. In addition to these graphs you can also use the Graph node to give some additional graphical representations.

But the default bar chart that is produced by the Explore Node can appear to be a bit basic.

So what if we could change the layout to have a 3-D effect? People like 3-D bar charts.

Is this possible in Oracle Data Miner? If so then how can we do it?

Well it is possible and you can use the following steps to change your bar charts to 3-D.

To access the Explore Node settings go to the Tools menu and then select Preferences from the drop down menu.

[Image: Preferences option in the Tools menu]

When the Preferences window opens, scroll down to the Data Miner option and expand the available options.

[Image: Data Miner options in the Preferences window]

The Explorer Data Viewer allows you to change the Precision settings. The second option is the Graphical Settings. You can change the Depth Radius setting. By default this is set to zero. By increasing this value you can change the degree of the 3-D effect of the bar charts. You can also change the colour scheme.

[Image: Graphical Settings for the Explorer Data Viewer]

I'm not a fan of the other colour schemes that are available and my favourite is still the default Nautical. The following bar chart is the same as the one at the top of this post but has the 3-D effect.

[Image: the same bar chart with the 3-D effect]

Friday, June 7, 2013

DBMS_PREDICTIVE_ANALYTICS & Predict

In this blog post I will look at the PREDICT procedure that is part of the DBMS_PREDICTIVE_ANALYTICS package. This package allows you to perform data mining in an automated way without having to go through the steps of building, testing and scoring data.

I had a previous blog post that showed how to use the EXPLAIN procedure to create an Attribute Importance model.

The predictive analytics procedures analyze and prepare the input data, create and test mining models using the input data, and then use the input data for scoring. The results of scoring are returned to the user. The models and supporting objects are not persisted and are removed from the database when the procedure is finished.

The PREDICT procedure should only be used for a Classification problem and data set.

The PREDICT procedure creates a model based on the supplied data (our input table) and a target value, and returns the scored data set in a new table. When using PREDICT you do not get to select an algorithm to use.

The input data source should contain records that already have the target value populated. It can also contain records where you do not have the target value. In this case the PREDICT procedure will use the records that have a target value to generate the model. This model is then used to score all records with the predicted target value.

The syntax of the PREDICT procedure is:

DBMS_PREDICTIVE_ANALYTICS.PREDICT (
   accuracy OUT NUMBER,
   data_table_name IN VARCHAR2,
   case_id_column_name IN VARCHAR2,
   target_column_name IN VARCHAR2,
   result_table_name IN VARCHAR2,
   data_schema_name IN VARCHAR2 DEFAULT NULL);

Where

Parameter Name        Description
accuracy              This is an output parameter from the procedure. You do not pass anything into this parameter. The accuracy value returned is the predictive confidence of the model generated/used by the PREDICT procedure.
data_table_name       The name of the table that contains the data you want to use.
case_id_column_name   The case id for each record. This is unique for each record/case.
target_column_name    The name of the column that contains the target value to be predicted.
result_table_name     The name of the table that will contain the results. This table should not exist in your schema, otherwise an error will occur.
data_schema_name      The name of the schema where the table containing the input data is located. This is probably your current schema, so you can leave this parameter NULL.

The PREDICT procedure will produce an output table (named by the result_table_name parameter) that contains 3 attributes.

CASE_ID       This is the Case Id of the record from the original data_table_name. This will allow you to link up the data in the source table to the prediction in the result_table_name.
PREDICTION    This will be the predicted value of the target attribute.
PROBABILITY   This is the probability of the prediction being correct.

Using the same example data set that I have given in previous blog posts and in the blog post on the EXPLAIN procedure, the following code illustrates how to use the PREDICT procedure.

set serveroutput on

DECLARE
   v_accuracy NUMBER(10,9);
BEGIN
   DBMS_PREDICTIVE_ANALYTICS.PREDICT(
      accuracy => v_accuracy,
      data_table_name => 'mining_data_build_v',
      case_id_column_name => 'cust_id',
      target_column_name => 'affinity_card',
      result_table_name => 'PA_PREDICT');
   DBMS_OUTPUT.PUT_LINE('Accuracy of model = ' || v_accuracy);
END;
/

[Image: output of the PREDICT call showing the accuracy of the model]
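Because the result table must not already exist, running the block above a second time will raise an error unless PA_PREDICT is removed first. A minimal sketch of one way to handle this is to drop the table beforehand and ignore only the "table or view does not exist" error (ORA-00942). The table name PA_PREDICT is the one used in this post; the PURGE clause is my assumption that you do not want the old table kept in the recycle bin.

```sql
-- Drop the previous result table, if there is one, before calling PREDICT again.
BEGIN
   EXECUTE IMMEDIATE 'DROP TABLE pa_predict PURGE';
EXCEPTION
   WHEN OTHERS THEN
      -- ORA-00942 means the table does not exist, which is fine here.
      -- Any other error is re-raised so that it is not hidden.
      IF SQLCODE != -942 THEN
         RAISE;
      END IF;
END;
/
```

Run this before the PREDICT block whenever you want to regenerate the results.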

This took about 15 seconds to run on my laptop, which is surprisingly quick given all the work that it is doing internally. To see the predictions and the results from the PREDICT procedure, you will need to query the PA_PREDICT table.

[Image: rows from the PA_PREDICT table]
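A query along the following lines should show the scored records. I am assuming here that the case id column in PA_PREDICT keeps the name CUST_ID (the join in the next query relies on the same thing), alongside the PREDICTION and PROBABILITY columns.

```sql
-- Look at the first few scored records in the result table.
SELECT cust_id,
       prediction,
       probability
FROM   pa_predict
WHERE  rownum <= 10;
```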

The final step that you might be interested in is to compare the original target value with the prediction value.

SELECT v.cust_id,
       v.affinity_card,
       p.prediction,
       p.probability
FROM   mining_data_build_v  v,
       pa_predict p
WHERE  v.cust_id = p.cust_id
AND    rownum <= 12;

[Image: original target values alongside the predicted values]
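Looking at a dozen rows only gives a feel for the results. As a rough sketch, the same join can be aggregated into a simple confusion matrix that counts how often each actual value was given each predicted value. The column aliases are my own, and the query assumes the same table and column names as the query above.

```sql
-- Count the records for each combination of actual and predicted target value.
SELECT v.affinity_card AS actual_value,
       p.prediction    AS predicted_value,
       COUNT(*)        AS num_records
FROM   mining_data_build_v v
       JOIN pa_predict p ON v.cust_id = p.cust_id
GROUP  BY v.affinity_card, p.prediction
ORDER  BY v.affinity_card, p.prediction;
```

Rows where the actual and predicted values match are the correct predictions; the other rows show where the model got it wrong.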

Remember we do not get to see how or what Oracle did to generate these results. We do not get the opportunity to tune the process and the model.

So you have to be careful about when you use the PREDICT procedure and on what data. Would you use it as a way to explore your data and to see if predictive analytics/data mining might be useful for you? Yes, you would. Would you use it in a production scenario? The answer is maybe, but it depends on the scenario. In reality, if you want to do this in a production environment you will put some work into developing data mining models that best fit your data. To do this you will need to move on to the ODM tool and the DBMS_DATA_MINING package. But the PREDICT procedure is a quick way to get some small data scored (in some way) based on your existing data. If your marketing department says they want to start a telemarketing campaign in a couple of hours then PREDICT is what you need to use. It may not give you the most accurate of results, but it does give you results that you can start using quickly.

Monday, May 20, 2013

DBMS_PREDICTIVE_ANALYTICS & Explain

There are 2 PL/SQL packages for performing data mining/predictive analytics in Oracle. The main PL/SQL package is DBMS_DATA_MINING. This package allows you to build data mining models and to apply them to new data. But there is another PL/SQL package.

The DBMS_PREDICTIVE_ANALYTICS package is very different to the DBMS_DATA_MINING package. The DBMS_PREDICTIVE_ANALYTICS package includes routines for predictive analytics, an automated form of data mining. With predictive analytics, you do not need to be aware of model building or scoring. All mining activities are handled internally by the predictive analytics procedure.

Predictive analytics routines prepare the data, build a model, score the model, and return the results of model scoring. Before exiting, they delete the model and supporting objects.

The package comes with the following procedures: EXPLAIN, PREDICT and PROFILE. To get some details about these procedures we can run the following in SQL.

[Image: SQL listing the procedures in the DBMS_PREDICTIVE_ANALYTICS package]
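The screenshot is not reproduced here, so the following is only a sketch of how the package details could be listed. In SQL*Plus or SQL Developer the DESCRIBE command lists the procedures and their parameters. The second statement is an alternative that queries the ALL_ARGUMENTS data dictionary view, assuming your schema has access to the package.

```sql
-- SQL*Plus / SQL Developer command: list the procedures and their parameters.
DESCRIBE DBMS_PREDICTIVE_ANALYTICS

-- Alternative: query the data dictionary for the same information.
SELECT object_name,
       position,
       argument_name,
       data_type,
       in_out
FROM   all_arguments
WHERE  package_name = 'DBMS_PREDICTIVE_ANALYTICS'
ORDER  BY object_name, position;
```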

This blog post will look at the EXPLAIN procedure.

EXPLAIN creates an attribute importance model. Attribute importance uses the Minimum Description Length algorithm to determine the relative importance of attributes in predicting a target value. EXPLAIN returns a list of attributes ranked in relative order of their impact on the prediction. This information is derived from the model details for the attribute importance model.

Attribute importance models are not scored against new data. They simply return information (model details) about the data you provide.

I’ve written two previous blog posts on Attribute Importance. One of these was on how to calculate Attribute Importance using the Oracle Data Miner tool. In the ODM tool it is now called Feature Selection and is part of the Filter Columns node, and the Attribute Importance model is not persisted in the database. The second blog post was on how you can create the Attribute Importance model using the DBMS_DATA_MINING package.

EXPLAIN ranks attributes in order of influence in explaining a target column.

The syntax of the procedure is

DBMS_PREDICTIVE_ANALYTICS.EXPLAIN (
data_table_name IN VARCHAR2,
explain_column_name IN VARCHAR2,
result_table_name IN VARCHAR2,
data_schema_name IN VARCHAR2 DEFAULT NULL);


where

data_table_name = Name of input table or view
explain_column_name = Name of column to be explained
result_table_name = Name of table where results are saved. It creates a new table in your schema.
data_schema_name = Name of schema where the input table or view resides. Default: the current schema.

So when calling the procedure you do not have to include the last parameter.

Using the same example that I have given in the previous blog posts (see above for the links to these), the following command can be run to generate the Attribute Importance.



BEGIN
    DBMS_PREDICTIVE_ANALYTICS.EXPLAIN(
        data_table_name      => 'mining_data_build_v',
        explain_column_name  => 'affinity_card',
        result_table_name    => 'PA_EXPLAIN');
END;
/



One thing that stands out is that it is a bit slower to run than the DBMS_DATA_MINING method. On my laptop it took approximately two to three times longer to run. But in total it was less than a minute.



To display the results,



[Image: query and results from the PA_EXPLAIN table]
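A query along these lines should list the ranked attributes. The column names used here (ATTRIBUTE_NAME, EXPLANATORY_VALUE and RANK) are what I expect EXPLAIN to create in the result table; check them with DESCRIBE PA_EXPLAIN if your version differs.

```sql
-- List the attributes in order of their importance for explaining AFFINITY_CARD.
SELECT attribute_name,
       explanatory_value,
       rank
FROM   pa_explain
ORDER  BY rank, attribute_name;
```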



The results are ranked in a 0 to 1 range. Any attributes that had a negative value are set to zero.

Friday, May 10, 2013

Getting Real Business Value from Oracle Data Mining and OBIEE

Over the past 16 months (or so) I have given a joint presentation with Anthony Heljula called ‘Getting Real Business Value from Oracle Data Mining and OBIEE’, at a number of conferences and OUG SIGs.

We have had a lot of very positive feedback on this presentation. The presentation is a busy 45 minutes (questions only at the end) that walks through a pilot data science project we did for a University in the UK.

We used Oracle Data Miner to build a predictive model that looks at student churn. We then integrated this Student Churn model into OBIEE Dashboards to illustrate how, by combining an Oracle Data Miner model with our data analysis, we can gain greater insight into our data.


We have submitted this presentation for Oracle Open World 2013 but we have renamed the title of the presentation to

“How UK Universities are using Oracle Data Science to protect their income”

If you are involved in presentation selection or know someone who is then maybe you might select this to be presented at OOW13 in September.

We submitted the presentation for OOW12 with no luck. So fingers crossed this time.