Brendan Tierney - Oralytics Blog

Sunday, December 18, 2011

Recent Wood Carvings

I’ve managed to get enough time over the past couple of days to finish some wood carvings that I started a couple of months ago.

An Angel for the Christmas Tree (beech)

A name plate for the house (beech)

A Sun face for the shed door (Ash)

Tuesday, December 13, 2011

Mark Townsend, Database Product Manager at Oracle gave a presentation on Big Data at the UKOUG conference and used the following videos to illustrate how a company can evolve their Big Data into useful and meaningful information.

Big Data – The Challenge

Big Data – Gold Mine or just Stuff

Big Data – Big Data Speaks

Big Data – Everything You Always Wanted to Know

Big Data – Little Data

Monday, December 12, 2011

My UKOUG Presentation on ODM PL/SQL API

On Wednesday 7th Dec I gave my presentation at the UKOUG conference in Birmingham. The main topic of the presentation was on using the Oracle Data Miner PL/SQL API to implement a model in a production environment.

There was a good turn out considering it was the afternoon of the last day of the conference.

I asked the attendees about their experience of using the current and previous versions of the Oracle Data Mining tool. Only one of the attendees had used the pre 11g R2 version of the tool.

From my discussions with the attendees, it looks like they would have preferred an introduction/overview type presentation of the new ODM tool. I had submitted a presentation on this, but sadly it was not accepted. Not enough people had voted for it.

For for next year, I will submit an introduction/overview presentation again, but I need more people to vote for it. So watch out for the vote stage next June and vote of it.

Here are the links to the presentation and the demo scripts (which I didn’t get time to run)

My Presentation

Demo Script 1 – Exploring and Exporting model

Demo Script 2 – Import, Dropping and Renaming the model. Plus Queries that use the model

Monday, December 5, 2011

Ireland table at Focus Pub tonight

Today (Monday 5th Dec) is the first day of the UKOUG Conference in Birmingham.

Tonight we have the Focus Pubs session starting at 8:45pm. This year we have a Ireland table for all of the Irish people at the conference to gather at and to meet.

I’ll be there so drop along and say hello.

Friday, December 2, 2011

I’m an Oracle ACE

At 5:20pm today (Friday 2nd December), I received an email from the Oracle ACE program. I had been nominated for the award of Oracle ACE.

“You have been chosen based on your significant contribution and activity in the Oracle technical community. Like your fellow Oracle ACEs, you have demonstrated a proficiency in Oracle technology as well as a willingness to share your knowledge and experiences with the community.”

I am so honoured, considering the experts from around the world that are members of the Oracle ACE program.

The Oracle ACE Award is issued by the Oracle Corporation and the award is made to people who are know for their strong credentials in the Oracle community as enthusiasts, advocates and technical knowledge.

Thursday, December 1, 2011

Oracle Big Data & Analytics Sessions at UKOUG Conference

There are a number of BIG Data and Analytics presentations at the UKOUG Conference in Birmingham (4th Dec – 7th Dec).

I’ve worked my way through the agenda grids for each day of the conference and I’ve come up with the following list. If you are interested in BIG Data and Analytics these presentations are a must see

Monday	12:15-13:15	Exadata Live – Graham Wood – Hall 7A
5th Dec
Tuesday	9:00-10:00	Big Data-Are you ready – Mark Townsend – Hall 1
6th Dec	10:10-10:55	Who’s afraid of Analytic Functions – Alex Nuijten – Hall 5
	11:15-12:15	Analysing Your Data with Analytic Functions – Carl Dudley – Hall 9
	16:40-17:40	Mobile Analytics using OBIEE 11g – Jon Mead – Exec Room 1
Wednesday	9:00-10:00	Oracle 11g Database Automatic Parallelism – Joel Goodman – Hall 9
7th Dec	15:20-16:05	How to Deploy your Oracle Data Miner in a Live Environment - me

Monday, November 28, 2011

Exalytics Events over the next week

The BIWA SIG is hosting a techcast called “Using Oracle R Enterprise” on Wednesday 30th November, 2011 at noon EST (approx 6pm GMT).

The TechCast is being presented by Mark Hornick, Senior Manager, Oracle Advanced Analytics Development

URL for TechCast: https://stbeehive.oracle.com/bconf/confDetails?confID=334B:3BF0:owch:38893C00F42F38A1E0404498C8A6612B0004075AECF7&guest=true&confKey=608880
-- Web Conference ID: 303397
-- Web Conference Key: 608880
-- Dialup: 1-866-682-4770, ID 5548204, passcode 1234

Several analytic tool vendors have added R-integration to their software. However, Oracle is the largest company to throw their weight behind R. On October 3, Oracle unveiled their integration of R: Oracle R Enterprise (http://www.oracle.com/us/corporate/features/features-oracle-r-enterprise-498732.html) as part of their Oracle Big Data Appliance announcement (http://www.oracle.com/us/corporate/press/512001).

Oracle R Enterprise allows users to perform statistical analysis with advanced visualization on data stored in Oracle Database. Oracle R Enterprise enables scalable R solutions, while facilitating production deployment of R scripts and Hadoop based solutions, as well as integration of R results with Oracle BI Publisher and OBIEE dashboards.

Check out the Oracle YouTube video (5min), that demos how an Exalytics application that can analyse almost a billion records instantly.

If you are attending the UKOUG Conference in Birmingham, Jon Mead (RittmanMead) is giving a presentation called “What can Exalytics do for me?” and is on Tuesday 5th December @15:35, in the area above the box office.

Thursday, November 24, 2011

Applying an ODM Model to new data in Oracle – Part 2

This is the second of a two part blog posting on using an Oracle Data Mining model to apply it to or score new data. The first part looked at how you can score data the DBMS_DATA_MINING.APPLY procedure for scoring data batch type process.

This second part looks at how you can apply or score the new data, using our ODM model, in a real-time mode, scoring a single record at a time.

PREDICTION Function

The PREDICTION SQL function can be used in many different ways. The following examples illustrate the main ways of using it. Again we will be using the same data set with data in our (NEW_DATA_TO_SCORE) table.

The syntax of the function is

PREDICTION ( model_name, USING attribute_list);

Example 1 – Real-time Prediction Calculation

In this example we will select a record and calculate its predicted value. The function will return the predicted value with the highest probability

SELECT cust_id, prediction(clas_decision_tree using *)
FROM NEW_DATA_TO_SCORE
WHERE cust_id = 103001;

CUST_ID PREDICTION(CLAS_DECISION_TREEUSING*)
---------- ------------------------------------
103001 0

So a predicted class value is 0 (zero) and this has a higher probability than a class value of 1.

We can compare and check this results with the result that was produced using the DBMS_DATA_MINING.APPLY function (see previous blog post).

SQL> select * from new_data_scored
2 where cust_id = 103001;

   CUST_ID PREDICTION PROBABILITY
---------- ---------- -----------
    103001          0           1
    103001          1           0

Here we can see that the class value of 0 has a probability of 1 (100%) and the class value of 1 has a probability of 0 (0%).

Example 2 – Selecting top 10 Customers with Class value of 1

For this we are selecting from our NEW_DATA_TO_SCORE table. We want to find the records that have a class value of 1 and has the highest probability. We only want to return the first 10 of these

SELECT cust_id
FROM NEW_DATA_TO_SCORE
WHERE PREDICTION(clas_decision_tree using *) = 1
AND rownum <=10;

   CUST_ID
----------
    103005
    103007
    103010
    103014
    103016
    103018
    103020
    103029
    103031
    103036

Example 3 – Selecting records based on Prediction value and Probability

For this example we want to find our from what Countries do the customer come from where the Prediction is 0 (wont take up offer) and the Probability of this occurring being 1 (100%). This example introduces the PREDICTION_PROBABILITY function. This function allows use to use the probability strength of the prediction.

select country_name, count(*)
from new_data_to_score
where prediction(clas_decision_tree using *) = 0
and prediction_probability (clas_decision_tree using *) = 1
group by country_name
order by count(*) asc;

COUNTRY_NAME                               COUNT(*)
---------------------------------------- ----------
Brazil                                            1
China                                             1
Saudi Arabia                                      1
Australia                                         1
Turkey                                            1
New Zealand                                       1
Italy                                             5
Argentina                                        12
United States of America                        293

The examples that I have give above are only the basic examples of using the PREDICTION function. There are a number of other uses that include using the PREDICTION_COST, PREDICTION_SET, PREDICTION_DETAILS. Examples of these will be covered in a later blog post

Tuesday, November 22, 2011

Oracle Ireland: Data Centre Transformation Event 7th December

Oracle in Ireland is hosting a session called Data Centre Transformation on 7th December (9:30-13:00), in the Guinness Storehouse, St James Gate, Dublin 8.

The agenda for this session is

9:00	Registration & Coffee
10:00	The 21st Century Data Centre, Delivered by Oracle Solaris – Mike Ramchand
10:30	Oracle Enterprise Manager 12c – John Caulfield, Solutions Director
11:00	Oracle Virtualised Systems (VM 3.0) – Dave Patterson, Oracle Hardware
11:30	Coffee Break
12:00	Transformative Oracle Storage Solutions – Neil Caughey, Oracle Storage Business Unit
12:30	Extreme Performance with Oracle Exadata and Exalogic – Brian Grant, Oracle Exalogic Business Development Manager

To book your place on this event email oracle.events@ketchumpleon.com

Or register by following this web link.

I wont be at this event as I’ll be presenting in the afternoon at the UKOUG conference in Birmingham.

Monday, November 21, 2011

Applying an ODM Model to new data in Oracle – Part 1

This is the first of a two part blog posting on using an Oracle Data Mining model to apply it to or score new data. This first part looks at the how you can score data using the DBMS_DATA_MINING.APPLY procedure in a batch type process.

The second part will be posted in a couple of days and will look how you can apply or score the new data, using our ODM model, in a real-time mode, scoring a single record at a time.

DBMS_DATA_MINING.APPLY

Instead of applying the model to data as it is captured, you may need to apply a model to a large number of records at the same time. To perform this bulk processing we can use the APPLY procedure that is part of the DBMS_DATA_MINING package. The format of the procedure is

DBMS_DATA_MINING.APPLY (
      model_name           IN VARCHAR2,
      data_table_name      IN VARCHAR2,
      case_id_column_name IN VARCHAR2,
      result_table_name    IN VARCHAR2,
      data_schema_name     IN VARCHAR2 DEFAULT NULL);

Parameter Name	Description
Model_Name	The name of your data mining model
Data_Table_Name	The source data for the model. This can be a tree or view.
Case_Id_Column_Name	The attribute that give uniqueness for each record. This could be the Primary Key or if the PK contains more than one column then a new attribute is needed
Result_Table_Name	The name of the table where the results will be stored
Data_Schema_Name	The schema name for the source data

The main condition for applying the model is that the source table (DATA_TABLE_NAME) needs to have the same structure as the table that was used when creating the model.

Also the data needs to be prepossessed in the same way as the training data to ensure that the data in each attribute/feature has the same formatting.

When you use the APPLY procedure it does not update the original data/table, but creates a new table (RESULT_TABLE_NAME) with a structure that is dependent on what the underlying DM algorithm is. The following gives the Result Table description for the main DM algorithms:

For a Classification algorithms

case_id VARCHAR2/NUMBER
prediction NUMBER / VARCHAR2 -- depending a target data type
probability NUMBER

For Regression

case_id VARCHAR2/NUMBER
prediction NUMBER

For Clustering

case_id VARCHAR2/NUMBER
cluster_id NUMBER
probability NUMBER

Example / Case Study

My last few blog posts on ODM have covered most of the APIs for building and transferring models. We will be using the same data set in these posts. The following code uses the same data and models to illustrate how we can use the DBMS_DATA_MINING.APPLY procedure to perform a bulk scoring of data.

In my previous post we used the EXPORT and IMPORT procedures to move a model from one database (Test) to another database (Production). The following examples uses the model in Production to score new data. I have setup a sample of data (NEW_DATA_TO_SCORE) from the SH schema using the same set of attributes as was used to create the model (MINING_DATA_BUILD_V). This data set contains 1500 records.

SQL> desc NEW_DATA_TO_SCORE
Name                                 Null?    Type
------------------------------------ -------- ------------
CUST_ID                              NOT NULL NUMBER
CUST_GENDER                          NOT NULL CHAR(1)
AGE                                           NUMBER
CUST_MARITAL_STATUS                           VARCHAR2(20)
COUNTRY_NAME                         NOT NULL VARCHAR2(40)
CUST_INCOME_LEVEL                             VARCHAR2(30)
EDUCATION                                     VARCHAR2(21)
OCCUPATION                                    VARCHAR2(21)
HOUSEHOLD_SIZE                                VARCHAR2(21)
YRS_RESIDENCE                                 NUMBER
AFFINITY_CARD                                 NUMBER(10)
BULK_PACK_DISKETTES                           NUMBER(10)
FLAT_PANEL_MONITOR                            NUMBER(10)
HOME_THEATER_PACKAGE                          NUMBER(10)
BOOKKEEPING_APPLICATION                       NUMBER(10)
PRINTER_SUPPLIES                              NUMBER(10)
Y_BOX_GAMES                                   NUMBER(10)
OS_DOC_SET_KANJI                              NUMBER(10)

SQL> select count(*) from new_data_to_score;

COUNT(*)
----------
1500

The next step is to run the the DBMS_DATA_MINING.APPLY procedure. The parameters that we need to feed into this procedure are

Parameter Name	Description
Model_Name	CLAS_DECISION_TREE -- we imported this model from our test database
Data_Table_Name	NEW_DATA_TO_SCORE
Case_Id_Column_Name	CUST_ID -- this is the PK
Result_Table_Name	NEW_DATA_SCORED -- new table that will be created that contains the Prediction and Probability.

The NEW_DATA_SCORED table will contain 2 records for each record in the source data (NEW_DATA_TO_SCORE). For each record in NEW_DATA_TO_SCORE we will have one record for the each of the Target Values (O or 1) and the probability for each target value. So for our NEW_DATA_TO_SCORE, which contains 1,500 records, we will get 3,000 records in the NEW_DATA_SCORED table.

To apply the model to the new data we run:

BEGIN
dbms_data_mining.apply(
model_name => 'CLAS_DECISION_TREE',
data_table_name => 'NEW_DATA_TO_SCORE',
case_id_column_name => 'CUST_ID',
result_table_name => 'NEW_DATA_SCORED');
END;
/

This takes 1 second to run on my laptop, so this apply/scoring of new data is really quick.

The new table NEW_DATA_SCORED has the following description

SQL> desc NEW_DATA_SCORED
Name                            Null?    Type
------------------------------- -------- -------
CUST_ID                         NOT NULL NUMBER
PREDICTION                               NUMBER
PROBABILITY                              NUMBER

SQL> select count(*) from NEW_DATA_SCORED;

COUNT(*)
----------
3000

We can now look at the prediction and the probabilities

SQL> select * from NEW_DATA_SCORED where rownum <=12;

   CUST_ID PREDICTION PROBABILITY
---------- ---------- -----------
    103001          0           1
    103001          1           0
    103002          0 .956521739
    103002          1 .043478261
    103003          0 .673387097
    103003          1 .326612903
    103004          0 .673387097
    103004          1 .326612903
    103005          1 .767241379
    103005          0 .232758621
    103006          0           1
    103006          1           0

12 rows selected.

How Many Sleeps to Santa

select
to_date('25/12/2011','DD/MM/YYYY') - trunc(sysdate) "How Many Sleep to Santa"
from dual;

How Many Sleep to Santa
-----------------------
34

Thursday, November 17, 2011

Call for Presentations : OUG Ireland Conference 2012

The call for presentations for the annual Oracle User Group Ireland conference has been posted in last few days.
The conference is planned for March 2012 and the venue will be picked over the next few weeks.
I’m on organising committee this year. It is hoped to have a number of parallel streams covering core Database Technology, BI (&EPM), Development (including Fusion).
If you are interested in presenting a short presentation of approx. 45 minutes (including time for questions), then you will need to submit your Topic and Abstract using the following link : www.oug.org/Irelandpapers
The conference is not limited to presenters from Ireland and it is hoped to get a number of well known Oracle experts and Oracle ACEs to come to Dublin for the day.
What kind of topics are of interest. Well pretty much anything Oracle. We have all come across something interesting in our jobs that we could share, be it using a particular technique, new features, sharing experiences, best practices, product demos, etc
I’ve already submitted a presentation on Oracle Data Miner.
There is a Twitter hash tag for the Oracle Conference #oug_ire2012. So add this to your Twitter tool to follow developments and announcements about the conference.
If you have any question about the conference drop me a email.