Brendan Tierney - Oralytics Blog

Monday, February 25, 2013

Review of Oracle Magazine–Jan/Feb 1999

The headline articles for the Jan/Feb 1999 edition of Oracle Magazine were all on the Year 2000 issue (Y2K). There was lots of work for the consultancy companies around the work on this. Most of the work was not very interesting but was vital to ensure that our applications continued to work.

Tuesday, February 19, 2013

Clustering in Oracle Data Miner–Part 2

This is a the second part of a five (5) part blog post on building and using Clustering in Oracle Data Miner. The following outlines the contents of each post in this series on Clustering.

The first part we looked at what clustering features exist in ODM and how to setup the data that we will be using in the examples
The second part will focus on how to building Clusters in ODM .
The third post will focus on examining the clusters produced by ODM and how to use the Clusters to apply to new data using ODM.
The fourth post will look at how you can build and evaluate a Clustering model using the ODM SQL and PL/SQL functions.
The fifth and final post will look at how you can apply your Clustering model to new data using the ODM SQL and PL/SQL functions.

With Clustering we are trying to find hidden patterns in our data. Unlike classification we a not directing the algorithms on what areas/attributes to focus on.

In our scenario we want to look to see what distinct groupings or Segments that our Customer data naturally fit into. For each of these Segment Oracle Data Miner will tell us what attributes and the values of these attributes that determine if a customer belongs to one segment or another.

Step 1 - Define the Data Source

The first step involves us creating a Data Source Node for the table that we created and loaded in the previous blog post. We called this table INSURANCE_CUST_LTV.

To create the Data Source Node go to the Component Palette. Under the Data tab you will find the Data Source option. Click on this and then go to the workflow worksheet and click. The Data Source node will be created and the wizard to specify the name of the table/view will open. Select INSURANCE_CUST_LTV from the list.

Click on the Next and then the Finish button to take in all the attribute.

Our data is now read to use.

Step 2 – Explore the Data

We can use the Explore Node to gather some statistics on the data and to produce some graphs.

To create the Explore Node, go to the Component Palette and under the Data tab you will find the Explore Data node. Click on this and then click again on the workflow worksheet, near the Data node.

You need to connect the Data node to the Explore Data node. Move your mouse to the Data node. Right-click this node and select Connect from the drop down menu. Then more the mouse to the Explore Data node and click on it. You will now have an arrowed line joining these two nodes

The next step we need to do is to right click on the Explore Data node and select Run from the drop down menu. ODM will go off to the database and gather various statistics and create a number of graphs based on the data in the table.

NB. If you click on the Explore Data node and then look in the Property Inspector you will see that ODM will take a sample of 2,000 records to produce the statistics and graphs. If you would like ODM to use all the records then you need to click the ‘Use All Data’ check box. Or you can change the sample size.

For your initial data investigation you might use the default of sampling 2,000 records before you increase the size of the sample.

In scenarios like this you may want to explore the data in more detail and to look at how the data is distributed in relation to certain attributes. In our data we have an attribute called LTV_BIN. In this attribute we have four values including, Very High, High, Medium and Low.

In our scenario, it might be more interesting to explore the data based on this attribute and it’s values. To do this we need to tell the Explore Data node to group the data analysis based on the values in this attribute.

Double-click the Explore Data node. In the Group By drop down select LTV_BIN. Click the OK button. You are now ready to run the Explore Data Node. To do this, right click on Explore Data node and select Run from the drop down list.

To view the statistics gathered and the graphs produced on the default sample of 2,000 records, right click the Explore Data node and select View Data from the drop down menu. You will get a new tab/window opening in SQL Developer with all the results.

This kind of data analysis only works with an attribute that has a low number of possible values.

Step 3 – Defining the data we will used to Build our Cluster models

We are going to divide the data in our CUST_INSURANCE_LTV into two data sets. The first data set will be used to build the Cluster models. The second data set will be used as part of the Apply node in my next blog post (part 3).

To divide the data we are going to use the Sample Node that can be found under the Transformation tab of the Component Palette.

Create your first Sample Node. In the Settings tab of the Property Inspector set the sample size to 60% and in the Details tab rename the node to Sample Build.

Create a second Sample node and give it a sample size of 40%. Rename this node to Sample Apply.

Right click on each of these Sample nodes to run them and have them ready for the next step of building the Clustering models.

Step 4 – Creating the Clustering Build Node

When you have finished exploring the data you are now ready to move on to creating the Clustering models. As ODM has two clustering algorithms, ODM will default to creating two Clustering models.

To create the Clustering models, go to the Component Palette. Under the Models tab, select Clustering.

Move the mouse to the workflow worksheet, near the Sample Build node and click the worksheet. The Clustering node will be created. Now we need to connect the data with the Clustering node. To do this right click on the Sample Build node and select Connect from the drop down list. Then move the mouse to the Clustering node and click. An arrowed line will be created connecting the two nodes.

At this point we can run the Clustering Build node or we can have a look at the setting for each algorithm.

Step 5 – The Clustering Algorithm settings

To setup the Cluster Build node you will need to double click on the node to open the properties window. The first thing that you need to do is to specify the Case ID (i.e. the primary key). In our example this is the CUSTOMER_ID.

Oracle Data Miner has two clustering algorithms. The first of these is the well know k-Means (it is an enhanced version of it) and the O-Cluster. To look at the settings for each algorithm, click on the model listed under Model Settings and then click on the Advanced button.

A new window will open that lists all the attributes for the in the data source. The CUSTOMER_ID is unchecked as we said that this was the CASE_ID.

Click on the Algorithm Settings tab to see the internal settings for the k-means algorithm. All of these settings have a default value. Oracle has worked out what the optimal setting are for you. The main setting that you might want to play with is the Number of Clusters to build. The default is 10, but you might want to play with numbers between 5 and 15 depending on the number of clusters or segments you want to see in your data.

To view the algorithm settings for O-Cluster click on this under the Model Setting. We have less internal settings to worry about here, but we again can determine how many clusters we want to produce.

For our scenario we are going to take the default settings.

Step 6 – Run/Generate the Clustering models

At this stage we have the data set-up, the Cluster Build node created and the algorithm setting all set to what we want.

Now we are ready to run the Cluster Build node.

To do this, right click on the Cluster Build node and click run. ODM will go create a job that will contain PL/SQL code that will generate a cluster model based on K-Means and a second cluster model based on O-Cluster. This job will be submitted to the database and when it is completed we will get the little green tick mark on the top right hand corner of the Cluster Build node.

In the next blog post we will look at how to examine what clusters were produced by ODM and how we can take one of these and apply them to new data.

Wednesday, February 13, 2013

Clustering in ODM–Part 1

This is a the first part of a five (5) part blog post on building and using Clustering in Oracle Data Miner. The following outlines the contents of each post in this series on Clustering.

This post part we will look at what clustering features exist in ODM and how to setup the data that we will be using in the examples
The second part will focus on how to building Clusters and examining the clusters produced in ODM .
The third post will focus on using the Clusters to apply to new data using ODM.
The fourth post will look at how you can build and evaluate a Clustering model using the ODM SQL and PL/SQL functions.
The fifth and final post will look at how you can apply your Clustering model to new data using the ODM SQL and PL/SQL functions.

Clustering is an unsupervised technique designed groupings of related data that are more similar to each other and are less similar to other groups. Typically clustering is used in customer segmentation analysis to try an better understand what type of customers you have.

Like with all data mining techniques, Clustering will not tell you or give you some magic insight into your data. Instead it gives you more information for you to interpret and add the business meaning to them. With Clustering you can explore the data that forms each cluster to understand what it really means.

The Clusters give by Oracle Data Miner are just patterns that it has found in the data.

Oracle has two Clustering algorithms:

K-Means : Oracle Data Miner runs an enhanced version of the typical k-means algorithm. ODM builds models in a hierarchical manner, using a top-down approach with binary splits and refinements of all nodes at the end. The centroid of the inner nodes in the hierarchy are updated to reflect changes as the tree grows. The tree grows one node at a time. The node wit the largest variance is split to increase the size of the tree until the desired number of clusters is reached.

O-Cluster : O-Cluster is an Orthogonal Partitioning Clustering that creates a hierarchical grid based clustering model. It operates recursively, generating a hierarchical structure. The resulting clusters define dense areas.

The Data Set for out Clustering examples

I’m going to use a data set that is available on OTN (somewhere) and has been used for demos in the prior versions of ODM before the 11gR2 version (SQL Developer 3). It has gone by many names but the table name we care going to use is INSURANCE_CUST_LTV.

The file is in CSV format and we will use the Import feature in SQL Developer to import it.

1. In the schema you are using for Oracle Data Miner, right click Tables in the Connections tab. The Import option will appear on the menu. Select this.

2. Go to the directory where you saved the file, select it and then click on the Open button.

3. You need to set the file Format to be ‘Delimited’ and the Delimiter set to ‘|’

4. In the next step give the table name as INSURANCE_CUST_LTV

5.In the next step Select all the Attributes. It should default to this. Click next.

6. In Step 4 of the Wizard you can set the data types for each attribute. The simplest way is to set the character attributes to VARCHAR2 (50) :

CUSTOMER_ID, LAST, FIRST, STATE, REGION, SEX, PROFESSION, BUY_INSURANCE (set this one to 3), MARITAL_STATUS, LTV_BIN

Set all the number attributes (all the others) to NUMBER without any precision or scale.

7. Click the next button and then the finish button. SQL Developer will now load 15,342 records into the INSURANCE_CUST_LTV table, with no errors (hopefully!)

We are now ready to start our work with the Clustering algorithms in ODM.

In the next blog post we will look at exploring the data, building our Clustering models and examining the clusters that were produced by ODM.

Monday, February 11, 2013

Access Control List (ACL) in Oracle

Over the past couple of weeks I have been looking at some PL/SQL code that allows me to post messages to my twitter account (@brendantierney). You may have seen these. They looked something link the following.

I found some scripts that does all the work for me at the SomeCodingHero blog. Although the blog had most of the code needed, it did need some corrections and changes necessary for my twitter customer details. I will have some blog posts on these over the coming weeks.

Before you can setup and use these scripts, you need to have setup and configured your database so that you are allowing the database to access websites outside of database. To do this you need to setup what is called the Access Control List. This allows you to setup fine grained access to external services. If you do not do this then you will get the error:

You need to setup the ACL if you are going to use any of the following UTL_STMP, UTL_TCP, UTL_MAIL, UTL_HTTP and UTL_INADDR. For the PL/SQL code to post the messages to Twitter we will be using the UTL_HTTP package.

OK, so here are the steps that I went through to get my ACL setup so that I can send twitter posts using PL/SQL in my Oracle 11.2.0.3 database running on my Dell Windows 7 laptop.

1. To setup the ACL you need to log into the database as SYS or get your DBA to set this up for you.

2. Create the ACL

BEGIN

DBMS_NETWORK_ACL_ADMIN.create_acl (

acl => 'Twitter.xmll',

description => 'ACL for Twitter',

principal => 'Twitter', -- schema name

is_grant => TRUE,

privilege => 'connect',

start_date => SYSTIMESTAMP,

end_date => NULL);

COMMIT;

END;

3. Now you can what websites you are going to allow access to. In our case we want to access Twitter and the Twitter API.

BEGIN
   DBMS_NETWORK_ACL_ADMIN.ASSIGN_ACL (
      acl => 'Twitter.xmll',
      host => 'twitter.com',
      lower_port => 80,
      upper_port => 80);

   DBMS_NETWORK_ACL_ADMIN.ASSIGN_ACL (
      acl => 'Twitter.xmll',
      host => 'api.twitter.com',
      lower_port => 80,
      upper_port => 80);
END;

4. Instead of having two ASSIGN_ACL statements I could have just used one like the follow, but I only found this out after I had done the above.

BEGIN
   DBMS_NETWORK_ACL_ADMIN.ASSIGN_ACL (
      acl => 'Twitter.xmll',
      host => '*.twitter.com',
      lower_port => 80,
      upper_port => 80);
END;

5. I was then ready to start running the PL/SQL scripts to get my Twitter setup and running to start posting tweets

Resources

Technical Note on using ACL by Tim Hall

Don Seiler (Pythian) Blog Post

Oracle Documentation – Fine Grain Access Control to External Services

Oracle Documentation – PL/SQL Package DBMS_NETWORK_ACL_ADMIN

Friday, February 8, 2013

What Conferences I’m speaking at soon!

OUG Norway 17th to 19th April

Thursday, February 7, 2013

OUG Ireland 2013 Agenda is now live

The agenda for the OUG Ireland 2013 event is now live. The event will be in the Dublin Convention Centre on the 12th March. There are lots of excellent sessions, across 7 tracks!! So there will be something (or lots of things) for everyone who works in the Oracle world here in Ireland.

I’m sure the Oracle Database track will be very popular. I wonder why!!!

Agenda : http://www.ukoug.org/2013-events/oug-ireland-2013/agenda/

Remember registration is FREE. You don’t have to be a member of the User Group to come to this event. It is open to everyone and did I mention that it is FREE. Registration is now open.

I’ll be there. Well I suppose I have to as I’ll be presenting Smile

I hope to see you there.

Wednesday, February 6, 2013

Oracle ACE Director

Towards the end of last week I received and email from Oracle saying that I had been nominated and accepted, by Oracle, to become an Oracle ACE Director.

This is something that makes me very proud and honours the work I have been doing over the past few years on Data Mining in Oracle (Advanced Analytics Option, Data Science/Predictive Analaytics, or whatever you want to call it). Thank you to everyone who nominated me.

If you are not familiar with the Oracle ACE Program, it is a way for Oracle to acknowledge not only technical skills but also personal engagement with the Oracle Community and Technology overall. There is even a FAQ that explains how this program works.

There are a few perks that come with the title, and Oracle have a few expectations too. Most of these expectations I’m already doing!! What I’m looking forward to later this years is my first Oracle ACE Director briefing at Oracle Open World (22-26 September)

Wednesday, January 30, 2013

Oracle Magazine-Nov/Dec. 1998

The headline articles for the Nov/Dec 1998 edition of Oracle Magazine were on building web based applications and thin client computing. A large part of the magazine was dedicated to these topics. This was a bumper edition with a total of 152 pages of content.

Monday, January 28, 2013

Rexer Analytics 2013 Data Miner Survey

Rexer Analytics has been conducting the Data Miner Survey since 2007. Each survey explores the analytic behaviors, views and preferences of data miners and analytic professionals. Over 1300 people from around the globe participated in the 2011 survey. Summary reports (PDFs of about 40 pages) from previous surveys are available FREE to everyone who requests them by emailing us at DataMinerSurvey@RexerAnalytics.com. Also, highlights of earlier Data Miner Surveys are available online, including best practices shared by respondents on analytic success measurement, overcoming data mining challenges, and other topics. The FREE Summary Report for this 2013 Data Miner Survey will be available to everyone in the fall of 2013.

Your survey responses are completely confidential. This research is not being conducted on behalf of any third party, but is solely for Rexer Analytics to disseminate the findings throughout the data mining and analytics community.

To participate, please click on the link below, then click on the “Start Survey” link on the bottom of the webpage. Please enter the access code in the space provided. The survey should take approximately 15-20 minutes to complete. Anyone who has had this email forwarded to them should use the access code in the forwarded email.

Survey Link: www.RexerAnalytics.com/Data-Miner-Survey-2013-Intro.html

Access Code: UL3X7

Friday, January 25, 2013

OUG Norway Agenda is now live

The OUG Norway spring conference (17th April – 19th April) agenda is now live and is open for registrations.

Click here for the Conference Agenda

Click here for the Conference Registration

This is a 3 day conference. The first day (17th April) will be held in the Radisson BLU Scandinavia ( Holbergsplass ) and the next two (and a bit) days will be on the Color Magic boat that will be travelling between Oslo and Kiel in Germany and back to Oslo. The boat will be arriving back in Oslo on the Saturday morning (20th April).

There will be some presentations in Norwegian, but it looks like most of the presentations will be in English. There will also be some well known names from the Oracle world presenting at this conference.

In addition to these people, I will be giving two presentations on using Predictive Analytics in Oracle using the Oracle Data Miner tool and in-database functionality.

My first presentation will be an overview of the advanced analytics option and a demonstration of what you can do using the Oracle Data Miner tool (part of SQL Developer). This presentation is currently scheduled for Thursday (18th April) at 5pm.

My second presentation will be at 9:30am on the Friday morning (19th April). In this presentation we will look at the in-database features, what can we do in SQL and PL/SQL, and we will look at what you need to do deploy you Oracle Data Mining models in a production environment.

If possible we might be able to review some new 12c new features for Oracle Data Miner Smile

Tuesday, January 22, 2013

Agenda Planning for OUG Ireland Annual event 2013

Over the past week there has been a number of meetings by the organising committee of the Annual OUG Ireland event, to arrange the agenda for the 2013 event. This will be held on the 12th March in the DCC (same as last year)

We have had a large number of submitted presentations from experts from around the world. The choices we have had to make were very difficult.

The agenda is almost complete. A few finishing touches and we should have it all sorted out by the end of the week.

This year we will have the largest conference/annual event yet. There will be a good mixture of presentations from Oracle, Customers, Partners, Oracle ACE’s and other people (with an interesting story to tell) from around the world.

At the moment it looks like we will have tracks on

Oracle Database
Tech
Development
Fusion
EBS
Product & JDE
BI & EPM (this may have 2 parallel tracks)

When the agenda is available and live, I put up a new post.

Registration is now live and by reports a lot of people have already registered.

OUG Ireland Annual event – online Registrations

In previous years this event used to be called the Annual OUG Ireland Conference, but the Conference part has been dropped. I’ll try to explain it over a drink some time.

Monday, January 21, 2013

Oracle Magazine-Sept/Oct 1998

The headline articles for the Sept/Oct1998 edition of Oracle Magazine were on all on how to build and deploy intranets within an organisation, using Oracle products. There were a few case studies illustrating the benefits that using intranets can bring to an organisation

Pages