Brendan Tierney - Oralytics Blog: Irish Oracle SIG

Monday, November 21, 2011

Applying an ODM Model to new data in Oracle – Part 1

This is the first of a two-part blog post on applying an Oracle Data Mining model to score new data. This first part looks at how you can score data using the DBMS_DATA_MINING.APPLY procedure in a batch-type process.

The second part will be posted in a couple of days and will look at how you can apply or score the new data, using our ODM model, in a real-time mode, scoring a single record at a time.

DBMS_DATA_MINING.APPLY

Instead of applying the model to data as it is captured, you may need to apply a model to a large number of records at the same time. To perform this bulk processing we can use the APPLY procedure that is part of the DBMS_DATA_MINING package. The format of the procedure is

DBMS_DATA_MINING.APPLY (
      model_name           IN VARCHAR2,
      data_table_name      IN VARCHAR2,
      case_id_column_name  IN VARCHAR2,
      result_table_name    IN VARCHAR2,
      data_schema_name     IN VARCHAR2 DEFAULT NULL);

Parameter Name	Description
Model_Name	The name of your data mining model
Data_Table_Name	The source data for the model. This can be a table or view.
Case_Id_Column_Name	The attribute that gives uniqueness for each record. This could be the Primary Key, or, if the PK contains more than one column, a new attribute is needed
Result_Table_Name	The name of the table where the results will be stored
Data_Schema_Name	The schema name for the source data

The main condition for applying the model is that the source table (DATA_TABLE_NAME) needs to have the same structure as the table that was used when creating the model.

Also, the data needs to be preprocessed in the same way as the training data to ensure that the data in each attribute/feature has the same formatting.
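
One way to check the structure requirement before scoring is to compare the column lists in the data dictionary. The following is only a sketch: it assumes that both the build view (MINING_DATA_BUILD_V) and the table to be scored (NEW_DATA_TO_SCORE, used in the example later in this post) exist in the current schema, which may not be true in a database the model was imported into.

```sql
-- Columns (name and data type) that exist in the build view
-- but have no exact match in the table to be scored.
-- Assumption: both objects are owned by the current user.
SELECT column_name, data_type
FROM   user_tab_columns
WHERE  table_name = 'MINING_DATA_BUILD_V'
MINUS
SELECT column_name, data_type
FROM   user_tab_columns
WHERE  table_name = 'NEW_DATA_TO_SCORE';
```

If the query returns no rows, every column of the build view has a column of the same name and data type in the new table. It does not check that the values were prepared in the same way, so that part still has to be verified separately.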

When you use the APPLY procedure it does not update the original data/table, but creates a new table (RESULT_TABLE_NAME) with a structure that is dependent on what the underlying DM algorithm is. The following gives the Result Table description for the main DM algorithms:

For Classification algorithms

case_id VARCHAR2/NUMBER
prediction NUMBER / VARCHAR2 -- depending on the target data type
probability NUMBER

For Regression

case_id VARCHAR2/NUMBER
prediction NUMBER

For Clustering

case_id VARCHAR2/NUMBER
cluster_id NUMBER
probability NUMBER

Example / Case Study

My last few blog posts on ODM have covered most of the APIs for building and transferring models. We will be using the same data set in these posts. The following code uses the same data and models to illustrate how we can use the DBMS_DATA_MINING.APPLY procedure to perform a bulk scoring of data.

In my previous post we used the EXPORT and IMPORT procedures to move a model from one database (Test) to another database (Production). The following example uses the model in Production to score new data. I have set up a sample of data (NEW_DATA_TO_SCORE) from the SH schema using the same set of attributes as was used to create the model (MINING_DATA_BUILD_V). This data set contains 1500 records.

SQL> desc NEW_DATA_TO_SCORE
Name                                 Null?    Type
------------------------------------ -------- ------------
CUST_ID                              NOT NULL NUMBER
CUST_GENDER                          NOT NULL CHAR(1)
AGE                                           NUMBER
CUST_MARITAL_STATUS                           VARCHAR2(20)
COUNTRY_NAME                         NOT NULL VARCHAR2(40)
CUST_INCOME_LEVEL                             VARCHAR2(30)
EDUCATION                                     VARCHAR2(21)
OCCUPATION                                    VARCHAR2(21)
HOUSEHOLD_SIZE                                VARCHAR2(21)
YRS_RESIDENCE                                 NUMBER
AFFINITY_CARD                                 NUMBER(10)
BULK_PACK_DISKETTES                           NUMBER(10)
FLAT_PANEL_MONITOR                            NUMBER(10)
HOME_THEATER_PACKAGE                          NUMBER(10)
BOOKKEEPING_APPLICATION                       NUMBER(10)
PRINTER_SUPPLIES                              NUMBER(10)
Y_BOX_GAMES                                   NUMBER(10)
OS_DOC_SET_KANJI                              NUMBER(10)

SQL> select count(*) from new_data_to_score;

COUNT(*)
----------
1500
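
Since CUST_ID is going to be used as the case id, it has to identify each record uniquely. A quick check along the following lines can confirm this before scoring; it is a minimal sketch that only assumes the table and column shown in the description above.

```sql
-- List any CUST_ID values that occur more than once.
-- No rows returned means CUST_ID is unique and can serve as the case id.
SELECT   cust_id, COUNT(*) AS occurrences
FROM     new_data_to_score
GROUP BY cust_id
HAVING   COUNT(*) > 1;
```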

The next step is to run the DBMS_DATA_MINING.APPLY procedure. The parameters that we need to feed into this procedure are

Parameter Name	Description
Model_Name	CLAS_DECISION_TREE -- we imported this model from our test database
Data_Table_Name	NEW_DATA_TO_SCORE
Case_Id_Column_Name	CUST_ID -- this is the PK
Result_Table_Name	NEW_DATA_SCORED -- new table that will be created that contains the Prediction and Probability.

The NEW_DATA_SCORED table will contain 2 records for each record in the source data (NEW_DATA_TO_SCORE). For each record in NEW_DATA_TO_SCORE we will have one record for each of the Target Values (0 or 1), along with the probability for that target value. So for our NEW_DATA_TO_SCORE, which contains 1,500 records, we will get 3,000 records in the NEW_DATA_SCORED table.

To apply the model to the new data we run:

BEGIN
   dbms_data_mining.apply(
      model_name          => 'CLAS_DECISION_TREE',
      data_table_name     => 'NEW_DATA_TO_SCORE',
      case_id_column_name => 'CUST_ID',
      result_table_name   => 'NEW_DATA_SCORED');
END;
/

This takes 1 second to run on my laptop, so this apply/scoring of new data is really quick.
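
Because APPLY creates the result table itself, running the same call a second time can be expected to fail while NEW_DATA_SCORED still exists. The following is a sketch of a re-runnable version, on the assumption that APPLY will not overwrite an existing result table; the drop-and-ignore-ORA-00942 pattern is a general PL/SQL idiom and not something specific to ODM.

```sql
BEGIN
   -- Drop the earlier result table, ignoring only "table does not exist".
   BEGIN
      EXECUTE IMMEDIATE 'DROP TABLE new_data_scored';
   EXCEPTION
      WHEN OTHERS THEN
         IF SQLCODE != -942 THEN   -- ORA-00942: table or view does not exist
            RAISE;
         END IF;
   END;

   -- Same call as above; APPLY creates NEW_DATA_SCORED again.
   dbms_data_mining.apply(
      model_name          => 'CLAS_DECISION_TREE',
      data_table_name     => 'NEW_DATA_TO_SCORE',
      case_id_column_name => 'CUST_ID',
      result_table_name   => 'NEW_DATA_SCORED');
END;
/
```

Note that the DROP TABLE statement commits implicitly, so any earlier scores are gone once the block starts, even if the APPLY step then fails.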

The new table NEW_DATA_SCORED has the following description

SQL> desc NEW_DATA_SCORED
Name                            Null?    Type
------------------------------- -------- -------
CUST_ID                         NOT NULL NUMBER
PREDICTION                               NUMBER
PROBABILITY                              NUMBER

SQL> select count(*) from NEW_DATA_SCORED;

COUNT(*)
----------
3000

We can now look at the prediction and the probabilities

SQL> select * from NEW_DATA_SCORED where rownum <=12;

   CUST_ID PREDICTION PROBABILITY
---------- ---------- -----------
    103001          0           1
    103001          1           0
    103002          0 .956521739
    103002          1 .043478261
    103003          0 .673387097
    103003          1 .326612903
    103004          0 .673387097
    103004          1 .326612903
    103005          1 .767241379
    103005          0 .232758621
    103006          0           1
    103006          1           0

12 rows selected.
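
In most cases only the most likely target value for each customer is of interest, not both rows. A minimal sketch of how this could be pulled from the result table is shown below; it uses the standard ROW_NUMBER analytic function and only the three columns from the NEW_DATA_SCORED description above.

```sql
-- Keep one row per customer: the target value with the highest probability.
-- If both probabilities were exactly equal, the lower prediction value wins
-- because of the second ORDER BY key.
SELECT cust_id, prediction, probability
FROM  (SELECT cust_id, prediction, probability,
              ROW_NUMBER() OVER (PARTITION BY cust_id
                                 ORDER BY probability DESC, prediction) AS rn
       FROM   new_data_scored)
WHERE  rn = 1
ORDER BY cust_id;
```

For the sample rows listed above, this would keep prediction 0 for customer 103001 and prediction 1 for customer 103005.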

Thursday, November 17, 2011

Call for Presentations : OUG Ireland Conference 2012

The call for presentations for the annual Oracle User Group Ireland conference has been posted in the last few days.
The conference is planned for March 2012 and the venue will be picked over the next few weeks.
I’m on the organising committee this year. It is hoped to have a number of parallel streams covering core Database Technology, BI (& EPM) and Development (including Fusion).
If you are interested in giving a short presentation of approx. 45 minutes (including time for questions), then you will need to submit your Topic and Abstract using the following link : www.oug.org/Irelandpapers
The conference is not limited to presenters from Ireland and it is hoped to get a number of well known Oracle experts and Oracle ACEs to come to Dublin for the day.
What kind of topics are of interest? Well, pretty much anything Oracle. We have all come across something interesting in our jobs that we could share, be it using a particular technique, new features, sharing experiences, best practices, product demos, etc.
I’ve already submitted a presentation on Oracle Data Miner.
There is a Twitter hash tag for the Oracle Conference #oug_ire2012. So add this to your Twitter tool to follow developments and announcements about the conference.
If you have any questions about the conference, drop me an email.

Wednesday, November 16, 2011

My UKOUG Conference 2011 Schedule

The UKOUG conference will be in a couple of weeks. I have my flights and hotel booked, and I’ve just finished selecting my agenda of presentations. I really enjoy this conference as it serves many purposes, including finding out what new directions Oracle is taking, new product features, some upskilling/training, confirming that the approaches that I have been using on projects are valid, getting lots of hints and tips, etc.

One thing that I always try to do, and that I strongly encourage everyone (in particular first timers) to do, is to go to 1 session every day that is on a topic or product that you know (nearly) nothing about. You might discover that you know more than you think, or you may learn something new that can be fed into some project on your return or over the next 12 months.

My agenda for the conference currently looks very busy, and in between these sessions there is the exhibition hall, meetings with old and new friends, meetings with product/business unit managers, asking people to write articles for Oracle Scene, checking out possible presenters to come to Ireland for our conference in March 2012, etc. Then there is my presentation on the Wednesday afternoon.

Sunday

I’ll miss most of the Oak Table event on the Sunday but I hope to make it in time for

16:40-17:30 : Performance & High Availability Panel Session

Monday

9:20-9:50 : Keynote by Mark Sunday, Oracle (H1)
10:00-10:45 : The Future of BI & Oracle roadmap, Mike Durran, Oracle (H5)
11:05-12:05 : Implementing Interactive Maps with OBIEE 11g, Antony Heljula, Peak Indicators (H10A)
12:15-13:15 : OBI 11g Analysis & Reporting New Features, Mark Rittman (8A)
14:30-15:15 : Master Data Management – What is it & how to make it work – Robert Barnett, Hub Solutions Designs (H10A)
16:20-17:35 : Dummies Guide to Oracle ADF, Grant Ronald, Oracle, (Media Suite)
16:35-18:30 : The DB Time Performance Method, Graham Wood, Oracle (H8A)
17:45-18:30 : Performance & Stability with Oracle 11g SQL Plan Management, Doug Burns (H1)
17:45-18:30 : Experiences in Virtualization, Michael Doherty (H10A)
19:45-20:45 : Exhibition Welcome Drinks
20:45-Late : Focus Pubs

Tuesday

9:00-11:00 : Next Generation BI Architectures Masterclass, Andrew Bond, Oracle (H10B)
10:10-10:55 : Who’s afraid of Analytic Functions, Alex Nuijten, Maxima (H5)
11:15-12:15 : Analysing Your Data with Analytic Functions, Carl Dudley, (H9)
11:25-13:25 : Using a Physical Standby to Minimize Downtime for DB Release or Server Change, Michael Abbey, Pythian (Media Suite)
14:40-15:25 : How not to make the headlines, Mark Clewett, Hitachi (H10A)
14:40-15:25 : APEX Back to Basics, Paul Broughton, APEX Evangelists (H9)
15:35-16:20 : Can People be identified in the database, Pete Finnigan (H1)
16:40-18:35 : OTN Hands-on Workshop, Todd Trichler, Oracle (H8A)
17:50-18:35 : SQL Developer Data Modeler as a replacement for Oracle Designer, Paul Bainbridge, Fujitsu, (H8B)
18:45-19:45 : Keynote : Future of Enterprise Software and Oracle, Ray Wang, Constellation Research (H1)
20:00-Late : Evening Social & Networking

Wednesday

9:00-10:00 : Oracle 11g Database: Automatic Parallelism, Joel Goodman, Oracle (H9)
9:00-10:00 : Big Data: Learn how to predict the future, Keith Laker, Oracle (H8B)
10:10-10:55 : All about indexes – What to index, when and how, Mark Bobak, ProQuest (H5)
11:20-12:30 : Using Application Express to Build Highly Accessible Products, Anthony Rayner, Oracle (H8A)
12:30-13:30 : Practical uses for APEX Dictionary, John Scott, APEX Evangelists (H8A)
15:20-16:05 : How to deploy your Oracle Data Miner 11g R2 Workflows in a Live Environment – Me (H7B)
16:15-17:00 : Next Generation Data Warehousing, Kulvinder Hari, Oracle (H8A)
16:15-17:00 : Beyond RTFM and WTF Message Moments. Introducing a new standard: Oracle Fusion Applications User Assistance, Ultan O’Broin (Executive Room 7)

I know I have some overlapping sessions, but I will decide on the day which of these I will attend.

As you can see I will be following the BI stream mainly, with a few sessions on the Database and Development streams too.

This year there is a smart phone app to help us organise our agenda, meetings, etc. The only downside is that the app does not import the agenda that I created on the website, so I have to do it again. Maybe for next year they will have an import agenda feature.

New UKOUG mobile app – Launched October 2011

Wednesday, November 2, 2011

Tom Kyte Seminar Day–Dublin

On Wednesday 2nd November, I attended a full day of presentations given by Tom Kyte of Oracle (asktom.oracle.com). Tom covered a number of topics and these included some of his Oracle Open World presentations.

The topics that were covered included

5 things about SQL (OOW11)
Database Option Packs
5 things about PL/SQL (OOW11)
Q&A Ask Tom Session

All of these presentations can be downloaded from Tom’s website www.asktom.com.

Tom won’t be presenting at the annual UKOUG conference in December, but he is hoping to be there next year (2012).

Wednesday, October 5, 2011

Oracle Events in Ireland (Q4 2011)

Over the coming months (Q4 2011) there are a number of Oracle related events being run in Ireland. The schedule for these is below with the relevant links to the agenda webpages or to where you can book your place.

For the OUG BI SIG meetings, you can book your place with the UKOUG.
Venue Address - Dublin:
Oracle Block H, East Point Business Park, Dublin 3
Venue Address - Belfast:
The Mount Conference Center, 2 Woodstock Link, Belfast BT6 8DD
For questions about logistics, please contact the marketing team on marketing-ie_ie@oracle.com
If you have any questions about the content, please contact: mina.sagha.zadeh@oracle.com
If you know of any other events that are not listed, let me know and I’ll update the list.

Monday, September 19, 2011

Oak Table Event in Dec 2011

This year the Annual UKOUG event will be in Birmingham (again) from the 5th to 7th December.

This year there is a slight difference to the usual schedule. On Sunday 4th December there is an Oak Table event, with two parallel tracks. It has all the well known experts presenting at this event.

If I had the time turner from Harry Potter, I would be able to go to all the sessions.

Presenters include Mogens Nørgaard, Jonathan Lewis, Frits Hoogland, Martin Widlake, Christian Antognini, Connor McDonald, James Morle and Wolfgang Breitling.

This is an impressive line up and hopefully the UKOUG will run a similar event in 2012.

Check out the full agenda at

http://2011.ukoug.org/personalisedagenda

This is one event that I would love to go to but unfortunately I won’t be able to make it. I’ll be attending the Annual UKOUG conference alright, and I have already booked my airline tickets. But there are no flights from Dublin that will get me to Birmingham on time. I would need to fly to Birmingham on the Saturday, involving another hotel night and another night away from the family.

The best I’m hoping for is to get to the ICC in time for the Panel Session on Performance and High Availability.

Depending on weather and travel delays I might even miss this last session. If I do, I can always meet up with everyone in the pub on the Sunday evening for a chat.

Maybe next year.

Monday, August 8, 2011

Oracle Scene–Next Submission Date 26th Aug

The Winter edition of the UKOUG Oracle Scene magazine is now looking for articles to be submitted for consideration.

The due date for the article submission is Friday 26 August. So you have just over 2 weeks to put together your article.

Lots of people have asked me what kind of articles we are looking for. The simple answer is anything, as long as it is Oracle related. The following list should give you some ideas:

Technical article
Application article
Business article
Tool article
SIG meetings, news, updates and plans
New features
Something that you discovered
Your likes, dislikes or anything relating to Oracle
Oracle Book reviews
Oracle Conference reviews
etc

So you can see, anything goes really.

How long should an article be? It can be any length really: anything from a 1/4 page to a full 5 page article.

Selection of Articles Process

All submitted articles are assessed by a review panel, comprised of volunteers from a variety of businesses and specialties. The review panel rates the articles and makes comments where appropriate.

An editorial meeting takes place after the submissions have been rated. The articles are assessed and the review panel’s scoring and comments are taken into account. The editorial team makes the final decision as to which articles will be selected for publication, or to be held over for a future edition. You will be notified of the result as soon as this process has been completed.

You will be contacted near the publication date by the publishing company so that you can review the print version of your article.

Submitting your Article

Check out the Article Formatting Guidelines before submitting.

All pictures and images should be 300dpi.

Include a bio of 100 words (max) and your photo.

Email your article and images to

articles@ukoug.org.uk

Brendan Tierney

Deputy Editor

Monday, July 18, 2011

VirtaThon Presentation

Today I gave my VirtaThon presentation on the new Oracle Data Miner 11gR2 tool.

It was an interesting experience as VirtaThon was a virtual conference. The organisation and administration of the conference was excellent.

I had over 25 participants for my presentation, including Carolyn Hamm who has written a book on using Oracle Data Miner 10g. She seemed to enjoy my presentation as she was asking for more at the end, but we had run out of time.

The presentation was an unusual but interesting experience. All the participants were muted, so I could not hear anyone or be asked questions as the presentation progressed. I was not able to read body language or facial expressions to work out how the presentation was going.

I was sitting in my living room when giving the presentation and spent almost an hour talking to myself. At times my concentration levels dipped and I had to refocus, using some visualisation to help me concentrate.

The presentation was divided into 2 parts. The first part was a presentation consisting of some background to ODM, how to get set up and running with ODM, and finally a discussion of some of the new features. This first part took approx. 30 minutes, which surprised me as during my rehearsals it was taking 16 minutes. The second part of the presentation was a demo of using ODM to create a workflow, generating a classification model and then applying this model to some new data. During my rehearsals this was taking approx. 40 minutes.

I only had 50-55 minutes for my VirtaThon presentation, so after the first part I had 20-25 minutes left for the demo. So I had to get through the demo quickly, and I had to cut out a discussion of how the data exploration functionality in ODM can be used to get an insight into the data before you start using the data mining features. I will put together a blog post and video of this in a couple of weeks’ time that will explain it in more detail.

I managed to finish at 49 minutes, which left 6 minutes for questions. There were only a couple of questions, along with plenty of Thank You’s and Good Presentation comments, which are always good to hear.

Thank you to everyone who attended the presentation and to the organisers of VirtaThon.

Brendan Tierney

Friday, June 24, 2011

Irish BI SIG–23rd June 2011

On Thursday 23rd June the Irish BI SIG had a networking event on the MV Cillairne boat. This is a former training boat that has been converted into a restaurant and bar. The boat is moored beside the new convention centre on the quays in Dublin near the O2.

There was a good turn out, with a mixture of people from An Post, ICON, Vertice, Fujitsu and some independent consultants/contractors.

There were a few free drinks and some food provided by the UKOUG. Many thanks for these.

There was lots of sharing of what BI related projects people have worked on. There was some discussion of how the SIG can progress in the future, with the consensus that people will need to be willing to present on their projects and experiences.

Tony Cassidy, the SIG Chair, is hoping to get a few volunteers to present at the next SIG or maybe have another social networking event.

I also did my bit for the Oracle Scene magazine, in asking people would they write an article (even if it is a short one) for the next edition. I’ve recently joined the editorial team of Oracle Scene.

Thursday, June 2, 2011

Irish Oracle BI SIG Meeting - 23rd June

The next Irish Oracle BI SIG meeting will be on Thursday 23rd June starting at 6:30pm.

The format of this SIG meeting is a bit different from the previous ones.

This time the SIG meeting will be an informal networking event and there will be no demos or presentations.

The SIG event will be in the River View Bistro Bar, which is on the MV Cillairne boat that is moored beside the new convention centre on the quays. Check out its website:
http://www.mvcillairne.com/
