Friday, February 10, 2012

ODM–Attribute Importance using PL/SQL API

In a previous blog post I explained what attribute importance is and how it can be used in the Oracle Data Miner tool (click here to see blog post).

In this post I want to show you how to perform the same task using the ODM PL/SQL API.

The ODM tool makes extensive use of the Automatic Data Preparation (ADP) function. ADP performs some data transformations such as binning, normalization and outlier treatment of the data based on the requirements of each of the data mining algorithms. In addition to these transformations we can specify our own transformations.  We do this by creating a setting tables which will contain the settings and transformations we can the data mining algorithm to perform on the data.

ADP is automatically turned on when using the ODM tool in SQL Developer. This is not the case when using the ODM PL/SQL API. So before we can run the Attribute Importance function we need to turn on ADP.

Step 1 – Create the setting table

CREATE TABLE Att_Import_Mode_Settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(30));

Step 2 – Turn on Automatic Data Preparation

BEGIN
   INSERT INTO Att_Import_Mode_Settings (setting_name, setting_value)
   VALUES (dbms_data_mining.prep_auto,dbms_data_mining.prep_auto_on);
  COMMIT;
END;

Step 3 – Run Attribute Importance

BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name => 'Attribute_Importance_Test',
    mining_function  => DBMS_DATA_MINING.ATTRIBUTE_IMPORTANCE,
    data_table_name  > 'mining_data_build_v',
    case_id_column_name => 'cust_id',
    target_column_name  => 'affinity_card',
    settings_table_name => 'Att_Import_Mode_Settings');
END;

Step 4 – Select Attribute Importance results

SELECT *
FROM TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_AI('Attribute_Importance_Test'))
ORDER BY RANK;

ATTRIBUTE_NAME       IMPORTANCE_VALUE       RANK
-------------------- ---------------- ----------
HOUSEHOLD_SIZE             .158945397          1
CUST_MARITAL_STATUS        .158165841          2
YRS_RESIDENCE              .094052102          3
EDUCATION                  .086260794          4
AGE                        .084903512          5
OCCUPATION                 .075209339          6
Y_BOX_GAMES                .063039952          7
HOME_THEATER_PACKAGE       .056458722          8
CUST_GENDER                .035264741          9
BOOKKEEPING_APPLICAT       .019204751         10
ION

CUST_INCOME_LEVEL                   0         11
BULK_PACK_DISKETTES                 0         11
OS_DOC_SET_KANJI                    0         11
PRINTER_SUPPLIES                    0         11
COUNTRY_NAME                        0         11
FLAT_PANEL_MONITOR                  0         11

Thursday, February 9, 2012

What has Oracle done to R to give us ORE

Oracle R Enterprise (ORE) was officially launched over the past couple of days and it has been receiving a lot of interest in the press.

We now have the Oracle Advanced Analytics (OAA) option which comprises, the already existing, Oracle Data Mining and now Oracle R Enterprise. In addition to the Oracle Advanced Analytics option we also 2 free set of tools available to use to use. The first of these free tools are the statistical functions which are available in all versions of the Oracle Database and the second free tool is the Oracle Data Miner tool that is part of the newly released SQL Developer 3.1 (7th Feb).

What has Oracle done to Oracle to make Oracle R Enterprise ?

The one of the main challenges with using R is that it is memory constrained, resulting in the amount of data that it can process. So the ORE development team have worked ensuring R can work transparently with data within the database. This removes the need extract the data from the database before it can be used by R. We still get all the advanced on in-Database Data Mining.

They have also embedded R functions within the database, so we an run R code on data within the database. By having these functions with the database, this allows R to use the database parallelism and so we get quicker execution of our code. Most R implementation are constrained to being able to process dataset containing 100Ks of records. With ORE we can now process 10M+ records

In addition to the ORE functions and algorithms that are embedded in the database we can also use the R code to call the suite of data mining algorithms that already exist as part of Oracle Data Miner.

For more details of what Oracle R Enterprise is all about check out the following links.

Oracle Advanced Analytics Options website

ORE Webpage

ORE Blog

ORE Download

ORE Forum

Wednesday, February 8, 2012

Oracle Magazine-Fall 1992

I’ve been collecting Oracle Magazine for almost 20 years now. I have almost the entire collection, but I’m still missing some of the editions. Some people have donated some of the editions I was missing. But I still missing some.  Can you help me ? Check out my Oracle Magazine Collection.

Every 2 to 3 weeks I intend to write a blog post on each of the Oracle Magazines that I have, starting with the earliest edition that is from Fall 1992.

image

The main theme of the Fall 1992 edition was about the new Oracle 7 Database. I didn’t get using an Oracle 7 Database until mid-1994.There are a few customer case studies of Oracle 7 implementations. In the article Migrating to Oracle 7, they list some of the new feature. The following is an extract from the new features section.

Standard Oracle 7 provides the functionality and performance to tackle most mission critical applications:

- Multithreaded server
- Shared SQL
- Cost Based optimiser
- Row level locking
- Declarative integrity
- Role based security
- 100% ANSI/ISO standard SQL
- Enhanced national language support

The procedural option for Oracle 7 provides additional capabilities that enhance the database server in the application environment:

- Stored Procedures and Functions
- Procedure packages
- Database Triggers
- Lock manager package
- Database alerts

The distributed option for Oracle 7 makes a physically distributed database appear as a single, logical database. Features of this option include:

- Distributed updates
- TP monitor (XA) interface
- Transparent two-phase commit
- Remote procedure calls
- Table replication (snapshots)
- Oracle Mail Interface

Another article that stands out is by Richard Barker (do you remember his?) who was responsible for the CASE*Method and Oracle’s CASE Tools.

There was several articles on the new Oracle Forms 3 and Oracle Menu 5. Talking about some of the new features like List of Values (LOVs), pop-up Field editor and Pop-up Pages.

There also also the first articles on using Oracle a Microsoft Windows environment. Oh how I remember the frequent blue screens when developing and compiling my forms and in particular my report, with does early releases on Windows.

 

The editorial staff of Oracle Magazine have kindly given me permission to make a PDF of the front cover and the table of contents available for each edition. To get this PDF click on the above image or follow this link to see what Oracle Magazine used to look like 20 years ago.

My next blog post on Oracle Magazine, will look at the Winter 1993 edition.

If you have any of the editions that I’m missing from my collection and you would like to donate then, then drop me an email and we can arrange delivery.  You wont see any of them on eBay, I promise.

Friday, February 3, 2012

ODM 11gR2–Attribute Importance

I had a previous blog post on Data Exploration using Oracle Data Miner 11gR2. This blog post builds on the steps illustrated in that blog post.

After we have explored the data we can identity some attributes/features that have just one value or mainly one value, etc.  In most of these cases we know that these attributes will not contribute to the model build process.

In our example data set we have a small number of attributes. So it is easy to work through the data and get a good understanding of some of the underlying information that exists in the data. Some of these were pointed out in my previous blog post.

The reality is that our data sets can have a large number of attributes/features. So it will be very difficult or nearly impossible to work through all of these to get a good understanding of what is a good attribute to use, and keep in our data set, or what attribute does not contribute and should be removed from the data set.

Plus as our data evolves over time, the importance of the attributes will evolve with some becoming less important and some becoming more important.

The Attribute Importance node in Oracle Data Miner allows use to automate this work for us and can save us many hours or even days, in our work on this task.

The Attribute Importance node using the Minimum Description Length algorithm.

The following steps, builds on our work in my previous post, and shows how we can perform Attribute Importance on our data.

1. In the Component Palette, select Filter Columns from the Transforms list

2. Click on the workflow beside the data node.

3. Link the Data Node to the Filter Columns node. Righ-click on the data node, select Connect, move the mouse to the Filter Columns node and click. the link will be created

image

4. Now we can configure the Attribute Importance settings.Click on the Filter Columns node. In the Property Inspector, click on the Filters tab.

- Click on the Attribute Importance Checkbox

- Set the Target Attribute from the drop down list. In our data set this is Affinity Card

5. Right click the Filter Columns node and select Run from the menu

After everything has run, we get the little green box with the tick mark on the Filter Column node. To view the results we right clicking on the Filter Columns node and select View Data from the menu. We get the list of attributes listed in order of importance and their Importance measure.

image

We see that there are a number of attributes that have a zero value. It algorithm has worked out that these attributes would not be used in the model build step. If we look back to the previous blog post, some of the attributes we identified in it have also been listed here with a zero value.

Wednesday, February 1, 2012

Oracle Ireland Security Seminar–Dublin 8th March 2012

Oracle Ireland will be hosting an security event on Thursday 8th March, between 9:30-13:30, in their East Point offices.

The seminar is titled Defending Against Hackers and Hacking.

There will be a couple of demos what show some of the simple and not so simple techniques that hack and compromise your systems. This can lead to loss of confidential information.

To register for the event go to

http://www.oracle.com/webapps/events/ns/EventsDetail.jsp?p_eventId=143168&src=7302155&src=7302155&Act=58

Wednesday, January 25, 2012

Update on Exalytics Pricing

In my previous blog post (Exalytics : How much will it cost me ?) I gave an outline of the pricing you might expect for an Exalytics machine.

The final pricing that I gave of approx $3+M was based on the per processor licencing.

Yesterday (24th Jan) the Oracle Business Intelligence blog by Manan, included the pricing based on the per user licences.

The following is a breakdown of the Exalytics pricing based on the minimum 100 user licencing.

Licence Costs (100 users)

Exalytics machine = $135,000

TimesTen = $300 x 100 users = $30,000

BI Foundation Suite = $3,675 x 100 users = $367,500

Giving a grand total of $532,500.

Support Costs (100 users)

But we need to add the annual support costs to this.

Exalytics machine support = $29,700.

TimesTen support = $66 x 100 users = $6,600

BI Foundations suite = $809 x 100 users = $80,900

Total support costs (100 users) = $116,500

First year & on-going costs costs

Total first year cost for an Exalytics machine = $532,500 + $117,200 = $649,700

Plus on going annual support costs of $117,200 in year 2 and subsequent years.

Discounted Costs

If you are one of the lucky customer who can If I use the same discounts, as I did in my previous blog post, of 25% discount on hardware and 60% discount on the software, we get:

Year 1 cost of : ($135,000*0.75) + ($397,500*0.40) = $260,250

So it might be possible to get an Exalytics machine for $260+K, plus annual support costs.

Monday, January 23, 2012

Exalytics : How much will it cost me ?

Over the past couple of weeks the costing for the Oracle Exalytics machine has been made public by Oracle and there has been a number of articles. What I’ve done in this blog post is to collate this information. I give what I understand to be the cost of purchasing an Exalytic machine and to get setup and running.

The pricing structure starts at

Exalytics machine + cost of BI Foundation Suite + TimesTen licences

Exalytics machine = $135,000

TimesTen = $34,500 per processor licence or $300 per named user(min 100 users)

BI Foundation Suite = $450,000 per processor licence or $3,675 per named user (same number of users as for TimesTen = min 100 users)

Annual Support Costs

Exalytics machine = $29,700

TimesTen = 22% of software licence – $7,590 per processor licence or $66 per named user (min 100 users)

BI Foundation Suite = $99,000 per processor licence or $809 per named user(min 100 users)

The Exalytics machine consists of a single server with 1TB of RAM and 4 Intel Xeon E7-4800 processors, with 10 cores each.

 

So the total cost of an Exalytics machine based on the processor licence will be something towards the $10M. Now this is before the discounts that you can negotiate. There are reports of discounts ranging up to 25% on hardware and 60% on software. The size of the discount is depended on your size etc. So this initial $10M cost could be reduced to $3M+.

 

Please note that I may have gotten some or all of this pricing wrong. If I have then forgive me and let me know what is wrong. I can correct it to ensure that we have the correct costs.

Friday, January 20, 2012

OUG Ireland 2012 Conference & Re-launching the SIGs in Ireland

The schedule of presentations for the 2012 Oracle User Group Ireland conference has just been released.

This year the conference will be in the Dublin Convention Centre.

There are 5 parallel streams with sessions running all day. Key notes presentations will be given by Eileen O’Mara from Oracle Ireland (given in the morning) and the key note at the end of the day will be given by Mogen Norgaard from Miracle AS in Denmark and founder of the OakTable Network.

There will be 7 Oracle ACE Directors and 3 Oracle ACEs presenting at the conference in addition to other speakers from Ireland, UK and Netherlands. Here is the full list of Oracle ACEs that are presenting

Debra Lilley, ACE Director
Mark Rittman, ACE Director
Roel Hartman, ACE Director
Simon Haslam, ACE Director
Frits Hoogland, ACE Director
Mogens Norgaard, ACE Director
Lonneke Dikmans, ACE Director
Brendan Tierney, ACE
Marcin Przepiorowski, ACE
Grant Ronald, ACE

It is also FREE to attend. So go ahead an book yourself a place for a full day Oracle training.

To register your place – Click here

The agenda for the conference can be found hereClick here

There are a few important web and social media links that might be of interest for all Oracle people in Ireland.

Web :       www.oug.og/ireland
Twitter :   @oug_ire
                @oug_ire2012 For the 2012 conference related tweets
LinkedIn : http://www.linkedin.com/groups?home=&gid=3441701

Re-launching the SIGS in Ireland

Over the past few years the User Group SIGs have been a bit quiet here in Ireland. Except for the BI (&EPM) SIG, which has been getting a good attendance since it launched back in 2009.

In particular the Oracle Technology SIG basically ceased to exist over the past 5 or 6 years, since the days when John Knox and Tagdh Cashman were running it.

At the conference it is hoped that to get some people to volunteer to get this SIG back up and running again. Maybe we could get 3 SIG events each year.  What do you think ?

Drop me an email if you have any questions regarding the SIGs in Ireland (I’m deputy chair of the BI & EPM SIG).Hopefully

I’ll see you there on the 21st March.

Monday, January 9, 2012

New additions to my Oracle Magazine Collection

I put out a call a few months ago looking for help in completing my Oracle Magazine collection. I also had a letter in the Nov/Dec 2011 edition of Oracle Magazine looking for help.

The following people contacted me in December offering to send me some of my missing editions.

Lisa Dobson - Oracle DBA at Durham University and Vice President of UKOUG
Debra Lilley - President of UKOUG
Christian Antognini - Trivadis AG, Switzerland
Dan Vlamis - Vlamis Software Solutions, MO, USA

Many thanks for your donations.

It seems that people don’t keep their Oracle Magazines!

I’m now just missing a very small number of editions since 1992.  Can anyone else help ?

Does anyone have any Oracle Magazines from before 1992 ?

The following table gives the current collection (printed editions). The grey boxes are the editions that I’m still missing, and keep an eye on my Oracle Magazine page for updates.

oracle mag3

Friday, January 6, 2012

ODM 11gR2–Real-time scoring of data

In my previous posts I gave sample code of how you can use your ODM model to score new data.

Applying an ODM Model to new data in Oracle – Part 2

Applying an ODM Model to new data in Oracle – Part 1

The examples given in this previous post were based on the new data being in a table.

In some scenarios you may not have the data you want to score in table. For example you want to score data as it is being recorded and before it gets committed to the database.

The format of the command to use is

prediction(ODM_MODEL_NAME USING <list of values to be used and what the mode attribute they map to>)

prediction_probability(ODM_Model_Name, Target Value, USING <list of values to be used and what model attribute they map to>)

So we can list the model attributes we want to use instead of using the USING *  as we did in the previous blog posts

Using the same sample data that I used in my previous posts the command would be:

Select prediction(clas_decision_tree
USING
20 as age,
'NeverM' as cust_marital_status,
'HS-grad' as education,
1 as household_size,
2 as yrs_residence,
1 as y_box_games) as scored_value
from dual;

SCORED_VALUE
------------
           0

Select prediction_probability(clas_decision_tree, 0
USING
20 as age,
'NeverM' as cust_marital_status,
'HS-grad' as education,
1 as household_size,
2 as yrs_residence,
1 as y_box_games) as probability_value
from dual;

PROBABILITY_VALUE
-----------------
                1

So we get the same result as we got in our previous examples.

Depending of what data we have gathered we may or may not have all the values for each of the attributes used in the model. In this case we can submit a subset of the values to the function and still get a result.

Select prediction(clas_decision_tree
USING
20 as age,
'NeverM' as cust_marital_status,
'HS-grad' as education) as scored_value2
from dual;

SCORED_VALUE2
-------------
            0

Select prediction_probability(clas_decision_tree, 0
USING
20 as age,
'NeverM' as cust_marital_status,
'HS-grad' as education) as probability_value2
from dual;

PROBABILITY_VALUE2
------------------
                 1

Again we get the same results.

Tuesday, January 3, 2012

ODM 11gR2–Using different data sources for Build and Testing a Model

There are 2 ways to connect a data source to the Model build node in Oracle Data Miner.

The typical method is to use a single data source that contains the data for the build and testing stages of the Model Build node. Using this method you can specify what percentage of the data, in the data source, to use for the Build step and the remaining records will be used for testing the model. The default is a 50:50 split but you can change this to what ever percentage that you think is appropriate (e.g. 60:40). The records will be split randomly into the Built and Test data sets.

image

The second way to specify the data sources is to use a separate data source for the Build and a separate data source for the Testing of the model.

To do this you add a new data source (containing the test data set) to the Model Build node. ODM will assign a label (Test) to the connector for the second data source.

image

If the label was assigned incorrectly you can swap what data sources. To do this right click on the Model Build node and select Swap Data Sources from the menu.

image

image

Saturday, December 31, 2011

My first set of Oracle Products

I started working with Oracle back in 1993 and my first project involved working with Oracle 5, Forms 2.3 and for reports RPT.

The Oracle Database and tools were very simple back then, but there was lots of “features” to work around.

Check out this video, for a short demo of Oracle 5 and Forms 2