Brendan Tierney - Oralytics Blog

Monday, July 8, 2013

12c Roundup so far and Events

I’m on vacation at the moment. As a result I’ve missed all the 12c launch and excitement that goes with it. I’ve managed to get a few minutes to put this post together. The aim of this post is to list some interesting blog posts (by other people over the past few days). I intend to expand the list when I get time.

I also wanted to highlight two 12c launch events. The first of these is the official Oracle 12c webcast. It is on Wednesday 10th July. Click on the following image to register etc. The webcast will have Mark Hurd, Andy Mendelsohn and Tom Kyte.

The second 12c launch event will be hosted by Oracle in Ireland. This will be on the 5th September in the Gibson Hotel (Dublin) between 13:00 and 17:30. I believe their might be some 12c goodies available for the attendees. Again click on the image below to register and to check out the agenda.

The following are some articles and blog posts that have been published since 12c has been launched. This is not a complete list or and indication of quality, but I’ve noted them for me to come back to after my vacation to read. You might have come across others. If so let me know and I will add them to the list.

12.1c Download page

12.1c Documentation page

12.1c New Features Guide

Oracle Advanced Analytics Option 12c and SQL Dev 4 new features

Oracle Database 12c: Oracle Multitenant Option

Oracle website for Multitenent

New DB12c feature involves invisibility

12c - SQL Text Expansion

Ever expanding SQL for 12c

Oracle 12c Magazine by @leight0nn in Flipboard

How long can you hold off on Oracle 12c

Oracle 12c Install articles by Tim Hall (oraclebase) on Linux5 and Linux6

Over the coming weeks (after my vacation) I will be posting some articles on the Advanced Analytics Option in 12c. There are a number of new features. Also when SQL Developer 4 comes out I will be including all the new functionality that is included in the updated ODM tool.

Thursday, June 27, 2013

Oracle Magazine-Jan/Feb 2000

The headline articles of Oracle Magazine for January/February 2000 were focused on looking forward to what is to come, now that the year 2000 bomb. These articles include large scale, 24x7 data warehouses and marts, more development using Java, more and better B2B with XML.

This issue of Oracle Magazine introduced a new layout and design.

Monday, June 17, 2013

Oracle Magazine-Nov/Dec 1999

The headline articles of Oracle Magazine for November/December 1999 were E-Business and how you can use the Oracle product set to put your business online. These articles included features on companies such as AMR, Fogdog, Cognitiative, Drug Emporium, Click-fil-A, Living, CD Now, Trilux and Lycos Networks.

Wednesday, June 12, 2013

Part 3–Getting start with Statistics for Oracle Data Science projects

This is the Part 3 blog post on getting started with Statistics for Oracle Data Science projects.

The first blog post in the series looked at the DBMS_STAT_FUNCS PL/SQL package, what it can be used for and I give some sample code on how to use it in your data science projects. I also give some sample code that I typically run to gather some additional stats.
The second blog post will look at some of the other statistical functions that exist in SQL that you will/may use regularly in your data science projects.This is the second blog on getting started with Statistics for Oracle Data Science projects.
The third blog post will provide a summary of the other statistical functions that exist in the database.

The table below is a collection of most of the statistical functions in Oracle 11.2. The links in the table bring you to the relevant section of the Oracle documentation where you will find a description of each function, the syntax and some examples of each.

ABS	LENGTH2	REGR_AVGX
ACOS	LENGTH4	REGR_ACGY
Aggregrate functions	LENGTHB	REGR_COUNT
Analytic functions	LENGTHC	REGR_INTERCEPT
Arithmetic operators	LN	REGR_R2
ASIN	LNNVL	REGR_SLOPE
ATAN	LOG	REGR_SXX
ATAN2	LOWER	REGR_SXY
AVG	LPAD	REGR_SYY
CAST	LTRIM	ROLLUP clause
Comparison functions	MAX	ROUND
CONCAT	MEDIAN	SAMPLE
CORR	MIN	SIN
CORR_K	MOD	SINH
CORR_S	MODEL clause	SQRT
COS	NTH_VALUE	STATS_BINOMIAL_TEST
COSH	Numeric Functions	STATS_CROSSTAB
COUNT	PERCENT_RANK	STATS_F_TEST
COVAR_POP	PERCENTILE_CONT	STATS_KS_TEST
COVAR_SAMP	PERCENTILE_DISC	STATS_MODE
CUBE clause	Pivot operations	STATS_MW_TEST
CUME_DIST	POWER	STATS_ONE_WAY_ANOVA
CV	PREDICTION	STATS_T_TEST_INDEP
Data functions	PREDICTION_BOUNDS	STATS_T_TEST_INDEPU
DENSE_RANK	PREDICTION_COST	STATS_T_TEST_ONE
EXP	PREDICTION_PROBABILITY	STATS_T_TEST_PAIRED
FLOOR	PREDICTION_SET	STATS_WSR_TEST
GREATEST	PRESENTNNV	STDDEV
Grouping Sets	PRESENTNTV	STDEEV_POP
INTERSECT	Prior clause	STDDEV_SAMP
Interval arithmetic	PRIOR	SUM
INTERVAL	RANK	TAN
Julian dates	RAWTOHEX	TANH
LAG	REGEXP_COUNT	t-test
LAST	REGEXP_INSTR	VAR_POP
LEAD	REGEXP_LIKE	VAR_SAMP
LEAST	REGEXP_REPLACE	VARIANCE
LENGTH	REGEXP_SUBSTR	WIDTH_BUCKET

The list about may not be complete (I’m sure it is not), but it will cover most of what you will need to use in your Oracle projects.

If you come across or know of other useful statistical functions in Oracle let me know the details and I will update the table above to include them.

Friday, June 7, 2013

DBMS_PREDICTIVE_ANALYTICS & Predict

In this blog post I will look at the PREDICT procedure that is part of the DBMS_PREDICTIVE_ANALTYICS package. This package allows you to perform data mining in an automated way without having to go through the steps of building, testing and scoring data.

I had a previous blog post that showed how to use the EXPLAIN function to create an Attribute Importance model.

The predictive analytics procedures analyze and prepare the input data, create and test mining models using the input data, and then use the input data for scoring. The results of scoring are returned to the user. The models and supporting objects are not persisted and are removed from the database when the procedure is finished.

The PREDICT procedure should only be used for a Classification problem and data set.

The PREDICT procedure create a model based on the supplied data (out input table) and a target value, and returns scored data set in a new table. When using PREDICT you do not get to select an algorithm to use.

The input data source should contain records that already have the target value populated. It can also contain records where you do not have the target value. In this case the PREDICT function will use the records that have a target value to generate the model. This model will then score all records a the predicted target value

The syntax of the PREDICT procedure is:

DBMS_PREDICTIVE_ANALYTICS.PREDICT (
   accuracy OUT NUMBER,
   data_table_name IN VARCHAR2,
   case_id_column_name IN VARCHAR2,
   target_column_name IN VARCHAR2,
   result_table_name IN VARCHAR2,
   data_schema_name IN VARCHAR2 DEFAULT NULL);

Where

Parameter Name	Description
accuracy	This output parameter from the procedure. You do not pass anything into this parameter. The Accuracy value returned is the predictive confidence of the model generated/used by the PREDICT procedure
data_table_name	The name of the table that contains the data you want to use
case_id_column_name	The case id for each record. This is unique for each record/case.
target_column_name	The name of the column that contains the target column to be predicted
result_table_name	The name of the table that will contain the results. This table should not exist in your schema, otherwise an error will occur.
data_schema_name	The name of the schema where the table containing the input data is located. This is probably in your current schema, so you can leave this parameter NULL.

The PREDICT procedure will produce an output tables (result_table_name parameter) and will contain 3 attributes.

CASE_ID	This is the Case Id of the record from the original data_table_name. This will allow you to link up the data in the source table to the prediction in the result_table_name
PREDICTION	This will be the predicted value of the target attribute
PROBABILITY	This is the probability of the prediction being correct

Using the sample example data set that I have given in previous blog posts and in the blog post on the EXPLAIN procedure, the following code illustrates how to use the PREDICT procedure.

set serveroutput on

DECLARE
   v_accuracy NUMBER(10,9);
BEGIN
   DBMS_PREDICTIVE_ANALYTICS.PREDICT(
      accuracy => v_accuracy,
      data_table_name => 'mining_data_build_v',
      case_id_column_name => 'cust_id',
      target_column_name => 'affinity_card',
      result_table_name => 'PA_PREDICT');
   DBMS_OUTPUT.PUT_LINE('Accuracy of model = ' || v_accuracy);
END;

This took about 15 seconds to run on my laptop, which is surprisingly quick given all the work that is doing internally. To see the predictions and the results from the PREDICT procedure, you will need to query the PA_PREDICT table.

The final step that you might be interested in is to compare the original target value with the prediction value.

SELECT v.cust_id,
       v.affinity_card,
       p.prediction,
       p.probability
FROM   mining_data_build_v v,
       pa_predict p
WHERE v.cust_id = p.cust_id
AND    rownum <= 12;

Remember we do not get to see how or what Oracle did to generate these results. We do not get the opportunity to tune the process and the model.

So you have to be careful when you use the PREDICT function and on what data. Would you use this as a way to explore your data and to see if predictive analytics/data mining might be useful for your? Yes it would. Would you use it in a production scenario? the answer is maybe but it depends on the scenario. In reality if you want to do this in a production environment you will put some work into developing data mining models that best fit your data. To do this you will need to move onto the ODM tool and the DBMS_DATA_MINING package. But the PREDICT function is a quick way to get some small data scored (in some way) based on your existing data. If your marketing department says they want to start a tele marketing campaign in a couple of hours then PREDICT is what you need to use. It may not give you the most accurate of results, but it does give you results that you can start using quickly.

Friday, May 31, 2013

Introducing Java EE7–Live Webcast 12th June, 2013

There will be live Wecast on June 12 (2013) on Introducing Java EE7. There will be some keynotes, some break out sessions that you can attend and you will have the opportunity to chat with some Java experts. The highlights of this event include:

Business Keynote (Hasan Rizvi and Cameron Purdy)
Technical Keynote (Linda DeMichiel)
Breakout Sessions on different JSRs by specification leads
Live Chat
Lots of Demos
Community, Partner, and Customer video testimonials

This is a free event, so sign up now to book your place.

I joined a conference call on Thursday that was organised for members of the Oracle ACE program. This a full 1 hour conference call, presented by Arun Gupta. He spent the one hour call going through some of the new features coming in Java EE7

Wednesday, May 22, 2013

OUG Ireland BI & Tech SIGs June 2013

On 11th and 12th June we will be having our next SIG meetings for BI and Tech. The BI SIG will be on the 11th June in the Oracle offices in East Point. We then move the the Conrad Hotel on the 12th June for the Tech SIG. Here are the agendas for the 2 days.

BI SIG

Tech SIG

These events are open to everyone, are free for members and a small fee for non-members.

To register for these event go to the following links

Monday, May 20, 2013

DBMS_PREDICTIVE_ANALYTICS & Explain

There are 2 PL/SQL packages for performing data mining/predictive analytics in Oracle. The main PL/SQL package is DBMS_DATA_MINING. This package allows you to build data mining models and to apply them to new data. But there is another PL/SQL package.

The DBMS_PREDICTIVE_ANALYTICS package is very different to the DBMS_DATA_MINING package. The DBMS_PREDICTIVE_ANALYTICS package includes routines for predictive analytics, an automated form of data mining. With predictive analytics, you do not need to be aware of model building or scoring. All mining activities are handled internally by the predictive analytics procedure.

Predictive analytics routines prepare the data, build a model, score the model, and return the results of model scoring. Before exiting, they delete the model and supporting objects.

The package comes with the following functions: EXPLAIN, PREDICT and PROFILE. To get some of details about these functions we can run the following in SQL.

This blog post will look at the EXPLAIN function.

EXPLAIN creates an attribute importance model. Attribute importance uses the Minimum Description Length algorithm to determine the relative importance of attributes in predicting a target value. EXPLAIN returns a list of attributes ranked in relative order of their impact on the prediction. This information is derived from the model details for the attribute importance model.

Attribute importance models are not scored against new data. They simply return information (model details) about the data you provide.

I’ve written two previous blog posts on Attribute Importance. One of these was on how to calculate Attribute Importance using the Oracle Data Miner tool. In the ODM tool it is now called Feature Selection and is part of the Filter Columns node and the Attribute Importance model is not persisted in the database. The second blog post was how you can create the Attribute Importance using the DBMS_DATA_MINING package.

EXPLAIN ranks attributes in order of influence in explaining a target column.

The syntax of the function is

DBMS_PREDICTIVE_ANALYTICS.EXPLAIN (
     data_table_name     IN VARCHAR2,
     explain_column_name IN VARCHAR2,
     result_table_name   IN VARCHAR2,
     data_schema_name    IN VARCHAR2 DEFAULT NULL);

where

data_table_name = Name of input table or view

explain_column_name = Name of column to be explained

result_table_name = Name of table where results are saved. It creates a new table in your schema.

data_schema_name = Name of schema where the input table or view resides. Default: the current schema.

So when calling the function you do not have to include the last parameter.

Using the same example what I have given in the previous blog posts (see about for the links to these) the following command can be run to generate the Attribute Importance.

BEGIN

    DBMS_PREDICTIVE_ANALYTICS.EXPLAIN(

        data_table_name      => 'mining_data_build_v',

        explain_column_name => 'affinity_card',

        result_table_name    => 'PA_EXPLAIN');

END;

One thing that stands out is that it is a bit slower to run than the DBMS_DATA_MINING method. On my laptop it took approx. twice to three time longer to run. But in total it was less than a minute.

To display the results,

The results are ranked in a 0 to 1 range. Any attribute that had a negative value are set to zero.

Thursday, May 16, 2013

Outputting your data using inbuilt SQL Dev formatting

Oracle has build a number of formatting options into SQL Developer to allow you to output your data in some standard formats. This removes the need to use other tools or to write extra code or performs various follow up steps.
All you need to do is to add a comment and use the Scrip button
SELECT /*csv*/ * FROM scott.emp;
SELECT /*xml*/ * FROM scott.emp;
SELECT /*html*/ * FROM scott.emp;
SELECT /*delimited*/ * FROM scott.emp;
SELECT /*insert*/ * FROM SCOTT.EMP;
SELECT /*loader*/ * FROM scott.emp;
SELECT /*fixed*/ * FROM scott.emp;
SELECT /*text*/ * FROM scott.emp;
Hint: for some of these it is best to list the schema and table name in upper case
These are comments and not hints so they will not work in SQL*Plus.

Wednesday, May 15, 2013

Review of Oracle Magazine-July/August 1999

The headline articles for the July/August 1999 edition of Oracle Magazine were focused on Business Intelligence and included topics on architectures, business plans, data integration, portals, dashboards, Oracle Express, data marts and data warehouses.

Monday, May 13, 2013

Recent Big Data and Analytics related articles

Over the past couple of weeks I’ve come across the following articles, blog posts and discussions about Big Data and Analytics. There seems to be an underlying theme of ‘let’s get back to the core of the problem’ and big data is not that useful and only in certain cases.

As the Analytics 3.0 article indicates we should be concentrating on how we can use analytics to achieve a real goal for the organisation.

Analytics 3.0

Most data isn’t “big,” and businesses are wasting money pretending it is

- There is a LinkedIn discussion about this article

7 Myths about Big Data

Most data sets are 40-60GB range – I can do that on my laptop, so that cannot be Big Data

Big Data Hype (and Reality)

It is also interesting to note that most of the people who have been working in the area for years (10+) are not believes in Big Data or they don’t even consider calling themselves Data Scientists.

The 10 Most Influential People in Data Analytics, Data Mining, Predictive Analytics

The purpose of this post to to record these links in one place and to share with everyone else who might be interested.

Friday, May 10, 2013

Getting Real Business Value from Oracle Data Mining and OBIEE

Over the past 16 months (or so) I have give a join presentation with Anthony Heljula called ‘Getting Real Business Value from Oracle Data Mining and OBIEE’, at a number of conferences and OUG SIGs.

We have had a lot of very positive feedback on this presentation. The presentation is a busy 45 minutes (questions only at the end) that walks through a pilot data science project we did for a University in the UK.

We used Oracle Data Miner to build a predictive model that looks at student churn. We then integrated this Student Churn model into OBIEE Dashboards to illustrate how combining an Oracle Data Miner model into our data analysis we can gain a greater insight of our data.

We have submitted this presentation for Oracle Open World 2013 but we have renamed the title of the presentation to

“How UK Universities are using Oracle Data Science to protect their income”

If you are involved in presentation selection or know someone who is then maybe you might select this to be presented at OOW13 in September.

We submitted the presentation for OOW12 with not luck. So fingers crossed this time.

Pages