Brendan Tierney - Oralytics Blog

Tuesday, July 10, 2012

Review of Oracle Magazine–July/August 1996

The headline articles for the July/August 1996 edition of Oracle Magazine were on how to balance security and communication in a distributed world, extending Oracle Power Objects applications, and automating Oracle tuning.

Oracle articles included:

Oracle released three of its products on the web. These included Oracle Web Customers, Oracle Web Suppliers and Oracle Web Employees. They aimed to help make it possible for companies to conduct secure business transactions over the internet and corporate intranets. They also shipped Oracle Workflow to help support the implementation of these new products.
Oracle Express Analyzer, an object-oriented reporting and analysis tool, had its second release.
UBS Bank implements an Oracle-based operational accounting system, with over 800,000 input records daily and over 3,000 cost centre reports that needed different levels of summarisation. The new application allows the executives to view information in virtually any format, choosing from 120,000 multi-level, multi-view reports.
The Egyptian Stock Exchange and Capital Market Authority implements a new trading system built on Oracle.
Don Burleson in his article on Automating Oracle Tuning gives a number of scripts that would assist the DBA in finding out what is going on in the database. So instead of purchasing some expensive tools, all you needed were these scripts, UTLBSTAT/UTLESTAT.

To view the cover page and the table of contents click on the image at the top of this post or click here.

My Oracle Magazine Collection can be found here. You will find links to my blog posts on previous editions and a PDF for the very first Oracle Magazine from June 1987.

Tuesday, June 26, 2012

Analytics Sessions at Oracle Open World 2012

The content catalog for Oracle Open World 2012 was made public during the week. OOW is on between 30th September and 4th October.

The following table gives a list of most of the Data Analytics type sessions that are currently scheduled.

Why did I pick these sessions? If I were able to go to OOW, these are the sessions I would like to attend. Yes, there would be many more sessions I would like to attend in the core DB technology and Development streams.

Session Title	Presenters
CON6640 - Database Data Mining: Practical Enterprise R and Oracle Advanced Analytics	Husnu Sensoy
CON8688 - Customer Perspectives: Oracle Data Integrator	Gurcan Orhan - Software Architect & Senior Developer, Turkcell Technology R&D; Julien Testut - Product Manager, Oracle
HOL10089 - Oracle Big Data Analytics and R	George Lumpkin - Vice President, Product Management, Oracle
CON8655 - Tackling Big Data Analytics with Oracle Data Integrator	Mala Narasimharajan - Senior Product Marketing Manager, Oracle; Michael Eisterer - Principal Product Manager, Oracle
CON8436 - Data Warehousing and Big Data with the Latest Generation of Database Technology	George Lumpkin - Vice President, Product Management, Oracle
CON8424 - Oracle’s Big Data Platform: Settling the Debate	Martin Gubar - Director, Oracle; Kuassi Mensah - Director Product Management, Oracle
CON8423 - Finding Gold in Your Data Warehouse: Oracle Advanced Analytics	Charles Berger - Senior Director, Product Management, Data Mining and Advanced Analytics, Oracle
CON8764 - Analytics for Oracle Fusion Applications: Overview and Strategy	Florian Schouten - Senior Director, Product Management/Strategy, Oracle
CON8330 - Implementing Big Data Solutions: From Theory to Practice	Josef Pugh - Oracle
CON8524 - Oracle TimesTen In-Memory Database for Oracle Exalytics: Overview	Tirthankar Lahiri - Senior Director, Oracle
CON9510 - Oracle BI Analytics and Reporting: Where to Start?	Mauricio Alvarado - Principal Product Manager, Oracle
CON8438 - Scalable Statistics and Advanced Analytics: Using R in the Enterprise	Marcos Arancibia Coddou - Product Manager, Oracle Advanced Analytics, Oracle
CON4951 - Southwestern Energy’s Creation of the Analytical Enterprise	Jim Vick - Southwestern Energy; Richard Solari - Specialist Leader, Deloitte Consulting LLP
CON8311 - Mining Big Data with Semantic Web Technology: Discovering What You Didn’t Know	Zhe Wu - Consultant Member of Tech Staff, Oracle; Xavier Lopez - Director, Product Management, Oracle
CON8428 - Analyze This! Analytical Power in SQL, More Than You Ever Dreamt Of	Hermann Baer - Director Product Management, Oracle; Andrew Witkowski - Architect, Oracle
CON6143 - Big Data in Financial Services: Technologies, Use Cases, and Implications	Omer Trajman - Cloudera; Ambreesh Khanna - Industry Vice President, Oracle; Sunil Mathew - Senior Director, Financial Services Industry Technology, Oracle
CON8425 - Big Data: The Big Story	Jean-Pierre Dijcks - Sr. Principal Product Manager, Oracle
CON10327 - Recommendations in R: Scaling from Small to Big Data	Mark Hornick - Senior Manager, Oracle

Monday, June 25, 2012

R resources

Download R : http://www.r-project.org/

R installation instructions : http://star-www.st-andrews.ac.uk/cran/

Oracle R Enterprise

R-Uni (A List of 85+ Free R Tutorials and Resources in Universities webpages)

R programming for those coming from other languages

Understanding big data analysis with the R language

Introduction to R for Data Mining from Revolution Analytics

Wednesday, June 20, 2012

Part 2 of the Leaning Tower of Pisa problem in ODM

In my previous post I gave the details of how you can use Regression in Oracle Data Miner to predict/forecast the lean of the tower in future years. This was based on building a regression model in ODM using the known lean/tilt of the tower for a range of years.

In this post I will show you how you can do the same tasks using the Oracle Data Miner functions in SQL and PL/SQL.

Step 1 – Create the table and data

The easiest way to do this is to make a copy of the PISA table we created in the previous blog post. If you haven’t completed this, then go to the blog post and complete step 1 and step 2.

create table PISA_2
as select * from PISA;
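
The later steps refer to two views, PISA_2_BUILD_V and PISA_2_APPLY_V, that are not defined in this post. A minimal sketch of what they might look like, assuming they simply split PISA_2 into the rows where TILT is known and the rows where it is not (the definitions are an assumption):

```sql
-- Build (training) rows: years where the tilt is known.
-- Assumed definition; the view name is the one used in Step 3.
CREATE OR REPLACE VIEW pisa_2_build_v AS
  SELECT year_measured, tilt
  FROM   pisa_2
  WHERE  tilt IS NOT NULL;

-- Apply rows: years where the tilt is to be predicted.
-- Assumed definition; the view name is the one used in Step 5.
CREATE OR REPLACE VIEW pisa_2_apply_v AS
  SELECT year_measured
  FROM   pisa_2
  WHERE  tilt IS NULL;
```

Keeping the split in views means the PISA_2 table itself stays untouched and the same model code works if more years are added later.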

Step 2 – Create the ODM Settings table

We need to create a ‘settings’ table before we can use the ODM APIs in PL/SQL. The purpose of this table is to store all the configuration parameters needed for the algorithm to work. In our case we only need to set two parameters: the algorithm to use and whether automatic data preparation is switched on or off.
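
The PL/SQL block in this step deletes from and inserts into PISA_2_SETTINGS, so the table has to exist first. A minimal sketch, assuming the usual two-column layout for a DBMS_DATA_MINING settings table (the column sizes are an assumption):

```sql
-- Settings table read by DBMS_DATA_MINING.CREATE_MODEL.
-- One row per setting: the name of the setting and its value.
CREATE TABLE pisa_2_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000)
);
```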

BEGIN
  -- Clear out any settings left over from an earlier run
  DELETE FROM pisa_2_settings;
  -- Use the Generalized Linear Model algorithm
  INSERT INTO pisa_2_settings (setting_name, setting_value) VALUES
    (dbms_data_mining.algo_name, dbms_data_mining.algo_generalized_linear_model);
  -- Switch off automatic data preparation
  INSERT INTO pisa_2_settings (setting_name, setting_value) VALUES
    (dbms_data_mining.prep_auto, dbms_data_mining.prep_auto_off);
  COMMIT;
END;

Step 3 – Build the Regression Model

To build the regression model we need to use the CREATE_MODEL procedure that is part of the DBMS_DATA_MINING package. When calling this procedure we need to pass in the name of the model, the mining function to use (the algorithm itself comes from the settings table), the source data, the settings table and the target column we are interested in.

BEGIN
      DBMS_DATA_MINING.CREATE_MODEL(
        model_name          => 'PISA_REG_2',
        mining_function     => dbms_data_mining.regression,
        data_table_name     => 'pisa_2_build_v',
        case_id_column_name => null,
        target_column_name => 'tilt',
        settings_table_name => 'pisa_2_settings');
END;

After this we should have our regression model.

Step 4 – Query the Regression Model details

To find out what was produced in the previous step we can query the data dictionary.

SELECT model_name,
       mining_function,
       algorithm,
       build_duration,
       model_size
from USER_MINING_MODELS
where model_name like 'P%';

select setting_name,
setting_value,
setting_type
from all_mining_model_settings
where model_name like 'P%';
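
The data dictionary shows that the model exists, but not what it learned. As a hedged sketch, assuming an 11g database where the DBMS_DATA_MINING.GET_MODEL_DETAILS_GLM table function is available, the fitted coefficients can be listed like this (the choice of columns is just for illustration):

```sql
-- List the GLM coefficients for the model built in Step 3.
-- The intercept is typically the row with a NULL attribute_name.
SELECT attribute_name,
       coefficient,
       std_error,
       p_value
FROM   TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_GLM('PISA_REG_2'))
ORDER  BY attribute_name;
```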

Step 5 – Apply the Regression Model to new data

Our final step is to apply the model to our new data, i.e. the years for which we want to know what the lean/tilt would be.

SELECT year_measured, prediction(pisa_reg_2 using *)
FROM pisa_2_apply_v;
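
One way to sanity check these predictions is to fit an ordinary least-squares line with Oracle's built-in REGR_SLOPE and REGR_INTERCEPT aggregates and compare the values. This is only a sketch: it assumes the GLM model with automatic data preparation switched off behaves like a plain linear regression on YEAR_MEASURED, so the two sets of numbers should be close rather than identical. The column alias LS_TILT is made up for this example.

```sql
-- Fit tilt = intercept + slope * year_measured on the known rows,
-- then evaluate that line for the years with no tilt recorded.
WITH fit AS (
  SELECT REGR_SLOPE(tilt, year_measured)     AS slope,
         REGR_INTERCEPT(tilt, year_measured) AS intercept
  FROM   pisa_2
  WHERE  tilt IS NOT NULL
)
SELECT p.year_measured,
       ROUND(f.intercept + f.slope * p.year_measured, 4) AS ls_tilt
FROM   pisa_2 p
       CROSS JOIN fit f
WHERE  p.tilt IS NULL
ORDER  BY p.year_measured;
```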

Tuesday, June 19, 2012

Using ODM Regression for the Leaning Tower of Pisa tilt problem

This blog post will look at how you can use the Regression feature in Oracle Data Miner (ODM) to predict the lean/tilt of the Leaning Tower of Pisa in the future.

This is a well-known regression exercise, and it typically comes with a set of known values and the year for these values. There are lots of websites that contain the details of the problem. A summary of it is:

The following table gives measurements for the years 1975-1985 of the "lean" of the Leaning Tower of Pisa. The variable "lean" represents the difference between where a point on the tower would be if the tower were straight and where it actually is. The data is coded as tenths of a millimetre in excess of 2.9 metres, so that the 1975 lean, which was 2.9642 metres, is coded as 642.

Given the lean for the years 1975 to 1985, can you calculate the lean for a future date like 2000, 2009 or 2012?

Step 1 – Create the table

Connect to a schema that you have set up for use with Oracle Data Miner. Create a table (PISA) with 2 attributes, YEAR_MEASURED and TILT. Both of these attributes need to have the datatype of NUMBER, because ODM will ignore any attribute that is a VARCHAR, or you might get an error.

CREATE TABLE PISA
(
YEAR_MEASURED NUMBER(4,0),
TILT NUMBER(9,4)
);

Step 2 – Insert the data

There are 2 sets of data that need to be inserted into this table. The first is the data from 1975 to 1987 with the known values of the lean/tilt of the tower. The second set of data is the future years where we do not know the lean/tilt and we want ODM to calculate the value based on the Regression model we want to create.

Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1975,2.9642);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1976,2.9644);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1977,2.9656);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1978,2.9667);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1979,2.9673);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1980,2.9688);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1981,2.9696);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1982,2.9698);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1983,2.9713);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1984,2.9717);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1985,2.9725);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1986,2.9742);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1987,2.9757);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1988,null);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1989,null);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1990,null);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (1995,null);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (2000,null);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (2005,null);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (2010,null);
Insert into DMUSER.PISA (YEAR_MEASURED,TILT) values (2009,null);

Step 3 – Start ODM and Prepare the data

Open SQL Developer and open the ODM Connections tab. Connect to the schema that you have created the PISA table in. Create a new Project or use an existing one and create a new Workflow for your PISA ODM work.

Create a Data Source node in the workspace and assign the PISA table to it. You can select all the attributes.

The table contains the data that we need to build our regression model (our training data set) and the data that we will use for predicting the future lean/tilt (our apply data set).

We need to apply a filter to the PISA data source to only look at the training data set. Select the Filter Rows node and drag it to the workspace. Connect the PISA data source to the Filter Rows node. Double click on the Filter Rows node and select the Expression Builder icon. Create the where clause to select only the rows where we know the lean/tilt.
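
The filter itself is a single condition on TILT. As a sketch, assuming the training rows are exactly those with a recorded tilt, the expression to type into the Expression Builder would be TILT IS NOT NULL, which is equivalent to running this query against the table:

```sql
-- Rows kept by the training filter: years with a known tilt.
SELECT year_measured, tilt
FROM   pisa
WHERE  tilt IS NOT NULL
ORDER  BY year_measured;
```

Running the query in a SQL Developer worksheet first is a quick way to confirm that the filter keeps only the years with measurements.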

Step 4 – Create the Regression model

Select the Regression Node from the Models component palette and drop it onto your workspace. Connect the Filter Rows node to the Regression Build Node.

Double click on the Regression Build node and set the Target to the TILT variable. You can leave the Case ID at <None>. You can also select if you want to build a GLM or SVM regression model or both of them. Set the AUTO check box to unchecked. By doing this Oracle will not try to do any data processing or attribute elimination.

You are now ready to create your regression models.

To do this right click the Regression Build node and select Run. When everything is finished you will get a little green tick on the top right hand corner of each node.

Step 5 – Predict the Lean/Tilt for future years

The PISA table that we used above also contains our apply data set.

We need to create a new Filter Rows node on our workspace. This will be used to only look at the rows in PISA where TILT is null. Connect the PISA data source node to the new filter node and edit the expression builder.

Next we need to create the Apply Node. This allows us to run the Regression model(s) against our Apply data set. Connect the second Filter Rows node to the Apply Node and the Regression Build node to the Apply Node.

Double click on the Apply Node. Under the Apply Columns we can see that we will have 4 attributes created in the output. 3 of these attributes will be for the GLM model and 1 will be for the SVM model.

Click on the Data Columns tab and edit the data columns so that we get the YEAR_MEASURED attribute to appear in the final output.

Now run the Apply node by right clicking on it and selecting Run.

Step 6 – Viewing the results

When we get the little green tick on the Apply node we know that everything has run and completed successfully.

To view the predictions right click on the Apply Node and select View Data from the menu.

We can see that the GLM model gives the results we would expect but the SVM model does not.

Monday, June 18, 2012

Oracle Magazine–Volume 1 Number 1

A few weeks ago I sent a few emails to some well-known names in the Oracle World looking to see if they had a copy of the very first Oracle Magazine (Volume 1 Number 1).

Many thanks to Oracle ACE Director Cary Millsap of Method R, who responded to say that he had the very first Oracle Magazine. He kindly arranged to have it scanned into PDF.

To view the 12 page Oracle Magazine (Volume 1 Number 1) click on the following image. Read and Enjoy!

Some people have said that this edition, published in June 1987, is not the first Oracle Magazine. Although this edition is labelled as Volume 1 Number 1, an Oracle Newsletter existed for a few years prior to it.

Do you know of anyone who has these newsletters?

My Oracle Magazine Collection can be found here. You will find links to my blog posts on previous editions.

Wednesday, June 13, 2012

Data Science Is Multidisciplinary

[Update: October 2016. There appears to be some discussion about the Venn diagram I've proposed below. The central part of this diagram is not anything I came up with. It was a commonly used Venn diagram for Data Mining. Thanks to Polly Michell-Guthrie for providing the original reference for the Venn. I just added the outer ring of additional skills needed for the new area of Data Science. This was just my view of things back in 2012. Things have moved on a bit since then.]

A few weeks ago I had a blog post called Domain Knowledge + Data Skills = Data Miner.
In that blog post I was saying that to be a Data Scientist all you needed was Domain Knowledge and some Data Skills, which included Data Mining.
The reality is that the skill set of a Data Scientist will be much larger. There is a saying ‘A jack of all trades and a master of none’. When it comes to being a data scientist you need to be a bit like this but perhaps a better saying would be ‘A jack of all trades and a master of some’.
I’ve put together the following diagram, which includes most of the skills, with an outer circle of more fundamental skills. It is this outer ring of skills that is fundamental in becoming a data scientist. Most people will have some experience in one or more of the skills in the inner part of the diagram. The other skills can be developed and learned over time, all depending on the type of person you are.

Can we train someone to become a data scientist or are they born to be a data scientist? It is a little bit of both really, but you need to have some of the fundamental skills and the right type of personality. The learning of the other skills should be easy(ish).
What do you think? Are there skills that I’m missing?

Thursday, June 7, 2012

Review of Oracle Magazine–May/June 1996

The headline articles for the May/June 1996 edition of Oracle Magazine were an introduction to the Oracle Universal Server and how it can be used to give a flexible architecture for your growing organisation.

Tuesday, June 5, 2012

OUG Ireland SIG Meetings 26th June

The next Oracle User Group in Ireland SIG meetings will be on Tuesday 26th June.

This will be a full day event and will comprise 2 SIGs, the BI & EPM SIG and the Applications SIG.

The BI & EPM SIG presentations will be in the morning and the Applications SIG presentations will be in the afternoon.

A lot of work has been put into planning this full day event to come up with an agenda that people from both communities may be interested in.

Check out the full agenda page – click here.

To register for the event – click here.

Monday, June 4, 2012

OTN Developer Days–Dublin 12th to 14th June

The OTN Developer Days events return to the Oracle Dublin office in East Point this month from the 12th to the 14th.

These are free events, but places are limited. They allow you to get some hands-on training with these tools. Depending on the day and the topic, the format ranges from a mixture of lecture and workshop to a purely hands-on workshop.

12th June – GoldenGate 11g, Oracle Data Integrator 11g and Enterprise Data Quality (full day : 9:45-17:00)

13th June – Partitioning and Advanced Compression (9:45-13:00)

14th June – Unlocking the value of Oracle Database 11g Core Features (9:45-15:00)

These are free events and you will even get a free lunch from Oracle.

Monday, May 28, 2012

VM for Oracle Data Miner

Recently the OTN team have updated the ‘Database App Development’ Developer Day virtual machine to include Oracle 11.2.0.2 DB and SQL Developer 3.1. This is all you need to try out Oracle Data Miner.

So how do you get started with using Oracle Data Miner on your PC? The first step is to download and install the latest version of Oracle VirtualBox.

The next step is to download and install the OTN Developer Day appliance. Click on the above link to go to the webpage and follow the instructions to download and install the appliance. Download the first appliance on this page, the ‘Database App Development’ VM. This is a large download and, depending on your internet connection, it can take anything from 30 minutes to hours. So I wouldn’t recommend doing this over wifi.

When you start up the VM your OS username and password are both oracle. Yes, it is case sensitive.

When you get logged into the VM you can close or minimise the host window.

There are two important icons, the SQL Developer and the ODDHandsOnLab.html icons.

The ODDHandsOnLab.html icon loads a webpage that contains a number of tutorials for you to follow.

The tutorial we are interested in is the Oracle Data Miner Tutorial. There are 4 tutorials given for ODM. The first two tutorials need to be followed in the order that they are given. The second two tutorials can be done in any order.

If you have not used SQL Developer before then you should work through this tutorial before starting the Oracle Data Miner tutorials.

The first tutorial takes you through the steps needed to create your ODM schema and to create the ODM repository within the database. This tutorial will only take you 10 to 15 minutes to complete.

In the second tutorial you get to use ODM to build your first ODM model. This tutorial steps you through how to get started with an ODM project and workflow, the different ODM features, how to explore the data, how to create classification models, how to explore the model and then how to apply one of these models to new data. This second tutorial will take approx. 30 to 40 minutes to complete.

It is all very simple and easy to use.

Thursday, May 24, 2012

UKOUG Conference-Submissions Deadline is 1st June

The call for presentations for Europe’s largest Oracle conference is currently open, but the deadline for submissions is approaching fast. The submission deadline is 1st June.

If you are interested in presenting there are a couple of things you need to do. The first step is that you need to register as a speaker. This just involves you registering your interest in being a speaker. The second step is to submit your presentation abstracts.

The conference will be in Birmingham between 3rd and 5th December. There are multiple streams including BI, Technology, Fusion, Middleware, Development, MySQL, Infrastructure, eBusiness Suite, Core Database, etc.

I gave my very first presentation at the annual UKOUG Conference a few years ago and I’ve presented a few times since. I would encourage everyone to give it a go. Pick a topic or topics that you have been working on over the past 12 months or more, or if you used a particular technique on a recent project, or you have discovered a particular work around, etc. submit a presentation on it.

I’ve submitted a few presentations, all of which are about data mining and the advanced analytics option in the Oracle Database. Two of these presentations will be co-presented with Antony Heljula, Peak Indicators. The presentations will be on including and using Oracle Data Mining models in OBIEE and on how we went about developing the Oracle Data Mining models for our project. Will they get accepted? I hope so, but the presentation selection is based on user voting. Everyone can get involved in the judging and voting of presentations.

Notifications of acceptances :-) or rejections :-( typically come out around the end of July or early August.

I’ve already booked my flight to Birmingham in December, so whether my presentations get accepted or not, I’ll be there. It is a great conference.
