
Friday, August 14, 2015

Managing ORE in-database Data Stores using SQL

When working with ORE you will end up creating a number of different data stores in the database. And as your data science team grows, the number of data stores can grow to a very large number.

When you install Oracle R Enterprise you get a number of views that are made available to ORE users, allowing them to see, using SQL, what ORE Data stores they have and what objects exist in them.

For much of the time the ORE developers and data analysts will use the set of ORE functions to manage their in-database ORE Data stores. These include:

  • ore.save
  • ore.load
  • ore.datastore
  • ore.datastoreSummary
  • ore.delete

When using these ORE functions the schema user/data scientist can see what ORE Data stores they have, and can use ore.delete to delete an ORE Data store when it is no longer needed.

But the problem here is that over time your schemas can get a bit clogged up with ORE Data stores, particularly when a data scientist is no longer working on the project or there is no longer a need to maintain the ORE Data stores. This is common on data science projects where you might have a number of data scientists working in, and sharing, the one database schema.

For a DBA, whose role will be to clean up the ORE Data stores that are no longer needed, you have 4 options.

The first of these is that if all the ORE Data stores exist in the data scientist's schema and nothing else in the schema is needed, then you can just go ahead and drop the schema.

The second option is to log into the schema using SQL and drop the ORE Data stores. See an example of this below.

The third option is to connect to the Oracle schema using R and ORE and then use the ore.delete function to drop the ORE Data stores.

The fourth option is to connect to the RQSYS schema. This schema is the owner of the views used to query the ORE Data stores in each schema. The RQSYS schema was locked as part of the ORE installation, so you as the DBA will need to unlock it and then connect.

The following SQL lists the ORE Data stores that were created for that schema.

column dsname format a20
column description format a35

SELECT * FROM rquser_DataStoreList;

DSNAME                     NOBJ     DSSIZE CDATE     DESCRIPTION
-------------------- ---------- ---------- --------- -----------------------------------
ORE_DS                        2       5104 04-AUG-15 Example of ORE Datastore
ORE_FOR_DELETION              1       1675 14-AUG-15 Need to Delete this ORE Datastore
ORE_DS2                       5   51466509 04-AUG-15 DS for all R Env Data

You can also view what objects have been saved in each ORE Data store.

column objname format a15
column class format a15
SELECT * FROM rquser_DataStoreContents;
 
DSNAME               OBJNAME         CLASS              OBJSIZE     LENGTH       NROW       NCOL
-------------------- --------------- --------------- ---------- ---------- ---------- ----------
ORE_DS               CARS_DATA       ore.frame             1306         11         32         11
ORE_DS               cars_ds         data.frame            3798         11         32         11
ORE_DS2              cars_ds         data.frame            3798         11         32         11
ORE_DS2              cars_ore_ds     ore.frame             1675         11         32         11
ORE_DS2              sales_ds        data.frame        51455575          7     918843          7
ORE_DS2              usa_ds          ore.frame             2749         23      18520         23
ORE_DS2              usa_ds2         ore.frame             2712         23      18520         23
ORE_FOR_DELETION     cars_ore_ds     ore.frame             1675         11         32         11

To drop an ORE Data store for your current schema you can use the rqDropDataStore SQL function.

BEGIN
   rqDropDataStore('ORE_FOR_DELETION');
END;
/

For the DBA, when you unlock and connect to the RQSYS schema you will be able to see all the ORE Data stores in the database. The views will contain an additional column, identifying the schema that owns each data store.
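
For example (a sketch only), when connected as RQSYS you can query the underlying inventory table, the same table the per-schema rquser_* views are built over, to see the ORE Data stores across all schemas:

-- A sketch: connected as RQSYS, the inventory table behind the
-- rquser_* views lists the data stores for every schema.
SELECT * FROM rq$datastoreinventory;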

But if you use the above SQL function to delete an ORE Data store it will not work. This is because the SQL function will only drop an ORE Data store if it exists in your current schema, and when we are connected to the RQSYS schema we will not have any ORE Data stores in it.

We can create a procedure that will allow us to delete/drop any ORE Data store in any schema.

create or replace PROCEDURE my_ORE_Datastore_Drop(
  ds_owner  in VARCHAR2,
  ds_name  IN VARCHAR2
  )
IS
  del_objIds rqNumericSet;
BEGIN
  del_objIds := rq$DropDataStoreImpl(ds_owner, ds_name);
  IF del_objIds IS NULL THEN
    raise_application_error(-20101, 'DataStore ' ||
                            ds_name || ' does not exist');
  END IF;

  -- remove from rq$datastoreinventory
  BEGIN
    execute immediate
       'delete from RQ$DATASTOREINVENTORY c where c.objID IN (' ||
       'select column_value from table(:del_objIds))' using del_objIds;
     COMMIT;
  EXCEPTION WHEN others THEN null;
  END;
END;
/

As the DBA, logged into the RQSYS schema, we can now delete any ORE Data store in the database using the following.

BEGIN
   my_ORE_Datastore_Drop('ORE_USER', 'ORE_FOR_DELETION');
END;
/

Thursday, July 30, 2015

Check out What Sauron is saying about Oracle

Over the past year we have been (hopefully) hearing about Oracle Big Data SQL.

This is a new(-ish) option from Oracle that allows us to run our SQL queries not just on the data in our Oracle Database but also against NoSQL databases and Hadoop. No extra coding is needed, no extra formatting is needed, etc.

All the hard work of connecting to the data in these systems, translating our SQL into executable code on those systems, executing it, capturing the results and presenting the results back to us is done for us; we just sit in our schema in our Oracle Database.
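
To give a flavour of what this looks like (a sketch only: the table, directory and Hive object names below are made up, and the exact access parameters may vary by version), a Hive table is exposed to the database as an external table using the ORACLE_HIVE access driver, after which it can be queried with ordinary SQL:

-- Sketch: expose a Hive table to Oracle as an external table,
-- then query it with plain SQL. All object names are made up.
CREATE TABLE movie_log (
   click_time TIMESTAMP,
   user_id    NUMBER,
   movie_id   NUMBER
)
ORGANIZATION EXTERNAL (
   TYPE ORACLE_HIVE            -- Big Data SQL access driver
   DEFAULT DIRECTORY default_dir
   ACCESS PARAMETERS (
      com.oracle.bigdata.tablename=moviedemo.movie_log
   )
)
REJECT LIMIT UNLIMITED;

-- Ordinary SQL, including joins to local tables, now just works.
SELECT movie_id, COUNT(*)
FROM   movie_log
GROUP  BY movie_id;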

How cool is that?

To learn more about Oracle Big Data SQL check out their webpage.

But let us get back to the title of this blog post, 'What Sauron is saying about Oracle'. I used the following images in one of my presentations at the BIWA Summit in January 2015 and I've been meaning to post them since.

If you have read the books or watched the movies you will remember the phrase.

"One Ring to rule them all, One Ring to find them, One Ring to bring them all and in the darkness bind them."

We can apply this phrase to Oracle SQL now.

[image: the One Ring phrase rewritten for Oracle SQL]

or maybe my alternative version might be better.

[image: my alternative Oracle SQL version of the phrase]

Wednesday, September 17, 2014

Analytics Hands on Labs at OOW 14

I had a previous blog post listing the various Oracle Advanced Analytics sessions/presentations at Oracle Open World 2014.

After trawling through the list of Hands-on-Labs it was disappointing to see that there were no Oracle Data Mining or Oracle R Enterprise hands-on-labs this year.

But there is a hands-on lab that looks at how to use the new SQL for Big Data feature (announced over the summer).

Here is the abstract for the session.

Data warehouses contain the critical data required for managing and running organizations. Increasingly, Hadoop and NoSQL databases are capturing additional information—such as web logs, social media, and weather data that can augment the warehouse—enabling users to uncover new insights and opportunities. This hands-on lab illustrates how Oracle Big Data SQL is used to unify these environments. First you will learn how to securely access these big data sources from Oracle Database 12c. Then you will utilize Oracle’s analytical SQL across all your data, regardless of where it resides. Welcome to Oracle’s new big data management system!

There will be a lab session each day for this session and I will certainly be doing my best to get to one of these.

Date                 Time          Location                  Hands-on-Lab Session Title
Monday 29th Sept.    11:45-12:45   Hotel Nikko - Peninsula   Oracle Big Data SQL: Unified SQL Analysis Across the Big Data Platform [HOL9348]
Tuesday 30th Sept.   15:45-16:45   Hotel Nikko - Peninsula   (same session)
Wednesday 1st Oct.   13:15-14:15   Hotel Nikko - Peninsula   (same session)
Thursday 2nd Oct.    11:30-12:30   Hotel Nikko - Peninsula   (same session)

If any new hands-on-labs appear that are related to the Big Data and Advanced Analytics areas/options I will update the above table.

Some other Hands-on-Labs that you might be interested in include:

Date                 Time          Location                           Hands-on-Lab Session Title
Monday 29th Sept.    17:45-18:45   Hotel Nikko - Peninsula            Oracle NoSQL Database for Application Developers [HOL9349]
Tuesday 30th Sept.   10:15-11:10   Hotel Nikko - Peninsula            Oracle NoSQL Database for Application Developers [HOL9349]
Tuesday 30th Sept.   15:45-16:45   Hotel Nikko - Nikko Ballroom III   Oracle Data Integrator 12c New Features Deep Dive [HOL9439]
Tuesday 30th Sept.   17:15-18:15   Hotel Nikko - Nikko Ballroom III   Oracle Data Integrator for Big Data [HOL9414]
Wednesday 1st Oct.   13:15-14:15   Hotel Nikko - Mendocino I/II       Set Up a Hadoop 2 Cluster with Oracle Solaris Zones, Oracle Solaris ZFS, and Unified Archive [HOL2086]
Wednesday 1st Oct.   14:45-15:45   Hotel Nikko - Peninsula            Oracle NoSQL Database for Administrators [HOL9327]
Thursday 2nd Oct.    14:30-15:30   Hotel Nikko - Peninsula            Oracle NoSQL Database for Administrators [HOL9327]

Monday, May 26, 2014

Oracle R Enterprise (ORE) Tasks for the Oracle DBA

In previous posts I gave the steps required to install Oracle R Enterprise on your Database server and your client machine.

One of the steps that I gave was the initial set of Database privileges that the DBA needed to grant to the RQUSER. The RQUSER is a little bit like the SCOTT/TIGER schema in the Oracle Database. Setting up the RQUSER as part of the installation process allows you to test that you can connect to the database using ORE and that you can issue some ORE commands.

After the initial testing of the ORE install you might consider locking this RQUSER schema or dropping it from the Database.

So when a new ORE user wants access to the database, what steps does the DBA have to perform?

  1. Create a new schema for the user
  2. Grant the new schema the standard set of privileges to connect to the DB, create objects, etc.
  3. Create any data sets in their schema
  4. Create any views of data that exists in other schemas (and grant the necessary privileges, etc.)

Now we get onto the ORE specific privileges. The following are the minimum required for your user to be able to connect to their Oracle schema using ORE.

GRANT CREATE TABLE TO RQUSER;

GRANT CREATE PROCEDURE TO RQUSER;

GRANT CREATE VIEW TO RQUSER;

GRANT CREATE MINING MODEL TO RQUSER;

In most cases the first 3 privileges (TABLE, PROCEDURE and VIEW) will be standard for most schemas that you will set up. So in reality the only command or extra privilege that you will need to execute is:

GRANT CREATE MINING MODEL TO RQUSER;

This command will allow the user to connect to their Oracle schema using ORE, but what it will not allow them to do is to create any embedded R. These are R scripts that are stored in the database and can be called in their R/ORE scripts or by using the SQL API to R (I'll have more blog posts on these soon). To allow the user to create and use embedded R the DBA will also have to grant the following privilege as SYS:

GRANT RQADMIN to RQUSER;
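
To illustrate what RQADMIN enables (a sketch only; the EMP table, SAL column and script name below are just examples), the user can store an R script in the database and then invoke it through the SQL API:

-- A sketch of the SQL API to embedded R. Creating a stored R script
-- requires the RQADMIN role; all names here are illustrative.
BEGIN
   sys.rqScriptCreate('Example_Mean',
      'function(dat) { data.frame(avg = mean(dat$SAL)) }');
END;
/

-- Run the stored script against a cursor; the output query string
-- defines the shape (column names/types) of the result set.
SELECT *
FROM   table(rqTableEval(cursor(SELECT sal FROM emp),
                         NULL,
                         'SELECT 1 AS avg FROM dual',
                         'Example_Mean'));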

To summarise, the DBA will have to grant the following to each schema that wants to use the full power of ORE.

GRANT CREATE MINING MODEL TO RQUSER;

GRANT RQADMIN to RQUSER;

A note of warning: be careful what schemas you grant the RQADMIN privilege to. It is a powerful privilege and opens the database to the powerful features of R. Following typical DBA best practice, the DBA should grant the RQADMIN privilege only to the people who require it.
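
As a quick audit check, the DBA can list who currently holds it using the standard data dictionary (RQADMIN is granted as a role):

-- List the schemas that have been granted the RQADMIN role.
SELECT grantee
FROM   dba_role_privs
WHERE  granted_role = 'RQADMIN';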

Tuesday, April 22, 2014

Installing ORE - Part A

This blog post will look at how you can go about installing ORE in your environment.

The install involves 4 steps. The first step is the install on the Oracle Database server. The second step involves the install on your client machine. The third step involves creating a schema for ORE. The fourth step is connecting to the database using ORE.

In this Part A blog post I will cover the first two steps in this process. The other steps will be covered in another blog post.

NB: At the time of writing this blog post ORE 1.4 cannot be installed on a 12c database if it has a CDB/PDB configuration. If you want to use ORE with 12c then you need to do a traditional install that does not create a CDB with a PDB. The ORE team are working hard on this and I'm sure it will be available in the next release (or two or ...) of ORE.

Step 1 : Installing ORE on the Database Server

Before you begin looking at ORE you need to ensure that you have the correct version of the database. If you have version 11.2.0.3 or 11.2.0.4 then you can go ahead and perform the installation below. But if you have 11.2.0.1 or 11.2.0.2 then you will need to apply a patch to your database. See my note above about 12c.

Download the Oracle R Distribution from their website. Download here.

Although you can use the standard version of R, the Oracle R Distribution comes with some highly tuned packages. If you are going to use the standard R download then you will need to ensure that you download the correct version. ORE 1.4 requires R version 3.0.1. Yes, this is not the current version of R.

Accept the defaults during the installation, and within a minute or two the Oracle R Distribution will be installed.

Download the Oracle R Enterprise software. Download here. This will include the Server and Supporting downloads.

Uncompress the downloaded ORE files and go to the server directory. Here you will find install.bat (or another similarly named file for your platform).

Make sure your ORACLE_HOME and ORACLE_SID environment variables are set.

A number of environment settings and variables are checked. When prompted, accept the defaults.

When prompted for the password for the RQSYS user, enter an appropriate password and take careful note of it.

Now go back to the Oracle download page for ORE and download the supporting packages. Unzip the downloaded file, noting the directory they were unzipped into, as you can now load them in R. To do this open R and run the following commands. You will need to change the directory to where these are located on your server.

install.packages("C:/app/supporting/ROracle_1.1-11.zip", repos=NULL)

install.packages("C:/app/supporting/DBI_0.2-7.zip", repos=NULL)

install.packages("C:/app/supporting/png_0.1-7.zip", repos=NULL)

install.packages("C:/app/supporting/cairo_1.5-5.zip", repos=NULL)


Or you can use the R GUI to import these packages.

WARNING: If you are installing on a Windows server you may encounter some issues when importing these packages. I will have a separate blog post on this soon.

NB: The ORE installation instructions make reference to Cairo_1.5-2.zip. This is incorrect. ORE 1.4 comes with Cairo_1.5-5.zip.

At this point, assuming you didn't have any errors, you now have ORE installed on your server.
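
Before moving on, a quick sanity check is worthwhile. The following is a sketch of what I would look at: the RQSYS schema should now exist, and the install creates a configuration table holding settings such as the R_HOME the database will use.

-- Confirm the RQSYS schema was created by the install.
SELECT username FROM all_users WHERE username = 'RQSYS';

-- Review the ORE configuration settings (e.g. R_HOME).
SELECT * FROM sys.rq_config;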


Step 2 : Installing ORE on the Client

Download the Oracle R Distribution from their website. Download here.

NOTE: If your database and client are on the same machine then there is no need to install it again.

The client install is much simpler and less involved. After you have installed R, the next step is to install the client packages for ORE. These can be downloaded from here.

After you have unzipped the file, you can import the packages using the install-from-zip feature of the R GUI tool or RStudio.

Now you can install the supporting packages. Unzip them and then use the R GUI or RStudio to import them. These supporting packages can be downloaded from here.

That should be the client R software and ORE packages installed on your client machine. The next step is to test a connection to your Oracle database using ORE. Before you can do that you will need to set up a schema in the database to use R and also grant the necessary privileges to the other schemas that you want to access using R.


Check out my next blog post (Installing ORE - Part B) for Steps 3 and 4.

Also check out the Part C blog post on how to resolve a potential install issue on a Windows server.

Monday, April 14, 2014

Oracle Advanced Analytics and Oracle Fusion Apps

At a recent Oracle User Group conference, I was part of a round table discussion on Apps and BI. Unfortunately most of the questions were focused on Apps and the new Fusion Applications from Oracle. When I mentioned that there is data mining functionality (using the Oracle Advanced Analytics Option) built into the Fusion Apps, it seemed to come as a surprise to the Apps people. They were not aware of this built-in functionality and these capabilities. Oracle Data Mining and Oracle Advanced Analytics have been built into the following Oracle Fusion Applications:
  • Oracle Fusion HCM Workforce Predictions
  • Oracle Fusion CRM Sales Prediction Engine
  • Oracle Spend Classification
  • Oracle Sales Prospector
  • Oracle Adaptive Access Manager
Oracle Data Mining and Oracle Advanced Analytics are also being used in the following applications:
  • Oracle Airline Data Model
  • Oracle Communications Data Model
  • Oracle Retail Data Model
  • Oracle Security Governor for Healthcare
I intend to submit a presentation on this topic to future Oracle User Group conferences as a way of spreading the Advanced Analytics message within the Oracle user community. If you would like me to present on this topic at your conference or SIG, drop me an email and we can make the necessary arrangements :-)

Thursday, March 27, 2014

Oracle BigDataLite version 2.5.1 is now available

Back at the end of January Oracle finally got round to releasing the updated version of the Oracle BigDataLite virtual machine. Check out my previous blog post on this.

Yesterday (27th March) I saw on Facebook that a new updated version of the BigDataLite VM was released. I must have missed the tweet and other publicity on this somewhere :-(

This is a great VM that allows you to play with the various Big Data technologies without the hassle of going through the whole install and configuration thing.

If you are interested in this then here are the details of what it contains and where you can find more details.

The following components are included on Oracle Big Data Lite Virtual Machine v 2.5:

Oracle Enterprise Linux 6.4

Oracle Database 12c Release 1 Enterprise Edition (12.1.0.1)

Cloudera’s Distribution including Apache Hadoop (CDH4.6)

Cloudera Manager 4.8.2

Cloudera Enterprise Technology, including:

   Cloudera RTQ (Impala 1.2.3)

   Cloudera RTS (Search 1.2)

Oracle Big Data Connectors 2.5

   Oracle SQL Connector for HDFS 2.3.0

   Oracle Loader for Hadoop 2.3.1

   Oracle Data Integrator 11g

   Oracle R Advanced Analytics for Hadoop 2.3.1

   Oracle XQuery for Hadoop 2.4.0

Oracle NoSQL Database Enterprise Edition 12cR1 (2.1.54)

Oracle JDeveloper 11g

Oracle SQL Developer 4.0

Oracle Data Integrator 12cR1

Oracle R Distribution 3.0.1


Go to the Oracle Big Data Lite Virtual Machine landing page on OTN to download the latest release.

Thursday, January 17, 2013

The ‘Oh No You Don’t’ of (Oracle) Data Science

Over the past couple of weeks I’ve had conversations with a large number of people about Data Science in the Oracle arena.

A few things have stood out. The first, and perhaps the most important, is that there is confusion over what Data Science actually means. Some think it is just another name for Statistics or Advanced Statistics, some for Predictive Analytics or Data Mining, or Data Analysis, Data Architecture, etc. The reality is that it is not. It is more than what these terms mean, and that is a topic for discussion for another day.

During these conversations the same questions or topics keep coming up and the simplest answer to all of these is taken from a Pantomime (Panto).

We need to have lots of statisticians
       'Oh No You Don't !'
We can only do Data Science if we have Big Data
        'Oh No You Don't !'
We can only do data mining/data science if we have 10s or 100s of millions of records
        'Oh No You Don't !'
We need to have an Exadata machine
        'Oh No You Don't !'
We need to have an Exalytics machine
        'Oh No You Don't !'
We need extra servers to process the data
        'Oh No You Don't !'
We need to buy lots of Statistical and Predictive Analytics software
        'Oh No You Don't !'
We need to spend weeks statistically analysing a predictive model
        'Oh No You Don't !'
We need to have unstructured data to do Data Science
        'Oh No You Don't !'
Data Science is only for large companies
        'Oh No You Don't !'
Data Science is very complex, I can not do it
        'Oh No You Don't !'

Let us all say it together one last time: 'Oh No You Don't !'

In its simplest form, performing Data Science using the Oracle stack just involves learning and using some simple SQL and PL/SQL functions in the database.
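
For example (the model and table names here are made up), scoring your data with an in-database model is nothing more than a SQL function call:

-- Illustration only: CHURN_MODEL and CUSTOMERS are made-up names.
-- Score each customer with an existing in-database model.
SELECT cust_id,
       PREDICTION(churn_model USING *)             AS predicted_churn,
       PREDICTION_PROBABILITY(churn_model USING *) AS churn_probability
FROM   customers;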

Maybe we (in the Oracle Data Science world and those looking to get into it) need to adopt a phrase used by Barack Obama, 'Yes We Can', or as he said it in Irish when he visited Ireland back in 2011, 'Is Féidir Linn'.

Remember it is just SQL.

Friday, November 23, 2012

Association Rules in ODM–Part 1

This is the first part of a four part blog post on building and using Association Rules in Oracle Data Miner. The following outlines the contents of each post in the series on Association Rules:

  1. This first part will focus on how to build an Association Rule model – This blog post
  2. The second post will be on examining the Association Rules produced by ODM
  3. The third post will focus on using the Association Rules on your data.
  4. The final post will look at how you can do some of the above steps using the ODM SQL and PL/SQL functions.


The data set we will be using for Association Rule Analysis will be the sample data that comes with the SH schema in the database. Access to this schema and its data was set up when we created our data mining schema and ODM Repository.

Step 1 – Getting set up

As with all data mining projects you will need a workspace that will contain your workflows. Based on my previous ODM blog posts you will have already created a Project and some workflows. You can either reuse an existing workflow you have used for one of the other ODM modeling algorithms or you can create a new Workflow called Association Rules.

Step 2 – Define your Data Set

Assuming that your database has been setup to have the Sample schemas and their corresponding data, we will be using the data that is in the SH schema. In a previous post, I gave some instructions on setting up your database to use ODM and part of that involved a step to give your ODM schema access to the sample schema data.

We will start off by creating a Data Source Node. Click on the Data Source Node under the Component Palette, then move your mouse to your workspace area and click. A Data Source Node will be created and a window will open. Scroll down the list of Available Tables until you find the SH.SALES table. Click on this table and then click on the Next button. We want to include all the data, so we can now click the Finish button.


Our Data Source Node will now be renamed to SALES.

Step 3 – Setup the Association Build Node

Under the Model section of the Component Palette select Association. Move the mouse to your work area (perhaps just to the right of the SALES node) and click. Our Association Node will be created.


For the next step we need to join our data source (SALES) to the Association Build Node. Right click on the SALES data node and select Connect from the drop down menu. Then move the mouse to the Association Build node and click. You should now have the two nodes connected.

We will now get the Edit Association Build Node property window opening for us. We will need to enter the following information:

  • Transaction ID: This is the attribute(s) that can be used to uniquely identify each transaction. In our example the Customer ID and the Time ID of the transaction allow us to identify the unit we want to analyse by, i.e. the basket. This will group all the related transactions together.
  • Item ID: This is the attribute of the thing you want to analyse. In our case we want to analyse the products purchased, so select PROD_ID.
  • Value: This is an identifier used to specify another column within the transaction data to combine with the Item ID. <Existence> means that you want to see if there is any type of common bundling among all values of the selected Item ID. Use this.


Like all data mining products, Oracle has just one Algorithm to use for Association Rule Analysis, the Apriori Algorithm.
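
For those who prefer the SQL route (covered properly in part 4 of this series), here is a rough sketch of the equivalent build in PL/SQL; the settings table, view and column names below are made up:

-- Sketch only: build the same Apriori model using DBMS_DATA_MINING.
CREATE TABLE ar_settings (
   setting_name  VARCHAR2(30),
   setting_value VARCHAR2(4000)
);

BEGIN
   -- Apriori is the algorithm used for association rules.
   INSERT INTO ar_settings VALUES
      (dbms_data_mining.algo_name,
       dbms_data_mining.algo_apriori_association_rules);
   COMMIT;

   DBMS_DATA_MINING.CREATE_MODEL(
      model_name          => 'AR_SALES',
      mining_function     => dbms_data_mining.association,
      data_table_name     => 'sales_trans_v',  -- made-up view over SH.SALES
      case_id_column_name => 'trans_id',       -- made-up combined CUST_ID/TIME_ID
      settings_table_name => 'ar_settings');
END;
/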

Click the OK button. You are now ready to run the Association Build Node. Right click on the node and select Run from the menu. After a short time everything should finish and we will have the little green tick marks on each of the nodes.


Check out the next post in the series (Part 2) where we will look at how you can examine the rules produced by our model in ODM.

Friday, November 16, 2012

Accepted for BIWA Summit–9th to 10th January

I received an email today to say that I had a presentation accepted for the BIWA Summit. This conference will be in the Sofitel Hotel beside the Oracle HQ in Redwood City.

The title of the presentation is “The Oracle Data Scientist” and the abstract is

Over the past 18 months we have seen a significant increase in the demand for Data Scientists. But how does someone become a data scientist? If we examine the requirements and job descriptions of this role we can see that being able to understand and process data are fundamental skills. So an Oracle developer is ideally suited to being a Data Scientist. The presentation will show how an Oracle developer can evolve into a data scientist through a number of stages, including BI developer, OBIEE developer, statistical analysis, data miner and data scientist. The tasks and tools will be discussed and explored through each of these roles. The second half of the presentation will focus on the data mining functionality available in SQL and PL/SQL. This will consist of a demonstration of an Analytics Development environment and how you can migrate (and use) your models in a Production environment.

For some reason Simon Cowell of X Factor fame kept popping into my head, and it now looks like he will be making an appearance in the presentation too. You will have to wait until the conference to find out what Simon Cowell and being an Oracle Data Scientist have in common.

Check out the BIWA Summit website for more details and to register for the event.

I’ll see you there :-)

Sunday, November 4, 2012

Events for Oracle Users in Ireland – November 2012

November 2012 is going to be a busy month for Oracle users in Ireland. There is a mixture of Oracle User Group events, along with Oracle Day and the OTN Developer Days. To round off the year we have the UKOUG Conference during the first week in December.

Here are the dates and web links for each event.

Oracle User Group

The BI & EPM SIG will be having their next meeting on the Tuesday 20th November. This is almost a full day event, with presentations from End Users, Partners and Oracle product management. The main focus of the day will be on EPM, but will also be of interest to BI people.

As with all SIG meetings, this SIG will be held in the Oracle office in East Point (Block H). Things kick off at 9am and are due to finish around 4pm with plenty of tea/coffee and a free lunch too.


Remember to follow OUG Ireland on twitter using #oug_ire

Oracle Day

Oracle will be having their Oracle Day 2012, on Thursday 15th, in Croke Park. Here is some of the blurb about the event: “…to learn how Oracle simplifies IT, whether it’s by engineering hardware and software to work together or making new technologies work for the modern enterprise. Sessions and keynotes feature an elite roster of Oracle solutions experts, partners and business associates, as well as fascinating user case studies and live demos.”

This is a full day event from 9am to 5pm with 3 parallel streams focusing on Big Data, Enterprise Applications and the Cloud.

Click here to register for this event.

Click here for the full details and agenda.

OTN Developer Days

Oracle run their developer days about 3 times a year in Dublin. These events are run like a hands-on lab, so most of the work during the day is done by yourself. You are provided with a workbook, a laptop and a virtual machine configured for the hands-on lab. This November we have the following developer days in the Oracle office in East Point, Dublin.

Tuesday 27th November (9:45-15:00) : Real Application Testing

Wednesday 28th November (9:00-14:00) : Partitioning/Advanced Compression

Thursday 29th November (9:15-13:30) : Database Security

Friday 30th November (9:45-16:00) : Business Process Management Using BPM Suite 11g

As you can see we have almost a full week of FREE training from Oracle. So there is no reason not to sign up for these days.

UKOUG Conference – in Birmingham

In December we have the annual UKOUG Conference. This is the largest Oracle User Group conference in Europe and the largest outside of the USA. At this conference you will have some of the main speakers and presentations from Oracle Open World, along with a range of speakers from all over the world.

In keeping with previous years there will be the OakTable Sunday, and new this year there will be a Middleware Sunday. You need to register separately for these events. Here are the links:

OakTable Sunday

Middleware Sunday

The main conference kicks off on the Monday morning with a very full agenda for Monday, Tuesday and Wednesday. There are a number of social events on the Monday and Tuesday, so come well rested.

On the Monday evening there are the Focus Pubs. This year it seems to have an Irish pub theme. At the Focus Pubs event there will be a table for each of the user group SIGs.

Come and join me at the Ireland table on the Monday evening.

The full agenda is now live and you can get all the details here.

I will be giving a presentation on the Tuesday afternoon titled Getting Real Business Value from Predictive Analytics (OBIEE and Oracle Data Mining). This is a joint presentation with Antony Heljula of Peak Indicators.

Saturday, October 20, 2012

Oracle Advanced Analytics Option in Oracle 12c

At Oracle Open World a few weeks ago there were a large number of presentations on Big Data and Analytics. Most of these were marketing type presentations, with a couple of presentations on using R and how it can now be integrated into the Oracle Database 11.2.

In addition to these there was one presentation that focused on the Oracle Advanced Analytics (OAA) Option.

The Oracle Advanced Analytics Option covers the Oracle Data Mining features and the Oracle R Enterprise features in the Database.

The purpose of this blog post is to outline and summarise what was mentioned at these presentations, including what changes are/may be coming in the “Next Release” of the database, i.e. Oracle 12c.

Health Warning: As with all the presentations at OOW that talked about what may be in the next release, there is no guarantee that the features will actually be in the release version of the database. The usual Safe Harbor statement applies.


  • 12c will come with R embedded into it. So there will be no need for any configurations.
  • Oracle R client will come as part of the server install.
  • Oracle R client will be able to use the Analytics functions that exist in the database.
  • Will be able to run R code in the database.
  • The database (12c) will be able to spawn multiple R engines.
  • Will be able to emulate map-reduce style algorithms.
  • There will be a new PREDICTION function, replacing the existing (11g) functionality. This will combine a number of the steps of building a model and applying it to the data to be scored into one function. But we will still need the functionality of the existing PREDICTION function that is in 11g, so it will be interesting to see how that functionality will be kept in addition to the new functionality being proposed in 12c.
  • Although the Oracle Data Miner tool will still exist, and will have many new features, it was also referred to as the ‘OAA Workflow’. So does this indicate a potential name change? We will have to wait and see.
  • Oracle Data Miner will come with a new additional graphing feature. This will be in addition to the Explore Node and will allow us to produce more typical attribute related graphs. From what I could see these would be similar to the type of box plot, scatter, bar chart, etc. graphs that you can get from R.
  • There will be a number of new algorithms too, including a useful One Class Support Vector Machine. This can be used when we have a data set with just one class value. This algorithm will work out which records/cases are more important than others.
  • There will be a new SQL node. This will allow us to write our own data transformation code.
  • There will be a new node to allow the calling of R code.
  • The tool also comes with a slightly modified layout and colour scheme.

Again, the points that I have given above are just my observations. They may or may not appear in 12c, or maybe I misunderstood what was being said.

It certainly looks like we will have an integrated analytics environment in 12c, with full integration of R and the ODM in-database features.

Wednesday, October 17, 2012

Extracting the rules from an ODM Decision Tree model

One of the most interesting and important aspects of a Decision Tree model is that we, as users, get to see what rules the machine learning algorithm has generated for our data.

I’ve given a number of examples in various blog posts over the past few years of how to generate a number of classification models. A typical workflow will contain a Class Build node linked to a data source.


In the Class Build node we get four models being generated: a Generalised Linear Model, a Support Vector Machine, a Naive Bayes model and a Decision Tree model.

We can explore the Decision Tree model by right clicking on the Class Build Node, selecting View Models and then the Decision Tree model, which will be labelled with a ‘DT’ in the name.


As we explore the nodes and branches of the Decision Tree we can see the rule that was generated for a node in the lower pane of the application. By clicking on each node we get a different rule appearing in this pane.


Sometimes there is a need to extract these rules so that they can be presented to a number of different types of users, to explain to them what is going on.

How can we extract the Decision Tree rules?

To do this, you will need to complete the following steps:

  • From the Models section of the Component Palette select the Model Details node.
  • Click on the Workflow pane and the Model Details node will be created
  • Connect the Class Build node to the Model Details node. To do this right click on the Class Build node and select Connect. Then move the mouse to the Model Details node and click. The two nodes should now be connected.
  • Edit the Model Details node, uncheck the Auto Settings, select Model Type to be Decision Tree, Output to be Full Tree and all the columns.


  • Run the Model Details node. Right click on the node and select Run. When complete you will have the little green box with a tick mark in the top right hand corner.
  • To view the details produced, right click on the Model Details node and select View Data
  • The rules for each node will now be displayed. You will need to scroll to the right of this pane to get to the rules, and you will need to expand the columns for the rules to see the full details.
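
If you need the rules outside of the tool, an alternative (a sketch; the model name below is made up) is to extract the full tree, rules included, as PMML directly in SQL:

-- A sketch: extract the Decision Tree model, rules included, as
-- PMML XML. 'CLAS_DT_1_6' is a made-up model name; use your own.
SELECT dbms_data_mining.get_model_details_xml('CLAS_DT_1_6')
FROM   dual;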


Friday, October 12, 2012

My Presentations on Oracle Advanced Analytics Option

I’ve recently compiled my list of presentations on the Oracle Advanced Analytics Option. All these presentations are for a 45 minute period.

I have two versions of the presentation ‘How to do Data Mining in SQL & PL/SQL’; one is for 45 minutes and the second version is for 2 hours.

I have given most of these presentations at conferences or SIGs.

Let me know if you are interested in having one of these presentations at your SIG or conference.

  • Oracle Analytics Option - 12c New Features - available 2013
  • Real-time prediction in SQL & Oracle Analytics Option - Using the 12c PREDICTION function - available 2013
  • How to do Data Mining in SQL & PL/SQL
  • From BIG Data to Small Data and Everything in Between
  • Oracle R Enterprise : How to get started
  • Oracle Analytics Option : R vs Oracle Data Mining
  • Building Predictive Analytics into your Forms Applications
  • Getting Real Business Value from OBIEE and Oracle Data Mining (this is a cut down and merged version of the following two presentations)
  • Getting Real Business Value from OBIEE and Oracle Data Mining - Part 1 : The Oracle Data Miner part
  • Getting Real Business Value from OBIEE and Oracle Data Mining - Part 2 : The OBIEE part
  • How to Deploy and Use your Oracle Data Miner Models in Production
  • Oracle Analytics Option 101
  • From SQL Programmer to Data Scientist: evolving roles of an Oracle programmer
  • Using an Oracle Data Mining Model in SQL & PL/SQL
  • Getting Started with Oracle Data Mining
  • You don't need a PhD to do Data Mining

Check out the ‘My Presentations’ page for updates on new presentations.

Saturday, September 22, 2012

Oracle Open World Schedule

Oracle Open World is fast approaching. Over the past couple of weeks I have been using the schedule builder tool to work out what sessions I would like to attend.  Unfortunately there are LOTS of sessions I would love to attend but I haven’t worked out a way to be in 10 places at the same time.

When attending a conference I try to achieve a number of things. These are find out about new topics/features, benchmark my knowledge of existing topics, try some of the hands-on labs, try something new and do something completely different. This will be my challenge at Oracle Open World.


There are a number of other people from Ireland who will be attending OOW, and there are some plans to have an Ireland social event. Plus there are lots of meetings/catch-ups planned too with people I know from the virtual Oracle world.

There will be some people from AIB who will be presenting at OOW. Their presentation will be on the Tuesday morning 10:15-11:00.  I’ll be there.

Other social things that are on include the Oracle ACE dinner and the Oracle Music Festival, with Kings of Leon playing support to Pearl Jam on the Wednesday night.

It is going to be a busy week, an enjoyable week, a week of learning new things and finding out lots about what the 12c database will be like.

Will I get time to go to everything? The simple answer is NO. So I will just have to try to get to another Oracle Open World soon.

Tuesday, June 26, 2012

Analytics Sessions at Oracle Open World 2012

The content catalog for Oracle Open World 2012 was made public during the week. OOW is on between 30th September and 4th October.

The following table gives a list of most of the Data Analytics type sessions that are currently scheduled.

Why did I pick these sessions? If I was able to go to OOW then these are the sessions I would like to attend. Yes there would be many more sessions I would like to attend on the core DB technology and Development streams.

Session Title / Presenters

CON6640 - Database Data Mining: Practical Enterprise R and Oracle Advanced Analytics
   Husnu Sensoy
CON8688 - Customer Perspectives: Oracle Data Integrator
   Gurcan Orhan - Software Architect & Senior Developer, Turkcell Technology R&D
   Julien Testut - Product Manager, Oracle
HOL10089 - Oracle Big Data Analytics and R
   George Lumpkin - Vice President, Product Management, Oracle
CON8655 - Tackling Big Data Analytics with Oracle Data Integrator
   Mala Narasimharajan - Senior Product Marketing Manager, Oracle
   Michael Eisterer - Principal Product Manager, Oracle
CON8436 - Data Warehousing and Big Data with the Latest Generation of Database Technology
   George Lumpkin - Vice President, Product Management, Oracle
CON8424 - Oracle’s Big Data Platform: Settling the Debate
   Martin Gubar - Director, Oracle
   Kuassi Mensah - Director Product Management, Oracle
CON8423 - Finding Gold in Your Data Warehouse: Oracle Advanced Analytics
   Charles Berger - Senior Director, Product Management, Data Mining and Advanced Analytics, Oracle
CON8764 - Analytics for Oracle Fusion Applications: Overview and Strategy
   Florian Schouten - Senior Director, Product Management/Strategy, Oracle
CON8330 - Implementing Big Data Solutions: From Theory to Practice
   Josef Pugh - Oracle
CON8524 - Oracle TimesTen In-Memory Database for Oracle Exalytics: Overview
   Tirthankar Lahiri - Senior Director, Oracle
CON9510 - Oracle BI Analytics and Reporting: Where to Start?
   Mauricio Alvarado - Principal Product Manager, Oracle
CON8438 - Scalable Statistics and Advanced Analytics: Using R in the Enterprise
   Marcos Arancibia Coddou - Product Manager, Oracle Advanced Analytics, Oracle
CON4951 - Southwestern Energy’s Creation of the Analytical Enterprise
   Jim Vick - Southwestern Energy
   Richard Solari - Specialist Leader, Deloitte Consulting LLP
CON8311 - Mining Big Data with Semantic Web Technology: Discovering What You Didn’t Know
   Zhe Wu - Consultant Member of Tech Staff, Oracle
   Xavier Lopez - Director, Product Management, Oracle
CON8428 - Analyze This! Analytical Power in SQL, More Than You Ever Dreamt Of
   Hermann Baer - Director Product Management, Oracle
   Andrew Witkowski - Architect, Oracle
CON6143 - Big Data in Financial Services: Technologies, Use Cases, and Implications
   Omer Trajman - Cloudera
   Ambreesh Khanna - Industry Vice President, Oracle
   Sunil Mathew - Senior Director, Financial Services Industry Technology, Oracle
CON8425 - Big Data: The Big Story
   Jean-Pierre Dijcks - Sr. Principal Product Manager, Oracle
CON10327 - Recommendations in R: Scaling from Small to Big Data
   Mark Hornick - Senior Manager, Oracle

Wednesday, June 20, 2012

Part 2 of the Leaning Tower of Pisa problem in ODM

In a previous post I gave the details of how you can use Regression in Oracle Data Miner to predict/forecast the lean of the tower in future years. This was based on building a regression model in ODM using the known lean/tilt of the tower for a range of years.

In this post I will show you how you can do the same tasks using the Oracle Data Miner functions in SQL and PL/SQL.

Step 1 – Create the table and data

The easiest way to do this is to make a copy of the PISA table we created in the previous blog post. If you haven’t completed this, then go to the blog post and complete step 1 and step 2.

create table PISA_2
as select * from PISA;


Step 2 – Create the ODM Settings table

We need to create a ‘settings’ table before we can use the ODM APIs in PL/SQL. The purpose of this table is to store all the configuration parameters needed for the algorithm to work. In our case we only need to set two parameters.

-- Create the settings table first (two columns, following the
-- DBMS_DATA_MINING settings table convention).
CREATE TABLE pisa_2_settings (
   setting_name  VARCHAR2(30),
   setting_value VARCHAR2(4000)
);

BEGIN
  delete from pisa_2_settings;
  INSERT INTO PISA_2_settings (setting_name, setting_value) VALUES
  (dbms_data_mining.algo_name, dbms_data_mining.ALGO_GENERALIZED_LINEAR_MODEL);
  INSERT INTO PISA_2_settings (setting_name, setting_value) VALUES
  (dbms_data_mining.prep_auto, dbms_data_mining.prep_auto_off);
  COMMIT;
END;
/

Step 3 – Build the Regression Model

To build the regression model we need to use the CREATE_MODEL procedure that is part of the DBMS_DATA_MINING package. When calling this procedure we need to pass in the name of the model, the mining function, the source data, the case id column, the target column we are interested in, and the settings table.

BEGIN
      DBMS_DATA_MINING.CREATE_MODEL(
        model_name          => 'PISA_REG_2',
        mining_function     => dbms_data_mining.regression,
        data_table_name     => 'pisa_2_build_v',
        case_id_column_name => null,
        target_column_name  => 'tilt',
        settings_table_name => 'pisa_2_settings');
END;
/

After this we should have our regression model.

Step 4 – Query the Regression Model details

To find out what was produced in the previous step we can query the data dictionary.

SELECT model_name, 
       mining_function,
       algorithm,
       build_duration,
       model_size
from USER_MINING_MODELS
where model_name like 'P%';


select setting_name, 
       setting_value,
       setting_type
from all_mining_model_settings
where model_name like 'P%';


Step 5 – Apply the Regression Model to new data

Our final step is to apply the model to our new data, i.e. the years for which we want to know what the lean/tilt will be.

SELECT year_measured, prediction(pisa_reg_2 using *)
FROM   pisa_2_apply_v;


Wednesday, June 13, 2012

Data Science Is Multidisciplinary

[Update: October 2016. There appears to be some discussion about the Venn diagram I've proposed below. The central part of this diagram is not anything I came up with; it was a commonly used Venn diagram for Data Mining. Thanks to Polly Michell-Guthrie for providing the original reference for the Venn. I just added the outer ring of additional skills needed for the new area of Data Science. This was just my view of things back in 2012. Things have moved on a bit since then.]

A few weeks ago I had a blog post called Domain Knowledge + Data Skills = Data Miner.

In that blog post I was saying that to be a Data Scientist all you needed was Domain Knowledge and some Data Skills, which included Data Mining.

The reality is that the skill set of a Data Scientist will be much larger. There is a saying, ‘A jack of all trades and a master of none’. When it comes to being a data scientist you need to be a bit like this, but perhaps a better saying would be ‘A jack of all trades and a master of some’.

I’ve put together the following diagram, which includes most of these skills, with an outer circle of more fundamental skills. It is this outer ring of skills that are fundamental to becoming a data scientist. The skills in the inner part of the diagram are skills that most people will have some experience of in one or more areas. The other skills can be developed and learned over time, all depending on the type of person you are.
[Venn diagram: the core data mining skills in the centre, with an outer ring of the additional skills needed for Data Science]
Can we train someone to become a data scientist or are they born to be a data scientist? It is a little bit of both really, but you need to have some of the fundamental skills and the right type of personality. The learning of the other skills should be easy(ish).

What do you think? Are there skills that I’m missing?