
Thursday, September 11, 2014

apropos("^ore")

We have all been in the position of trying to find the name of a command in a language, particularly when we are not totally sure of the full command name.

I've been working with R a lot recently, and in particular with Oracle R Enterprise, and I was always trying to remember what the full command names were. Then I found the apropos function. The apropos function allows you to search R for commands based on a partial name, and you can use regular expression syntax to define which part of the function name you are looking for.

What I ended up using most often was the following command. This function call looks for all functions beginning with 'ore'.

> apropos("^ore")


To find out more about how to use the apropos command check out the R help.

> help(apropos)
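
Here are a few more hedged examples of patterns you can pass to apropos. The regular expressions are illustrative and the output will depend on which packages you have loaded in your R session.

> apropos("^ore")                      # functions beginning with 'ore'
> apropos("predict$")                  # functions ending with 'predict'
> apropos("odm", ignore.case = TRUE)   # functions containing 'odm', ignoring case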

Tuesday, September 9, 2014

ORE now available for Multitenant (PDB) version of 12c

Oracle has released an update to their Oracle R Enterprise software. We now have ORE 1.4.1, which seems to have been released in the past day or so.

Here are the links to the important stuff:

ORE 1.4.1 Release Note

ORE 1.4.1 User Guide

ORE 1.4.1 Installation Guide

ORE 1.4.1 Download page

One of the main features of this new release is that it now supports the multitenant option of the 12c database. Up to now, if you wanted to use ORE with 12c then you needed to do a traditional install of the database. That means you would be installing a single instance of the 12c database with no CDB or PDB.

With ORE 1.4.1 you can now install ORE into a PDB. It needs to be one of your existing PDBs and should not be installed into the root container, otherwise it will not work. Check out the installation instructions using the links above.

As with all new releases there are a lot of bug fixes and perhaps some new ones too :-)

Monday, June 2, 2014

ore.parallel

In ORE there are a number of ways to get your R scripts to run in parallel in the database. One way is to enable the Parallel option in ORE, which is what will be shown in this post. There are other methods of running various ORE commands/scripts in parallel, where the scripts are divided out and several parallel R processes are started on the server.

But what if you want to use the database parallel feature on some of your other ORE commands?

Why would you want to do this?

Well, the main answer is that you might want to use the parallel option of the database for the creation of objects (tables etc.) and for selecting and manipulating the data in the database.

How can you enable your ORE connection to use the in-database parallel feature?

ORE 1.4 has a new option that enables the parallel option for your ORE connection in the database. This option is called ore.parallel.

When you enable or set the ore.parallel option, it seems to be the equivalent of running the following:

ALTER SESSION ENABLE PARALLEL DDL;

ALTER SESSION ENABLE PARALLEL DML;

ALTER SESSION ENABLE PARALLEL QUERY;

The exact details are a little unclear, but it appears to be equivalent to the above commands.

The following commands illustrate some options for using the ore.parallel option.

> #

> # Check to see if the ore.parallel is enabled for your ORE connection

> options("ore.parallel")

$ore.parallel

NULL

The NULL returned value tells us that your ORE connection does not have the Parallel option enabled. If the schema had Parallel enabled by default then we would have had a response of TRUE.

The following command turns on the Parallel option for your ORE connection / schema.

> options("ore.parallel" = TRUE)

> options("ore.parallel")

$ore.parallel

[1] TRUE

When the Parallel option is enabled (TRUE above), the database will use the degree of parallelism that is set as the default for the schema, or the degree of parallelism that is defined for the table, when it is being used in your ORE commands.

You can change the degree of parallelism by passing the required degree as a value to the ore.parallel option. In the following, the degree of parallelism is set to 8. We then ask ORE what the degree is set to and it tells us that it is 8. So it was set correctly.

> options("ore.parallel" = 8)

> options("ore.parallel")

$ore.parallel

[1] 8
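
To illustrate how you might use this, here is a hedged sketch that sets the degree of parallelism before creating a table from a local data frame with ore.create. It assumes you are connected with ore.connect as shown in my earlier posts; the table name is just an example.

library(ORE)
ore.connect(user="RQUSER", sid="orcl", host="localhost", password="RQUSER", port=1521, all=TRUE)

# Request a degree of parallelism of 8 for this ORE connection
options("ore.parallel" = 8)

# DDL/DML issued by ORE, such as creating this table, can now use parallel execution
ore.create(mtcars, table = "MTCARS_PAR")

# Clean up the example table
ore.drop(table = "MTCARS_PAR")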

Monday, May 26, 2014

Oracle R Enterprise (ORE) Tasks for the Oracle DBA

In previous posts I gave the steps required to install Oracle R Enterprise on your Database server and your client machine.

One of the steps that I gave was the initial set of Database privileges that the DBA needed to grant to the RQUSER. The RQUSER is a little bit like the SCOTT/TIGER schema in the Oracle Database. Setting up the RQUSER as part of the installation process allows you to test that you can connect to the database using ORE and that you can issue some ORE commands.

After the initial testing of the ORE install you might consider locking this RQUSER schema or dropping it from the Database.

So when a new ORE user wants access to the database, what steps does the DBA have to perform?

  1. Create a new schema for the user
  2. Grant the new schema the standard set of privileges to connect to the DB, create objects, etc.
  3. Create any data sets in their schema
  4. Create any views to data that exists in other schemas (and grant the necessary privileges, etc.)

Now we get onto the ORE specific privileges. The following are the minimum required for your user to be able to connect to their Oracle schema using ORE.

GRANT CREATE TABLE TO RQUSER;

GRANT CREATE PROCEDURE TO RQUSER;

GRANT CREATE VIEW TO RQUSER;

GRANT CREATE MINING MODEL TO RQUSER;

In most cases the first 3 privileges (TABLE, PROCEDURE and VIEW) will be standard for most schemas that you will set up. So in reality the only command or extra privilege that you will need to execute is:

GRANT CREATE MINING MODEL TO RQUSER;

This command will allow the user to connect to their Oracle schema using ORE, but it will not allow them to create any embedded R scripts. These are R scripts that are stored in the database and can be called from their R/ORE scripts or by using the SQL API to R (I'll have more blog posts on these soon). To allow the user to create and use embedded R, the DBA will also have to grant the following role as SYS:

GRANT RQADMIN to RQUSER;
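
As a hedged illustration of what the RQADMIN role enables, the following sketch stores a simple R function in the database R script repository and then runs it using embedded R execution. The connection details and script name are just examples.

library(ORE)
ore.connect(user="RQUSER", sid="orcl", host="localhost", password="RQUSER", port=1521, all=TRUE)

# Store an R function in the database script repository (this is what requires RQADMIN)
ore.scriptCreate("simple_summary", function(dat) summary(dat))

# Run the stored script in the database via embedded R execution
res <- ore.doEval(FUN.NAME = "simple_summary", dat = mtcars)
res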

To summarise, the DBA will have to grant the following to each schema that wants to use the full power of ORE.

GRANT CREATE MINING MODEL TO RQUSER;

GRANT RQADMIN to RQUSER;

A note of warning: Be careful which schemas you grant the RQADMIN role to. It is a powerful role that opens the database up to the powerful features of R. Following typical DBA best practice, the RQADMIN role should only be granted to the people who require it.

Tuesday, April 29, 2014

Installing ORE - Part C - Issue installing ORE on Windows Server

In my previous two blog posts (Part-A and Part-B) I detailed 4 steps for how you can install ORE on your servers and on your client machines.

I also mentioned a possible issue you may encounter if you try to install ORE on a Windows server. This blog post will look at this issue and how you can work around it and get ORE installed.

The problem occurred when I went to install the ORE supporting packages.

I was prompted to install these into a new library directory. If you get this message then something is wrong and you should not proceed with installing these packages. If you do proceed and install them into a new library directory then they will not be seen by ORE or the database (as they were not installed in $ORACLE_HOME/R/library), and when you go to run ORE from within R you will get errors like the following:

package ‘Cairo’ successfully unpacked and MD5 sums checked

package ‘DBI’ successfully unpacked and MD5 sums checked

package ‘png’ successfully unpacked and MD5 sums checked

Warning: cannot remove prior installation of package ‘png’

package ‘ROracle’ successfully unpacked and MD5 sums checked

Warning: cannot remove prior installation of package ‘ROracle’

If I then try ore.connect I get the following errors.

ore.connect(user="RQUSER", sid="orcl", host="localhost", password="RQUSER", port=1521, all=TRUE)

Loading required package: ROracle

Error in .ore.oracleQuerySetup() :

ORACLE connection requires ROracle package

In addition: Warning message:

In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, : there is no package called ‘ROracle’


To overcome this ORE install issue, all you need to do is close down your R GUI and then add the following lines to the Rprofile file. The Rprofile file is located in the R\etc directory, for example C:\Program Files\R\R-3.0.1\etc.

# Add $ORACLE_HOME/R/library to .libPaths() for ORE packages

.libPaths("C:/app/oracle/product/11.2.0/dbhome_1/R/library")

The above line tells R to include the R directory in the Oracle home as part of its search path. You may need to change the directory above to point to your own Oracle home. When you next open the R GUI the path above will be included. Now you can install the packages again, and this time they will be installed in $ORACLE_HOME/R/library.
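
As a quick hedged check, once you have restarted the R GUI you can confirm that the Oracle home library is on the search path, and you can also direct the supporting packages into that library explicitly from the command line. The paths shown are examples and will differ on your machine.

# Confirm the Oracle home R library is now on the search path
.libPaths()

# Install a supporting package directly into the Oracle home library
install.packages("C:/app/supporting/ROracle_1.1-11.zip", repos = NULL,
                 lib = "C:/app/oracle/product/11.2.0/dbhome_1/R/library")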

When you open the R GUI and run the commands to load the ORE package and to connect to your ORE schema, you should not receive any error messages.

> library(ORE)

> ore.connect(user="RQUSER", sid="orcl", host="localhost", password="RQUSER", port=1521, all=TRUE)


Now you should have ORE installed and working on your Windows server.

Thursday, April 24, 2014

Installing ORE - Part B

This is the second part of a two part blog post on installing ORE.

In reality there are 3 blog posts on installing ORE. The third and next blog post will be on a particular issue you might encounter on a Windows server and how you can overcome it.

In the previous blog post I outlined the steps needed to install ORE on the database server and on the client machine. Click here to go to this post.

In this blog post I will show you how to setup a schema for ORE and how to get connected to the schema using ORE.


Step 3 : Setting up your Schema to use ORE / Tasks for your DBA

On the server, where you unzipped the ORE download, you will find a demo_user.bat script (or something similar, like demo_user.sh on Linux).

After the script has performed some checks, you will be asked if you want to create a demo schema. Enter yes for this task to be completed and the RQUSER schema will be created in your database. Then enter the password for the RQUSER.

The RQUSER has a small set of system privileges that allow it to connect to and perform some functions on the database. These include:

GRANT CREATE TABLE TO RQUSER;

GRANT CREATE PROCEDURE TO RQUSER;

GRANT CREATE VIEW TO RQUSER;

GRANT CREATE MINING MODEL TO RQUSER;


NOTE: If you cannot connect to the database using the RQUSER and the password you set, then you might need to also grant CONNECT and RESOURCE to it too.

For every schema that you want to access using ORE you will need to grant the above privileges to it.

In addition to these grants, if you want a schema to be able to create and drop R scripts in the database then you will also need to grant it the additional role of RQADMIN.

sqlplus / AS SYSDBA

GRANT RQADMIN to RQUSER;


NB: You will need to grant RQADMIN to any schema where you want to use the embedded ORE feature in the database.


Step 4 : Connecting to the Database

If you have completed all of the above steps you are now ready to use ORE to connect to your database. The following is an example of the ore.connect command that you can use. It assumes the RQUSER has the password RQUSER, and that the host is the local machine (localhost). Replace localhost with the host name of your database server and also change the SID to that of your database.

ore.connect(user="rquser", sid="orcl", host="localhost", password="rquser", port=1521, all=TRUE);

If you get no errors and you get the R prompt back then you are connected to the RQUSER schema in your database.

To test that the connection was made you can run the following ORE command and then list the tables in the schema.

> ore.is.connected()

[1] TRUE

> ore.ls()

character(0)

The output of the last line above tells us that we do not have any tables in our RQUSER schema. I will have more blog posts on how you can use ORE and perform various ORE analytics in future posts.
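
If you want ore.ls() to return something, a quick hedged test is to push a local data frame into the database, either as a temporary object or as a persistent table. The table name is just an example.

# Create a temporary ore.frame from a local data.frame
CARS <- ore.push(mtcars)

# Or persist it as a table called MTCARS in the RQUSER schema
ore.create(mtcars, table = "MTCARS")

# ore.ls() should now include "MTCARS"
ore.ls()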

A series of demonstrations comes with ORE. To access these, type in the following command, which will list the available ORE demos.

> demo(package="ORE")

The following command illustrates how you can run the ORE demo called basic.

> demo(basic, package="ORE")

Also check out the Part C blog post on how to resolve a potential install issue on a Windows server.

Tuesday, April 22, 2014

Installing ORE - Part A

This blog post will look at how you can go about installing ORE in your environment.

The install involves 4 steps. The first step is the install on the Oracle Database server. The second step involves the install on your client machine. The third step involves creating a schema for ORE. The fourth step is connecting to the database using ORE.

In this Part A blog post I will cover the first two steps in this process. The other steps will be covered in another blog post.

NB: At the time of writing this blog post, ORE 1.4 cannot be installed on a 12c database if it has a CDB/PDB configuration. If you want to use ORE with 12c then you need to do a traditional install that does not create a CDB with a PDB. The ORE team are working hard on this and I'm sure it will be available in the next release (or two or ...) of ORE.

Step 1 : Installing ORE on the Database Server

Before you begin looking at ORE you need to ensure that you have the correct version of the database. If you have version 11.2.0.3 or 11.2.0.4 then you can go ahead and perform the installation below. But if you have 11.2.0.1 or 11.2.0.2 then you will need to apply a patch to your database. See my note above about 12c.

Download the Oracle R Distribution from their website. Download here.

Although you can use the standard version of R, the Oracle R Distribution comes with some highly tuned packages. If you are going to use the standard R download then you will need to ensure that you download the correct version. ORE 1.4 requires R version 3.0.1, which is not the current version of R.
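
A quick hedged check of the R version you have installed, before going any further:

# ORE 1.4 is certified against R 3.0.1, so check what you are running
R.version.string
getRversion()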

Accept the defaults during the installation, and within a minute or two the R software will be installed.

Download the Oracle R Enterprise software. Download here. This will include the Server and Supporting downloads.

Uncompress the downloaded ORE files and go to the server directory. Here you will find the install.bat script (or other similar name for your platform).

Make sure your ORACLE_HOME and ORACLE_SID environment variables are set.

A number of environment settings and environment variables are checked. When prompted, accept the defaults.

When prompted for the password for the RQSYS user, enter an appropriate password and take careful note of it.

Now go back to the Oracle download page for ORE and download the supporting packages. Unzip the downloaded file. Noting the directory they were unzipped into, you can now install them in R. To do this, open R and run the following commands. You will need to change the directory to where these are located on your server.

install.packages("C:/app/supporting/ROracle_1.1-11.zip", repos=NULL)

install.packages("C:/app/supporting/DBI_0.2-7.zip", repos=NULL)

install.packages("C:/app/supporting/png_0.1-7.zip", repos=NULL)

install.packages("C:/app/supporting/cairo_1.5-5.zip", repos=NULL)


Or you can use the R GUI to install these packages.

WARNING: If you are installing on a Windows server you may encounter some issues when importing these packages. I will have a separate blog post on this soon.

NB: The ORE installation instructions make reference to Cairo_1.5-2.zip. This is incorrect. ORE 1.4 comes with Cairo_1.5-5.zip.

At this point, assuming you didn't have any errors, you now have ORE installed on your server.


Step 2 : Installing ORE on the Client

Download the Oracle R Distribution from their website. Download here.

NOTE: If your database and client are on the one machine then there is no need to install the Oracle R Distribution again.

The client install is much simpler and less involved. After you have installed the Oracle R Distribution, the next step is to install the client packages for ORE. These can be downloaded from here.

After you have unzipped the file you can use the 'install packages from zip file' feature of the R GUI, or RStudio, to install the ORE client packages.

Next you can install the supporting packages (the same ones that were used in the server install). Unzip them and then use the R GUI or RStudio to install them. These supporting packages can be downloaded from here.

That should be the client R software and ORE packages installed on your client machine. The next step is to test a connection to your Oracle database using ORE. Before you can do that you will need to set up a schema in the database to use R and also grant the necessary privileges to any other schemas that you want to access using R.
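
A minimal hedged check that the client install worked, before moving on to Part B, is to load the ORE package and confirm its version.

library(ORE)              # loads ORE and its dependent packages
packageVersion("ORE")     # should report 1.4 if the client packages installed correctly
ore.is.connected()        # FALSE at this point, as no connection has been made yet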


Check out my next blog post (Installing ORE - Part B) for Steps 3 and 4.

Also check out the Part C blog post on how to resolve a potential install issue on a Windows server.

Thursday, March 27, 2014

Oracle BigDataLite version 2.5.1 is now available

Back at the end of January Oracle finally got round to releasing the updated version of the Oracle BigDataLite virtual machine. Check out my previous blog post on this.

Yesterday (27th March) I saw on Facebook that a new updated version of the BigDataLite VM was released. I must have missed the tweet and other publicity on this somewhere :-(

This is a great VM that allows you to play with the various Big Data technologies without the hassle of going through the whole install and configuration process.

If you are interested in this then here are the details of what it contains and where you can find more details.

The following components are included on Oracle Big Data Lite Virtual Machine v 2.5:

Oracle Enterprise Linux 6.4

Oracle Database 12c Release 1 Enterprise Edition (12.1.0.1)

Cloudera’s Distribution including Apache Hadoop (CDH4.6)

Cloudera Manager 4.8.2

Cloudera Enterprise Technology, including:

   Cloudera RTQ (Impala 1.2.3)

   Cloudera RTS (Search 1.2)

Oracle Big Data Connectors 2.5

   Oracle SQL Connector for HDFS 2.3.0

   Oracle Loader for Hadoop 2.3.1

   Oracle Data Integrator 11g

   Oracle R Advanced Analytics for Hadoop 2.3.1

   Oracle XQuery for Hadoop 2.4.0

Oracle NoSQL Database Enterprise Edition 12cR1 (2.1.54)

Oracle JDeveloper 11g

Oracle SQL Developer 4.0

Oracle Data Integrator 12cR1

Oracle R Distribution 3.0.1


Go to the Oracle Big Data Lite Virtual Machine landing page on OTN to download the latest release.

Wednesday, March 26, 2014

Predicting using ORE package

In a previous post I gave an overview of the various in-database data mining algorithms that you can use in your Oracle R Enterprise scripts.

To create data mining models based on those algorithms you need to use the ore.odm functions.

After you have developed and tested your models you will select one of these to score your new data.

How can you do this using ORE? There is a suite of ORE functions called ore.predict that you can use to apply your data mining model to score or label new data.

The following table lists the ore.predict functions:

ORE Predict Function | Description
ore.predict-glm | Generalized linear model
ore.predict-kmeans | k-Means clustering model
ore.predict-lm | Linear regression model
ore.predict-matrix | A matrix with no more than 1000 rows
ore.predict-multinom | Multinomial log-linear model
ore.predict-nnet | Neural network models
ore.predict-ore.model | An Oracle R Enterprise model
ore.predict-prcomp | Principal components analysis on a matrix
ore.predict-princomp | Principal components analysis on a numeric matrix
ore.predict-rpart | Recursive partitioning and regression tree model


As you will see from the above table, there are more ore.predict functions than there are ore.odm functions. The reason for this is that ORE comes with some additional data mining algorithms, in addition to the subset of Oracle Data Mining algorithms that it uses. These include ore.glm, ore.lm, ore.neural and ore.stepwise.

You also need to watch out for the data mining algorithms that are not used in prediction. These include the Minimum Description Length, Apriori and Non-Negative Matrix Factorization.

Remember that these ore.predict functions are run inside the Oracle Database. No data is extracted to the data analyst's laptop or desktop; all the data stays in the database, and the ORE functions are run in the database on that data.
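
As a hedged example of this workflow, the following sketch builds a linear regression model in the database with ore.lm and then scores the same data with ore.predict. It assumes an existing ORE connection, and the iris data is pushed to the database purely for illustration.

library(ORE)
ore.connect(user="RQUSER", sid="orcl", host="localhost", password="RQUSER", port=1521, all=TRUE)

IRIS <- ore.push(iris)   # an ore.frame held in the database

# Build the regression model in the database
mod <- ore.lm(Sepal.Length ~ Petal.Length + Petal.Width, data = IRIS)

# Score (label) the data in the database - no data is pulled back to the client
scored <- ore.predict(mod, newdata = IRIS)
head(scored)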

Sunday, March 23, 2014

Using the in-database ODM algorithms in ORE

Oracle R Enterprise is Oracle's version of R that runs in the database instead of on your laptop or desktop.

Oracle already has a significant number of data mining algorithms in the database. With ORE they have exposed these so that they can be easily called from your R (ORE) scripts.

To access these in-database data mining algorithms you will need to use the ore.odm functions.

ORE is continually being developed, with new functionality being added all the time. Over the past 2 years Oracle have released an updated version of ORE about every 6 months. ORE is generally not certified with the latest version of R, but it is only slightly behind, by a point release or two. For example, the current version of ORE 1.4 (released only last week) is certified for R version 3.0.1, but the current release of R is 3.0.3.

Will ORE work with the latest version of R? The simple answer is maybe, or in theory it should, but it is not certified.

Let's get back to ore.odm. The following table maps the ore.odm functions to the in-database Oracle Data Mining algorithms.

ORE Function | Oracle Data Mining Algorithm | What the Algorithm can be used for
ore.odmAI | Minimum Description Length | Attribute Importance
ore.odmAssocRules | Apriori | Association Rules
ore.odmDT | Decision Tree | Classification
ore.odmGLM | Generalized Linear Model | Classification and Regression
ore.odmKMeans | k-Means | Clustering
ore.odmNB | Naïve Bayes | Classification
ore.odmNMF | Non-Negative Matrix Factorization | Feature Extraction
ore.odmOC | O-Cluster | Clustering
ore.odmSVM | Support Vector Machines | Classification and Regression

As you can see, we only have a subset of the in-database Oracle Data Mining algorithms. This is a pity really, but I'm sure as we get newer releases of ORE more of these will be added.
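
To make this concrete, here is a hedged sketch that uses one of these functions, ore.odmKMeans, to build an in-database clustering model. It assumes an existing ORE connection, and the iris data is pushed to the database just for the example.

library(ORE)
ore.connect(user="RQUSER", sid="orcl", host="localhost", password="RQUSER", port=1521, all=TRUE)

IRIS <- ore.push(iris)

# Build a k-Means clustering model in the database with 3 clusters
km.mod <- ore.odmKMeans(~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                        data = IRIS, num.centers = 3)
summary(km.mod)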

Sunday, March 16, 2014

ORE 1.4 New Parallel feature

Oracle R Enterprise (ORE) 1.4 has just been released and can be downloaded from here. Remember there is a client and a server-side install required, and ORE 1.4 is certified against R 3.0.1 and the Oracle R Distribution.


One of the interesting new features is the PARALLEL option. You can use it to significantly improve the performance of your server-side R code by taking advantage of the PARALLEL database option. You can set the degree of parallelism at a global level in your code by using the ore.parallel setting.

The default setting for ore.parallel is FALSE, or 1. Otherwise it must be set to a minimum of 2 or more to enable the Parallel database option.

Alternatively you can set ore.parallel to TRUE to use the default degree of parallelism that is set for the database object, or set it to NULL to use the default database setting.

You will also be able to set the degree of parallelism (DOP) using the parallel-enabled functions ore.groupApply, ore.rowApply and ore.indexApply.
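
For example, here is a hedged sketch of ore.groupApply using its parallel argument to request the DOP for the group-apply operation. It assumes an existing ORE connection and uses the iris data for illustration.

library(ORE)
ore.connect(user="RQUSER", sid="orcl", host="localhost", password="RQUSER", port=1521, all=TRUE)

IRIS <- ore.push(iris)

# Count the rows in each Species group, requesting up to 4 parallel R engines
res <- ore.groupApply(IRIS, INDEX = IRIS$Species,
                      function(dat) nrow(dat),
                      parallel = 4)

# Bring the result back to the client as a list
ore.pull(res)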

They have also made available, or as they say 'exposed', some more of the in-database Oracle Data Mining algorithms. These include the ODM algorithm for Association Rules (ore.odmAssocRules), the feature extraction algorithm Non-Negative Matrix Factorization (NMF) (ore.odmNMF), and the ODM clustering algorithm O-Cluster (ore.odmOC).

Watch out for some blog posts on these over the coming weeks.


Check out the OTN page for the R Technologies from Oracle


Friday, July 12, 2013

Oracle 12c Advanced Analytics Option new features

With the release of Oracle 12c (finally), we now have a lot of learning to do. Oracle 12c is a different beast to what we have been used to up to now.

As part of 12c there are a number of new in-database Advanced Analytics features. These are separate from the Advanced Analytics new features that come as part of the Oracle Data Miner tool, which is part of SQL Developer.

This post will only look at the new features that are part of the 12c Database. The new in-Database Advanced Analytics features include:

  • Using Decision Trees for text analysis is now possible. Up to now (11.2) when you wanted to do text classification you had to exclude Decision Trees from the process. This was because the Decision Tree algorithm could not support nested data.
  • Additionally, for text mining, some of the text processing has been moved from being a separate step to being part of some of the algorithms.
  • A number of additional features are available for Clustering. These include a cluster distance (from the centroid) and details functions.
  • There is a new clustering algorithm (in addition to the k-Means and O-Cluster algorithms), called the Expectation Maximization algorithm. This creates a density model that can give better results when data from different domains are combined for clustering. This algorithm will also determine the optimal number of clusters.
  • There are two new Feature Extraction methods that are scalable for high dimensional data, large numbers of records, and both structured and unstructured data. These can be used to reduce the number of dimensions to use as input to the data mining algorithms. The first of these is called Singular Value Decomposition (SVD) and is widely used in text mining. The second method, which can be considered a special scoring method of SVD, is called Principal Component Analysis (PCA). This method produces projections that are scaled with the data variance.
  • A new feature of the GLM algorithm is that it can perform a feature selection step. This is used to reduce the number of predictors used by the algorithm and allow for faster builds. It makes the outputs more understandable and the model more transparent. This feature is not on by default, so you will need to turn it on if you want to use it with the GLM algorithm.
  • In previous versions of the database there could be some performance issues related to the data types used. In 12c these have been addressed for BINARY_DOUBLE and BINARY_FLOAT, so if you are using these data types you should now see faster scoring of the data in 12c.
  • There is a new in-database feature called Predictive Queries. This allows on-the-fly models: temporary models that are formed as part of an analytics clause. These models cannot be tuned and you cannot see the details of the model produced. They are formed for the query and do not exist afterwards. An example is shown below.
    SELECT cust_id, age, pred_age, age-pred_age age_diff, pred_det FROM
      (SELECT cust_id, age, pred_age, pred_det,
              RANK() OVER (ORDER BY ABS(age-pred_age) DESC) rnk FROM
        (SELECT cust_id, age,
                PREDICTION(FOR age USING *) OVER () pred_age,
                PREDICTION_DETAILS(FOR age ABS USING *) OVER () pred_det
         FROM mining_data_apply_v))
    WHERE rnk <= 5;


  • There is a new function called PREDICTION_DETAILS. This allows you to see what the algorithm used to make the prediction. For example, if we want to score a customer to see if they will churn, we can use the PREDICTION and PREDICTION_PROBABILITY functions to do this and to see how strong this prediction is. With PREDICTION_DETAILS we can now see what attributes and values the algorithm used to make that particular prediction. The output is in XML format.





These are the new in-database Advanced Analytics (Data Mining) features. Apart from the new algorithms or changes to them, most of the other changes give greater transparency into what the algorithms/models are doing. This is good as it allows us to better understand and see what is happening.



The rest of the new Advanced Analytics Option new features will be part of Oracle Data Miner tool in SQL Developer 4. My next blog post will cover the new features in SQL Developer 4.



I haven’t mentioned anything about ORE. The reason for that is that it comes as a separate install and its current version 1.3 works the same in 11.2.0.3 as well as 12c. I’ve had some previous blog posts on this and you can check out the ORE website on OTN.

Saturday, January 19, 2013

BIWA Oracle Data Scientist Certificate

Last week I had the opportunity to present at the BIWA Summit conference. This was held in the Sofitel Hotel beside the Oracle HQ buildings at Redwood Shores, just outside of San Francisco.

This conference was a busy 2 days, with 4 parallel streams of presentations and another stream for Hands-on Labs. The streams covered Big Data, Advanced Analytics, Business Intelligence and Data Warehousing. There were lots of great presentations from well-known names in the subject areas.

The BIWA Oracle Data Scientist Certificate was launched at the summit. The requirements for this certificate were to attend my presentation on ‘The Oracle Data Scientist’ (this was compulsory) and then to attend a number of other data science related presentations and hands-on labs. In addition to these presentations there is a short exam to take. This consists of some 30-ish questions, which were based on my presentation and some of the other presentations and hands-on labs. The main topic areas covered in the exam include what data science is about, Oracle Data Miner, Oracle R Enterprise and then some questions based on the keynotes, in particular the keynote by Ari Kaplan.

There are a few days left to take the exam. Your answers to the questions will be reviewed and you should receive an email within a couple of days with your result and hopefully your certificate.

https://www.surveymonkey.com/s/BiwaSummitDataScientistCertificate


This was my first trip to Redwood Shores and I had some time to go for a walk around the Oracle HQ campus. Hopefully it won’t be my last. Here is a photo I took of some of the Oracle buildings.

[Photo of some of the Oracle HQ buildings]

The BIWA Summit conference returns to Redwood Shores again in 2014 around the 14th and 15th January. It will be in the Oracle Conference centre that is part of the Oracle HQ campus.

Maybe I’ll see you there in 2014.

Tuesday, December 18, 2012

BIWA Summit–9th & 10th January, 2013

The BIWA Summit will be on the 9th and 10th January, 2013. It is being held in the Sofitel Hotel beside the Oracle HQ at Redwood Shores, just outside of San Francisco.

The BIWA Summit looks to be the leading event in 2013 focused on Analytics, Data Warehousing, Big Data and BI. If you are a data architect or a data scientist this is certainly one event that you should consider attending in 2013.

All the big names (in the Oracle world) will be there: Tom Kyte, Mark Rittman, Maria Colgan, Balaji Yelmanchili, Vaishnavi Sashikanth, Charlie Berger, Mark Hornick, Karl Rexter, Tim and Dan Vlamis.

Oh and then there is me. I’ll be giving a presentation on the Oracle Data Scientist. This will be on the first day of the event (9th) at 11:20am.

For anyone interested in the Oracle Data Scientist world there are lots of presentations to help you get started and up to speed in this area. Here is a list of presentations and hands-on labs that I can recommend.

[Image: list of recommended presentations and hands-on labs]

As is typical with all good conferences there are many presentations on at the same time that I would like to attend. If only I could time travel.

This is a great event to start off the new year, and for everyone who is thinking of moving into or commencing a project in the area. So ask your manager if there is any training budget left for 2012, or get first dibs on the training budget for 2013.

Registration is open and at the moment the early bird discount still seems to be available. You can also book a room in the hotel using the registration page.

To view the full agenda – click here

Friday, November 16, 2012

Accepted for BIWA Summit–9th to 10th January

I received an email today to say that I had a presentation accepted for the BIWA Summit. This conference will be in the Sofitel Hotel beside the Oracle HQ in Redwood City.

The title of the presentation is “The Oracle Data Scientist” and the abstract is

Over the past 18 months we have seen a significant increase in the demand for Data Scientists. But how does someone become a data scientist? If we examine the requirements and job descriptions of this role we can see that being able to understand and process data are fundamental skills. So an Oracle developer is ideally suited to being a Data Scientist. The presentation will show how an Oracle developer can evolve into a data scientist through a number of stages, including BI developer, OBIEE developer, statistical analysis, data miner and data scientist. The tasks and tools will be discussed and explored through each of these roles. The second half of the presentation will focus on the data mining functionality available in SQL and PL/SQL. This will consist of a demonstration of an Analytics Development environment and how you can migrate (and use) your models in a Production environment.

For some reason Simon Cowell of XFactor fame kept on popping into my head and it now looks like he will be making an appearance in the presentation too. You will have to wait until the conference to find out what Simon Cowell and Being an Oracle Data Scientist have in common.

Check out the BIWA Summit website for more details and to register for the event.

I’ll see you there :-)

Thursday, October 25, 2012

New features for Developers in Oracle 12c & Tools

Continuing on from my previous posts on new features in the Advanced Analytics Option and the 12c Database, this post will focus on the proposed new features for developers in 12c and in the new releases of the development tools.

Health Warning: As with all the presentations at OOW that talked about what may or may not be in the next release, there is no guarantee that these features will actually be in the released version of the database. Here is the slide that gives the Safe Harbor statement.

[Safe Harbor slide]

  • APEX 4.2 is out now and has lots of new features, in particular features for creating mobile applications including transitions, gestures, changes in orientation and HTML 5 support. Coming in 12c, APEX will be able to support the pluggable database environment. You will have the option to install APEX in the container database or in the pluggable databases. It will also support the extended VARCHAR2 size.
  • SQL Developer will have Data Pump support to allow for fast movement of data and for scheduling of the movements. The Database Difference tool has been redesigned to give more options and a more reliable comparison. There is a redesigned Database Copy (more options), improved Migrations and PDF report generation. SQL Developer is now the admin tool to manage the APEX Listener. UI improvements include more/better drag and drop, GRANT statement support in the SQL editor and database Doc reporting. A new release of SQL Developer will be made available with the release of 12c that includes all the 12c new features.
  • Better compression of data being sent between the client and server. So how you use the ORDER BY clause will become more important.
  • We will now have BOOLEAN in 12c, but only in PL/SQL :-(
  • 12c will allow you to grant ROLES to PL/SQL program units. Or we could specify a White List that lists what other code units can call your code. This is a great security enhancement, although it involves more admin work, but it is worth it.
  • 12c will allow you to include PL/SQL in the WITH clause.
  • 12c will allow you to create duplicate indexes on the same set of columns. Sometimes you might want two different types of index on the same data, for example a bit map index and a b-tree index.
  • Cursor result sets can now be returned implicitly instead of using the existing explicit method.
  • The warning messages produced when compiling our PL/SQL code can be filtered based on whether they are Severe, Performance related or just Informational. This can be set at a System or Session level.

alter session set plsql_warnings='enable:severe';

alter session set plsql_warnings='enable:performance';

alter session set plsql_warnings='enable:informational';

There was a large number of exhibits at OOW. All of them were giving things away. For some of these you had to endure a sales pitch. One of the popular types of giveaway was a t-shirt. If you really wanted to, you could get enough t-shirts to keep you going for a few years. I popped into the exhibits for JavaOne and the pictures below are of my favourite t-shirt from OOW, by CloudBees.

[Photos of the CloudBees t-shirt]

Some of the exhibits were also giving away money. If you would sit through a 10 minute presentation you were given a ticket, and if your number was picked you could win anything from $20 up to $100. Many thanks to Intel :-)

Saturday, October 20, 2012

Oracle Advanced Analytics Option in Oracle 12c

At Oracle Open World a few weeks ago there was a large number of presentations on Big Data and Analytics. Most of these were marketing type presentations, with a couple of presentations on using R and how it can now be integrated into the Oracle Database 11.2.

In addition to these, there was one presentation that focused on the Oracle Advanced Analytics (OAA) Option.

The Oracle Advanced Analytics Option covers the Oracle Data Mining features and the Oracle R Enterprise features in the Database.

The purpose of this blog post is to outline and summarise what was mentioned at these presentations, and will include what changes are/may be coming in the “Next Release” of the database i.e. Oracle 12c.

Health Warning: As with all the presentations at OOW that talked about what may or may not be in the next release, there is no guarantee that the features will actually be in the released version of the database. Here is the slide that gives the Safe Harbor statement.

[Safe Harbor slide]

  • 12c will come with R embedded into it. So there will be no need for any configurations.
  • Oracle R client will come as part of the server install.
  • Oracle R client will be able to use the Analytics functions that exist in the database.
  • Will be able to run R code in the database.
  • The database (12c) will be able to spawn multiple R engines.
  • Will be able to emulate map-reduce style algorithms.
  • There will be a new PREDICTION function, replacing the existing (11g) functionality. This will combine the steps of building a model and applying it to the data to be scored into one function. But we will still need the functionality of the existing PREDICTION function that is in 11g, so it will be interesting to see how this functionality will be kept in addition to the new functionality being proposed in 12c.
  • The Oracle Data Miner tool will still exist and will have many new features. It was also referred to as the ‘OAA Workflow’. So does this indicate a potential name change? We will have to wait and see.
  • Oracle Data Miner will come with a new additional graphing feature. This will be in addition to the Explore Node and will allow us to produce more typical attribute related graphs. From what I could see these would be similar to the type of box plot, scatter, bar chart, etc. graphs that you can get from R.
  • There will be a number of new algorithms too, including a useful One Class Support Vector Machine. This can be used when we have a data set with just one class value. This algorithm will work out which records/cases are more important than others.
  • There will be a new SQL node. This will allow us to write our own data transformation code.
  • There will be a new node to allow the calling of R code.
  • The tool also comes with a slightly modified layout and colour scheme.

Again, the points that I have given above are just my observations. They may or may not appear in 12c, or maybe I misunderstood what was being said.

It certainly looks like we will have an integrated analytics environment in 12c, with full integration of R and the ODM in-database features.

Monday, April 23, 2012

Oracle Analytics Sessions at COLLABORATE12

There are a number of Oracle Advanced Analytics and related topics taking place this week at COLLABORATE12 in Las Vegas (http://collaborate12.com).

Date | Time | Presentation | Presenter
Sun 22nd | 9:00-3pm | Oracle Business Intelligence Application Journey | (not listed)
Mon 23rd | 9:45-10:45 | Managing Unstructured Data using Hadoop, Oracle 11g and Oracle Exadata Database Machine | Jim Steiner
Mon 23rd | 9:45-10:45 | Environmental Data Management and Analytics - a Real World Perspective | Angela Miller
Mon 23rd | 11-12 | Public Safety and Environmental Real-Time Analytics using Oracle Business Intelligence | Raghav Venkat, Therese Arguelles
Mon 23rd | 11-12 | BI is more than slice and dice | Peter Scott
Mon 23rd | 14:30-15:30 | In-Database Analytics: Predictive Analytics, Data Mining, Exadata & Business Intelligence | Jacek Myczkowski
Mon 23rd | 15:45-16:45 | Big Data Analytics, R you ready | Mark Hornick, Shyam Nath
Tues 24th | 10:45-11:45 | BI Analytics and Oracle NoSQL. The Future of Now | Manish Khera
Wed 25th | 8:15-9:15 | Oracle Data Mining – A Component of the Oracle Advanced Analytics Option - Hands-on Lab | Charlie Berger
Wed 25th | 9:30-10:30 | Oracle R Enterprise – A Component of the Oracle Advanced Analytics Option - Hands-on Lab | Mark Hornick

Here are the abstracts from the two main Oracle Advanced Analytics presentations by Charlie Berger and Mark Hornick.

Oracle Data Mining – A Component of the Oracle Advanced Analytics Option

This Hands-on Lab provides an introduction to Oracle Data Mining and the Oracle Data Miner GUI.

Oracle Data Mining (ODM), now part of Oracle Advanced Analytics, provides an extensive set of in-database data mining algorithms that solve a wide range of business problems. It can predict customer behavior, detect fraud, analyze market baskets, segment customers, and mine text to extract sentiments. ODM provides powerful data mining algorithms that run as native SQL functions for in-database model building and model deployment. There is no need for the time delays and security risks of data movement.

The free Oracle Data Miner GUI is an extension to Oracle SQL Developer 3.1 that enables data analysts to work directly with data inside the database, explore the data graphically, build and evaluate multiple data mining models, apply ODM models to new data, and deploy ODM’s predictions and insights throughout the enterprise. Oracle Data Miner work flows capture and document the user's analytical methodology and can be saved and shared with others to automate advanced analytical methodologies.

Oracle R – A component of the Oracle Advanced Analytics Option

This Hands-on Lab provides an introduction to Oracle R Enterprise.

Oracle R Enterprise, a part of the Oracle Advanced Analytics Option, makes the open source R statistical programming language and environment ready for the enterprise by integrating R with Oracle Database. R users can interactively and transparently execute R scripts for statistical and graphical analyses on data stored in Oracle Database. R scripts can be executed in Oracle Database using potentially multiple database-managed R engines - resulting in data parallel execution. ORE also provides a rich set of statistical functions and advanced analytics techniques.

In this lab, attendees will be introduced to Oracle's strategy for R, including the Oracle R Distribution, Oracle R Enterprise (ORE), and Oracle R Connector for Hadoop (ORCH). We will focus on Oracle R Enterprise with hands-on exercises exploring the transparency layer, embedded R execution, and statistics engine.