Wednesday, December 28, 2016

2016: A review of the year

As 2016 draws to a close I like to look back at what I have achieved over the year. Most of the following achievements are based on my work with the Oracle User Group community. I have some other achievements that are related to the day jobs (yes, I have multiple day jobs), but I won't go into those here.

As you can see from the following, 2016 was another busy year. There was lots of writing, which I really enjoy and will be continuing with in 2017. As they say, watch this space for writing news in 2017.

Books

Yes, 2016 was a busy year for writing: most of the latter half of 2015 and the first half of 2016 was taken up writing two books. Yes, two books. One of the books was on Oracle R Enterprise, and it complements my previously published book on Oracle Data Mining. I now have books that cover both components of the Oracle Advanced Analytics Option.

I also co-wrote a book with legends of the Oracle community: Arup Nanda, Martin Widlake, Heli Helskyaho and Alex Nuijten.


More news coming in 2017.

Blog Posts

One of the things I really enjoy doing is playing with various features of Oracle and then writing some blog posts about them. When writing the books I had to cut back on writing blog posts. I was lucky to be part of the 12.2 Database beta this year, and over the past few weeks I've been playing with 12.2 in the cloud. I've already written a blog post or two on this and I also have an OTN article on this coming out soon. There will be more 12.2 analytics related blog posts in 2017.

In 2016 I wrote 55 blog posts (including this one). This number is a little lower than in previous years. I'll blame the book writing for this. But more posts are in the works for 2017.

Articles

In 2016 I've written articles for OTN and for Toad World. These included:

OTN
  1. Oracle Advanced Analytics : Kicking the Tires/Tyres
  2. Kicking the Tyres of Oracle Advanced Analytics Option - Using SQL and PL/SQL to Build an Oracle Data Mining Classification Model
  3. Kicking the Tyres of Oracle Advanced Analytics Option - Overview of Oracle Data Miner and Build your First Workflow
  4. Kicking the Tyres of Oracle Advanced Analytics Option - Using SQL to score/label new data using Oracle Data Mining Models
  5. Setting up and configuring RStudio on the Oracle 12.2 Database Cloud Service
ToadWorld
  1. Introduction to Oracle R Enterprise
  2. ORE 1.5 - User Defined R Scripts

Conferences

  1. January - Yes SQL Summit, NoCOUG Winter Conference, Redwood City, CA, USA **
  2. January - BIWA Summit, Oracle HQ, Redwood City, CA, USA **
  3. March - OUG Ireland, Dublin, Ireland
  4. June - KScope, Chicago, USA (3 presentations)
  5. September - Oracle Open World (part of EMEA ACEs session) **
  6. December - UKOUG Tech16 & APPs16

** For these conferences the Oracle ACE Director programme funded the flights and hotels. All other expenses, and all other conferences, I paid for out of my own pocket.

OUG Activities

I'm involved in many different roles in the user group. The UKOUG also covers Ireland (incorporating OUG Ireland), and my activities within the UKOUG included the following during 2016:

  • Editor of Oracle Scene: We produced 4 editions in 2016. Thank you to all who contributed and wrote articles.
  • Created the OUG Ireland Meetup. We had our first meeting in October. Our next meetup will be in January.
  • OUG Ireland Committee member of TECH SIG and BI & BA SIG.
  • Committee member of the OUG Ireland 2 day Conference 2016.
  • Committee member of the OUG Ireland conference 2017.
  • KScope17 committee member for the Data Visualization & Advanced Analytics track.

I'm sure I've forgotten a few things; I usually do. But this gives you a taste of some of what I got up to in 2016.

Monday, December 19, 2016

Auditing Oracle Data Mining model usage

In a previous blog post I talked about how you can rename and comment your Oracle Data Mining models. This allows you to easily see and understand the intended use of each data mining model.

Another feature available to you is to audit the usage of the data mining models. As your data mining environment grows to tens or, more typically, hundreds of models, you will need some way of tracking their usage. This allows you to discover which models are frequently used and which are used infrequently or not at all. You can then use this information to investigate if there are any issues. In some companies I've even seen an internal charging scheme in place for each time the models are used.

The following outlines the steps required to set up the auditing of your models and how to inspect the usage.

Note: You will need the AUDIT_ADMIN role to audit the models.
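
For example, a minimal grant (assuming your DBA is happy to grant the role directly to your working schema, shown here as DMUSER) would be:

GRANT AUDIT_ADMIN TO dmuser;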

First create an audit policy for the data mining model in a particular schema.

CREATE AUDIT POLICY oaa_odm_audit_usage 
ACTIONS ALL 
ON MINING MODEL dmuser.high_value_churn_clas_svm;

This creates a policy that monitors all activity on the data mining model HIGH_VALUE_CHURN_CLAS_SVM in the DMUSER schema.

Now we need to enable the policy and allow it to track all activity on the model.

AUDIT POLICY oaa_odm_audit_usage BY oaa_model_user;

This will track all usage of the data mining model by the schema called OAA_MODEL_USER. We can then use the following query to search for the audit records for the OAA_MODEL_USER schema.

SELECT dbusername,
       action_name, 
       system_privilege_used, 
       return_code,
       object_schema, 
       object_name, 
       sql_text
FROM  unified_audit_trail
WHERE object_name = 'HIGH_VALUE_CHURN_CLAS_SVM';
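
Depending on how your unified audit trail is configured (immediate versus queued writes), you may need to flush the audit records to disk before they appear in this view. A minimal sketch of that step is:

EXEC DBMS_AUDIT_MGMT.FLUSH_UNIFIED_AUDIT_TRAIL;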

But there is a little problem with what I've just shown you above. The problem is that it will track all activity on the data mining model. Perhaps this isn't what we really want. Perhaps we only want to track certain activity on the data mining model. Instead of creating the policy using 'ACTIONS ALL', we can list out the actions or operations we want to track. For example, suppose we want to track when the model is used in a SELECT. The following shows how you can set this up for just SELECT.

CREATE AUDIT POLICY oaa_odm_audit_select 
ACTIONS SELECT 
ON MINING MODEL dmuser.high_value_churn_clas_svm;

AUDIT POLICY oaa_odm_audit_select BY oaa_model_user;

The list of individual audit events you can use includes:

  • AUDIT
  • COMMENT
  • GRANT
  • RENAME
  • SELECT

A policy can be set up to track one or more of these events. For example, if we wanted a policy to track SELECT and GRANT, we would have to list each event, separated by a comma.

CREATE AUDIT POLICY oaa_odm_audit_select_grant 
ACTIONS SELECT ON MINING MODEL dmuser.high_value_churn_clas_svm,
        GRANT ON MINING MODEL dmuser.high_value_churn_clas_svm;

AUDIT POLICY oaa_odm_audit_select_grant BY oaa_model_user;
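
If you later want to stop or remove the auditing, the usual unified auditing commands apply. A minimal sketch, using the policies created above, would be:

NOAUDIT POLICY oaa_odm_audit_usage BY oaa_model_user;

DROP AUDIT POLICY oaa_odm_audit_usage;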

Monday, December 12, 2016

Renaming & Commenting Oracle Data Mining Models

As your company evolves with its data mining projects, the number of models produced and in use in production will increase dramatically.

Care needs to be taken when it comes to managing these. This includes using meaningful names, adding descriptions of what the model is about or for, and being able to track their usage, etc.

I will look at tracking the usage of the models in another blog post, but the following gives examples of how to rename Oracle Data Mining models and how to add comments or descriptions to them. This is particularly useful when your data analytics team has a constant turnover, or when many months have passed since you last worked on a model and you want a quick reminder of what it was for.

If you have been using the Oracle Data Mining tool (part of SQL Developer) you will see your models being created with names containing some sort of sequence numbers. For example, for a Support Vector Machine (SVM) classification model you might see it labelled:

CLAS_SVM_5_22

While you are working on the project you will know and understand what the model is about and why it is being used. But afterwards you may forget, as you will be dealing with many hundreds of models. Yes, you could check your documentation for the purpose of this model, but that can take some time.

What if you could run a SQL query to find out?

But first we need to rename the model.

BEGIN
   DBMS_DATA_MINING.RENAME_MODEL('CLAS_SVM_5_22', 'HIGH_VALUE_CHURN_CLAS_SVM');
END;
/

Next we will want to add a longer description of what the model is about. We can do this by adding a comment to the model.

COMMENT ON MINING MODEL high_value_churn_clas_svm IS
'Classification Model to Predict High Value Customers most likely to Churn';

We can now see these updated details when we query the Oracle Data Mining models in a user schema.

SELECT model_name, mining_function, algorithm, comments 
FROM user_mining_models;
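
If you want to see the models in other schemas you have access to, the ALL_MINING_MODELS view includes an OWNER column and can be queried in the same way. For example:

SELECT owner, model_name, mining_function, algorithm, comments
FROM   all_mining_models
ORDER BY owner, model_name;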

These are two very useful commands.

Wednesday, December 7, 2016

12.2 DBaaS (Extreme Edition) possible bug/issue with the DB install/setup

A few weeks ago the 12.2 Oracle Database was released on the cloud. I immediately set up an account and got my 12.2 DBaaS created. This was a relatively quick and painless process.

For me, I wanted to test out all the new Oracle Advanced Analytics features and the new features in SQL Developer 4.2 that only become visible when you are using the 12.2 Database.

When you go to use Oracle Data Miner (the GUI tool) in SQL Developer 4.2, it will check to see if the ODMr repository is installed in the database. If it isn't, then you will be prompted for the SYS password.

This is where the problem arises. In previous versions of the DBaaS (12.1, etc.) this was not an issue.

When you go to create your DBaaS you are asked for a password that will be used for the admin accounts of the database.

But when I entered the password for SYS, I got an error saying invalid password.

After using ssh to create a terminal connection to the DBaaS I was able to connect to the container using

sqlplus / as sysdba

and also using

sqlplus sys/ as sysdba

Those worked fine. But when I tried to connect to the PDB1 I got the invalid username and/or password error.

sqlplus sys/@pdb1 as sysdba

I reconnected as follows

sqlplus / as sysdba

and then changed the password for SYS with container=all.
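
The command I used was along the lines of the following (password omitted):

ALTER USER sys IDENTIFIED BY ... CONTAINER=ALL;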

This command completed without errors, but when I tried using the new password to connect to the PDB1 I got the same error.

After two weeks working with Oracle Support they eventually pointed me to the issue: the password file for the PDB1 was missing. They claim this is due to an error when I was creating/installing the database.

But this was a DBaaS and I didn't install the database. This is a problem with how Oracle have configured the installation.

The answer was to create a password file for the PDB1 using the following

% orapwd file=$ORACLE_HOME/dbs/orapwPDB1 password= entries=10

I then changed the password again for SYS, then tried to connect as SYS to the PDB1, and as if by magic I was connected.

I then tried installing the ODMr repository again (in SQL Developer) and when I entered the new password for SYS, it worked!

It's a pity that it took Oracle Support 2 weeks to get me to this point.

As 12.2 is a cloud service, hopefully Oracle will get that issue fixed soon so that no one else has to suffer like I did.

Monday, December 5, 2016

Evaluating Cluster Dispersion in Oracle Data Mining

When working with the Clustering algorithms, and particularly k-Means, in the Oracle Data Miner tool there is no way of seeing how compact or dispersed the data is within a cluster.

There are a number of measures typically used in various tools and algorithms, but with Oracle Data Miner we are not presented with any of this information.

But if we flip from using the Oracle Data Miner tool to using SQL we can get to see some more details of the clusters produced by the k-Means algorithm along with some additional and useful information.

As I said, there are a number of different measures used to evaluate clusters. The one that Oracle uses is called Dispersion. Now there are a few different definitions of what this could be, and I haven't been able to locate Oracle's own definition of it in any of the documentation.

We can use the Dispersion value as a measure of how compact or how spread out the data is within a cluster. The Dispersion value is a number greater than 0. The lower the value, the more compact the cluster is, i.e. the data points are close to the centroid of the cluster. The larger the value, the more dispersed or spread out the data points are.

The DBMS_DATA_MINING PL/SQL package comes with a function called GET_MODEL_DETAILS_KM. This function returns a DM_CLUSTERS collection, where each record has the following structure.

(id                   NUMBER,
 cluster_id           VARCHAR2(4000),
 record_count         NUMBER,
 parent               NUMBER,
 tree_level           NUMBER,
 dispersion           NUMBER,
 split_predicate      DM_PREDICATES,
 child                DM_CHILDREN,
 centroid             DM_CENTROIDS,
 histogram            DM_HISTOGRAMS,
 rule                 DM_RULE)

We can now use the following query to get the Dispersion value for each of the clusters from an ODM cluster model.

SELECT cluster_id,
       record_count,
       parent,
       tree_level,
       dispersion
FROM  table(dbms_data_mining.get_model_details_km('CLUS_KM_3_2'));
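
If you want to dig a little deeper, the same function also exposes the centroid details for each cluster. A minimal sketch of unnesting that collection (column names taken from the DM_CENTROIDS type) is:

SELECT t.cluster_id,
       c.attribute_name,
       c.mean,
       c.variance
FROM   table(dbms_data_mining.get_model_details_km('CLUS_KM_3_2')) t,
       table(t.centroid) c
ORDER BY t.cluster_id, c.attribute_name;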

Tuesday, November 29, 2016

UKOUG Conference 1999 : Who was there & what they were talking about

Yes you read the title of this blog post correctly!

Recently I was doing a bit of a clear out and I came across a CD of the UKOUG Conference proceedings from 1999. That was my second UKOUG conference, and how times have changed.


The CD contained all the conference proceedings consisting of slides and papers.

Here are some familiar names from back in 1999 who you may find presenting at this year's conference; some you might remember as regular presenters, and some are still presenting but not at this year's conference.

  • Jonathan Lewis
  • Carl Dudley
  • Fiona Martin
  • Peter Robson
  • Duncan Mills
  • Kent Graziano
  • John King
  • Toby Price
  • Doug Burns
  • Dan Hotka
  • Joel Goodman

Back in 1999 Ralph Fiennes did the keynote speech. I queued up afterwards to get a signed book but they ran out with three people ahead of me :-(

The agenda grid was a bit smaller back then compared to now.


I'll see you again in Birmingham this year, in a few days' time :-)

I'll be presenting during the Super Sunday and then again on the Tuesday.

When you are at the UKOUG Conference, attend some sessions that are not in your area. You never know what you might learn. Also get out and about and explore the Christmas market just outside the conference venue.

Monday, November 14, 2016

Using the Identity column for Oracle Data Miner

If you are a user of the Oracle Data Miner tool (the workflow data mining tool that is part of SQL Developer), then you will have noticed that for many of the algorithms you can specify a Case Id attribute along with, say, the target attribute.


The idea is that you have one attribute that is a unique identifier for each case record. This may or may not be the case in your data model, as you may have a multi-attribute primary key or case record identifier.

But what is the Case Id field used for in Oracle Data Miner?

Based on the documentation, this field does not need to have a value. But it is recommended that you do identify an attribute for the Case Id, as this will allow for reproducible results. What this means is that if we run our workflow today and again in a few days' time, on the exact same data, we should get the same results. So the Case Id allows this to happen. But how? Well, it looks like the attribute specified for the Case Id is used as part of the hashing algorithm that partitions the data into train and test data sets for classification problems.

So if you don't have a single attribute case identifier in your data set, then you need to create one. There are a few options open to you to do this.

  • Create one: write some code that will generate a unique identifier for each of your case records based on some defined rule.
  • Use a sequence: create a sequence and update the records to use it.
  • Use ROWID: use the unique row identifier value. You can write some code to populate this value into an attribute, or create a view on the table containing the case records and add a new attribute based on the ROWID. But if you move the data, then the next time you use the view you will be getting different ROWIDs, which in turn means we may have different case records going into our test and training data sets. So our workflows will generate different results. Not what we want.
  • Use ROWNUM: This is kind of like using the ROWID. Again we can have a view that will select ROWNUM for each record. We may have the same issues, but if we have our data ordered in a way that ensures the records are returned in the same order every time, then this approach is OK to use (see the sketch after this list).
  • Use Identity Column: In Oracle 12c we have a new feature called the Identity Column. This kind of acts like a sequence, but we can define an attribute in a table to be an Identity Column, and as records are inserted into the table (in our scenario, our case table) this column will automatically generate a unique number for each record. Again, if you need to repopulate the case table, you will need to drop and recreate the table to get the Identity Column to reset, otherwise the newly inserted records will start with the next number of the Identity Column.
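
For the ROWNUM approach mentioned above, a minimal sketch might look like the following (the table, view and ordering column names are just for illustration). The subquery with the ORDER BY is what keeps the ROWNUM assignment repeatable.

CREATE OR REPLACE VIEW case_data_v AS
SELECT ROWNUM AS case_id, s.*
FROM   (SELECT * FROM case_data ORDER BY cust_id) s;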

Here is an example of using the Identity Column in a case table.

CREATE TABLE case_table (
id_column	NUMBER GENERATED ALWAYS AS IDENTITY,
affinity_card 	NUMBER,
age		NUMBER,
cust_gender	VARCHAR2(5),
country_name	VARCHAR2(20)
...
);

You can now use this Identity Column as the Case Id in your Oracle Data Miner workflows.


Wednesday, November 9, 2016

New OAA features in Oracle 12.2 Database

The Oracle 12.2c Database has been released and is currently available as a Cloud Service. The on-site version should be with us soon.

A few weeks ago I listed some of the new features that you will find in the Oracle Data Miner GUI tool (check out that blog post). I'll have another blog post soon that looks a bit closer at how the new OAA features are exposed in this tool.

In this blog post I will list most of the new database related features in Oracle 12.2. There are a lot of new features and a lot of updated features. Over the next few months (yes, it will take that long) I'll have blog posts on most of these.

The Oracle Advanced Analytics Option new features include:

  • The first new feature is one that you cannot see. Yes, that sounds a bit odd. But the underlying architecture of OAA has been rebuilt to allow the algorithms to scale significantly. This also future-proofs OAA for new features coming in future releases of the database.
  • Explicit Semantic Analysis. This new algorithm allows us to perform text similarity comparison. This is a great new addition, and much, much easier compared to what we may have had to do previously.
  • Using R models from SQL. Although we have been able to do this in previous versions of the database, the framework and supports have been extended to allow for greater and easier usage of user-defined R scripts and R models within the in-database environment.
  • Partitioned Models. We can now build partitioned mining models. This is where you can specify an attribute and a separate model will be created based on each value in the attribute.
  • Partitioned scoring. Similarly, we can now dynamically score the data based on a partition attribute.
  • Extensions to Association Rules. Over the past few releases of the database, additional insights into the workings and decision making of the algorithms have been included. In 12.2 we now have some additional insights for the Association Rules algorithm, where we can now get to see the calculation of values associated with the rules.
  • DBMS_DATA_MINING package extended. This PL/SQL package has been extended to include the functionality for the new features listed above. Additionally, it can now process R algorithms and models.
  • SQL Function changes: Changes to the following ODM-related SQL functions to allow for partitioned models: CLUSTER_DETAILS, CLUSTER_DISTANCE, CLUSTER_ID, CLUSTER_PROBABILITY, CLUSTER_SET, FEATURE_COMPARE, FEATURE_DETAILS, FEATURE_ID, FEATURE_SET, FEATURE_VALUE, ORA_DM_PARTITION_NAME, PREDICTION, PREDICTION_BOUNDS, PREDICTION_COST, PREDICTION_DETAILS, PREDICTION_PROBABILITY, PREDICTION_SET.
  • New SQL Hint for ODM models. We have had hints in SQL for many, many versions now, but with 12.2c we get a hint for partitioned models, called the GROUPING hint.
  • New CREATE_MODEL2 function. With the existing CREATE_MODEL function the input data set needed to be defined in a table or accessed using a view. Basically the data needed to reside somewhere. With CREATE_MODEL2 you can now define the input data set based on a SELECT statement (see the sketch after this list).
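
As a rough idea of what that might look like, here is a minimal sketch. The parameter names and settings are based on my reading of the 12.2 beta material and may differ slightly in the final documentation, and the model, view and column names are just for illustration.

DECLARE
   v_settings DBMS_DATA_MINING.SETTING_LIST;
BEGIN
   -- pick the algorithm; other settings can be added to the list in the same way
   v_settings('ALGO_NAME') := 'ALGO_SUPPORT_VECTOR_MACHINES';

   DBMS_DATA_MINING.CREATE_MODEL2(
      model_name          => 'HIGH_VALUE_CHURN_CLAS_SVM',
      mining_function     => 'CLASSIFICATION',
      data_query          => 'SELECT * FROM mining_data_build_v',
      set_list            => v_settings,
      case_id_column_name => 'CUST_ID',
      target_column_name  => 'AFFINITY_CARD');
END;
/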

In addition to all of these changes there are also some interesting new DB, SQL and PL/SQL features that are of particular interest for your data science, machine learning, advanced analytics (or whatever the current favourite marketing term is today) projects.

It is going to be a busy few months ahead, working through all of these new features and writing blog posts on how to use each of them.

Tuesday, November 1, 2016

Creating and Reading SPSS and SAS data sets in R

NOTE: Several people have contacted me to say that using this R package does not work. The data set 
generated is not readable by SAS. If you encounter this problem then get in touch with the creators 
of Haven for help and support. I'm using R version 3.2.0.
All I can say is, it worked for me!

Have you ever been faced with having to generate a data set in the format needed by another analytics tool? Or having to generate a data set in a particular format but you don't have the software that generates that format? For example, if you are submitting data to the FDA and other bodies, you may need to submit the data in a SAS formatted file. There are a few ways you can go about this.

One option is that you can use the Haven R package to generate your data set in SAS and SPSS formats. You can also read in SAS and SPSS formatted files. I have to deal with these formatted data files all the time, and it can be a challenge, but I've recently come across the Haven R package and it has made my life a little bit (or a lot) easier. Now I can easily generate SAS and SPSS formatted data sets for the data in my Oracle Database, using R and ORE. With ORE we can also use the embedded R execution feature to build the generation of these data sets into some of our end-user applications.

Let us have a look at Haven and what it can do.

Firstly, there is very little, if any, documentation online for it. That is OK, so we will have to rely on the documentation that comes with the R package. Again there isn't much, and that is because the package mainly consists of functions to read in these data sets, functions to write these data sets, and some additional functions for preparing data.

For reading in data sets we have the following functions:

# SAS
read_sas("mtcars.sas7bdat")
# Stata
read_dta("mtcars.dta")
# SPSS
read_sav("mtcars.sav")

For writing data sets we have the following functions:

# SAS
write_sas(mtcars, "mtcars.sas7bdat")
# Stata
write_dta(mtcars, "mtcars.dta")
# SPSS
write_sav(mtcars, "mtcars.sav")

Let us now work through an example of creating a SAS data set. We can use some of the sample data sets that come with the Oracle Database in the SH schema. I'm going to use the data in the CUSTOMERS table to create a SAS data set. In the following code I'm using ORE to connect to the database but you can use your preferred method.

> library(ORE)
> # Create your connection to the schema in the DB
> ore.connect(user="sh", password="sh", host="localhost", service_name="PDB12C", 
            port=1521, all=TRUE) 

> dim(CUSTOMERS)
[1] 55500    23
> names(CUSTOMERS)
 [1] "CUST_ID"                "CUST_FIRST_NAME"        "CUST_LAST_NAME"        
 [4] "CUST_GENDER"            "CUST_YEAR_OF_BIRTH"     "CUST_MARITAL_STATUS"   
 [7] "CUST_STREET_ADDRESS"    "CUST_POSTAL_CODE"       "CUST_CITY"             
[10] "CUST_CITY_ID"           "CUST_STATE_PROVINCE"    "CUST_STATE_PROVINCE_ID"
[13] "COUNTRY_ID"             "CUST_MAIN_PHONE_NUMBER" "CUST_INCOME_LEVEL"     
[16] "CUST_CREDIT_LIMIT"      "CUST_EMAIL"             "CUST_TOTAL"            
[19] "CUST_TOTAL_ID"          "CUST_SRC_ID"            "CUST_EFF_FROM"         
[22] "CUST_EFF_TO"            "CUST_VALID"      

Next we can prepare the data, take a subset of the data, reformat the data, etc. For me I just want to use the data as it is. All I need to do now is to pull the data from the database to my local R environment.

dat <- ore.pull(CUSTOMERS)

Then I need to load the Haven library and then create the SAS formatted file.

library(haven)
write_sas(dat, "c:/app/my_customers.sas7bdat")

That's it. Nice and simple.

But has it worked? Has it created the file correctly? Will it load into my SAS tool?

There is only one way to test this and that is to open it in SAS. I have an account on SAS OnDemand with access to several SAS products. I'm going to use SAS Studio.

Well it works! The following image shows SAS Studio after I had loaded the data set with the variables and data shown.


WARNING: When you load the data set into SAS you may get a warning message saying that it isn't a SAS data set. What this means is that it is not a data set generated by SAS. But as you can see in the image above all the data got loaded OK and you can work away with it as normal in your SAS tools.

The next step is to test the loading of a SAS data set into R. I'm going to use one of the standard SAS data sets called PVA97NK.SAS7BDAT. If you have worked with SAS products then you will have come across this data set.

When you use Haven to load in your SAS data set, it will create the data in tibble format. This is a slight variant of a data.frame. So if you want the typical format of a data.frame, then you will need to convert the loaded data, as shown in the following code.

> data_read <- read_sas("c:/app/pva97nk.sas7bdat")
> dim(data_read)
[1] 9686   28
> d<-data.frame(data_read)
> class(data_read)
[1] "tbl_df"     "tbl"        "data.frame"
> class(d)
[1] "data.frame"
> head(d)
  TARGET_B       ID TARGET_D GiftCnt36 GiftCntAll GiftCntCard36 GiftCntCardAll
1        0 00014974       NA         2          4             1              3
2        0 00006294       NA         1          8             0              3
3        1 00046110        4         6         41             3             20
...
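
Going back to the embedded R execution idea mentioned earlier, here is a minimal sketch of how you might wrap the SAS file generation in a script stored in the ORE R script repository. This assumes the haven package is installed for the R engine on the database server, and the script name and output path are just for illustration.

# store the script in the ORE R script repository
ore.scriptCreate("export_customers_sas", function(dat, out_file) {
  library(haven)
  write_sas(dat, out_file)
  TRUE
})

# run it in the database against the CUSTOMERS ore.frame
ore.tableApply(CUSTOMERS, FUN.NAME = "export_customers_sas",
               out_file = "/tmp/my_customers.sas7bdat")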

I think this package is going to make my life a little bit easier, and if you work with SPSS and SAS data sets then hopefully some of your tasks will become a little bit easier too.

Monday, October 31, 2016

Data Science Is Multidisciplinary (updated October 2016)

[Update: October 2016. There appears to be some discussion about the Venn diagram I've proposed below. The central part of this diagram is not anything I came up with; it was a commonly used Venn diagram for Data Mining. Thanks to Polly Michell-Guthrie for providing the original reference for the Venn. I just added the outer ring of additional skills needed for the new area of Data Science. This was just my view of things back in 2012. Things have moved on a bit since then.]

A few weeks ago I had a blog post called Domain Knowledge + Data Skills = Data Miner.
In that blog post I was saying that to be a Data Scientist all you needed was Domain Knowledge and some Data Skills, which included Data Mining.
The reality is that the skill set of a Data Scientist will be much larger. There is a saying ‘A jack of all trades and a master of none’. When it comes to being a data scientist you need to be a bit like this but perhaps a better saying would be ‘A jack of all trades and a master of some’.
I’ve put together the following diagram, which includes most of the skills, with an outer circle of more fundamental skills. It is this outer ring of skills that is fundamental to becoming a data scientist. The skills in the inner part of the diagram are ones that most people will have some experience of in one or more areas. The other skills can be developed and learned over time, all depending on the type of person you are.
Can we train someone to become a data scientist, or are they born to be a data scientist? It is a little bit of both really, but you need to have some of the fundamental skills and the right type of personality. The learning of the other skills should be easy(ish).
What do you think? Are there skills that I’m missing?

Tuesday, October 25, 2016

Oracle Text, Oracle R Enterprise and Oracle Data Mining - Part 5

In this 5th blog post in my series on using the capabilities of Oracle Text, Oracle R Enterprise and Oracle Data Mining to process documents and text, I will have a look at some of the machine learning features of Oracle Text.

Oracle Text comes with a number of machine learning algorithms. These can be divided into two types. The first is 'Supervised Learning', where we have two machine learning algorithms for classification type problems. The second type is 'Unsupervised Learning', where we have the ability to use clustering machine learning algorithms to look for patterns in our text documents and to find similarities between documents based on their contents.

It is this second type of document clustering that I will work through in this blog post.

When using clustering with text documents, the machine learning algorithm will look for patterns that are common between the documents. These patterns will include the words used, the frequency of the words, the position or ordering of these words, the co-occurrence of words, etc. Yes, this is a large and complex task, and that is why we need a machine learning algorithm to help us.

With Oracle Text we only have one clustering machine learning algorithm available to use. When we move on to using the Oracle Advanced Analytics Option (Oracle Data Mining and Oracle R Enterprise) we have more algorithms available to us.

With Oracle Text the clustering algorithm is called k-Means. In a way the actual algorithm is unimportant, as it is the only one available to us when using Oracle Text. To use this algorithm we have the CTX_CLS.CLUSTERING procedure. This procedure takes the documents we want to compare, identifies the clusters (using hierarchical clustering), and then tells us, for each document, what clusters the document belongs to and their probability values. With clustering, a document (or a record) can belong to many clusters. Typically in the text books we see clusters that are very distinct and clearly separated from each other. When you work on real data this is never the case. We will have many overlapping clusters, and a data point/record can belong to one or more clusters. This is why we need the probability value. We can use this to determine what cluster our record belongs to most and what other clusters it is associated with.

Using the example documents that I have been using during this series of blog posts we can use the CTX_CLS.CLUSTERING algorithm to cluster and identify similarities in these documents.

We need to set up the parameters that will be used by the CTX_CLS.CLUSTERING procedure, telling it to use the k-Means algorithm and the number of clusters to generate. As with all Oracle Text procedures or algorithms, there are a number of settings you can configure, or you can just accept the default values.

exec ctx_ddl.drop_preference('Cluster_My_Documents');
exec ctx_ddl.create_preference('Cluster_My_Documents','KMEAN_CLUSTERING');
exec ctx_ddl.set_attribute('Cluster_My_Documents','CLUSTER_NUM','3');

The code above is an example of the basics of what you need to set up for clustering. Other attributes or cluster parameter settings available to you include MAX_DOCTERMS, MAX_FEATURES, THEME_ON, TOKEN_ON, STEM_ON, MEMORY_SIZE and SECTION_WEIGHT.
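
For example, changing one of these settings follows the same pattern as the CLUSTER_NUM attribute above (the value used here is just for illustration):

exec ctx_ddl.set_attribute('Cluster_My_Documents','MAX_DOCTERMS','250');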

Now we can run the CTX_CLS.CLUSTERING procedure on our documents. This procedure has the following parameters.

- The Oracle Text Index Name

- Document Id Column Name

- Document Assignment (cluster assignment) Table Name. This table will be created if it doesn't already exist

- Cluster Description Table Name. This table will be created if it doesn't already exist.

- Name of the Oracle Text Preference (list)

exec ctx_cls.clustering(
'MY_DOCUMENTS_OT_IDX',
'DOC_PK',
'OT_CLUSTER_RESULTS',
'DOC_CLUSTER_DETAILS',
'Cluster_My_Documents');

When the procedure has completed we can now examine the OT_CLUSTER_RESULTS and the DOC_CLUSTER_DETAILS tables. The first of these (OT_CLUSTER_RESULTS) allows us to see what documents have been clustered together. The following is what was produced for my documents.

SELECT d.doc_pk, 
       d.doc_title, 
       r.clusterid, 
       r.score 
FROM my_documents d, 
     ot_cluster_results r 
WHERE d.doc_pk = r.docid;


We can see that two of the documents have been grouped into the same cluster (ClusterId=2). If you have a look back at what these documents are about then you can see that yes, these are very similar. For the other two documents we can see that they have been clustered into separate clusters (ClusterId=4 & 5). The clustering algorithm has said that they are different types of documents. Again, when you examine these documents you will see that they are talking about different topics. So the clustering process worked!

You can also explore the various features of the clusters by looking at the DOC_CLUSTER_DETAILS table. Although the details in this table are not overly useful, they will give you some insight into what clusters the k-Means algorithm has produced.

Hopefully I've shown you how easy it is to set up and use the clustering feature of Oracle Text.

WARNING: Before using Clustering or Classification with Oracle Text, you need to check with your local Oracle Sales representative about whether there are licence implications. There seem to be some mentions that the algorithms used are those that come with Oracle Data Mining. Oracle Data Mining is a licensed cost option for the database. So make sure you check before you go using these features.

Saturday, October 22, 2016

Our first OUG Ireland Meet-up

Last Thursday evening (20th October) we had our first Meet-up event for OUG Ireland.

Up until recently we have had one or two full-day SIG events that covered both the Tech and BA/Big Data areas. But we have been finding it increasingly difficult to get speakers and attendees to take a full day out of work to attend the SIG events. This was particularly true for a SIG event we had scheduled for early October. By the end of August things were not coming together for us, so it was time to try something new.

Over the past couple of years Meet-ups have been growing in numbers and in popularity. We (the OUG Ireland SIG committee) have been keeping an eye on this in the UK and Ireland.

It was time to give this concept a try.

What did it entail and what venue did you use?

The first thing we needed to do was to arrange a venue. A very popular location for Meet-ups in Ireland is one of the Bank of Ireland branches: the Bank of Ireland on Grand Canal Dock in Dublin. It is one of their enterprise centres and is open during the day for meetings, as a workspace, and it also operates as a branch. In the evenings and on a Saturday morning it is available for larger groups to hold meetings and Meet-ups. We had the venue from 6pm-8pm.

What presentations did you have?

After securing a venue, we then decided to have the theme of the Meet-up be 'Updates from Oracle Open World'. Most of the SIG committee was at Oracle Open World, so it was easy enough to put a few presentations together, and we got the local Oracle office to join us too.


What about catering?

The venue very kindly supplied some soft drinks, tea, coffee and a few beers, along with some sandwiches and pastries. All for free!

So far we have a free venue, free catering and the committee as presenters.


How did you advertise the event?

The only other thing we needed to do now was to advertise the event. For this we used a combination of EventBrite and Meetup.com, along with our own contacts. Plus some of our friends helped to spread the word. This worked really well. We ended up getting roughly the same number of people registering for the event on each platform. We had 98 registrations across these websites.

Something we were warned about is that a lot of people will register, but if you get 40% of those turning up for the event then you are doing well. We got 48 attendees (=50%). We were delighted with this. For our full day SIG events we might have had 20-25 attendees.

How much did this event cost?

It cost us zero euro/dollars/sterling, as there were no admin or advertising costs, no catering costs, no room hire, nothing. Well, that is not entirely true. There was a small cost, and that was for our membership fee on Meetup.com. That cost about $30 for 6 months (unlimited plan). Yes, I paid that myself.

What was the feedback after the event?

The feedback was fantastic. People loved the new format, loved that it was in the evening after work, liked the short length presentations, liked that it was free, etc.

I also asked people if they might be interested in presenting at a future Meet-up. Personally I had 5 people talk to me about this. The other committee members also had people talk to them about it. It seems people are interested in trying the shorter format presentations, as it is not as daunting as presenting at a conference. A conference seems to be more formal and a step up in presenting levels.


What next?

Well we have the same venue booked for 12th January and 11th May. We have our 2 presenters for the 12th January already. We actually had those before our first meet-up. Plus for the 11th May we have some possible presenters and I just need to work with them to see who will get on the agenda.

The plan was to have 3-4 of these each year. Based on the feedback and the level of interest we might need to have a few more. But it is still early days and we need to see how things develop.

Our Meet-up was in Dublin. Ideally we want to bring this to other regions. For example, we could do the same in Belfast and Cork, and possibly Limerick and Galway. This is something we are looking at, and in 2017 we will definitely have a Meet-up in one (or two) of those locations. That would bring us up to 4-5 Meet-ups in 2017.

Thank you to everyone who attended and everyone who helped to make this happen.

Monday, October 17, 2016

Oracle Data Miner (ODMr) 4.2 Repository Upgrade

With each new release of the Oracle Data Miner (ODMr) tool (part of SQL Developer) an upgrade of your ODMr Repository is needed. This is because of the numerous new features in the tool. This is particularly the case with ODMr (SQLDev) 4.2.

Now, most of the new features for ODMr 4.2 will not be visible until you are running a 12.2 Database. But a small number of new features are available if you are running an earlier version of the DB. Check out my blog post on some of these.

Before upgrading the ODMr repository, just like with any upgrade, make sure to do your backups. Although there is some copying of objects done during the repository upgrade (long story, but a few versions ago my ODMr repository and work got wiped during an upgrade), you should always export and save your workflows. You will need to do this using your current version of ODMr/SQL Dev before you start using ODMr 4.2.

When you have saved your workflows etc you can then start using ODMr/SQLDev 4.2.

The easiest way to do the ODMr 4.2 Repository upgrade is to let the tool do it for you. You can do this by trying to open one of your ODMr connections.

IMPORTANT: You will need to have the SYS password for the ODMr upgrade, so have your DBA do this step for you or have them on standby to enter the password for you.


NOTE: This upgrade is being done on a CDB/PDB 12.2 DB.

When prompted enter the SYS password.


When prompted, click on the Start button.


The progress bar will let you know how things are going.


When complete you will get the following.


It is always good to check the log file/report, especially if you encounter errors!


Job Done!

You can now start using all (well almost all) the new features of ODMr 4.2.

When the 12.2 Database is available you will get to see lots more features.

Tuesday, October 11, 2016

OTN Appreciation Day : My favourite thing from OTN #ThanksOTN

This blog post is my contribution to the OTN Appreciation Day, the brain child of Tim Hall (read his blog post here).

For my contribution, I'm going to write about something that is a bit different to what most people will be writing about. Most people will be writing about some feature of the Oracle Database or maybe their favourite tool.

I'm not going to do that. What I'm going to write about is something that OTN does for us Developers, DBAs, etc.

Basically OTN has done so much over the years to help developers in a multitude of different ways.

Apart from the support that OTN gives me as an Oracle ACE Director (Thank you!), one of my favourite things that OTN makes available to us are the VirtualBox Pre-built Developer VMs.


These pre-built VMs allow us developers to go play with the technology, to learn how to use it, to follow tutorials, to see how various software applications work together, etc all within a virtual machine.

I bet that (almost) everyone reading this blog and taking part in the OTN Appreciation Day will have used one or more of the virtual box prebuilt VMs.

Why is this a good thing? How would you like to try to install all this software from scratch? Not me. Typically, when performing an install, I mess something up. If this happens often enough then you may just get frustrated with what you are trying to do and give up on it. The result will probably be you giving a negative review of the software to your employer.

But the pre-built VMs take the pain of installing (sometimes) large and complex software away from you and allow you to dive straight into using the software. I also really love that the VMs come with tutorials, decent data sets, example applications built using the software, and demonstrations on how to get each of these working together.

If you mess anything up, then you can just re-import the VM and start all over again. When you are finished using the VM and testing the software, all you need to do is delete the VM. Your laptop, desktop or wherever you have installed the VM is left clean with no partially uninstalled files, etc.

Each of us will have our favourite VMs. For most people the Developer Day VM is fantastic. Whether you are a beginner or an experienced developer, I would bet most people have a copy of this VM and are probably using it as their personal Oracle Database server.

For me, I'm also a regular user of the Oracle Big Data Lite VM and the OBIEE Sample Application VM.

For OTN Appreciation day, I haven't talked about a Database feature. Instead I've talked about something that OTN has done for us, the developer, DBA, etc community. I'd like to thank OTN for supporting the community by providing these VirtualBox pre-built VMs for us to use. You have saved me/us many, many, many hours/days/weeks/months over the years.

BTW. I'm looking forward to the VM with the 12.2c Database.

Monday, October 10, 2016

OUG Ireland Meet-up 20th October

Come along to the first OUG Ireland meet-up on the 20th October, in Bank of Ireland, Grand Canal Dock, Dublin, between 18:15 and 20:00.

Over the years the OUG Ireland SIG committee have organised one day SIG events once or twice a year. This is in addition to the annual OUG Ireland conference (typically held in March). Sometimes it has been a challenge to get people to attend, sometimes it has been a challenge to get enough speakers, sometimes it was a challenge to get a good venue, etc.

So we have decided to try something a little bit different. In keeping with the current trend of smaller scale events we have organised our first Meet-up. This will be a short 2 hour event held after work on the 20th October. So come along and join us.

This is a free and open event. You do not need to be a member of the user group to come to this meet-up.

Here are the details:

Theme for Meet-up

Updates from Oracle Open World 2016

Agenda

18:00-18:20 : Sign-in, meet and greet, and setup of space with seats etc

18:20-18:30 : Introductions & Welcome, Agenda, what is OUG Ireland

18:30-18:45 : The Oracle 12.2 Database new features (Simon Holt)

18:45-19:00 : What's new in the BI, BA, Big Data world from Oracle (Brendan Tierney)

19:00-19:15 : What's happening with Cloud (Tony Cassidy)

19:15-19:30 : Other updates from Oracle (John Caulfield, Oracle)

19:30-19:45 : Q&A session and Open discussion

Location

Bank Of Ireland

1 Grand Canal Square

Dublin

Please sign up, so that we know who is coming

There are 2 places where you can sign up. It doesn't matter which one you use but please use one of them to let us know you will be there.

Sign up on EventBrite.com

Sign up on Meetup.com


We will be looking to set up more Meet-up events, so let us know what you think of the new format, and particularly if you would like to get involved with talking about a topic, project, new feature, whatever, for 15-20 minutes (a short demo would be good).

Wednesday, October 5, 2016

Oracle Data Miner 4.2 EA : New Features

A couple of weeks ago, during the madness of Oracle Open World, there were some new product releases and lots of updates to existing products.

One such product was SQL Developer. They released an Early Adopter version (EA1). This is where you can try out the new version of the product, but you need to be careful as it is not the GA/Production version. So it may have some "features".

One component of SQL Developer is the Oracle Data Miner tool. This is a GUI workflow-based tool built on the Oracle Advanced Analytics option. At OOW we got to hear about the various new Oracle Data Mining features that are coming with the Oracle 12.2 Database. For Oracle Data Miner (ODMr) 4.2 (EA) there are a lot of new features, but most of these are hidden and will only become available when you are using the Oracle 12.2 DB.

But if you are using 12.1 (or earlier) then there are still some new features. I've been having a bit of a look around the EA1 release to see what is new and available to us now (while we wait for 12.2).

If you are on Oracle 12.1 DB or earlier there are two main new features. These are a new Workflow Scheduler and being able to specify in-memory options for ODMr objects. These can be easily found on the ODMr menu bar, as highlighted in the following image.


Let us now have a quick look at these.

ODMr Workflow Scheduler

The Workflow Scheduler allows us to take an ODMr Workflow and schedule it to run in the Oracle Database at a defined time or on a defined schedule. Previously we would have to write the SQL and PL/SQL code to enable the scheduling, plus the ODMr schedule was output as a number of SQL scripts. So it was a little bit of a challenge to get the workflow running on a regular basis.

Now with the new in-built ODMr Scheduler we can quickly and easily do this without having to write a line of SQL or PL/SQL. The tool will look after the hard bits for us. We can schedule the entire workflow or certain parts of it.


When setting up your schedule you can pick the start date, how frequently you would like it to run (daily, weekly, monthly or some other custom frequency), and when it should end (never, after X number of runs or on a specific date). You can also re-use an existing schedule.


For the advanced settings you can set up email notifications, the job priority level, maximum run durations and limits, and the timezone to use.


ODMr In-memory Options

To access the in-memory options you can click on the 'Performance Options' button on the ODMr menu or you can access it via the menu (Tools -> Preferences) to get the complete list of in-memory settings.


When you use ODMr to build your data mining workflows, ODMr will create a number of objects for each of the nodes of the workflow. These are typically created as tables in your schema. The previous version of ODMr introduced the Performance Options, where you could set the degree of parallelism to use for some nodes and for the underlying SQL and PL/SQL code that is generated.

Now we can specify if the tables created should be in-memory, and so avail of the significantly faster response times when you are using the data in these tables. This is particularly useful as we work with larger and larger data sets and want lightning fast responses from some of our data mining tasks.

In addition to turning on the in-memory option for certain nodes, we can also specify the in-memory configuration settings such as the level of Columnar Compression to use and the Priority Level.



(I've been on the 12.2 beta so I've had a chance to try out many of the new features. There is some good stuff coming and I'll have blog posts about these when 12.2 comes GA)

Monday, September 26, 2016

Machine Learning notebooks (and Oracle)

Over the past 12 months there has been an increase in the number of Machine Learning notebooks becoming available.
What is a Machine Learning notebook?
As the name implies it can be used to perform machine learning using one or more languages and allows you to organise your code, scripts and other details in one application.
The ML notebooks provide an interactive environment (sometimes browser based) that allows you to write, run, view results, share/collaborate code and results, visualise data, etc.
Some of these ML notebooks come with one language and others come with two or more languages, and have the ability to add other ML related languages. The most common languages are Spark, Python and R.
Based on these languages ML notebooks are typically used in the big data world and on Hadoop.
Examples of Machine Learning notebooks include: (Starting with the more common ones)
  • Apache Zeppelin
  • Jupyter Notebook (formally known as IPython Notebook)
  • Azure ML R Notebook
  • Beaker Notebook
  • SageMath
At Oracle Open World (2016), Oracle announced that they are currently working on creating their own ML notebook, which is based on Apache Zeppelin. They seemed to indicate that a beta version might be available in 2017. Here are some photos from that presentation, but with all things that Oracle talk about you have to remember and take into account their Safe Harbor statement.
I'm looking forward to getting my hands on this new product when it is available.

Friday, September 16, 2016

Oracle Text, Oracle R Enterprise and Oracle Data Mining - Part 4

This is the fourth blog post of a series on using Oracle Text, Oracle R Enterprise and Oracle Data Mining. Make sure to check out the previous blog posts, as each one builds upon the previous ones.

In this blog post, I will have an initial look at how you can use Oracle Text to perform document classification. In my next blog post in the series, I will look at how you can use Oracle Data Mining with Oracle Text to perform classification.

The area of document classification using Oracle Text is a well trodden field and there is lots and lots of material out there to assist you. This blog post will look at the core steps you need to follow and how Oracle Text can help you with classifying your documents or text objects in a table.

When you use Oracle Text for document classification, the simplest approach is to use 'Rule-based Classification'. With this approach you define a set of rules which, when applied to a document, determine the classification that will be assigned to that document.

There is a little bit of setup and configuration needed to make this happen. This includes the following.

  • Create a table that will store your documents. See my previous blog posts in the series for an example of one that is used to store the text from webpages.
  • Create a rules table. This will contain the classification label and a set of rules that will be used by Oracle Text to determine the classification to assign to each document. These are in a format similar to what you might see in the WHERE clause of a SELECT statement. You will need to follow the rules and syntax of CTXRULE to make sure your rules fire correctly.
  • Create a CTXRULE index on the rules table you created in the previous step.
  • Create a table that will be a link table between the table that contains your documents and the table that contains your categories.

When you have these steps completed you can start classifying your documents. The following example illustrates these steps using the text documents I set up in my previous blog posts.

Here is the structure of my documents table. I had also created an Oracle Text CTXSYS.CONTEXT index on the DOC_TEXT attribute.

create table MY_DOCUMENTS (	
 doc_pk			NUMBER(10) PRIMARY KEY, 
 doc_title		VARCHAR2(100), 
 doc_extracted 	DATE, 
 data_source 	VARCHAR2(200), 
 doc_text 		CLOB );

The next step is to create a table that contains our categories and rules. The structure of this table is very simple, and the following is an example.

create table DOCUMENT_CATEGORIES (
 doc_cat_pk  	NUMBER(10) PRIMARY KEY, 
 doc_category 	VARCHAR2(40),
 doc_cat_query  VARCHAR2(2000) );

create sequence doc_cat_seq;

Now we can create the table that will store the identified document categories/classifications for each of our documents. This is a link table that contains the primary keys from the MY_DOCUMENTS and the DOCUMENT_CATEGORIES tables.

create table MY_DOC_CAT (
 doc_pk 	NUMBER(10), 
 doc_cat_pk NUMBER(10) );

Queries for CTXRULE are similar to those of CONTAINS queries. Basic phrasing within quotes is supported, as are the following CONTAINS operators: ABOUT, AND, NEAR, NOT, OR, STEM, WITHIN, and THESAURUS. The following statements contain my rules.

insert into document_categories values
  (doc_cat_seq.nextval, 'OAA','Oracle Advanced Analytics');

insert into document_categories values
  (doc_cat_seq.nextval, 'Oracle Data Mining','ODM or Oracle Data Mining');

insert into document_categories values
  (doc_cat_seq.nextval, 'Oracle Data Miner','ODMr or Oracle Data Miner or SQL Developer');

insert into document_categories values
  (doc_cat_seq.nextval, 'R Technologies','Oracle R Enterprise or ROracle or ORAAH or R');

We are now ready to create the Oracle Text CTXRULE index.

create index doc_cat_idx on document_categories(doc_cat_query) indextype is ctxsys.ctxrule;

Our next step is to apply the rules and generate the categories/classifications. We have two scenarios to deal with here. The first is how do we do this for our existing records, and the second is how can we do this on an ongoing basis as new documents get loaded into the MY_DOCUMENTS table.

For the first scenario, where the documents already exist in our table, we can use a PL/SQL block just like the following.

DECLARE
   v_document    MY_DOCUMENTS.DOC_TEXT%TYPE;
   v_doc         MY_DOCUMENTS.DOC_PK%TYPE;
BEGIN
   for doc in (select doc_pk, doc_text from my_documents) loop
      v_document := doc.doc_text;
      v_doc  := doc.doc_pk;
      for c in (select doc_cat_pk from document_categories
              where matches(doc_cat_query, v_document) > 0 )
         loop
            insert into my_doc_cat values (doc.doc_pk, c.doc_cat_pk);
      end loop;
   end loop;
END;
/

Let us have a look at the categories/classifications that were generated.

select a.doc_title, c.doc_cat_pk, b.doc_category
from my_documents a,
     document_categories b,
     my_doc_cat c
where a.doc_pk = c.doc_pk
and c.doc_cat_pk = b.doc_cat_pk
order by a.doc_pk, c.doc_cat_pk;


We can see that the categorisation/classification actually gives us the results we would have expected for these documents/web pages.

Now we can look at how to generate these categories/classifications on an ongoing basis. For this we will need a database trigger on the MY_DOCUMENTS table. Something like the following should do the trick.

CREATE or REPLACE TRIGGER t_cat_doc
  before insert on MY_DOCUMENTS
  for each row
BEGIN
  -- find every category whose rule matches the new document
  for c in (select doc_cat_pk
            from   document_categories
            where  matches(doc_cat_query, :new.doc_text) > 0) loop
     -- record the category in the link table
     insert into my_doc_cat values (:new.doc_pk, c.doc_cat_pk);
  end loop;
END;
/
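
To see the trigger in action, you could insert a test document and then check the link table. A minimal sketch (the DOC_PK value and document text are made up):

-- insert a test document; the trigger should categorise it as it is inserted
insert into my_documents (doc_pk, doc_title, doc_extracted, data_source, doc_text)
values (999, 'Test document', sysdate, 'manual test',
        'Some notes on Oracle Data Mining and the Oracle Advanced Analytics option');

-- check the categories assigned to the new document
select * from my_doc_cat where doc_pk = 999;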

At this point we have worked through how to use Oracle Text to perform rule-based document categorisation/classification.

In addition to this type of classification, Oracle Text can also use machine learning algorithms to classify documents. These include Decision Trees, Support Vector Machines and clustering. It is important to note that these are not the machine learning algorithms that come as part of Oracle Data Mining. Look out for my other blog posts that cover these topics.

Monday, September 12, 2016

My 3rd Book is now officially released

Today 12th September (2016) is the official release date of my 3rd book.

The title of the book is 'Oracle R Enterprise'. Make sure to check it out on Amazon.

It has been a busy 17 months, as you may have noticed that I had another book released a few weeks ago. Check it out here.

Yes, I was working on two books at the same time.

Yes, that was a lot of work, and looking back on it was a lot of fun too.

This new book (Oracle R Enterprise) is a good companion for my first book (Predictive Analytics using Oracle Data Miner), as I now have a book for each of the components of the Oracle Advanced Analytics option.

NewImage NewImage

Here is what is on the back cover of the book.

"Effectively manage your enterprise’s big data and keep complex processes running smoothly using the hands-on information contained in this Oracle Press guide. Oracle R Enterprise: Harnessing the Power of R in Oracle Database shows, step-by-step, how to create and execute large-scale predictive analytics and maintain superior performance. Discover how to explore and prepare your data, accurately model business processes, generate sophisticated graphics, and write and deploy powerful scripts. You will also find out how to effectively incorporate Oracle R Enterprise features in APEX applications, OBIEE dashboards, and Apache Hadoop systems. Learn to: • Install, configure, and administer Oracle R Enterprise • Establish connections and move data to the database • Create Oracle R Enterprise packages and functions • Use the R language to work with data in Oracle Database • Build models using ODM, ORE, and other algorithms • Develop and deploy R scripts and use the R script repository • Execute embedded R scripts and employ ORE SQL API functions • Map and manipulate data using Oracle R Advanced Analytics for Hadoop • Use ORE in Oracle Data Miner, OBIEE, and other applications ... "

This book is ideally suited to people who are starting out with Oracle R Enterprise (ORE), or have some experience with using it, and want to see what they can do with it and how it can be used with other products like APEX, OBIEE, Hadoop and Spark. Yes, I touch on these in the book. This book may also be of interest to those who are working with the products I've just listed and want to see how to use ORE with them.

If you are at Oracle Open World (OOW) next week make sure to check out the book in the Oracle Book Store, and if you buy a copy try to track me down to get me to sign it. The best way to do this is to contact me on Twitter, leave a message at the Oracle Press stand, or you will find me hanging out at the OTN Lounge.

A special thanks to my technical editor, Mark Hornick, who is a Director of Oracle Advanced Analytics Product Management, for Oracle's R Technologies.

Here are quotes from some people about the book.

The book ‘Oracle R Enterprise’, written by Brendan Tierney, is a valuable resource for any data scientist who wants to use the R language with the Oracle Database. It demonstrates very well the many features of Oracle R Enterprise, from performing simple analytics to utilising the many performance features of the Oracle Database, allowing you to work with all your datasets - Big or small. Additionally the book demonstrates how you can use the power of the R language with the SQL language as well as with other Oracle products including APEX and OBIEE, as well as Hadoop and Spark.

- John Donnelly - Regional Director, Oracle Ireland

The new book by Brendan Tierney, Oracle ACE Director, on Oracle R Enterprise details how users can gain maximal value out of the Oracle Database’s tight integration with the popular open source R statistical programming language. The author guides the R community into how they can, through the ease and familiarity of R, tap into the power of the Oracle Database Enterprise Edition with its Oracle Advanced Analytics Option or the Oracle Database Cloud Service. Brendan, an expert in this field, clearly articulates how to get quickly started and provides extensive “how to” examples and R scripts. Readers of the book can learn how they can access data directly in the Database, eliminate data movement while exploiting the openness and flexibility of R. Readers can then tap into the scalability and security of SQL of the Oracle Database and leverage Oracle’s proprietary, parallelized in-database machine learning algorithms and Oracle R Enterprise’s R “push down” to SQL functions. Read this book and learn how to leverage R and reduce model development and enterprise model deployment from days/weeks to minutes/hours!

-Charlie Berger

Sr. Director Product Management, Oracle Advanced Analytics and Machine Learning

"Brendan Tierney conveys very clearly all the aspects required for a successful Data Scientist that wants to work with large Databases and large Big Data clusters. It contains a great articulation of all aspects related to building and deploying Machine Learning algorithms in an Oracle Database environment with an overview on the algorithms on Hadoop clusters, as well as the integration with Business Intelligence dashboards and Applications. This is an essential reference for anyone in the Data Science field today working with Oracle Databases.

- Marcos Arancibia, Product Manager, Oracle Data Science

Tuesday, September 6, 2016

Change the size of ORE PNG graphics using in-database R functions

In a previous blog post I showed you how to create and display a ggplot2 R graphic using SQL. Make sure to check it out before reading the rest of this blog post.
In that post, I mentioned that the PNG graphic returned by the embedded R execution SQL statement was not the same as what was produced if you created the graphic in an R session.
Here is the same ggplot2 graphic. The first one is what is produced in an R session and the second is what is produced by the SQL query and the embedded R execution in Oracle.
NewImage NewImage
As you can see the second image (produced using the embedded R execution) gives a very square image.
The reason for this is that Oracle R Enterprise (ORE) creates the graphic image in PNG format. The default size for this is 480 x 480 pixels. You will find this information by digging in the R documentation rather than in the Oracle documentation.
So, how can I get my ORE produced graphic to appear like what is produced in R?
What you need to do is change the height and width of the PNG image produced by ORE. You can do this by passing parameters in the SQL statement used to call the user defined R function, which in turn produces the ggplot2 image.
In my previous post, I gave the SQL statement to call and produce the graphic (shown above). One of the parameters to the rqTableEval function was set to null. This was because we didn't have any parameters to pass, apart from the data set.
We can replace this null with any parameters we want to pass to the user defined R function (demo_ggpplot). To pass the parameters we need to define them using a SELECT statement.
cursor(select 500 as "ore.png.height", 850 as "ore.png.width" from dual),
The full SELECT statement now becomes
select *
from table(rqTableEval( cursor(select * from claims),
                        cursor(select 500 as "ore.png.height", 850 as "ore.png.width" from dual),
                        'PNG',
                        'demo_ggpplot'));
When you view the graphic in SQL Developer, you will get something that looks a bit more like what you would expect or want to see.
NewImage
For each graphic image you want to produce using ORE, you will need to figure out what the best PNG height and width settings are. It also depends on what tool or application you are going to use to display the images (e.g. APEX, etc.).
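
For example, to try a 600 x 600 image, only the parameter cursor changes. A variant of the statement above:

select *
from table(rqTableEval( cursor(select * from claims),
                        cursor(select 600 as "ore.png.height", 600 as "ore.png.width" from dual),
                        'PNG',
                        'demo_ggpplot'));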

Thursday, September 1, 2016

How to Display a BLOB image in an APEX Report

Do you want to display an image on a report in APEX ?

Is the image stored as a BLOB data type in your schema or the blob is returned by some functions?

If so, then displaying the BLOB is not a simple or straightforward task.

Actually, it is a simple and straightforward task, as long as you know "the trick" you need to define in your APEX report.

The following steps outline what you need to do to create a report that displays BLOB images. Most of these are the standard steps, except for Step 4. That is the important one.

1. Create the Report using the APEX wizard

Create a new report. In my example here I'm going to create a classic report.

NewImage Enter a title for the report, and accept the default settings NewImage

Create a new navigation menu entry

NewImage

2. Define the Table or Query for the Report

Select the table or view that contains the data, or define the SQL Query to return the results. It might be best to select this latter option as it will make things clearer and easier to change in Step 4.

NewImage
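
For reference, the starting query can be as simple as the following, using the view and column names that appear again in Step 4 (your own table or view names will differ):

select ID,
       IMAGE
from   V_DOCUMENT_TM_IMAGE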

Click next on the next 2 screens of the wizard and then click the Create button.

3. Set the BLOB attribute settings

When you run the report you will get something like the following being displayed. As you can see it clearly does not display the BLOB image.

NewImage

Next we need to set up the BLOB attribute settings, as shown in the following.

NewImage

When we run the report now, we get an error message.

NewImage

4. Change the report query to return the length of the BLOB

Now this is the magic bit.

To get the image to display you need to go back to the Report level and change the query in the SQL Query box to use the dbms_lob.getlength() function, which returns the length of the image in the BLOB attribute (in my example this attribute is called IMAGE).

select ID,
       dbms_lob.getlength(image)  image
from V_DOCUMENT_TM_IMAGE
NewImage

5. The BLOB object now appears :-)

That's it. Now when you run your report the image will be displayed.

NewImage

So now you know how to display a BLOB image in an APEX Report.

(Thanks to Roel and Joel for the help in working out how to do this)

Wednesday, August 24, 2016

How to get ORE to work with APEX

This blog post will bring you through the steps of how to get Oracle R Enterprise (ORE) to work with APEX.

The reason for this blog post is that since ORE 1.4+ the security model has changed for how you access and run in-database user defined R scripts using the ORE SQL API functions.

I have a series of blog posts going out on using Oracle Text, Oracle R Enterprise and Oracle Data Mining. It was during one of these posts I wanted to show how easy it was to display an R chart using ORE in APEX. Up to now my APEX environment consisted of APEX 4 and ORE 1.3. Everything worked, nice and easy. But in my new APEX environment (APEX 5 and ORE 1.5), it didn't work. That is, calling an in-database user defined R script using the SQL API functions didn't work. Here is the error message that is displayed.

NewImage

So something extra was needed when using ORE 1.5. The security model around the use of in-database user defined R scripts has changed. Extra functions are now available that allow you to control who can run these scripts. For example, we have an ore.grant function that grants another user the privilege to run a script.

But the problem was, when I was in APEX, the application was defined on the same schema that the R script was created in (this was the RQUSER schema). When I connected to the RQUSER schema using ORE and SQL, I was able to see and run this R script (see my previous blog post for the details). But when I was in APEX I wasn't able to see the R script. For example, when using the SQL Workshop in APEX, I just couldn't see it.

NewImage

Something strange is going on. It turns out that the view definitions for the in-database ORE scripts are defined with

owner=SYS_CONTEXT('USERENV', 'SESSION_USER');

(Thanks to the Oracle ORE team and the Oracle APEX team for their help in working out what needed to be done)

This means when I'm connected to APEX, using my schema (RQUSER), I'm not able to see any of my ORE objects.
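
You can see why by checking what the session user resolves to. A quick sketch you could run both in SQL Developer (connected as RQUSER) and in the APEX SQL Workshop:

-- the ORE views filter on SESSION_USER, which in APEX is typically APEX_PUBLIC_USER
select sys_context('USERENV', 'SESSION_USER')   as session_user,
       sys_context('USERENV', 'CURRENT_SCHEMA') as current_schema
from   dual;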

How do you overcome this problem ?

To fix this problem, I needed to grant the APEX_PUBLIC_USER access to my ORE script.

ore.grant(name = "prepare_tm_data_2", type = "rqscript", user = "APEX_PUBLIC_USER")

When I query the ALL_RQ_SCRIPTS view again, using the APEX SQL Workshop, I now get the following.

NewImage

Great. Now I can see the ORE script in my schema.
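
For reference, the check is just a query against the ORE script view, something like:

-- run in the APEX SQL Workshop; the script should now be visible
select *
from   all_rq_scripts
where  name = 'prepare_tm_data_2';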

Now when I run my APEX application I get the graphic produced by R, running on my DB server, and delivered to my APEX application using SQL (via a BLOB object), displayed on my screen.
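
The SQL used to fetch that BLOB follows the same embedded R execution pattern shown in my earlier posts. A minimal sketch, assuming prepare_tm_data_2 needs no input data (if it does, use the rqTableEval pattern from the earlier post instead):

select *
from table(rqEval( cursor(select 500 as "ore.png.height", 850 as "ore.png.width" from dual),
                   'PNG',
                   'prepare_tm_data_2'));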

NewImage