Monday, September 26, 2016

Machine Learning notebooks (and Oracle)

Over the past 12 months there has been an increase in the number of Machine Learning notebooks becoming available.
What is a Machine Learning notebook?
As the name implies it can be used to perform machine learning using one or more languages and allows you to organise your code, scripts and other details in one application.
The ML notebooks provide an interactive environment (sometimes browser based) that allows you to write, run, view results, share/collaborate code and results, visualise data, etc.
Some of these ML notebooks come with one language and others come with two or more languages, and have the ability to add other ML related languages. The most common languages are Spark, Phython and R.
Based on these languages ML notebooks are typically used in the big data world and on Hadoop.
NewImage
Examples of Machine Learning notebooks include: (Starting with the more common ones)
  • Apache Zeppelin
  • Jupyter Notebook (formally known as IPython Notebook)
  • Azure ML R Notebook
  • Beaker Notebook
  • SageMath
At Oracle Open World (2016), Oracle announced that they are currently working creating their own ML notebook and it is based on Apache Zeppelin. They seemed to indicate that a beta version might be available in 2017. Here are some photos from that presentation, but with all things that Oracle talk about you have to remember and take into account their Safe Habor.
2016 09 22 12 43 41 2016 09 22 12 45 53 2016 09 21 12 16 09
I'm looking forward to getting my hands on this new product when it is available.

Friday, September 16, 2016

Oracle Text, Oracle R Enterprise and Oracle Data Mining - Part 4

This is the fourth blog post of a series on using Oracle Text, Oracle R Enterprise and Oracle Data Mining. Make sure to check out the previous blog posts as each one builds upon each other.

In this blog post, I will have an initial look at how you can use Oracle Text to perform document classification. In my next blog post, in the series, I will look at how you can use Oracle Data Mining with Oracle Text to perform classification.

The area of document classification using Oracle Text is a well trodden field and there are lots and lots of material out there to assist you. This blog post will look at the core steps you need to follow and how Oracle Text can help you with classifying your documents or text objects in a table.

When you use Oracle Text for documentation classification the simplest approach is to use 'Rule-based Classification'. With this approach you will defined a set of rules, when applied to the document will determine classification that will be assigned to the document.

There is a little bit of setup and configuration needed to make this happen. This includes the following.

  • Create a table that will store you document. See my previous blog posts in the series to see an example of one that is used to store the text from webpages.
  • Create a rules table. This will contain the classification label and then a set of rules that will be used by Oracle Text to determine that classification to assign to the document. These are in the format similar to what you might see in the WHERE clause of a SELECT statement. You will need follow the rules and syntax of CTXRULES to make sure your rules fire correctly.
  • Create a CTXRULE index on the rules table you created in the previous step.
  • Create a table that will be a link table between the table that contains your documents and the table that contains your categories.

When you have these steps completed you can now start classifying your documents. The following example illustrates using these steps using the text documents I setup in my previous blog posts.

Here is the structure of my documents table. I had also created an Oracle Text CTXSYS.CONTEXT index on the DOC_TEXT attribute.

create table MY_DOCUMENTS (	
 doc_pk			NUMBER(10) PRIMARY KEY, 
 doc_title		VARCHAR2(100), 
 doc_extracted 	DATE, 
 data_source 	VARCHAR2(200), 
 doc_text 		CLOB );
The next step is to create a table that contains our categories and rules. The structure of this table is very simple, and the following is an example.
create table DOCUMENT_CATEGORIES (
 doc_cat_pk  	NUMBER(10) PRIMARY KEY, 
 doc_category 	VARCHAR2(40),
 doc_cat_query  VARCHAR2(2000) );

create sequence doc_cat_seq;

Now we can create the table that will store the identified document categories/classifications for each of out documents. This is a link table that contains the primary keys from the MY_DOCUMENTS and the MY_DOCUMENT_CATEGORIES tables.

create table MY_DOC_CAT (
 doc_pk 	NUMBER(10), 
 doc_cat_pk NUMBER(10) );

Queries for CTXRULE are similar to those of CONTAINS queries. Basic phrasing within quotes is supported, as are the following CONTAINS operators: ABOUT, AND, NEAR, NOT, OR, STEM, WITHIN, and THESAURUS. The following statements contain my rules.

insert into document_categories values
  (doc_cat_seq.nextval, 'OAA','Oracle Advanced Analytics');

insert into document_categories values
  (doc_cat_seq.nextval, 'Oracle Data Mining','ODM or Oracle Data Mining');

insert into document_categories values
  (doc_cat_seq.nextval, 'Oracle Data Miner','ODMr or Oracle Data Miner or SQL Developer');

insert into document_categories values
  (doc_cat_seq.nextval, 'R Technologies','Oracle R Enterprise or ROacle or ORAACH or R');

We are now ready to create the Oracle Text CTXRULE index.

create index doc_cat_idx on document_categories(doc_cat_query) indextype is ctxsys.ctxrule;

Our next step is to apply the rules and to generate the categories/classifications. We have two scenarios to deal with here. The first is how do we do this for our existing records and the second to how can you do this ongoing as new documents get loaded into the MY_DOCUMENTS table.

For the first scenario, where the documents already exist in our table, we can can use a procedure, just like the following.

DECLARE
   v_document    MY_DOCUMENTS.DOC_TEXT%TYPE;
   v_doc         MY_DOCUMENTS.DOC_PK%TYPE;
BEGIN
   for doc in (select doc_pk, doc_text from my_documents) loop
      v_document := doc.doc_text;
      v_doc  := doc.doc_pk;
      for c in (select doc_cat_pk from document_categories
              where matches(doc_cat_query, v_document) > 0 )
         loop
            insert into my_doc_cat values (doc.doc_pk, c.doc_cat_pk);
      end loop;
   end loop;
END;
/

Let us have a look at the categories/classifications that were generated.

select a.doc_title, c.doc_cat_pk, b.doc_category
from my_documents a,
     document_categories b,
     my_doc_cat c
where a.doc_pk = c.doc_pk
and c.doc_cat_pk = b.doc_cat_pk
order by a.doc_pk, c.doc_cat_pk;

NewImage

We can see the the categorisation/classification actually gives us the results we would have expected of these documents/web pages.

Now we can look at how to generate these these categories/classifications on an on going basis. For this we will need a database trigger on the MY_DOCUMENTS table. Something like the following should do the trick.

CREATE or REPLACE TRIGGER t_cat_doc
  before insert on MY_DOCUMENTS
  for each row
BEGIN
  for c in (select doc_cat_pk from document_categories
            where  matches(doc_cat_query, :new.doc_text)>0)
  loop
        insert into my_doc_cat values (:new.doc_pk, c.doc_cat_pk);
  end loop;
END;

At this point we have now worked through how to build and use Oracle Text to perform Rule based document categorisation/classification.

In addition to this type of classification, Oracle Text also has uses some machine learning algorithms to classify documents. These include using Decision Trees, Support Vector Machines and Clustering. It is important to note that these are not the machine learning algorithms that come as part of Oracle Data Mining. Look out of my other blog posts that cover these topics.

Monday, September 12, 2016

My 3rd Book is now officially released

Today 12th September (2016) is the official release date of my 3rd book.

The title of the books is 'Oracle R Enterprise'. Make sure to check it out on Amazon.

It has been a busy 17 months, as you may have noticed that I had another book released a few weeks ago. Check it out here.

Yes, I was working on two books at the same time.

Yes, that was a lot of work, and looking back on it was a lot of fun too.

This new book (Oracle R Enterprise) is a good companion for my first book (Predictive Analytics using Oracle Data Miner), as I now have a book for each of the components of the Oracle Advanced Analytics option.

NewImage NewImage

Here is what is on the back cover of the book.

"Effectively manage your enterprise’s big data and keep complex processes running smoothly using the hands-on information contained in this Oracle Press guide. Oracle R Enterprise: Harnessing the Power of R in Oracle Database shows, step-by-step, how to create and execute large-scale predictive analytics and maintain superior performance. Discover how to explore and prepare your data, accurately model business processes, generate sophisticated graphics, and write and deploy powerful scripts. You will also find out how to effectively incorporate Oracle R Enterprise features in APEX applications, OBIEE dashboards, and Apache Hadoop systems. Learn to: • Install, configure, and administer Oracle R Enterprise • Establish connections and move data to the database • Create Oracle R Enterprise packages and functions • Use the R language to work with data in Oracle Database • Build models using ODM, ORE, and other algorithms • Develop and deploy R scripts and use the R script repository • Execute embedded R scripts and employ ORE SQL API functions • Map and manipulate data using Oracle R Advanced Analytics for Hadoop • Use ORE in Oracle Data Miner, OBIEE, and other applications ... "

This books is ideally suited to people who are starting out with Oracle R Enterprise (ORE) or have some experience with using it, and want to see what you can do with it and how it can be used with other products like APEX, OBIEE, Hadoop and Spark. Yes I touch on these in the book. This book may also be of interest for those who are working with the products I've just listed and want to see how to use ORE.

If you are at Oracle Open World (OOW) next week make sure to check out the book in the Oracle Book Store, and if you buy a copy try to track me down to get me to sign it. The best way to do this is to contact me on Twitter, leave a message at the Oracle Press stand, or you will find me hanging out at the OTN Lounge.

A special thanks to my technical editor, Mark Hornick, who is a Director of Oracle Advanced Analytics Product Management, for Oracle's R Technologies.

Here are quotes from some people about the book.

The book ‘Oracle R Enterprise’, written by Brendan Tierney, is a valuable resource for any data scientist who wants to use the R language with the Oracle Database. It demonstrates very well the many features of Oracle R Enterprise, from performing simple analytics to utilising the many performance features of the Oracle Database, allowing you to work with all your datasets - Big or small. Additionally the book demonstrates how you can use the power of the R language with the SQL language as well as with other Oracle products including APEX and OBIEE, as well as Hadoop and Spark.

- John Donnelly - Regional Director, Oracle Ireland

The new book by Brendan Tierney, Oracle ACE Director, on Oracle R Enterprise details how users can gain maximal value out of the Oracle Database’s tight integration with the popular open source R statistical programming language. The author guides the R community into how they can, through the ease and familiarity of R, tap into the power of the Oracle Database Enterprise Edition with its Oracle Advanced Analytics Option or the Oracle Database Cloud Service. Brendan, an expert in this field, clearly articulates how to get quickly started and provides extensive “how to” examples and R scripts. Readers of the book can learn how they can access data directly in the Database, eliminate data movement while exploiting the openness and flexibility of R. Readers can then tap into the scalability and security of SQL of the Oracle Database and leverage Oracle’s proprietary, parallelized in-database machine learning algorithms and Oracle R Enterprise’s R “push down” to SQL functions. Read this book and learn how to leverage R and reduce model development and enterprise model deployment from days/weeks to minutes/hours!

-Charlie Berger

Sr. Director Product Management, Oracle Advanced Analytics and Machine Learning

"Brendan Tierney conveys very clearly all the aspects required for a successful Data Scientist that wants to work with large Databases and large Big Data clusters. It contains a great articulation of all aspects related to building and deploying Machine Learning algorithms in an Oracle Database environment with an overview on the algorithms on Hadoop clusters, as well as the integration with Business Intelligence dashboards and Applications. This is an essential reference for anyone in the Data Science field today working with Oracle Databases.

Marcos Arancibia, Product Manager, Oracle Data Science.

Tuesday, September 6, 2016

Change the size of ORE PNG graphics using in-database R functions

In a previous blog post I showed you how create and display a ggplot2 R graphic using SQL. Make sure to check it out before reading the rest of this blog post.
In my previous blog post, I showed and mentioned that the PNG graphic returned by the embedded R execution SQL statement was not the same as what was produced if you created the graphic in an R session.
Here is the same ggplot2 graphic. The first one is what is produced in an R session and the section is what is produced by SQL query and the embedded R execution in Oracle.
NewImage NewImage
As you can see the second image (produced using the embedded R execution) gives a very square image.
The reason for this is that Oracle R Enterprise (ORE) creates the graphic image in PNG format. The default setting from this is 480 x 480. You will find this information when you go digging in the R documentation and not in the Oracle documentation.
So, how can I get my ORE produced graphic to appear like what is produced in R?
What you need to do is to change the height and width of the PNG image produced by ORE. You can do this by passing parameters in the SQL statement used to call the user defined R function, that in turn produces the ggplot2 image.
In my previous post, I gave the SQL statement to call and produce the graphic (shown above). One of the parameters to the rqTableEval function was set to null. This was because we didn't have any parameters to pass, apart from the data set.
We can replace this null with any parameters we want to pass to the user defined R function (demo_ggpplot). To pass the parameters we need to define them using a SELECT statement.
cursor(select 500 as "ore.png.height", 850 as "ore.png.width" from dual),
The full SELECT statement now becomes
select *
from table(rqTableEval( cursor(select * from claims),
                        cursor(select 500 as "ore.png.height", 850 as "ore.png.width" from dual),
                        'PNG',
                        'demo_ggpplot'));
When you view the graphic in SQL Developer, you will get something that looks a bit more like what you would expect or want to see.
NewImage
For each graphic image you want to produce using ORE you will need to figure out that are the best PNG height and width settings to use. Plus it also depends on what tool or application you are going to use to display the images (eg. APEX etc)

Thursday, September 1, 2016

How to Display a BLOB image in an APEX Report

Do you want to display an image on a report in APEX ?

Is the image stored as a BLOB data type in your schema or the blob is returned by some functions?

If so, then displaying the BLOB is not a simple or straight forward task.

Actually it is a simple and straight forward task, as long as you know "the trick" you need to create/defined in your APEX report.

The following steps outlines what you need to do to create a report with a BLOB images. Most of these are the standard steps, except for Step 4. That is the important one.

1. Create the Report using the APEX wizard

Create a new report. In my example here I'm going to create a classic report.

NewImage Enter a title for the report, and accept the default settings NewImage

Create as new navigation menu entry

NewImage

2. Define the Table or Query for the Report

Select the table or view that contains the data or define the SQL Query to return the results. It might be best to select this later option as it will make things clearer and easier to change in Step 4.

NewImage

Click next on the next 2 screens of the wizard and then click the Create button.

3. Set the BLOB attribute settings

When you run the report you will get something like the following being displayed. As you can see it clearly does not display the BLOB image.

NewImage

Next we need to setup the BLOB attribute settings. As shown in the following.

Screenshot 2016 08 26 13 59 30

When we run the report now, we now get an error message.

NewImage

4. Change the report query to return the length of the BLOB

Now this is the magic bit.

To get the image to display you need to go back to the Report level and change the query in the SQL Query box, to contain function below that get the length of the image in the BLOB attribute, dbms_lob.getlength() (in my example this attribute is call IMAGE)

select ID,
       dbms_lob.getlength(image)  image
from V_DOCUMENT_TM_IMAGE
Screenshot 2016 08 26 14 07 59

5. The BLOB object now appears :-)

That's it. Now when you run your report the image will be displayed.

NewImage

So now you know how to display a BLOB image in an APEX Report.

(Thanks to Roel and Joel for the help in working out how to do this)