Brendan Tierney - Oralytics Blog: January 2016

Wednesday, January 27, 2016

BIWA Summit : HOL notes for ODM using SQL

The following notes and documents are for BIWA Summit 2016 attendees taking the HOL (Hands-on Lab) on ODM using SQL and PL/SQL.

Getting connected to the Cloud

The following document outlines the steps you need to perform to get connected to the Cloud Database we are using.

If you attended yesterday's HOL, you can reuse the same connection details (and number).

If you didn't attend yesterday's HOL, you will need to be assigned a number. Ask me or Charlie for one. Once you have been assigned a number, follow the instructions.

It might be a good idea to download this document to your local machine.

HOL Notes

The following links are for the HOL documents. The first is the set of slides we will work through; it also contains the exercises that you will complete.

The second document is the demonstration script, which contains all the code in the HOL slides document. Download this file and open it in a Worksheet in SQL Developer that is connected to your schema in the database. (Do not use the ODM connections; there is no need to open the ODM GUI.)

Wednesday, January 20, 2016

BIWA Summit 2016

The annual BIWA Summit takes place next week, 26-28 January 2016, and it is back at Oracle HQ in Redwood Shores. If you are into the Oracle Database, Business Intelligence, Big Data, Advanced Analytics, etc., then this is the conference for you.

Over the 3 days there is an action-packed agenda of 5 parallel tracks, plus a full 3 days of Hands-on Labs. The agenda is filled with the who's who of the Oracle BI and Analytics world, so if this is your area then BIWA Summit is the conference for you and your training budget. (I'm sure it is not too late to book your place.)

I've been lucky this year in that I will have 2 Hands-on Labs and 1 presentation to give. Yes, that is 5 hours of presenting/hosting to do. The presentation I will be giving is 'Is Oracle SQL the best language for Statistics?' (on Tuesday 26th). This presentation is listed for the BIWA Summit and also for the NoCOUG Yes SQL conference that is running at the same time in the same venue (i.e. co-located). I've also written a brand new 2-hour Hands-on Lab titled 'Predictive Analytics using SQL and PL/SQL'. The first outing for this will be on Wednesday 27th. I will also be co-hosting, with Charlie Berger, the 'Learn Predictive Analytics in 2hours with Oracle Data Miner' Hands-on Lab on Tuesday 26th.

Come to my Hands-on Labs to be in with a chance to win a copy of my book on Oracle Data Mining.

Hopefully I'll see you there!


Tuesday, January 19, 2016

ORE video : Demo Code Part 4

The following is the fourth set of demo code from my video on using R in the Oracle Database. Check out the video before using the following code. The blog post for the video will be updated to contain links to all blog posts that have the various demo code.

The following code example illustrates how you can build a Data Mining model using the in-database data mining algorithms. In this example a Decision Tree model is created. This model is then applied to new data, scoring this data with the predicted values.

> #
> # Build an in-database ODM Decision Tree
> #
> dtData <- ore.get("MINING_DATA_BUILD_V")
> # Create an ODM DT model in the DB. It is only a temporary model and is deleted when you log out
> dtModel <- ore.odmDT(AFFINITY_CARD ~ ., dtData)
> # View the details of the ODM model
> #summary(dtModel)
> names(dtModel)
 [1] "name"          "settings"      "attributes"    "costs"         "distributions"
 [6] "nodes"         "formula"       "extRef"        "call"         
> dtModel$name
 [1] "ORE$208_210"
> dtModel$settings
                          value
prep.auto                    on
impurity.metric   impurity.gini
term.max.depth                7
term.minpct.node           0.05
term.minpct.split           0.1
term.minrec.node             10
term.minrec.split            20
> dtModel$attributes
                 name        type data.type data.length precision scale is.target
1       AFFINITY_CARD categorical    number          22         0     0      TRUE
2                 AGE   numerical    number          22        NA    NA     FALSE
3 CUST_MARITAL_STATUS categorical  varchar2          20        NA    NA     FALSE
4           EDUCATION categorical  varchar2          21        NA    NA     FALSE
5      HOUSEHOLD_SIZE categorical  varchar2          21        NA    NA     FALSE
6          OCCUPATION categorical  varchar2          21        NA    NA     FALSE
7       YRS_RESIDENCE   numerical    number          22        NA    NA     FALSE
>

> ## Compute the Confusion Matrix
> dtResults <- predict(dtModel, dtData, "AFFINITY_CARD")
> with(dtResults, table(AFFINITY_CARD, PREDICTION))
             PREDICTION
AFFINITY_CARD    0    1
            0 1056   64
            1  201  179
> ## How do you persist the model in the DB
> ##     Rename and save the model in the database
> dtModel$name
 [1] "ORE$208_210"
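
The confusion matrix shown above can be reduced to a few summary numbers. The following is a minimal sketch in base R that re-enters the four counts from the output by hand (so it runs without a database connection) and treats AFFINITY_CARD = 1 as the positive class, which is an assumption on my part. Keep in mind that the model was scored against the same data it was built on, so these figures are likely to be optimistic.

```r
# Confusion matrix copied from the output above:
# rows = actual AFFINITY_CARD, columns = PREDICTION
cm <- matrix(c(1056,  64,
                201, 179),
             nrow = 2, byrow = TRUE,
             dimnames = list(AFFINITY_CARD = c("0", "1"),
                             PREDICTION    = c("0", "1")))

# Overall accuracy: correctly classified rows / all rows
accuracy <- sum(diag(cm)) / sum(cm)

# Precision and recall for the class assumed to be positive ("1")
precision <- cm["1", "1"] / sum(cm[, "1"])
recall    <- cm["1", "1"] / sum(cm["1", ])

# accuracy 0.823, precision 0.737, recall 0.471
round(c(accuracy = accuracy, precision = precision, recall = recall), 3)
```

The recall of roughly 0.47 says that the tree finds fewer than half of the actual AFFINITY_CARD = 1 customers, something the overall accuracy on its own would hide.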

> ## Save the ODM model in the in-database R datastore
> ore.save(dtModel, name = "ORE_MODELS", overwrite=TRUE)
> ore.load(name = "ORE_MODELS")
 [1] "dtModel"

> ## Score new data using the DM Model
> ore.sync(table = c("MINING_DATA_APPLY"))
> ore.ls()
 [1] "DEMO_R_APPLY_RESULT"   "DEMO_R_TABLE"          "DEMO_SUBSET_TABLE"    
 [4] "INSUR_CUST_LTV_SAMPLE" "MINING_DATA_APPLY"     "MINING_DATA_BUILD_V"  
 [7] "MINING_DATA_TEST_V"   
> dtApply <- ore.get("MINING_DATA_APPLY")
> dim(dtApply)
 [1] 1500   18
> class(dtApply)
 [1] "ore.frame"
 attr(,"package")
 [1] "OREbase"
> DTAPPLY <- ore.push(dtApply)
> dtApplyResult <- predict(dtModel, DTAPPLY)

> head(dtApplyResult)
             '0'        '1' PREDICTION
100001 0.9521912 0.04780876          0
100002 0.9521912 0.04780876          0
100003 0.9521912 0.04780876          0
100004 0.9521912 0.04780876          0
100005 0.2633745 0.73662551          1
100006 0.9521912 0.04780876          0
> dim(dtApplyResult)
 [1] 1500    3
> dim(dtApply)
 [1] 1500   18
> dtResults <- cbind(dtApply, dtApplyResult)
> dim(dtResults)
 [1] 1500   21
> ore.drop(table = "DEMO_R_APPLY_RESULT")
> ore.create(dtApplyResult, table="DEMO_R_APPLY_RESULT")
> ## Run the following the first time you rename a model
> # ore.exec(paste("BEGIN
> #                  DBMS_DATA_MINING.RENAME_MODEL(model_name => '", dtModel$name, "',
> #                      new_model_name => 'DEMO_R_DT_MODEL'); END;",sep=""))
> ## Run the following to refresh an existing model
> ore.exec(paste("BEGIN
+ DBMS_DATA_MINING.DROP_MODEL('DEMO_R_DT_MODEL');
+ DBMS_DATA_MINING.RENAME_MODEL(model_name => '", dtModel$name,"',
+ new_model_name => 'DEMO_R_DT_MODEL');
+ END;",sep=""))
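
The two variants above (first-time rename versus refresh) differ only in whether DROP_MODEL is called first. As a rough sketch, the PL/SQL text could be assembled by a small helper so that the choice becomes a parameter. The function `build_rename_block` and its `drop_existing` argument are hypothetical names of mine, not part of ORE, and the code only builds and prints the string, so it runs without a database.

```r
# Hypothetical helper (not part of ORE): returns the PL/SQL block as a string
build_rename_block <- function(old_name, new_name, drop_existing = TRUE) {
  # Only include the DROP_MODEL call when refreshing an existing model
  drop_stmt <- if (drop_existing) {
    sprintf("DBMS_DATA_MINING.DROP_MODEL('%s'); ", new_name)
  } else {
    ""
  }
  paste0("BEGIN ", drop_stmt,
         "DBMS_DATA_MINING.RENAME_MODEL(model_name => '", old_name,
         "', new_model_name => '", new_name, "'); END;")
}

# Model name taken from the dtModel$name output above
plsql <- build_rename_block("ORE$208_210", "DEMO_R_DT_MODEL")
cat(plsql, "\n")

# With an ORE connection open, the block could then be run with:
# ore.exec(plsql)
```

For the first run, when DEMO_R_DT_MODEL does not exist yet, calling the helper with `drop_existing = FALSE` gives the rename-only block, which mirrors the commented-out version in the demo.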

Tuesday, January 12, 2016

ORE video : Demo Code Part 3

The following is the third set of demo code from my video on using R in the Oracle Database. Check out the video before using the following code. The blog post for the video will be updated to contain links to all blog posts that have the various demo code.

The following code illustrates some simple examples of using Oracle R Enterprise. In these examples you will see how to connect to the Oracle Database, how to query and process some of the tables and views in the Oracle Database, how to check that you are working with objects in the database, and how to move data to the database and query it.

> library(ORE)
> # ore.connect(user="rquser", sid="orcl", host="localhost", password="rquser", port=1521, all=TRUE);
> ore.connect(user="dmuser", sid="orcl", host="localhost", password="dmuser", port=1521, all=FALSE);
> # Test the connection
> ore.is.connected()
 [1] TRUE
> # List all the tables and views
> ore.ls()
character(0)
> # Use ore.sync with no arguments to include all the tables and views in the schema
> ore.sync()
> ore.ls()
 [1] "DEMO_R_APPLY_RESULT"      "DEMO_R_TABLE"             "INSUR_CUST_LTV_SAMPLE"    "MINING_DATA_APPLY"       
 [5] "MINING_DATA_APPLY_V"      "MINING_DATA_BUILD_V"      "MINING_DATA_TEST_V"       "MINING_DATA_TEXT_APPLY_V"
 [9] "MINING_DATA_TEXT_BUILD_V" "MINING_DATA_TEXT_TEST_V" 
> # Disconnect and reconnect with no metadata sync, then sync only the tables and views listed
> ore.disconnect()
> ore.connect(user="dmuser", sid="orcl", host="localhost", password="dmuser", port=1521, all=FALSE);
> ore.sync(table = c("MINING_DATA_BUILD_V", "MINING_DATA_TEST_V", "INSUR_CUST_LTV_SAMPLE"))
> ore.ls()
 [1] "INSUR_CUST_LTV_SAMPLE" "MINING_DATA_BUILD_V"   "MINING_DATA_TEST_V"   
> # Check for the existence of a table or view
> ore.exists("MINING_DATA_BUILD_V")
 [1] TRUE
> # list the objects in the DMUSER schema
> ore.ls("DMUSER")
 [1] "INSUR_CUST_LTV_SAMPLE" "MINING_DATA_BUILD_V"   "MINING_DATA_TEST_V" 
> #
> # Load data from a file into a new table
> ore.exists("DEMO_R_TABLE")
 [1] TRUE
> ore.drop(table='DEMO_R_TABLE')
> ore.ls()
 [1] "INSUR_CUST_LTV_SAMPLE" "MINING_DATA_BUILD_V"   "MINING_DATA_TEST_V"   
> titanic <- read.table("c:/R/titanic2.txt", header=T, sep="\t")
> ore.create(titanic, table="DEMO_R_TABLE")
> tData <- ore.get("DEMO_R_TABLE")
> head(tData)
                 NAME PCLASS AGE    SEX SURVIVED
1 Fynney, Mr Joseph J    2nd  35   male        0
2      Gale, Mr Harry    2nd  35   male        0
3   Gale, Mr Shadrach    2nd  38   male        0
4 Garside, Miss Ethel    2nd  24 female        1
5  Gaskell, Mr Alfred    2nd  16   male        0
6  Gavey, Mr Lawrence    2nd  26   male        0
> # Use ORE to pull data from the Database to local R
> # ore.pull returns the data as a local R data.frame
> mdbv <- ore.get("MINING_DATA_BUILD_V")
> mdbv_data <- ore.pull(mdbv)
Warning message:
ORE object has no unique key - using random order 
> head(mdbv_data,3)
  CUST_ID CUST_GENDER AGE CUST_MARITAL_STATUS             COUNTRY_NAME    CUST_INCOME_LEVEL EDUCATION OCCUPATION
1  101501           F  41              NeverM United States of America J: 190,000 - 249,999   Masters      Prof.
2  101502           M  27              NeverM United States of America I: 170,000 - 189,999     Bach.      Sales
3  101503           F  20              NeverM United States of America H: 150,000 - 169,999   HS-grad    Cleric.
  HOUSEHOLD_SIZE YRS_RESIDENCE AFFINITY_CARD BULK_PACK_DISKETTES FLAT_PANEL_MONITOR HOME_THEATER_PACKAGE
1              2             4             0                   1                  1                    1
2              2             3             0                   1                  1                    0
3              2             2             0                   1                  0                    0
  BOOKKEEPING_APPLICATION PRINTER_SUPPLIES Y_BOX_GAMES OS_DOC_SET_KANJI
1                       1                1           0                0
2                       1                1           1                0
3                       1                1           1                0
> class(mdbv_data)
[1] "data.frame"
> summary(mdbv_data)
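
The `read.table` call in the demo above expects a tab-delimited text file with a header row. Since `c:/R/titanic2.txt` is specific to my machine, here is a small sketch that writes three of the rows shown in the `head(tData)` output to a temporary file and reads them back the same way. It uses only base R; the temporary file is purely for illustration.

```r
# Write three of the rows shown above to a temporary tab-delimited file
tmp <- tempfile(fileext = ".txt")
writeLines(c("NAME\tPCLASS\tAGE\tSEX\tSURVIVED",
             "Fynney, Mr Joseph J\t2nd\t35\tmale\t0",
             "Gale, Mr Harry\t2nd\t35\tmale\t0",
             "Garside, Miss Ethel\t2nd\t24\tfemale\t1"),
           tmp)

# Same arguments as in the demo: first line is the header, fields split on tabs
titanic <- read.table(tmp, header = TRUE, sep = "\t",
                      stringsAsFactors = FALSE)
str(titanic)

# Clean up the temporary file
unlink(tmp)
```

Because the separator is a tab, the commas and spaces inside the NAME column are kept as part of the value. With an ORE connection open, the resulting data.frame could then be passed to `ore.create` as in the demo.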

Wednesday, January 6, 2016

ORE video : Demo Code Part 2

The following is the second set of demo code from my video on using R in the Oracle Database. Check out the video before using the following code. The blog post for the video will be updated to contain links to all blog posts that have the various demo code.

The following code gives a very quick demonstration of using the ROracle R package to access the data in your Oracle schema. ROracle has a number of advantages over RJDBC, most of which relate to performance. Typically, when using ROracle you will see a many-fold improvement when selecting data and moving it to your R client, when processing data in the database, and when writing data back to the Oracle Database. In some tests you can see a 7-fold improvement in performance over RJDBC. Now that is a big difference.

But the problem with ROracle is that it is only available on certain platforms/OS. For example, it is not officially available for the Mac. But if you google this issue carefully you will find unofficial ways of overcoming this problem.

ROracle is dependent on Oracle Client, so you will need to have Oracle Client installed on your machine and available on the search path.

When you have Oracle Client installed and the ROracle R package installed you are ready to start using it.

So here is the demo code from the video.

> library(ROracle)
> drv <- dbDriver("Oracle")
> # Create the connection string
> host <- "localhost"
> port <- 1521
> sid <- "orcl"
> connect.string <- paste("(DESCRIPTION=", "(ADDRESS=(PROTOCOL=tcp)(HOST=", host, ")(PORT=", port, "))",
+    "(CONNECT_DATA=(SID=", sid, ")))", sep = "")

> con <- dbConnect(drv, username = "dmuser", password = "dmuser",dbname=connect.string)

> rs <- dbSendQuery(con, "select view_name from user_views")
> # fetch records from the resultSet into a data.frame
> data <- fetch(rs)
> # extract all rows
> dim(data)
[1] 6 1
> data
                  VIEW_NAME
1       MINING_DATA_APPLY_V
2       MINING_DATA_BUILD_V
3        MINING_DATA_TEST_V
4  MINING_DATA_TEXT_APPLY_V
5  MINING_DATA_TEXT_BUILD_V
6   MINING_DATA_TEXT_TEST_V
> dbCommit(con)
> dbClearResult(rs)
> dbDisconnect(con)
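
The connect string in the demo is the part that is easiest to get wrong, because of the nested brackets. As a sketch, it could be wrapped in a small function so that the brackets are written once. The name `make_connect_string` is hypothetical and not part of ROracle; the code only builds and prints the string, so it runs without Oracle Client installed.

```r
# Hypothetical helper: assembles the connect descriptor used in the demo
make_connect_string <- function(host, port, sid) {
  paste0("(DESCRIPTION=",
         "(ADDRESS=(PROTOCOL=tcp)(HOST=", host, ")(PORT=", port, "))",
         "(CONNECT_DATA=(SID=", sid, ")))")
}

# Same values as in the demo code
connect.string <- make_connect_string("localhost", 1521, "orcl")
print(connect.string)

# With ROracle loaded, this would be passed to dbConnect as in the demo:
# con <- dbConnect(drv, username = "dmuser", password = "dmuser",
#                  dbname = connect.string)
```

If your database is reached through a service name rather than a SID, the CONNECT_DATA part would probably need SERVICE_NAME in place of SID; I have not shown that variant here.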
