Monday, May 26, 2014

Oracle R Enterprise (ORE) Tasks for the Oracle DBA

In previous posts I gave the steps required to install Oracle R Enterprise on your Database server and your client machine.

One of the steps that I gave was the initial set of Database privileges that the DB needed to give to the RQUSER. The RQUSER is a little bit like the SCOTT/TIGER schema in the Oracle Database. Setting up the RQUSER as part of the installation process allows you to test that you can connect to the database using ORE and that you can issue some ORE commands.

After the initial testing of the ORE install you might consider locking this RQUSER schema or dropping it from the Database.

So when a new ORE user wants access to the database what steps does the DBA have to perform.

  1. Create a new schema for the user
  2. Grant the new schema the standard set of privileges to connect to the DB, create objects, etc.
  3. Create any data sets in their schema
  4. Create any views to data that exists in other schemas (and grant the necessary privileges, etc

Now we get onto the ORE specific privileges. The following are the minimum required for your user to be able to connect to their Oracle schema using ORE.

GRANT CREATE TABLE TO RQUSER;

GRANT CREATE PROCEDURE TO RQUSER;

GRANT CREATE VIEW TO RQUSER;

GRANT CREATE MINING MODEL TO RQUSER;

In most cases the first 3 privileges (TABLE, PROCEDURE and VIEW) will be standard for most schemas that you will set up. So in reality the only command or extra privilege that you will need to execute is:

GRANT CREATE MINING MODEL TO RQUSER;

This command will allow the user to connect to their Oracle schema using ORE, but what it will not allow them to do is to create any embedded R. These are R scripts that are stored in the database and can be called in their R/ORE scripts or by using the SQL API to R (I'll have more blog posts on these soon). To allow the user to create and use embedded R the DBA will also have to grant the following privilege as SYS:

GRANT RQADMIN to RQUSER;

To summarise the DBA will have to grant the following to each schema that wants to use the full power of ORE.

GRANT CREATE MINING MODEL TO RQUSER;

GRANT RQADMIN to RQUSER;

A note of Warning: Be careful what schemas you grant the RQADMIN privilege to. It is a powerful privilege and opens the database to the powerful features of R. So using the typical DBA best practice of granting privileges, the DBA should only grant the RQADMIN privilege to only the people who require it.

Monday, May 12, 2014

Getting Started with ROracle

There are many different ways for you to connect to a database using R. You can setup an RODBC connection, use RJDBC, use Oracle R Enterprise (ORE), etc. But if you are an Oracle user you will want to be able to connect to your Oracle databases and be able to access the data as quickly as possible.

The problem with RODBC and RJDBC connections is that they are really designed to process small amounts of data. As your database and data grows, particularly in the Big Data World then using these type of connections soon become a bottleneck. Another alternative is to use Oracle R Enterprise, but if you do then you have to pay extra licence fees. Again this may not be an option.

An alternative is to use the ROracle package. This R package that is supplied by Oracle, for FREE!!!, allows you to setup connections that utilise the Oracle Client software that you will have installed on your client PCs/laptops etc. Because it utilises the communication technology of Oracle Client you are going to get really good performance and opens up the possibility of processing your Big Data is a reasonable amount of time.

The following steps brings your through the various steps involved in getting ROracle installed, how to connect to the database and how to execute some simple commands. At the end of the post I will point you towards some performance evaluations that have been conducted comparing ROracle to other connection methods.

Installing ROracle (on Client and on the Server)

The first step you need to perform is to install ROracle. If you are installing this on your client machine then you can install it into your R directory. If you are installing ROracle on your server and you have ORE already installed then you can install it in the ORE directory. If you do not have ORE installed on your server but you have R installed then install ROracle in your R directory on the server.

To install ROracle you can run the following command:

install.packages("ROracle") Or select the Install Packages menu option from the R Gui menu or from the RStudio menu. You will get prompted for the R home to install the ROracle package in. ROracle1

You may get some warning messages about some other packages and if the ROracle package was compiled using a slightly different version of R. These are just warning messages and everything should work OK. If you get an error message then you will need to check out what is causing it.

As part of the process the R process will be restarted.


Connecting to your Oracle Database

Now that you have ROracle installed the next step is to test that you can connect to your database. The following commands loads the ROracle package, defines the Oracle driver, sets-up the connection information and then establishes the connection.

> library(ROracle)

> drv <- dbDriver("Oracle")

> # Create the connection string

> host <- "localhost"

> port <- 1521

> sid <- "orcl"

>connect.string <- paste(

>    "(DESCRIPTION=",

>    "(ADDRESS=(PROTOCOL=tcp)(HOST=", host, ")(PORT=", port, "))",

>    "(CONNECT_DATA=(SID=", sid, ")))", sep = "")

con <- dbConnect(drv, username = "dmuser", password = "dmuser",dbname=connect.string)

At this point you are now connected to the DMUSER schema in the Oracle database.


Issuing a Query and processing the results

How we can run our queries on objects in our schema (DMUSER). The following commands sends a query to the schema (to bring back all the view names) and returns the results into rs. The contents of rs is then mapped into data. The dim(data) command returns the number of records in the result set (in the example this is 6) and the number of columns (1 in our example).

> rs <- dbSendQuery(con, "select view_name from user_views")

> # fetch records from the resultSet into a data.frame

> data <- fetch(rs)

> # extract all rows

> dim(data)

[1] 6 1

> data

                  VIEW_NAME

1       MINING_DATA_APPLY_V

2       MINING_DATA_BUILD_V

3        MINING_DATA_TEST_V

4  MINING_DATA_TEXT_APPLY_V

5  MINING_DATA_TEXT_BUILD_V

6   MINING_DATA_TEXT_TEST_V


Checking Query meta-data

ROracle allows us to find out information or meta-data regarding the execution of the query. The following commands list the various meta-data available for the query, then gets the meta-data for the query we ran in the above step (results are in rs) and then displays the meta-data for the rs query results.

> # Get the meta-data about the query

> names(dbGetInfo(rs))

[1] "statement" "isSelect" "rowsAffected" "rowCount" "completed" "prefetch"

[7] "bulk_read" "fields"

> rsInfo <- dbGetInfo(rs)

> rsInfo

$statement

[1] "select view_name from user_views"

$isSelect

[1] TRUE

$rowsAffected

[1] 0

$rowCount

[1] 6

$completed

[1] TRUE

$prefetch

[1] FALSE

$bulk_read

[1] 1000

$fields

name Sclass type len precision scale nullOK

1 VIEW_NAME character VARCHAR2 30 0 0 FALSE


Checking Database meta-data

You can also get some meta-data about your connection to the database. The following commands lists the meta-data available for a connection, gets the meta-data for the connection and then displays the meta-data for the current connection (con).

> # Get the meta-data about the connection ot the database

> names(dbGetInfo(con))

[1] "username" "dbname" "serverVersion" "serverType" "resTotal"

[6] "resOpen" "prefetch" "bulk_read" "stmt_cache" "results"

> dbInfo <- dbGetInfo(con)

> dbInfo

$username

[1] "dmuser"

$dbname

[1] "(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=localhost)(PORT=1521))(CONNECT_DATA=(SID=orcl)))"

$serverVersion

[1] "11.2.0.3.0"

$serverType

[1] "Oracle RDBMS"

$resTotal

[1] 1

$resOpen

[1] 1

$prefetch

[1] FALSE

$bulk_read

[1] 1000

$stmt_cache

[1] 0

$results

$results[[1]]

Statement:           select view_name from user_views

Rows affected:       0

Row count:           6

Select statement:    TRUE

Statement completed: TRUE

OCI prefetch:        FALSE

Bulk read:           1000


Closing the DB Connection

In this section some commands are given that allows you to see some of the details of the Oracle driver, to Commit any changes to the database, to free up the resources being held by result set of the query we ran in a previous step and then to close the connection to the schema.

> # Free up the resources and disconnect from the database

> summary(drv)

Driver name: Oracle (OCI)

Driver version: 1.1-11

Client version: 11.2.0.3.0

Connections processed: 3

Open connections: 2

Interruptible: FALSE

> dbCommit(con)

[1] TRUE

> dbClearResult(rs)

[1] TRUE

> dbDisconnect(con)

[1] TRUE


Performance evaluation

Mark Hornick of Oracle has written a blog post on the performance differences between RODBC, RJDBC and ROracle. He compares the performance of reading from tables with various numbers of attributes and various volumes of records. Check out his blog post here. The following image is taken from this blog post and illustrates the performances.

ROraclePerfomance

You can access the R CRAN ROracle Package Documentation here, and the ROracle documentation/webpage here.

Thursday, May 1, 2014

Oracle Magazine-July/August 2000

The headline articles of Oracle Magazine for July/August 2000 were on business intelligence, architectures for BI and how companies like NetFlix m drug-store.com and health insurances companies are using BI to better understand their customers.

2000 July Aug Other articles included:
  • Tom Kyte has an article on Back to Basic for DBAs to ensure robust performance and scalability. He looks at sizing and some of the different aspects involved in this, some of the hot backup methods and the resizing redo log files.
  • Oracle Jdeveloper 3.1 is released and there is a new component of the Oracle XML Developer's Kit (Oracle XDK), which offers end-to-end XML support and provides developers with an integrated, full-featured application development tool for delivering business-to-business applications in Java and XML.
  • Oracle and IBM plan to offer Oracle Internet Directory within AIX and Monterey/64 as a directory option to customers.
  • Over 5 pages of press releases from Oracle Partners.
  • The Common Warehouse Model (CWM) is an open standard by which DWs can share meta-data. Oracle the the OMG are working on defining a standard for sharing metadata.
  • Kevin Loney has an article on transportable tablespaces in Oracle 8i. He discusses issues around generating a transportable tablespace set, plugging in the transportable tablespace set, some management issues and some things to consider with partitions
  • The is a modelling article that looks at the differences between using Relational and an Object-Based model. Taking a real world example the article walks through how to use the relation and then the object model.
  • Richard Niemiec gives the second part of his article on Tuning Problems Queries. In this article he covers using hints, using joins effectively, using the parallel query option and using some mathematical techniques to anticipate query performance.
  • Jonathan Gennick outlines a 6 step incremental approach to developing good SQL queries
  • The final article was an interview with Ari Kaplan who was at the time and Oracle DBA, Oracle book author and talks about his analytics projects with a number of baseball clubs.

To view the cover page and the table of contents click on the image at the top of this post or click here.
My Oracle Magazine Collection can be found here. You will find links to my blog posts on previous editions and a PDF for the very first Oracle Magazine from June 1987.