Monday, May 12, 2014

Getting Started with ROracle

There are many different ways for you to connect to a database using R. You can setup an RODBC connection, use RJDBC, use Oracle R Enterprise (ORE), etc. But if you are an Oracle user you will want to be able to connect to your Oracle databases and be able to access the data as quickly as possible.

The problem with RODBC and RJDBC connections is that they are really designed to process small amounts of data. As your database and data grows, particularly in the Big Data World then using these type of connections soon become a bottleneck. Another alternative is to use Oracle R Enterprise, but if you do then you have to pay extra licence fees. Again this may not be an option.

An alternative is to use the ROracle package. This R package that is supplied by Oracle, for FREE!!!, allows you to setup connections that utilise the Oracle Client software that you will have installed on your client PCs/laptops etc. Because it utilises the communication technology of Oracle Client you are going to get really good performance and opens up the possibility of processing your Big Data is a reasonable amount of time.

The following steps brings your through the various steps involved in getting ROracle installed, how to connect to the database and how to execute some simple commands. At the end of the post I will point you towards some performance evaluations that have been conducted comparing ROracle to other connection methods.

Installing ROracle (on Client and on the Server)

The first step you need to perform is to install ROracle. If you are installing this on your client machine then you can install it into your R directory. If you are installing ROracle on your server and you have ORE already installed then you can install it in the ORE directory. If you do not have ORE installed on your server but you have R installed then install ROracle in your R directory on the server.

To install ROracle you can run the following command:

install.packages("ROracle") Or select the Install Packages menu option from the R Gui menu or from the RStudio menu. You will get prompted for the R home to install the ROracle package in. ROracle1

You may get some warning messages about some other packages and if the ROracle package was compiled using a slightly different version of R. These are just warning messages and everything should work OK. If you get an error message then you will need to check out what is causing it.

As part of the process the R process will be restarted.


Connecting to your Oracle Database

Now that you have ROracle installed the next step is to test that you can connect to your database. The following commands loads the ROracle package, defines the Oracle driver, sets-up the connection information and then establishes the connection.

> library(ROracle)

> drv <- dbDriver("Oracle")

> # Create the connection string

> host <- "localhost"

> port <- 1521

> sid <- "orcl"

>connect.string <- paste(

>    "(DESCRIPTION=",

>    "(ADDRESS=(PROTOCOL=tcp)(HOST=", host, ")(PORT=", port, "))",

>    "(CONNECT_DATA=(SID=", sid, ")))", sep = "")

con <- dbConnect(drv, username = "dmuser", password = "dmuser",dbname=connect.string)

At this point you are now connected to the DMUSER schema in the Oracle database.


Issuing a Query and processing the results

How we can run our queries on objects in our schema (DMUSER). The following commands sends a query to the schema (to bring back all the view names) and returns the results into rs. The contents of rs is then mapped into data. The dim(data) command returns the number of records in the result set (in the example this is 6) and the number of columns (1 in our example).

> rs <- dbSendQuery(con, "select view_name from user_views")

> # fetch records from the resultSet into a data.frame

> data <- fetch(rs)

> # extract all rows

> dim(data)

[1] 6 1

> data

                  VIEW_NAME

1       MINING_DATA_APPLY_V

2       MINING_DATA_BUILD_V

3        MINING_DATA_TEST_V

4  MINING_DATA_TEXT_APPLY_V

5  MINING_DATA_TEXT_BUILD_V

6   MINING_DATA_TEXT_TEST_V


Checking Query meta-data

ROracle allows us to find out information or meta-data regarding the execution of the query. The following commands list the various meta-data available for the query, then gets the meta-data for the query we ran in the above step (results are in rs) and then displays the meta-data for the rs query results.

> # Get the meta-data about the query

> names(dbGetInfo(rs))

[1] "statement" "isSelect" "rowsAffected" "rowCount" "completed" "prefetch"

[7] "bulk_read" "fields"

> rsInfo <- dbGetInfo(rs)

> rsInfo

$statement

[1] "select view_name from user_views"

$isSelect

[1] TRUE

$rowsAffected

[1] 0

$rowCount

[1] 6

$completed

[1] TRUE

$prefetch

[1] FALSE

$bulk_read

[1] 1000

$fields

name Sclass type len precision scale nullOK

1 VIEW_NAME character VARCHAR2 30 0 0 FALSE


Checking Database meta-data

You can also get some meta-data about your connection to the database. The following commands lists the meta-data available for a connection, gets the meta-data for the connection and then displays the meta-data for the current connection (con).

> # Get the meta-data about the connection ot the database

> names(dbGetInfo(con))

[1] "username" "dbname" "serverVersion" "serverType" "resTotal"

[6] "resOpen" "prefetch" "bulk_read" "stmt_cache" "results"

> dbInfo <- dbGetInfo(con)

> dbInfo

$username

[1] "dmuser"

$dbname

[1] "(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=localhost)(PORT=1521))(CONNECT_DATA=(SID=orcl)))"

$serverVersion

[1] "11.2.0.3.0"

$serverType

[1] "Oracle RDBMS"

$resTotal

[1] 1

$resOpen

[1] 1

$prefetch

[1] FALSE

$bulk_read

[1] 1000

$stmt_cache

[1] 0

$results

$results[[1]]

Statement:           select view_name from user_views

Rows affected:       0

Row count:           6

Select statement:    TRUE

Statement completed: TRUE

OCI prefetch:        FALSE

Bulk read:           1000


Closing the DB Connection

In this section some commands are given that allows you to see some of the details of the Oracle driver, to Commit any changes to the database, to free up the resources being held by result set of the query we ran in a previous step and then to close the connection to the schema.

> # Free up the resources and disconnect from the database

> summary(drv)

Driver name: Oracle (OCI)

Driver version: 1.1-11

Client version: 11.2.0.3.0

Connections processed: 3

Open connections: 2

Interruptible: FALSE

> dbCommit(con)

[1] TRUE

> dbClearResult(rs)

[1] TRUE

> dbDisconnect(con)

[1] TRUE


Performance evaluation

Mark Hornick of Oracle has written a blog post on the performance differences between RODBC, RJDBC and ROracle. He compares the performance of reading from tables with various numbers of attributes and various volumes of records. Check out his blog post here. The following image is taken from this blog post and illustrates the performances.

ROraclePerfomance

You can access the R CRAN ROracle Package Documentation here, and the ROracle documentation/webpage here.

No comments:

Post a Comment