Wednesday, July 30, 2014

Tokenizing a String : Using Regular Expressions

In my previous blog post I gave some PL/SQL that performed the tokenising of a string. Check out this blog post here.

Thanks also to the people who sent me links examples of how to tokenise a string using the MODEL clause. Yes there are lots of examples of this out there on the interest.

While performing the various searches on the internet I did come across some examples of using Regular Expressions to extract the tokens. The following example is thanks to a blog post by Tanel Poder

I've made some minor changes to it to remove any of the special characters we want to remove.

column token format a40

define separator=" "

define mystring="$My OTN LA Tour (2014?) will consist of Panama, CostRica and Mexico."

define myremove="\?|\#|\$|\.|\,|\;|\:|\&|\(|\)|\-";

 

SELECT regexp_replace(REGEXP_REPLACE(

                 REGEXP_SUBSTR( '&mystring'||'&separator', '(.*?)&separator', 1, LEVEL )

                           , '&separator$', ''), '&myremove', '') TOKEN

FROM

    DUAL

CONNECT BY

    REGEXP_INSTR( '&mystring'||'&separator', '(.*?)&separator', 1, LEVEL ) > 0

ORDER BY

LEVEL ASC

/

When we run this code we get the following output.

Token fun 3

So we have a number of options open to use to tokenise strings using SQL and PL/SQL, using a number of approaches including substring-ing, using pipelined functions, using the Model clause and also using Regular Expressions.

Monday, July 28, 2014

BUCKET_WIDTH: Calculating the size of the bucket

Some time ago I had some blog posts introducing some of the basic Statistical function available in Oracle. Here are the links to these.

Most people do not realise that Oracle has over 250+ statistical functions that are available (no addition cost) in all the database versions.

I've had a query about one of the functions BUCKET_WIDTH. The question was wondering if it was possible to get the width of the bucket in each case. There does not seem to be a build in feature to get this value, so we have to calculate this ourselves.

Here is an example of how to calculate the bucket width, as on the example I used in my previous blog post.

SELECT bucket, max(age)-min(age) BUCKET_WIDTH, count(*)

FROM (SELECT cust_id,

                       age,

                      width_bucket(age,

                         (SELECT min(age) from mining_data_build_v),

                         (select max(age)+1 from mining_data_build_v),

                      10) bucket

          FROM mining_data_build_v

          GROUP BY cust_id, age )

GROUP BY bucket

ORDER BY bucket;

Bucket width

What this query gives is an approximate value of the size of the Bucket Width based on the values/records that are in a bucket. The actual values used cannot be determined exactly as there is not function/value in SQL that tells us the actual value.

Thursday, July 24, 2014

Tokenizing a String

Over the past while I've been working a lot with text strings. Some of these have been short in length like tweets from Twitter, or longer pieces of text like product reviews. Plus others of various lengths.

In all these scenarios I have to break up the data into individual works or Tokens.

The examples given below illustrate how you can take a string and break it into its individual tokens. In addition to tokenising the string I've also included some code to remove any special characters that might be included with the string.

These include ? # $ . ; : &

This list of special characters to ignore are just an example and is not an exhaustive list. You can add whatever characters to the list yourself. To remove these special characters I've used regular expressions as this seemed to be the easiest way to do this.

Using PL/SQL

The following example shows a simple PL/SQL unit that will tokenise a string.

DECLARE

    vDelimiter VARCHAR2(5) := ' ';

    vString    VARCHAR2(32767) := 'Hello Brendan How are you today?'||vDelimiter;

    vPosition   PLS_INTEGER;

    vToken     VARCHAR2(32767);

    vRemove    VARCHAR2(100) := '\?|\#|\$|\.|\,|\;|\:|\&';

    vReplace   VARCHAR2(100) := '';

BEGIN

    dbms_output.put_line('String = '||vString);

    dbms_output.put_line('');

    dbms_output.put_line('Tokens');

    dbms_output.put_line('------------------------');

   

    vPosition := INSTR(vString, vDelimiter);

   

    WHILE vPosition > 0 LOOP

   

       vToken := LTRIM(RTRIM(SUBSTR(vString, 1, vPosition-1)));

       vToken := regexp_replace(vToken, vRemove, vReplace);

   

       vString := SUBSTR(vString, vPosition + LENGTH(vDelimiter));

       dbms_output.put_line(vPosition||': '||vToken);

       vPosition := INSTR(vString, vDelimiter);

   

    END LOOP;

END;

/

When we run this (with Serveroutput On) we get the following output.

Toke PLSQL

A slight adjustment is needed to the output of this code to remove the numbers or positions of the token separator/delimiter.

Tokenizer using a Function

To make this more usable we will really need to convert this into an iterative function. The following code illustrates this, how to call the function and what the output looks like.

CREATE OR replace TYPE token_list

AS TABLE OF VARCHAR2(32767);

/

 

CREATE OR replace FUNCTION TOKENIZER(pString IN VARCHAR2,

                                     pDelimiter IN VARCHAR2)

RETURN token_list pipelined

AS

    vPosition       INTEGER;

    vPrevPosition   INTEGER := 1;

    vRemove         VARCHAR2(100) := '\?|\#|\$|\.|\,|\;|\:|\&';

    vReplace        VARCHAR2(100) := '';

    vString         VARCHAR2(32767) := regexp_replace(pString, vRemove, vReplace);

BEGIN

    LOOP

       vPosition := INSTR (vString, pDelimiter, vPrevPosition);

       IF vPosition = 0 THEN

          pipe ROW (SUBSTR(vString, vPrevPosition ));

          EXIT;

       ELSE

          pipe ROW (SUBSTR(vString, vPrevPosition, vPosition - vPrevPosition ));

          vPrevPosition := vPosition + 1;

       END IF;

    END LOOP;

END TOKENIZER;

/

Here are a couple of examples to show how it works and returns the Tokens.

SELECT column_value TOKEN

FROM TABLE(tokenizer('It is a hot and sunny day in Ireland.', ' '))

, dual;

Token fun 1

How if we add in some of the special characters we should see a cleaned up set of tokens.

SELECT column_value TOKEN

FROM TABLE(tokenizer('$$$It is a hot and sunny day in #Ireland.', ' '))

, dual;

Token fun 2

Thursday, July 17, 2014

OTN Latin America (North) Tour 2014

For a few years now I (and I'm sure you have too) have heard about and followed the various Oracle User Group tours that OTN arranges/facilitates. A tour consists of a number of Oracle User Groups in a region coordinating together to have their conferences organised so that they can get speakers from across the world to come and present.

For most presenters it involves lots of travel. So instead of them doing all that travelling to present at one conference, they can now extend their travels a little and present in a number of countries. Most of the speakers are Oracle ACE Directors and OTN is very generous with their support in that they pay for all the flights, transportation and hotels. Without the generous support of OTN these tours and perhaps many of the conference would not take place.

With envy I used to follow the various speakers on tweeter as they talked about their travels from country to country and their experiences of meeting the people and exploring the various countries. Yes their time in each country seemed to be limited but they always got to see and do so much.

Earlier this year there was an call for presentations for the various OTN Tours in 2014. I submitted 3 presentations that coverd Oracle Advanced Analytics Option (Oracle Data Mining and Oracle R Enterprise). I thought I didn't stand a chance given the speakers that have participated in previous years.

A couple of weeks ago I received an email saying that I had been accepted onto the OTN Latin America (North) Tour. So you can imagine my excitement. The full OTN Tour North leg covers a number of countries across central and south America and is over a 2 week period. Unfortunately I'm not able to be away for that long, so I was accepted for the conferences on the first week of the tour. This will include Panama, Costa Rica and Mexico :-)

Some of you might think this is a bit of a golly and a holiday. What I've discovered over the past week or more is that it will be far from that. There is a lot of work in preparing the presentations, giving the presentation, setting up live demos between presentation, various meetings with people at the conferences etc etc etc. Then there is all the travel, all the airports, all the airport transfers, all the overnights in hotels. Over the course of 7 days I will be staying 6 different hotels.

I have spent the last week just trying to arrange my flights and hotels. This also involved trying to coordinate with other speakers so that we can travel together as much as possible.

Here are the dates and the presentations that I will be giving at these conferences:

4th August : Panama (in Panama City)

     10:00-11:00 : Getting Started with Oracle Data Miner & Predictive Analytics

     11:00-12:00 : Combining the Power of R and in-Datbaase Data Mining. Running R in the Database. Seriously!

     13:00-13:40 : Sentiment Analysis Using Oracle Data Mining

6th August : Costa Rica (in San Carlos)

     10:00-11:00 : Getting Started with Oracle Data Miner & Predictive Analytics

     13:00-14:00 : Sentiment Analysis Using Oracle Data Mining

     16:00-17:00 : Combining the Power of R and in-Datbaase Data Mining. Running R in the Database. Seriously!

8th August : Mexico (in Mexico City)

     14:00-15:00 : Getting Started with Oracle Data Miner & Predictive Analytics

     15:00-16:00 : Combining the Power of R and in-Datbaase Data Mining. Running R in the Database. Seriously!

When the agenda for the conferences are available I will have another blog post with their details.

If you are at one of these conference do please say hello :-)

I've finally booked all my flights and hotels. Many thanks to my fellow ACE Director presenters for your research and sharing of travel plans. It looks like there will be a groups of us all travelling together.

Now the next challenge is to prepare the presentations and live demos (yes live demos).

I hope to blog about each of the conferences and my travels to/from each country. It really depends on what time I will have and access to the internet. Perhaps this is something I will try to do on my various plane flights or waiting at the airports. So watch out for these :-)


Updated with some stats on my travels

My travel plans for the OTN Latin America tour of user group conferences involves

  • 12,200 flying miles,
  • 29.75 of flying time,
  • way too many hours hanging around in airports
  • over 8 days
  • staying in 6 hotels
  • plus 1 over night flight,
  • giving 8 hours of presentations in 3 countries

Why do we do this? Because we love sharing with the Oracle User Groups around the world. I'm only doing 1 week of the tour. Some people are doing 2 weeks :-(

Monday, June 23, 2014

Oracle Magazine September/October 2000

The headline articles of Oracle Magazine for September/October 2000 were on e-Business Integration, including online healthy prescription for online retailing, streamlineing the pulp and fiber industries, and the health care industry. Plus there was lots and lots of articles and news items all on businesses delivering solutions via the internet.


Ora Mag 2000 Sept Oct

As this was the Oracle Open World edition (and you see the label on the cover saying Biggest Ever) you can imagine there was a LOT of advertisements and sponsored articles. The following of other articles below will not cover these and will only look at the main content articles.

Other articles included:

  • Tom Kyte's article is on Tips for Migrating, Indexing and Using Packaged Procedures. In his article he gives some tips for migrating to Oracle 8.1i. He also discusses some scenarios around creating (or not) indexes on foreign keys. He also looks at the scenario of compiling linked procedures and how the use of packages avoids the identified issues.
  • Do you remember the Internet File System. There was an article that gave an overview of this that was available in Oracle 8i and was capable of managing over 150 different file types.
  • Autodesk releaseed OnSite, an enterprise solution for bringing design and location based information to the point of work via mobile devices. Autodesk On Site used Oracle 8i Lite and the Palm OS platform to provide an interactive, two way communication environment between the mobile worker and the overall decision support system.
  • The Oracle Academic Initiative began in 1997. In 2000 Oracle donated software licences, support services and Oracle training material to 17 educational institutions valued at $60 million
  • There was page after page, after page of announcements and news from various Oracle Partners.
  • Douglas Scherer gives the first part of an article that looks at how you can use Oracle 8i interMedia for managing and deploying content rich data on the internet.
  • Managing Your Resources looks at some of the new Oracle 8i EE helps DBAs to define plan, assign users to groups and prioritise resource allocations.
  • With the release of Oracle 8.1.6 came the new Statspack. Connie Dialeris and Graham Wood give an overview of the main features of Statspack, providing some guidance on how to use it in a proactive manner and gives a step-by-step guide to how you can trouble shoot performance problems with Statspack.
  • The final article was on Oracle Warehouse Builder (OWB). This was an overview type of article and gave an overview of the main components and gives some guidelines for setting up some different types of integration.

To view the cover page and the table of contents click on the image at the top of this post or click here.
My Oracle Magazine Collection can be found here. You will find links to my blog posts on previous editions and a PDF for the very first Oracle Magazine from June 1987.

Thursday, June 12, 2014

ORE Getting Connected: ore.connect and other commands

After you have installed I ORE on your client and server (see my previous blog posts on these), you are not ready to start getting your hand dirty with ORE. During the installation of ORE you setup a test schema in your database to test that you can make a connection. But you will not be using this schema for you ORE work. Typically you will want to use one of your existing schemas that contains the data in the relevant tables and views. Plus you will want to be able to create some tables, creates some ORE temporary objects and stores, and to be able to store some of you ORE scripts in the schema that contains the data. In a previous blog post I gave some steps needed to setup your schema to be able to use ORE and the embedded feature.

In this post I want to show you some of the commands you will need to use to get connected to your Oracle schema and some initial commands you will need to know an use.

ore.connect command

The first command that you need to use is the ore.connect command that allows you to establish a connection to a database schema. You can also use this command to connect to HIVE tables in a Hadoop cluster.

The general syntax of the ore.connect function is

ore.connect(user = "", sid = "", host = "localhost", password = "",

            port = 1521, service_name = NULL, conn_string = NULL,

            all = FALSE, type = c("ORACLE", "HIVE"))

From this list of available parameters you only need to specify some of them. For a basic connection you do not need to specify the conn_string, type and all. The following is an example of using this connection string to connect to a schema called DMUSER that is located in a database whose service_name is ORCL and the host is the localhost.

> ore.connect(user="dmuser", sid="orcl", host="localhost", password="dmuser", port=1521, all=TRUE);

> ore.ls()

[1] "DEMO_R_APPLY_RESULT" "DEMO_R_TABLE" "DEMO_SUBSET_TABLE" "INSUR_CUST_LTV_SAMPLE"

[5] "MINING_DATA_APPLY" "MINING_DATA_APPLY_V" "MINING_DATA_BUILD_V" "MINING_DATA_TEST_V"

[9] "MINING_DATA_TEXT_APPLY_V" "MINING_DATA_TEXT_BUILD_V" "MINING_DATA_TEXT_TEST_V"

I generally explicitly include the all=TRUE. The reason for this might become clear below when I show the alternative.

When you us the all=TRUE, the ore.connect function will also run the ore.sync and a ore.attach functions. This will result in synchronizing and attaching all tables and views in the ORE schema. The amount of time to run ore.sync grows linearly with the number of visible tables and views.

If you already have an ORE connection open and you try to establish a new ORE connection then your already existing ORE connection will be automatically disconnected. So you will need to be careful with the sequencing of your ORE code.

You schema might have lots and lots of object. As you work on building your advanced analytics environment you will end up build many more objects. You can imagine that over time every time you establish a connection it will start to take longer and longer. The following commands creates a connection to the schema, but this time it does not sync or attach the database objects, as shown using the ore.ls function.

> ore.connect(user="dmuser", sid="orcl", host="localhost", password="dmuser", port=1521, all=FALSE);

> ore.ls()

character(0)

If you use the all=FALSE like is shown in the above example you will need to issue a ore.sync function to synchronise the meta-data and then the ore.attach function to add the synchronised objects to the search space of the local R environment.


is.ore.connected

You can use this command to check that you are connected to your ORE connection is live or not. As you develop your ORE scripts you might build in various connection and disconnections. This command is very useful to check your current status. If you have an ORE connection then you will get a response of TRUE. If you don't have an open ORE connection then you will get a response of FALSE.

> ore.is.connected()

[1] FALSE

After we establish our ORE connection the next time we run this command

> ore.is.connected()

[1] TRUE


The following commands can be used to check to see if we have a connection and if not then establish a connection and list the objects in the schema.

> if (!ore.is.connected())

ore.connect(user="dmuser", sid="orcl", host="localhost", password="dmuser", port=1521, all=TRUE)

> ore.ls()

[1] "DEMO_R_APPLY_RESULT" "DEMO_R_TABLE" "DEMO_SUBSET_TABLE"

[4] "INSUR_CUST_LTV_SAMPLE" "MINING_DATA_APPLY" "MINING_DATA_APPLY_V"

[7] "MINING_DATA_BUILD_V" "MINING_DATA_TEST_V" "MINING_DATA_TEXT_APPLY_V"

[10] "MINING_DATA_TEXT_BUILD_V" "MINING_DATA_TEXT_TEST_V"

> ore.disconnect()

NOTE: Some of the documentation seems to refer to the command is.ore.connected(). This generates and error and seems to be an error in some of the ORE documentation.


ore.ls()

The ore.ls() command is used to find out what objects you have in your schema. The objects that it will list are all the tables and views that are defined in the schema.

> ore.ls()

[1] "DEMO_R_APPLY_RESULT" "DEMO_R_TABLE" "DEMO_SUBSET_TABLE"

[4] "INSUR_CUST_LTV_SAMPLE" "MINING_DATA_APPLY" "MINING_DATA_APPLY_V"

[7] "MINING_DATA_BUILD_V" "MINING_DATA_TEST_V" "MINING_DATA_TEXT_APPLY_V"

[10] "MINING_DATA_TEXT_BUILD_V" "MINING_DATA_TEXT_TEST_V"


ore.sync

The ore.sync() command is used to synchronise the meta-data about the objects in the oracle schema with the ORE environment. When no parameters are used in the command then all the meta-data for the tables and views are synchronised.

> ore.sync()

There are a a few variants for this command. These can include a specified list of tables and specifying a schema. These become increasingly important as the number of objects (tables and views) increase in your schema. Realistically as your analytical environment grows so will the number of objects. Therefore it the length of time it takes the ore.sync() command to run will start to take longer and long. So instead of synchronising all of the object, you can only synchronising the objects that you need.

The following command synchronises the meta-data for the listed tables.

ore.sync(table = c("MINING_DATA_BUILD_V", "MINING_DATA_TEST_V", "INSUR_CUST_LTV_SAMPLE"))

If you want to synchronise objects from another schema you can specify the schema name. Only the objects that you have privileges on will be synchronised.

ore.sync("SH")


ore.attach

After you have run the ore.sync() command then you can add the objects that were synchronised to the search path of what R objects you can access and use. To do this you need to run the ore.attach() command.

> ore.sync()

> ore.attach()

The ORE objects are added at position 2. This means that they are listed after the local R workspace objects and before all previously attached packages and environment. You cannot change the position to be 1, but you can change it to 3

> ore.attach("DMUSER", 3)


NOTE: If you use the all=TRUE in your ore.connect command than the connection will automatically execute the ore.sync() and the ore.attach() commands.


ore.rm

The ore.rm() command can be used to remove an object from the R search space. This does not remove or delete the object from your ORE schema.

> ore.rm("MINING_DATA_BUILD_V")


Disconnecting & Other things

ore.disconnect

After you have finished all your ORE work you will need to disconnect from your schema. To do this you can use the ore.disconnect function. As part of the ore.disconnect function all temporary objects created during the session will be removed/deleted from the database / your schema. The ore.disconnect function does return a value. If you are unsure if the disconnect has worked then you can use the ore.is.connected command.

> ore.disconnect()

> ore.is.connected()

[1] TRUE

If you exit from your R session or application and implicit ore.disconnect will be issued. But it is always good practice to issue this command yourself.

OREShowDoc()

The OREShowDoc() command will open up a web browser and will display the home page of the ORE documentation. This will be the ORE documentation on you machine. My test machine is a VM with ORE installed on it, so the home page for the will be based on ORE installed directories on the server. My laptop is a Mac and ORE is currently not supported on the Mac, so I was not able to test how this works on the client.

> OREShowDoc

OREShowDoc

Tuesday, June 10, 2014

OUG Finland (Harmony) 2014

I'm back a few days now after an eventful OUG Finland Conference. It was a great 2 days of Oracle techie stuff in one of the best conference locations in the world.

OUGF view

The conference kind of started on the Wednesday. It seemed like most of the speakers from across Europe and USA were getting into Helsinki around lunch time. Heli had arranged to be a tour guide for the afternoon and took us to see some of the sights around Helsinki, ending up at dinner at a Viking restaurant. At this we got to meet up with some of the other speakers.

We got to try the local Tar ice-cream. Let's just say it was an experience :-)

2014 06 04 21 52 23

The conference kicked off on the Thursday morning at a nature reserve (Haltia) about 30 minutes outside of Helsinki. I'm sure you have seem some of the photos that all of us were sharing on twitter. Heli did the opening welcome and got to show off how good her English is and how easily she can switch between Finnish and English (you had to be there). Each day consisted of 6 tracks and we also had some keynotes too. I think the highlight for most people on the first day was the keynote by James Morle. If you were not there, then hopefully he will be invited to give it at other conferences around the world.

Google Made Me Stupid

I did my usual thing of tweeting throughout the conference along with most of the other presenters, ACEs and ACE Directors.

My first presentation of the conference was my Sentiment Analysis using Oracle Data Mining. I had a decent enough attendance at this considering the glorious sunshine we had and also who was also presenting at the same time.

That evening there was the conference BBQ and we were entertained by the a band consisting of Oracle Finland employees. When all of the entertainment was finished it was time for some people to get the bus back to their hotel in Helsinki. But for 15 of us we headed off for an overnight camping in the local forest. Many thanks to Heli and Ann for arranging this and Ollie for setting up the camp and looking after us. The camp was a bit of fun but the mosquitoes were a bit of a issue for me. Lets just say 'They love me'

2014 06 06 06 58 072014 06 05 23 07 062014 06 05 23 06 37

Yes the safety briefing did talk about bears

2014 06 05 22 25 15

The next morning we got back in to the conference centre. After the keynote from Graham Wood (who was one of the campers), I had my second presentation on using ORE and how to get started with it. I was very surprised with the attendance, in that there was perhaps twice the number at it that I was hoping to have. This presentation ran on a little longer than hoped and it is more suited to a 1 hour slot than a 45 minute slot.

Me presenting ougf14

After that my presenting duties were over and I got to enjoy many of the presentations during the rest of the day. One of these was the Standard Edition Round table organised by Philippe Fierens, Jan Karrmans and Ann Sjokvit. There was lots of great discussion at this round table and I really hope they get to host this at other conferences.

Did I mention I got to meet Chris Date. Did I mention I got to meet Chris Date :-)

Me and Chris Date

As the conference came to a close the committee made sure that everyone had a small gift before going home. For all of those that were left waiting for the bus back into Helsinki there was one last (or several) jumps to do. Again the photos are on twitter, etc.

This was a great conference that had attendees (not just speakers) from a number of countries across Europe. I enjoyed the conference and getting to meet some old and new friends. Hopefully next year I will be able to fit OUG Finland into my calendar and so should you.

As Heli says "I'm Finnish and this is the end"

Monday, June 2, 2014

ore.parallel

In ORE there are a number ways to get you R scripts to run in parallel in the database. One way is to enable the Parallel option in ORE. This is what will be shown in this post. There are other methods of running various ORE commands/scripts in parallel. With these the scripts are divided out and several parallel R processes are started on the server.

But what if you want to use the database parallel feature on some of your ORE other commands?

Why would you want to do this?

Well the main answer is that you might want to use the parallel option of the database for the creation on objects (tables etc) and for selecting and manipulating the data in the database.

How can you enable your ORE connection to use the in-database parallel feature?

ORE 1.4 has a new option that enables the parallel option for your ORE connection in the database. This option is called ore.parallel.

When you enable or set the ore.parallel option, it seems to be the equivalent of running the following:

ALTER SESSION ENABLE PARALLEL DDL;

ALTER SESSION ENABLE PARALLEL DML;

ALTER SESSION ENABLE PARALLEL QUERY;

The exact details is a little unclear, but it seems to be above commands.

The following commands illustrates some options for using the ore.parallel option.

> #

> # Check to see if the ore.parallel is enabled for your ORE connection

> options("ore.parallel")

$ore.parallel

NULL

The NULL returned value tells us that your ORE connections does not have the Parallel option enabled. If the schema had Parallel enabled by default then we would have have a response of TRUE.

The following command turns on the Parallel option for your ORE connection / schema.

> options("ore.parallel" = TRUE)

> options("ore.parallel")

$ore.parallel

[1] TRUE

When the Parallel option is enabled (TRUE above) the database will use the degree of parallel that is set as default for the schema or the degree of parallel that is defined for the table when it is being used in your ORE commands.

You can changed the degree of parallelism by passing the required degree as a value to the ore.parallel command. In the following, the degree of parallelism is set to 8. We then as ORE what the degree is set to and it tells us that it is 8. So it was set correctly.

> options("ore.parallel" = 8)

> options("ore.parallel")

$ore.parallel

[1] 8

Monday, May 26, 2014

Oracle R Enterprise (ORE) Tasks for the Oracle DBA

In previous posts I gave the steps required to install Oracle R Enterprise on your Database server and your client machine.

One of the steps that I gave was the initial set of Database privileges that the DB needed to give to the RQUSER. The RQUSER is a little bit like the SCOTT/TIGER schema in the Oracle Database. Setting up the RQUSER as part of the installation process allows you to test that you can connect to the database using ORE and that you can issue some ORE commands.

After the initial testing of the ORE install you might consider locking this RQUSER schema or dropping it from the Database.

So when a new ORE user wants access to the database what steps does the DBA have to perform.

  1. Create a new schema for the user
  2. Grant the new schema the standard set of privileges to connect to the DB, create objects, etc.
  3. Create any data sets in their schema
  4. Create any views to data that exists in other schemas (and grant the necessary privileges, etc

Now we get onto the ORE specific privileges. The following are the minimum required for your user to be able to connect to their Oracle schema using ORE.

GRANT CREATE TABLE TO RQUSER;

GRANT CREATE PROCEDURE TO RQUSER;

GRANT CREATE VIEW TO RQUSER;

GRANT CREATE MINING MODEL TO RQUSER;

In most cases the first 3 privileges (TABLE, PROCEDURE and VIEW) will be standard for most schemas that you will set up. So in reality the only command or extra privilege that you will need to execute is:

GRANT CREATE MINING MODEL TO RQUSER;

This command will allow the user to connect to their Oracle schema using ORE, but what it will not allow them to do is to create any embedded R. These are R scripts that are stored in the database and can be called in their R/ORE scripts or by using the SQL API to R (I'll have more blog posts on these soon). To allow the user to create and use embedded R the DBA will also have to grant the following privilege as SYS:

GRANT RQADMIN to RQUSER;

To summarise the DBA will have to grant the following to each schema that wants to use the full power of ORE.

GRANT CREATE MINING MODEL TO RQUSER;

GRANT RQADMIN to RQUSER;

A note of Warning: Be careful what schemas you grant the RQADMIN privilege to. It is a powerful privilege and opens the database to the powerful features of R. So using the typical DBA best practice of granting privileges, the DBA should only grant the RQADMIN privilege to only the people who require it.

Monday, May 12, 2014

Getting Started with ROracle

There are many different ways for you to connect to a database using R. You can setup an RODBC connection, use RJDBC, use Oracle R Enterprise (ORE), etc. But if you are an Oracle user you will want to be able to connect to your Oracle databases and be able to access the data as quickly as possible.

The problem with RODBC and RJDBC connections is that they are really designed to process small amounts of data. As your database and data grows, particularly in the Big Data World then using these type of connections soon become a bottleneck. Another alternative is to use Oracle R Enterprise, but if you do then you have to pay extra licence fees. Again this may not be an option.

An alternative is to use the ROracle package. This R package that is supplied by Oracle, for FREE!!!, allows you to setup connections that utilise the Oracle Client software that you will have installed on your client PCs/laptops etc. Because it utilises the communication technology of Oracle Client you are going to get really good performance and opens up the possibility of processing your Big Data is a reasonable amount of time.

The following steps brings your through the various steps involved in getting ROracle installed, how to connect to the database and how to execute some simple commands. At the end of the post I will point you towards some performance evaluations that have been conducted comparing ROracle to other connection methods.

Installing ROracle (on Client and on the Server)

The first step you need to perform is to install ROracle. If you are installing this on your client machine then you can install it into your R directory. If you are installing ROracle on your server and you have ORE already installed then you can install it in the ORE directory. If you do not have ORE installed on your server but you have R installed then install ROracle in your R directory on the server.

To install ROracle you can run the following command:

install.packages("ROracle") Or select the Install Packages menu option from the R Gui menu or from the RStudio menu. You will get prompted for the R home to install the ROracle package in. ROracle1

You may get some warning messages about some other packages and if the ROracle package was compiled using a slightly different version of R. These are just warning messages and everything should work OK. If you get an error message then you will need to check out what is causing it.

As part of the process the R process will be restarted.


Connecting to your Oracle Database

Now that you have ROracle installed the next step is to test that you can connect to your database. The following commands loads the ROracle package, defines the Oracle driver, sets-up the connection information and then establishes the connection.

> library(ROracle)

> drv <- dbDriver("Oracle")

> # Create the connection string

> host <- "localhost"

> port <- 1521

> sid <- "orcl"

>connect.string <- paste(

>    "(DESCRIPTION=",

>    "(ADDRESS=(PROTOCOL=tcp)(HOST=", host, ")(PORT=", port, "))",

>    "(CONNECT_DATA=(SID=", sid, ")))", sep = "")

con <- dbConnect(drv, username = "dmuser", password = "dmuser",dbname=connect.string)

At this point you are now connected to the DMUSER schema in the Oracle database.


Issuing a Query and processing the results

How we can run our queries on objects in our schema (DMUSER). The following commands sends a query to the schema (to bring back all the view names) and returns the results into rs. The contents of rs is then mapped into data. The dim(data) command returns the number of records in the result set (in the example this is 6) and the number of columns (1 in our example).

> rs <- dbSendQuery(con, "select view_name from user_views")

> # fetch records from the resultSet into a data.frame

> data <- fetch(rs)

> # extract all rows

> dim(data)

[1] 6 1

> data

                  VIEW_NAME

1       MINING_DATA_APPLY_V

2       MINING_DATA_BUILD_V

3        MINING_DATA_TEST_V

4  MINING_DATA_TEXT_APPLY_V

5  MINING_DATA_TEXT_BUILD_V

6   MINING_DATA_TEXT_TEST_V


Checking Query meta-data

ROracle allows us to find out information or meta-data regarding the execution of the query. The following commands list the various meta-data available for the query, then gets the meta-data for the query we ran in the above step (results are in rs) and then displays the meta-data for the rs query results.

> # Get the meta-data about the query

> names(dbGetInfo(rs))

[1] "statement" "isSelect" "rowsAffected" "rowCount" "completed" "prefetch"

[7] "bulk_read" "fields"

> rsInfo <- dbGetInfo(rs)

> rsInfo

$statement

[1] "select view_name from user_views"

$isSelect

[1] TRUE

$rowsAffected

[1] 0

$rowCount

[1] 6

$completed

[1] TRUE

$prefetch

[1] FALSE

$bulk_read

[1] 1000

$fields

name Sclass type len precision scale nullOK

1 VIEW_NAME character VARCHAR2 30 0 0 FALSE


Checking Database meta-data

You can also get some meta-data about your connection to the database. The following commands lists the meta-data available for a connection, gets the meta-data for the connection and then displays the meta-data for the current connection (con).

> # Get the meta-data about the connection ot the database

> names(dbGetInfo(con))

[1] "username" "dbname" "serverVersion" "serverType" "resTotal"

[6] "resOpen" "prefetch" "bulk_read" "stmt_cache" "results"

> dbInfo <- dbGetInfo(con)

> dbInfo

$username

[1] "dmuser"

$dbname

[1] "(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=localhost)(PORT=1521))(CONNECT_DATA=(SID=orcl)))"

$serverVersion

[1] "11.2.0.3.0"

$serverType

[1] "Oracle RDBMS"

$resTotal

[1] 1

$resOpen

[1] 1

$prefetch

[1] FALSE

$bulk_read

[1] 1000

$stmt_cache

[1] 0

$results

$results[[1]]

Statement:           select view_name from user_views

Rows affected:       0

Row count:           6

Select statement:    TRUE

Statement completed: TRUE

OCI prefetch:        FALSE

Bulk read:           1000


Closing the DB Connection

In this section some commands are given that allows you to see some of the details of the Oracle driver, to Commit any changes to the database, to free up the resources being held by result set of the query we ran in a previous step and then to close the connection to the schema.

> # Free up the resources and disconnect from the database

> summary(drv)

Driver name: Oracle (OCI)

Driver version: 1.1-11

Client version: 11.2.0.3.0

Connections processed: 3

Open connections: 2

Interruptible: FALSE

> dbCommit(con)

[1] TRUE

> dbClearResult(rs)

[1] TRUE

> dbDisconnect(con)

[1] TRUE


Performance evaluation

Mark Hornick of Oracle has written a blog post on the performance differences between RODBC, RJDBC and ROracle. He compares the performance of reading from tables with various numbers of attributes and various volumes of records. Check out his blog post here. The following image is taken from this blog post and illustrates the performances.

ROraclePerfomance

You can access the R CRAN ROracle Package Documentation here, and the ROracle documentation/webpage here.

Thursday, May 1, 2014

Oracle Magazine-July/August 2000

The headline articles of Oracle Magazine for July/August 2000 were on business intelligence, architectures for BI and how companies like NetFlix m drug-store.com and health insurances companies are using BI to better understand their customers.

2000 July Aug Other articles included:
  • Tom Kyte has an article on Back to Basic for DBAs to ensure robust performance and scalability. He looks at sizing and some of the different aspects involved in this, some of the hot backup methods and the resizing redo log files.
  • Oracle Jdeveloper 3.1 is released and there is a new component of the Oracle XML Developer's Kit (Oracle XDK), which offers end-to-end XML support and provides developers with an integrated, full-featured application development tool for delivering business-to-business applications in Java and XML.
  • Oracle and IBM plan to offer Oracle Internet Directory within AIX and Monterey/64 as a directory option to customers.
  • Over 5 pages of press releases from Oracle Partners.
  • The Common Warehouse Model (CWM) is an open standard by which DWs can share meta-data. Oracle the the OMG are working on defining a standard for sharing metadata.
  • Kevin Loney has an article on transportable tablespaces in Oracle 8i. He discusses issues around generating a transportable tablespace set, plugging in the transportable tablespace set, some management issues and some things to consider with partitions
  • The is a modelling article that looks at the differences between using Relational and an Object-Based model. Taking a real world example the article walks through how to use the relation and then the object model.
  • Richard Niemiec gives the second part of his article on Tuning Problems Queries. In this article he covers using hints, using joins effectively, using the parallel query option and using some mathematical techniques to anticipate query performance.
  • Jonathan Gennick outlines a 6 step incremental approach to developing good SQL queries
  • The final article was an interview with Ari Kaplan who was at the time and Oracle DBA, Oracle book author and talks about his analytics projects with a number of baseball clubs.

To view the cover page and the table of contents click on the image at the top of this post or click here.
My Oracle Magazine Collection can be found here. You will find links to my blog posts on previous editions and a PDF for the very first Oracle Magazine from June 1987.

Tuesday, April 29, 2014

Installing ORE - Part C - Issue installing ORE on Windows Server

In my previous two blog posts (Part-A and Part-B) I detailed 4 steps for how you can install ORE on your servers and on your client machines.

I also mentioned a possible issue you may encounter if you try to install ORE on a Windows server. This blog post will look at this issue and how you can workaround it and get ORE installed.

The problem occurs when I when to install the ORE Supporting packages.

I was prompted to install these into a new library directory. If you get this error message then something is wrong and you should not proceed with installing these packages. If you do proceed and install them in a new library directory then they will not be seen by ORE and the database (as they were not installed in the $ORACLE_HOME/R/library) and when you go to run ORE from within R you will get errors like the following

package ‘Cairo’ successfully unpacked and MD5 sums checked

package ‘DBI’ successfully unpacked and MD5 sums checked

package ‘png’ successfully unpacked and MD5 sums checked

Warning: cannot remove prior installation of package ‘png’

package ‘ROracle’ successfully unpacked and MD5 sums checked

Warning: cannot remove prior installation of package ‘ROracle’

If I try the ore.connect I get the following errors.

ore.connect(user="RQUSER", sid="orcl", host="localhost", password="RQUSER", port=1521, all=TRUE)

Loading required package: ROracle

Error in .ore.oracleQuerySetup() :

ORACLE connection requires ROracle package

In addition: Warning message:

In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, : there is no package called ‘ROracle’


To overcome this ORE install issue all you need to do is to close down your R Gui, then add the following lines to the Rprofile file. The Rprofile file is located in R\etc directory C:\Program Files\R\R-3.0.1\etc. Add the following lines:

# Add $ORACLE_HOME/R/library to .libPaths() for ORE packages

.libPaths("C:/app/oracle/product/11.2.0/dbhome_1/R/library")

The above line will tell R to look in or to include the R directory in the Oracle home as part of its search path. You many need to change the directory above to point to your Oracle home. When you log into the R Gui the path above will be included. Now you can install the packages and then import the packages. This time they will be installed in the $ORACLE_HOME/R/library.

When you open the R Gui and run the command to load the ORE package and to connect to your ORE schema you should not receive any error messages.

> library(ORE)

> ore.connect(user="RQUSER", sid="orcl", host="localhost", password="RQUSER", port=1521, all=TRUE)


Now you should have ORE installed and working on your Windows server.