Wednesday, December 30, 2015

ORE Video : Demo Code part 1

In a previous blog post I posted a video on using R with the Oracle Database and Oracle R Enterprise. This is part 1 of a follow-up to that post, giving the first set of demo code.

This first set of demonstration code uses RJDBC to connect to the Oracle Database. RJDBC relies on the Oracle JDBC jar file. It is easily found in various installations of Oracle products and will be called something like ojdbc.jar. I like to take a copy of this file and place it in my root/home directory.

> library(RJDBC)
> # Create connection driver and open connection
> jdbcDriver <- JDBC(driverClass="oracle.jdbc.OracleDriver", classPath="c:/ojdbc6.jar")
> jdbcConnection <- dbConnect(jdbcDriver, "jdbc:oracle:thin:@//localhost:1521/orcl", "dmuser", "dmuser")
> #list the tables in the schema
> #dbListTables(jdbcConnection)
> #get the DB connection details - it gets LOTS of info - do not run unless it is really needed
> dbGetInfo(jdbcConnection)
> # Query the Oracle instance name.
> #instanceName <- dbGetQuery(jdbcConnection, "SELECT instance_name FROM v$instance")
> #print(instanceName)
> tableNames <- dbGetQuery(jdbcConnection, "SELECT table_name from user_tables where
                                            table_name not like 'DM$%' and table_name not like 'ODMR$%'")
> print(tableNames)
             TABLE_NAME
1 INSUR_CUST_LTV_SAMPLE
2            OUTPUT_1_2
> viewNames <- dbGetQuery(jdbcConnection, "SELECT view_name from user_views")
> print(viewNames)
                  VIEW_NAME
1       MINING_DATA_APPLY_V
2       MINING_DATA_BUILD_V
3        MINING_DATA_TEST_V
4  MINING_DATA_TEXT_APPLY_V
5  MINING_DATA_TEXT_BUILD_V
6   MINING_DATA_TEXT_TEST_V

> v <- dbReadTable(jdbcConnection, "MINING_DATA_BUILD_V")
> names(v)
[1] "CUST_ID"                 "CUST_GENDER"             "AGE"                     
[4] "CUST_MARITAL_STATUS"     "COUNTRY_NAME"            "CUST_INCOME_LEVEL"       
[7] "EDUCATION"               "OCCUPATION"              "HOUSEHOLD_SIZE"         
[10] "YRS_RESIDENCE"           "AFFINITY_CARD"           "BULK_PACK_DISKETTES"    
[13] "FLAT_PANEL_MONITOR"      "HOME_THEATER_PACKAGE"    "BOOKKEEPING_APPLICATION”
[16] "PRINTER_SUPPLIES"        "Y_BOX_GAMES"             "OS_DOC_SET_KANJI" 
> dim(v)
[1] 1500   18
> summary(v)
    CUST_ID       CUST_GENDER             AGE        CUST_MARITAL_STATUS COUNTRY_NAME       
Min.   :101501   Length:1500        Min.   :17.00   Length:1500         Length:1500        
1st Qu.:101876   Class :character   1st Qu.:28.00   Class :character    Class :character   
Median :102251   Mode  :character   Median :37.00   Mode  :character    Mode  :character   
Mean   :102251                      Mean   :38.89                                          
3rd Qu.:102625                      3rd Qu.:47.00                                          
Max.   :103000                      Max.   :90.00                                          
CUST_INCOME_LEVEL   EDUCATION          OCCUPATION        HOUSEHOLD_SIZE     YRS_RESIDENCE    
Length:1500        Length:1500        Length:1500        Length:1500        Min.   : 0.000   
Class :character   Class :character   Class :character   Class :character   1st Qu.: 3.000   
Mode  :character   Mode  :character   Mode  :character   Mode  :character   Median : 4.000                                                                               
                                                                            Mean   : 4.089                                                                               
                                                                            3rd Qu.: 5.000                                                                               
                                                                            Max.   :14.000 
> hist(v$YRS_RESIDENCE)
> hist(v$AGE)
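
Before disconnecting, you could also write an R data frame back to the database and read it again. This is an extra snippet of my own, not part of the original demo (the CARS_TEST table name is my own choice):

> # Write the built-in mtcars data frame to a new table, then read it back
> dbWriteTable(jdbcConnection, "CARS_TEST", mtcars, overwrite=TRUE)
> carsData <- dbReadTable(jdbcConnection, "CARS_TEST")
> dim(carsData)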
> dbDisconnect(jdbcConnection)

Make sure to check out the other demonstration scripts that are shown in the video.

Tuesday, December 29, 2015

Oracle R Enterprise 1.5 (new release)

The Oracle Santa had a busy time just before Christmas with the release of several new versions of products. One of these was Oracle R Enterprise version 1.5.

Oracle R Enterprise (1.5) is part of the Oracle Advanced Analytics option for the enterprise edition of the Oracle Database.

As with every new release of a product there is a range of bug fixes. But ORE 1.5 also brings some important new features. These include:

  • A new Random Forest algorithm specific to ORE.
  • New ORE Data Store functions and privileges.
  • Partitioning on multiple columns for ore.groupApply (see the sketch after this list).
  • Multiple improvements to ore.summary.
  • Parallel in-database execution for the prcomp and svd functions.
  • BLOB and CLOB data types are now supported in some of the ORE functions.
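
To give a flavour of the ore.groupApply change, here is a minimal sketch of my own, not taken from the release notes. The connection details and the use of the MINING_DATA_BUILD_V view are assumptions, as is expressing the multi-column partitioning by passing a list of columns as the INDEX.

library(ORE)
# Connection details below are placeholders - change to suit your setup
ore.connect(user="dmuser", sid="orcl", host="localhost",
            password="dmuser", port=1521, all=TRUE)

# Partition on two columns and run the function in-database,
# once per (gender, marital status) group
res <- ore.groupApply(MINING_DATA_BUILD_V,
                      INDEX=list(MINING_DATA_BUILD_V$CUST_GENDER,
                                 MINING_DATA_BUILD_V$CUST_MARITAL_STATUS),
                      function(dat) { nrow(dat) })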

Check out the ORE 1.5 Release Notes for more details on the new features.

ORE 1.5 is only certified (for now) on R 3.2.x in both the open source version and the Oracle R Distribution version 3.2.

Check out the ORE 1.5 Documentation.

You can download ORE 1.5 Server side and Client side software here.

Monday, December 21, 2015

Running R in the Oracle Database video

Earlier this year I was asked by the Business Analytics & Big Data SIG (of the UKOUG) to give a presentation on Oracle R Enterprise. Unfortunately I had already committed to giving the same presentation at the OUG Norway conference on the same day.

But then they asked me if I could record a video of the presentation and they would show it at the SIG. The following video is what I recorded.

At the UKOUG annual (2015) conferences I was supposed to give a 2 hour presentation during their Super Sunday event. Unfortunately due to a storm passing over Ireland on the Saturday all flights going to the UK were cancelled. This meant that I would miss my 2 hour presentation.

Instead of trying to find an alternative speaker for my presentation slot at such short notice, the committee suggested that they would show the video.

Based on the feedback and the people who thanked me in person during the rest of the conference, I've decided to make it available to everyone. Hopefully you will find it useful.

The following are the links to the demo code that is shown or referred to in the video.

People have been asking me if the demo scripts I used in the video are available. You will probably find some of these on various blog posts. So to make it easier for everyone I will post the demo scripts in one or more blog posts over the coming weeks. When these are available I will update this blog post with the links.

I have a few new presentations on Oracle R Enterprise planned for 2016, so watch out for these at an Oracle User Group conference.

Saturday, December 12, 2015

KScope 2016 Acceptances

I've never been to KScope. Yes never.

I've always wanted to. Each year you hear of all of these stories about how much people really enjoy KScope and how much they learn.

So back in October I decided to submit 5 presentations to KScope: 4 solo presentations and 1 joint presentation.

This week I have received the happy news that 2 of my solo presentations have been accepted, plus my joint presentation with Kim Berg Hansen.

So at the end of June 2016 I will be making my way to Chicago for a week of Oracle geekie fun at KScope.

My presentations will be:

  • Is Oracle SQL the best language for Statistics?
  • Running R in your Oracle Database using Oracle R Enterprise

and my joint presentation is called

Forecasting in Oracle using the Power of SQL (this will talk about ROracle, Forecasting in R, Using Oracle R Enterprise and SQL)

I was really hoping that one of my rejected presentations would have been accepted. I really enjoy giving that presentation, and I get to share stories about some of my predictive analytics projects. Ah well, maybe in 2017.

The last time I was in Chicago was over 15 years ago, when I spent 5 days in Cellular One (the brand was sold to Trilogy Partners by AT&T in 2008, shortly after AT&T had completed its acquisition of Dobson Communications). I was there to kick off a project to build them a data warehouse and to build their first customer churn predictive model. I stayed in a hotel across the road from their office which was famous because a certain person had stayed in it while on the run. Unfortunately I didn't get time to visit downtown Chicago.

Wednesday, December 2, 2015

OUG Ireland 2015 : Call for Papers

We finally have confirmed dates for the OUG Ireland Annual Conference. It will be on the 3rd & 4th March, 2016.

In 2016 we are expanding the conference, making it a 2 day event. Over the past few years some of the feedback from delegates has been: can we have more sessions and a second day? We have listened, and will now have a 2 day conference on the 3rd & 4th March, 2016.

We also have a new venue. The 2016 conference will be held in the Gresham Hotel, on O'Connell Street in central Dublin, close to lots of the tourist attractions, hotels and entertainment.

The call for papers is now open and we are looking for presentations across the typical topic areas: Business Analytics/Big Data, hardcore DBA, Development, Cloud and Applications.

Do you have a story to share? Have you discovered something recently? Have you used a new product? Is there anything about some part of an Oracle Product you would like to share?

If so then click on the image below to go to the Submissions Website and get submitting. Closing date for submissions is 9am on 4th January.

NewImage

Hopefully I'll see you in Dublin in March.

Thursday, November 19, 2015

My presentations at UKOUG 2015

The annual conferences of the UKOUG are coming up soon and I just wanted to give you a quick overview of the 3 presentations I will be giving at these conferences in Birmingham between Sunday 6th and Wednesday 9th December.

I'll be making references to 50 Shades of Grey and Lord of the Rings in some of these presentations. Come along to find out why.

It all kicks off on the Sunday (12:30-14:20) with an almost 2 hour presentation titled 'Predictive Analytics in Oracle: Mining the Gold & Creating Valuable Products'. We have been hearing about Predictive Analytics for a few years now, but most people are still unsure of what it really means and how it could be used in their company. This presentation aims to simplify what Predictive Analytics means and to illustrate how various companies from a wide range of industries have been using Predictive Analytics to gain deeper, valuable and actionable insights from their data. A number of case studies will be presented, based on some of my recent projects, illustrating how predictive analytics has been used to generate significant benefits for each company. These case studies will include examples from Banking, Telecommunications, Financial Fraud Detection, Retail and Merchandising. The second half of this seminar will look at what functionality is available from Oracle that allows you to build and deploy Predictive Analytics in your applications. We will look at what you can do with SQL, PL/SQL, Oracle Data Miner, Oracle R Enterprise and Real-Time Decisions, and then at how you can build these features and tools into your front end applications. In the final section of the seminar we will look at what you need to do to get started with Predictive Analytics and how you can avoid some of the typical pitfalls.

On Monday 7th, I have a presentation between 9:00 and 9:50 titled 'Is Oracle SQL the Best Language for Statistics?'. Did you know that Oracle comes with over 280 statistical functions? And that these statistical functions are available in all versions of the Database? Most people do not seem to know this. When we hear about people performing statistical analytics, we hear them talking about Excel and R. But what if we could do statistical analysis in the database, without having to extract any data onto client machines? That would be really good, and just think of the possible data security issues with using Excel, R and other tools. This presentation will explore the various statistical areas available in SQL and I will give a number of demonstrations on some of the more commonly used statistical techniques. With the introduction of Big Data SQL we can now use these same 280+ Oracle statistical functions on all our data, including Hadoop and NoSQL. We can also greatly expand our statistical capabilities by using Oracle R Enterprise, using the embedded capabilities in SQL.
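
As a quick taster of what those built-in functions look like, here is a small example of my own (not taken from the talk), using the MINING_DATA_BUILD_V sample view from my earlier demos:

SELECT AVG(age)                 avg_age,
       MEDIAN(age)              median_age,
       STDDEV(age)              stddev_age,
       CORR(age, yrs_residence) corr_age_residence
FROM   mining_data_build_v;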

My final presentation will be on Wednesday (14:30-15:20), titled 'Automating Predictive Analytics in Your Applications'. What do your Oracle DBA and Oracle Developer need to know to implement Data Mining? How can the Oracle DBA and Oracle Developer use Data Mining? This presentation will look at what the Oracle DBA and Oracle Developer need to know to deploy, implement and integrate the predictive models produced by the Data Miner. These predictive models can be used in multiple databases and across their many applications. Examples will be given of the typical data mining tasks for the DBA and for the Oracle Developer. These examples will include SQL, PL/SQL and embedded Oracle R Enterprise. I will show how easy it is to build automated Predictive Analytics into, not only your Dashboards, but your everyday applications. Here is a short video that gives an overview of my presentations for TECH15.



Sunday, November 15, 2015

Error when trying to use GLM in ORE

If you have tried to use the ore.odmGLM function (and some other ORE functions) against an Oracle 12.1.0.2 Database, you will get an error something like the following in R.

> GLMmodel <- ore.odmGLM(AFFINITY_CARD ~., ANALYTIC_RECORD)
Error in .oci.GetQuery(conn, statement, data = data, prefetch = prefetch,  : 
  ORA-29400: data cartridge error
ORA-40024: solution to least squares problem could not be found
ORA-06512: at "SYS.DBMS_DATA_MINING", line 2153
ORA-06512: at line 1

or you get something like the following when using ore.glm

Error in .External(C_pt, q, df, lower.tail, log.p) :
  NULL value passed as symbol address

Similarly, if you are using these functions with embedded R execution, where the R code is wrapped in SQL, you will see similar errors.

What you need to do is download and install the standalone Patch 20173897.

NewImage

Alternatively this patch is included in the latest Oracle 12.1.0.2 Database bundle patch.

Thursday, November 5, 2015

Slides from my OOW15 Presentations

At Oracle Open World (OOW15) I gave 2 presentations on the Sunday during the Oracle User Group Forum. The slides are now available for download from the Oracle Open World website.

Go get them now!

More Than Another 12 on Oracle Database 12c [UGF3190]

During this session I was one of 16 presenters talking about various features in the Oracle Database. All of the presenters were from the EOUC region.

Real Business Value from Big Data and Advanced Analytics [UGF4519]

I co-presented with Antony Heljula from Peak Indicators. During this presentation we talked about some of the Advanced Analytics projects we have worked on over the past 18-24 months. We also announced a new Analytics-as-a-Service offering.

The slides are also available for most of the other Oracle Open World presentations and these can be accessed here. Just search for the topic you are interested in.

Check out my previous blog post that summarises just a small part of what I got up to at OOW15.


Friday, October 30, 2015

My OOW15 is now over

It seems to be traditional for people to write a blog post that summarises their Oracle Open World (OOW) experience. Well here is my attempt, and it really only touches on a fraction of what I did at OOW, which was one of the busiest OOWs I've experienced.

It all began back on Wednesday 21st October when I began my journey. Yes, that is 9 days ago, a long long 9 days ago. I will be glad to get home.

The first 2 days here in San Francisco were down at Oracle HQ, where the traditional Oracle ACE Director briefings are held. I'm one of the small (but growing) number of ACEDs and it is an honour to take part in the programme. The ACE Director briefing is 2 days packed (did I say they are packed?) full of talks by the leading VPs, SVPs, EVPs and especially Thomas Kurian. Yes, we had Thomas Kurian come in to talk to us for 60-90 minutes of pure gold. We get told all the things that Oracle is going to release or announce at OOW and for the next few months and beyond. Some of the things I was particularly interested in were the 12.2 database stuff.

Unfortunately we are under NDA for all of the stuff we were told, until Oracle announces it themselves.

On the Thursday night a few of us met up with Bryn Llewellyn (the godfather of PL/SQL) for a meal. Here is a photo to prove it. It was a lively dinner with some "interesting" discussions.

NewImage

On the Friday we all transferred to a hotel beside Union Square.

We had the Saturday free, and I'm struggling to remember what I actually did. But it did consist of going out and about around San Francisco. Later that evening there was a meet up arranged by "That" Jeff Smith in a local bar.

Sunday began with me walking across (and back) the Golden Gate Bridge with a few other ACEDs and ACEs.

NewImage

The rest of the Sunday was spent at the User Group Forum. I was giving 2 presentations. The first was part of the "Another 12 things about Oracle 12c" session. For this session there were 15 presenters and we each had 7 minutes to talk about a topic. Mine was on Oracle R Enterprise. It was an almost full room, which was great.

NewImage NewImage

Then I had my second presentation right afterwards and for that we had a full room. I was co-presenting with Tony Heljula and we were talking about some of the projects we have done using Oracle Advanced Analytics.

NewImage

After that my conference duties were done and I got to enjoy the rest of the conference.

Monday and Tuesday were a bit mental for me. I was basically in sessions from 8am until 6pm without a break. There were lots of really good topics, but unfortunately there were a couple of presentations that were total rubbish, where the title and abstract had no relevance to what was covered, and even the presentation itself was rubbish. There were only a couple of these.

Wednesday was the same as Monday and Tuesday but this time I had time for lunch.

As usual the evenings are taken up with lots of socials and although I had great plans to go to lots of them, I failed and only got to one or two each evening.

On Wednesday night we ended up out at Treasure Island to be entertained by Elton John and Beck, and then headed back into Union Square for another social.

Thursday was very quiet as things started to wind down, with OOW finishing up at 3pm. A few of us went down to Fisherman's Wharf and Pier 39 for a wander around and a meal. Here are some photos from the restaurant.

NewImage NewImage

Friday has finally arrived and it is time to go home. OTN are very generous and put on a limo for the ACEDs to bring us out to the airport.

NewImage

As always there are lots of people to thank but I won't start naming names here as I'm sure I will forget someone. But I do want to thank all the gang in OTN that look after the Oracle ACE programme. You really look after us, not just at OOW, but all year round. It is with your support that I get to go to wonderful conferences and to OOW.

Wednesday, October 21, 2015

People from Ireland who are Speaking at #OOW15

Oracle Open World 2015 will be kicking off in a few days time. There will be over 50K people attending this event, with a couple of hundred speakers.

I'm one of the lucky ones to have been selected to speak at Oracle Open World. This will be my third or fourth year in a row that I'm speaking at Oracle Open World.

Am I the only person from Ireland who is speaking at Oracle Open World? No, I'm not, but there are only a small number of us this year. There are basically 2, yes 2, other people from Ireland speaking at Oracle Open World. These are Richard Corbridge from the HSE, and Debra Lilley.

If you are interested in attending these sessions, here is the schedule.

NewImage

Please share this information with your colleagues and help spread the news about the Irish speakers.

If you are attending OOW, then please tweet, blog, Facebook, etc about these presentations.

Hopefully I will see you at one or both of my presentations on the Sunday.

And I will be having a draw/raffle for a copy of my book at my second presentation on the Sunday at 15:30 in Moscone South - 307

NewImage

Monday, October 19, 2015

My schedule for OOW15

It seems to be a thing that people blog about their schedule for Oracle Open World and talk about how busy they will be.

So to join the club this is my current schedule.

NewImage

The boxes that are (a kind of) orange with red text are when I have MY presentations.

The purple boxes indicate some fun event and entertainment.

When attending conferences you sometimes get to hear about a good presentation, or some other event just happens. So this schedule is tentative and will probably only reflect 40%-60% of what I will really attend/do. There is so much going on each day.

You will see I have some Hands-on-Labs booked. These are a great way to try out a product for an hour. I highly recommend you do some too.

By far my favourite part of my OOW trip is the Oracle ACE Director briefing that OTN arranges for us. This briefing will happen on the Thursday (22nd) and Friday (23rd) before OOW, and this is held in Oracle HQ in Redwood Shores.

We have been given a tentative schedule for the ACED Briefing, and for me the highlight of this briefing is when Thomas Kurian comes in and very openly talks to us, telling us lots and lots and lots about what is happening across the Oracle product set. I would love to be able to share what we are told, but we are all under NDA. After Thomas we have some Cloud updates, then we get onto some updates on Oracle Development Tools, and lots of updates on what is coming next in the Oracle Database. So we get the likes of Penny Avril, Andy Mendelsohn, Tom Michelini, Steven Feuerstein, Roland Smart and Wim Coekaerts, among others. So lots and lots of EVPs, Sr VPs and VPs.

After the briefing is over we hop on a bus and head to our hotel in downtown San Francisco, near Union Square, where we will be based for OOW.

Thanks to all in OTN and to those who run the Oracle ACE programme for arranging and paying for my flights, hotels and transportation.

Monday, October 12, 2015

SQL and PL/SQL icons and stickers

Over the past couple of weeks I've been preparing my slides and presentations for Oracle Open World (2015).

One thing that occurred to me was that there was no icon or image to represent Oracle SQL and PL/SQL. I needed something that I could include in my presentations to represent these.

After a bit of tweeting it turns out that there are no (official) icons or images for Oracle SQL and Oracle PL/SQL.

So I created some and here they are.

SQL icon sm PLSQL icon sm

and there are these

SQL 2 sm PLSQL 2 sm

Feel free to use these in your presentations and share them around. All I ask is that you give me the odd acknowledgement from time to time.

Stickers

If you would like to get these as stickers and put them on your laptop, notebooks, or anywhere really, you can order them on Stickermule.

NewImage

NB: It is important to note that these are in no way approved or acknowledged or endorsed or anything else by Oracle.

Friday, October 9, 2015

From Zero to Dashboards in 10min with BICS

Unless you have been going around with your head in the clouds, all you can hear from all the big names in the IT world is talk about moving to the Cloud. (Yes, that was a poor attempt at a joke. It is Friday afternoon after all.)

The title of this post is 'From Zero to Dashboards in 10 min with BICS'.

BICS stands for Oracle BI Cloud Service.

Over the past few months I've been working on rolling out BICS at a number of different sites, and it is surprising how quickly you can get up and running with it.

There is no install. There is the minimum of setup and configuration. All you need to do is to create a user and give them some privileges. All of that takes 2 minutes.

Next get them to log in to BICS (2 minutes for the first time they log in) and load up some data using the Data Load feature (another 2 minutes).

Then you can move onto the fun part of creating some Dashboards using Visual Analyzer. See the example below of one that we created in 4 minutes.

Bics 1

So all of the above took 10 minutes. How quick and easy is that!

Give it a go yourself and see how quick it is for you.

Let me know if you have any questions on using BICS or if you need any help.

Tuesday, September 8, 2015

24th Sept: Oracle UG Ireland: BA, Big Data & Tech SIG (Free) event

This is a Free Event.

Register today to reserve your place, as places are limited.

Our next BA, Big Data & Tech SIG will be on Thursday 24th September. This event will be held in Jurys Inn, Customs House, in Dublin.

We have decided to merge the two SIGs for this event to offer everyone even more content and learning opportunities. The day is filled with presentations from End Users and from Product Experts. We will also have some of the latest updates from Oracle.

It is the ideal opportunity for you to learn, share and network with other users.

Here is the agenda for this full day event.

NewImage

As you can see this is a really good agenda. So I hope to see you there.

You need to book your place for this SIG. To book your place, go here.


If you have ideas of topics you would like covered at the next SIG, just let me know.

I'll see you there.

Tuesday, September 1, 2015

My presentations at OOW 2015

Oracle Open World is coming at the end of October. Lots of people are starting to get excited as Oracle Open World is the number one place for all the Oracle geeks from around the World to hang out for a week.

This year I will be giving 2 presentations. Both of them are on the Sunday. That means I get to really enjoy the rest of the conference. These presentations are:

Sunday 13:30-14:15 & 14:30-15:15  (a double session)
Session Title : 12 more things about Oracle 12c 
Description: These sessions are being organised by the EMEA Oracle User Group and follow on from the successful sessions at last year's Oracle Open World. These sessions/presentations will consist of 12 presenters each giving a 5-7 minute talk on a topic related to the Oracle Database. Many thanks to Debra Lilley for arranging these sessions.

For me, I will be talking about Running R in the Oracle Database using Oracle R Enterprise. 5-7 minutes is not enough time for this topic, but the whole idea of these sessions and talks is to give people a taster of what is possible in the Oracle Database. You can then use your new knowledge to check out other sessions at OOW and/or to check it out when you get home.

I'll put up another blog post when I have the full list and schedule of presenters.

Sunday 15:30-16:15  (yes this is right after the previous session/presentation)
Session Title : Real Business Value from Big Data and Advanced Analytics
Description : I will be co-presenting with Antony Heljula and we will be talking about some of the advanced analytics and big data projects we have worked on. We have had some amazing results with these projects and we will be giving you a taster of what is possible. Also, some of these projects didn't involve big data. So even with small data you can achieve amazing results using Advanced Analytics / Predictive Analytics / Data Mining / Data Science.

After these presentations I get to enjoy the rest of Oracle Open World without having to worry about giving another presentation.

My diary for the conference is filling up fast and it looks like I will be busy each day from 8am to late into the evening. Oracle Open World has lots and lots of social and networking events and these are great for meeting up with people and making new contacts.

Wednesday, August 26, 2015

Enabling Autotrace in SQL Developer

This is mainly a note to myself, to save me from looking up the details every time I need them. Now I can just go to my own post.

To use Autotrace in SQL*Plus you need to grant the schema the PLUSTRACE role.

But in SQL Developer the PLUSTRACE role needs a few additional privileges.

Connect to SYS as sysdba and run the following.

grant select on v_$session to plustrace;
grant select on v_$sql_plan to plustrace;
grant select on v_$sql_plan_statistics to plustrace;
grant select on v_$sql to plustrace;
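
Then grant the role to the schema that will be using Autotrace. A quick sketch (the dmuser schema name is just a placeholder):

grant plustrace to dmuser;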

Now you will be able to run Autotrace in SQL Developer.

(Make sure to reconnect your SQL Developer connection so that the privileges can be picked up).

Friday, August 14, 2015

Managing ORE in-database Data Stores using SQL

When working with ORE you will end up creating a number of different data stores in the database. Also, as your data science team grows, the number of data stores can increase to a very large number.

When you install Oracle R Enterprise you get a number of views that allow ORE users to see what ORE Data stores they have and what objects exist in them. All using SQL.

Some of the time the ORE developers and data analysts will use the set of ORE functions to manage the in-database ORE Data stores. These include:

NewImage

Using these ORE functions, the schema user/data scientist can see what ORE Data stores they have, and can use ore.delete to delete an ORE Data store when it is no longer needed.

But the problem here is that over time your schemas can get a bit clogged up with ORE Data stores, particularly when a data scientist is no longer working on the project or there is no need to maintain the ORE Data stores. This is common on data science projects where you might have a number of data scientists working in/sharing the one database schema.

For a DBA, whose role will be to clean up the ORE Data stores that are no longer needed, you have 4 options.

The first of these, is if all the ORE Data stores exist in the data scientists schema and nothing else in the schema is needed then you can just go ahead and drop the schema.

The second option is to log into the schema using SQL and drop the ORE Data stores. See an example of this below.

The third option is to connect to the Oracle schema using R and ORE and then use the ore.delete function to drop the ORE Data stores.

The fourth option is to connect to the RQSYS schema. This schema is the owner of the views used to query the ORE Data stores in each schema. The RQSYS schema was locked as part of the ORE installation, so you as the DBA will need to unlock it and then connect.
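
Unlocking the schema is standard DBA work. A quick sketch, connected as SYS (the new password is a placeholder):

alter user rqsys account unlock;
alter user rqsys identified by a_new_password;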

The following SQL lists the ORE Data stores that were created for that schema.

column dsname format a20
column description format a35

SELECT * FROM rquser_DataStoreList;

DSNAME                     NOBJ     DSSIZE CDATE     DESCRIPTION
-------------------- ---------- ---------- --------- -----------------------------------
ORE_DS                        2       5104 04-AUG-15 Example of ORE Datastore
ORE_FOR_DELETION              1       1675 14-AUG-15 Need to Delete this ORE Datastore
ORE_DS2                       5   51466509 04-AUG-15 DS for all R Env Data

You can also view what objects have been saved in the ORE Data stores.

column objname format a15
column class format a15
SELECT * FROM rquser_DataStoreContents;
 
DSNAME               OBJNAME         CLASS              OBJSIZE     LENGTH       NROW       NCOL
-------------------- --------------- --------------- ---------- ---------- ---------- ----------
ORE_DS               CARS_DATA       ore.frame             1306         11         32         11
ORE_DS               cars_ds         data.frame            3798         11         32         11
ORE_DS2              cars_ds         data.frame            3798         11         32         11
ORE_DS2              cars_ore_ds     ore.frame             1675         11         32         11
ORE_DS2              sales_ds        data.frame        51455575          7     918843          7
ORE_DS2              usa_ds          ore.frame             2749         23      18520         23
ORE_DS2              usa_ds2         ore.frame             2712         23      18520         23
ORE_FOR_DELETION     cars_ore_ds     ore.frame             1675         11         32         11

To drop an ORE Data store in your current schema you can use the rqDropDataStore SQL function.

BEGIN
   rqDropDataStore('ORE_FOR_DELETION');
END;
/

For the DBA, when you unlock and connect to the RQSYS schema you will be able to see all the ORE Data stores in the database. The views will contain an additional column.

But if you use the above SQL function to delete an ORE Data store it will not work. This is because the SQL function will only drop an ORE Data store if it exists in your current schema, and when connected to the RQSYS schema we will not have any ORE Data stores in it.

Instead, we can create a procedure that will allow us to delete/drop any ORE Data store in any schema.

create or replace PROCEDURE my_ORE_Datastore_Drop(
  ds_owner  in VARCHAR2,
  ds_name  IN VARCHAR2
  )
IS
  del_objIds rqNumericSet;
BEGIN
  del_objIds := rq$DropDataStoreImpl(ds_owner, ds_name);
  IF del_objIds IS NULL THEN
    raise_application_error(-20101, 'DataStore ' ||
                            ds_name || ' does not exist');
  END IF;

  -- remove from rq$datastoreinventory
  BEGIN
    execute immediate
       'delete from RQ$DATASTOREINVENTORY c where c.objID IN (' ||
       'select column_value from table(:del_objIds))' using del_objIds;
     COMMIT;
  EXCEPTION WHEN others THEN null;
  END;
END;
/

As the DBA, logged into the RQSYS schema, we can now delete any ORE Data store in the database using the following.

BEGIN
   my_ORE_Datastore_Drop('ORE_USER', 'ORE_FOR_DELETION');
END;
/

Thursday, July 30, 2015

Check out What Sauron is saying about Oracle

Over the past year we have (hopefully) been hearing about Oracle Big Data SQL.

This is a new(-ish) option from Oracle that allows us to run our SQL queries not just on the data in our Oracle Database but also against NoSQL databases and Hadoop. No extra coding is needed, no extra formatting is needed, etc.

All the hard work of connecting to the data in these systems, translating the query into executable code on those systems, executing it, capturing the results and presenting them back to us is taken care of; we just sit in our schema in our Oracle Database.

How cool is that.

To learn more about Oracle Big Data SQL check out their webpage.
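
To give a flavour of how transparent this is, here is a hedged sketch of my own, based on the documented ORACLE_HIVE external table driver (all object names are placeholders, and the exact access parameters will depend on your setup):

CREATE TABLE movie_log (click VARCHAR2(4000))
ORGANIZATION EXTERNAL (
   TYPE ORACLE_HIVE
   DEFAULT DIRECTORY default_dir
   ACCESS PARAMETERS (com.oracle.bigdata.tablename=default.movie_log)
)
REJECT LIMIT UNLIMITED;

-- and then query the Hadoop data like any other table
SELECT count(*) FROM movie_log;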

But let us get back to the title of this blog post, 'What Sauron is saying about Oracle'. I used these images at one of my presentations at the BIWA Summit in January 2015 and I've been meaning to post them since.

If you have read the books or watched the movies you will remember the phrase.

NewImage

We can apply this phrase to Oracle SQL now.

NewImage

or maybe my alternative version might be better.

NewImage

Tuesday, July 28, 2015

Charting Number of R Packages over time (Part 3)

This is the third and final blog post on analysing the number of new R packages that have been submitted over time.

Check out the previous blog posts:

In this blog post I will show you how you can perform Forecasting on our data to make some predictions on the possible number of new packages over the next 12 months.

There are 2 important points to note here:

  1. Only time will tell if these predictions are correct or nearly correct, just like with any other prediction technique.
  2. You cannot use just one of the Forecasting techniques in isolation to make a prediction. You need to use a number of functions/algorithms to see which one suits your data best.

The second point above is very important for all prediction techniques. Sometimes you see people/articles talking about using only algorithm X, without considering any of the other techniques/algorithms. It is their favourite or preferred method, but that does not mean it works or is suitable for all data sets and all scenarios.

In this blog post I'm going to use 3 different forecasting functions: the in-built forecast function in R, HoltWinters and finally ARIMA. Yes, there are many more (it is R after all) and I'll leave those for you to explore.

1. Convert data set to Time Series data format

The first thing I need to do is convert the data I want to analyze into Time Series format (ts). This expects one record or instance for each data point.

So you cannot have any missing data, or in my case any missing dates. Yes (for my data set) we could have some months where we do not have any submissions. What I could do is work out mean values (or something like that) and fill in the missing months. But I'm feeling a bit lazy, and after examining the data I see that we have a continuous set of data from August 2009 onwards. This is fine, as most of the growth up to that point is flat.

So I need to subset the data to only include cases greater than or equal to August 2009 and less than or equal to June 2015. I wanted to exclude July 2015 as the number for this month is incomplete.

The following code builds on the work we did in the second blog post in the series

library(forecast)
library(ggplot2)

# Subset the data
sub_data <- subset(data.sum, Group.date >= as.Date("2009-08-01", "%Y-%m-%d"))
sub_data <- subset(sub_data, Group.date <= as.Date("2015-06-01", "%Y-%m-%d"))

# Subset again to only take out the data we want to use in the time series
sub_data2 <- sub_data[,c("R_NUM")]

# Create the time series data, stating that it is monthly (12) and giving the start and end dates
ts_data <- ts(sub_data2, frequency=12, start=c(2009, 8), end=c(2015, 6))

# View the time series data
ts_data
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2009                               2   3   4   3   4
2010   5   5   1  11   5   2   5   4   4   5   1   3
2011  11   4   3   6   6   5   9  15   5   8  23  18
2012  33  17  51  28  37  33  50  71  41 231  51  60
2013  75  67  76  81  76  74  77  89 111  96 111 200
2014 155 129 175 140 145 133 155 207 232 162 229 310
2015 308 343 332 378 418 558    

We now have the data prepared for input to our Forecasting functions in R.

2. Using Forecast in R

For the forecast function all you need to do is pass in the Time Series data set and tell the function how many steps into the future you want it to predict. In all my examples I'll ask the functions to predict for the next 12 months.

ts_forecast <- forecast(ts_data, h=12)
ts_forecast

         Point Forecast     Lo 80     Hi 80 Lo 95     Hi 95
Jul 2015       447.1965  95.67066  784.3163     0  958.6669
Aug 2015       499.4344 115.94329  873.1822     0 1069.8689
Sep 2015       551.7875 123.88426  952.0773     0 1230.4707
Oct 2015       603.7212 156.89486 1078.0395     0 1370.2069
Nov 2015       654.7647 143.29718 1179.4903     0 1603.8335
Dec 2015       704.5162 135.76829 1352.8925     0 1844.7230
Jan 2016       752.6447 151.09936 1502.6088     0 2100.9708
Feb 2016       798.8877 156.37383 1652.0847     0 2575.3715
Mar 2016       843.0474 159.67095 1848.1703     0 2888.1738
Apr 2016       884.9849 154.59456 2061.7990     0 3281.6062
May 2016       924.6136 148.04651 2325.9060     0 3891.5064
Jun 2016       961.8922 138.67935 2531.7578     0 4395.6033

plot(ts_forecast)

NewImage

For this we get a very large range of values and very wide prediction intervals. If we limit the y axis we can get a better picture of the actual predictions.

plot(ts_forecast, ylim=range(0:1000))
NewImage

3. Using HoltWinters

For HoltWinters we can use the in-built R function. All we need to do is pass in the Time Series data set. First we can plot the HoltWinters fit for the existing data set.

?HoltWinters
hw <- HoltWinters(ts_data)
plot(hw)
NewImage

Now we want to predict for the next 12 months

forecast <- predict(hw, n.ahead = 12, prediction.interval = T, level = 0.95)
forecast

              fit       upr      lwr
Jul 2015 519.9304  599.8097 440.0512
Aug 2015 560.1083  648.4183 471.7983
Sep 2015 601.4528  701.0163 501.8892
Oct 2015 643.9639  757.3750 530.5529
Nov 2015 681.5168  811.0727 551.9608
Dec 2015 724.7363  872.4508 577.0218
Jan 2016 773.8308  941.4768 606.1848
Feb 2016 809.8836  999.0401 620.7272
Mar 2016 847.1448 1059.2371 635.0525
Apr 2016 898.4476 1134.7795 662.1158
May 2016 933.8755 1195.6532 672.0977
Jun 2016 972.3866 1260.7376 684.0356

plot(hw, forecast)
NewImage

4. Using ARIMA

For ARIMA we first fit a model to the Time Series data using auto.arima and then perform the forecast.

fc_arima <- auto.arima(ts_data)
fc_fc_arima  <- forecast(fc_arima, h=12)
fc_fc_arima

         Point Forecast    Lo 80     Hi 80    Lo 95     Hi 95
Jul 2015       524.4758 476.2203  572.7314 450.6753  598.2764
Aug 2015       567.1156 513.2301  621.0012 484.7048  649.5265
Sep 2015       609.7554 548.3239  671.1869 515.8041  703.7068
Oct 2015       652.3952 581.6843  723.1062 544.2522  760.5383
Nov 2015       695.0350 613.5293  776.5408 570.3828  819.6873
Dec 2015       737.6748 644.0577  831.2920 594.4998  880.8499
Jan 2016       780.3147 673.4319  887.1974 616.8516  943.7777
Feb 2016       822.9545 701.7797  944.1292 637.6337 1008.2752
Mar 2016       865.5943 729.2004 1001.9881 656.9978 1074.1907
Apr 2016       908.2341 755.7718 1060.6963 675.0631 1141.4050
May 2016       950.8739 781.5559 1120.1918 691.9244 1209.8233
Jun 2016       993.5137 806.6027 1180.4246 707.6581 1279.3693

plot(fc_fc_arima, ylim=range(0:800))
NewImage

As you can see there are very different results from each of these forecasting techniques. If this was a real life project on real data then we would go about exploring a lot more of the forecasting functions available in R, to identify which R function and forecasting algorithm works best for our data.

Which Forecasting technique would you choose from the selection above?
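
One hedged way to decide is to hold back the last few months of data and measure each model's forecast error on that holdout. Here is a minimal sketch of my own using the accuracy function from the forecast package (the 12-month holdout split is my assumption):

library(forecast)

# Hold back the last 12 months as a test set
train <- window(ts_data, end = c(2014, 6))
test  <- window(ts_data, start = c(2014, 7))

# Fit each model on the training data and forecast the holdout period
fc_ets   <- forecast(train, h = 12)                # ETS via forecast()
fc_hw    <- forecast(HoltWinters(train), h = 12)   # Holt-Winters
fc_arima <- forecast(auto.arima(train), h = 12)    # ARIMA

# Compare the test set error measures (RMSE, MAE, MAPE, ...)
accuracy(fc_ets, test)
accuracy(fc_hw, test)
accuracy(fc_arima, test)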

But will this function and algorithm always work with our data? The answer is NO. As our data evolves, so may the algorithm that works best for it. This is why the data science/analytics world is iterative: we need to recheck/revalidate the functions/algorithms to see if we need to start using something else. When we do need to use another function/algorithm, you need to ask yourself why this has happened: what has changed in the data, what has changed in the business, etc.

Wednesday, July 22, 2015

Charting Number of R Packages over time (Part 2)

This is the second blog post on charting the number of new R Packages over time.

Check out the first blog post that looked at getting the data, performing some simple graphing and then researching some issues that were identified using the graph.

In this blog post I will look at how you can aggregate the data, plot it, fit a regression line, and then plot it using ggplot2, where we will include a trend line using geom_smooth.

1. Prepare the data

In my previous post we extracted and aggregated the data on a daily basis. This is the plot that was shown in my previous post. This gives us a very low level graph, and perhaps we might get something a little bit more useable if we aggregated the data. I have the data in an Oracle Database, so it would be easy for me to write another query to perform the necessary aggregation. But let's make things a little bit trickier: I'm going to use R to do the aggregation.

Our data set is in the data frame called data. What I want to do is to aggregate it up to monthly level. The first thing I did was to create a new column that contains the values of the new aggregate level.

data$R_MONTH <- format(rdate2, "%Y%m01")
data$R_MONTH <- as.Date(data$R_MONTH, "%Y%m%d")
data.sum <- aggregate(x = data[c("R_NUM")],
                    FUN = sum,
                    by = list(Group.date = data$R_MONTH)
)

2. Plot the Data

We now have the data aggregated at monthly level and can plot the graph. Ignore the last data point on the chart; this is for July 2015 and I extracted the data on the 9th of July, so we do not have a full month of data.

plot(as.Date(data.sum$Group.date), data.sum$R_NUM, type="b", xaxt="n", cex=0.75 , ylab="Num New Packages", main="Number of New Packages by Month")
axis(1, as.Date(data.sum$Group.date, "%Y-%d"), as.Date(data.sum$Group.date, "%Y-%d"), cex.axis=0.5, las=1)

This gives us the following graph.

NewImage

3. Plot the data using ggplot2

The basic plot function of R is great and allows us to quickly and easily produce some good graphs. But it is a bit limited, and perhaps we want to create something a bit more elaborate. ggplot2 is a very popular package that allows us to build up a graph in a number of steps and layers to give something a lot more professional.

In the following example I've kept things simple. Yes, I could have done so much more; I'll leave that as an exercise for you to go off and do.

The first step is to use the qplot function to produce a basic plot using ggplot2. This gives us something similar to what we got from the plot function.

library(ggplot2)
qplot(x=factor(data.sum$Group.date), y=data.sum$R_NUM, data=data.sum, 
       xlab="Year/Month", ylab='Num of New Packages', asp=0.5)

This gives us the following graph.

NewImage

Now if we use ggplot2 itself then we need to specify a lot more information. Here is the equivalent plot using ggplot2 (with a line plot); a sketch of the code is shown below.
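
The ggplot2 code for that chart was not shown above, so here is a minimal sketch of what it might look like (my reconstruction, based on the trend line version further down):

library(ggplot2)
# Line plot of the monthly new package counts
ggplot(data.sum, aes(x=factor(data.sum$Group.date), y=data.sum$R_NUM)) +
  geom_line(aes(group=1)) +
  theme(text = element_text(size=7),
        axis.text.x = element_text(angle=90, vjust=1)) +
  xlab("Year / Month") + ylab("Num of New Packages")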

NewImage

4. Include a trend line

We can very easily include a trend line in a ggplot2 graph using the geom_smooth command. In the following example we have the same chart and include a linear regression line.

plt <- ggplot(data.sum, aes(x=factor(data.sum$Group.date), y=data.sum$R_NUM)) + geom_line(aes(group=1)) +
  theme(text = element_text(size=7),
        axis.text.x = element_text(angle=90, vjust=1)) + xlab("Year / Month") + ylab("Num of New Packages") +
  geom_smooth(method='lm', se=TRUE, size = 0.75, fullrange=TRUE, aes(group=20))
plt

NewImage

We can tell a lot from this regression plot.

But perhaps we would like to see a trend line on the chart that behaves more like a moving average. Plus I've added in a bit of scaling to help with representing the data at a monthly level.

library(scales)
plt <- ggplot(data.sum, aes(x=as.POSIXct(data.sum$Group.date), y=data.sum$R_NUM)) + geom_line() + geom_point() +
  theme(text = element_text(size=12),
        axis.text.x = element_text(angle=90, vjust=1)) + xlab("Year / Month") + ylab("Num of New Packages") +
  geom_smooth(method='loess', se=TRUE, size = 0.75, fullrange=TRUE) +
  scale_x_datetime(breaks = date_breaks("months"), labels = date_format("%b"))
plt
NewImage

In the third blog post on this topic I will look at how we can use some of the forecasting and prediction functions available in R. We can use these to help us visualize what the future growth patterns might be for this data. I have some interesting things to show.

Monday, July 20, 2015

the R Consortium

On the 30th June (2015) a number of companies came together to form the R Consortium. The aim of the R Consortium is to support the R community and to help it evolve.

NewImage

In a way the formation of this group is not surprising, as there is a growing list of companies who have their own supported implementations of R that provide a number of additional and very important features. Most of these features revolve around making R more useable within applications and easier to deploy in production.

Founding companies and organizations of the R Consortium include The R Foundation, Platinum members Microsoft and RStudio; Gold member TIBCO Software Inc.; and Silver members Alteryx, Google, HP, Mango Solutions, Ketchum Trading and Oracle.

It is important to note that this group won't be looking at the R language itself, but at the infrastructure that supports R.

The big boys are now on board and promoting R, and it will be interesting to see what directions they come up with.

This is something worth keeping an eye on.

Friday, July 17, 2015

UKOUG Tech 15 : Acceptances & Agenda

Just over a week ago the acceptance emails started to go out for the UKOUG TECH15 and APPS15 conferences. At the same time the rejection emails started to go out too :-(

I was on the receiving end of both of these types of emails.

But I was delighted with the news that I received.

My topic areas cross the TECH and APPS topics and also cross the Business Analytics stream, which looks to bridge this divide in the world of Business Intelligence.

So in December I will be giving the following presentations.

Sunday 6th : Super Sunday event : Business Analytics Stream

12:30 - 14:30 : (2 hours) : Predictive Analytics in Oracle: Mining the Gold & Creating Valuable Products


I'll be giving away a copy of my book on Oracle Data Mining to one lucky attendee at this Super Sunday session.


Monday 7th : 9:00-9:50 : Analytics Stream

Is Oracle SQL the Best Language for Statistics?


Wednesday 9th : 14:30-15:20 : Big Data & Data Warehousing Stream (TECH 15 & APPS15)

Automating Predictive Analytics in Your Applications


Check out the full agendas for TECH15 and APPS15 by clicking on the appropriate image below.

NewImage

NewImage

NewImage


Go Register for these events now!

Wednesday, July 15, 2015

Charting Number of R Packages over time (Part 1)

This is the first of a three part blog post on charting and analysing the number of R package submissions.

(I will update this blog post with links to the other two posts as they come available)

I'm sure most of you have heard of the R programming language. If not, then perhaps it is something that you might want to go off and learn a bit about. Why? Well, it is one of the most popular languages for performing various types of statistics, advanced statistical analysis and machine learning, and for generating lots of cool looking graphs.

If this is not something that you might be interested in, then it is time to go to another website/blog.

In this blog post I'm going to chart the number of packages that have been submitted to R and are available for download and installation.

Why am I doing this? I got bored one day after coming back from my vacation and I thought it would be a useful thing to do. Then after doing this I decided to use these graphs somewhere else, but you will have to wait until 2016 to find out where!

The R website has a listing of all the packages and the dates that they were submitted.

NewImage

There are a variety of tools available that you can use to extract the information on this webpage, and there are lots of examples of R code for doing so too. I'll leave that as a little exercise for you to do.
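
If you want a starting point for that exercise, here is a minimal sketch of my own using the rvest package (this is an assumption of how you might do it, not the code I actually ran):

library(rvest)

# CRAN lists all packages with their submission dates on one page
url <- "https://cran.r-project.org/web/packages/available_packages_by_date.html"
page <- read_html(url)
pkgs <- html_table(page)[[1]]   # the only table on the page: Date, Package, Title
head(pkgs)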

I extracted all of this information and stored it in a table in my Oracle Database (of course I did as I work with Oracle databases day in day out). This will allow me to easily reuse this data whenever I need it plus I can update this table with new packages from time to time.

NewImage

The following R code:

  1. Sets up an ROracle connection to my schema in my database
  2. Connects to the database
  3. Sets up a query to extract the data from the table
  4. Fetches this data into an R data frame called data
  5. Reformats the date column to remove the time element from it
  6. Plots the data

library(ROracle)

drv <- dbDriver("Oracle")
# Create the connection string
host <- "localhost"
port <- 1521
service <- "pdb12c"
connect.string <- paste(
  "(DESCRIPTION=",
  "(ADDRESS=(PROTOCOL=tcp)(HOST=", host, ")(PORT=", port, "))",
  "(CONNECT_DATA=(SERVICE_NAME=", service, ")))", sep = "")

con <- dbConnect(drv, username = "brendan", password = "brendan",dbname=connect.string)

res<-dbSendQuery(con, "select r_date, count(*) r_num from r_packages
                       group by r_date order by 1 asc")
data <- fetch(res)

rdate<- data$R_DATE
rdate2<-as.Date(rdate,"%d/%m/%y")

plot(data$R_NUM~rdate2, data, type="l" , xaxt="n")
axis(1, rdate2, format(rdate2, "%b %y"), cex.axis=.7, las=1)

After I run the above code I get the following plot.

NewImage

(Yes I could have done a better job on laying out the chart with all sorts of labels and colors etc)

This chart gives us a plot of the number of new submissions by day.

There are 2 very obvious things that stand out from this graph. The easiest one to deal with is that we can see that there has been substantial growth in new submissions over the past 3 years. Perhaps we need to examine these a bit closer, and when you do you will find that a lot of these are existing packages that have been resubmitted with updates.

There is a very obvious peak just over half way along the chart. We really need to investigate this to understand what has happened. This peak occurs on the 29th October 2012, and it is clearly an anomaly with the rest of the data. What happened on the 29th October 2012? Well, on this date R version 2.15.2 was released and a lot of updated packages were resubmitted.

Check out my next two blog posts where I will explore this data in a bit more detail.

Part 2 blog post

Part 3 blog post

Monday, July 13, 2015

V506 of Oracle OBIEE SampleApp Virtual Machine

A few days ago Oracle released the latest version of the Virtual Machine for OBIEE SampleApp. The current version has a number of new features and new product versions (see below).

To get this latest version go to the following link to download the VM files and install them. As always this is a beast of a VM and you should only consider the install and setup if you have the space and, in particular, 16G of RAM.

Oracle Business Intelligence Enterprise Edition Samples on OTN.

NewImage

v506 New Features

  • DB 12.1.0.2 with the In-Memory option
  • Load and process JSON data
  • Integrates with Big Data SQL
  • Connects to Impala
  • Session tracking in UT
  • Exalytics Aggregation Functions
  • Lots of new Visualizations
  • Custom Style Features
  • Hierarchical Session Variables
  • etc.

Software on V506

  • Oracle Enterprise Linux 6.5 x64
  • OBIEE 11.1.1.9GA two distinct OBIEE instances, Essbase 11.1.2.4, updated BIMAD
  • Oracle MapViewer 11.1.1.9.1
  • Oracle BICS Data Sync v1
  • Oracle Database 12c IMDB 12.1.0.2, PDB Install, AWM 12.1.0.2a, APEX 4.2.6 & ORDS 2.0.1, ODM, Oracle Spatial and Graph
  • ORE 1.4.1 & R 3.1.1
  • ENDECA 3.1, Server 7.6.1, Studio 3.1, Provisioning Services
  • Cloudera CDH 5.1.2, Oracle BigData SQL, Oracle BigData Connectors
  • Plug and Play Companions : EPM 11.1.2.3, BIApps Demos
  • Utils: Start scripts, MapBuilder, SQLDev 4.1
NewImage

Thursday, July 9, 2015

Oracle Architect's Guides to Big Data

Over the past couple of years we have had a lot of information about Big Data presented to us. But one of the things that stands out is that there is still a bit of confusion about what Big Data is. Depending on who you are talking to, you will get a different definition and interpretation of what Big Data is and what you can do with it.

For example there is one company I know of who are talking about their Big Data project. For them this involves processing approx. 1 million records. That is Big for them. For others that is tiny.

Oracle has recently put together a series of articles that talk about what architectural changes are needed in your technical infrastructure to support Big Data. In this case it is more about the volume of data rather than different types of data, although the different types are also covered by the architecture that Oracle gives.

As part of the Oracle Enterprise Architecture section of the Oracle website, they have put together a series of articles on how you can include Big Data within your Enterprise Information Architecture.

These are a good read and a great place to get a better understanding of what you need to be considering as you move to an architecture that includes Big Data.

NewImage

Wednesday, July 1, 2015

Extending vmdk Size for VirtualBox VM

Recently I ran out of space on one of my Windows virtual machines. I needed to increase the size of the disk to allow me to install some new software. When creating the VM I had created the disks as VMDK. Yes, I know now that is not the best format to use :-( VMDK disks/files do not allow you to dynamically change their size :-( So what can you do? Is it possible in any way? If so, how? Well, this is what I did (after a bit of research using Google and StackOverflow).
  1. Make a copy of the vmdk file on the OS. Just in case anything happens! (always have a backup)
  2. Clone the vmdk disk file into vdi format. To do this you need to use the VBoxManage command/app to clone the file into a vdi formatted file. For example this is what I ran.

    VBoxManage clonehd "Win7-11.2.0.3-ORE-03-Jan-14-disk1.vmdk" "cloned.vdi" --format vdi

  3. There were some suggestions that you could then clone the vdi file back into vmdk format. This did not work for me; it kept giving errors when the cloning process was nearly finished. After a bit of time researching this I wasn't able to find a solution, so instead I did the following.
  4. Replace the vmdk disk/file with the vdi disk/file in my VirtualBox VM. Open VirtualBox, select the VM and then click on the Storage section. Here I was able to add the new vdi disk/file and then remove the old vmdk disk/file.


  5. Start up the VM. The VM starts up as normal and everything works OK.
  6. To make the extra space usable you need to allocate the new space to the c:\ drive. To do this I did the following:
  7. Click on Start Button, then right click on Computer and select Manage from the drop down menu.
  8. In the Computer Management console select Disk Management. A screen appears showing the amount of allocated and unallocated disk space.


  9. Right click on the area for the c:\ drive and select Extend Volume from the drop down menu.
  10. Select all the defaults as you go through the Wizard to Extend the Volume. When you are finished the c:\ drive will be extended with all the extra space.


  11. All done. You now have a larger disk/drive for your Windows VM.
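
A note on step 2, as promised: cloning on its own does not give you any extra disk space. Before attaching the new vdi disk/file to the VM (step 4) you can enlarge it using the VBoxManage modifyhd command with its --resize option, which takes the new total size in MB. This is only a minimal sketch; the 50GB target size is just an example, so use whatever size you need.

    VBoxManage modifyhd "cloned.vdi" --resize 51200

The extra space will then appear as unallocated space in the Windows Disk Management console, and this is what steps 6 to 10 allocate to the c:\ drive.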

Tuesday, June 23, 2015

Oracle Magazine - March/April 2001

The headline articles of Oracle Magazine for March/April 2001 were on using Oracle 9i Application Server to deliver e-business and web based applications. There were some case studies of companies using these technologies, including Tantalus Communications, Digital River Commerce System, Tomatoland.com and Oracle themselves.


OM 2001 March April

Other articles included:

  • Tom Kyte's column looked at tips on automation, cleanup and database maintenance. Some of the details included index rebuilds, indexing interMedia files, killing and cleaning up sessions, how to specify the column at runtime in an order by, and how to use DBMS_JOB for database maintenance.
  • Oracle announces the release of PORTAL.ORACLE.COM and MY.ORACLE.COM.
  • Fre Sansmark has an article on database benchmarking, discussing what it means and how well benchmarks address real-world performance questions.
  • Understanding XML Standards gives a brief introduction to what XML is about and explains the three layers: XML Grammar, XML-based Protocols and XML Vocabularies.
  • Part 3 of 'Exploring Oracle Text Basics' looks at text searching and comparisons, creating, indexing and loading data.
  • Creating Updatable Views explores the various requirements for creating a view that can be used to update data, whether it is based on a single table or on a join of multiple tables.
  • Linking to Remote Databases explores the basics of database links and what the DBA needs to know to set them up and manage them.
  • Steven Feuerstein's article looks at Advanced Native Dynamic SQL and the use of bind variables and their limitations.

To view the cover page and the table of contents click on the image at the top of this post or click here.

My Oracle Magazine Collection can be found here. You will find links to my blog posts on previous editions and a PDF for the very first Oracle Magazine from June 1987.

Monday, June 8, 2015

Blogger vs WordPress Evaluation

I have had my blog (and website) on the go for a few years now. It all started out as a notebook really for myself, and a way of recording certain things so that I could find them easily later.
Over time it has developed into something more than that, and it now covers a number of technical how-to articles and some of my Oracle User Group (and ACE Director) activities.
It all started out on Blogger, which is provided by Google. I've found Blogger really easy to use and to configure into something that looks semi-professional (at least it does to me).
All of this happened before WordPress really got going. I've had a look at WordPress from time to time over the years but never really invested enough time into it, despite what feels like everyone saying that WordPress is the bees' knees.
About 6 weeks ago I decided to put some solid time into investigating WordPress (about time, some of you might say). The challenge for me was how easy it would be to replicate, on WordPress, what I had on my Blogger hosted website. I have to say it was easy enough.
Now I have two blogs/websites running in parallel. Currently my blog/website is hosted on Blogger and it can be accessed via my custom domain of www.oralytics.com. The WordPress blog/website can be accessed at oralytics.wordpress.com.
As you will see they are very similar to each other. Yes there are some cosmetic differences but all the content is the same.
What I'm going to do is to run these two environments in parallel for the next (maybe) 6 to 8 weeks. At some point I will switch over my domain name (www.oralytics.com) to point at the WordPress site. And then I might switch it back.
The big question for me to answer is which one of these two environments will become my main site.
Blogger is FREE.
WordPress is FREE, well it is if I use the WordPress.com site.
But if I use the WordPress.com site then I get this annoying banner at the bottom of the browser window that does some advertising for WordPress and the Theme that I'm using.
The alternative, where I do not get any of these adverts appearing, is to pay for hosting and to pay for a theme. Over a 3 year period that comes out at about $300+. The cheapest WordPress hosting that I could find, at the moment, is with GoDaddy.
What do you think I should do?
1. Stay with what I have on Blogger (www.oralytics.com) (Free)
2. Switch to the Free WordPress.com option (oralytics.wordpress.com) (Free)
3. Buy WordPress hosting and pay for a theme. (Costs $)
When I put out my original enquiry a few weeks ago lots of people came back with really good advice. In particular I want to mention Jeff Smyth, Tim Hall and Martin Widlake for their advice, help and suggestions, which I think lots of others found very useful.

Wednesday, June 3, 2015

PMML in Oracle Data Mining

PMML (Predictive Model Markup Language) is an XML formatted output that defines the core elements and settings of your predictive models. This XML formatted output can be used to migrate your models from one data mining or predictive modelling tool to another, such as Oracle.

Using PMML to migrate your models from one tool to another allows you to use the most appropriate tool for developing your models, and then to import them into the tool that will be used for deploying your predictive models in batch or real-time mode. In particular, the ability to use your predictive models within your everyday applications enables you to work in the area of automatic or prescriptive analytics. Oracle Data Mining and the Oracle Database are well suited to supporting automatic and prescriptive analytics for your transactional applications.

PMML is an XML based standard specified by the Data Mining Group.

Oracle Data Mining supports the importing of PMML models that are compliant with version 3.1 of the standard, and for regression models only. The regression models can be for linear regression or binary logistic regression.

The Data Mining Group Archive webpage has a number of sample PMML files for you to download and then load into your Oracle database.

To load a PMML file into your Oracle Database you can use the DBMS_DATA_MINING.IMPORT_MODEL procedure. I've previously given examples of how you can use this procedure to import an Oracle Data Mining model that was exported using the EXPORT_MODEL procedure.

The syntax of IMPORT_MODEL when importing a PMML file is the following:

DBMS_DATA_MINING.IMPORT_MODEL (
      model_name        IN  VARCHAR2,
      pmmldoc           IN  XMLTYPE,
      strict_check      IN  BOOLEAN DEFAULT FALSE);

The following example shows how you can load the version 3.1 logistic regression PMML file from the Data Mining Group archive webpage.

BEGIN
   dbms_data_mining.IMPORT_MODEL('PMML_MODEL',
        XMLType(bfilename('IMPORT_DIR', 'sas_3.1_iris_logistic_reg.xml'),
                nls_charset_id('AL32UTF8')));
END;
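
One thing to watch out for: the bfilename call above assumes that a directory object called IMPORT_DIR already exists in the database and points at the OS directory containing the downloaded PMML file. The following is a minimal sketch of setting this up, where the /home/oracle/pmml path and the DMUSER schema are just examples; run it as a suitably privileged user.

CREATE OR REPLACE DIRECTORY IMPORT_DIR AS '/home/oracle/pmml';
GRANT READ ON DIRECTORY IMPORT_DIR TO dmuser;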

 

This example uses the default value for STRICT_CHECK, which is FALSE. In this case, if there are any errors in the PMML structure then these will be ignored, and the imported model may contain "features" that make it perform in a slightly odd manner.

Wednesday, May 27, 2015

R (ROracle) and Oracle DATE formats

When it comes to working with R to access and process your data, there are a number of little features and behaviours you need to look out for.

One of these is the DATE datatype.

The main issue that you have to look out for is the timezone conversion that happens when you extract the data from the database into your R environment.

There is a datatype conversion from the Oracle DATE datatype into the R POSIXct format. The POSIXct datatype includes a timezone, but the Oracle DATE datatype does not have a timezone part to it.

When you look into this a bit more you will see that the main issue is what timezone your R session has. By default your R session will inherit the OS timezone. For me here in Ireland we have the same timezone as the UK. You would think that the timezone would therefore be GMT, but this is not the case. What we have is BST (British Summer Time), and this takes daylight savings time into account. So on the 26th May, BST is one hour ahead of GMT.

OK. Let's have a look at a sample scenario.

The Problem

As mentioned above, when I select data of type DATE from Oracle into R, using ROracle, I end up getting a different date value than what was in the database. Similarly when I process and store the data.

The following outlines the data setup and some of the R code that was used to generate the issue/problem.

Data Set-up
Create a table that contains a DATE field and insert some records.

CREATE TABLE STAFF
   (STAFF_NUMBER  VARCHAR2(20),
    FIRST_NAME    VARCHAR2(20),
    SURNAME       VARCHAR2(20),
    DOB           DATE,
    PROG_CODE     VARCHAR2(6 BYTE),
    PRIMARY KEY (STAFF_NUMBER));

insert into staff values (123456789, 'Brendan', 'Tierney', to_date('01/06/1975', 'DD/MM/YYYY'), 'DEPT_1');
insert into staff values (234567890, 'Sean', 'Reilly', to_date('21/10/1980', 'DD/MM/YYYY'), 'DEPT_2');
insert into staff values (345678901, 'John', 'Smith', to_date('12/03/1973', 'DD/MM/YYYY'), 'DEPT_3');
insert into staff values (456789012, 'Barry', 'Connolly', to_date('25/01/1970', 'DD/MM/YYYY'), 'DEPT_4');


You can query this data in SQL without any problems. As you can see there is no timezone element to these dates.
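
For example, formatting the time element with to_char shows that each of the stored dates has a time of midnight. This is just a quick check you can run in SQL*Plus or SQL Developer.

select first_name, to_char(dob, 'DD/MM/YYYY HH24:MI:SS') dob
from   staff;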

Selecting the data
I now establish a connection to my schema in my 12c database using ROracle. I won't bore you with the full details of how to do this (check out point 3 on this post for more), but a minimal sketch is shown below.
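
This is only a sketch of the connection; the username, password and connect string are placeholders for your own settings.

> library(ROracle)
> drv <- dbDriver("Oracle")
> con <- dbConnect(drv, username = "dmuser", password = "dmuser", dbname = "localhost:1521/orcl")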

When I select the data I get the following.

> res <- dbSendQuery(con, "select * from staff")
> data <- fetch(res)
> data$DOB
[1] "1975-06-01 01:00:00 BST" "1980-10-21 01:00:00 BST" "1973-03-12 00:00:00 BST"
[4] "1970-01-25 01:00:00 BST"


As you can see, two things have happened to my date data when it was extracted from Oracle. Firstly, it has been assigned a timezone, even though there was no timezone part in the original data. Secondly, some sort of timezone conversion has been performed from GMT to BST. The difference between GMT and BST is the daylight savings time, hence the 01:00:00 being added to the time element that was extracted. This time should have been 00:00:00. You can see we have a mixture of times!

So there appears to be some difference between the timezone used by R and the one being used in Oracle.

To add to this problem I was playing around with some dates and different records. I kept on getting this scenario but I also got the following, where we have a mixture of GMT and BST times and timezones. I'm not sure why we would get this mixture.

> data$DOB
[1] "1995-01-19 00:00:00 GMT" "1965-06-20 01:00:00 BST" "1973-10-20 01:00:00 BST"
[4] "2000-12-28 00:00:00 GMT"


This is all a bit confusing and annoying. So let us look at how you can now fix this.

The Solution
Fixing the problem: Setting session variables
To fix this, and to ensure consistency between what is stored in Oracle and what is read out and converted into the R (POSIXct) format, you need to define two session variables. These are used to ensure consistency in the date and time conversions.

These session variables are TZ, for the R session timezone setting, and ORA_SDTZ, for specifying the timezone to be used for your Oracle connections.

The trick is that these session variables need to be set before you create your ROracle connection. The following is the R code to set them.

> Sys.setenv(TZ = "GMT")
> Sys.setenv(ORA_SDTZ = "GMT")
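
You can quickly check that these have taken effect in your R session, before you create the connection, using the base R Sys.getenv function.

> Sys.getenv("TZ")
[1] "GMT"
> Sys.getenv("ORA_SDTZ")
[1] "GMT"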

So you really need to have some knowledge of what kind of dates you are working with in the database, and whether a timezone is part of them or is important. Alternatively you could set the above variables to UTC.

Selecting the data (correctly this time)
Now, after reconnecting or creating a new connection to your Oracle schema, when we select our data from our table we get the following.

> data$DOB
[1] "1975-06-01 GMT" "1980-10-21 GMT" "1973-03-12 GMT" "1970-01-25 GMT"
Now you can see we do not have any time element to the dates and this is correct in this example. So all is good.

We can now update the data and do whatever processing we want with the data in our R script.

But what happens when we save the data back to our Oracle schema? In the following R code we will add 2 days to the DOB attribute (using the lubridate package) and then create a new table in our schema to save the updated data.

> data$DOB
[1] "1975-06-01 GMT" "1980-10-21 GMT" "1973-03-12 GMT" "1970-01-25 GMT"

> library(lubridate)
> data$DOB <- data$DOB + days(2)
> data$DOB
[1] "1975-06-03 GMT" "1980-10-23 GMT" "1973-03-14 GMT" "1970-01-27 GMT"


> dbWriteTable(con, "STAFF_2", data, overwrite = TRUE, row.names = FALSE)
[1] TRUE


I've used the R package lubridate to do the date and time processing.

When we look at this newly created table in our Oracle schema we will see that we don't have a DATE datatype for DOB; instead it was created using a TIMESTAMP datatype.
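You can confirm this from your R session by querying the data dictionary. This is just a quick sketch, reusing the con connection from earlier.

> dbGetQuery(con, "select column_name, data_type from user_tab_columns where table_name = 'STAFF_2'")
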
If you are working with TIMESTAMP and similar data types (i.e. data types that have a timezone element to them) then that is a slightly different problem, and perhaps one that I'll look at soonish.