Brendan Tierney - Oralytics Blog: Oracle


Monday, July 28, 2014

BUCKET_WIDTH: Calculating the size of the bucket

Some time ago I wrote a series of blog posts introducing some of the basic statistical functions available in Oracle. Here are the links to these.

The first blog post in the series looked at the DBMS_STAT_FUNCS PL/SQL package, what it can be used for, and gave some sample code showing how to use it in your data science projects. It also gave some sample code that I typically run to gather some additional stats.
The second blog post looks at some of the other statistical functions that exist in SQL that you may use regularly in your data science projects.
The third blog post provides a summary of the other statistical functions that exist in the database.

Most people do not realise that Oracle has over 250 statistical functions that are available (at no additional cost) in all the database versions.

I've had a query about one of these functions, WIDTH_BUCKET. The question was whether it was possible to get the width of the bucket in each case. There does not seem to be a built-in feature to get this value, so we have to calculate it ourselves.

Here is an example of how to calculate the bucket width, based on the example I used in my previous blog post.

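-- Put each age into one of 10 equal-width buckets, then report the
-- observed spread (max - min) and the number of rows in each bucket
SELECT bucket, max(age)-min(age) BUCKET_WIDTH, count(*)
FROM (SELECT cust_id,
             age,
             width_bucket(age,
                          (SELECT min(age) FROM mining_data_build_v),    -- low bound
                          (SELECT max(age)+1 FROM mining_data_build_v),  -- high bound; +1 keeps the oldest age in bucket 10
                          10) bucket                                     -- number of buckets
      FROM mining_data_build_v
      GROUP BY cust_id, age)
GROUP BY bucket
ORDER BY bucket;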

What this query gives is an approximate value of the bucket width, based on the range of values/records that actually fall into each bucket. WIDTH_BUCKET itself only returns the bucket number, so the exact bucket boundaries are not returned by any function/value in SQL.
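To make the relationship between the query and the function a little more concrete, here is a small Python sketch (an illustration only, not Oracle code) that mimics WIDTH_BUCKET for the usual case where the low bound is less than the high bound. WIDTH_BUCKET splits [low, high) into equal-width buckets, so each bucket's nominal width is (high - low) / number of buckets; values below the range go to bucket 0 and values at or above the high bound go to bucket number_of_buckets + 1. The `observed_spread` function mirrors the SQL above by reporting max(age) - min(age) and a count per bucket. The function names and the sample ages are hypothetical.

```python
import math
from collections import defaultdict


def width_bucket(value, low, high, num_buckets):
    """Rough emulation of Oracle's WIDTH_BUCKET for the case low < high.

    Bucket 0 holds values below `low`; bucket num_buckets + 1 holds values
    greater than or equal to `high`; buckets 1..num_buckets split [low, high)
    into equal-width intervals.
    """
    if low >= high:
        raise ValueError("this sketch only handles low < high")
    if value < low:
        return 0
    if value >= high:
        return num_buckets + 1
    # Floor division keeps the arithmetic exact for integer inputs.
    return int((value - low) * num_buckets // (high - low)) + 1


def nominal_width(low, high, num_buckets):
    """Width that every in-range bucket has by construction."""
    return (high - low) / num_buckets


def observed_spread(ages, num_buckets=10):
    """Mirror of the blog query: per bucket, (max(age) - min(age), count).

    Assumes `ages` holds one age per customer, roughly what the inner
    GROUP BY cust_id, age produces in the SQL version.
    """
    low, high = min(ages), max(ages) + 1  # same bounds as the SQL subqueries
    groups = defaultdict(list)
    for age in ages:
        groups[width_bucket(age, low, high, num_buckets)].append(age)
    return {b: (max(v) - min(v), len(v)) for b, v in sorted(groups.items())}


if __name__ == "__main__":
    sample_ages = [18, 25, 33, 40, 47, 52, 60, 71, 85, 90]  # hypothetical data
    print("nominal width:", nominal_width(min(sample_ages), max(sample_ages) + 1, 10))
    for bucket, (spread, count) in observed_spread(sample_ages).items():
        print(bucket, spread, count)
```

With these hypothetical ages, low = 18 and high = 91, so every bucket nominally spans 7.3 years, while the observed spread inside a bucket (for example 7 for bucket 1, which holds 18 and 25) can only be smaller, because it depends on which values happen to fall into that bucket.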

Thursday, July 17, 2014

OTN Latin America (North) Tour 2014

For a few years now I have (and I'm sure you have too) heard about and followed the various Oracle User Group tours that OTN arranges/facilitates. A tour consists of a number of Oracle User Groups in a region coordinating their conferences so that they can get speakers from across the world to come and present.

For most presenters it involves lots of travel. So instead of doing all that travelling to present at one conference, they can extend their travels a little and present in a number of countries. Most of the speakers are Oracle ACE Directors and OTN is very generous with its support, paying for all the flights, transportation and hotels. Without the generous support of OTN these tours, and perhaps many of the conferences, would not take place.

With envy I used to follow the various speakers on Twitter as they talked about their travels from country to country and their experiences of meeting the people and exploring the various countries. Yes, their time in each country seemed to be limited, but they always got to see and do so much.

Earlier this year there was a call for presentations for the various OTN Tours in 2014. I submitted 3 presentations that covered the Oracle Advanced Analytics Option (Oracle Data Mining and Oracle R Enterprise). I thought I didn't stand a chance given the speakers that have participated in previous years.

A couple of weeks ago I received an email saying that I had been accepted onto the OTN Latin America (North) Tour, so you can imagine my excitement. The full OTN Tour North leg covers a number of countries across Central and South America and runs over a two-week period. Unfortunately I'm not able to be away for that long, so I was accepted for the conferences in the first week of the tour. This will include Panama, Costa Rica and Mexico :-)

Some of you might think this is a bit of a jolly and a holiday. What I've discovered over the past week or more is that it will be far from that. There is a lot of work in preparing the presentations, giving the presentations, setting up live demos between presentations, various meetings with people at the conferences etc etc etc. Then there is all the travel, all the airports, all the airport transfers, all the overnights in hotels. Over the course of 7 days I will be staying in 6 different hotels.

I have spent the last week just trying to arrange my flights and hotels. This also involved trying to coordinate with other speakers so that we can travel together as much as possible.

Here are the dates and the presentations that I will be giving at these conferences:

4th August : Panama (in Panama City)

10:00-11:00 : Getting Started with Oracle Data Miner & Predictive Analytics

11:00-12:00 : Combining the Power of R and in-Database Data Mining. Running R in the Database. Seriously!

13:00-13:40 : Sentiment Analysis Using Oracle Data Mining

6th August : Costa Rica (in San Carlos)

10:00-11:00 : Getting Started with Oracle Data Miner & Predictive Analytics

13:00-14:00 : Sentiment Analysis Using Oracle Data Mining

16:00-17:00 : Combining the Power of R and in-Database Data Mining. Running R in the Database. Seriously!

8th August : Mexico (in Mexico City)

14:00-15:00 : Getting Started with Oracle Data Miner & Predictive Analytics

15:00-16:00 : Combining the Power of R and in-Database Data Mining. Running R in the Database. Seriously!

When the agendas for the conferences are available I will write another blog post with their details.

If you are at one of these conferences do please say hello :-)

I've finally booked all my flights and hotels. Many thanks to my fellow ACE Director presenters for your research and sharing of travel plans. It looks like there will be a group of us all travelling together.

Now the next challenge is to prepare the presentations and live demos (yes live demos).

I hope to blog about each of the conferences and my travels to/from each country. It really depends on how much time and internet access I will have. Perhaps this is something I will try to do on my various plane flights or while waiting at the airports. So watch out for these :-)

Updated with some stats on my travels

My travel plans for the OTN Latin America tour of user group conferences involve:

12,200 flying miles,
29.75 hours of flying time,
way too many hours hanging around in airports
over 8 days
staying in 6 hotels
plus 1 overnight flight,
giving 8 hours of presentations in 3 countries

Why do we do this? Because we love sharing with the Oracle User Groups around the world. I'm only doing 1 week of the tour. Some people are doing 2 weeks :-(

Monday, June 23, 2014

Oracle Magazine September/October 2000

The headline articles of Oracle Magazine for September/October 2000 were on e-Business Integration, including a healthy prescription for online retailing, streamlining the pulp and fiber industries, and the health care industry. Plus there were lots and lots of articles and news items, all on businesses delivering solutions via the internet.

As this was the Oracle Open World edition (you can see the label on the cover saying Biggest Ever), you can imagine there were a LOT of advertisements and sponsored articles. The list of other articles below does not cover these and only looks at the main content articles.

Other articles included:

Tom Kyte's article is on Tips for Migrating, Indexing and Using Packaged Procedures. In his article he gives some tips for migrating to Oracle 8i (release 8.1). He also discusses some scenarios around creating (or not creating) indexes on foreign keys, and looks at the scenario of compiling linked procedures and how the use of packages avoids the issues identified.
Do you remember the Internet File System? There was an article that gave an overview of this feature, which was available in Oracle 8i and was capable of managing over 150 different file types.
Autodesk released OnSite, an enterprise solution for bringing design and location-based information to the point of work via mobile devices. Autodesk OnSite used Oracle 8i Lite and the Palm OS platform to provide an interactive, two-way communication environment between the mobile worker and the overall decision support system.
The Oracle Academic Initiative began in 1997. In 2000 Oracle donated software licences, support services and Oracle training material, valued at $60 million, to 17 educational institutions.
There was page after page, after page of announcements and news from various Oracle Partners.
Douglas Scherer gives the first part of an article that looks at how you can use Oracle 8i interMedia for managing and deploying content rich data on the internet.
Managing Your Resources looks at how some of the new Oracle 8i EE features help DBAs to define resource plans, assign users to groups and prioritise resource allocations.
With the release of Oracle 8.1.6 came the new Statspack. Connie Dialeris and Graham Wood give an overview of the main features of Statspack, provide some guidance on how to use it in a proactive manner and give a step-by-step guide to how you can troubleshoot performance problems with Statspack.
The final article was on Oracle Warehouse Builder (OWB). This was an overview-style article covering the main components of OWB, with some guidelines for setting up different types of integration.

To view the cover page and the table of contents click on the image at the top of this post or click here.
My Oracle Magazine Collection can be found here. You will find links to my blog posts on previous editions and a PDF for the very first Oracle Magazine from June 1987.

Sunday, March 30, 2014

Gartner 2014 Advanced Analytics Quadrant

The Gartner 2014 Advanced Analytics Quadrant is out now. Well, it is if you can find it.

Some of the companies have put it up on their websites to promote their position.

For some reason Oracle hasn't, and I wonder why.

You can see that some typical technologies are missing from this, but this is to be expected. How much are companies really deploying these alternatives on real problems and in production? Perhaps the positioning of Revolution Analytics might be an indicator. At some point there might be a shift from investigative analysis into more mainstream projects and then into production.

What is still evident from this year's quadrant is that SAS and IBM (SPSS) still have very dominant positions, and perhaps will have for some time to come.

It will be interesting to see how this all plays out over the next few years.

Sunday, September 22, 2013

Oracle Magazine review–May/June 2000

The headline articles of Oracle Magazine for May/June 2000 were on the evolution of organisations into adopting an e-business model. They included 8 steps to evolving to e-business, and how to use the Oracle Internet Platform, Web Portal and Java technology.

Other articles included:

Tom Kyte has an article on Oracle Availability Options and explains when to implement Oracle Parallel Server or replication or a standby database.
New releases of Oracle Discoverer 3i and Oracle Reports 6i support XML and are part of the Oracle Intelligent WebHouse initiative.
Oracle licensed the mobile middleware developed by Nettech System, to support Oracle's steps into this field.
There is an overview of the IOUG-A Live! 2000 conference, which is being held between 7 and 11 May in the Anaheim Convention Centre. Over 4,000 attendees are expected.
Kelli Wiseth gives an overview of Java 2, explaining the differences between J2SE and J2EE. The article also discusses how Java is part of the Oracle Internet Platform.
Steven Feuerstein gives the second part of his article on using Java Classes and Objects in the Oracle 8i database.
Richard Niemiec has an article on Fundamental Tuning Goals, detailing the following:

Allocate the right amount of memory for the Oracle instance.
Keep the right data in memory.
Find problem queries.

Kevin Loney had an article on how to protect your database from security threats. These included:

Guard your backups and development environments
Know your default user and application accounts
Control the distribution of database names and locations
Use auditing effectively
Make password changes mandatory yet simple
Isolate your production database

Venkat Devraj discusses six storage tips for 24x7 availability:

Know and understand RAID options
Choose your disk-array size with caution
Do not use read ahead caches for online transaction processing applications
Do not rely on write caches to eliminate I/O hot spots
Consider using multilevel RAID
Ensure that your stripe sizes are consistent with your OS and database block sizes

Friday, August 30, 2013

Oracle Magazine-March/April 2000

The headline articles of Oracle Magazine for March/April 2000 were focused on e-business. There were articles covering the typical issues in setting up an e-business, the technical environment, and some reports from organisations that have used the Oracle tools.

Other articles included:

Oracle releases its Oracle XML Developer’s Kit (Oracle XDK), with support for a variety of programming languages. It included XML parsers for Java, C, C++ and PL/SQL, an XSL Processor, an XML Class Generator, and the XML Transviewer Java Beans.
Oracle 8i Lite for the Palm Computing and Psion EPOC operating systems is available.
Oracle acquires Carleton, who were innovators of data quality and mainframe data extraction software for customer focused data warehousing applications.
Oracle releases Oracle Fail Safe 3.0 which was used to protect Microsoft Windows NT applications and databases, and supported Oracle 7, 8 and 8i, Oracle Developer Server 6.0 Forms and Reports Servers, Oracle Application Server 4.0 and Microsoft Internet Information Server 4.0.
Steven Feuerstein has an article about getting started with Calling Java from PL/SQL and gives a simple example to illustrate how to do this. The necessary system privileges included JAVASYSPRIV for the DBA and JAVAUSERPRIV for those schemas that want to call the Java code.
Graham Wood and Connie Dialeris give an overview of Statspack, which was released with Oracle 8.1.6. The article covered the various features, how to install it and how to configure the Snapshot Level & SQL Thresholds. The article also gave an example of how to use DBMS_JOB to automate the collection of the statistics.
A step-by-step guide on how to use RMAN (which most of us know and love!), including the RMAN architecture, how to set up a backup, starting a backup and the all-important step of recovering from a backup.
With Oracle 7 came the ability to clone a database. This article goes through the steps required to set up and clone a production database.

Thursday, July 25, 2013

12c New Data Mining functions

With the release of Oracle 12c we get new functions/procedures, and some updated ones, for Oracle Data Miner, which is part of the Advanced Analytics option.

The following are the new functions/procedures and the functions/procedures that have been updated in 12c, with a link to the 12c Documentation that explains what they do.

CLUSTER_DETAILS is a new function that predicts cluster membership for each row. It can use a pre-defined clustering model or perform dynamic clustering. The function returns an XML string that describes the predicted cluster or a specified cluster.
CLUSTER_DISTANCE is a new function that predicts cluster membership for each row. It can use a pre-defined clustering model or perform dynamic clustering. The function returns the raw distance between each row and the centroid of either the predicted cluster or a specified cluster.
CLUSTER_ID has been enhanced so that it can either use a pre-defined clustering model or perform dynamic clustering.
CLUSTER_PROBABILITY has been enhanced so that it can either use a pre-defined clustering model or perform dynamic clustering. The data type of the return value has been changed from NUMBER to BINARY_DOUBLE.
CLUSTER_SET has been enhanced so that it can either use a pre-defined clustering model or perform dynamic clustering. The data type of the returned probability has been changed from NUMBER to BINARY_DOUBLE.
FEATURE_DETAILS is a new function that predicts feature matches for each row. It can use a pre-defined feature extraction model or perform dynamic feature extraction. The function returns an XML string that describes the predicted feature or a specified feature.
FEATURE_ID has been enhanced so that it can either use a pre-defined feature extraction model or perform dynamic feature extraction.
FEATURE_SET has been enhanced so that it can either use a pre-defined feature extraction model or perform dynamic feature extraction. The data type of the returned probability has been changed from NUMBER to BINARY_DOUBLE.
FEATURE_VALUE has been enhanced so that it can either use a pre-defined feature extraction model or perform dynamic feature extraction. The data type of the return value has been changed from NUMBER to BINARY_DOUBLE.
PREDICTION has been enhanced so that it can either use a pre-defined predictive model or perform dynamic prediction.
PREDICTION_BOUNDS now returns the upper and lower bounds of the prediction as the BINARY_DOUBLE data type. It previously returned these values as the NUMBER data type.
PREDICTION_COST has been enhanced so that it can either use a pre-defined predictive model or perform dynamic prediction. The data type of the returned cost has been changed from NUMBER to BINARY_DOUBLE.
PREDICTION_DETAILS has been enhanced so that it can either use a pre-defined predictive model or perform dynamic prediction.
PREDICTION_PROBABILITY has been enhanced so that it can either use a pre-defined predictive model or perform dynamic prediction. The data type of the returned probability has been changed from NUMBER to BINARY_DOUBLE.
PREDICTION_SET has been enhanced so that it can either use a pre-defined predictive model or perform dynamic prediction. The data type of the returned probability has been changed from NUMBER to BINARY_DOUBLE.
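Several of the clustering functions above talk about a predicted cluster and the raw distance to a cluster centroid. As a generic illustration only (this is not how Oracle implements CLUSTER_ID or CLUSTER_DISTANCE, whose distance measure depends on the clustering algorithm and its settings), the Python sketch below assumes plain Euclidean distance and hypothetical, already-computed centroids: the predicted cluster is simply the nearest centroid, and a distance can be requested against either that cluster or a specified one. The function names are illustrative, and `math.dist` needs Python 3.8 or later.

```python
import math


def predict_cluster(row, centroids):
    """Return (cluster_id, distance) for the centroid nearest to `row`."""
    return min(
        ((cid, math.dist(row, centre)) for cid, centre in centroids.items()),
        key=lambda pair: pair[1],
    )


def cluster_distance(row, centroids, cluster_id=None):
    """Distance to the predicted cluster, or to a specified cluster if given."""
    if cluster_id is None:
        return predict_cluster(row, centroids)[1]
    return math.dist(row, centroids[cluster_id])


if __name__ == "__main__":
    centroids = {1: (0.0, 0.0), 2: (10.0, 10.0)}  # hypothetical centroids
    row = (3.0, 4.0)
    print(predict_cluster(row, centroids))       # nearest cluster and its distance
    print(cluster_distance(row, centroids, 2))   # distance to a specified cluster
```

Passing `cluster_id` mirrors the "predicted cluster or a specified cluster" choice described for CLUSTER_DISTANCE; everything else about the model is left out of this sketch.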

Tuesday, July 23, 2013

Oracle Data Miner New Features (SQL Dev 4)

With the release of the new Oracle 12c database and SQL Developer 4 we have a range of Oracle Data Miner new features. Some of these are embedded into the database and are only available in 12c. Check out my previous blog post on these new features.

In this blog post I will look at the new Oracle Data Miner features that come with the ODM tool in SQL Developer 4.

The new features of the Oracle Data Miner tool can be grouped into 2 categories. The first category contains the new features that are available to all users of the tool (11.2g and 12c). The second category contains the new features that are only available in 12c. The new features in each of these categories are explained below.

Category 1 – Common new features for 11.2g and 12c Database users

There is a new View Data feature that allows you to drill down to view custom objects and nested tables.

A new Graph Node allows you to create graphs such as line, bar, scatter and boxplots for data at any stage of a workflow. You can specify any of the attributes from the data source for the graphs. There doesn’t seem to be a limit on the number of graphs you can create.

A new SQL Node. This is a welcome addition, as there have been many times that I’ve needed to write some SQL or PL/SQL to do a specific piece of processing on the data that was not available with the other nodes. There are really 2 important elements to this SQL node. The first is that you can write SQL and PL/SQL code to do whatever processing you want to do, but you can only do it on the Data node you are connected to.

The second is that you can use it to call some ORE code. This allows you to use the power of R and the extensive range of packages that are available to expand the analytic functionality in the database. If there is some particular function that you cannot do in Oracle and it is available in R, you can now embed this function/code as an ORE object in the database. You can then call it using SQL.

WARNING: this particular feature will only work if you have ORE installed on your 11.2.0.3g or 12.1c database

New Model Build Node features include node-level text specifications for text transformations, a display of the heuristic rules responsible for excluding predictor columns, and the ability to control the amount of classification and regression test results that are generated. I’ll be covering these in later blog posts.

New Workflow SQL Script Deployment features. Up to now I found the workflow SQL script to be of limited use. The development team have put a lot of work into generating a proper script that can be used by developers and DBAs, but there are still some limitations. The script will run the workflow automatically in the database without having to use the ODM tool, but it can only be run in the schema in which the workflow was generated. You will still have to do a lot of coding (although a lot less than you used to) to get your ODM models and workflows to run in another schema or database.

This will output the script to a file buried deep somewhere inside your SQL Developer directory. Unfortunately in the EA1 release the location field is small and scrolling has not been enabled, so you cannot (currently) scroll to the end of the field to see the actual location. You can edit this location to a different, shorter location.

Maybe this will be fixed for the official release.

Category 2 – New features for 12c Database users.

Now for the new features that are only visible when you are running ODM / SQL Dev 4 against a 12c database. No configuration changes are needed. The ODM tool checks to see what version of the database you are logging into. It will then present the available features based on the version of the database.

New Predictive Query nodes allow you to build a node based on the new 12c feature called Predictive Queries (PQs). In SQL Developer we get 3 additional types of Predictive Queries. These can be used for Anomaly Detection, Clustering and Feature Extraction.

It is important to remember that the underlying models produced by these PQs do not exist in the database after the query has executed. The model is created, used on the data and then deleted.

The Clustering node has the new algorithm Expectation Maximization in addition to the existing algorithms of K-Means and O-Cluster.

The Feature Extraction node has the new algorithm called Principal Component Analysis in addition to the existing Non-Negative Matrix Factorization algorithm.

Text Transformations are now built into the model build nodes. These text transformations will be part of the Automatic Data Preparation steps for the model build nodes. This is illustrated in the above images.

The Generalized Linear Model that is part of the Classification Node has a Feature Selection option in the Algorithm Settings. The default setting is Ridge Regression. Now there is an additional option of using Feature Selection.

Prediction Result Explanations gives the scoring details used to explain why the prediction was made.

Look out for blog posts on each of these new features.

Friday, July 19, 2013

Oracle 12c Books

Oracle 12c is only a few days old and there are already books available on Amazon. The carousels below list some of the books available on Amazon.com and Amazon.co.uk.


Thursday, July 18, 2013

Upgrading your ODM Repository for SQL Dev 4

For those users of Oracle Data Miner (ODM) that is part of SQL Developer, now that Oracle have finally released SQL Developer 4, you might want to upgrade to this new release. There are a lot of new features. Some of these are available for 11.2g and 12.1c databases and some are only available for 12.1c users.

I will have another blog post soon on the new Oracle Data Miner (ODM) features that are available in SQL Developer 4.

The instructions given below are what I did to upgrade so that I could use the new ODM tool/SQL Developer 4.

Step 1 – Install SQL Developer 4 : I have another blog post on what this involves, so check it out and complete those steps before you continue with the steps below.

Step 2 – Make ODM Visible : After SQL Developer 4 opens you should see all your migrated connections. To make ODM visible you need to click on the Tools menu, select Oracle Data Miner and then Make Visible. This will open a number of tabs on the left hand side of SQL Developer. These will include Data Miner (connections), Workflow Structure and Workflow Jobs.

Step 3 – Open an ODM Connection : Take one of your ODM connections and double click on it. SQL Developer 4 / ODM will check what version of the ODM repository exists in your database. If this is your first time connecting from SQL Developer 4, you will be told that you need to upgrade your repository.

Step 4 – Upgrade the ODM Repository : Select the Yes button on the Upgrade Repository window. You will then be asked for the SYS password. If you do not have access to this you can talk nicely to your DBA and ask them to enter the password for you.

You may or may not get a warning message like the following. Just click OK to continue.

Step 5 – Start the Repository Upgrade : When the Migrate Data Miner Repository window opens, just click the Start button.

This might be a good time to go off and make yourself a coffee. The upgrade process took approx. 8 minutes on my laptop. If you are running this on a server located somewhere else then the script may take a little bit longer to run!

The progress bar will let you know how things are progressing. It also gives some messages to let you know what stage of the process it is at.

Step 6 – All finished : When the Repository Migration has finished you will get a window with a message saying Task Successfully Complete. Click on the Close button to close this window.

Step 7 – Open an Existing Workflow : Just to make sure that everything has worked with the install and ODM Repository migration, open one of your existing workflows. If it opens then everything should be OK.

When you open the workflow, the new Workflow Editor tab opens on the right hand side of SQL Developer. This seems to have replaced the Component Palette we had with the previous version of the ODM tool. Expand the headings under the Workflow Editor to see the different nodes that are available. Most of these are the same, but we have 2 new nodes under the Data section. These are Graph and SQL Query. I’ll have more on these in another post or posts.

Wednesday, July 17, 2013

Auto-Starting your pluggables in 12c

After installing 12c you get your container database and a pluggable. But the problem that most people have is that when they restart their server (or in my case my VMs) the container database gets started but the pluggable database does not automatically start. This means that you have to manually go in and start it, which is a pain. You would have thought that Oracle would have some easy way of getting your pluggable databases to start automatically. If there is, I haven’t found it yet.

But I have come across how to automatically start your 12c pluggable databases, using a trigger.

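-- Create in the root container as SYSDBA; fires each time the CDB starts
CREATE OR REPLACE TRIGGER open_all_pluggables
   AFTER STARTUP
   ON DATABASE
BEGIN
   -- open every pluggable database in the container
   EXECUTE IMMEDIATE 'ALTER PLUGGABLE DATABASE ALL OPEN';
END open_all_pluggables;
/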

Let us test this out. I’ve started my VirtualBox VM that has 12c installed on Windows 7. Here is the code that I ran to verify that the container has been started and the pluggable is in MOUNTED mode.

C:\Users\oracle>sqlplus / as sysdba

SQL*Plus: Release 12.1.0.1.0 Production on Wed Jul 17 15:27:35 2013

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> select name,DB_UNIQUE_NAME from v$database;

NAME      DB_UNIQUE_NAME
--------- ------------------------------
ORCL      orcl

SQL> SELECT v.name, v.open_mode, NVL(v.restricted, 'n/a') "RESTRICTED", d.status
  2  FROM v$pdbs v, dba_pdbs d
  3  WHERE v.guid = d.guid
  4  ORDER BY v.create_scn;

NAME                           OPEN_MODE  RES STATUS
------------------------------ ---------- --- -------------
PDB$SEED                       READ ONLY  NO  NORMAL
PDB12C                         MOUNTED    n/a NORMAL

SQL>

Next we create the trigger (given above).

To test the automatic starting of the pluggables, we need to shut down the container database, by issuing the shutdown command.

SQL> shutdown
Database closed.
Database dismounted.
ORACLE instance shut down.

SQL> select name,DB_UNIQUE_NAME from v$database;
select name,DB_UNIQUE_NAME from v$database
*
ERROR at line 1:
ORA-01034: ORACLE not available
Process ID: 0
Session ID: 0 Serial number: 0

This shows us that the container database is shutdown.

Now we can start the container and test to see if the pluggable database is started automatically by the trigger.

SQL> startup
ORACLE instance started.

Total System Global Area 855982080 bytes
Fixed Size                  2408408 bytes
Variable Size             562036776 bytes
Database Buffers          285212672 bytes
Redo Buffers                6324224 bytes
Database mounted.
Database opened.
SQL>

SQL> select name,DB_UNIQUE_NAME from v$database;

NAME      DB_UNIQUE_NAME
--------- ------------------------------
ORCL      orcl

SQL> select status from v$instance;

STATUS
------------
OPEN

SQL> SELECT v.name, v.open_mode, NVL(v.restricted, 'n/a') "RESTRICTED", d.status
  2  FROM v$pdbs v, dba_pdbs d
  3  WHERE v.guid = d.guid
  4  ORDER BY v.create_scn;

NAME                           OPEN_MODE  RES STATUS
------------------------------ ---------- --- -------------
PDB$SEED                       READ ONLY  NO  NORMAL
PDB12C                         READ WRITE NO  NORMAL

SQL>

We can see that the pluggable was started.

Monday, July 15, 2013

Installing Oracle 12c on Windows 7 64bit

Here are the steps I went through to install Oracle 12.1c on Windows 7 64 bit.

Unzip the two 12c download files into the same directory. I called this directory database.

Go down a couple of levels in the database directory until you come to the directory that contains setup.exe. Double click on this to start the installer.
Step 1 – Configure Security Updates: Un-tick the tick-box and click the Next button. A warning message will appear. You can click on the Yes button to proceed.
Step 2 – Software Update : select the Skip Software Updates option and then click the Next button.
Step 3 – Installation Option : select the Create and Configure a Database option and then click the Next button.
Step 4 – System Class: Select the Server Class option and then click the Next button
Step 5 – Grid Installation Options : Select the Single Instance Database Installation option and then click the next button.
Step 6 – Install Types : Select the Typical install option and then click the Next button.
Step 7 – Installation Location : Select the Use Windows Built-in Account option and then click the Next button. A warning message appears. Click the Yes button.
Step 8 – Typical Installation. Set Global Database Name to cdb12c for the container database name. Set the Administrative password for the container database. Set the name of the pluggable database that will be created. Set this to pdb12c. Or you can accept the default names. Then click the Next button. If you get a warning message saying the password does not conform to the recommended standards, you can click the Yes button to ignore this warning and proceed.
Step 9 – Prerequisite Checks : the install will check to see that you have enough space and necessary permissions etc.
Step 10 – Summary – You should now be ready to start the install. Click the Install button.

You can now sit back, relax and watch the installation of 12.1c complete.

You may get some Windows Security Alert windows pop up. Just click on the Allow Access button.

Then the Database Configuration Assistant will start. This step might take a while to complete.

When everything is done, the installer will display a final screen confirming that the installation has completed.

Now you are almost ready to start using your pluggable 12c database on Windows. There are two final steps. The first is to add an entry to your tnsnames.ora file. You can do this manually if you know what you are doing, or you can select Net Configuration Assistant under the Oracle – Ora12cDB Home 1 section of the Windows menu. The second is to create a new user/schema.

Check out my previous blog post called ‘My first steps with 12c’ for how to do these last two steps. The ‘My first steps with 12c’ post was based on installing 12c on Linux 6.

Monday, July 8, 2013

12c Roundup so far and Events

I’m on vacation at the moment. As a result I’ve missed all the 12c launch and excitement that goes with it. I’ve managed to get a few minutes to put this post together. The aim of this post is to list some interesting blog posts (by other people over the past few days). I intend to expand the list when I get time.

I also wanted to highlight two 12c launch events. The first of these is the official Oracle 12c webcast. It is on Wednesday 10th July. Click on the following image to register etc. The webcast will have Mark Hurd, Andy Mendelsohn and Tom Kyte.

The second 12c launch event will be hosted by Oracle in Ireland. This will be on the 5th September in the Gibson Hotel (Dublin) between 13:00 and 17:30. I believe there might be some 12c goodies available for the attendees. Again, click on the image below to register and to check out the agenda.

The following are some articles and blog posts that have been published since 12c was launched. This is not a complete list, nor an indication of quality, but I’ve noted them so that I can come back and read them after my vacation. You might have come across others. If so, let me know and I will add them to the list.

12.1c Download page

12.1c Documentation page

12.1c New Features Guide

Oracle Advanced Analytics Option 12c and SQL Dev 4 new features

Oracle Database 12c: Oracle Multitenant Option

Oracle website for Multitenant

New DB12c feature involves invisibility

12c - SQL Text Expansion

Ever expanding SQL for 12c

Oracle 12c Magazine by @leight0nn in Flipboard

How long can you hold off on Oracle 12c

Oracle 12c Install articles by Tim Hall (oraclebase) on Linux5 and Linux6

Over the coming weeks (after my vacation) I will be posting some articles on the Advanced Analytics Option in 12c. There are a number of new features. Also when SQL Developer 4 comes out I will be including all the new functionality that is included in the updated ODM tool.

Wednesday, June 12, 2013

Part 3–Getting started with Statistics for Oracle Data Science projects

This is the Part 3 blog post on getting started with Statistics for Oracle Data Science projects.

The first blog post in the series looked at the DBMS_STAT_FUNCS PL/SQL package, what it can be used for and I give some sample code on how to use it in your data science projects. I also give some sample code that I typically run to gather some additional stats.
The second blog post will look at some of the other statistical functions that exist in SQL that you will/may use regularly in your data science projects.
The third blog post will provide a summary of the other statistical functions that exist in the database.

The table below is a collection of most of the statistical functions in Oracle 11.2. The links in the table bring you to the relevant section of the Oracle documentation where you will find a description of each function, the syntax and some examples of each.

ABS	LENGTH2	REGR_AVGX
ACOS	LENGTH4	REGR_AVGY
Aggregate functions	LENGTHB	REGR_COUNT
Analytic functions	LENGTHC	REGR_INTERCEPT
Arithmetic operators	LN	REGR_R2
ASIN	LNNVL	REGR_SLOPE
ATAN	LOG	REGR_SXX
ATAN2	LOWER	REGR_SXY
AVG	LPAD	REGR_SYY
CAST	LTRIM	ROLLUP clause
Comparison functions	MAX	ROUND
CONCAT	MEDIAN	SAMPLE
CORR	MIN	SIN
CORR_K	MOD	SINH
CORR_S	MODEL clause	SQRT
COS	NTH_VALUE	STATS_BINOMIAL_TEST
COSH	Numeric Functions	STATS_CROSSTAB
COUNT	PERCENT_RANK	STATS_F_TEST
COVAR_POP	PERCENTILE_CONT	STATS_KS_TEST
COVAR_SAMP	PERCENTILE_DISC	STATS_MODE
CUBE clause	Pivot operations	STATS_MW_TEST
CUME_DIST	POWER	STATS_ONE_WAY_ANOVA
CV	PREDICTION	STATS_T_TEST_INDEP
Data functions	PREDICTION_BOUNDS	STATS_T_TEST_INDEPU
DENSE_RANK	PREDICTION_COST	STATS_T_TEST_ONE
EXP	PREDICTION_PROBABILITY	STATS_T_TEST_PAIRED
FLOOR	PREDICTION_SET	STATS_WSR_TEST
GREATEST	PRESENTNNV	STDDEV
Grouping Sets	PRESENTV	STDDEV_POP
INTERSECT	Prior clause	STDDEV_SAMP
Interval arithmetic	PRIOR	SUM
INTERVAL	RANK	TAN
Julian dates	RAWTOHEX	TANH
LAG	REGEXP_COUNT	t-test
LAST	REGEXP_INSTR	VAR_POP
LEAD	REGEXP_LIKE	VAR_SAMP
LEAST	REGEXP_REPLACE	VARIANCE
LENGTH	REGEXP_SUBSTR	WIDTH_BUCKET

The list above may not be complete (I’m sure it is not), but it covers most of what you will need in your Oracle projects.

If you come across or know of other useful statistical functions in Oracle let me know the details and I will update the table above to include them.

Thursday, May 16, 2013

Outputting your data using inbuilt SQL Dev formatting

Oracle has built a number of formatting options into SQL Developer that allow you to output your data in some standard formats. This removes the need to use other tools, write extra code or perform various follow-up steps.
All you need to do is add a comment to your query and use the Run Script button (F5).
SELECT /*csv*/ * FROM scott.emp;
SELECT /*xml*/ * FROM scott.emp;
SELECT /*html*/ * FROM scott.emp;
SELECT /*delimited*/ * FROM scott.emp;
SELECT /*insert*/ * FROM SCOTT.EMP;
SELECT /*loader*/ * FROM scott.emp;
SELECT /*fixed*/ * FROM scott.emp;
SELECT /*text*/ * FROM scott.emp;
Hint: for some of these it is best to list the schema and table name in upper case.
These are comments and not hints, so they will not work in SQL*Plus.

Wednesday, May 15, 2013

Review of Oracle Magazine-July/August 1999

The headline articles for the July/August 1999 edition of Oracle Magazine were focused on Business Intelligence and included topics on architectures, business plans, data integration, portals, dashboards, Oracle Express, data marts and data warehouses.

Other articles included:

15 Rules for Enterprise Portals

Gear it to casual users
Use intuitive classifications and searching
Allow access to a publish/subscribe engine
Enable universal connectivity to information resources
Provide dynamic access to information resources
Set up intelligent routing
Integrate a business intelligence toolset
Use a server based architecture
Build in distributed, multithreaded services
Enable flexible permission granting
Append external interfaces
Provide programmatic interfaces
Establish internet security
Make it cost effective to deploy
Ensure that it can be customized and personalized

Oracle Application Server release 4.0.8 was available for beta testing. It includes support for Enterprise JavaBeans, Java Servlets and Java Server Pages, and allows developers to build robust self-service applications quickly.
Oracle and MapInfo joined forces to release an internet-based spatial-data analysis solution to help organizations to understand and visualize data and to identify patterns and customer trends
Oracle made available the Oracle iTV platform, a solution that makes it possible for broadcast, cable and telecommunications providers to deliver interactive services.
Nine tips for using Oracle Discoverer included:

Use the DECODE statement
Implement summary redirection
Create optional conditions (filters)
Use query statistics
Perform regular maintenance on the query statistics tables
Familiarize yourself with the EUL tables
Make regular backups
Modify registry settings
Delete objects with care

Standardizing your interfaces. The first of a three part article on creating interfaces to the database. This article focused on showing how to setup and use UTL_FILE for loading data into and getting data out of the database.
Creating a Virtual Private Database in Oracle 8i describes how to approach such a project to implement fine-grained access control and gives the following steps for setting up a VPD:

create the application context
create a package that sets the context
create the policy function
associate the policy function with a table or view

To view the cover page and the table of contents click on the image at the top of this post or click here.

My Oracle Magazine Collection can be found here. You will find links to my blog posts on previous editions and a PDF for the very first Oracle Magazine from June 1987.

Thursday, April 25, 2013

Oracle Magazine-September/October 1999

The headline articles in the September/October 1999 edition of Oracle Magazine focused on how Oracle technology can be used to educate staff and to keep their skills up to date, either on site or remotely via on-demand training resources.

Other articles included:

Oracle announced that it had acquired Thinking Machines’ data mining business. This data mining product was called Darwin and is now called Oracle Data Mining. I will have a separate blog post for this announcement.
Oracle 8i Lite has shipped and comes with three components: Oracle Lite, a single-user database (50K to 750K footprint); Web-to-Go, which allows users to access the same data and web applications both online and offline; and iConnect, a flexible architecture that enables reliable and scalable bi-directional synchronization of data and applications. Oracle 8i Lite was supported on MS Windows 95, 98 and NT, Windows CE, Palm OS and EPOC 32.
Oracle XML Parser for C and Oracle XML Parser for C++ were released and support the DOM and Simple API for XML (SAX) interfaces.
Oracle XML SQL utilities and XSQL Servlet facilitate the reading and writing of XML information from and to the Oracle database.
Siemens announced that it planned to build an Oracle 8i Appliance on its Primergy line of servers, based on Intel Pentium II Xeon processors.
Singapore Telecom’s Magix Server delivered the world’s first nationwide video-on-demand service. Its 12,000 subscribers were able to use a web browser to select a video from the Magix website, and SingTel automated the streaming of the video to their computer.
Oracle 8i comes with some improvements in PL/SQL. These include Autonomous Transactions, Native Dynamic SQL, Invoker rights procedures, user-defined operators, new operators and bulk binds.
Part 2 of the article on exporting an Oracle Database to a flat file. This part of the article looks at how you can use the UTL_FILE package.
How you can speed up query response times by using materialized views. The article suggests the following steps to analyze the performance impact:
- Configure the server parameters
- Grant privileges to the appropriate schema
- Create a materialized view
- Refresh the optimizer statistics
- Confirm that the materialized view is being used
- Manually refresh a materialized view
Oracle introduces Oracle LogMiner to allow a DBA to analyze the redo log files.

Friday, April 19, 2013

Part 2–Getting started with Statistics for Oracle Data Science projects

This is the second blog on getting started with Statistics for Oracle Data Science projects.

The first blog post in the series looked at the DBMS_STAT_FUNCS PL/SQL package, what it can be used for and I give some sample code on how to use it in your data science projects. I also give some sample code that I typically run to gather some additional stats.
The second blog post will look at some of the other statistical functions that exist in SQL that you will/may use regularly in your data science projects.
The third blog post will provide a summary of the other statistical functions that exist in the database.

In this blog post I will look at three more useful statistical functions that are available in the Oracle database. Remember, these come as standard with the database. The first function I will look at is WIDTH_BUCKET, which can be used to create histograms of the data. A common task in analytics projects is to produce cross tabs of the data, and Oracle has the STATS_CROSSTAB function for this. Finally, I will look at the different ways you can sample the data.

Histograms using WIDTH_BUCKET

When exploring your data it is useful to group values together into a number of buckets. Typically you might want to define the width of each bucket yourself before passing the data into your data mining tools, but before you can decide what these are you need to do some exploring using a variety of widths. A good way to do this is to use the WIDTH_BUCKET function. This takes the following inputs:

Expression: This is the expression or attribute on which you want to build the histogram.

Min Value: This is the lower or starting value of the first bucket

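Max Value: This is the upper bound of the last bucket. Note that this bound is exclusive: values equal to or greater than it are placed in an overflow bucket numbered Num Buckets + 1 (values below Min Value go into bucket 0).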

Num Buckets: This is the number of buckets you want created.

Typically the Min Value and the Max Value can be calculated using the MIN and MAX functions. As a starting point you would generally select 10 for the number of buckets. This is the number you will change, downwards as well as upwards, to see if a particular pattern exists in the attribute.

Using the example scenario that I used in the first blog post, let us start by calculating the MIN and MAX for the AGE attribute.

Let's say that we wanted to create 10 buckets. With a MIN of 17 and a MAX of 90, this would create a bucket width of (90 - 17) / 10 = 7.3 for each bucket, giving us the following.

Bucket 1 : 17-24.3
Bucket 2: 24.3-31.6
Bucket 3: 31.6-38.9
Bucket 4: 38.9-46.2
Bucket 5: 46.2-53.5
Bucket 6: 53.5-60.8
Bucket 7: 60.8-68.1
Bucket 8: 68.1-75.4
Bucket 9: 75.4-82.7
Bucket 10: 82.7-90
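As a quick check of the arithmetic behind the list above, here is a small Python sketch (an illustration outside the database, not something Oracle provides; the function name bucket_boundaries is hypothetical). It computes equal-width bucket boundaries from the MIN and MAX values, using 17 and 90 from the AGE example.

```python
def bucket_boundaries(min_value, max_value, num_buckets):
    """Return (lower, upper) pairs for equal-width buckets, rounded to 1 decimal place."""
    width = (max_value - min_value) / num_buckets
    return [
        (round(min_value + i * width, 1), round(min_value + (i + 1) * width, 1))
        for i in range(num_buckets)
    ]


if __name__ == "__main__":
    # MIN(age) = 17 and MAX(age) = 90, as in the example above
    for n, (low, high) in enumerate(bucket_boundaries(17, 90, 10), start=1):
        print(f"Bucket {n}: {low}-{high}")
```

Each boundary is computed from the start value (min_value + i * width) rather than by adding the width repeatedly to a rounded figure, which avoids the rounding drift that can creep into a hand-calculated list.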

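The WIDTH_BUCKET function gives us approximately these buckets with the following query. Note that the query passes max(age)+1 as the upper bound. Because WIDTH_BUCKET treats the upper bound as exclusive, without the +1 the oldest customers (age 90) would fall into an overflow bucket 11. The +1 also makes each bucket slightly wider (7.4 instead of 7.3), so the boundaries differ a little from the list above.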

SELECT cust_id,
       age,
       width_bucket(age,
                    (SELECT min(age) FROM mining_data_build_v),   -- lower bound (inclusive)
                    (SELECT max(age)+1 FROM mining_data_build_v), -- upper bound (exclusive), hence +1
                    10) bucket
FROM   mining_data_build_v
WHERE  rownum <= 12
GROUP BY cust_id, age;

To plot the histogram for AGE we need an additional level of detail: we need to aggregate all the records by bucket.

select intvl, count(*) freq
from (select width_bucket(age,
                          (select min(age) from mining_data_build_v),
                          (select max(age)+1 from mining_data_build_v), 10) intvl
      from mining_data_build_v)
group by intvl
order by intvl;
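WIDTH_BUCKET's boundary rules are easiest to see outside the database. The Python sketch below is an emulation for illustration only (width_bucket and histogram are hypothetical helper names, not an Oracle API). It assumes the documented behaviour for an ascending range: the lower bound is inclusive, the upper bound is exclusive, values below the range go into bucket 0, values at or above it go into bucket num_buckets + 1, and NULL stays NULL. The histogram helper mirrors the GROUP BY query above by using the minimum and maximum + 1 as the range.

```python
import math
from collections import Counter


def width_bucket(expr, min_value, max_value, num_buckets):
    """Emulate WIDTH_BUCKET for an ascending range (min_value < max_value)."""
    if expr is None:
        return None                      # NULL in, NULL out
    if expr < min_value:
        return 0                         # underflow bucket
    if expr >= max_value:
        return num_buckets + 1           # overflow bucket: upper bound is exclusive
    return math.floor((expr - min_value) * num_buckets / (max_value - min_value)) + 1


def histogram(values, num_buckets=10):
    """Count values per bucket, using min(values) and max(values) + 1 as the range."""
    low, high = min(values), max(values) + 1
    counts = Counter(width_bucket(v, low, high, num_buckets) for v in values)
    return sorted(counts.items())


if __name__ == "__main__":
    ages = [17, 23, 25, 31, 38, 44, 52, 60, 67, 75, 83, 90]  # hypothetical sample ages
    print(width_bucket(90, 17, 90, 10))  # without +1, age 90 goes to overflow bucket 11
    print(width_bucket(90, 17, 91, 10))  # with max(age)+1, age 90 lands in bucket 10
    for bucket, freq in histogram(ages):
        print(bucket, freq)
```

The two width_bucket calls in the usage block show why the queries above add 1 to the maximum: it is the simplest way to keep the largest value inside the last real bucket.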

We can take this code and embed it into the GATHER_DATA_STATS procedure that I gave in my Part 1 blog post.

Cross Tabs using STATS_CROSSTAB

Typically cross tabulation (or crosstabs for short) is a statistical process that summarises categorical data to create a contingency table. They provide a basic picture of the interrelation between two variables and can help find interactions between them.

Because a crosstab creates a row for each value in one variable and a column for each value in the other, it is not suitable for continuous variables that can take many values.

In Oracle we can produce crosstabs using one of Oracle's reporting tools. If you don’t have one of these, you can use the in-database function STATS_CROSSTAB. This function takes three parameters: the first two are the attributes you want to compare and the third is the test you want to perform. The tests available include:

CHISQ_OBS: Observed value of chi-squared
CHISQ_SIG: Significance of observed chi-squared
CHISQ_DF: Degrees of freedom for chi-squared
PHI_COEFFICIENT: Phi coefficient
CRAMERS_V: Cramer’s V statistic
CONT_COEFFICIENT: Contingency coefficient
COHENS_K: Cohen’s kappa

CHISQ_SIG is the default.

Now let us look at some examples using our same data set.
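To give a feel for what several of these return values measure, the Python sketch below computes them from a list of (x, y) category pairs using the standard textbook definitions. This is an illustration only: crosstab_stats is a hypothetical helper, the formulas are assumed (not confirmed) to match Oracle's STATS_CROSSTAB, and CHISQ_SIG and COHENS_K are left out because they need extra machinery (the chi-squared distribution and a square agreement table) beyond a short standard-library sketch.

```python
import math
from collections import Counter


def crosstab_stats(pairs):
    """Chi-squared based statistics for a list of (x, y) category pairs."""
    n = len(pairs)
    observed = Counter(pairs)                  # cell counts of the contingency table
    row_totals = Counter(x for x, _ in pairs)  # marginal totals for the first attribute
    col_totals = Counter(y for _, y in pairs)  # marginal totals for the second attribute
    rows, cols = len(row_totals), len(col_totals)
    if min(rows, cols) < 2:
        raise ValueError("each attribute needs at least two distinct values")

    chisq = 0.0
    for x, r in row_totals.items():
        for y, c in col_totals.items():
            expected = r * c / n               # count expected if x and y were independent
            chisq += (observed[(x, y)] - expected) ** 2 / expected

    return {
        "CHISQ_OBS": chisq,
        "CHISQ_DF": (rows - 1) * (cols - 1),
        "PHI_COEFFICIENT": math.sqrt(chisq / n),
        "CRAMERS_V": math.sqrt(chisq / (n * (min(rows, cols) - 1))),
        "CONT_COEFFICIENT": math.sqrt(chisq / (chisq + n)),
    }


if __name__ == "__main__":
    # Hypothetical data: gender vs. whether the customer responded to an offer.
    pairs = ([("M", "Y")] * 30 + [("M", "N")] * 10
             + [("F", "Y")] * 10 + [("F", "N")] * 30)
    for name, value in crosstab_stats(pairs).items():
        print(name, round(value, 4))
```

For this made-up 2x2 table every expected cell count is 20, so CHISQ_OBS is 20 with 1 degree of freedom, and both PHI_COEFFICIENT and CRAMERS_V are 0.5.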

Sampling Data

When our data sets are relatively small, consisting of a few hundred thousand records, we can explore the data in a relatively short period of time. But if your data sets are larger than that, you may need to explore the data by taking a sample of it. Sampling takes a “random” selection of records from the data set, roughly in the proportion we have specified for the sample.

In Oracle the SAMPLE clause takes a percentage figure. This is the percentage of the entire data set you want to have in the sampled result.

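There is also a variant called SAMPLE BLOCK. Here the figure given is the percentage of data blocks to select, and all the records in each selected block are returned, rather than individual records being chosen at random.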

Each time you use the SAMPLE clause without specifying a seed, Oracle generates a new random seed for the sampling. As a result you will get a different result set each time, and the result set will have a slightly different number of records. If you run the same sampling query over and over again, you will see that the number of records returned varies by a small amount.

If you would like to have the same sample data set returned each time, then you will need to specify a SEED value. The seed must be an integer between 0 and 4294967295.

In this case because we have specified the Seed we get the same “random” records being returned with each execution.
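The behaviour described above (varying sample sizes without a seed, a repeatable sample with one) comes from row-level random sampling, where each record is kept independently with the given probability. The Python sketch below mimics that idea under that assumption; sample_rows is a hypothetical helper, it is not Oracle's sampling algorithm, and the same seed in Python will not select the same rows that Oracle would.

```python
import random

MAX_SEED = 4294967295  # upper limit for a SEED value, as quoted above


def sample_rows(rows, percent, seed=None):
    """Row-level (Bernoulli) sampling: keep each row with probability percent / 100.

    With seed=None the generator is seeded differently on every call, so both
    the rows chosen and the sample size vary; a fixed seed repeats the sample.
    """
    if not 0 < percent < 100:
        raise ValueError("percent must be greater than 0 and less than 100")
    if seed is not None and not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be between 0 and {MAX_SEED}")
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < percent / 100]


if __name__ == "__main__":
    data = range(100_000)
    print([len(sample_rows(data, 10)) for _ in range(3)])           # sizes usually differ
    print([len(sample_rows(data, 10, seed=42)) for _ in range(3)])  # identical every time
```

Because each row is kept or dropped independently, a 10% sample of 100,000 rows lands close to, but rarely exactly on, 10,000 rows, which matches the small variation in record counts described above.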
