Brendan Tierney - Oralytics Blog: SQL

Showing posts with label SQL. Show all posts

Thursday, April 30, 2015

Viewing Models Details for Decision Trees using SQL

When you are working with and developing Decision Trees by far the easiest way to visualise these is by using the Oracle Data Miner (ODMr) tool that is part of SQL Developer.
Developing your Decision Tree models using the ODMr allows you to explore the decision tree produced, to drill in on each of the nodes of the tree and to see all the statistics etc that relate to each node and branch of the tree.
But when you are working with the DBMS_DATA_MINING PL/SQL package and with the SQL commands for Oracle Data Mining you don't have the same luxury of the graphical tool that we have in ODMr. For example here is an image of part of a Decision Tree I have and was developed using ODMr.
Blog dt 1

What if we are not using the ODMr tool? In that case you will be using SQL and PL/SQL. When using these you do not have luxury of viewing the Decision Tree.
So what can you see of the Decision Tree? Most of the model details can be used by a variety of functions that can apply the model to your data. I've covered many of these over the years on this blog.
For most of the data mining algorithms there is a PL/SQL function available in the DBMS_DATA_MINING package that allows you to see inside the models to find out the settings, rules, etc. Most of these packages have a name something like GET_MODEL_DETAILS_XXXX, where XXXX is the name of the algorithm. For example GET_MODEL_DETAILS_NB will get the details of a Naive Bayes model. But when you look through the list there doesn't seem to be one for Decision Trees.
Actually there is and it is called GET_MODEL_DETAILS_XML. This function takes one parameter, the name of the Decision Tree model and produces an XML formatted output that contains the attributes used by the model, the overall model settings, then for each node and branch the attributes and the values used and the other statistical measures required for each node/branch.
The following SQL uses this PL/SQL function to get the Decision Tree details for model called CLAS_DT_1_59.
SELECT dbms_data_mining.get_model_details_xml('CLAS_DT_1_59')
FROM dual;

If you are using SQL Developer you will need to double click on the output column and click on the pencil icon to view the full listing.
Blog dt 2

Nothing too fancy like what we get in ODMr, but it is something that we can work with.
If you examine the XML output you will see references to PMML. This refers to the Predictive Model Markup Language (PMML) and this is defined by the Data Mining Group (www.dmg.org). I will discuss the PMML in another blog post and how you can use it with Oracle Data Mining.

Friday, March 6, 2015

RIP SQL*Plus & hello SQL Command Line

Over the past couple of months Oracle has been releasing some EA (Early Adopter) versions of a new tool that is currently called SQL Command Line.

The team behind this new tool is the SQL Developer development team and they have been working on creating a new command line SQL tool that is based on some of the technology that is included in SQL Developer.

SQL Command Line in an stand alone tool and all you need to do is to download and un-zip the tile.

What I want to show in this blog post is some of new features that are available and that I have found particularly useful. But before we get onto those commands let us first have a look at how you can get setup and running with SQL Command Line.

Download & Setup

The current download of SQL Command Line can be found under the SQL Developer 4.1 EA Download page. I'm assuming when 4.1 is formally released the download for SQL Command line will be on the main SQL Developer Download web page.

After you have downloaded the file, all you need to do is to unzip the file and then copy the unzipped directory to where you want the software to be located on your client.

Now you are ready to get started with using SQL Command Line.

Connecting to your Oracle Schema

(That) Jeff Smith and Barry McGillin have a couple of good blog posts on the different connection methods and some setup or configuration you might need to consider. Check out these links for more details.

For me I did not have to do any additional setup or configuration. I was able to use the TNS Names and the EZConnect methods without any problems.

The following how to connect to my (DMUSER) schema using the EZConnect method. With this method we pass in the username, password, the host name, port number and the service name. Just like this

> sql dmuser/dmuser@localhost:1521/pdb12c

We can not have a look at the JDBC connection details.

SQL> show jdbc

-- Database Info --

Database Product Name: Oracle

Database Product Version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

Database Major Version: 12

Database Minor Version: 1

-- Driver Info --

Driver Name: Oracle JDBC driver

Driver Version: 12.1.0.2.0

Driver Major Version: 12

Driver Minor Version: 1

Driver URL: jdbc:oracle:thin:@localhost:1521/pdb12c

SQL>

If we have a TNSNAMES.ORA file on our computer and the directory that it is in, is on the search PATH, then we can use the service names defined in the TNSNAMES.ORA file. The following example shows you how to use this in two ways. The first shows how to enter all the details when you are starting SQL CL and the other is when SQL CL prompts you for each parameter.

> sql dmuser/dmuser@pdb12c

and when we are prompted to enter the parameters, we get the following.

> sql

SQLcl: Release 4.1.0 Beta on Thu Mar 05 15:16:12 2015

SQLcl: Release 4.1.0 Beta on Thu Mar 05 15:16:14 2015

Username? (''?) dmuser

Password? (**********?) ******

Database? (''?) pdb12c

Connected to:

Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production

SQL>

As you can see these work in the same way as when we use SQL*Plus.

Now that you are connected to your schema, what else can you do? The following sections are some useful commands.

Commands & Help

The following list of commands is by no means a complete list of commands available in SQL Command Line. Theoretically everything you can currently do in SQL*Plus you can also do in SQL Command Line (theoretically) But the commands I give examples of below are some of my favourites (so far).

You can get the list of commands by typing help at the SQL prompt.

SQL> help

Then to get help on a specific command you can just add the command after the help.

SQL> help cd

---

Changes path to look for script at after startup.

(show SQLPATH shows the full search path currently:

- CD current directory setting set by last cd command

- baseURL (url for subscripts)

- topURL (top most url when starting script)

- Last Node opened (i.e. file in worksheet)

- Where last script started

- Last opened on sqlplus path related file chooser

- SQLPATH setting

- "." if in SQLDeveloper UI (included in SQLPATH in command line (sdsql))

SQL>

Some work is still needed on the help documentation and what is listed for each command, as the current version is missing some important details.

Alais

This is by far my favourite new feature. This allows us to take some of our most common SQL statements and to create a shortcut for it.

Very soon I will not be using Oracle SQL but I will be using My SQL, as I will have created my own personalised version of SQL.

To list what aliases you have defined in your schema you can type

SQL > alais

Oracle will have a few aliases already defined in SQL CL. By having a look at some of these you can see some of what you want they can do and get ideas for what you might want to do with them. To list the contents of an alias, you can use the following command.

alias list {alias name}

for example

SQL > alias list tables

This command lists the query that is used for the 'tables' alias that comes with SQL CL.

I use Oracle Data Miner a lot and when you use this tool it can create a number of tables with a variety of names in your schema. Most of these you will never need to look at. So what I do is create an alias that excludes these from the list of tables in my schema.

SQL> alias tables2=select table_name from user_tables where table_name not like 'ODMR$%' and table_name not like 'DM$%' and table_name not like 'SYS_IOT%';

So now all I need to do to list my important data only tables (and exclude all the Oracle Data Miner tables) I can run my alias 'table2'.

SQL> tables2

You will quickly build up a suite of commands using aliases.

info and >info+

info and info+ are the new commands to replace the DESC command.

The difference between info and info+ is that info+ gives you some statistical information about the table and the attributes in the table. This is illustrated in the following examples.

Example using 'info'

Example using 'info+'

CTAS & DDL

If you want to get the DDL script to create a copy of a table you have two options open to you. The first of these is the DDL command. This creates a DDL statement based on the meta data for the table, just like in the following

An alternative to this is to use the CTAS command that will give a slightly different output to DDL command. With the CTAS we also get the CREATE TABLE .. AS SELECT ...

History

In SQL*Plus we had a limited scroll through our previous commands. The same kind of scrolling is available in SQL CL, but we can get to see all our previous commands using the 'history' command. The following illustrates how you can list all you previous commands, I'm sure it is limited to a certain number or will be otherwise it will become a very long, long list.

SQL> history

To find out how often each command has been run you can run

SQL> history usage

and to find out how long the query took to run the last time it was run

SQL> history time

There are lots more that I could show, but this post is way, way to long as it is. What I suggest you do is go and download SQL CL (Command Line) and start using it today.

Wednesday, November 12, 2014

Approximate Count Distinct (12.1.0.2 new feature)

With the release of the Oracle Database 12.1.0.2 there was a number of new features and options. Most of the publicity has been around the in-Memory option. But there was lots of other features for the DBA and a few for the developer.

One of the new SQL functions is the APPROX_COUNT_DISTINCT(). This function is different to the tradition count distinct, COUNT(DISTINCT expression), in that is performs an approximate count distinct. The theory is that this approximate count is a lot more efficient than performing the full count distinct.

The APPROX_COUNT_DISTINCT() function is really only suitable when you are processing very large volumes of data and when the data set contains a large number of distinct values.

The general syntax of the function is:

... APPROX_COUNT_DISTINCT(expression) ...

and returns a Number.

The function returns the approximate number of records that contain distinct value for the expression.

SELECT approx_count_distinct(cust_id)

FROM mining_data_build_v;

The APPROX_COUNT_DISTINCT() function ignores records that contain a null value for the expression. Plus is performs less work on the sorting and aggregations. Just run and Explain Plan and you can see the differences.

In some of the material from Oracle the APPROX_COUNT_DISTINCT() function can be 5x to 50x++ times faster. But it depends on the number of distinct values and the complexity of the SQL query.

As the result / returned value from the function may not be 100% accurate, Oracle says that the functions has an accuracy of >97% (with 95% confidence).

The function cannot be used on the following data types: BFILE, BLOB, CLOB, LONG, LONG RAW and NCLOB

Wednesday, October 29, 2014

Something new in 12c: FETCH FIRST x ROWS

In this post I want to show some example of using a new feature in 12c for selecting the first X number of records from the results set of a query.

See the bottom of this post for the background and some of the reasons for this post.

Before we had the 12c Database if we only wanted to see a subset or the initial set of records from the results of a query we could add something like the following to our query

...

AND ROWNUM <= 5;

The could use the pseudo column ROWNUM to restrict the number of records that would be displayed. This was particularly useful when the results many 10s, 100s, or millions of records. It allowed us to quickly see a subset and to see if the results where what we expected.

In my book (Predictive Analytics Using Oracle Data Miner) I had lots of examples of using ROWNUM.

What I wasn't aware of when I was writing my book was that there was a new way of doing this in 12c. We now have something like the following:

...

FETCH FIRST x ROWS ONLY;

There is an example:

SELECT * FROM mining_data_build_v

FETCH FIRST 10 ROWS ONLY;

Fetch first 1

There are a number of different ways you can use the row limiting feature. Here is the syntax for it:

[ OFFSET offset { ROW | ROWS } ]

[ FETCH { FIRST | NEXT } [ { rowcount | percent PERCENT } ]

{ ROW | ROWS } { ONLY | WITH TIES } ]

In most cases you will probably use the number of rows. But there many be cases where you might what to use the PERCENT. In previous versions of the database you would have used SAMPLE to bring back a certain percentage of records.

select CUST_GENDER from mining_data_build_v

FETCH FIRST 2 PERCENT ROWS ONLY;

This will set the first 2 percent of the records.

You can also decide from what point in the result set you want the records to be displayed from. In the previous examples above the results displayed will befing with the first records. In the following example the results set will be processed to record 60 and then the first 5 records will be selected and displayed. This will be records 61, 62, 63, 64 and 65. So the first record processed will be the OFFSET record + 1.

select CUST_GENDER from mining_data_build_v

OFFSET 60 ROWS FETCH FIRST 5 ROWS ONLY;

Similar to the PERCENT example above you can use the OFFSET value, for example.

select CUST_GENDER from mining_data_build_v

OFFSET 60 ROWS FETCH FIRST 2 PERCENT ROWS ONLY;

This query will go to records 61 and return the next 2 percent of the records.

The background to this post

There are a number of reasons that I really love attending Oracle User Group conferences. One of the challenges I set myself is to go to presentations on topics that I think I know or know very well. I can list many, many reasons for this but there are 2 main points. The first is that you are getting someone elses perspective on the topic and hence you might learn something new or understand it better. The second is that you might actually learn something new, like some new command, parameter setting or something else like that.

At Oracle Open World recently I attended the EMEA 12 things about 12c set of presentations that Debra Lilly arranged during the User Group Forum on the Sunday. During these session Alex Nuijten gave an overview of some 12c new SQL features. One of these was the command FETCH FIRST x ROWS. This blog post illustrates some of the different ways of using this command.

Wednesday, July 30, 2014

Tokenizing a String : Using Regular Expressions

In my previous blog post I gave some PL/SQL that performed the tokenising of a string. Check out this blog post here.

Thanks also to the people who sent me links examples of how to tokenise a string using the MODEL clause. Yes there are lots of examples of this out there on the interest.

While performing the various searches on the internet I did come across some examples of using Regular Expressions to extract the tokens. The following example is thanks to a blog post by Tanel Poder

I've made some minor changes to it to remove any of the special characters we want to remove.

column token format a40

define separator=" "

define mystring="$My OTN LA Tour (2014?) will consist of Panama, CostRica and Mexico."

define myremove="\?|\#|\$|\.|\,|\;|\:|\&|$|$|\-";

SELECT regexp_replace(REGEXP_REPLACE(

REGEXP_SUBSTR( '&mystring'||'&separator', '(.*?)&separator', 1, LEVEL )

, '&separator$', ''), '&myremove', '') TOKEN

FROM

DUAL

CONNECT BY

REGEXP_INSTR( '&mystring'||'&separator', '(.*?)&separator', 1, LEVEL ) > 0

ORDER BY

LEVEL ASC

When we run this code we get the following output.

So we have a number of options open to use to tokenise strings using SQL and PL/SQL, using a number of approaches including substring-ing, using pipelined functions, using the Model clause and also using Regular Expressions.

Monday, July 28, 2014

BUCKET_WIDTH: Calculating the size of the bucket

Some time ago I had some blog posts introducing some of the basic Statistical function available in Oracle. Here are the links to these.

Most people do not realise that Oracle has over 250+ statistical functions that are available (no addition cost) in all the database versions.

I've had a query about one of the functions BUCKET_WIDTH. The question was wondering if it was possible to get the width of the bucket in each case. There does not seem to be a build in feature to get this value, so we have to calculate this ourselves.

Here is an example of how to calculate the bucket width, as on the example I used in my previous blog post.

SELECT bucket, max(age)-min(age) BUCKET_WIDTH, count(*)

FROM (SELECT cust_id,

age,

width_bucket(age,

(SELECT min(age) from mining_data_build_v),

(select max(age)+1 from mining_data_build_v),

10) bucket

FROM mining_data_build_v

GROUP BY cust_id, age )

GROUP BY bucket

ORDER BY bucket;

What this query gives is an approximate value of the size of the Bucket Width based on the values/records that are in a bucket. The actual values used cannot be determined exactly as there is not function/value in SQL that tells us the actual value.

Thursday, July 24, 2014

Tokenizing a String

Over the past while I've been working a lot with text strings. Some of these have been short in length like tweets from Twitter, or longer pieces of text like product reviews. Plus others of various lengths.

In all these scenarios I have to break up the data into individual works or Tokens.

The examples given below illustrate how you can take a string and break it into its individual tokens. In addition to tokenising the string I've also included some code to remove any special characters that might be included with the string.

These include ? # $ . ; : &

This list of special characters to ignore are just an example and is not an exhaustive list. You can add whatever characters to the list yourself. To remove these special characters I've used regular expressions as this seemed to be the easiest way to do this.

Using PL/SQL

The following example shows a simple PL/SQL unit that will tokenise a string.

DECLARE

vDelimiter VARCHAR2(5) := ' ';

vString VARCHAR2(32767) := 'Hello Brendan How are you today?'||vDelimiter;

vPosition PLS_INTEGER;

vToken VARCHAR2(32767);

vRemove VARCHAR2(100) := '\?|\#|\$|\.|\,|\;|\:|\&';

vReplace VARCHAR2(100) := '';

BEGIN

dbms_output.put_line('String = '||vString);

dbms_output.put_line('');

dbms_output.put_line('Tokens');

dbms_output.put_line('------------------------');

vPosition := INSTR(vString, vDelimiter);

WHILE vPosition > 0 LOOP

vToken := LTRIM(RTRIM(SUBSTR(vString, 1, vPosition-1)));

vToken := regexp_replace(vToken, vRemove, vReplace);

vString := SUBSTR(vString, vPosition + LENGTH(vDelimiter));

dbms_output.put_line(vPosition||': '||vToken);

vPosition := INSTR(vString, vDelimiter);

END LOOP;

END;

When we run this (with Serveroutput On) we get the following output.

A slight adjustment is needed to the output of this code to remove the numbers or positions of the token separator/delimiter.

Tokenizer using a Function

To make this more usable we will really need to convert this into an iterative function. The following code illustrates this, how to call the function and what the output looks like.

CREATE OR replace TYPE token_list

AS TABLE OF VARCHAR2(32767);

CREATE OR replace FUNCTION TOKENIZER(pString IN VARCHAR2,

pDelimiter IN VARCHAR2)

RETURN token_list pipelined

vPosition INTEGER;

vPrevPosition INTEGER := 1;

vRemove VARCHAR2(100) := '\?|\#|\$|\.|\,|\;|\:|\&';

vReplace VARCHAR2(100) := '';

vString VARCHAR2(32767) := regexp_replace(pString, vRemove, vReplace);

BEGIN

LOOP

vPosition := INSTR (vString, pDelimiter, vPrevPosition);

IF vPosition = 0 THEN

pipe ROW (SUBSTR(vString, vPrevPosition ));

EXIT;

ELSE

pipe ROW (SUBSTR(vString, vPrevPosition, vPosition - vPrevPosition ));

vPrevPosition := vPosition + 1;

END IF;

END LOOP;

END TOKENIZER;

Here are a couple of examples to show how it works and returns the Tokens.

SELECT column_value TOKEN

FROM TABLE(tokenizer('It is a hot and sunny day in Ireland.', ' '))

, dual;

How if we add in some of the special characters we should see a cleaned up set of tokens.

SELECT column_value TOKEN

FROM TABLE(tokenizer('$$$It is a hot and sunny day in #Ireland.', ' '))

, dual;

Wednesday, December 11, 2013

Running PL/SQL Procedures in Parallel

As your data volumes increase, particularly as you evolve into the big data world, you will be start to see that your Oracle Data Mining scoring functions will start to take longer and longer. To apply an Oracle Data Mining model to new data is a very quick process. The models are, what Oracle calls, first class objects in the database. This basically means that they run Very quickly with very little overhead.

But as the data volumes increase you will start to see that your Apply process or scoring the data will start to take longer and longer. As with all OLTP or OLAP environments as the data grows you will start to use other in-database features to help your code run quicker. One example of this is to use the Parallel Option.

You can use the Parallel Option to run your Oracle Data Mining functions in real-time and in batch processing mode. The examples given below shows you how you can do this.

Let us first start with some basics. What are the typical commands necessary to setup our schema or objects to use Parallel. The following commands are examples of what we can use

ALTER session enable parallel dml;
ALTER TABLE table_name PARALLEL (DEGREE 8);
ALTER TABLE table_name NOPARALLEL;
CREATE TABLE … PARALLEL degree …
ALTER TABLE … PARALLEL degree …
CREATE INDEX … PARALLEL degree …
ALTER INDEX … PARALLEL degree …

You can force parallel operations for tables that have a degree of 1 by using the force option.

ALTER SESSION ENABLE PARALLEL DDL;
ALTER SESSION ENABLE PARALLEL DML;
ALTER SESSION ENABLE PARALLEL QUERY;

alter session force parallel query PARALLEL 2

You can disable parallel processing with the following session statements.

ALTER SESSION DISABLE PARALLEL DDL;
ALTER SESSION DISABLE PARALLEL DML;
ALTER SESSION DISABLE PARALLEL QUERY;

We can also tell the database what degree of Parallelism to use

ALTER SESSION FORCE PARALLEL DDL PARALLEL 32;
ALTER SESSION FORCE PARALLEL DML PARALLEL 32;
ALTER SESSION FORCE PARALLEL QUERY PARALLEL 32;

Using your Oracle Data Mining model in real-time using Parallel

When you want to use your Oracle Data Mining model in real-time, on one record or a set of records you will be using the PREDICTION and PREDICTION_PROBABILITY function. The following example shows how a Classification model is being applied to some data in a view called MINING_DATA_APPLY_V.

column prob format 99.99999
SELECT cust_id,
       PREDICTION(DEMO_CLASS_DT_MODEL USING *) Pred,
       PREDICTION_PROBABILITY(DEMO_CLASS_DT_MODEL USING *) Prob
FROM   mining_data_apply_v
WHERE rownum <= 18
/

   CUST_ID       PRED      PROB
---------- ---------- ---------
    100574          0    .63415
    100577          1    .73663
    100586          0    .95219
    100593          0    .60061
    100598          0    .95219
    100599          0    .95219
    100601          1    .73663
    100603          0    .95219
    100612          1    .73663
    100619          0    .95219
    100621          1    .73663
    100626          1    .73663
    100627          0    .95219
    100628          0    .95219
    100633          1    .73663
    100640          0    .95219
    100648          1    .73663
    100650          0    .60061

If the volume of data warrants the use of the Parallel option then we can add the necessary hint to the above query as illustrated in the example below.

SELECT /*+ PARALLEL(mining_data_apply_v, 4) */
       cust_id,
       PREDICTION(DEMO_CLASS_DT_MODEL USING *) Pred,
       PREDICTION_PROBABILITY(DEMO_CLASS_DT_MODEL USING *) Prob
FROM   mining_data_apply_v
WHERE rownum <= 18
/

If you turn on autotrace you will see that Parallel was used. So you should now be able to use your Oracle Data Mining models to work on a Very large number of records and by adjusting the degree of parallelism you can improvements.

Using your Oracle Data Mining model in Batch mode using Parallel

When you want to perform some batch scoring of your data using your Oracle Data Mining model you will have to use the APPLY procedure that is part of the DBMS_DATA_MINING package. But the problem with using a procedure or function is that you cannot give it a hint to tell it to use the parallel option. So unless you have the tables(s) setup with parallel and/or the session to use parallel, then you cannot run your Oracle Data Mining model in Parallel using the APPLY procedure.

So how can you get the DBMA_DATA_MINING.APPLY procedure to run in parallel?

The answer is that you can use the DBMS_PARALLEL_EXECUTE package. The following steps walks you through what you need to do to use the DMBS_PARALLEL_EXECUTE package to run your Oracle Data Mining models in parallel.

The first step required is for you to put the DBMS_DATA_MINING.APPLY code into a stored procedure. The following code shows how our DEMO_CLASS_DT_MODEL can be used by the APPLY procedure and how all of this can be incorporated into a stored procedure called SCORE_DATA.

create or replace procedure score_data
is
begin

dbms_data_mining.apply(
model_name => 'DEMO_CLAS_DT_MODEL',
data_table_name => 'NEW_DATA_TO_SCORE',
case_id_column_name => 'CUST_ID',
result_table_name => 'NEW_DATA_SCORED');

end;
/

Next we need to create a Parallel Task for the DBMS_PARALLEL_EXECUTE package. In the following example this is called ODM_SCORE_DATA.

-- Create the TASK
DBMS_PARALLEL_EXECUTE.CREATE_TASK ('ODM_SCORE_DATA');

Next we need to define the Parallel Workload Chunks details

 -- Chunk the table by ROWID
 DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID('ODM_SCORE_DATA', 'DMUSER', 'NEW_DATA_TO_SCORE', true, 100);

The scheduled jobs take an unassigned workload chunk, process it and will then move onto the next unassigned chunk.

Now you are ready to execute the stored procedure for your Oracle Data Mining model, in parallel by 10.

DECLARE
   l_sql_stmt   varchar2(200);
BEGIN
   -- Execute the DML in parallel
   l_sql_stmt := 'begin score_data(); end;';

   DBMS_PARALLEL_EXECUTE.RUN_TASK('ODM_SCORE_DATA', l_sql_stmt, DBMS_SQL.NATIVE,
                                  parallel_level => 10);
END;
/

When every thing is finished you can then clean up and remove the task using

BEGIN
dbms_parallel_execute.drop_task('ODM_SCORE_DATA');
END;
/

NOTE: The schema that will be running the above code will need to have the necessary privileges to run DBMS_SCHEDULER, for example

grant create job to dmuser;

Thursday, September 19, 2013

Nested Tables (and Data) in Oracle & ODM

Oracle Data Mining uses Nested data types/tables to store some of its data. Oracle Data Mining creates a number of tables/objects that contain nested data when it is preparing data for input to the data mining algorithms and when outputting certain results from the algorithms. In Oracle 11.2g there are two nested data types used and in Oracle 12.1c we get an additional two nested data types. These are setup when you install the Oracle Data Miner Repository. If you log into SQL*Plus or SQL Developer you can describe them like any other table or object.

DM_NESTED_NUMERICALS

DM_NESTED_CATEGORICALS

The following two Nested data types are only available in 12.1c

DM_NESTED_BINARY_DOUBLES

DM_NESTED_BINARY_FLOATS

These Nested data types are used by Oracle Data Miner in preparing data for input to the data mining algorithms and for producing the some of the outputs from the algorithms.

Creating your own Nested Tables

To create your own Nested Data Types and Nested Tables you need to performs steps that are similar to what is illustrated in the following steps. These steps show you how to define a data type, how to create a nested table, how to insert data into the nested table and how to select the data from the nested table.

1. Set up the Object Type

Create a Type object that will defines the structure of the data. In these examples we want to capture the products and quantity purchased by a customer.

create type CUST_ORDER as object
(product_id varchar2(6),
quantity_sold number(6));
/

2. Create a Type as a Table

Now you need to create a Type as a table.

create type cust_orders_type as table of CUST_ORDER;
/

3. Create the table using the Nested Data

Now you can create the nested table.

create table customer_orders_nested (
cust_id       number(6) primary key,
order_date    date,
sales_person varchar2(30),
c_order       CUST_ORDERS_TYPE)
NESTED TABLE c_order STORE AS c_order_table;

4. Insert a Record and Query

This insert statement shows you how to insert one record into the nested column.

insert into customer_orders_nested
values (1, sysdate, 'BT', CUST_ORDERS_TYPE(cust_order('P1', 2)) );

When we select the data from the table we get

select * from customer_orders_nested;

CUST_ID ORDER_DAT SALES_PERSON
---------- --------- ------------------------------
C_ORDER(PRODUCT_ID, QUANTITY_SOLD)
-----------------------------------------------------
1 19-SEP-13 BT
CUST_ORDERS_TYPE(CUST_ORDER('P1', 2))

It can be a bit difficult to read the data in the nested column so we can convert the nested column into a table to display the results in a better way

select cust_id, order_date, sales_person, product_id, quantity_sold
from customer_orders_nested, table(c_order)

CUST_ID ORDER_DAT SALES_PERSON PRODUC QUANTITY_SOLD
---------- --------- ------------------------------ ------ -------------
1 19-SEP-13 BT P1 2

5. Insert many Nested Data items & Query

To insert many entries into the nested column you can do this

insert into customer_orders_nested
values (2, sysdate, 'BT2', CUST_ORDERS_TYPE(CUST_ORDER('P2', 2), CUST_ORDER('P3',3)));

When we do a Select * we get

CUST_ID ORDER_DAT SALES_PERSON
---------- --------- ------------------------------
C_ORDER(PRODUCT_ID, QUANTITY_SOLD)
-------------------------------------------------------------
1 19-SEP-13 BT
CUST_ORDERS_TYPE(CUST_ORDER2('P1', 2))

2 19-SEP-13 BT2
CUST_ORDERS_TYPE(CUST_ORDER2('P2', 2), CUST_ORDER2('P3', 3))

Again it is not easy to ready the data in the nested column, so if we convert it to a table again we now get a row being displayed for each entry in the nested column.

select cust_id, order_date, sales_person, product_id, quantity_sold
from customer_orders_nested, table(c_order);

   CUST_ID ORDER_DAT SALES_PERSON                   PRODUC QUANTITY_SOLD
---------- --------- ------------------------------ ------ -------------
         1 19-SEP-13 BT                             P1                 2
         2 19-SEP-13 BT2                            P2                 2
         2 19-SEP-13 BT2                            P3                 3

Thursday, July 25, 2013

12c New Data Mining functions

With the release of Oracle 12c we get new functions/procedures and some updated ones for Oracle Data Miner that is part of the Advanced Analytics option.

The following are the new functions/procedures and the functions/procedures that have been updated in 12c, with a link to the 12c Documentation that explains what they do.

CLUSTER_DETAILS is a new function that predicts cluster membership for each row. It can use a pre-defined clustering model or perform dynamic clustering. The function returns an XML string that describes the predicted cluster or a specified cluster.
CLUSTER_DISTANCE is a new function that predicts cluster membership for each row. It can use a pre-defined clustering model or perform dynamic clustering. The function returns the raw distance between each row and the centroid of either the predicted cluster or a specified.
CLUSTER_ID has been enhanced so that it can either use a pre-defined clustering model or perform dynamic clustering.
CLUSTER_PROBABILITY has been enhanced so that it can either use a pre-defined clustering model or perform dynamic clustering. The data type of the return value has been changed from NUMBER to BINARY_DOUBLE.
CLUSTER_SET has been enhanced so that it can either use a pre-defined clustering model or perform dynamic clustering. The data type of the returned probability has been changed from NUMBER to BINARY_DOUBLE
FEATURE_DETAILS is a new function that predicts feature matches for each row. It can use a pre-defined feature extraction model or perform dynamic feature extraction. The function returns an XML string that describes the predicted feature or a specified feature.
FEATURE_ID has been enhanced so that it can either use a pre-defined feature extraction model or perform dynamic feature extraction.
FEATURE_SET has been enhanced so that it can either use a pre-defined feature extraction model or perform dynamic feature extraction. The data type of the returned probability has been changed from NUMBER to BINARY_DOUBLE.
FEATURE_VALUE has been enhanced so that it can either use a pre-defined feature extraction model or perform dynamic feature extraction. The data type of the return value has been changed from NUMBER to BINARY_DOUBLE.
PREDICTION has been enhanced so that it can either use a pre-defined predictive model or perform dynamic prediction.
PREDICTION_BOUNDS now returns the upper and lower bounds of the prediction as the BINARY_DOUBLE data type. It previously returned these values as the NUMBER data type.
PREDICTION_COST has been enhanced so that it can either use a pre-defined predictive model or perform dynamic prediction. The data type of the returned cost has been changed from NUMBER to BINARY_DOUBLE.
PREDICTION_DETAILS has been enhanced so that it can either use a pre-defined predictive model or perform dynamic prediction.
PREDICTION_PROBABILITY has been enhanced so that it can either use a pre-defined predictive model or perform dynamic prediction. The data type of the returned probability has been changed from NUMBER to BINARY_DOUBLE.
PREDICTION_SET has been enhanced so that it can either use a pre-defined predictive model or perform dynamic prediction. The data type of the returned probability has been changed from NUMBER to BINARY_DOUBLE.

Wednesday, June 12, 2013

Part 3–Getting start with Statistics for Oracle Data Science projects

This is the Part 3 blog post on getting started with Statistics for Oracle Data Science projects.

The first blog post in the series looked at the DBMS_STAT_FUNCS PL/SQL package, what it can be used for and I give some sample code on how to use it in your data science projects. I also give some sample code that I typically run to gather some additional stats.
The second blog post will look at some of the other statistical functions that exist in SQL that you will/may use regularly in your data science projects.This is the second blog on getting started with Statistics for Oracle Data Science projects.
The third blog post will provide a summary of the other statistical functions that exist in the database.

The table below is a collection of most of the statistical functions in Oracle 11.2. The links in the table bring you to the relevant section of the Oracle documentation where you will find a description of each function, the syntax and some examples of each.

ABS	LENGTH2	REGR_AVGX
ACOS	LENGTH4	REGR_ACGY
Aggregrate functions	LENGTHB	REGR_COUNT
Analytic functions	LENGTHC	REGR_INTERCEPT
Arithmetic operators	LN	REGR_R2
ASIN	LNNVL	REGR_SLOPE
ATAN	LOG	REGR_SXX
ATAN2	LOWER	REGR_SXY
AVG	LPAD	REGR_SYY
CAST	LTRIM	ROLLUP clause
Comparison functions	MAX	ROUND
CONCAT	MEDIAN	SAMPLE
CORR	MIN	SIN
CORR_K	MOD	SINH
CORR_S	MODEL clause	SQRT
COS	NTH_VALUE	STATS_BINOMIAL_TEST
COSH	Numeric Functions	STATS_CROSSTAB
COUNT	PERCENT_RANK	STATS_F_TEST
COVAR_POP	PERCENTILE_CONT	STATS_KS_TEST
COVAR_SAMP	PERCENTILE_DISC	STATS_MODE
CUBE clause	Pivot operations	STATS_MW_TEST
CUME_DIST	POWER	STATS_ONE_WAY_ANOVA
CV	PREDICTION	STATS_T_TEST_INDEP
Data functions	PREDICTION_BOUNDS	STATS_T_TEST_INDEPU
DENSE_RANK	PREDICTION_COST	STATS_T_TEST_ONE
EXP	PREDICTION_PROBABILITY	STATS_T_TEST_PAIRED
FLOOR	PREDICTION_SET	STATS_WSR_TEST
GREATEST	PRESENTNNV	STDDEV
Grouping Sets	PRESENTNTV	STDEEV_POP
INTERSECT	Prior clause	STDDEV_SAMP
Interval arithmetic	PRIOR	SUM
INTERVAL	RANK	TAN
Julian dates	RAWTOHEX	TANH
LAG	REGEXP_COUNT	t-test
LAST	REGEXP_INSTR	VAR_POP
LEAD	REGEXP_LIKE	VAR_SAMP
LEAST	REGEXP_REPLACE	VARIANCE
LENGTH	REGEXP_SUBSTR	WIDTH_BUCKET

The list about may not be complete (I’m sure it is not), but it will cover most of what you will need to use in your Oracle projects.

If you come across or know of other useful statistical functions in Oracle let me know the details and I will update the table above to include them.

Thursday, May 16, 2013

Outputting your data using inbuilt SQL Dev formatting

Oracle has build a number of formatting options into SQL Developer to allow you to output your data in some standard formats. This removes the need to use other tools or to write extra code or performs various follow up steps.
All you need to do is to add a comment and use the Scrip button
SELECT /*csv*/ * FROM scott.emp;
SELECT /*xml*/ * FROM scott.emp;
SELECT /*html*/ * FROM scott.emp;
SELECT /*delimited*/ * FROM scott.emp;
SELECT /*insert*/ * FROM SCOTT.EMP;
SELECT /*loader*/ * FROM scott.emp;
SELECT /*fixed*/ * FROM scott.emp;
SELECT /*text*/ * FROM scott.emp;
Hint: for some of these it is best to list the schema and table name in upper case
These are comments and not hints so they will not work in SQL*Plus.

Wednesday, January 2, 2013

OUG Norway April 2013 - New Year’s News

I received an email at 23:24 on the 1st January from the OUG in Norway telling me that I’ve had two presentations accepted for the Annual OUG Norway seminar event. This will be on during the 17th-19th April.

The first day of this event (17th April) will be held in a hotel in Oslo. Then on the morning of 18th April we board the Color Magic cruise for the next two days of the conference. The ferry/cruise will go from Oslo to Kiel in Germany and then back again to Oslo, returning around 10am on Saturday 20th April.

I will be giving two presentations on the Oracle Advanced Analytics Option. The first presentation, ‘Using Predictive Analytics in Oracle’, will give an overview of the Oracle Advanced Analytics Option and will then focus on the Oracle Data Miner work-flow tool. This will presentation will include a live demo of using Oracle Data Miner to create some data mining models.

The second presentation, ‘How to Deploy and Use your Oracle Data Miner Models in Production’, builds on the examples given in the first presentation and will show how you can migrate, user and update your Oracle Data Miner models using the features available in SQL and PL/SQL. Again a demo will be given.

Friday, November 9, 2012

Update on : Adding numbers between

Over the past few days I’ve had a number of emails and comments based on my previous post. My previous post was called ‘Adding numbers between two values’. I included some PL/SQL code that can be used to add up the numbers between two values. I mentioned that this was a question that my pre-teen son (a few year pre-teen) had asked me.

There are two main solutions to the same problem. One involves just using a SELECT and the other involves using recursion. I will come back the these alternative solutions below.

But let me start off with a bit more detail and background to why I approached the problem the way that I did. The main reason is that my son is a pre-teen. Over the past couple of years he as expressed an interest in what his daddy does. We even have matching ORACLENERD t-shirts Smile

When I was working through the problem with my son I wanted to show him how to take a problem and by breaking it down into its different parts we can work out an overall solution. We can then take each of these parts and translate them into code. In this case some PL/SQL, yes it is a bit nerdy and we do have the t-shirt. The code that I gave illustrates many different parts of the language and hopefully he will use some of these features as we continue on our learning experience.

It is good sometimes to break a problem down into smaller parts. That way we can understand it better, what works and what does not work, if something does not work then we will know what bit and also leads to easier maintenance. At a later point as you develop an in-depth knowledge of certain features of a language you can then rewrite what you have to be more efficient.

All part of the learning experience.

Ok lets take a look at the other ways to answer this problem. The first approach is to just use a single SELECT statement.

SELECT sum(rownum + &&Start_Number - 1)
FROM dual
CONNECT by level <= &End_Number - &&Start_Number + 1;

An even simpler way is

SELECT sum(level)
FROM dual
CONNECT BY level between &Start_Number and &End_Number;

These queries create a hierarchical query that produce all the numbers between the Start_Number parameter and the End_Number parameter. The SUM is needed to all all the numbers/rows produced. This is nice and simple (but not that easy for by son at this point).

Thank you to everyone who contacted me about this. I really appreciated your feedback and please keep your comments coming for all my posts.

Tuesday, November 6, 2012

Adding numbers between two values

My son asked me the other day, what is the total number if you add all the numbers between Zero and 100.
We could have sat down to work it out with some paper and a pen, but instead I decided to introduce him to the world of Oracle, SQL and PL/SQL
The first step we took was to work out how you would do it on paper for some of the numbers. Then we translated this into some PL/SQL code. OK I did a lot this but he did seem to understand and follow what I was doing.
So the following Function is what we ended up with to add all the numbers between two numbers and return the answer.
CREATE or REPLACE function AddNumsBetween
       ( pStartNum IN NUMBER,
         pEndNum IN NUMBER)
   RETURN NUMBER
IS
   vSum   Number := 0;
BEGIN
   FOR i IN pStartNum .. pEndNum LOOP
      vSum := vSum + i;
   END LOOP;
   return vSum;
END;
/
The next step was to write some code to call this function. The code prompts the user to enter the Start number and End number.
set serveroutput on
DECLARE
   vStartNum NUMBER := 0;
   vEndNum   NUMBER := 100;
   vAnswer    NUMBER := 0;
BEGIN
   vStartNum := &Start_Number;
   vEndNum := &End_Number;
   vAnswer := AddNumsBetween(vStartNum, vEndNum);
   dbms_output.put_line('The sum of numbers between '||vStartNum||' and '||vEndNum||' is '||vAnswer||'.');
END;
/
To answer by son’s original query, we used Zero and 100 as our inputs.

The answer to the question is 5,050.

Thursday, November 1, 2012

ASCII to character conversion in Oracle

Here is code code that will produce formatted output of the characters and their ascii values. The formatting is broken into lower case letters, uppercase letters, characters with an ascii value less than an ‘a’ and characters whose ascii characters are greater than a ‘z.

Code

set serveroutput on FORMAT WRAPPED
DECLARE
    vTab           VARCHAR2(5) := CHR(9);
    vNum          NUMBER := 0;
    vString       VARCHAR2(80) := '';
BEGIN
   --
   -- Formatted lower case letter to ASCII values
   --
   dbms_output.put_line('Formatted Lower Case Letters to ASCII values');
   dbms_output.put_line('-------------------------------------------------------');
   FOR i IN ASCII('a') .. ASCII('z') LOOP
      IF vNum < 6 THEN
         vString := vString||CHR(i)||' : '||i||vTab;
         vNum := vNum + 1;
      ELSIF vNum = 6 then
         dbms_output.put_line(vString||CHR(i)||' : '||i);
         vNum := 0;
         vString := '';
      ELSE
         dbms_output.put_line('ERROR');
      END IF;
   END LOOP;
   dbms_output.put_line(vString);

   --
   -- Formatted upper case letter to ASCII values
   --
   vString := '';
   vNum := 0;

   dbms_output.new_line;
   dbms_output.new_line;
   dbms_output.put_line('Formatted Upper Case Letters to ASCII values');
   dbms_output.put_line('-------------------------------------------------------');
   FOR i IN ASCII('A') .. ASCII('Z') LOOP
      IF vNum < 6 THEN
         vString := vString||CHR(i)||' : '||i||vTab;
         vNum := vNum + 1;
      ELSIF vNum = 6 then
         dbms_output.put_line(vString||CHR(i)||' : '||i);
         vNum := 0;
         vString := '';
      ELSE
         dbms_output.put_line('ERROR');
      END IF;
   END LOOP;
   dbms_output.put_line(vString);

   --
   -- Formatted chars less than 'a' to ASCII values
   --
   vString := '';
   vNum := 0;

   dbms_output.new_line;
   dbms_output.new_line;
   dbms_output.put_line('Formatted Letters, less than a to ASCII values');
   dbms_output.put_line('-------------------------------------------------------');
   FOR i in 0 .. ASCII('a')-1 LOOP
      IF vNum < 6 THEN
         vString := vString||CHR(i)||' : '||i||vTab;
         vNum := vNum + 1;
      ELSIF vNum = 6 then
         dbms_output.put_line(vString||CHR(i)||' : '||i);
         vNum := 0;
         vString := '';
      ELSE
         dbms_output.put_line('ERROR');
      END IF;
   END LOOP;
   dbms_output.put_line(vString);

   --
   -- Formatted chars greater than 'Z' to ASCII values
   --
   vNum := 0;
   vString := '';

   dbms_output.new_line;
   dbms_output.new_line;
   dbms_output.put_line('Formatted Letters, greater than z to ASCII values');
   dbms_output.put_line('-------------------------------------------------------');
   FOR i IN ASCII('z') .. ASCII('z')+133 LOOP
      IF vNum < 6 THEN
         vString := vString||CHR(i)||' : '||i||vTab;
         vNum := vNum + 1;
      ELSIF vNum = 6 then
         dbms_output.put_line(vString||CHR(i)||' : '||i);
         vNum := 0;
         vString := '';
      ELSE
         dbms_output.put_line('ERROR');
      END IF;
   END LOOP;
   dbms_output.put_line(vString);

END;
/

Output

Formatted Lower Case Letters to ASCII values
-------------------------------------------------------
a : 97 b : 98 c : 99 d : 100 e : 101 f : 102 g : 103
h : 104 i : 105 j : 106 k : 107 l : 108 m : 109 n : 110
o : 111 p : 112 q : 113 r : 114 s : 115 t : 116 u : 117
v : 118 w : 119 x : 120 y : 121 z : 122

Formatted Upper Case Letters to ASCII values
-------------------------------------------------------
A : 65 B : 66 C : 67 D : 68 E : 69 F : 70 G : 71
H : 72 I : 73 J : 74 K : 75 L : 76 M : 77 N : 78
O : 79 P : 80 Q : 81 R : 82 S : 83 T : 84 U : 85
V : 86 W : 87 X : 88 Y : 89 Z : 90

Formatted Letters, less than a to ASCII values
-------------------------------------------------------
: 0   ☺ : 1   ☻ : 2   ♥ : 3   ♦ : 4   ♣ : 5   ♠ : 6
: 7 : 8                 : 9
: 13   ♂ : 11 ♀ : 12
♫ : 14 ☼ : 15 ► : 16 ◄ : 17 ↕ : 18 ‼ : 19 ¶ : 20
§ : 21 ▬ : 22 ↨ : 23 ↑ : 24 ↓ : 25 → : 26 ← : 27
∟ : 28 ↔ : 29 ▲ : 30 ▼ : 31    : 32 ! : 33 " : 34
# : 35 $ : 36 % : 37 & : 38 ' : 39 ( : 40 ) : 41
* : 42 + : 43 , : 44 - : 45 . : 46 / : 47 0 : 48
1 : 49 2 : 50 3 : 51 4 : 52 5 : 53 6 : 54 7 : 55
8 : 56 9 : 57 : : 58 ; : 59 < : 60 = : 61 > : 62
? : 63 @ : 64 A : 65 B : 66 C : 67 D : 68 E : 69
F : 70 G : 71 H : 72 I : 73 J : 74 K : 75 L : 76
M : 77 N : 78 O : 79 P : 80 Q : 81 R : 82 S : 83
T : 84 U : 85 V : 86 W : 87 X : 88 Y : 89 Z : 90
[ : 91 \ : 92 ] : 93 ^ : 94 _ : 95 ` : 96

Formatted Letters, greater than z to ASCII values
-------------------------------------------------------
z : 122 { : 123 | : 124 } : 125 ~ : 126 ⌂ : 127 Ç : 128
ü : 129 é : 130 â : 131 ä : 132 à : 133 å : 134 ç : 135
ê : 136 ë : 137 è : 138 ï : 139 î : 140 ì : 141 Ä : 142
Å : 143 É : 144 æ : 145 Æ : 146 ô : 147 ö : 148 ò : 149
û : 150 ù : 151 ÿ : 152 Ö : 153 Ü : 154 ø : 155 £ : 156
Ø : 157 × : 158 ƒ : 159 á : 160 í : 161 ó : 162 ú : 163
ñ : 164 Ñ : 165 ª : 166 º : 167 ¿ : 168 ® : 169 ¬ : 170
½ : 171 ¼ : 172 ¡ : 173 « : 174 » : 175 ░ : 176 ▒ : 177
▓ : 178 │ : 179 ┤ : 180 Á : 181 Â : 182 À : 183 © : 184
╣ : 185 ║ : 186 ╗ : 187 ╝ : 188 ¢ : 189 ¥ : 190 ┐ : 191
└ : 192 ┴ : 193 ┬ : 194 ├ : 195 ─ : 196 ┼ : 197 ã : 198
Ã : 199 ╚ : 200 ╔ : 201 ╩ : 202 ╦ : 203 ╠ : 204 ═ : 205
╬ : 206 ¤ : 207 ð : 208 Ð : 209 Ê : 210 Ë : 211 È : 212
ı : 213 Í : 214 Î : 215 Ï : 216 ┘ : 217 ┌ : 218 █ : 219
▄ : 220 ¦ : 221 Ì : 222 ▀ : 223 Ó : 224 ß : 225 Ô : 226
Ò : 227 õ : 228 Õ : 229 µ : 230 þ : 231 Þ : 232 Ú : 233
Û : 234 Ù : 235 ý : 236 Ý : 237 ¯ : 238 ´ : 239 : 240
± : 241 ‗ : 242 ¾ : 243 ¶ : 244 § : 245 ÷ : 246 ¸ : 247
° : 248 ¨ : 249 · : 250 ¹ : 251 ³ : 252 ² : 253 ■ : 254
: 255

PL/SQL procedure successfully completed.

Observations

There are two things that stand out in this. The first is there is sound produced. This is because one of the characters is defined this way. It is ASCII number 7. This can be repeated using the following:

select chr(7) from dual.

The second is the formatting of the lines for ascii codes 8 to 12. We can see that one of the ascii codes does not get displayed and the ordering of this is not as expected. This is due to ascii 10 being a line feed.

Pages

Thursday, April 30, 2015

Friday, March 6, 2015

Wednesday, November 12, 2014

Wednesday, October 29, 2014

Wednesday, July 30, 2014

Monday, July 28, 2014

Thursday, July 24, 2014

Wednesday, December 11, 2013

Thursday, September 19, 2013

Thursday, July 25, 2013

Wednesday, June 12, 2013

Thursday, May 16, 2013

Wednesday, January 2, 2013

Friday, November 9, 2012

Tuesday, November 6, 2012

Thursday, November 1, 2012