Brendan Tierney - Oralytics Blog: 12.2

Showing posts with label 12.2. Show all posts

Monday, August 7, 2017

Auto enabling APPROX_* function in the Oracle Database

With the releases of 12.1 and 12.2 of Oracle Database we have seen some new functions that perform approximate calculations. These include:

APPROX_COUNT_DISTINCT
APPROX_COUNT_DISTINCT_DETAIL
APPROX_COUNT_DISTINCT_AGG
APPROX_MEDIAN
APPROX_PERCENTILE
APPROX_PERCENTILE_DETAIL
APPROX_PERCENTILE_AGG

These functions can be used when approximate answers can be used instead of the exact answer. Yes can have many scenarios for these and particularly as we move into the big data world, the ability to process our data quickly is slightly more important and exact numbers. For example, is there really a difference between 40% of our customers being of type X versus 41%. The real answer to this is, 'It Depends!', but for a lot of analytical and advanced analytical methods this difference doesn't really make a difference.

There are various reports of performance improvement of anything from 6x to 50x with the response times of the queries that are using these functions, instead of using the more traditional functions.

If you are a BI or big data analyst and you have build lots of code and queries using the more traditional functions. But what if you now want to use the newer functions. Does this mean you have go and modify all the code you have written over the years? you can imagine getting approval to do this!

The simple answer to this question is 'No'. No you don't have to change any code, but with some parameter changes for the DB or your session you can tell the database to automatically switch from using the traditional functions (count, etc) to the newer more optimised and significantly faster APPROX_* functions.

So how can you do this magic?

First let us see what the current settings values are:

SELECT name, value 
FROM   v$ses_optimizer_env 
WHERE  sid = sys_context('USERENV','SID') 
AND    name like '%approx%';

NewImage

Now let us run a query to test what happens using the default settings (on a table I have with 10,500 records).

set auto trace on

select count(distinct cust_id) from test_inmemory;

COUNT(DISTINCTCUST_ID)
----------------------
		  1500


Execution Plan
----------------------------------------------------------
Plan hash value: 2131129625

--------------------------------------------------------------------------------------
| Id  | Operation	     | Name	     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |		     |	   1 |	  13 |	  70   (2)| 00:00:01 |
|   1 |  SORT AGGREGATE      |		     |	   1 |	  13 |		  |	     |
|   2 |   VIEW		     | VW_DAG_0      |	1500 | 19500 |	  70   (2)| 00:00:01 |
|   3 |    HASH GROUP BY     |		     |	1500 |	7500 |	  70   (2)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| TEST_INMEMORY | 10500 | 52500 |	  69   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Let us now set the automatic usage of the APPROX_* function.

alter session set approx_for_aggregation = TRUE;

SQL> select count(distinct cust_id) from test_inmemory;

COUNT(DISTINCTCUST_ID)
----------------------
		  1495


Execution Plan
----------------------------------------------------------
Plan hash value: 1029766195

---------------------------------------------------------------------------------------
| Id  | Operation	      | Name	      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      | 	      |     1 |     5 |    69	(0)| 00:00:01 |
|   1 |  SORT AGGREGATE APPROX| 	      |     1 |     5 | 	   |	      |
|   2 |   TABLE ACCESS FULL   | TEST_INMEMORY | 10500 | 52500 |    69	(0)| 00:00:01 |
---------------------------------------------------------------------------------------

We can see above that the APPROX_* equivalent function was used, and slightly less work. But we only used this on a very small table.

The full list of session level settings is:

alter session set approx_for_aggregation = TRUE;
alter session set approx_for_aggregation = FALSE;

alter session set approx_for_count_distinct = TRUE;
alter session set approx_for_count_distinct = FALSE;

alter session set approx_for_percentile = 'PERCENTILE_CONT DETERMINISTIC';
alter session set approx_for_percentile = PERCENTILE_DISC;
alter session set approx_for_percentile = NONE;

Or at a system wide level:

alter system set approx_for_aggregation = TRUE;
alter system set approx_for_aggregation = FALSE;

alter system set approx_for_count_distinct = TRUE;
alter system set approx_for_count_distinct = FALSE;

alter system set approx_for_percentile = 'PERCENTILE_CONT DETERMINISTIC';
alter system set approx_for_percentile = PERCENTILE_DISC;
alter system set approx_for_percentile = NONE;

And to reset back to the default settings:

alter system reset approx_for_aggregation;
alter system reset approx_for_count_distinct;
alter system reset approx_for_percentile;

Friday, March 3, 2017

Blog posts on Oracle Advanced Analytics features in 12.2

A couple of days ago Oracle finally provided us with an on-premises download for Oracle 12.2 Database.

Go and download load it from here

Download the Database App Development VM with 12.2 (This is what I did)

Over the past couple of months I've been using the DBaaS of 12.2, trying out some of the new Advanced Analytics option new features, and other new features. Here are the links to the blog posts on these new 12.2 new features. There will be more coming over the next few months.

New OAA features in Oracle 12.2 Database

Explicit Semantic Analysis in Oracle 12.2c Database

Explicit Semantic Analysis setup using SQL and PL/SQL

and slightly related is the new SQL Developer 4.2

Oracle Data Miner 4.2 New Features

Monday, January 23, 2017

Oracle Data Miner 4.2 New Features

Oracle Data Miner 4.2 (part of SQL Developer 4.2) got released as an Early Adopter versions (EA) a few weeks ago.

I had an earlier blog post that looked that the new Oracle Advanced Analytics in-database new features with the Oracle 12.2 Database.

With the new/updated Oracle Data Miner (ODMr) there are a number of new features. These can be categories as 1) features all ODMr users can use now, 2) New features that are only usable when using Oracle 12.2 Database, and 3) Updates to existing algorithms that have been exposed via the ODMr tool.

The following is a round up of the main new features you can enjoy as part of ODMr 4.2 (mainly covering points 1 and 2 above)

You can now schedule workflows to run based on a defined schedule
Support for additional data types (RAW, ROWID, UROWID, URITYPE)
Better support for processing JSON data in the JSON Query node
Additional insights are displayed as part of the Model Details View
Additional alert monitoring and reporting
Better support for processing in-memory data
A new R Model node that allows you to include in-database ORE user defined R function to support model build, model testing and applying of new model.
New Explicit Semantic Analysis node (Explicit Feature Extraction)
New Feature Compare and Test nodes
New workflow status profiling perfoance improvements
Refresh the input data definition in nodes
Attribute Filter node now allows for unsupervised attribute importance ranking
The ability to build Partitioned data mining models

Look out for the blog posts on most of these new features over the coming months.

WARNING: Most of these new features requires an Oracle 12.2 Database.

Monday, January 16, 2017

Explicit Semantic Analysis setup using SQL and PL/SQL

In my previous blog post I introduced the new Explicit Semantic Analysis (ESA) algorithm and gave an example of how you can build an ESA model and use it. Check out this link for that blog post.

In this blog post I will show you how you can manually create an ESA model. The reason that I'm showing you this way is that the workflow (in ODMr and it's scheduler) may not be for everyone. You may want to automate the creation or recreation of the ESA model from time to time based on certain business requirements.

In my previous blog post I showed how you can setup a training data set. This comes with ODMr 4.2 but you may need to expand this data set or to use an alternative data set that is more in keeping with your domain.

Setup the ODM Settings table

As with all ODM algorithms we need to create a settings table. This settings table allows us to store the various parameters and their values, that will be used by the algorithm.

-- Create the settings table
CREATE TABLE ESA_settings (
    setting_name VARCHAR2(30),
    setting_value VARCHAR2(30));

-- Populate the settings table
-- Specify ESA. By default, Naive Bayes is used for classification.
-- Specify ADP. By default, ADP is not used. Need to turn this on.
BEGIN
    INSERT INTO ESA_settings (setting_name, setting_value)
    VALUES (dbms_data_mining.algo_name,       
           dbms_data_mining.algo_explicit_semantic_analys);
   
    INSERT INTO ESA_settings (setting_name, setting_value)
    VALUES (dbms_data_mining.prep_auto,dbms_data_mining.prep_auto_on);
  
    INSERT INTO ESA_settings (setting_name, setting_value)
    VALUES (odms_sampling,odms_sampling_disable);
  
    commit;
END;

These are the minimum number of parameter setting needed to run the ESA algorithm. The other ESA algorithm setting include:

Setup the Oracle Text Policy

You also need to setup an Oracle Text Policy and a lexer for the Stopwords.

DECLARE
   v_policy_name  varchar2(30);
   v_lexer_name   varchar2(3)
BEGIN
    v_policy_name  := 'ESA_TEXT_POLICY';
    v_lexer_name   := 'ESA_LEXER';
    ctx_ddl.create_preference(v_lexer_name, 'BASIC_LEXER');
    v_stoplist_name := 'CTXSYS.DEFAULT_STOPLIST';  -- default stop list
    ctx_ddl.create_policy(policy_name => v_policy_name, lexer => v_lexer_name, stoplist => v_stoplist_name);
END;

Create the ESA model

Once we have the settings table created with the parameter values set for the algorithm and the Oracle Text policy created, we can now create the model.

To ensure that the Oracle Text Policy is applied to the text we want to analyse we need to create a transformation list and add the Text Policy to it.

We can then pass the text transformation list as a parameter to the CREATE_MODEL, procedure.

DECLARE
   v_xlst              dbms_data_mining_transform.TRANSFORM_LIST;
   v_policy_name       VARCHAR2(130) := 'ESA_TEXT_POLICY';
   v_model_name        varchar2(50) := 'ESA_MODEL_DEMO_2';
BEGIN
   v_xlst := dbms_data_mining_transform.TRANSFORM_LIST();
   DBMS_DATA_MINING_TRANSFORM.SET_TRANSFORM(v_xlst, '"TEXT"', NULL, '"TEXT"', '"TEXT"', 'TEXT(POLICY_NAME:'||v_policy_name||')(MAX_FEATURES:3000)(MIN_DOCUMENTS:1)(TOKEN_TYPE:NORMAL)');

    DBMS_DATA_MINING.DROP_MODEL(v_model_name, TRUE);
    DBMS_DATA_MINING.CREATE_MODEL(
        model_name          => v_model_name,
        mining_function     => DBMS_DATA_MINING.FEATURE_EXTRACTION,
        data_table_name     => 'WIKISAMPLE',
        case_id_column_name => 'TITLE',
        target_column_name  => NULL,
        settings_table_name => 'ESA_SETTINGS',
        xform_list          => v_xlst);
END;

NOTE: Yes we could have merged all of the above code into one PL/SQL block.

Use the ESA model

We can now use the FEATURE_COMPARE function to use the model we just created, just like I did in my previous blog post.

SELECT FEATURE_COMPARE(ESA_MODEL_DEMO_2
               USING 'Oracle Database is the best available for managing your data' text 
               AND USING 'The SQL language is the one language that all databases have in common' text) similarity 
FROM DUAL;

Go give the ESA algorithm a go and see where you could apply it within your applications.

Wednesday, January 4, 2017

Explicit Semantic Analysis in Oracle 12.2c Database

A new Oracle Data Mining algorithm in the Oracle 12.2c Database is called Explicit Semantic Analysis.

[The following examples are built using Oracle Data Miner 4.2 (SQL Developer 4.2) and the Oracle 12.2 Database cloud service (extreme edition) ]

The Explicit Semantic Analysis algorithm is an unsupervised algorithm used for feature extraction. ESA does not discover latent features but instead uses explicit features based on an existing knowledge base. There is no setup or install necessary to use this algorithm All you need is a licence for the Advanced Analytics Option for the database. The out from the algorithm is a distance measure that indicates how similar or dis-similar the input texts are, using the ESA model (and the training data set used). Let us look at an example. Setup training data for ESA Algorithm

Oracle Data Miner 4.2 (that comes with SQL Developer 4.2) has a data Wiki data set from 2005. This contains over 200,000 features. To locate the file go to.

...\sqldeveloper\dataminer\scripts\instWikiSampleData.sql

This file contains the DDL and the insert statements for the Wiki data set.

After you run this script a new table called WIKISAMPLE table exists and contains records

This gives us the base/seed data set to feed into the ESA algorithm.

Create the ESA Model using ODMr

To create the ESA model we have 2 ways of doing this. In this blog post I'll show you the easiest way by using the Oracle Data Miner (ODMr) tool. I'll have another blog post that will show you the SQL needed to create the model.

In an ODMr workflow create a new Data Source node. Then set this node to have the WIKISAMPLE table as it's data source.

Next you need to create the ESA node on the workflow. This node can be found in the Models section, of the Workflow Editor. The node is called Explicit Feature Extraction. Click on this node, in the model section, and then move your mouse to your workflow and click again. The ESA node will be created.

Join the Data Node to the ESA node by right clicking on the data node and then clicking on the ESA node.

Double click on the ESA node to edit the properties of the node and the algorithm.

Explore the ESA Model and ESA Model Features

After the model node has finished you can now explore the results generated by the ESA model. Right click on the model node and select 'View Model'. The model properties window opens and it has 2 main tabs. The first of these is the coefficients tab. Here you can select a particular topic (click on the search icon beside the Feature ID) and select it from the list. The attributes and their coefficient values will be displayed.

Next you can examine the second tab that is labeled as Features. In this table we can select a particular record and have a tag cloud and coefficients displayed. The tag cloud is a great way to see visually what words are important.

How to use the ESA model to Compare new data using SQL

Now that we have the ESA model created, we can not use it model to compare other similar sets of documents.

You will need to use the FEATURE_COMPARE SQL function to evaluate the input texts, using the ESA model to compare for similarity. For example,

SELECT FEATURE_COMPARE(feat_esa_1_1
          USING 'Oracle Database is the best available for managing your data' text 
          AND USING 'The SQL language is the one language that all databases have in common' text) similarity 
FROM DUAL;

The result we get is 0.7629.

The result generate by the query is a distance measure. The FEATURE_COMPARE function returns a comparison number in the range 0 to 1. Where 0 indicates that the text are not similar or related. If a 1 is returned then that indicated that the text are very similar or very related.

You can use this returned value to make a decision on what happens next. For example, it can be used to decide what the next step should be in your workflow and you can easily write application logic to manage this.

The examples given here are for general text. In the real world you would probably need a bigger data set. But if you were to use this approach in other domains, such as legal, banking, insurance, etc. then you would need to create a training data set based on the typical language that is used in each of those domains. This will then allow you to compare documents with each domain with greater accuracy.

[The above examples are built using Oracle Data Miner 4.2 (SQL Developer 4.2) and the Oracle 12.2 Database cloud service (extreme edition) ]

Wednesday, December 7, 2016

12.2 DBaaS (Extreme Edition) possible bug/issue with the DB install/setup

A few weeks ago the 12.2 Oracle Database was released on the cloud. I immediately set an account and got my 12.2 DBaaS setup. This was a relatively painless process and quick.

For me I wanted to test out all the new Oracle Advanced Analytics new features and the new features in SQL Developer 4.2 that only become visible when you are using the 12.2 Database.

When you are go to use the Oracle Data Miner (GUI tool) in SQL Developer 4.2, it will check to see if the ODMr repository is installed in the database. If it isn't then you will be promoted for the SYS password.

This is the problem. In previous version of the DBaaS (12.1, etc) this was not an issue.

When you go to create your DBaaS you are asked for a password that will be used for the admin accounts of the database.

But when I entered the password for SYS, I got an error saying invalid password.

After using ssh to create a terminal connection to the DBaaS I was able to to connect to the container using

sqlplus / as sysdba

and also using

sqlplus sys/ as sysdba

Those worked fine. But when I tried to connect to the PDB1 I got the invalid username and/or password error.

sqlplus sys/@pdb1 as sysdba

I reconnected as follows

sqlplus / as sysdba

and then changed the password for SYS with containers=all

This command completed without errors but when I tried using the new password to connect the the PDB1 I got the same error.

After 2 weeks working with Oracle Support they eventually pointed me to the issue of the password file for the PDB1 was missing. They claim this is due to an error when I was creating/installing the database.

But this was a DBaaS and I didn't install the database. This is a problem with how Oracle have configured the installation.

The answer was to create a password file for the PDB1 using the following

% orapwd file=$ORACLE_HOME/dbs/orapwPDB1 password= entries=10

I then changed the password again for SYS, then tried to connect as SYS to the PDB1, and if by magic I was connected.

I then tried installing the ODMr repository again (in SQL Developer) and when I entered the new password for SYS, it worked !

It's a pity that it took Oracle Support 2 weeks to get me to this point.

As 12.2 is a cloud service hopefully Oracle will get that issue fixed soon so that one one else has to suffer like I did.

Pages

Monday, August 7, 2017

Auto enabling APPROX_* function in the Oracle Database

Friday, March 3, 2017

Blog posts on Oracle Advanced Analytics features in 12.2

Monday, January 23, 2017

Oracle Data Miner 4.2 New Features

Monday, January 16, 2017

Explicit Semantic Analysis setup using SQL and PL/SQL

Wednesday, January 4, 2017

Explicit Semantic Analysis in Oracle 12.2c Database

Wednesday, December 7, 2016

12.2 DBaaS (Extreme Edition) possible bug/issue with the DB install/setup