As your data volumes increase, particularly as you evolve into the big data world, you will be start to see that your Oracle Data Mining scoring functions will start to take longer and longer. To apply an Oracle Data Mining model to new data is a very quick process. The models are, what Oracle calls, first class objects in the database. This basically means that they run Very quickly with very little overhead.
But as the data volumes increase you will start to see that your Apply process or scoring the data will start to take longer and longer. As with all OLTP or OLAP environments as the data grows you will start to use other in-database features to help your code run quicker. One example of this is to use the Parallel Option.
You can use the Parallel Option to run your Oracle Data Mining functions in real-time and in batch processing mode. The examples given below shows you how you can do this.
Let us first start with some basics. What are the typical commands necessary to setup our schema or objects to use Parallel. The following commands are examples of what we can use
ALTER session enable parallel dml;
ALTER TABLE table_name PARALLEL (DEGREE 8);
ALTER TABLE table_name NOPARALLEL;
CREATE TABLE … PARALLEL degree …
ALTER TABLE … PARALLEL degree …
CREATE INDEX … PARALLEL degree …
ALTER INDEX … PARALLEL degree …
You can force parallel operations for tables that have a degree of 1 by using the force option.
ALTER SESSION ENABLE PARALLEL DDL;
ALTER SESSION ENABLE PARALLEL DML;
ALTER SESSION ENABLE PARALLEL QUERY;alter session force parallel query PARALLEL 2
You can disable parallel processing with the following session statements.
ALTER SESSION DISABLE PARALLEL DDL;
ALTER SESSION DISABLE PARALLEL DML;
ALTER SESSION DISABLE PARALLEL QUERY;
We can also tell the database what degree of Parallelism to use
ALTER SESSION FORCE PARALLEL DDL PARALLEL 32;
ALTER SESSION FORCE PARALLEL DML PARALLEL 32;
ALTER SESSION FORCE PARALLEL QUERY PARALLEL 32;
Using your Oracle Data Mining model in real-time using Parallel
When you want to use your Oracle Data Mining model in real-time, on one record or a set of records you will be using the PREDICTION and PREDICTION_PROBABILITY function. The following example shows how a Classification model is being applied to some data in a view called MINING_DATA_APPLY_V.
column prob format 99.99999
SELECT cust_id,
PREDICTION(DEMO_CLASS_DT_MODEL USING *) Pred,
PREDICTION_PROBABILITY(DEMO_CLASS_DT_MODEL USING *) Prob
FROM mining_data_apply_v
WHERE rownum <= 18
/
CUST_ID PRED PROB
---------- ---------- ---------
100574 0 .63415
100577 1 .73663
100586 0 .95219
100593 0 .60061
100598 0 .95219
100599 0 .95219
100601 1 .73663
100603 0 .95219
100612 1 .73663
100619 0 .95219
100621 1 .73663
100626 1 .73663
100627 0 .95219
100628 0 .95219
100633 1 .73663
100640 0 .95219
100648 1 .73663
100650 0 .60061
If the volume of data warrants the use of the Parallel option then we can add the necessary hint to the above query as illustrated in the example below.
SELECT /*+ PARALLEL(mining_data_apply_v, 4) */
cust_id,
PREDICTION(DEMO_CLASS_DT_MODEL USING *) Pred,
PREDICTION_PROBABILITY(DEMO_CLASS_DT_MODEL USING *) Prob
FROM mining_data_apply_v
WHERE rownum <= 18
/
If you turn on autotrace you will see that Parallel was used. So you should now be able to use your Oracle Data Mining models to work on a Very large number of records and by adjusting the degree of parallelism you can improvements.
Using your Oracle Data Mining model in Batch mode using Parallel
When you want to perform some batch scoring of your data using your Oracle Data Mining model you will have to use the APPLY procedure that is part of the DBMS_DATA_MINING package. But the problem with using a procedure or function is that you cannot give it a hint to tell it to use the parallel option. So unless you have the tables(s) setup with parallel and/or the session to use parallel, then you cannot run your Oracle Data Mining model in Parallel using the APPLY procedure.
So how can you get the DBMA_DATA_MINING.APPLY procedure to run in parallel?
The answer is that you can use the DBMS_PARALLEL_EXECUTE package. The following steps walks you through what you need to do to use the DMBS_PARALLEL_EXECUTE package to run your Oracle Data Mining models in parallel.
The first step required is for you to put the DBMS_DATA_MINING.APPLY code into a stored procedure. The following code shows how our DEMO_CLASS_DT_MODEL can be used by the APPLY procedure and how all of this can be incorporated into a stored procedure called SCORE_DATA.
create or replace procedure score_data
is
begin
dbms_data_mining.apply(
model_name => 'DEMO_CLAS_DT_MODEL',
data_table_name => 'NEW_DATA_TO_SCORE',
case_id_column_name => 'CUST_ID',
result_table_name => 'NEW_DATA_SCORED');
end;
/
Next we need to create a Parallel Task for the DBMS_PARALLEL_EXECUTE package. In the following example this is called ODM_SCORE_DATA.
-- Create the TASK
DBMS_PARALLEL_EXECUTE.CREATE_TASK ('ODM_SCORE_DATA');
Next we need to define the Parallel Workload Chunks details
-- Chunk the table by ROWID
DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID('ODM_SCORE_DATA', 'DMUSER', 'NEW_DATA_TO_SCORE', true, 100);
The scheduled jobs take an unassigned workload chunk, process it and will then move onto the next unassigned chunk.
Now you are ready to execute the stored procedure for your Oracle Data Mining model, in parallel by 10.
DECLARE
l_sql_stmt varchar2(200);
BEGIN
-- Execute the DML in parallel
l_sql_stmt := 'begin score_data(); end;';
DBMS_PARALLEL_EXECUTE.RUN_TASK('ODM_SCORE_DATA', l_sql_stmt, DBMS_SQL.NATIVE,
parallel_level => 10);
END;
/
When every thing is finished you can then clean up and remove the task using
BEGIN
dbms_parallel_execute.drop_task('ODM_SCORE_DATA');
END;
/
NOTE: The schema that will be running the above code will need to have the necessary privileges to run DBMS_SCHEDULER, for example
grant create job to dmuser;
No comments:
Post a Comment