With the releases of 12.1 and 12.2 of Oracle Database we have seen some new functions that perform approximate calculations. These include:
- APPROX_COUNT_DISTINCT
- APPROX_COUNT_DISTINCT_DETAIL
- APPROX_COUNT_DISTINCT_AGG
- APPROX_MEDIAN
- APPROX_PERCENTILE
- APPROX_PERCENTILE_DETAIL
- APPROX_PERCENTILE_AGG
These functions can be used when an approximate answer is acceptable in place of the exact answer. You can have many scenarios for these, and particularly as we move into the big data world, the ability to process our data quickly can be more important than exact numbers. For example, is there really a difference between 40% of our customers being of type X versus 41%? The real answer to this is, 'It Depends!', but for a lot of analytical and advanced analytical methods this difference doesn't really matter.
There are various reports of queries that use these functions responding anywhere from 6x to 50x faster than the same queries written with the more traditional functions.
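As a minimal sketch of what calling one of these functions directly looks like, the query below puts the exact and approximate distinct counts side by side. It assumes the test_inmemory table and cust_id column used in the examples in this post, and that the approx_for_* parameters are still at their defaults (otherwise both columns would be approximate).

```sql
-- Exact versus approximate distinct count in a single pass.
-- Assumes the TEST_INMEMORY table with a CUST_ID column, as used in this post.
SELECT COUNT(DISTINCT cust_id)        AS exact_count,
       APPROX_COUNT_DISTINCT(cust_id) AS approx_count
FROM   test_inmemory;
```

On a small table the two numbers will be close and the timing difference negligible; the approximate version is aimed at much larger data sets.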
Suppose you are a BI or big data analyst and you have built lots of code and queries using the more traditional functions. What if you now want to use the newer functions? Does this mean you have to go and modify all the code you have written over the years? You can imagine trying to get approval to do this!
The simple answer to this question is 'No'. You don't have to change any code. With some parameter changes for the database or your session, you can tell the database to automatically switch from using the traditional functions (count, etc.) to the newer, more optimised and significantly faster APPROX_* functions.
So how can you do this magic?
First let us see what the current values of these settings are:
SELECT name, value
FROM v$ses_optimizer_env
WHERE sid = sys_context('USERENV','SID')
AND name like '%approx%';
Now let us run a query to test what happens using the default settings (on a table I have with 10,500 records).
set autotrace on
select count(distinct cust_id) from test_inmemory;
COUNT(DISTINCTCUST_ID)
----------------------
1500
Execution Plan
----------------------------------------------------------
Plan hash value: 2131129625
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 70 (2)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
| 2 | VIEW | VW_DAG_0 | 1500 | 19500 | 70 (2)| 00:00:01 |
| 3 | HASH GROUP BY | | 1500 | 7500 | 70 (2)| 00:00:01 |
| 4 | TABLE ACCESS FULL| TEST_INMEMORY | 10500 | 52500 | 69 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Let us now enable the automatic use of the APPROX_* functions.
alter session set approx_for_aggregation = TRUE;
select count(distinct cust_id) from test_inmemory;
COUNT(DISTINCTCUST_ID)
----------------------
1495
Execution Plan
----------------------------------------------------------
Plan hash value: 1029766195
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 5 | 69 (0)| 00:00:01 |
| 1 | SORT AGGREGATE APPROX| | 1 | 5 | | |
| 2 | TABLE ACCESS FULL | TEST_INMEMORY | 10500 | 52500 | 69 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
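We can see above that the equivalent APPROX_* function was used (the SORT AGGREGATE APPROX step in the plan) and that slightly less work was done: the VIEW and HASH GROUP BY steps are gone and the cost drops from 70 to 69. The answer has also changed from the exact 1500 to an approximate 1495. But we only used this on a very small table, so the saving here is small.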
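To put the difference between the two results into perspective, the relative error can be worked out from the numbers reported above (1500 exact, 1495 approximate). This is just arithmetic on those two values, not an error rate reported by the database:

```sql
-- Relative error of the approximate result, using the two counts shown above:
-- 100 * |1495 - 1500| / 1500
SELECT ROUND(100 * ABS(1495 - 1500) / 1500, 2) AS pct_error
FROM   dual;
```

This comes out at roughly a third of one percent, which is the kind of difference many analytical use cases can live with.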
The full list of session level settings is:
alter session set approx_for_aggregation = TRUE;
alter session set approx_for_aggregation = FALSE;
alter session set approx_for_count_distinct = TRUE;
alter session set approx_for_count_distinct = FALSE;
alter session set approx_for_percentile = 'PERCENTILE_CONT DETERMINISTIC';
alter session set approx_for_percentile = PERCENTILE_DISC;
alter session set approx_for_percentile = NONE;
Or at a system wide level:
alter system set approx_for_aggregation = TRUE;
alter system set approx_for_aggregation = FALSE;
alter system set approx_for_count_distinct = TRUE;
alter system set approx_for_count_distinct = FALSE;
alter system set approx_for_percentile = 'PERCENTILE_CONT DETERMINISTIC';
alter system set approx_for_percentile = PERCENTILE_DISC;
alter system set approx_for_percentile = NONE;
And to reset back to the default settings:
alter system reset approx_for_aggregation;
alter system reset approx_for_count_distinct;
alter system reset approx_for_percentile;
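As a session-level illustration of the percentile setting listed above, the sketch below switches it on, runs an unchanged PERCENTILE_CONT query, and switches it off again. It reuses the test_inmemory table and cust_id column purely for convenience; in practice you would order by a measure column. The last query shows the explicit form using APPROX_PERCENTILE, which is what I would expect the rewritten query to correspond to.

```sql
-- Let the database substitute the approximate version of PERCENTILE_CONT
ALTER SESSION SET approx_for_percentile = 'PERCENTILE_CONT DETERMINISTIC';

-- Existing code stays as it is; CUST_ID is used here only as a convenient numeric column
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY cust_id) AS median_value
FROM   test_inmemory;

-- Switch the behaviour back off for this session
ALTER SESSION SET approx_for_percentile = NONE;

-- Explicit call to the approximate function, for comparison
SELECT APPROX_PERCENTILE(0.5 DETERMINISTIC) WITHIN GROUP (ORDER BY cust_id) AS approx_median_value
FROM   test_inmemory;
```

Running the first query with autotrace on, before and after the parameter change, is a simple way to check whether the plan has switched to an approximate operation.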