Brendan Tierney - Oralytics Blog: November 2014

Friday, November 14, 2014

OUG Ireland 2015 : Now open for Submissions

OUG Ireland Call for submissions is now open.

The closing date for submissions is 5th January, 2015.

and the submission webpage can be found here.

The OUG Ireland conference will be on Thursday 19th March. Yes it is only a one day conference :-( but we will be 5 or 6 or more streams. So there will be something for everyone and plenty of choice.

On Friday 20th March we will have Maria Colgan, formally the Optimizer Lady and now the In-Memory Queen (or something like that), giving a full day workshop on the In-Database option and the Optimizer. She will also be about for the main conference on the 19th, so you can expect a presentation or two from her on the Thursday.

Agenda selection day is the 8th January, 2015. So hopefully you will be getting the acceptance emails soon after that or during week of 12th January.

There is a committee of about 10 people who are involved in selecting presentations and setting the agenda. If it was up to me then I would accept everything/everyone. So if your presentation is not accepted this time, please don't blame me :-) I said YES to your presentation, I really, really did. I fought so hard to have your presentation included. If your presentation is not accepted then the blame is down to the other committee members :-)

The conference will be held in Croke Park, and is a 15-20 minute taxi ride from the Airport.

You can follow the Conference and other OUG Ireland events using the twitter tag #oug_ire

Wednesday, November 12, 2014

Approximate Count Distinct (12.1.0.2 new feature)

With the release of the Oracle Database 12.1.0.2 there was a number of new features and options. Most of the publicity has been around the in-Memory option. But there was lots of other features for the DBA and a few for the developer.

One of the new SQL functions is the APPROX_COUNT_DISTINCT(). This function is different to the tradition count distinct, COUNT(DISTINCT expression), in that is performs an approximate count distinct. The theory is that this approximate count is a lot more efficient than performing the full count distinct.

The APPROX_COUNT_DISTINCT() function is really only suitable when you are processing very large volumes of data and when the data set contains a large number of distinct values.

The general syntax of the function is:

... APPROX_COUNT_DISTINCT(expression) ...

and returns a Number.

The function returns the approximate number of records that contain distinct value for the expression.

SELECT approx_count_distinct(cust_id)

FROM mining_data_build_v;

The APPROX_COUNT_DISTINCT() function ignores records that contain a null value for the expression. Plus is performs less work on the sorting and aggregations. Just run and Explain Plan and you can see the differences.

In some of the material from Oracle the APPROX_COUNT_DISTINCT() function can be 5x to 50x++ times faster. But it depends on the number of distinct values and the complexity of the SQL query.

As the result / returned value from the function may not be 100% accurate, Oracle says that the functions has an accuracy of >97% (with 95% confidence).

The function cannot be used on the following data types: BFILE, BLOB, CLOB, LONG, LONG RAW and NCLOB

Friday, November 7, 2014

ODMr : Graph Node: Zooming in on Graphs

When Oracle Data Miner (ODMr) 4.0 (which is part of SQL Developer) came out back in late 2013 there was a number of new features added to the tool. One of these was a Graph node that allows us to create various graphs and charts that include Line, Scatter, Bar, Histogram and Box plot.

I've been using this node recently to produce graphs and particularly scatter plots. I've been using the scatter plots to graph the Actual values in a data set against the Predicted values that was generated by ODMr. In this scenario I had a separate data set for training my ODM data mining models and another testing data set for, well testing how well the model performed against an unseen data set.

In general the graphs produced by the Graph node look good and gives you the information that you need. But what I found was that as you increased the size of the data set, the scatter plot can look a messy. This was in part due to the size of the square used to represent a data point. As the volume of data increased then your scatter plot could just look like a coloured in area of blue squares. This is illustrated in the following image.

Graph node 1

What I discovered today is that you can zoom in on this graph to explore different regions and data point on it. This do this you need to select an data that is within the x-axis and y-axis area. When you do this you will see a box form on your graph that selects the area that you indicate by moving your mouse. After you have finished selecting the area, the Graph Node will zooms into this part of the graph and shows the data points. For example if I select the area from about 1000 on the x-axis and 1000 on the y-axis, I will get the following.

Graph node 2

Again if I select a similar are area of 350 on the x-axis and 400 on the y-axis I get the following zoomed area.

Graph node 3

You can keep zooming in on various areas.

At some point you will have finished zooming in and you will want to return to the original graph. To zoom back outward all you need to do in the graph is to click on it. When you do this you will go back to the previous step or image of the graph. You can keep doing this until you get back to the original graph. Alternatively you can zoom in and out on various parts of the graph.

Hopefully you will find this feature useful.

Pages