Tuesday, January 29, 2019

Machine Learning on Oracle Autonomous Data Warehouse

Last week I wrote a blog post about how long it took to create machine learning models on the Oracle Database Cloud service. There were some impressive results and some surprising ones too.

I decided to try out the exact same tests, using the exact same data on the Oracle Autonomous Data Warehouse Cloud service (ADW).



When creating the ADW service I took the basic configuration and didn't change anything. The inbuilt machine learning for the Autonomous service will magically work out my needs and make the necessary adjustments, right? It can handle any data volume and any data processing requirements, right?

Here are the results.


* You will notice that there is no time given for creating an SVM model for the 10M record data set. After waiting for 4 hours I got bored and gave up waiting. (I actually did this three times to make sure it wasn't a once-off.)

Update: The SVM model for the 10M record data set eventually finished after 547 minutes! That's 9.1 hours!

[I also had a 50M record data set. I just didn't waste time trying that.]

[The Neural Network algorithm hasn't been ported onto ADW at this point in time.]

If you look back at the results from using the DBaaS you will see it was significantly quicker than the ADW. (For some of the tests it would be quicker using Python on my laptop.)

Before you believe the hype, go test it yourself and make sure it measures up.

I re-ran my test cases over a number of days to see if the machine learning aspect of the Autonomous service kicked in to learn from the processing and make any performance improvements. Sadly the results were basically the same or slightly slower. Disappointing.

When someone tells you that you should be using this, ask them whether they have actually used and tested it themselves. And more importantly, don't believe them. Go test it yourself.

Friday, January 25, 2019

How long does it take to build a Machine Learning model using Oracle Cloud

Every day someone talks about the processing power needed for Machine Learning, and the vast computing needed for these tasks. It has become evident that most of these people have never created a machine learning model. Never. But they like to make stuff up and try to make themselves look like an expert, or as I and others like to call them, a "fake expert".

When you question these "fake experts" about this topic, they huff and puff about lots of things and never answer the question, or they try to claim it is so difficult that you simply wouldn't understand.

Having worked in the area of machine learning for a very, very long time, I've never really had performance issues with creating models. Yes, most of the time I've been able to use my laptop. Yes, my laptop, to build models, large models. In a couple of cases my laptop couldn't cope and I moved onto a server.

But over the past few years we keep hearing about using cloud services for machine learning. If you are doing machine learning, we are told, you need the computing capabilities that are available with cloud services.
So, the results below show how long it took to build machine learning models, using different algorithms, with different sizes of data sets.

For this test, I used a basic cloud service. Well, maybe it isn't basic to me, but others will consider it very basic, with very little compute involved.

I used an Oracle Cloud DBaaS for this experiment. I selected an Oracle 18c Extreme edition cloud service, which comes with the in-database machine learning option. It has 1 OCPU, 7.5GB of memory and 170GB of storage. This is the basic configuration.


Next I created data sets of different sizes. These were all based on one particular data set, rather than on completely different data sets. This ensures that, as the data set size increases, the kind of data and the processing required remain consistent.
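One simple way to grow a base data set to a range of sizes is to repeat its rows until a target row count is reached. The sketch below illustrates that idea only. It is an assumption and not necessarily how the data sets in this test were built, and the column names and values are made up for illustration.

```python
from itertools import cycle, islice


def scale_dataset(base_rows, target_size):
    """Return target_size rows by cycling through base_rows in order."""
    if not base_rows:
        raise ValueError("base_rows must not be empty")
    return list(islice(cycle(base_rows), target_size))


# Hypothetical base data set: made-up columns and values.
base = [
    {"age": 34, "income": 52000, "label": "yes"},
    {"age": 51, "income": 61000, "label": "no"},
    {"age": 27, "income": 38000, "label": "no"},
]

# Small target sizes keep the example quick; the same call works for millions of rows.
scaled = {size: scale_dataset(base, size) for size in (6, 30, 300)}

for size, rows in scaled.items():
    print(size, len(rows))
```

Because every larger set is built from the same base rows, the column types and value distributions stay the same across sizes, so any difference in build time comes from volume alone.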

The data sets contained the following numbers of records: 72K, 210K, 660K, 2M, 10M and 50M.
I then created machine learning models using Decision Tree, Naive Bayes, Support Vector Machine, Generalized Linear Models (GLM) and Neural Networks. Yes, it was a typical classification problem.
The following table shows the length of time, in seconds, to build the models. All data preparation was done prior to this.

Note: Automatic Data Preparation was turned on for these algorithms. This performed additional algorithm-specific data preparation for each model. That means the times given in the following tables include some data preparation time as well as the time to build the models.
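If you want to run this kind of test yourself, the measurement is just wall-clock time around the build call. Here is a minimal sketch in Python, assuming a hypothetical `dummy_build` function as a stand-in for whatever actually builds your model (it is not an Oracle API). As in the note above, any preparation done inside the build call is included in the measured time.

```python
import time


def time_build(build_fn, *args, **kwargs):
    """Run build_fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    model = build_fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return model, elapsed


def dummy_build(rows):
    """Hypothetical stand-in for a real model build: predicts the majority class."""
    counts = {}
    for row in rows:
        counts[row["label"]] = counts.get(row["label"], 0) + 1
    return max(counts, key=counts.get)


# Made-up data, purely to exercise the timer.
rows = [{"label": "yes"}, {"label": "no"}, {"label": "no"}] * 1000

model, seconds = time_build(dummy_build, rows)
print(f"built model {model!r} in {seconds:.4f} seconds")
```

Repeating each build a few times and comparing the runs helps rule out a one-off slow run.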



Converting the above table into minutes.
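The conversion itself is just a division by 60. A small sketch, using made-up build times rather than the measured ones from this test:

```python
def seconds_to_minutes(seconds, digits=2):
    """Convert a duration in seconds to minutes, rounded to `digits` places."""
    return round(seconds / 60, digits)


# Hypothetical build times in seconds (NOT the measured results from this post).
build_seconds = {"Naive Bayes": 90, "Decision Tree": 150, "SVM": 600}

build_minutes = {algo: seconds_to_minutes(s) for algo, s in build_seconds.items()}

for algo, minutes in build_minutes.items():
    print(f"{algo}: {minutes} min")
```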



It is clear that the Neural Network model takes a lot longer to build than all the other algorithms. In this test the Neural Network model had only one hidden layer.

When we chart the build timings, leaving out Neural Networks, we get the following.
 
We can see that the Naive Bayes, Decision Tree, GLM and SVM algorithms have very similar model build timings, but as the data volumes increase the Decision Tree algorithm becomes less efficient.

Overall it doesn't take a long time to build models. In a way it is a very trivial task!

I mentioned earlier in this post that I had created a data set of 50M records. Unfortunately I wasn't able to get models built for this data set using this cloud instance. It used so much TEMP tablespace that the file volumes on my cloud instance ran out of space!

I suppose if I wanted to go bigger with my data, I would need a bigger boat!
I haven't included any timings for model scoring using these models. Why? The scored data is returned immediately, even for the largest data sets.