Showing posts with label Cloud. Show all posts
Showing posts with label Cloud. Show all posts

Monday, February 4, 2019

Machine Learning Models in Python - How long does it take

We keep hearing from people about all the computing resources needed for machine learning. Sometimes it can put people off from trying it as they will think I don't have those kind of resources.

This is another blog post in my series on 'How long does it take to create a machine learning model?'

Check out my previous blog post that used data sets containing 72K, 210K, 660K, 2M and 10M records.
There was some surprising results in those these.

In this test, I'll be using Python and SciKitLearn package to create models using the same algorithms. There are a few things to keep in mind. Firstly, although they maybe based on the same algorithms, the actual implementation of them will be different in each environment (SQL vs Python).

With using Python for machine learning, one of the challenges we have is getting access to the data. Assuming the data lives in a Database then time is needed to extract that data to the local Python environment. Secondly, when using Python you will be using a computer with significantly less computing resources than a Database server. In this test I used my laptop (MacBook Pro). Thirdly, when extracting the data from the database, what method should be used.

I've addressed these below and the Oracle Database I used was the DBaaS I used in my first experiment. This is a Database hosted on Oracle Cloud.

Extracting Data to CSV File
This kind of depends on how you do this. There are hundreds of possibilities available to you, but if you are working with an Oracle Database you will probably be using SQL Developer. I used the 'export' option to create a CSV file for each of the data sets. The following table shows how long it took for each data set.

As you can see this is an incredibly slow way of exporting this data. Like I said, there are quicker ways of doing this.

After downloading the data sets, the next step is to see how load it takes to load these CSV files into a pandas data frame in Python. The following table show the timings in seconds.


You can see that Python is very efficient at loading these data sets into a pandas data frame in my Python environment.

Extracting Data using cx_Oracle Python package
As I'll be using Python to create the models and the data exists in an Oracle Database (on Oracle Cloud), I can use the cx_Oracle package to download the data sets into my Python environment. After using the cx_Oracle package to download the data I then converted it into a pandas data frame.


You can see that using cx_Oracle to download the data is a very efficient way of accessing the data.  But if the data already exists in CSV files, then the previous method would be quicker to use.

I had the array fetch size set to 10,000.  I also experimented with smaller and larger numbers for the array fetch size, but 10,000 seemed to give a quickest results.

How long to create Machine Learning Models in Python
Now we get onto checking out the timings of how long it takes to create a number of machine learning models using different algorithms and using the default settings. The algorithms include Naive Bayes, Decision Tree, GLM, SVM and Neural Networks.


I had to stop including SVM in the tests as it was taking way too long to run. For example I killed the SVM model build on the 210K data set after it was running for 5 hours.

The Neural Network models created had 3 hidden layers.

In addition to creating the models, there was some minor data preparation steps performed including factorizing, normalization and one-hot-coding. This data preparation would be comparable to the automatic data preparation steps performed by Oracle, although Oracle Automatic Data Preparation does a bit of extra work.


At the point I would encourage you to look back at my previous blog posts on timings using Oracle DBaaS and ADW.  You will see that Python, in these test cases, was quicker at creating the machine learning models. But with Python the data needed to be extracted from the database and that can take time!

A separate consideration is being able to deploy the models. The time it takes to build models is perhaps not the main consideration. You need to consider ease of deployment and use of the models.

Friday, March 2, 2018

Oracle 18c DBaaS Cloud Setup

The 18c Oracle DBaaS is now available. This is the only place that Oracle 18c will be available until later in 2018. So if you want to try it out, then you are going to need to get some Oracle Cloud credits, or you may already have a paying account for Oracle Cloud.

NewImage

The following outlines the steps you need to go through to gets Oracle 18c setup.

1. Log into your Oracle Cloud

Log into your Oracle Cloud environment. Depending on your access path you will get to your dashboard.

Select Create Instance from the dashboard.

NewImage

2. Create a new Database

From the list of services to create, select Database.

NewImage

3. Click 'Create Instance'

NewImage

4. Enter the Database Instance details

Enter the details for your new Oracle 18c Database. I've called mine 'db18c'.

Then for the Software Release dropdown list, select 'Oracle Database 18c'.

Next select the Software Edition from the dropdown list.

NewImage

5. Fill in the Instance Details

Fill in the details for 'DB Name', 'PDB Name', 'Administration Password', 'Confirm Password', setup the SSH Public Key, and then decide if you need the Backup and Recovery option.

NewImage<

6. Create the DBaaS

Double check everything and when ready click on the 'Create' button.

NewImage

7. Wait for Everything to be Create

Now is the time to be patient and wait while your cloud service is created.

NewImage

I've created two different version of the 18c Oracle DBaaS. The Enterprise Edition to 30 minutes to complete and the High Performance service too 47 minutes.

No it's time to go play.

Tuesday, April 12, 2016

Oracle Advanced Analytics on Oracle Cloud

You have heard about the cloud? Right? Even the Oracle Cloud?

If you haven't, then maybe we need to look at how you can learn more about the Oracle Cloud.

Over the past while, and in the past few weeks in particular, Oracle has been advertising about how you can get a trail Oracle cloud service setup for FREE. Well it is free for one month when you set it up on the Oracle website (cloud.oracle.com).

As I like to talk about and use the Oracle Advanced Analytics (OAA) option (a lot), I thought I'd just give you some pointers on how to use OAA on the Oracle cloud.

To do this you need to set up an account on the Oracle cloud website (your Oracle single sign on should help with making that process a lot quicker). There are lots of websites and blog that will talk/show you through the process. Then you need to select what Database as a Service that you want to setup

OAA is not available on the Database Schema Service just yet (maybe one day they will)

Although Oracle Advanced Analytics comes pre-installed in the Oracle Enterprise Edition database (yes it is a separately priced option) when you install it on your own servers, but for the Enterprise Edition DaaS OAA is not part of it.

DaaS has the following versions

  • Standard Edition Service 
  • Enterprise Edition Service 
  • High Performance Service 
  • Extreme Performance Service

OAA is only available for these last two versions of the DaaS.

High Performance DaaS: Multitenant, Partitioning, Real Application Testing, Advanced Compression, Advanced Security, Label Security, Database Vault, OLAP, Advanced Analytics, Spatial and Graph, Diagnostics Pack, Tuning Pack, Database Lifecycle Management Pack, Data Masking & Subsetting Pack and Cloud Management Pack for Oracle Database.

Extreme Performance DaaS: In-Memory Database, RAC (Real Application Clusters), Active Data Guard, Multitenant, Partitioning, Real Application Testing, Advanced Compression, Advanced Security, Label Security, Database Vault, OLAP, Advanced Analytics, Spatial and Graph, Diagnostics Pack, Tuning Pack, Database Lifecycle Management Pack, Data Masking & Subsetting Pack and Cloud Management Pack for Oracle Database.

Oracle Advanced Analytics has two main products or components. The first is the in-database Oracle Data Mining features. This are part of the High Performance and Extreme Performance DaaS offerings. But Oracle R Enterprise is not installed on these DaaS. Well if kind of is if you can get an 11g DaaS, but at time of writing this post ORE is not part of the 12c DaaS images. So you will need to factor in some time to go and install ORE, if you need to use it.

I've been lucky to have one of these DaaS with OAA trials and with thanks to Thomas Kurian he has extended these trials to 12 months for all Oracle ACE Directors. Thank you Thomas.

When you get your DaaS setup you just need to configure your connection privileges, ssh, etc and away you go. All you need to do is to move your data across the internet to your own Oracle DaaS, and once it is in the DaaS all your OAA and other analytics is performed on the Database Server. Only the results are returned to you and displayed in your tool. This significantly reduces the processing time for your data and removes the need to constantly extract your data (in whole or in parts) to feed into other advanced analytics tools.

So if you haven't tried Oracle Advanced Analytics yet, then go ahead and setup your free trial of Oracle DaaS and try it out. You never know what you might discover by using Oracle Advanced Analytics (in the cloud)

NewImage

Friday, October 9, 2015

From Zero to Dashboards in 10min with BICS

Unless you have been going around with your head in the clouds, all you can hear from all the big names in the IT worlds is about moving to the Cloud. (yes that was a poor attempt at a joke. It is Friday afternoon after all).

The title of this post is 'From Zero to Dashboards in 5 min with BICS'.

BICS stands for Oracle BI Cloud Service.

Over the past few months I've been working rolling out BICS at a number of different sites, and it is surprising how quickly you can get up and running with BICS.

There is no install. There is the minimum of setup and configuration. All you need to do is to create a user and give them some privileges. All of that takes 2 minutes.

Next get them to log in to BICS (2 minutes for the first time they log in) and load up some data using the Data Load feature (another 2 minutes).

Then you can move onto the fun part of creating some Dashboards using Visual Analyzer. See the example below of one that we created in 4 minutes.

Bics 1

So all of the above took 10 minutes. How quick and easy is that!

Give it a go yourself and see how quick it is for you.

Let me know if you have any question on using BICS and if you need any help.