Brendan Tierney - Oralytics Blog

Saturday, May 26, 2018

UKOUG EPM & Hyperion Event 6th June, 2018

Come up really soon is the annual EPM and Hyperion event organised by the UKOUG. This year it will be on 6th June at Sandown Park Racecourse (Portsmouth Rd, Esher, KT10 9AJ).

If you have ever been to Sandown Park Racecourse you will know how fantastic a venue it is. And if you have been to a previous UKOUG EPM & Hyperion event before you will know how amazing it is.

Just go and book you place for this event now. If you are a UKOUG member this event could be free (depending on level of membership) but if you aren't a member, you can still go to this event. You can either become a UKOUG member and attend the event or pay the small event fee and attend the event.

From what I've heard a lot of people have already signed up to go, and I can see why. The agenda is jammed packed with end-user and customer case studies, as well as presentations from key people from Oracle people and and leading Oracle partners.

The exhibition space has been sold out! This will give you plenty of opportunities to get talking to various partners and service providers, and get your key questions answered at this event.

There will be a Panel Session at the end of the day. I love these panel sessions and gives everyone a chance to ask questions or to listen, or to join in with the discussion.

Lots and lots of value and learning to be had at this event. Go register now.

Monday, May 21, 2018

Creating a Word Cloud using Python

Over the past few days I've been doing a bit more playing around with Python, and create a word cloud. Yes there are lots of examples out there that show this, but none of them worked for me. This could be due to those examples using the older version of Python, libraries/packages no long exist, etc. There are lots of possible reasons. So I have to piece it together and the code given below is what I ended up with. Some steps could be skipped but this is what I ended up with.

Step 1 - Read in the data

In my example I wanted to create a word cloud for a website, so I picked my own blog for this exercise/example. The following code is used to read the website (a list of all packages used is given at the end).

import nltk
from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "http://www.oralytics.com/"
html = urlopen(url).read()
print(html)

The last line above, print(html), isn't needed, but I used to to inspect what html was read from the webpage.

Step 2 - Extract just the Text from the webpage

The Beautiful soup library has some useful functions for processing html. There are many alternative ways of doing this processing but this is the approached that I liked.

The first step is to convert the downloaded html into BeautifulSoup format. When you view this converted data you will notices how everything is nicely laid out.

The second step is to remove some of the scripts from the code.

soup = BeautifulSoup(html)
print(soup)

# kill all script and style elements
for script in soup(["script", "style"]):
    script.extract()    # rip it out
    
print(soup)

Step 3 - Extract plain text and remove whitespacing

The first line in the following extracts just the plain text and the remaining lines removes leading and trailing spaces, compacts multi-headlines and drops blank lines.

text = soup.get_text()
print(text)

# break into lines and remove leading and trailing space on each
lines = (line.strip() for line in text.splitlines())
# break multi-headlines into a line each
chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
# drop blank lines
text = '\n'.join(chunk for chunk in chunks if chunk)

print(text)

Step 4 - Remove stop words, tokenise and convert to lower case

As the heading says this code removes standard stop words for the English language, removes numbers and punctuation, tokenises the text into individual words, and then converts all words to lower case.

#download and print the stop words for the English language
from nltk.corpus import stopwords
#nltk.download('stopwords')
stop_words = set(stopwords.words('english'))
print(stop_words)

#tokenise the data set
from nltk.tokenize import sent_tokenize, word_tokenize
words = word_tokenize(text)
print(words)

# removes punctuation and numbers
wordsFiltered = [word.lower() for word in words if word.isalpha()]
print(wordsFiltered)

# remove stop words from tokenised data set
filtered_words = [word for word in wordsFiltered if word not in stopwords.words('english')]
print(filtered_words)

Step 5 - Create the Word Cloud

Finally we can create a word cloud backed on the finalised data set of tokenised words. Here we use the WordCloud library to create the word cloud and then the matplotlib library to display the image.

from wordcloud import WordCloud
import matplotlib.pyplot as plt

wc = WordCloud(max_words=1000, margin=10, background_color='white',
               scale=3, relative_scaling = 0.5, width=500, height=400,
               random_state=1).generate(' '.join(filtered_words))

plt.figure(figsize=(20,10))
plt.imshow(wc)
plt.axis("off")
plt.show()
#wc.to_file("/wordcloud.png")

We get the following word cloud.

Step 6 - Word Cloud based on frequency counts

Another alternative when using the WordCloud library is to generate a WordCloud based on the frequency counts. For this you need to build up a table containing two items. The first item is the distinct token and the second column contains the number of times that word/token appears in the text. The following code shows this code and the code to generate the word cloud based on this frequency count.

from collections import Counter

# count frequencies
cnt = Counter()
for word in filtered_words:
    cnt[word] += 1

print(cnt)

from wordcloud import WordCloud
import matplotlib.pyplot as plt

wc = WordCloud(max_words=1000, margin=10, background_color='white',
               scale=3, relative_scaling = 0.5, width=500, height=400,
               random_state=1).generate_from_frequencies(cnt)

plt.figure(figsize=(20,10))
plt.imshow(wc)
#plt.axis("off")
plt.show()

Now we get the following word cloud.

When you examine these word cloud to can easily guess what the main contents of my blog is about. Machine Learning, Oracle SQL and coding.

What Python Packages did I use?

Here are the list of Python libraries that I used in the above code. You can use PIP3 to install these into your environment.

nltk
url open
BeautifulSoup
wordcloud
Counter

Wednesday, May 9, 2018

Oracle Machine Learning Users on ADWC

One of the new features of the Autonomous Data Warehouse Cloud (ADWC) service is Oracle Machine Learning. This is a Zeppelin based notebook for your machine learning on ADWC. Check out my previous blog post about this.

In order to be able to use this new product and the in-database machine learning in ADWC, you will need your database user to have certain privileges. The first step in this is to create a typical user for accessing the ADWC and grant it the necessary OML privileges.

To do this open the ADWC console and then open the Service Console.

OML2

This will then open a new admin page which contains a link for 'Manage Oracle ML User'. Click on this.

OML1

You can then enter the Username, Password and other details for the user, and then click Create.
This will then create a new user that is specific for Oracle Machine Learning. This new user will be granted the DWROLE, that contains the basic schema privileges and the privileges required to run the in-database machine learning algorithms. For those that a familiar with Oracle Data Mining/Oracle Advanced Analytics option in the Enterprise Edition of the Oracle database, you will see that these privileges are very similar.

You can examine the privileges granted to this DWROLE in the database as an administrator. When you do you will see the following:

CREATE ANALYTIC VIEW 
CREATE ATTRIBUTE DIMENSION 
ALTER SESSION 
CREATE HIERARCHY 
CREATE JOB 
CREATE MINING MODEL 
CREATE PROCEDURE 
CREATE SEQUENCE 
CREATE SESSION 
CREATE SYNONYM 
CREATE TABLE 
CREATE TRIGGER 
CREATE TYPE 
CREATE VIEW READ,WRITE ON directory DATA_PUMP_DIR 
EXECUTE privilege on the PL/SQL package DBMS_CLOUD

Monday, April 16, 2018

My 4th Book is now available - Data Science (The MIT Press Essential Knowledge series)

I almost forgot, but my 4th book has been published !

It is titled 'Data Science' and is published by MIT Press as part of their Essentials Knowledge series, and is co-written with John Kelleher.

It is available on Amazon in print, Kindle and Audio formats. Go check it out.

This book gives a concise introduction to the emerging field of data science, explaining its evolution, relation to machine learning, current uses, data infrastructure issues and ethical challenge the goal of data science is to improve decision making through the analysis of data. Today data science determines the ads we see online, the books and movies that are recommended to us online, which emails are filtered into our spam folders, even how much we pay for health insurance.

Go check it out.

Amazon.com

Amazon.co.uk

Sunday, April 1, 2018

Predicting IBS in Dogs using Oracle 18c in the Cloud

Over the past 6-8 months I've been working on a project with the Veterinary School of Medicine, part of University of Dublin. The project was focused on using Machine Learning to find patterns in blood tests and x-rays from dogs who are are suffering with Irritable Bowel Syndrome (IBS) in dogs.

Over the past five years the Veterinary School of Medicine has built up a very larger data set of blood samples, x-rays, family history of the dog, eating habits of the dogs, etc.

This project has finally completed and it is only now that I can share with you some of the results and lessons we learned during the project.

The first part of the project was getting all this historical data loaded into and Oracle 12.2c database. This was a relatively simple task but this it did take a bit of coding to get the data model correct and to perform the necessary data transformations and integration needed, as illustrated in the following diagram. NewImage

Once the data was loaded into the database we could start using the in-database Machine Learning algorithms to find patterns that indicated if a dog was suffering from IBS, and ideally if they were in the early stages of IBS. The treatments for early detection had a higher success rate.

We began using the GUI Oracle Data Miner tool, part of SQL Developer, to build the predictive model workflows.

NewImage

But the results we were getting was very disappointing. We were getting average predictive accuracy of 56% for all the models. This is not good and not much better than flipping a coin.

We were a bit stuck at this point about what to do next, and then Oracle's Cloud Engagement team heard about our troubles and suggested that we join the Oracle 18c beta. They got us setup and running with a beta Oracle 18c cloud instance in no time and within a couple of days we are generating new machine learning models.

Oracle 18c has a number of new features that Oracle thought would be useful to use. Firstly they had a new and improved Neural Networks algorithm that had better accuracy when working with images stored as BLOBs, plus there was a number of new SQL analytic functions. One particular function was TO_DOG_YEAR(). A year in a dogs life is not 365 days, and is dependent on the age of the dog as the length of a dog year changes with age and the breed of the dog. Some recent research indicates the geography location and origin of the dog also plays a part.

The syntax of this function is

TO_DOG_YEAR ( 
   DOB			DATE
   FEMALE		BOOLEAN
   NLS_BREED	STRING )

If you do a bit of googling you will find other blog posts that discuss this new function. Some of these can be found here and here.

Oracle has been working with veterinary schools around the world on this problem and hence the introduction of this new SQL analytic function in Oracle 18c. This function accepts the DOB of the dog (DATE), if the dog is Female, and the Breed of the dog. It then calculates the appropriate age of the dog down to the nearest day. We built this into our machine learning workflow, and were very surprised by the outcomes. The predictive accuracy of the models went from 56% to 93%. That is an amazing jump. Perhaps too amazing. But after a few days of extra validation we concluded the difference was down to the new TO_DOG_YEAR() function and the ability to so accurately calculate the age of the dog.

In the last few weeks we have noticed, after the latest Oracle 18c patch has automatically been applied, that this function now has an additional parameter, this is OWN_SMOKE, and seems to indicated if the dog is owned by someone who smokes. This will indeed affect the age of the animal. We having had a chance to try this new parameter yet, but hope to soon.

The following diagram shows the updated workflow along with the transformation node that uses the TO_DOG_YEAR() function.

NewImage

If you do a bit of googling you will find lots of research by various veterinary schools around the world, who spend so much time researching the various aspects that apply to this calculation. It was this research and Oracle's involvement in previous research that resulted in the TO_DOG_YEAR() function being included in Oracle 18c.

A more detailed research paper going to be published in the International Journal of Veterinary Science and Medicine in June 2018 (Volume 6, Issue 1). This paper will explain in more details of the effects of age, breed and sex has on the accuracy of the machine learning models.

We have also been asked to submit our project to Oracle Open World, and it is currently been considered for early selection. This will allow OOW to include this project in their promotional material.

Update 2nd April

A little update for those who didn’t realize this was posted on 1st of April. It was an April Fool common idea from some Oracle Community buddies on the post-UKOUG_TECH17 trip. And what remains true all the year is how this community is full of awesome people. Check out some of the other posts.

Franck Pachot: After IoT, IoP makes its way to the database

Martin Berger: tested performance

Pieter Van Puymbroeck: realized it was offloaded in Exadata

Øyvind Isene: provides a way to test it with a cloud discount

Connor McDonald/AskTom: yes they also joined in with the fun

Thursday, March 29, 2018

Oracle Machine Learning notebooks

With the recent release of Oracle's Autonomous Data Warehouse Cloud (ADWC), Oracle has given data scientists a new tool for data discovery and machine learning on the ADWC. Oracle Machine Learning is based on Apache Zeppelin and gives us a new machine learning tool for accessing the in-database machine learning algorithms and in-database statistical functions.

Oracle Machine Learning (OML) SQL notebooks provide easy access to Oracle's parallelized, scalable in-database implementations of a library of Oracle Advanced Analytics' machine learning algorithms (classification, regression, anomaly detection, clustering, associations, attribute importance, feature extraction, times series, etc.), SQL, PL/SQL and Oracle's statistical and analytical SQL functions. Oracle Machine Learning SQL notebooks and Oracle Advanced Analytics' library of machine learning SQL functions combined with PL/SQL allow companies to automate their discovery of new insights, generate predictions and add "AI" to data viz dashboards and enterprise applications.

The key features of Oracle Machine Learning include:

Collaborative SQL notebook UI for data scientists
Packaged with Oracle Autonomous Data Warehouse Cloud
Easy access to shared notebooks, templates, permissions, scheduler, etc.
Access to 30+ parallel, scalable in-database implementations of machine learning algorithms
SQL and PL/SQL scripting language supported
Enables and Supports Deployments of Enterprise Machine Learning Methodologies in ADWC

Here is a list of key resources for Oracle Machine Learning:

4368602

Monday, March 12, 2018

Oracle Code Presentation March 2018

Last week I was presenting at Oracle Code in New York. I've presented at a few Oracle Code events over the past 12 months and it is always interesting to meet and talk with developers from around the World.

The title of my presentation this time was 'SQL: The one language to rule all your data'.

I've given this presentation a few times at different events (POUG, OOW, Oracle Code). I take the contents of this presentation for granted and that most people know these things. But the opposite is true. Well a lot of people do know these things, but a magnitude more do not seem to know.

For example, at last weeks Oracle Code event, I had about 100 people in the room. I started out by asking the attendees 'How many of you write SQL every day?'. About 90% put up their hand. Then a few minutes later after I start talking about various statistical functions in the database, I then ask them to 'Count how many statistical functions they have used?' I then asked them to raise their hands if they use over five statistical functions. About eight people put up their hands. Then I asked how many people use over ten functions. To my surprise only one (yes one) person put up their hand.

The feedback from the attendees was fantastic and they were very eager to go back to their day jobs and start implementing better SQL code and to learn more about the database. All they have to do is to send me 15% of their pay rises (a bit of a joke during the presentation. You had to be there ...)

The first half of the presentation talks about statistical, analytical and machine learning in the database.

The second half covers some (not all) of the various data types and locations of data that can be accessed from the database.

The presentation then concludes with the title of the presentation about SQL being the one language to rule all your data.

Based on last weeks experience, it looks like a lot more people need to hear it !

Hopefully I'll get the chance to share this presentation with other events and Oracle User Group conferences.

Two of the key take away messages are:

Google makes us stupid
We need to RTFM more often

Here is a link to the slides on SlideShare

SQL : The one language to rule all your data from Brendan Tierney

And I recorded a short video about the presentation with Bob from OTN/ODC.

Wednesday, March 7, 2018

Oracle 18c New Oracle Advanced Analytics (Machine Learning) features

With each release of the Oracle Database we get new Machine Learning features, under the umbrella term of Oracle Advanced Analytics option (OAA).

With Oracle 18c we get the following new features, that include new machine learning algorithms, improvements to machine learning algorithms, and meta-data improvements for registering new R based algorithms.

These new OAA features include:

New Time-Series function : This new function forecasts target value based solely on a known history of target values and uses the popular auto-regressive modelling method.

New Model Detail Views : Previously you could inspect the details of a model using a function. This is being phased out and replaced by model view, with the format DM$VA

New Neural Networks Algorithm : With the growing interest in deep learning, Oracle have now included a neural network algorithm into the database, thus providing SQL and PL/SQL interfaces to all for easy of use and easy of integration into applications.

New Random Forest Algorithm : Random Forests has been proven over the past few years to be very accurate for certain types of classification problems. This algorithm has now been included in the database, with SQL and PL/SQL interfaces.

Improved Sampling for Association Rules : A new specialised sampling approach is introduced for Association Rules. This is to improve performance, while maintaining accuracy, for large/big data sets.

Algorithm Meta Data Registration : Simplifies the integration of new algorithms in the R extensibility framework. This feature allows a uniform consistent approach of registering new algorithm functions and their settings.

New Exponential Smoothing Algorithm : This allows for users to make predictions from time series data, and includes 14 models, including the popular Holt (trend) and Holt-Winters (trend and seasonality) models, and the ability to handle irregular time series intervals.

New CUR Decomposition-based Algorithm for Attribute and Row Importance : Most algorithms focus on identifying columns or rows that are important within their data sets. This algorithm has the added feature of also identifying important rows.

As you can see there are a lot of machine learning new features in Oracle 18c. Each one of these new features will be explored in more detail in separate blog posts.

Monday, March 5, 2018

Python and Oracle : Fetching records and setting buffer size

If you used other languages, including Oracle PL/SQL, more than likely you will have experienced having to play buffering the number of records that are returned from a cursor. Typically this is needed when you are processing more than a few hundred records. The default buffering size is relatively small and by increasing the size of the number of records to be buffered can dramatically improve the performance of your code.

As with all things in coding and IT, the phrase "It Depends" applies here and changing the buffering size may not be what you need and my not help you to gain optimal performance for your code.

There are lots and lots of examples of how to test this in PL/SQL and other languages, but what I'm going to show you here in this blog post is to change the buffering size when using Python to process data in an Oracle Database using the Oracle Python library cx_Oracle.

Let us begin with taking the defaults and seeing what happens. In this first scenario the default buffering is used. Here we execute a query and the process the records in a FOR loop (yes these is a row-by-row, slow-by-slow approach.

import time

i = 0
# define a cursor to use with the connection
cur2 = con.cursor()
# execute a query returning the results to the cursor
print("Starting cursor at", time.ctime())
cur2.execute('select * from sh.customers')
print("Finished cursor at", time.ctime())

# for each row returned to the cursor, print the record
print("Starting for loop", time.ctime())
t0 = time.time()
for row in cur2:
    i = i+1
    if (i%10000) == 0:
        print(i,"records processed", time.ctime())

              
t1 = time.time()
print("Finished for loop at", time.ctime())
print("Number of records counted = ", i)

ttime = t1 - t0
print("in ", ttime, "seconds.")

This gives us the following output.

Starting cursor at  10:11:43
Finished cursor at  10:11:43
Starting for loop  10:11:43
10000 records processed  10:11:49
20000 records processed  10:11:54
30000 records processed  10:11:59
40000 records processed  10:12:05
50000 records processed  10:12:09
Finished for loop at  10:12:11 
Number of records counted =  55500
in  28.398550033569336 seconds.

Processing the data this way takes approx. 28 seconds and this corresponds to the buffering of approx 50-75 records at a time. This involves many, many, many round trips to the the database to retrieve this data. This default processing might be fine when our query is only retrieving a small number of records, but as our data set or results set from the query increases so does the time it takes to process the query.

But we have a simple way of reducing the time taken, as the number of records in our results set increases. We can do this by increasing the number of records that are buffered. This can be done by changing the size of the 'arrysize' for the cursor definition. This reduces the number of "roundtrips" made to the database, often reducing networks load and reducing the number of context switches on the database server.

The following gives an example of same code with one additional line.

cur2.arraysize = 500

Here is the full code example.

# Test : Change the arraysize and see what impact that has
import time

i = 0
# define a cursor to use with the connection
cur2 = con.cursor()
cur2.arraysize = 500
# execute a query returning the results to the cursor
print("Starting cursor at", time.ctime())
cur2.execute('select * from sh.customers')
print("Finished cursor at", time.ctime())

# for each row returned to the cursor, print the record
print("Starting for loop", time.ctime())
t0 = time.time()
for row in cur2:
    i = i+1
    if (i%10000) == 0:
        print(i,"records processed", time.ctime())

              
t1 = time.time()
print("Finished for loop at", time.ctime())
print("Number of records counted = ", i)

ttime = t1 - t0
print("in ", ttime, "seconds.")

Now the response time to process all the records is.

Starting cursor at 10:13:02 Finished cursor at 10:13:02 Starting for loop 10:13:02 10000 records processed 10:13:04 20000 records processed 10:13:06 30000 records processed 10:13:08 40000 records processed 10:13:10 50000 records processed 10:13:12 Finished for loop at 10:13:13 Number of records counted = 55500 in 11.780734777450562 seconds.

All done in just under 12 seconds, compared to 28 seconds previously.

Here is another alternative way of processing the data and retrieves the entire results set, using the 'fetchall' command, and stores it located in 'res'.

# Test : Change the arraysize and see what impact that has
import time

i = 0
# define a cursor to use with the connection
cur2 = con.cursor()
cur2.arraysize = 500
# execute a query returning the results to the cursor
print("Starting cursor at", time.ctime())
cur2.execute('select * from sh.customers')

t0 = time.time()
print("starting FetchAll at", time.ctime())
res = cur2.fetchall()
              
t1 = time.time()
print("finished FetchAll at", time.ctime())

ttime = t1 - t0
print("in ", ttime, "seconds.")

Friday, March 2, 2018

Oracle 18c DBaaS Cloud Setup

The 18c Oracle DBaaS is now available. This is the only place that Oracle 18c will be available until later in 2018. So if you want to try it out, then you are going to need to get some Oracle Cloud credits, or you may already have a paying account for Oracle Cloud.

The following outlines the steps you need to go through to gets Oracle 18c setup.

1. Log into your Oracle Cloud

Log into your Oracle Cloud environment. Depending on your access path you will get to your dashboard.

Select Create Instance from the dashboard.

NewImage

2. Create a new Database

From the list of services to create, select Database.

NewImage

3. Click 'Create Instance'

NewImage

4. Enter the Database Instance details

Enter the details for your new Oracle 18c Database. I've called mine 'db18c'.

Then for the Software Release dropdown list, select 'Oracle Database 18c'.

Next select the Software Edition from the dropdown list.

NewImage

5. Fill in the Instance Details

Fill in the details for 'DB Name', 'PDB Name', 'Administration Password', 'Confirm Password', setup the SSH Public Key, and then decide if you need the Backup and Recovery option.

NewImage <

6. Create the DBaaS

Double check everything and when ready click on the 'Create' button.

NewImage

7. Wait for Everything to be Create

Now is the time to be patient and wait while your cloud service is created.

NewImage

I've created two different version of the 18c Oracle DBaaS. The Enterprise Edition to 30 minutes to complete and the High Performance service too 47 minutes.

No it's time to go play.

Sunday, February 18, 2018

18c is now available (but only on the Cloud)

On Friday afternoon (16th February) we started to see tweets and blog posts from people in Oracle saying that Oracle 18c was now available. But is only available on Oracle Cloud and Engineered Systems.

It looks like we will have to wait until the Autumn before we can install it ourselves on our own servers :-(

Here is the link to the official announcement for Oracle 18c.

Oracle 18c is really Oracle 12.2.0.2. The next full new release of the Oracle database is expected to be Oracle 19.

The new features and incremental enhancements in Oracle 18c are:

Multitenant
In-Memory
Sharding
Memory Optimized Fetches
Exadata RAC Optimizations
High Availability
Security
Online Partition Merge
Improved Machine Learning (OAA)
Polymorphic Table Functions
Spatial and Graph
More JSON improvements
Private Temporary Tablespaces
New mode for Connection Manager

And now the all important links to the documentation.

Oracle 18c Documentation

Oracle 18c New Features

Oracle 18c Data Warehousing

To give Oracle 18c a try you will need to go to cloud.oracle.com and select Database from the drop down list from the Platform menu. Yes you are going to need an Oracle Cloud account and some money or some free credit. Go and get some free cloud credits at the upcoming Oracle Code events.

If you want a 'free' way of trying out Oracle 18c, you can use Oracle Live SQL. They have setup some examples of the new features for you to try.

NOTE: Oracle 18c is not Autonomous. Check out Tim Hall's blog posts about this. The Autonomous Oracle Database is something different, and we will be hearing more about this going forward.

Tuesday, February 13, 2018

Oracle and Python setup with cx_Oracle

Is Python the new R?

Maybe, maybe not, but that I'm finding in recent months is more companies are asking me to use Python instead of R for some of my work.

In this blog post I will walk through the steps of setting up the Oracle driver for Python, called cx_Oracle. The documentation for this drive is good and detailed with plenty of examples available on GitHub. Hopefully there isn't anything new in this post, but it is my experiences and what I did.

1. Install Oracle Client
The Python driver requires Oracle Client software to be installed. Go here, download and install. It's a straightforward install. Make sure the directories are added to the search path.

2. Download and install cx_Oracle
You can use pip3 to do this.

pip3 install cx_Oracle

Collecting cx_Oracle
  Downloading cx_Oracle-6.1.tar.gz (232kB)
    100% |████████████████████████████████| 235kB 679kB/s
Building wheels for collected packages: cx-Oracle
  Running setup.py bdist_wheel for cx-Oracle ... done
  Stored in directory: /Users/brendan.tierney/Library/Caches/pip/wheels/0d/c4/b5/5a4d976432f3b045c3f019cbf6b5ba202b1cc4a36406c6c453
Successfully built cx-Oracle
Installing collected packages: cx-Oracle
Successfully installed cx-Oracle-6.1

3. Create a connection in Python
Now we can create a connection. When you see some text enclosed in angled brackets <>, you will need to enter your detailed for your schema and database server.

# import the Oracle Python library
import cx_Oracle

# define the login details
p_username = ""
p_password = ""
p_host = ""
p_service = ""
p_port = "1521"

# create the connection
con = cx_Oracle.connect(user=p_username, password=p_password, dsn=p_host+"/"+p_service+":"+p_port)

# an alternative way to create the connection
# con = cx_Oracle.connect('/@/:1521')

# print some details about the connection and the library
print("Database version:", con.version)
print("Oracle Python version:", cx_Oracle.version)


Database version: 12.1.0.1.0
Oracle Python version: 6.1

4. Query some data and return results to Python
In this example the query returns the list of tables in the schema.

# define a cursor to use with the connection
cur = con.cursor()
# execute a query returning the results to the cursor
cur.execute('select table_name from user_tables')
# for each row returned to the cursor, print the record
for row in cur:
    print("Table: ", row)

Table:  ('DECISION_TREE_MODEL_SETTINGS',)
Table:  ('INSUR_CUST_LTV_SAMPLE',)
Table:  ('ODMR_CARS_DATA',)

Now list the Views available in the schema.

# define a second cursor
cur2 = con.cursor()
# return the list of Views in the schema to the cursor
cur2.execute('select view_name from user_views')
# display the list of Views
for result_name in cur2:
    print("View: ", result_name)

View:  ('MINING_DATA_APPLY_V',)
View:  ('MINING_DATA_BUILD_V',)
View:  ('MINING_DATA_TEST_V',)
View:  ('MINING_DATA_TEXT_APPLY_V',)
View:  ('MINING_DATA_TEXT_BUILD_V',)
View:  ('MINING_DATA_TEXT_TEST_V',)

5. Query some data and return to a Panda in Python
Pandas are commonly used for storing, structuring and processing data in Python, using a data frame format. The following returns the results from a query and stores the results in a panda.

# in this example the results of a query are loaded into a Panda
# load the pandas library
import pandas as pd

# execute the query and return results into the panda called df
df = pd.read_sql_query("SELECT * from INSUR_CUST_LTV_SAMPLE", con)

# print the records returned by query and stored in panda
print(df.head())

 CUSTOMER_ID     LAST    FIRST STATE     REGION SEX    PROFESSION  \
0     CU13388     LEIF   ARNOLD    MI    Midwest   M        PROF-2   
1     CU13386     ALVA   VERNON    OK    Midwest   M       PROF-18   
2      CU6607   HECTOR  SUMMERS    MI    Midwest   M  Veterinarian   
3      CU7331  PATRICK  GARRETT    CA       West   M       PROF-46   
4      CU2624  CAITLYN     LOVE    NY  NorthEast   F      Clerical   

  BUY_INSURANCE  AGE  HAS_CHILDREN   ...     MONTHLY_CHECKS_WRITTEN  \
0            No   70             0   ...                          0   
1            No   24             0   ...                          9   
2            No   30             1   ...                          2   
3            No   43             0   ...                          4   
4            No   27             1   ...                          4   

   MORTGAGE_AMOUNT  N_TRANS_ATM  N_MORTGAGES  N_TRANS_TELLER  \
0                0            3            0               0   
1             3000            4            1               1   
2              980            4            1               3   
3                0            2            0               1   
4             5000            4            1               2   

  CREDIT_CARD_LIMITS  N_TRANS_KIOSK  N_TRANS_WEB_BANK       LTV  LTV_BIN  
0               2500              1                 0  17621.00   MEDIUM  
1               2500              1               450  22183.00     HIGH  
2                500              1               250  18805.25   MEDIUM  
3                800              1                 0  22574.75     HIGH  
4               3000              2              1500  17217.25   MEDIUM  

[5 rows x 31 columns]

6. Wrapping it up and closing things
Finally we need to wrap thing up and close our cursors and our connection to the database.

# close the cursors
cur2.close()
cur.close()

# close the connection to the database
con.close()

Useful links
cx_Oracle website
cx_Oracle documentation
cx_Oracle examples on GitHub

Watch out for more blog posts on using Python with Oracle, Oracle Data Mining and Oracle R Enterprise.

Pages

Saturday, May 26, 2018

Monday, May 21, 2018

Wednesday, May 9, 2018

Monday, April 16, 2018

Sunday, April 1, 2018

Thursday, March 29, 2018

Monday, March 12, 2018

Wednesday, March 7, 2018

Monday, March 5, 2018

Friday, March 2, 2018

Sunday, February 18, 2018

Tuesday, February 13, 2018