Friday, May 6, 2011

Oracle Data Miner Comes of Age

I’ve recently had an article titled Oracle Data Miner Comes of Age accepted for the June edition of the UKOUG’s Oracle Scene magazine.

I’ve been thinking of ways to promote this article, and I’ve decided to create two videos and post them on YouTube.

The first video is a short, one-minute introduction to the article, a taster of sorts. From my initial attempts at producing the video I’ve learned that:

  • It is more difficult than it looks
  • The camera on my laptop is not installed straight, which is why I’m looking to one side
  • I need a better quality microphone

But perhaps the most interesting thing was that within a couple of hours of posting it on YouTube (and without telling anyone about it), it was found and tweeted by Charlie Berger. Charlie is the Senior Director in charge of the Oracle Data Miner tool. He also very kindly tweeted about one of my blog postings on the New Features of Oracle Data Miner 11g R2.

You can find the introduction video to the article at

Over the next couple of weeks I will be posting a much longer video, based on the full article.

Tuesday, April 26, 2011

Job: ETL/Data Warehouse Consultant

Distinct Partners have a new opening for an ETL/Data Warehouse Consultant.

Following a period of growth, we are now looking for experienced ETL professionals to join our consultancy team. If you are looking for a challenge in management consultancy and believe that you have the qualities to succeed in a dynamic and high growth consultancy environment, then we would love to hear from you.

Expertise Required:

  • Data Integration skills (at least one of the following):
      • Proprietary: Informatica, SAS Data Integration Studio, IBM DataStage, Oracle
      • Open Source: Talend, Postgres, MySQL, CUDA, Python
  • Strong querying, data analysis and data flow mapping skills (must)
  • Data quality skills – checks, standardisation, householding, etc. (understanding)
  • Data architecture skills (understanding):
      • data modelling (normalisation, referential integrity, etc.)
      • dimensional modelling (dimensions, facts, SCDs, etc.)
  • XML scripting and open source data integration skills (strong plus)
  • Database/ETL performance tuning and programming skills

Full details can be found at

http://www.linkedin.com/jobs?viewJob=&jobId=1556819&svfId=822045&goback=%2Emid_I2774654840*42

If you would like to apply for the job, you can email your CV to

Gina Cassidy   gina.cassidy@distinctpartners.com

and mention that you heard about the job from me.

My Blog & others

Over the past few years I have been contributing posts on Data Mining and Oracle Data Miner topics to the BI-Quotient blog:

http://www.business-intelligence-quotient.com/

Over the past few months I have decided to expand my blog postings to include everything I’m currently doing and anything that I find interesting. The main theme will be ‘Data is King’.

The new blog will include posts on the following topics:

  • Oracle
  • Oracle Data Miner
  • Data Mining
  • Data Management
  • My research
  • Database Design
  • and anything else that I find interesting and that relates to data.

This is where this blog comes in. It will be my main blog going forward and will contain all my posts, including copies of those that I post on the BI-Quotient blog.

Deputy Editor for Oracle Scene (June edition)

Today I got a phone call from Jennifer in the UKOUG office asking whether I would be interested in helping out with some (minor) editing of four articles for the June edition of Oracle Scene.

I will also have an article in this edition of Oracle Scene (a 5 page spread).

I’ve had a quick look through the four articles and they are an interesting bunch.

Over the coming months Oracle Scene will be holding elections for a longer-term deputy editor. This will go out to the user community for a public vote. I might put my name forward.

VirtaThon – Online Conference, July

Yesterday I received an email telling me that my presentation submission for VirtaThon (Virtual Conference for the Oracle, Java & MySQL Communities) had been accepted.

The presentation is titled, Getting Started with Oracle Data Miner 11G R2.

I would really like to give an online demo of the tool, or at least show a video of the demo, but it looks like I may have to do it with good old PowerPoint.

Thursday, April 21, 2011

Recent ODM activity

Over the past couple of weeks I’ve been busy with some Oracle Data Miner 11g R2 related activities. These include:

  • Writing an article called Oracle Data Miner Comes of Age for submission to Oracle Scene, the UKOUG quarterly magazine. I was told on 20th April that my article was accepted and will be in the June edition
  • The call for presentations opened for the annual UKOUG conference in Birmingham in December. I submitted a presentation which will be based on the article in Oracle Scene.
  • I submitted 2 presentations to Oracle Open World in October. But funding might be a problem here. I’ve asked the ODM development group to see if they could sponsor some of the costs. One presentation is on Oracle Data Miner. The second is on
  • I also submitted a presentation to an online (virtual) Oracle conference called VirtaThon, again on Oracle Data Miner.

Some other things that I have planned are:

  • Create two videos for the Oracle Scene article. The first video is a short intro to the article. The plan is to have this on the UKOUG website to promote the article. The second video will be based on the article, covering the material and the demo in the article
  • Create a video on creating an ODM repository and getting started with ODM
  • Create a video on removing the ODM repository
  • Create a video on saving/exporting a DM model from ODM
  • Write an article on what Oracle products can be used throughout the Data Mining LifeCycle (CRISP-DM). Hopefully I will submit this for the autumn edition of Oracle Scene.
  • Get all the documentation available on the data manipulation stage in the new ODM tool and write an article based on this, produce a video of it, etc

All of this is to be finished by the middle of June.

So I have a busy few weeks ahead of me.

Oracle Data Miner 11g R2 – New Features

There are many new features in the new tool and these can be grouped under the following headings:

Data Exploration: The first step in every data mining project is to investigate the data, gather some initial information and see whether there are any patterns in it.

Workflow interface: This gives the user a more intuitive way to work with the tool and with the overall data mining process. It allows the data modelling process to be rerun repeatedly without having to input and define each step again, as you had to do in the previous version of the tool.

Generate multiple models at the same time: This is one of the major improvements in the tool. It allows you to create models using each of the algorithms available for a data mining technique in one step, instead of defining each one separately as in the previous version of the tool.

Graphical representations of models: Another major new feature. The tool now displays Decision Trees and Clusters graphically. For Decision Trees we can now see on screen how the tree looks, investigate its different branches to see how the tree was built, and see the rules that were generated to create these branches.

Evaluation of all the developed models: Another major new feature. In the previous version of the tool you were presented with a separate set of evaluation diagrams and measures for each model. You could not see all the results on one graph, and you had to resort to having multiple windows open at the same time to compare the results. Now the evaluation measures and graphs for all the models are shown on one set of graphs, which allows a data miner to concentrate on determining the most appropriate model to use.

Each of these new features really deserves a post of its own to illustrate its capabilities. These posts will follow over the coming weeks.

Brendan


Friday, March 4, 2011

2010 Rexer Analytics Data Mining Survey

The 4th Annual Rexer Analytics Data Miner Survey (2010) is now available.  735 data miners participated in the 2010 survey. The main highlights of the survey are:

FIELDS & GOALS:  Data miners work in a diverse set of fields.  CRM / Marketing has been the #1 field in each of the past four years.  Fittingly, “improving the understanding of customers”, “retaining customers” and other CRM goals are also the goals identified by the most data miners surveyed.

ALGORITHMS:  Decision trees, regression, and cluster analysis continue to form a triad of core algorithms for most data miners.  However, a wide variety of algorithms are being used.  This year, for the first time, the survey asked about Ensemble Models, and 22% of data miners report using them.  A third of data miners currently use text mining and another third plan to in the future.

MODELS:  About one-third of data miners typically build final models with 10 or fewer variables, while about 28% generally construct models with more than 45 variables.

TOOLS:  After a steady rise across the past few years, the open source data mining software R overtook other tools to become the tool used by more data miners (43%) than any other.  STATISTICA, which has also been climbing in the rankings, is selected as the primary data mining tool by the most data miners (18%).  Data miners report using an average of 4.6 software tools overall.  STATISTICA, IBM SPSS Modeler, and R received the strongest satisfaction ratings in both 2010 and 2009.

TECHNOLOGY:  Data Mining most often occurs on a desktop or laptop computer, and frequently the data is stored locally.  Model scoring typically happens using the same software used to develop models.  STATISTICA users are more likely than other tool users to deploy models using PMML.

CHALLENGES: As in previous years, dirty data, explaining data mining to others, and difficult access to data are the top challenges data miners face.  This year data miners also shared best practices for overcoming these challenges.  

FUTURE:  Data miners are optimistic about continued growth in the number of projects they will be conducting, and growth in data mining adoption is the number one “future trend” identified.  There is room to improve:  only 13% of data miners rate their company’s analytic capabilities as “excellent” and only 8% rate their data quality as “very strong”.

You can request a copy of the full report by going to their data mining survey webpage

Tuesday, March 1, 2011

Changing the Domain of Oracle Database Server

I recently had the task of moving our Database server onto a new domain. The following steps outline what was involved in performing this task.

1. Change the Domain of the server

  • Take the server off the current domain
  • Reboot
  • Change the domain to the new one
  • Reboot

2. Update the Listener.ora and Tnsnames.ora

  • Change the host names to use the new domain name

3. Make sure the instance is running

  • sqlplus / as sysdba
  • If the instance is not up, start it with STARTUP

4. Drop the Enterprise Manager Console

  • emca -deconfig dbcontrol db -repos drop
  • Enter the SID name (ORA11gDB)
  • Listener port number = 1521
  • Password for SYS user
  • Password for SYSMAN user
  • Do you wish to continue = Y
  • Depending on the size of the database, this can take several minutes to complete (it took about 10 minutes for me)

5. Reinstall the Enterprise Manager Console

  • emca -config dbcontrol db -repos create
  • Enter the SID name (ORA11gDB)
  • Listener port number = 1521
  • Password for SYS
  • Password for DBSNMP
  • Password for SYSMAN
  • Email address for notification
  • Outgoing mail (SMTP)
  • Do you wish to continue = Y
  • Again, this can take several minutes to complete (it took about 20 minutes for me)

6. Restart the database

7. Test connections

8. All should be OK
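Step 2 is the easiest one to script. The following is a minimal Python sketch, not part of the procedure above, that rewrites HOST = <server>.<old domain> entries in listener.ora and tnsnames.ora to use the new domain and keeps a .bak copy of each file it changes. The domain names, the script name and the helper functions are hypothetical, and it assumes the files live in $TNS_ADMIN or, failing that, $ORACLE_HOME/network/admin. It only touches HOST entries, so any domain suffix on net service names or SERVICE_NAME values in your setup still needs checking by hand.

```python
import os
import re
import sys
from pathlib import Path


def replace_domain(text: str, old_domain: str, new_domain: str) -> str:
    """Rewrite HOST = <server>.<old_domain> entries so they use <new_domain>.

    Only HOST values are changed; everything else in the file is left alone.
    """
    pattern = re.compile(
        r"(HOST\s*=\s*[\w.-]+?)\." + re.escape(old_domain) + r"(?=\s*\))",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: f"{m.group(1)}.{new_domain}", text)


def update_file(path: Path, old_domain: str, new_domain: str) -> bool:
    """Update one .ora file in place, keeping a .bak copy. Returns True if it changed."""
    original = path.read_text()
    updated = replace_domain(original, old_domain, new_domain)
    if updated == original:
        return False
    path.with_name(path.name + ".bak").write_text(original)
    path.write_text(updated)
    return True


def main(argv: list) -> None:
    if len(argv) != 3:
        print("usage: update_ora_domain.py OLD_DOMAIN NEW_DOMAIN")
        return
    old, new = argv[1], argv[2]
    # Assumption: TNS_ADMIN if set, otherwise the default network/admin directory.
    tns_admin = os.environ.get("TNS_ADMIN")
    admin_dir = Path(tns_admin) if tns_admin else Path(os.environ["ORACLE_HOME"]) / "network" / "admin"
    for name in ("listener.ora", "tnsnames.ora"):
        ora_file = admin_dir / name
        if ora_file.exists():
            changed = update_file(ora_file, old, new)
            print(f"{ora_file}: {'updated' if changed else 'no change'}")


if __name__ == "__main__":
    main(sys.argv)
```

Run it after the server has joined the new domain and before restarting the listener, then check the files by eye: a regular expression edit is no substitute for reading the configuration.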

Wednesday, February 23, 2011

New Oracle Data Miner tool is now Available

Today the new Oracle Data Miner tool was made available as part of SQL Developer 3.0 (Early Adopter Release 4).


The new ODM tool has been significantly redeveloped, with a new workflow interface and new graphical outputs. These include graphical representations of the decision trees and clustering.

To download the tool and to read the release documentation go to
http://tinyurl.com/62u3m4y

If you download and use the new tool, let me know what you think of it.

Tuesday, February 22, 2011

Data Analytics Videos–CNBC–Big Brother–Big Business

The following videos from the CNBC program Big Brother – Big Business are available on YouTube. Each video is between 8 and 10 minutes long.

They give a good insight into how data analytics can be used, and is currently being used, by organisations to gain new information and knowledge about what is going on in their business.

Most of the examples in the videos do not rely on complex techniques, but they show how a business can use its data to gain an insight into what is really going on.

Video1, Video2, Video3, Video4, Video5, Video6, Video7, Video8, Video9, Video10

Let me know what you think of these videos.

If you come across any other interesting Data Analysis videos, let me know and I can add them to the list above

Brendan Tierney

Wednesday, February 16, 2011

New Oracle Data Mining tool video

Charlie Berger has recently put together a video demonstrating the new Oracle Data Mining tool.
The link to this video is
http://tinyurl.com/6jhsth4

The video demonstrates some of the main steps in building and applying a classification model. He also demonstrates applying classification to the same data.

The new ODM interface is due to be made available, initially on a limited basis, within the next month or two, and will be part of an Early Adopter (EA) release of SQL Developer 3.

Brendan