Wednesday, June 13, 2012
Data Science Is Multidisciplinary
A few weeks ago I had a blog post called Domain Knowledge + Data Skills = Data Miner.
In that blog post I was saying that to be a Data Scientist all you needed was Domain Knowledge and some Data Skills, which included Data Mining.
The reality is that the skill set of a Data Scientist will be much larger. There is a saying ‘A jack of all trades and a master of none’. When it comes to being a data scientist you need to be a bit like this but perhaps a better saying would be ‘A jack of all trades and a master of some’.
I’ve put together the following diagram, which includes most of the skills, with an outer circle of more fundamental skills. It is this outer ring of skills that is fundamental to becoming a data scientist. The skills in the inner part of the diagram are ones where most people will have some experience of one or more of them. The other skills can be developed and learned over time, all depending on the type of person you are.
Can we train someone to become a data scientist, or are they born to be one? It is a little bit of both really, but you need to have some of the fundamental skills and the right type of personality. Learning the other skills should be easy(ish).
What do you think? Are there skills that I’m missing?
Thursday, June 7, 2012
Review of Oracle Magazine–May/June 1996
The headline article for the May/June 1996 edition of Oracle Magazine was an introduction to the Oracle Universal Server and how it can be used to give a flexible architecture for your growing organisation.
Other articles included:
- Oracle Magazine goes interactive with the launch of www.oramag.com. The initial site had Oracle Magazines from 1994 and 1995, along with subscription information, a Q&A area and a WebMaster comic strip.
- There was a preview of Larry’s Network Computer (NC). It was supposed to be a low cost computing appliance, optimised to operate on the internet and other highly distributed networks such as corporate LANs, and designed to provide users with simple, economical and robust communications and access to information. The NC would include a Web Terminal, ISDN Video Phone, Set-top Box, Two-way Pager and a Personal Digital Assistant.
- Oracle Developer/2000 and Designer/2000 Release 1.3 is announced
- There is a review of how Cisco standardised on using Oracle 7 and how they went about the selection and implementation of Oracle Applications including financial, manufacturing and human resources applications.
- Integrating the WWW and Oracle Order Entry. Companies can now have an instant presence to the world but also, by examining the web-server activity logs, they gain the ability to see who the buyers are and who just browses
To view the cover page and the table of contents click on the image at the top of this post or click here.
My Oracle Magazine Collection can be found here. You will find links to my blog posts on previous editions.
Tuesday, June 5, 2012
OUG Ireland SIG Meetings 26th June
The next Oracle User Group in Ireland SIG meetings will be on Tuesday 26th June.
This will be a full day event and will comprise 2 SIGs, the BI & EPM SIG and the Applications SIG.
The BI & EPM SIG presentations will be in the morning and the Applications SIG presentations will be in the afternoon.
A lot of work has been put into planning this full day event to come up with an agenda that people from both communities may be interested in.
Check out the full agenda page – click here.
To register for the event – click here.
Monday, June 4, 2012
OTN Developer Days–Dublin 12th to 14th June
The OTN Developer Days events return to the Oracle Dublin office in East Point this month from the 12th to the 14th.
These are free events, but places are limited. They allow you to get some hands-on training with these tools. Depending on the day and the topic, the format ranges from a mixture of lecture and workshop to a purely hands-on workshop.
12th June – Golden Gate 11g, Oracle Data Integrator 11g and Enterprise Data Quality (full day : 9:45-17:00)
13th June – Partitioning and Advanced Compression (9:45-13:00)
14th June – Unlocking the value of Oracle Database 11g Core Features (9:45-15:00)
These are free events and you will even get a free lunch from Oracle.
Monday, May 28, 2012
VM for Oracle Data Miner
Recently the OTN team have updated the ‘Database App Development’ Developer Day virtual machine to include Oracle 11.2.0.2 DB and SQL Developer 3.1. This is all you need to try out Oracle Data Miner.
So how do you get started with using Oracle Data Miner on your PC? The first step is to download and install the latest version of Oracle VirtualBox.
The next step is to download and install the OTN Developer Day appliance. Click on the above link to go to the webpage and follow the instructions to download and install the appliance. Download the first appliance on this page, the ‘Database App Development’ VM. This is a large download and, depending on your internet connection, it can take anything from 30 minutes to hours. So I wouldn’t recommend doing this over wifi.
When you start up the VM, your OS username and password are both oracle. Yes, they are case sensitive.
When you are logged into the VM you can close or minimise the host window.
There are two important icons, the SQL Developer and the ODDHandsOnLab.html icons.
The ODDHandsOnLab.html icon loads a webpage that contains a number of tutorials for you to follow.
The tutorial we are interested in is the Oracle Data Miner Tutorial. There are 4 tutorials given for ODM. The first two tutorials need to be followed in the order that they are given. The second two tutorials can be done in any order.
If you have not used SQL Developer before then you should work through this tutorial before starting the Oracle Data Miner tutorials.
The first tutorial takes you through the steps needed to create your ODM schema and to create the ODM repository within the database. This tutorial will only take you 10 to 15 minutes to complete.
In the second tutorial you get to use ODM to build your first ODM model. This tutorial steps you through how to get started with an ODM project and workflow, the different ODM features, how to explore the data, how to create classification models, how to explore the models and then how to apply one of these models to new data. This second tutorial will take approx. 30 to 40 minutes to complete.
It is all very simple and easy to use.
Thursday, May 24, 2012
UKOUG Conference-Submissions Deadline is 1st June
The call for presentations for Europe’s largest Oracle conference is currently open, but the deadline for submissions is approaching fast. The submission deadline is 1st June.
If you are interested in presenting there are a couple of things you need to do. The first step is that you need to register as a speaker. This just involves you registering your interest in being a speaker. The second step is to submit your presentation abstracts.
The conference will be in Birmingham between 3rd and 5th December. There are multiple streams including BI, Technology, Fusion, Middleware, Development, MySQL, Infrastructure, eBusiness Suite, Core Database, etc.
I gave my very first presentation at the annual UKOUG Conference a few years ago and I’ve presented a few times since. I would encourage everyone to give it a go. Pick a topic or topics that you have been working on over the past 12 months or more, a particular technique you used on a recent project, a workaround you have discovered, etc. and submit a presentation on it.
I’ve submitted a few presentations, all of which are about data mining and the advanced analytics option in the Oracle Database. Two of these presentations will be co-presented with Antony Heljula, Peak Indicators. The presentations will be on including and using Oracle Data Mining models in OBIEE and on how we went about developing the Oracle Data Mining models for our project. Will they get accepted? I hope so, but the presentation selection is based on user voting. Everyone can get involved in the judging and voting of presentations.
Notifications of Acceptances or Rejections typically come out around the end of July or early August.
I’ve already booked my flight to Birmingham in December, so whether my presentations get accepted or not, I’ll be there. It is a great conference.
Wednesday, May 23, 2012
The First Oracle Magazine–Volume 1 Issue 1
In my last blog post I reviewed the contents of the March/April 1996 edition of Oracle Magazine. While doing this I noticed that, on the Editor’s Page, Julie Gibbs gave a review of the very first Oracle Magazine from 1987.
Here is the front cover of the first Oracle Magazine. I’ve scanned the editor’s page, containing the review. Just click on the image below.
The first edition had just 12 pages of content.
Here is the extract from the editor’s page of the March/April 1996 edition:
“The picture you see on this page is of the first cover of Oracle Magazine - Volume I, Number I, June 1987. Yes, we are celebrating our tenth anniversary this year. Ten years may not seem like much in other industries, but in high tech, it's a veritable lifetime. Companies and products have come and gone - where are you now, VisiCalc? How about the PC jr? And who knew in 1987 that the Internet would be the dominant topic of the high-tech press in 1996?
What was in the first issue of Oracle Magazine? Here's a sampling of articles in that 12-page fledgling publication: New Network Expands Customer Support (24-hour online support was introduced June 1, 1987); Oracle Version 5.1 Released; Oracle RDBMS Now Available on Wang VS; Oracle Exceeds First Half Forecast (revenues for the first half totalled almost $46 million); UniForum: Site of Oracle UNIX Announcements (at the time, Oracle ran on more than 20 platforms, including new UNIX ports to NCR, Sun, DEC Ultrix, Sequent, Altos, and Plexus); SQL Declared Standard Language by ANSI; Double DEC Awards for Oracle (Digital Review's Target Awards gave Oracle first place for "Best Database Management Product" and the No. 1 rating in the "Digital News 50").”
Some people say that Oracle Magazine existed before 1987. Oracle did have a newsletter type publication.
To view the cover page and the table of contents click on the image at the top of this post or click here.
My Oracle Magazine Collection can be found here. You will find links to my blog posts on previous editions.
Tuesday, May 22, 2012
Review of Oracle Magazine–March/April 1996
The headline articles for the March/April 1996 edition of Oracle Magazine were Oracle’s first, or at least early, articles on Data Warehousing, including DW architectures, what Oracle tools you can use, multi-dimensional analysis, Oracle Express and future directions of data warehouses.
Julie Gibbs, the editor of Oracle Magazine, wondered ‘What will be hot in 2005?’. Some of her predictions/suggestions were:
- Will Larry Ellison’s NC provide every home with a $500 internet box?
- What will be the 3 biggest software companies, and were any of them around in 1995?
- How many people will use the internet every day?
- Will the internet be censored? How and by whom?
- Or will the internet be passé and will virtual reality be a reality?
- What will be the size of the largest data warehouse?
- Will Apple still exist?
- Will you be reading your magazines in print or online?
- Will your company have a woman CEO?
- How many people will be telecommuting?
- Will every desktop have built in video conferencing so that you can talk to your coworkers?
Other articles included:
- Oracle Interoffice Suite was released and comprised Messaging, Document and Workflow servers based on Oracle 7.3. The product provided groupware functions, such as electronic mail, messaging, scheduling, directory services, document management, workflow and conferencing.
- Oracle 7.3 new features included Oracle Enterprise Manager, Oracle Software Manager, SQL*Net 2.3, advanced replication and Oracle ConText.
- How to rename your database. It is not always optimal for a database to keep the name it was born with. A step by step guide is given on how to do this without losing any data!
- A case study is presented from NeXT Computer on how to audit and clean up your Oracle Applications data as you prepare to upgrade to Release 10. These included:
  - Review Usernames and unused responsibilities
  - Unused menus and menu options
  - Are outdated concurrent requests being purged
  - Unused printers
  - Identify cluttered production libraries
  - Unused custom concurrent processes
  - Unused database objects
  - Inactive vendors and invalid distribution sets
  - Unused payment terms
  - Closed bank accounts
  - Protecting your budgets
  - Obsolete journal sources
  - Invalid price lists
  - Unbooked orders and unclosed orders
  - Unused payment terms, transaction types, units of measure and inactive sales people
- How to design a database for OLAP. Most of the following steps still stand today for designing your star-schemas
  - Define the question (business function/area)
  - Use Normalized logic
  - Identify Dimensions
  - Create Hierarchies
  - Identify Attributes
  - Identify Measures
  - Add Calculations
- There was a review of the very first Oracle Magazine that was published in June 1987. Watch this space, as I will be posting the details soon.
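As a rough illustration of the OLAP design steps listed above, here is a minimal sketch of a tiny star schema built with Python's built-in sqlite3 module. None of this comes from the magazine article: the table names, column names and sample rows are all made up for the example. It has two dimensions (product and date), a simple hierarchy in each (product to category, day to month to year), two measures (quantity and unit price) and one calculation (revenue).

```python
import sqlite3


def build_star_schema(conn):
    """Create two dimension tables and one fact table (all names are hypothetical)."""
    conn.executescript("""
        CREATE TABLE dim_product (
            product_id   INTEGER PRIMARY KEY,
            product_name TEXT NOT NULL,      -- attribute
            category     TEXT NOT NULL       -- hierarchy: product -> category
        );
        CREATE TABLE dim_date (
            date_id INTEGER PRIMARY KEY,
            day     TEXT NOT NULL,           -- hierarchy: day -> month -> year
            month   TEXT NOT NULL,
            year    INTEGER NOT NULL
        );
        CREATE TABLE fact_sales (
            product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
            date_id    INTEGER NOT NULL REFERENCES dim_date(date_id),
            quantity   INTEGER NOT NULL,     -- measure
            unit_price REAL NOT NULL         -- measure
        );
    """)


def load_sample_data(conn):
    """Insert a handful of invented rows so the query below has something to return."""
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware"), (3, "Manual", "Books")],
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?, ?)",
        [(1, "1996-03-01", "1996-03", 1996), (2, "1996-04-15", "1996-04", 1996)],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [(1, 1, 2, 10.0), (2, 1, 1, 25.0), (3, 2, 4, 5.0), (1, 2, 3, 10.0)],
    )
    conn.commit()


def revenue_by_category_and_year(conn):
    """The 'calculation' step: revenue = quantity * unit_price, rolled up the hierarchies."""
    cursor = conn.execute("""
        SELECT p.category, d.year, SUM(f.quantity * f.unit_price) AS revenue
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        JOIN dim_date d ON d.date_id = f.date_id
        GROUP BY p.category, d.year
        ORDER BY p.category, d.year
    """)
    return cursor.fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star_schema(connection)
    load_sample_data(connection)
    for row in revenue_by_category_and_year(connection):
        print(row)
```

The business question here would be something like "how much revenue did each product category bring in per year?", which is what drives the choice of dimensions and measures.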
To view the cover page and the table of contents click on the image at the top of this post or click here.
My Oracle Magazine Collection can be found here. You will find links to my blog posts on previous editions.
Monday, May 21, 2012
Solar Panels
I’ve just had a quick look at the solar panels on the house. This is one of the first truly sunny days this week, and by 11am this morning the water tank in the hot press is reading that we have maxed out, with 300 litres of water at 60C.
The panels have reached a temperature of 128C. Again this is at 11:30 this morning.
Based on the weather outlook for the next 7 days, we will not be using the gas boiler to heat any water over the next week.
Friday, May 18, 2012
Oracle Magazine Collection–New addition May/June 2005
I received in the post today a copy of an Oracle Magazine that I’m missing from my collection.
It was sent to me by Kim Berg Hansen from Denmark. Check out his blog.
I owe him a beer at the UKOUG conference in Birmingham later this year in December.
This now means that I have a copy of every Oracle Magazine from July 1998 right up to the current Avengers special editions. Plus other editions with a few gaps going back to 1992.
To view the entire collection – click here.
Tuesday, May 15, 2012
Oracle Magazine–The Avengers Collection
I had a nice surprise when I arrived home from work last night. There was a parcel delivered during the day. When I opened it, I found 6 Oracle Magazines for the May/June 2012 edition.
There was one magazine for each Avengers character. So I have the entire collection.
A big thank you to the person who sent them to me. You know who you are.
These magazines will be joining my collection of Oracle Magazines that spans 20+ years. Check out the collection.
Friday, May 11, 2012
Domain Knowledge + Data Skills = Data Miner
Over the past few weeks I have been talking to a lot of people who are looking at how data mining can be used in their organisation and for their projects, and to people who have been doing data mining for a long time.
What comes across from talking to the experienced people, and these people are not tied to a particular product, is that you need to concentrate on the business problem. Once you have this well defined then you can drill down to the deeper levels of the project. Some of these levels will include what data is needed (not what data you have), tools, algorithms, etc.
Statistics is only a very small part of a data mining project. Some people who have PhDs in statistics and who work in data mining say they do not use, or very rarely use, their statistics skills.
Some quotes that I like are:
"Focus hard on Business Question and the relevant target variable that captures the essence of the question." Dean Abbott PAW Conf April 2012
"Find me something interesting in my data is a question from hell. Analysis should be guided by business goals." Colin Shearer PAW Conf Oct 2011
There have been a lot of blog posts and articles on what are the key skills for a Data Miner and the more popular Data Scientist. What is very clear from all of these is that you will spend most of your time looking at, examining, integrating, manipulating, preparing, standardising and formatting the data. It has been quoted that all of these tasks can take up 70% to 85% of a Data Miner’s/Data Scientist’s time. All of these tasks are commonly performed by database developers and in particular the developers and architects involved in Data Warehousing projects. The rest of the time is for running the data mining algorithms, examining the results, and yes some stats too.
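To give a flavour of what this preparing, standardising and formatting work looks like, here is a minimal sketch in Python using only the standard library. The record layout, field names and date format are all assumptions invented for the example, not taken from any real project or tool.

```python
from datetime import datetime


def clean_record(raw):
    """Standardise one raw customer record (field names are hypothetical)."""
    return {
        # Collapse repeated whitespace and use consistent capitalisation
        "name": " ".join(raw["name"].split()).title(),
        # Country codes in upper case with no stray spaces
        "country": raw["country"].strip().upper(),
        # Assume dates arrive as dd/mm/yyyy and store them as ISO yyyy-mm-dd
        "signup": datetime.strptime(raw["signup"].strip(), "%d/%m/%Y").date().isoformat(),
        # Remove thousands separators before converting to a number
        "spend": float(raw["spend"].replace(",", "")),
    }


def min_max_scale(values):
    """Rescale numeric values to the range 0..1, a common step before modelling."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Every value is the same, so there is no spread to scale
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]


if __name__ == "__main__":
    raw_row = {"name": "  john   smith ", "country": " ie ",
               "signup": "13/06/2012", "spend": "1,250.50"}
    print(clean_record(raw_row))
    print(min_max_scale([10.0, 20.0, 30.0]))
```

Even this toy example shows why the preparation stage eats so much time: every field needs its own rules, and those rules come from knowing the data.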
Very little time is spent developing algorithms!!! Why is this? Could it be that the algorithms are already developed (they have been around for a long time now and are well tuned) and available in all the data mining tools? We can almost treat these algorithms as a black box. So one of the key abilities of a data miner/data scientist would be to know what the algorithms can do, what kind of problems they can be used for, what kind of outputs they produce, etc.
Domain knowledge is important, no matter how little it is, in preparing for and being involved in a data mining project. As we define our business problem the domain expert can bring their knowledge to the problem and allow us to separate the domain related problems from the data related problems. So the domain expertise is critical at the start of a project, but it is also critical when we have the outputs from the data mining algorithms. We can use the domain knowledge to tie the outputs from the data mining algorithms back to the original problem, to bring real meaning to the original business problem we are working on.
So what is the formula of skill sets for a data miner or data scientist? Well, it is a little like the title of this blog post:
Domain Knowledge + Data Skills + Data Mining Skills + a little bit of Machine Learning + a little bit of Stats = a Data Miner / Data Scientist