Over the past couple of weeks I’ve had conversations with a large number of people about Data Science in the Oracle arena.
A few things have stood out. The first, and perhaps the most important, is that there is confusion about what Data Science actually means. Some think it is just another name for Statistics or Advanced Statistics; others equate it with Predictive Analytics, Data Mining, Data Analysis, Data Architecture, etc. It is not. Data Science is more than what any of these terms mean, but that is a topic for discussion another day.
During these conversations the same questions and topics keep coming up, and the simplest answer to all of them is taken from a Pantomime (Panto):
We need to have lots of statisticians
'Oh No You Don't!'
We can only do Data Science if we have Big Data
'Oh No You Don't!'
We can only do data mining/data science if we have tens or hundreds of millions of records
'Oh No You Don't!'
We need to have an Exadata machine
'Oh No You Don't!'
We need to have an Exalytics machine
'Oh No You Don't!'
We need extra servers to process the data
'Oh No You Don't!'
We need to buy lots of Statistical and Predictive Analytics software
'Oh No You Don't!'
We need to spend weeks statistically analysing a predictive model
'Oh No You Don't!'
We need to have unstructured data to do Data Science
'Oh No You Don't!'
Data Science is only for large companies
'Oh No You Don't!'
Data Science is very complex; I cannot do it
'Oh No You Don't!'
Let us all say it together one last time: 'Oh No You Don't!'
In its simplest form, performing Data Science using the Oracle stack just involves learning and using some simple SQL and PL/SQL functions in the database.
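As a minimal sketch of what this can look like, assuming an Oracle database where Oracle Data Mining is available, the statements below compute some summary statistics with built-in SQL aggregate functions, build a classification model with the DBMS_DATA_MINING PL/SQL package, and then score new records with the PREDICTION and PREDICTION_PROBABILITY SQL functions. The tables CUSTOMER_HISTORY, NEW_CUSTOMERS and CHURN_MODEL_SETTINGS, their columns, and the model name CHURN_DT are hypothetical placeholders for illustration only.

```sql
-- Hypothetical tables: CUSTOMER_HISTORY holds training data with a known
-- CHURNED value; NEW_CUSTOMERS holds records to score and is assumed to have
-- the same attribute columns. Adjust all names to your own schema.

-- 1. Simple statistics using built-in SQL aggregate functions
SELECT COUNT(*)                AS num_customers,
       AVG(annual_spend)       AS avg_spend,
       MEDIAN(annual_spend)    AS median_spend,
       STDDEV(annual_spend)    AS stddev_spend,
       CORR(age, annual_spend) AS corr_age_spend
FROM   customer_history;

-- 2. Settings table telling CREATE_MODEL which algorithm to use
CREATE TABLE churn_model_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000)
);

BEGIN
  -- Package constants are referenced inside PL/SQL; here we pick a decision tree
  INSERT INTO churn_model_settings (setting_name, setting_value)
  VALUES (DBMS_DATA_MINING.ALGO_NAME, DBMS_DATA_MINING.ALGO_DECISION_TREE);
  COMMIT;
END;
/

-- 3. Build a classification model inside the database
BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'CHURN_DT',
    mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name     => 'CUSTOMER_HISTORY',
    case_id_column_name => 'CUST_ID',
    target_column_name  => 'CHURNED',
    settings_table_name => 'CHURN_MODEL_SETTINGS');
END;
/

-- 4. Score new records with a plain SELECT statement
SELECT cust_id,
       PREDICTION(churn_dt USING *)             AS predicted_churn,
       PREDICTION_PROBABILITY(churn_dt USING *) AS probability
FROM   new_customers;
```

Every step here runs inside the database as ordinary SQL and PL/SQL, so the data does not need to be moved to a separate server or tool to be analysed or scored.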
Maybe we (in the Oracle Data Science world, and those looking to get into it) need to adopt Barack Obama's phrase 'Yes We Can', or, as he said it in Irish when he visited Ireland in 2011, 'Is Féidir Linn'.
Remember, it is just SQL.