This is the fourth part of series of blog posts on 'How the EU GDPR will affect the use of Machine Learning'
In this blog post (Part4b) I will examine some of the more technical aspects and how the in-database machine learning functions saves the day!
Probably in most cases where machine learning has been used and/or deployed in your company to analyse, profile and predict customers, it is more than likely that some sort of black box machine learning has been used.
Typical black box machine learning will include using algorithms like Neural Networks, but these can extended to other algorithms, within the context of the EU GDPR requirements, such as SVMs, GLM, etc. Additionally most companies don't just use one algorithm to make a decision on a customer. Many algorithms and rules based decision make can be used together, using some sort of voting system, to determine if a customer is targeted in a certain way.
Basically all of these do not really support the requirements of the EU GDPRs.
In most cases we need to go back to basics. Back to more simpler approaches of machine learning for customer profiling and prediction. This means no more, for now, ensemble models, unless you can explain why a customer was selected. This means having to use simple algorithms like Decision Trees, at a push Naive Bayes, and using some well defined rules based methods. All of these approaches allows us to see and understand why a customer was selected and based on Article 22 being able to explain why.
But there is some hope. Some of the commercial machine learning vendors already for some prediction insights built into their software. Very few if any open source solutions have this capability.
For example, Oracle introduced a new function called PREDICTION_DETAILS in Oracle 12.1c and this was expanded in Oracle 12.2c to cover all their in-database machine learning algorithms.
The following is an example of using this function for an SVM model. When you examine the boxes in the following image you an see that a slightly different set of attributes and the values of these attributes are listed. Each box corresponds to a different customer. This means we can give an explanation of why a customer was selected. Oracle 12c saves the day.
select cust_id, prediction(clas_svm_1_27 using *) pred_value, prediction_probability(clas_svm_1_27 using *) pred_prob, prediction_details(clas_svm_1_27 using *) pred_details from mining_data_apply_v;
If you have a look at other commercial machine learning solutions, you will find some give similar functionality or it will be available soon. Can we get the same level of detail from open source solutions. Not really unless you are using Decision Tress and maybe Naive Bayes. This means that companies that have gone done the pure open source for their machine learning may have to look at using alternative software and may have to folk out some hard earned dollars/euros to make sure that they are complainant with Article 22 of the EU GDPRs.
Click back to 'How the EU GDPR will affect the use of Machine Learning - Part 1' for links to all the blog posts in this series.