Tuesday, June 27, 2017

How the EU GDPR will affect the use of Machine Learning - Part 1

On 5 December 2015, the European Parliament, the Council and the Commission reached agreement on the new data protection rules, establishing a modern and harmonised data protection framework across the EU. Then on 14th April 2016 the Regulations and Directives were adopted by the European Parliament.

NewImage

The EU GDPR comes into effect on the 25th May, 2018.

Are you ready ?

The EU GDPR will affect every country around the World. As long as you capture and use/analyse data captured with the EU or by citizens in the EU then you have to comply with the EU GDPR.

Over the past few months we have seen a increase in the amount of blog posts, articles, presentations, conferences, seminars, etc being produced on how the EU GDPR will affect you. Basically if your company has not been working on implementing processes, procedures and ensuring they comply with the regulations then you a bit behind and a lot of work is ahead of you.

Like I said there was been a lot published and being talked about regarding the EU GDPR. Most of this is about the core aspects of the regulations on protecting and securing your data. But very little if anything is being discussed regarding the use of machine learning and customer profiling.

Do you use machine learning to profile, analyse and predict customers? Then the EU GDPRs affect you.

Article 22 of the EU GDPRs outlines some basic capabilities regarding machine learning, and in additionally Articles 13, 14, 19 and 21.

Over the coming weeks I will have the following blog posts. Each of these address a separate issue, within the EU GDPR, relating to the use of machine learning.

NewImage

Thursday, June 22, 2017

Installing Scala and Apache Spark on a Mac

The following outlines the steps I've followed to get get Scala and Apache Spark installed on my Mac. This allows me to play with Apache Spark on my laptop (single node) before deploying my code to a multi-node cluster.

1. Install Homebrew

Homebrew seems to be the standard for installing anything on a Mac. To install Homebrew run
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
NewImage

When prompted enter your system/OS password to allow the install to proceed.

NewImage NewImage

2. Install xcode-select (if needed)

You may have xcode-select already installed. This tool allows you to install the languages using command line.

xcode-select --install

If it already installed then nothing will happen and you will get the following message.

xcode-select: error: command line tools are already installed, use "Software Update" to install updates

3. Install Scala

[If you haven't installed Java then you need to also do this.]

Use Homebrew to install scala.

brew install scala
NewImage

4. Install Apache Spark

Now to install Apache Spark.

brew install apache-spark
NewImage

5. Start Spark

Now you can start the Apache Spark shell.

spark-shell
NewImage

6. Hello-World and Reading a file

The traditional Hello-World example.

scala> val helloWorld = "Hello-World"
helloWorld: String = Hello-World

or

scala> println("Hello World")
Hello World

What is my current working directory.

scala> val whereami = System.getProperty("user.dir")
whereami: String = /Users/brendan.tierney

Read and process a file.

scala> val lines = sc.textFile("docker_ora_db.txt")
lines: org.apache.spark.rdd.RDD[String] = docker_ora_db.txt MapPartitionsRDD[3] at textFile at :24

scala> lines.count()
res6: Long = 36

scala> lines.foreach(println)
####################################################################
## Specify the basic DB parameters
## Copyright(c) Oracle Corporation 1998,2016. All rights reserved.##
##                                                                ##
##------------------------------------------------------------------
##                   Docker OL7 db12c dat file                    ##

##                                                                ##
## db sid (name)
####################################################################
## default : ORCL

## cannot be longer than 8 characters
##------------------------------------------------------------------

...

There will be a lot more on how to use Spark and how to use Spark with Oracle (all their big data stuff) over the coming months.


[I've been busy for the past few months working on this stuff, EU GDPR issues relating to machine learning, and other things. I'll be sharing some what I've been working on and learning in blog posts over the coming weeks]