The following outlines the steps I followed to get Scala and Apache Spark installed on my Mac. This allows me to play with Apache Spark on my laptop (single node) before deploying my code to a multi-node cluster.
1. Install Homebrew
Homebrew seems to be the standard for installing anything on a Mac. To install Homebrew, run
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
When prompted, enter your system/OS password to allow the install to proceed.
2. Install xcode-select (if needed)
You may already have the Xcode command line tools installed. They provide the compilers and other developer tools that command-line installs, such as the ones below, rely on. To install them, run
xcode-select --install
If they are already installed, nothing is installed and you get the following message.
xcode-select: error: command line tools are already installed, use "Software Update" to install updates
3. Install Scala
[Scala and Spark run on the JVM, so if you haven't installed Java yet, you need to do that as well.]
Use Homebrew to install Scala.
brew install scala
4. Install Apache Spark
Now to install Apache Spark.
brew install apache-spark
5. Start Spark
Now you can start the Apache Spark shell.
spark-shell
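Once the shell is up, a quick sanity check can confirm that it is able to run work. This is only a minimal sketch: it assumes you are inside spark-shell, where the SparkContext is predefined as sc, and the value names (sparkVersion, master, total) are just illustrative.

```scala
// Run inside spark-shell; `sc` is the SparkContext the shell creates for you.
val sparkVersion = sc.version   // version string of the installed Spark
val master = sc.master          // typically local[*] for a laptop install like this one

// A tiny job to confirm the shell can actually run work: add up the numbers 1 to 100.
val total = sc.parallelize(1 to 100).reduce(_ + _)
println(s"Spark $sparkVersion on $master, total = $total")
```

If the total comes back as 5050, the single-node setup is working.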
6. Hello-World and Reading a file
The traditional Hello-World example.
scala> val helloWorld = "Hello-World"
helloWorld: String = Hello-World
or
scala> println("Hello World")
Hello World
What is my current working directory?
scala> val whereami = System.getProperty("user.dir")
whereami: String = /Users/brendan.tierney
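The working directory matters because, with a local install like this one, a relative file name passed to sc.textFile is resolved against it. As a rough sketch, you could build an absolute path instead and check that the file exists before reading it. The names dataFile, found and dataUri are illustrative; docker_ora_db.txt is simply the file used in the next step.

```scala
import java.nio.file.{Files, Paths}

// Build an absolute path from the working directory and a file name.
val dataFile = Paths.get(System.getProperty("user.dir"), "docker_ora_db.txt")

// Check the file is really there before handing it to Spark.
val found = Files.exists(dataFile)

// A file: URI makes it explicit that this is a path on the local file system.
val dataUri = dataFile.toUri.toString
println(s"$dataUri exists: $found")
```

Passing dataUri to sc.textFile leaves no doubt about which file Spark is reading.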
Read and process a file.
scala> val lines = sc.textFile("docker_ora_db.txt")
lines: org.apache.spark.rdd.RDD[String] = docker_ora_db.txt MapPartitionsRDD[3] at textFile at <console>:24
scala> lines.count()
res6: Long = 36
scala> lines.foreach(println)
####################################################################
## Specify the basic DB parameters
## Copyright(c) Oracle Corporation 1998,2016. All rights reserved.##
## ##
##------------------------------------------------------------------
## Docker OL7 db12c dat file ##
## ##
## db sid (name)
####################################################################
## default : ORCL
## cannot be longer than 8 characters
##------------------------------------------------------------------
...
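Printing every line is only the start. As a small sketch of actual processing, the snippet below separates comment lines (starting with #) from everything else. To keep it independent of any particular file, it uses sc.parallelize on a few hypothetical sample lines; the ORACLE_SID=ORCL entry in particular is made up for illustration and is not taken from the file above. It assumes the same spark-shell session, with sc predefined.

```scala
// Runs inside spark-shell, where `sc` is already defined.
// Hypothetical sample lines standing in for the contents of a parameter file.
val sample = sc.parallelize(Seq(
  "## db sid (name)",
  "## default : ORCL",
  "ORACLE_SID=ORCL",
  "",
  "## cannot be longer than 8 characters"
))

// Count the comment lines.
val commentCount = sample.filter(_.trim.startsWith("#")).count()

// Keep only non-empty lines that are not comments.
val settings = sample.filter(line => line.trim.nonEmpty && !line.trim.startsWith("#"))

// collect() brings the (small) result back to the shell before printing.
val settingLines = settings.collect()
settingLines.foreach(println)
println(s"comment lines: $commentCount")
```

The same filters can be applied to the lines RDD from sc.textFile. One thing to keep in mind before moving to a multi-node cluster: foreach(println) on an RDD prints on the executors rather than in your shell, so for small results it is safer to collect() or take(n) first, as done here.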
There will be a lot more on how to use Spark, and how to use Spark with Oracle (all their big data stuff), over the coming months.
[I've been busy for the past few months working on this stuff, EU GDPR issues relating to machine learning, and other things. I'll be sharing some of what I've been working on and learning in blog posts over the coming weeks.]