Spark Shell with Scala
With the Spark shell in Scala, we can execute various RDD transformation/action commands to process data, as explained below.
1. Open Spark Shell
The following command is used to open the Spark shell with Scala.
|$ spark-shell|
2. Different Ways of Creating a New RDD
2.1 Create an RDD by Reading a File
In this method, the data is already available in a storage system such as the local filesystem, HDFS, or HBase.
|scala> val inputfile = sc.textFile("input.txt")|
Note: Here the input.txt file is present in the home directory.
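Outside of Spark, the same file can be read line by line with plain Scala, which is a rough local picture of what sc.textFile gives you (an RDD whose items are the lines of the file). A minimal sketch, which first writes a small sample input.txt so it is self-contained:

```scala
import java.nio.file.{Files, Paths}
import scala.io.Source

// Write a small sample file so the sketch is self-contained
// (the two lines here are made up for illustration)
Files.write(Paths.get("input.txt"), "hello spark\nhello scala\n".getBytes("UTF-8"))

// Read it line by line, much as sc.textFile yields one item per line
val lines = Source.fromFile("input.txt").getLines().toList

println(lines.length)  // 2
```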
2.2 Create RDD through Parallelized Collection
This method is used with an existing collection of data.
|scala> val no = Array(1, 2, 3, 4, 5, 6, 7)|
|scala> val noData = sc.parallelize(no)|
2.3 Create RDD from Existing RDDs
In this method, a new RDD is created from an existing one.
|scala> val newRDD = noData.map(data => (data * 2))|
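RDD transformations such as map mirror the corresponding Scala collection methods, so the doubling step above can be tried locally without Spark. A quick sketch on the same array used in the parallelize example:

```scala
// Same data as in the parallelize example above
val no = Array(1, 2, 3, 4, 5, 6, 7)

// Local equivalent of noData.map(data => data * 2)
val doubled = no.map(data => data * 2)

println(doubled.mkString(", "))  // 2, 4, 6, 8, 10, 12, 14
```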
3. Count the Number of Items in the RDD
Here we count the number of items available in the RDD (below, data refers to an RDD created as shown above). To count the items, we need to call an action.
|scala> data.count()|
4. Filter the RDD
Here we filter the RDD and create a new RDD of the items which contain the word "BeyondCorner". To filter, we need to call the filter transformation, which returns a new RDD with a subset of the items.
|scala> val DFData = data.filter(line => line.contains("BeyondCorner"))|
5. Perform Transformation and Action together
Here we perform multiple operations together, such as the filter transformation and the count action, for more complex requirements.
|scala> data.filter(line => line.contains("BeyondCorner")).count()|
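The same chained style works on plain Scala collections, so the filter-then-count pattern can be sketched locally. The sample lines below are hypothetical stand-ins for the contents of input.txt:

```scala
// Hypothetical sample lines standing in for the contents of input.txt
val lines = Seq(
  "BeyondCorner explains Spark",
  "Some other line",
  "More from BeyondCorner"
)

// Local equivalent of data.filter(...).count():
// chain the transformation and the count in one expression
val matches = lines.filter(line => line.contains("BeyondCorner")).size

println(matches)  // 2
```

In Spark the filter stays lazy until count forces evaluation; on a local collection both steps run eagerly, but the result is the same.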
6. Read the First Item from the RDD
To read the first item from the RDD, use the below command.
|scala> data.first()|
7. Read the First 7 Items from the RDD
To read the first 7 items from the RDD, use the below command.
|scala> data.take(7)|
8. Count the Number of RDD Partitions
An RDD is made up of multiple partitions. To count the number of partitions, use the below command.
|scala> data.partitions.length|
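As a rough local analogy (this is not Spark's actual partitioning logic), grouped splits a collection into fixed-size chunks, much as an RDD's items are spread across partitions:

```scala
// Split 7 elements into chunks of at most 3, giving 3 "partitions"
val parts = (1 to 7).grouped(3).toList

println(parts.length)  // 3
```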
9. Caching the Transformations
The below command marks the RDD so that the intermediate transformations are stored in memory the first time an action computes them.
|scala> data.cache()|
10. Exit from Spark Shell
Use the below command to exit from the shell.
|scala> :quit|
From the above Spark Shell with Scala topic, we conclude that using Spark shell commands we can create RDDs, read items from them, and count their partitions. We can perform various operations on the data using the Spark shell commands.