Spark Shell with Scala

With the Spark shell in Scala, we can execute different RDD transformation and action commands to process data, as explained below.

1. Open Spark Shell

The following command is used to open the Spark shell in Scala.

$ spark-shell
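
When the shell starts, it creates a SparkContext and binds it to the variable sc (on Spark 2.x and later, a SparkSession is also available as spark). A quick sanity check that the context is ready:

scala> sc

This prints the SparkContext object; once it does, the shell is ready to accept RDD commands.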

2. Different Ways of Creating a New RDD

2.1 Create RDD by Reading a File from the Filesystem

In this method, the data already exists in an external storage system such as the local filesystem, HDFS, or HBase.

scala> val inputfile = sc.textFile("input.txt")
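
The same method works for distributed storage. For example, to read from HDFS (the host, port, and path below are illustrative and depend on your cluster configuration):

scala> val hdfsFile = sc.textFile("hdfs://localhost:9000/user/hadoop/input.txt")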

Note: Here, the input.txt file is present in the home directory.

2.2 Create RDD through Parallelized Collection

This method can be used with an existing in-memory collection of data.

scala> val no = Array(1, 2, 3, 4, 5, 6, 7)

scala> val noData = sc.parallelize(no)
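
parallelize also accepts an optional second argument that sets the number of partitions. As a sketch (the partition count of 3 is chosen arbitrarily):

scala> val noData3 = sc.parallelize(no, 3)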

2.3 Create RDD from Existing RDDs

In this method, a new RDD is created by applying a transformation to an existing one.

scala> val newRDD = noData.map(data => data * 2)
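
To verify the result, collect the new RDD back to the driver. Since the source values are 1 through 7, doubling them should return Array(2, 4, 6, 8, 10, 12, 14):

scala> newRDD.collect()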

3. Counting the Number of Items in the RDD

Here we count the number of items available in the RDD. To count the items, we need to call an action, because actions trigger computation and return a result to the driver.

scala> inputfile.count()
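
count() returns a Long. For example, counting the parallelized collection from step 2.2 should return 7:

scala> noData.count()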

4. Filter Operation

Here we filter the RDD to create a new RDD of the lines that contain the word "BeyondCorner". To do this, we call the filter transformation, which returns a new RDD holding the matching subset of items.

scala> val DFData = inputfile.filter(line => line.contains("BeyondCorner"))
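
To inspect the matching lines, collect them to the driver and print each one (suitable only for small results, since collect() brings the entire RDD to the driver):

scala> DFData.collect().foreach(println)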

5. Perform Transformation and Action Together

For complex requirements, we can chain multiple operations together, such as the filter transformation followed by the count action.

scala> inputfile.filter(line => line.contains("BeyondCorner")).count()
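
A classic example of chaining transformations with an action is a word count. The sketch below splits each line on spaces, maps each word to a count of 1, and sums the counts per word; it also defines the counts RDD that is cached in step 9:

scala> val counts = inputfile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)

scala> counts.collect()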

6. Read the First Item from the RDD

To read the first item from the RDD, use the command below.

scala> inputfile.first()

7. Read the First 7 Items from the RDD

To read the first 7 items from the RDD, use the command below.

scala> inputfile.take(7)
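
Note that take(n) returns an Array to the driver, so it is suitable only for small n. On the parallelized collection from step 2.2, for example, the command below should return Array(1, 2, 3):

scala> noData.take(3)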

8. RDD Partitions

An RDD is made up of multiple partitions. To count the number of partitions, use the command below.

scala> inputfile.partitions.length
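
The number of partitions can be influenced when the RDD is created: textFile accepts a minimum-partitions hint as its second argument (the value 4 below is arbitrary):

scala> val inputfile4 = sc.textFile("input.txt", 4)

scala> inputfile4.partitions.length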

9. Caching the Transformations

The command below marks the intermediate counts RDD (built in the word-count example in step 5) to be stored in memory after it is first computed.

scala> counts.cache()
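
cache() is shorthand for persist() with the default MEMORY_ONLY storage level, and it is lazy: the RDD is actually materialized the first time an action runs on it. For other storage levels, use persist with an explicit StorageLevel (shown here on the DFData RDD from step 4, since an RDD's storage level cannot be changed once set); unpersist removes an RDD from the cache:

scala> import org.apache.spark.storage.StorageLevel

scala> DFData.persist(StorageLevel.MEMORY_AND_DISK)

scala> counts.unpersist()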

10. Exit from Spark shell

The command below is used to exit the Spark shell.

scala> :quit

Conclusion

From the above Spark Shell with Scala topic, we conclude that with Spark shell commands we can create RDDs, read data from them, and inspect their partitions. We can also perform various operations on the data interactively using these commands.