# Apache Spark Components – Spark GraphX

**Objective**

The main objective behind Spark GraphX is **to simplify graph analysis tasks.**

**Introduction**

GraphX is a distributed graph-processing framework built on top of Spark. It is a component for **graph** and **graph-parallel** computation, and its API is used to perform **graph analysis**. It simplifies graph analytics tasks by providing a collection of graph algorithms and builders. It also provides an **optimized runtime**.

**Benefits**

- GraphX simplifies graph analytics tasks by reusing the **Spark RDD** concept, and it operates on a property graph: a directed multigraph with properties attached to each vertex and edge.
- It provides an API for **fast** and **robust** development of graph-based applications.
- GraphX is widely used in **data analytics** and **computer science**, because graphs are a natural data structure for describing social networks. For this reason, companies like **Facebook** invest in developing graph-processing software.
- GraphX optimizes the representation of **vertex** and **edge** attributes when they are primitive data types.
- GraphX supports fundamental operators such as **subgraph**, **joinVertices**, and **aggregateMessages**.
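In GraphX these operators are Scala methods on `Graph` and run distributed over RDDs. To show what each one computes, the sketch below models a tiny directed multigraph in plain Python (no Spark) and mimics `subgraph`, `joinVertices`, and `aggregateMessages`. The helper names (`subgraph`, `join_vertices`, `aggregate_messages`) and the toy data are hypothetical, and the code is a single-machine illustration, not GraphX's implementation.

```python
# Plain-Python model of a property multigraph; not Spark/GraphX code.
# Vertices map an id to an attribute; edges are (src, dst, attr) tuples.
# Parallel edges (two edges 1 -> 2) are allowed, as in a multigraph.
vertices = {1: "alice", 2: "bob", 3: "carol"}
edges = [(1, 2, "follows"), (1, 2, "likes"), (2, 3, "follows"), (3, 1, "follows")]


def subgraph(vertices, edges, vpred=lambda vid, attr: True, epred=lambda e: True):
    """Keep vertices passing vpred, and edges passing epred whose endpoints
    both survive (mirroring the contract of GraphX's subgraph operator)."""
    kept_v = {vid: a for vid, a in vertices.items() if vpred(vid, a)}
    kept_e = [e for e in edges if e[0] in kept_v and e[1] in kept_v and epred(e)]
    return kept_v, kept_e


def join_vertices(vertices, table, map_func):
    """Update vertices that have an entry in table; all others keep their value."""
    return {vid: map_func(vid, a, table[vid]) if vid in table else a
            for vid, a in vertices.items()}


def aggregate_messages(vertices, edges, send_msg, merge_msg):
    """Call send_msg on every edge to emit (target_id, msg) pairs and combine
    messages for the same target with merge_msg. Vertices that receive no
    message do not appear in the result."""
    result = {}
    for src, dst, attr in edges:
        for target, msg in send_msg(src, dst, vertices[src], vertices[dst], attr):
            result[target] = merge_msg(result[target], msg) if target in result else msg
    return result


# Keep only "follows" edges.
_, follows = subgraph(vertices, edges, epred=lambda e: e[2] == "follows")

# Attach a score to vertex 1 only; vertices 2 and 3 are left unchanged.
joined = join_vertices(vertices, {1: 10}, lambda vid, name, score: f"{name}:{score}")

# In-degree via message passing: each edge sends 1 to its destination.
in_degree = aggregate_messages(vertices, edges,
                               lambda s, d, s_attr, d_attr, e_attr: [(d, 1)],
                               lambda a, b: a + b)
print(in_degree)  # {2: 2, 3: 1, 1: 1}
```

Note how the parallel `follows`/`likes` edges from 1 to 2 both contribute to vertex 2's in-degree, which is the multigraph behaviour the bullet describes.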

**Features**

**Flexibility**

- Spark GraphX is an API designed to manipulate **graphs** and run **computations** on them.
- GraphX unifies **ETL** (Extract, Transform, Load), **exploratory analysis**, and **iterative graph** computation within a single system.
- As a result, the same data can be viewed as both graphs and collections, and graphs can be transformed and joined with RDDs efficiently.

**Speed**

- Spark GraphX combines transformations, machine learning, and graph computation in a single system at high speed, which makes Spark one of the most powerful frameworks for this kind of workload.

**Growing Algorithm Library**

- Spark GraphX has a number of built-in graph algorithms, including PageRank, connected components, label propagation, SVD++, and triangle counting.
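In the Scala API these algorithms are exposed as methods on a graph, for example `graph.connectedComponents()` and `graph.triangleCount()`. To show what those two compute, here is a single-machine, plain-Python sketch on a made-up six-vertex graph. It is not GraphX code and the function names are hypothetical. Like GraphX, it labels each connected component with its lowest vertex id and reports, per vertex, the number of triangles passing through it.

```python
def connected_components(vertex_ids, edges):
    """Label each vertex with the smallest vertex id in its connected
    component, ignoring edge direction, by repeatedly pushing the smaller
    label across every edge until no label changes."""
    label = {v: v for v in vertex_ids}
    changed = True
    while changed:
        changed = False
        for src, dst in edges:
            low = min(label[src], label[dst])
            for v in (src, dst):
                if label[v] > low:
                    label[v] = low
                    changed = True
    return label


def triangle_count(vertex_ids, edges):
    """Count the triangles each vertex belongs to, treating edges as
    undirected and ignoring self-loops and duplicate edges."""
    nbrs = {v: set() for v in vertex_ids}
    for src, dst in edges:
        if src != dst:
            nbrs[src].add(dst)
            nbrs[dst].add(src)
    counts = {}
    for v in vertex_ids:
        ns = sorted(nbrs[v])
        # A triangle through v is a pair of v's neighbours that are adjacent.
        counts[v] = sum(1 for i, a in enumerate(ns) for b in ns[i + 1:] if b in nbrs[a])
    return counts


vertex_ids = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (2, 3), (3, 1), (4, 5)]  # vertex 6 is isolated

components = connected_components(vertex_ids, edges)
triangles = triangle_count(vertex_ids, edges)
print(components)  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4, 6: 6}
print(triangles)   # {1: 1, 2: 1, 3: 1, 4: 0, 5: 0, 6: 0}
```

The min-label loop is the same idea GraphX distributes with its Pregel-style message passing: each round, a vertex adopts the smallest label seen among its neighbours.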

**Understanding Spark GraphX with an Example**

**Figure: Flight Example with GraphX**

As the diagram above shows, flights connect three airports, SFO, ORD, and DFW, and each route is labeled with the distance between its two endpoints.

GraphX is used here to **analyze the flight routes**. In Spark GraphX, each location is a **vertex (V)** and each connecting route is an **edge (E).**
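In GraphX (Scala), this graph would be built with `Graph(vertices, edges)` from an RDD of `(VertexId, attribute)` pairs and an RDD of `Edge(srcId, dstId, attr)` values, and `graph.triplets` would expose each route together with its endpoint airports. The plain-Python sketch below models the same structure without Spark. The vertex ids, edge directions, and mileages are assumptions for illustration (approximate great-circle distances), not values read from the figure.

```python
from typing import NamedTuple


class Route(NamedTuple):
    """One directed edge: source vertex id, destination vertex id, distance."""
    src: int
    dst: int
    miles: int


# Vertex id -> airport code, like GraphX's (VertexId, attribute) pairs.
airports = {1: "SFO", 2: "ORD", 3: "DFW"}

# Assumed direction and approximate distances in miles (illustrative only;
# not read from the figure).
routes = [Route(1, 2, 1846), Route(2, 3, 802), Route(3, 1, 1464)]


def triplets(vertices, edges):
    """Join every edge with the attributes of both endpoints, in the spirit
    of GraphX's triplets view: (src_attr, dst_attr, edge_attr)."""
    return [(vertices[e.src], vertices[e.dst], e.miles) for e in edges]


for src, dst, miles in triplets(airports, routes):
    print(f"{src} to {dst} is {miles} miles")  # first line: SFO to ORD is 1846 miles

# Routes longer than 1000 miles, found by filtering the triplets.
long_hauls = [t for t in triplets(airports, routes) if t[2] > 1000]
```

Keeping airports as vertex attributes and distances as edge attributes is what lets one view answer questions about both locations and routes.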

**Use Cases of Graph Computation**

The following use cases of Spark GraphX give an idea of what graph computation can do and of the scope for building new solutions with graphs.

**1. Disaster Detection System**

Graphs can be used to detect disasters such as earthquakes, tsunamis, forest fires, and volcanic eruptions, so that warnings can be issued to alert people.

**2. PageRank**

PageRank can be used to find the influencers in any network, such as a social media network.
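GraphX provides PageRank as a built-in Scala method (`graph.pageRank(tol)`). To show the idea without a Spark cluster, here is a plain-Python power-iteration sketch of textbook PageRank with a damping factor of 0.85. The follower graph and the `pagerank` function are hypothetical, and GraphX's own implementation differs in details such as normalization and how the work is distributed.

```python
def pagerank(nodes, edges, damping=0.85, tol=1e-8, max_iter=100):
    """Power-iteration PageRank on a directed graph. Rank held by dangling
    vertices (no out-edges) is spread evenly over all vertices, so the
    ranks always sum to 1."""
    n = len(nodes)
    out = {v: [] for v in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out[v]:
                # Each vertex splits its damped rank evenly over its out-edges.
                new[w] += damping * rank[v] / len(out[v])
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


# Hypothetical "follows" network: a, b and d all follow c; c follows a.
nodes = ["a", "b", "c", "d"]
follows = [("a", "c"), ("b", "c"), ("d", "c"), ("c", "a")]
ranks = pagerank(nodes, follows)
print(max(ranks, key=ranks.get))  # c
```

Vertex `c`, followed by everyone else, ends up with the highest rank, which is the "influencer" signal the use case refers to.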

**3. Financial Fraud Detection**

Graphs can be used to detect people involved in financial fraud and money laundering, and to monitor financial transactions.

**4. Business Analysis**

Graphs are also used in business analysis to understand customers' purchase trends, e.g. at companies such as Uber and McDonald's.

**5. Geographic Information Systems**

Graphs are used to develop functionality in geographic information systems, such as watershed delineation and weather prediction.

**Use Case – Flow Diagram**

**Figure: Use Case – Flow diagram of Flight Data Analysis using Spark GraphX**

The steps involved in flight data analysis using Spark GraphX are as follows:

1. Collect a huge amount of flight data.
2. Store the real-time flight data in a database.
3. Create a graph using GraphX.
4. Query the data, for example:
   - 4.1 Compute the longest flight routes.
   - 4.2 Calculate the top busiest airports.
   - 4.3 Calculate the routes with the lowest flight cost.
5. Visualize the results using Google Data Studio.
6. Obtain the specific results.
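Queries 4.1 to 4.3 map onto standard graph operations. The plain-Python sketch below runs them on one machine under stated assumptions: the route data, fares, and function names are hypothetical. Longest routes are a sort over edge attributes. The busiest airports are ranked by degree (departures plus arrivals; GraphX exposes `degrees`, `inDegrees`, and `outDegrees` on a graph). The lowest-cost route uses Dijkstra's algorithm over a fare attribute, a technique chosen here for illustration rather than one the steps above specify.

```python
import heapq
from collections import Counter

# Hypothetical route data: (origin, destination, distance_miles, fare_usd).
ROUTES = [
    ("SFO", "ORD", 1846, 220),
    ("ORD", "DFW", 802, 120),
    ("DFW", "SFO", 1464, 180),
    ("SFO", "DFW", 1464, 250),
    ("ORD", "SFO", 1846, 210),
]


def longest_routes(routes, k=3):
    """Query 4.1: the k routes with the greatest distance."""
    return sorted(routes, key=lambda r: r[2], reverse=True)[:k]


def busiest_airports(routes, k=3):
    """Query 4.2: rank airports by degree (departures plus arrivals)."""
    degree = Counter()
    for origin, dest, _, _ in routes:
        degree[origin] += 1
        degree[dest] += 1
    return degree.most_common(k)


def cheapest_path(routes, start, goal):
    """Query 4.3: lowest total fare from start to goal with Dijkstra's
    algorithm (fares are non-negative). Returns (total_fare, path), or
    None if goal is unreachable."""
    graph = {}
    for origin, dest, _, fare in routes:
        graph.setdefault(origin, []).append((dest, fare))
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, fare in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (cost + fare, nxt, path + [nxt]))
    return None


print(longest_routes(ROUTES, 2))
print(busiest_airports(ROUTES, 1))              # [('SFO', 4)]
print(cheapest_path(ROUTES, "DFW", "ORD"))      # (400, ['DFW', 'SFO', 'ORD'])
```

Dijkstra fits query 4.3 because fares are non-negative. On a large distributed graph the same question would instead be answered with an iterative, message-passing formulation.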

**Conclusion**

*From the above, we can conclude that Spark GraphX is a component for graph and graph-parallel computation whose API simplifies graph analysis. It is a boon for data scientists who need to analyze real-time data.*
