Apache Spark Components – Spark GraphX
The main objective behind the creation of Spark GraphX is to simplify graph analysis tasks.
GraphX is a distributed graph-processing framework built on top of Spark. It is a component for graphs and graph-parallel computation, and its API is used to perform graph analysis. It simplifies graph analytics tasks by providing a collection of graph algorithms and builders, along with an optimized runtime.
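- GraphX simplifies graph analytics tasks by reusing the Spark RDD concept, and it operates on a directed multigraph, that is, a directed graph in which several parallel edges may connect the same pair of vertices.
- It provides an API for fast and robust development of applications that leverage graphs.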
- Graph processing is widely used in data analytics and computer science because graphs are a natural data structure for describing social networks. For this reason, companies like Facebook invest in developing graph-processing software.
- GraphX optimizes the representation of vertex and edge types when they are primitive data types.
- GraphX supports fundamental operators such as subgraph, joinVertices, and aggregateMessages.
- Spark GraphX works with both graphs and graph-parallel computations.
- GraphX unifies ETL (Extract, Transform, and Load) with graph analysis in one system.
- Spark GraphX is an API designed to manipulate graphs.
- It performs exploratory analysis and iterative graph computation within a single system. We can therefore view the same data as both graphs and collections, and transform and join graphs with RDDs efficiently.
- Spark GraphX can combine transformations, machine learning, and graph computation in a single system at high speed, which makes Spark one of the most powerful frameworks.
- Growing algorithm library: Spark GraphX has a number of built-in graph algorithms, including PageRank, connected components, label propagation, SVD++, and triangle counting.
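To make the directed multigraph and the three operators named above more concrete, here is a minimal conceptual sketch in plain Python. It is not the GraphX API (GraphX exposes these operators in Scala on distributed RDDs); the helper names `subgraph`, `aggregate_messages`, and `join_vertices`, their signatures, and the sample data are all hypothetical stand-ins chosen for illustration.

```python
# Vertex table: id -> property (hypothetical sample data).
vertices = {1: "alice", 2: "bob", 3: "carol"}

# Edge list of (src, dst, property). The two parallel 1 -> 2 edges make
# this a directed multigraph rather than a simple graph.
edges = [(1, 2, "follows"), (1, 2, "messages"), (2, 3, "follows"), (3, 1, "blocks")]


def subgraph(vertices, edges, edge_pred):
    """Keep only the edges that satisfy edge_pred; vertices are unchanged."""
    return vertices, [e for e in edges if edge_pred(e)]


def aggregate_messages(edges, send, merge):
    """Send one message per edge to its destination and merge them per vertex."""
    inbox = {}
    for src, dst, prop in edges:
        msg = send(src, dst, prop)
        inbox[dst] = merge(inbox[dst], msg) if dst in inbox else msg
    return inbox


def join_vertices(vertices, table, combine):
    """Update vertex properties with values from table where the id matches."""
    return {vid: combine(prop, table[vid]) if vid in table else prop
            for vid, prop in vertices.items()}


# In-degree of every vertex: each edge sends 1 and the messages are summed.
in_degree = aggregate_messages(edges, lambda s, d, p: 1, lambda a, b: a + b)

# Restrict the graph to "follows" edges only.
_, follows_only = subgraph(vertices, edges, lambda e: e[2] == "follows")

# Attach the in-degree to each vertex property.
annotated = join_vertices(vertices, in_degree, lambda name, deg: (name, deg))
```

The point of the sketch is the division of labor: filtering restricts the structure, message aggregation computes a per-vertex value from the edges, and the join writes that value back onto the vertices.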
Understanding Spark GraphX with an Example
Figure: Flight Example with GraphX
In the diagram above, flights connect three airports, namely SFO, ORD, and DFW, and each route is labeled with the distance between the two airports.
GraphX is used to analyze the flight routes. In Spark GraphX, each location is a vertex (V) and each connecting route is an edge (E).
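The vertex and edge view of the flight example can be sketched in plain Python as follows. This is an illustration of the data model only, not GraphX code. The numeric ids, the route directions, and the distances are placeholder assumptions, since the figure's actual values are not reproduced in the text, and `triplets` is a hypothetical helper name.

```python
# Vertices: numeric id -> airport code (the three airports named in the text).
airports = {1: "SFO", 2: "ORD", 3: "DFW"}

# Edges: (source id, destination id, distance in miles).
# Directions and distances are placeholders, not values read from the figure.
routes = [(1, 2, 1800), (2, 3, 800), (3, 1, 1400)]


def triplets(vertices, edges):
    """Join each edge with the properties of its two endpoint vertices."""
    return [(vertices[src], vertices[dst], dist) for src, dst, dist in edges]


for origin, destination, miles in triplets(airports, routes):
    print(f"{origin} -> {destination}: {miles} miles")

num_vertices = len(airports)
num_edges = len(routes)
```

Keeping vertices and edges in separate tables, and joining them only when a query needs both, mirrors the way the text describes locations (V) and routes (E) as distinct collections.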
Use Cases of Graph Computation
The following use cases of Spark GraphX give an idea of what graph computation can do and of the scope for implementing new solutions using graphs.
1. Disaster Detection System
Graphs can be used to detect disasters such as earthquakes, tsunamis, forest fires, and volcanic eruptions so that warnings can be issued to alert people.
2. PageRank
PageRank can be used to find the influencers in any network, such as a social media network.
3. Financial Fraud Detection
Graph analysis can be used to monitor financial transactions and to detect people involved in financial fraud and money laundering.
4. Business Analysis
Graphs are also used in business analysis to understand customers' purchase trends, for example at companies such as Uber and McDonald’s.
5. Geographic Information Systems
Graphs are used to develop functionality for geographic information systems, such as watershed delineation and weather prediction.
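As an illustration of the influencer use case above, the following is a minimal sketch of textbook PageRank by power iteration in plain Python. It is not GraphX's built-in implementation, whose API and normalization may differ. The function name `pagerank`, the damping factor of 0.85, the iteration count, and the "follows" edges are all assumptions made for the example.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by vertices with no outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        # Every vertex shares its current rank equally among its out-links.
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical "follows" edges: (follower, followed).
follows = [("ann", "dee"), ("bob", "dee"), ("cat", "dee"), ("dee", "ann")]
ranks = pagerank(follows)
top_influencer = max(ranks, key=ranks.get)
```

In this toy network the account followed by the most others ends up with the highest rank, which is the intuition behind using PageRank to find influencers.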
Use Case – Flow Diagram
Figure: Use Case – Flow diagram of Flight Data Analysis using Spark GraphX
The steps involved in flight data analysis using Spark GraphX are as follows:
1. Collecting a huge amount of flight data.
2. Storing the real-time flight data in a database.
3. Creating a graph using GraphX.
4. Querying the data, for example:
4.1 Compute the longest flight routes.
4.2 Calculate the busiest airports.
4.3 Calculate the routes with the lowest flight cost.
5. Visualizing the results using Google Data Studio.
6. Getting the specific results as the final step.
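The three queries in step 4 can be sketched on a small edge list in plain Python. This is a conceptual illustration rather than GraphX code, and it reads "busiest" as the airport with the most arrivals plus departures, which is an assumption. The flight records, distances, and costs below are hypothetical sample values.

```python
from collections import Counter

# Hypothetical flight records: (origin, destination, distance_miles, cost_usd).
flights = [
    ("SFO", "ORD", 1800, 320),
    ("ORD", "DFW", 800, 150),
    ("DFW", "SFO", 1400, 240),
    ("ORD", "SFO", 1800, 300),
    ("DFW", "ORD", 800, 160),
]

# 4.1 Longest routes: sort the edges by distance, largest first.
longest = sorted(flights, key=lambda f: f[2], reverse=True)[:2]

# 4.2 Busiest airport: count every arrival and departure per airport.
traffic = Counter()
for origin, destination, _, _ in flights:
    traffic[origin] += 1
    traffic[destination] += 1
busiest = traffic.most_common(1)[0]

# 4.3 Cheapest routes: sort the edges by cost, smallest first.
cheapest = sorted(flights, key=lambda f: f[3])[:2]
```

Each query reduces to a sort or a per-vertex count over the edges, which is why a graph representation fits this kind of route analysis well.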
From the above, we can conclude that Spark GraphX is a component for graphs and graph-parallel computation whose API simplifies graph analysis. It is a boon for data scientists who need to analyze real-time data.