Writable and WritableComparable Interfaces

Let’s take a quick look at the Writable and WritableComparable interfaces.

Writable Interface

Overview

Writable is Hadoop's interface for serialising and de-serialising data. It declares two methods, write() and readFields().

* write() writes the object's fields to an output stream (a DataOutput), for example when data is sent over the network or stored on disk.

* readFields() reads the object's fields back from an input stream (a DataInput).

Introduction

Hadoop MapReduce uses classes that implement the Writable interface as its data types. These types are used throughout the MapReduce data flow: reading the input data, transferring intermediate data between the Map and Reduce phases, and writing the output data.

Hadoop ships with many Writable implementations, so we have to choose appropriate data types for the input, intermediate and output data. Choosing the right data type can improve both the performance and the programmability of your MapReduce programs.

Writable Interface Functions

* A data type must implement the org.apache.hadoop.io.Writable interface in order to be used as a value data type in a MapReduce computation.

* This single interface defines how a value is serialised and de-serialised in Hadoop for transmitting and storing data.

Coding

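package org.apache.hadoop.io;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public interface Writable
{
    // Serialise the fields of this object to out.
    void write(DataOutput out) throws IOException;

    // De-serialise the fields of this object from in.
    void readFields(DataInput in) throws IOException;
}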
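To make the two methods concrete, here is a minimal sketch of a custom value type. The class name PointWritable, its two fields, and the toBytes/fromBytes helpers are hypothetical. A local copy of the Writable interface is declared only so that the sketch compiles without Hadoop on the classpath; in a real job you would import org.apache.hadoop.io.Writable instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in with the same two methods as org.apache.hadoop.io.Writable
// (an assumption made so this sketch needs no Hadoop dependency).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical value type holding two ints.
class PointWritable implements Writable {
    int x;
    int y;

    // No-argument constructor, so an empty instance can be created
    // and then filled in by readFields().
    PointWritable() { }

    PointWritable(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(x);   // fields are written in a fixed order...
        out.writeInt(y);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        x = in.readInt();  // ...and read back in exactly the same order
        y = in.readInt();
    }

    // Helper: serialise any Writable to a byte array.
    static byte[] toBytes(Writable w) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            w.write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper: rebuild a PointWritable from its serialised bytes.
    static PointWritable fromBytes(byte[] bytes) {
        try {
            PointWritable p = new PointWritable();
            p.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class PointWritableDemo {
    public static void main(String[] args) {
        byte[] bytes = PointWritable.toBytes(new PointWritable(3, 4));
        PointWritable copy = PointWritable.fromBytes(bytes);
        // prints "8 bytes -> (3, 4)"
        System.out.println(bytes.length + " bytes -> (" + copy.x + ", " + copy.y + ")");
    }
}
```

The important design point is that write() and readFields() must agree on the order and type of every field, because the serialised form carries no field names.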

WritableComparable Interface

* A data type must implement the org.apache.hadoop.io.WritableComparable<T> interface in order to be used as a key data type in a MapReduce computation.

* It extends the Writable interface with a comparison method, which Hadoop uses to sort keys.

Example

public interface WritableComparable<T> extends Writable, Comparable<T>
{
}

WritableComparable adds no methods of its own; the comparison method comes from the standard java.lang.Comparable interface, shown below.

public interface Comparable<T>
{
    public int compareTo(T o);
}

The compareTo() method compares the current object with the object passed in. It returns a negative integer if the current object is less than the argument, zero if the two are equal, and a positive integer if the current object is greater.

The Writable and WritableComparable interfaces are provided in the org.apache.hadoop.io package; Comparable comes from java.lang.
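The sign convention of compareTo() is easier to see with a small key type. The sketch below is illustrative only: YearKey and sortedYears are hypothetical names, and the two interfaces are re-declared locally (mirroring the Hadoop ones) so the code compiles without Hadoop on the classpath.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Local stand-ins mirroring the Hadoop interfaces (assumption: declared here
// only to keep the sketch free of a Hadoop dependency).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

interface WritableComparable<T> extends Writable, Comparable<T> { }

// Hypothetical key type wrapping a single year.
class YearKey implements WritableComparable<YearKey> {
    int year;

    YearKey() { }

    YearKey(int year) {
        this.year = year;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(year);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        year = in.readInt();
    }

    // Negative if this key sorts before other, zero if equal, positive if after.
    @Override
    public int compareTo(YearKey other) {
        return Integer.compare(year, other.year);
    }

    // Keys should also define equals() and hashCode() consistently with compareTo().
    @Override
    public boolean equals(Object o) {
        return o instanceof YearKey && ((YearKey) o).year == year;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(year);
    }

    // Helper: sort a handful of keys and return the years in sorted order.
    static List<Integer> sortedYears(int... years) {
        List<YearKey> keys = new ArrayList<>();
        for (int y : years) {
            keys.add(new YearKey(y));
        }
        Collections.sort(keys);   // uses compareTo()
        List<Integer> result = new ArrayList<>();
        for (YearKey k : keys) {
            result.add(k.year);
        }
        return result;
    }
}

class YearKeyDemo {
    public static void main(String[] args) {
        // prints "[1999, 2005, 2010]"
        System.out.println(YearKey.sortedYears(2005, 1999, 2010));
    }
}
```

Sorting with Collections.sort here stands in for the sort that MapReduce performs on keys between the Map and Reduce phases; both rely on the same compareTo() contract.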

Data Type constraints for Key-Value pair in MapReduce

There are two basic constraints which should be satisfied by the data types used for the Key-Value fields in Hadoop MapReduce.

  • Any data type used for a Value field in a Mapper or Reducer must implement the Writable interface.
  • Any data type used for a Key field in a Mapper or Reducer must implement the WritableComparable interface (which itself extends Writable), so that keys of that type can be compared with each other for sorting.

Writable Classes – Hadoop Data Types

1. Primitive Writable Classes

* Hadoop provides classes that implement the Writable and WritableComparable interfaces by wrapping the Java primitive types.

* These classes live in the org.apache.hadoop.io package. Each wrapper class has get() and set() methods to fetch and store the wrapped value.

* Hadoop provides the following primitive Writable data types:

  1. IntWritable
  2. VIntWritable
  3. FloatWritable
  4. LongWritable
  5. VLongWritable
  6. DoubleWritable
  7. BooleanWritable
  8. ByteWritable
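The get()/set() pattern of these wrappers can be sketched with a small stand-in class. IntBox and serialisedSize are hypothetical names used for illustration; the sketch assumes the wrapper serialises its value with DataOutput.writeInt, which produces a fixed 4 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical wrapper in the spirit of IntWritable: one int, get()/set(),
// and a fixed-width serialised form.
class IntBox {
    private int value;

    IntBox() { }

    IntBox(int value) {
        set(value);
    }

    int get() {
        return value;
    }

    void set(int value) {
        this.value = value;
    }

    void write(DataOutput out) throws IOException {
        out.writeInt(value);    // always 4 bytes, big-endian
    }

    void readFields(DataInput in) throws IOException {
        value = in.readInt();
    }

    // Helper: how many bytes write() emits for this object.
    int serialisedSize() {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            write(new DataOutputStream(buf));
            return buf.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class IntBoxDemo {
    public static void main(String[] args) {
        IntBox box = new IntBox();
        for (int v : new int[] {1, 1_000, 1_000_000}) {
            box.set(v);   // one object is reused for every value
            System.out.println(box.get() + " -> " + box.serialisedSize() + " bytes");
        }
    }
}
```

Because the wrapper is mutable, a single instance can be reused for many records through set(), which avoids allocating a new object per record.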

Note

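  • After serialisation, the fixed-length Hadoop primitive types have the same size as the corresponding Java types: IntWritable takes 4 bytes and LongWritable takes 8 bytes. VIntWritable and VLongWritable use a variable-length encoding instead, so small values take fewer bytes.

2. Array Writable Classes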

Hadoop provides two array Writable classes, one for single-dimensional arrays and one for two-dimensional arrays:

  1. ArrayWritable
  2. TwoDArrayWritable

3. Map Writable Classes

The three classes listed below are the map Writable types. MapWritable implements the java.util.Map interface and SortedMapWritable implements java.util.SortedMap.

  1. AbstractMapWritable – the abstract base class for the other map Writable classes.
  2. MapWritable – a general-purpose map from Writable keys to Writable values.
  3. SortedMapWritable – a sorted counterpart of MapWritable that implements the SortedMap interface.

4. Other Writable Classes

4.1 NullWritable

* NullWritable represents a null value in MapReduce. When we do not want to read or write a key or a value, that field can be declared as NullWritable.

* If the data type is declared as NullWritable, no bytes are read or written.
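The idea of a placeholder that serialises to nothing can be sketched as follows. NullValue is a hypothetical stand-in written for illustration, not Hadoop's class; it assumes a single shared instance with empty write() and readFields() methods.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the idea behind NullWritable:
// one shared instance whose serialised form is empty.
final class NullValue {
    private static final NullValue INSTANCE = new NullValue();

    private NullValue() { }      // no other instances can be created

    static NullValue get() {
        return INSTANCE;
    }

    void write(DataOutput out) throws IOException {
        // intentionally writes nothing
    }

    void readFields(DataInput in) throws IOException {
        // intentionally reads nothing
    }

    // Helper: number of bytes produced by write().
    static int serialisedSize() {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            get().write(new DataOutputStream(buf));
            return buf.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class NullValueDemo {
    public static void main(String[] args) {
        // prints "0 bytes"
        System.out.println(NullValue.serialisedSize() + " bytes");
    }
}
```

In a real job the shared instance is obtained with NullWritable.get(), typically where a Mapper or Reducer needs to emit only a key or only a value.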

4.2 Text

* The Text class is Hadoop's counterpart to java.lang.String and stores its contents as UTF-8. Unlike Java's String, a Text object is mutable.

* Its maximum size is 2 GB.
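The difference between an immutable String and a mutable UTF-8 holder can be sketched like this. Utf8Holder is a hypothetical, heavily simplified class written for illustration; it only assumes that the content is kept as UTF-8 bytes and that the length is reported in bytes rather than characters.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified illustration of a mutable UTF-8 text holder.
class Utf8Holder {
    private byte[] bytes = new byte[0];

    // Replaces the content of this object; no new holder is created.
    void set(String s) {
        bytes = s.getBytes(StandardCharsets.UTF_8);
    }

    // Length of the encoded content in bytes, not in characters.
    int getLength() {
        return bytes.length;
    }

    @Override
    public String toString() {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}

class Utf8HolderDemo {
    public static void main(String[] args) {
        Utf8Holder t = new Utf8Holder();
        t.set("hadoop");
        t.set("h\u00e9llo");   // same object, new content (e with acute accent)
        // 5 characters, but 6 bytes once encoded as UTF-8
        System.out.println(t.toString().length() + " chars, " + t.getLength() + " bytes");
    }
}
```

The byte length and the character count differ as soon as the text contains non-ASCII characters, which matters when indexing into the encoded data.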

4.3 BytesWritable

* It is a wrapper for an array of binary data.

4.4 ObjectWritable

* This is a generic object wrapper.

* It can store Java primitives, Strings, Writables, null, or arrays of these types.

4.5 GenericWritable

It is similar to the ObjectWritable class but supports only a fixed set of data types, declared in advance by a subclass. In exchange, its serialised form is more compact.
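Why a closed set of types allows a compact encoding can be sketched as follows. TaggedValue is a hypothetical class, not Hadoop's implementation: it assumes that each value is written as a one-byte type tag followed by the payload, and it supports only two payload types, int and long.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical wrapper over a closed set of payload types (int or long).
class TaggedValue {
    private static final byte INT = 0;
    private static final byte LONG = 1;

    private byte tag;
    private long payload;

    static TaggedValue ofInt(int v) {
        TaggedValue t = new TaggedValue();
        t.tag = INT;
        t.payload = v;
        return t;
    }

    static TaggedValue ofLong(long v) {
        TaggedValue t = new TaggedValue();
        t.tag = LONG;
        t.payload = v;
        return t;
    }

    boolean isInt() {
        return tag == INT;
    }

    long value() {
        return payload;
    }

    void write(DataOutput out) throws IOException {
        out.writeByte(tag);                 // one byte identifies the type
        if (tag == INT) {
            out.writeInt((int) payload);
        } else {
            out.writeLong(payload);
        }
    }

    void readFields(DataInput in) throws IOException {
        tag = in.readByte();                // the tag says how to read the rest
        if (tag == INT) {
            payload = in.readInt();
        } else if (tag == LONG) {
            payload = in.readLong();
        } else {
            throw new IOException("unknown type tag " + tag);
        }
    }

    byte[] toBytes() {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static TaggedValue fromBytes(byte[] bytes) {
        try {
            TaggedValue t = new TaggedValue();
            t.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return t;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class TaggedValueDemo {
    public static void main(String[] args) {
        // prints "5 and 9 bytes"
        System.out.println(TaggedValue.ofInt(42).toBytes().length + " and "
                + TaggedValue.ofLong(42L).toBytes().length + " bytes");
    }
}
```

A fully generic wrapper has to record which class it holds in a self-describing way, whereas a wrapper over a known list of types only needs a small index into that list.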

“That’s all about the Writable and WritableComparable Interfaces”