Thursday, September 08, 2011

Using Distributed Cache in Hadoop (Hadoop 0.20.2)

We often need certain files, such as configuration files, JAR libraries, XML files, or properties files, to be present on Hadoop's processing nodes at execution time. Hadoop provides a feature called Distributed Cache for this purpose: it ships those read-only files to the task nodes. In a Hadoop environment jobs are essentially MapReduce jobs, and the required read-only files are copied to the TaskTracker nodes at the beginning of job execution. The default size of the distributed cache is about 10 GB, but we can change it by setting the local.cache.size property in Hadoop's configuration.

Thus, Distributed Cache is a mechanism for caching read-only data across a Hadoop cluster. The read-only files are sent when the job is created, and the framework makes the cached files available to the cluster nodes when they run their tasks.
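The local.cache.size value is expressed in bytes, so the 10 GB default corresponds to 10737418240. As a small sketch of that arithmetic, the helper below converts gigabytes to the byte value you would write into the property. The class name CacheSizeHelper and its method are hypothetical and are not part of Hadoop.

```java
// Hypothetical helper (not part of Hadoop): converts gigabytes to the
// byte count that a property such as local.cache.size expects.
final class CacheSizeHelper {
    private static final long BYTES_PER_GB = 1024L * 1024L * 1024L;

    /** Returns the number of bytes in the given number of whole gigabytes. */
    static long gigabytesToBytes(long gigabytes) {
        if (gigabytes < 0) {
            throw new IllegalArgumentException("gigabytes must not be negative");
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(gigabytes, BYTES_PER_GB);
    }

    public static void main(String[] args) {
        System.out.println(gigabytesToBytes(10)); // 10737418240
        System.out.println(gigabytesToBytes(20)); // 21474836480
    }
}
```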

The following Java program uses the distributed cache to send the necessary XML file to the task-executing nodes prior to job execution.

Java Program/Tutorial of Distributed Cache usage in Hadoop
Hadoop Version : 0.20.2
Java Version: Java-SE-1.6

The program below consists of two classes: DcacheMapper and its parent class, ParentMapper. The job is initialized in the main method of DcacheMapper, pointing to the location in HDFS of the file to be sent to all nodes. When the setup method in the parent class is executed, we can retrieve the distributed configuration file and read it for our use.

API doc for Distributed Cache can be found at the given URL.

Class To make a Map Reduce Job for Distributed Cache

package com.bishal.mapreduce;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * @author Bishal Acharya
 */
public class DcacheMapper extends ParentMapper {
    public DcacheMapper() {
        super();
    }

    @Override
    public void map(Object key, Text value, Context context) throws IOException {
        /* Write your map Implementation */
    }

    public static void main(String args[]) throws URISyntaxException,
            IOException, InterruptedException, ClassNotFoundException {

        Configuration conf = new Configuration();

        final String NAME_NODE = "hdfs://localhost:9000";

        Job job = new Job(conf);
        job.setJarByClass(DcacheMapper.class);
        job.setMapperClass(DcacheMapper.class);

        // Register the HDFS file to be copied to every task node. The Job keeps
        // its own copy of conf, so the file is added to job.getConfiguration().
        DistributedCache.addCacheFile(new URI(NAME_NODE
                + "/user/root/input/Configuration/layout.xml"),
                job.getConfiguration());

        FileInputFormat.addInputPath(job,
                new Path(NAME_NODE + "/user/root/input/" + "/"
                        + "/test.txt"));
        FileOutputFormat.setOutputPath(job,
                new Path(NAME_NODE + "/user/root/output/" + "/"
                        + "/importOutput"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Parent Class to read data from Distributed Cache

package com.bishal.mapreduce;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * BaseMapper class to perform initialization setup and cleanUp tasks using Distributed Cache for Map
 * Reduce job
 *
 * @author Bishal Acharya
 */
public class ParentMapper extends Mapper<Object, Text, Object, Text> {
    protected Configuration conf;

    public ParentMapper() {
        initialize();
    }

    private void initialize() {
        conf = new Configuration();
    }

    @Override
    protected void setup(
            org.apache.hadoop.mapreduce.Mapper<Object, Text, Object, Text>.Context context)
            throws IOException, InterruptedException {

        // Local paths of the files that were distributed to this node.
        Path[] uris = DistributedCache.getLocalCacheFiles(context
                .getConfiguration());
        if (uris == null) {
            return; // no files were cached for this job
        }

        BufferedReader fis;

        /* Prepare Objects from Layout XML */
        for (int i = 0; i < uris.length; i++) {
            if (uris[i].toString().contains("layout")) {

                String chunk = null;
                fis = new BufferedReader(new FileReader(uris[i].toString()));
                String records = "";
                try {
                    while ((chunk = fis.readLine()) != null) {
                        records += chunk;
                    }
                } finally {
                    fis.close();
                }
                // do whatever you like with xml using parser
                System.out.println("Records :" + records);
            }
        }
    }

    @Override
    protected void cleanup(
            org.apache.hadoop.mapreduce.Mapper<Object, Text, Object, Text>.Context context)
            throws IOException, InterruptedException {
        // Purge the distributed cache once the task has finished.
        DistributedCache.purgeCache(conf);
    }
}

In the parent class, we read the cache file in the setup method. The cache purge operation is performed in the cleanup phase of the MapReduce job.
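The setup method leaves the parsing step open ("do whatever you like with xml using parser"). As one possible way to fill it in, here is a minimal sketch that parses the collected text with the JDK's DOM parser. The structure of layout.xml is not shown in this post, so the field element and its name attribute are assumptions made purely for illustration, as is the class name LayoutXmlReader.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical reader: assumes the layout XML contains <field name="..."/> elements.
final class LayoutXmlReader {

    /** Returns the "name" attribute of every <field> element, in document order. */
    static List<String> fieldNames(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList fields = doc.getElementsByTagName("field");
            List<String> names = new ArrayList<String>();
            for (int i = 0; i < fields.getLength(); i++) {
                Element field = (Element) fields.item(i);
                names.add(field.getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            // Wrap parser and I/O failures so callers need no checked handling.
            throw new IllegalArgumentException("Could not parse layout XML", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the "records" string built in setup().
        String records = "<layout><field name=\"id\"/><field name=\"title\"/></layout>";
        System.out.println(fieldNames(records)); // [id, title]
    }
}
```

In the mapper, a call such as fieldNames(records) would go where the println currently sits, and the result could be kept in a field for use inside map.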

Wednesday, September 07, 2011

Evaluating Javascript expressions in Java

Because the Java virtual machine ships with a JavaScript runtime, we can evaluate JavaScript expressions directly in a Java class. This approach is helpful when we want certain JavaScript expressions to be evaluated in Java on the server rather than on the client side. The JVM has a built-in script engine called Mozilla Rhino. Rhino is an open-source implementation of JavaScript written in Java, and it is used to run JavaScript expressions from the JVM.

The script engine information for the Java virtual machine is given below. It can be obtained by running the following code.

ScriptEngineManager mgr = new ScriptEngineManager();
List<ScriptEngineFactory> factories = mgr.getEngineFactories();

for (ScriptEngineFactory factory : factories) {
    System.out.println("engName :" + factory.getEngineName()
            + ":engVersion:" + factory.getEngineVersion()
            + ":langName :" + factory.getLanguageName()
            + ":langVersion :" + factory.getLanguageVersion());
}

When the engine details are printed, we see the following results.

Script Engine: Mozilla Rhino (1.6 release 2)
Engine Alias: js
Engine Alias: rhino
Engine Alias: JavaScript
Engine Alias: javascript
Engine Alias: ECMAScript
Engine Alias: ecmascript
Language: ECMAScript (1.6)
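The listing above includes engine aliases, which come from the getNames method of each ScriptEngineFactory. A minimal sketch that produces lines in that shape might look like the following. The class name EngineLister is hypothetical, and what it prints depends entirely on which engines your JVM provides (a JVM with no script engines yields an empty list).

```java
import java.util.ArrayList;
import java.util.List;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;

// Hypothetical helper: describes every script engine the JVM can discover.
final class EngineLister {

    /** Builds one block of description lines per available engine factory. */
    static List<String> describeEngines() {
        List<String> lines = new ArrayList<String>();
        for (ScriptEngineFactory factory : new ScriptEngineManager().getEngineFactories()) {
            lines.add("Script Engine: " + factory.getEngineName()
                    + " (" + factory.getEngineVersion() + ")");
            // getNames() returns the aliases accepted by getEngineByName().
            for (String alias : factory.getNames()) {
                lines.add("Engine Alias: " + alias);
            }
            lines.add("Language: " + factory.getLanguageName()
                    + " (" + factory.getLanguageVersion() + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeEngines()) {
            System.out.println(line);
        }
    }
}
```

Any of the printed aliases can be passed to getEngineByName to obtain the engine.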

Given below is the class that evaluates a JavaScript expression passed to it. The evaluate method takes the expression to be evaluated, the setter names, and the setterValueMap, which contains key-value pairs of each setter field and its corresponding value.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

/**
 * @author Bishal Acharya
 */
public class JavascriptEvaluator {
    /**
     * @param expression
     *            the expression to evaluate
     * @param setters
     *            list of setters for the expression
     * @param setterValueMap
     *            Map having setterKey and its value
     * @return Object the evaluated expression
     */
    public static Object evaluate(String expression, String setters,
            Map<String, String> setterValueMap) {
        String[] setterArray = setters.split(",");
        List<String> setterList = new ArrayList<String>();

        for (String val : setterArray) {
            setterList.add(val.trim());
        }

        ScriptEngineManager mgr = new ScriptEngineManager();
        ScriptEngine jsEngine = mgr.getEngineByName("JavaScript");
        Object obj = null;
        // Expose each setter to the script as a variable of the same name.
        for (int i = 0; i < setterList.size(); i++) {
            jsEngine.put(setterList.get(i),
                    setterValueMap.get(setterList.get(i)));
        }
        try {
            // Wrap the expression in a function so it can use "return".
            obj = jsEngine
                    .eval("func1(); function func1(){" + expression + "}");
        } catch (ScriptException e) {
            e.printStackTrace();
        }
        return obj;
    }

    public static void main(String args[]) {
        String expr = "return GROUP_NAME.substr(0,5).concat(' How are You');";
        String setters = "GROUP_NAME";
        Map<String, String> setterValueMap = new HashMap<String, String>();
        setterValueMap.put("GROUP_NAME", "Hello World");
        System.out.println(JavascriptEvaluator.evaluate(expr, setters,
                setterValueMap));
    }
}

Output of Running above class :

Hello How are You
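As a variation on the class above, the values can also be handed to the engine through a Bindings object, and the expression can be evaluated directly without the wrapper function, since eval returns the value of the last expression. The sketch below is only one way to do this. The class name BindingsEvaluator is hypothetical, and it returns null when no JavaScript engine is available, which can happen on newer JDKs that no longer bundle one.

```java
import java.util.HashMap;
import java.util.Map;
import javax.script.Bindings;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Hypothetical variant of the evaluator that passes values via Bindings.
final class BindingsEvaluator {

    /**
     * Evaluates a JavaScript expression with the given variables in scope.
     * Returns null if this JVM has no engine registered as "JavaScript".
     */
    static Object evaluate(String expression, Map<String, String> values) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
        if (engine == null) {
            return null; // no JavaScript engine on this JVM
        }
        Bindings bindings = engine.createBindings();
        bindings.putAll(values); // each key becomes a script variable
        try {
            return engine.eval(expression, bindings);
        } catch (ScriptException e) {
            throw new IllegalArgumentException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<String, String>();
        values.put("GROUP_NAME", "Hello World");
        // Same expression as above, without "return" and the function wrapper.
        System.out.println(evaluate("GROUP_NAME.substr(0,5).concat(' How are You')", values));
    }
}
```

Keeping the variables in a separate Bindings object means they do not leak into the engine's default scope between calls.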