June 20, 2011

How to pass job-specific parameters in Hadoop

Say your mapper or reducer needs a parameter, and you want the user to supply it when the job is submitted. You can pass it through the job's "Configuration": the driver stores the value before the job is created, and each task reads it back from its context. Here is how:

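import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Cluster;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class GenericReplace {

   // Configuration key under which the user's choice is stored.
   public static final String IS_KEY_FIRST = "IsKeyFirstInMapFile";

   // Paths taken from the command line in main().
   static String graphPath;
   static String outputPath;

   // Output key/value types (Text, Text) are assumed here; adjust to your job.
   public static class GenerateLinks extends Mapper<Text, Text, Text, Text> {

      @Override
      public void map(Text key, Text value, Context context)
            throws IOException, InterruptedException {
         // Read the flag back from the job configuration; default to true if unset.
         if (context.getConfiguration().getBoolean(IS_KEY_FIRST, true)) {
            //do this ..
         }
         else {
            //do that ..
         }
      }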
   }

   public static void main(String[] args) throws Exception {

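      Configuration conf = new Configuration();
      GenericReplace.graphPath = args[0];
      GenericReplace.outputPath = args[1];
      // Boolean.parseBoolean reads the argument text; Boolean.getBoolean would
      // look up a JVM system property with that name instead.
      conf.setBoolean(IS_KEY_FIRST, Boolean.parseBoolean(args[3]));
      // Set parameters before creating the Job, which takes its own copy of conf.
      Job job = Job.getInstance(new Cluster(conf), conf);
      ...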
   }
}
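
One detail worth checking when you turn a command-line argument into a boolean for the Configuration: Boolean.getBoolean(String) does not parse its argument. It looks up a JVM system property with that name, so it returns false for almost any user input. Boolean.parseBoolean(String) parses the text itself. Here is a small standalone sketch of the difference. It uses only the JDK, and the class name BooleanArgDemo and the helper parseFlag are made up for illustration:

```java
class BooleanArgDemo {

    // Hypothetical helper: turns a command-line string into the flag value.
    static boolean parseFlag(String arg) {
        return Boolean.parseBoolean(arg);
    }

    public static void main(String[] args) {
        String arg = "true"; // stands in for the user's command-line argument

        // Looks up a system property literally named "true"; none is set here.
        boolean fromProperty = Boolean.getBoolean(arg);

        // Parses the text itself, ignoring case.
        boolean fromText = parseFlag(arg);

        System.out.println("getBoolean:   " + fromProperty); // false
        System.out.println("parseBoolean: " + fromText);     // true
    }
}
```

Note that parseBoolean treats anything other than "true" (in any letter case) as false, so an input such as "yes" or "1" silently becomes false. If that matters for your job, validate the argument before storing it.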