How to work around it? (1) Better configuration, and (2) hunting for unnecessarily allocated objects.
Configuration
mapred.map.child.java.opts: heap size for map tasks
mapred.reduce.child.java.opts: heap size for reduce tasks
mapred.tasktracker.map.tasks.maximum: max map tasks that can run simultaneously per node
mapred.tasktracker.reduce.tasks.maximum: max reduce tasks that can run simultaneously per node

Make sure ((num_of_maps * map_heap_size) + (num_of_reducers * reduce_heap_size)) is not larger than the memory available in the system. The maximum number of mappers and reducers can also be tuned by looking at available system resources.
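A minimal driver sketch of how these settings can be applied, assuming the classic MRv1 API; the UrlCountDriver class name and every numeric value are illustrative only, not recommendations. The tasktracker slot maximums appear as comments because they are cluster-wide settings (mapred-site.xml), not per-job ones:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class UrlCountDriver {
    public static void main(String[] args) throws Exception {
        // Illustrative values only; tune for your own cluster.
        Configuration conf = new Configuration();
        conf.set("mapred.map.child.java.opts", "-Xmx512m");     // heap per map task
        conf.set("mapred.reduce.child.java.opts", "-Xmx1024m"); // heap per reduce task
        // Cluster-wide slot limits, normally set in mapred-site.xml on each tasktracker:
        //   mapred.tasktracker.map.tasks.maximum = 4
        //   mapred.tasktracker.reduce.tasks.maximum = 2
        // Sanity check from above: (4 * 512 MB) + (2 * 1024 MB) = 4 GB must fit in each node's RAM.
        Job job = new Job(conf, "url-count");
        // ... set mapper/reducer classes and input/output paths here ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}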
io.sort.factor: max number of streams to merge at once while sorting; used on both the map and reduce side
io.sort.mb: map-side memory buffer size used while sorting
mapred.job.shuffle.input.buffer.percent: reduce-side buffer; the percentage of the maximum heap size to allocate for storing map outputs during the shuffle
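The sort and shuffle buffers can be tuned the same way; a short continuation of the driver sketch above, again with made-up values (io.sort.mb must stay well below the map task heap configured earlier):

// Continuing the driver sketch above; values are illustrative, not recommendations.
conf.setInt("io.sort.factor", 25);    // merge up to 25 spill streams at once
conf.setInt("io.sort.mb", 128);       // 128 MB map-side sort buffer; must fit inside the -Xmx512m map heap
conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.70f); // 70% of reduce heap for shuffled map outputs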
NOTE: Using fs.inmemory.size.mb is a very bad idea!

Unnecessary memory allocation
Simply look for the new keyword and make sure there is no unnecessary allocation. A very common tip is to use the set() method of Writable objects rather than allocating a new object in every map or reduce call.
Here is a simple count example to show the trick:
public static class UrlReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    // Allocated once per task and reused via set() in every reduce() call
    IntWritable sumw = new IntWritable();
    int sum;

    public void reduce(Text key, Iterable<IntWritable> vals, Context context)
            throws IOException, InterruptedException {
        sum = 0;
        for (IntWritable val : vals) {
            sum += val.get();
        }
        sumw.set(sum);
        context.write(key, sumw);
    }
}
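The map side benefits from the same trick. A minimal companion sketch, assuming a UrlMapper that emits one count per tab-separated URL field (the class name and input format are illustrative, not from the original example):

public static class UrlMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Allocated once per task, reused for every record via set()
    private final Text url = new Text();
    private final IntWritable one = new IntWritable(1);

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Assumption: the URL is the first tab-separated field of the input line
        String field = value.toString().split("\t")[0];
        url.set(field);
        context.write(url, one);
    }
}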
Note: There are a couple more tips here for resolving common errors in Hadoop.