How to get around it? (1) Better configuration, and (2) looking for unnecessarily allocated objects.
Configuration
mapred.map.child.java.opts: heap size for map tasks
mapred.reduce.child.java.opts: heap size for reduce tasks
mapred.tasktracker.map.tasks.maximum: max map tasks that can run simultaneously per node
mapred.tasktracker.reduce.tasks.maximum: max reduce tasks that can run simultaneously per node

Make sure ((num_of_maps * map_heap_size) + (num_of_reducers * reduce_heap_size)) is not larger than the memory available in the system. The maximum number of mappers and reducers can also be tuned by looking at the available system resources.
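As a quick sanity check, here is a minimal, stand-alone sketch of that budget; every number in it (slot counts, heap sizes, node memory) is a hypothetical placeholder, not a recommendation.

// Worst-case heap budget for one node:
// (max map tasks * map heap) + (max reduce tasks * reduce heap) must fit in the node's RAM.
public class NodeHeapBudget {
    public static void main(String[] args) {
        int maxMapTasks = 8;      // mapred.tasktracker.map.tasks.maximum (assumed value)
        int maxReduceTasks = 4;   // mapred.tasktracker.reduce.tasks.maximum (assumed value)
        int mapHeapMb = 512;      // from mapred.map.child.java.opts, e.g. -Xmx512m (assumed)
        int reduceHeapMb = 1024;  // from mapred.reduce.child.java.opts, e.g. -Xmx1024m (assumed)
        int nodeRamMb = 8192;     // memory available to task JVMs on this node (assumed)

        int worstCase = maxMapTasks * mapHeapMb + maxReduceTasks * reduceHeapMb;
        System.out.println("Worst-case task heap: " + worstCase + " MB of " + nodeRamMb + " MB");
        if (worstCase > nodeRamMb) {
            System.out.println("Over budget: lower the task slots or the per-task heap sizes.");
        }
    }
}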
io.sort.factor: max number of streams to merge at once while sorting; used on both the map and reduce sides
io.sort.mb: map-side memory buffer size used while sorting
mapred.job.shuffle.input.buffer.percent: reduce-side buffer; the percentage of the maximum heap size to allocate for storing map outputs during the shuffle
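For per-job tuning, a hedged sketch of how these properties might be set through Hadoop's Configuration API follows; the values are illustrative only, and the tasktracker slot maxima above are cluster-side settings in mapred-site.xml rather than per-job options.

import org.apache.hadoop.conf.Configuration;

public class MemoryTuning {
    // Builds a Configuration with illustrative (not recommended) values
    // for the properties discussed above.
    public static Configuration tunedConf() {
        Configuration conf = new Configuration();
        conf.set("mapred.map.child.java.opts", "-Xmx512m");      // map task heap
        conf.set("mapred.reduce.child.java.opts", "-Xmx1024m");  // reduce task heap
        conf.setInt("io.sort.factor", 25);                       // streams merged at once
        conf.setInt("io.sort.mb", 128);                          // map-side sort buffer, in MB
        conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.70f); // reduce-side shuffle buffer
        return conf;
    }
}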
NOTE: Using fs.inmemory.size.mb is a very bad idea!

Unnecessary memory allocation
Simply look for the new keyword and make sure there is no unnecessary allocation. A very common tip is to use the set() method of Writable objects rather than allocating a new object on every map or reduce call.
Here is a simple count example to show the trick:
// Needs org.apache.hadoop.io.IntWritable, org.apache.hadoop.io.Text,
// org.apache.hadoop.mapreduce.Reducer and java.io.IOException.
public static class UrlReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    // Reused across reduce() calls instead of allocating a new IntWritable per key
    IntWritable sumw = new IntWritable();
    int sum;

    @Override
    public void reduce(Text key, Iterable<IntWritable> vals, Context context)
            throws IOException, InterruptedException {
        sum = 0;
        // add up all counts collected for this URL
        for (IntWritable val : vals) {
            sum += val.get();
        }
        sumw.set(sum);            // reuse the existing Writable instead of new IntWritable(sum)
        context.write(key, sumw);
    }
}
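For contrast, here is a sketch of the allocation-heavy version the tip warns against (a hypothetical rewrite of the same reduce() method, not code from the original example): a fresh IntWritable is created for every key, adding needless garbage-collection pressure on large inputs.

// Anti-pattern: allocates a new IntWritable on every reduce() call,
// which the set()-based reducer above avoids.
public void reduce(Text key, Iterable<IntWritable> vals, Context context)
        throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : vals) {
        sum += val.get();
    }
    context.write(key, new IntWritable(sum)); // new object per key
}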
Note: there are a couple more tips here for resolving common errors in Hadoop.