Showing posts with label hadoop exception. Show all posts

April 21, 2011

Hadoop - Incompatible namespaceIDs Error

After formatting the namenode, restarting Hadoop fails - more specifically, the datanodes do not start, failing with an Incompatible namespaceIDs error.
bin/hadoop namenode -format
..
bin/start-dfs.sh
..
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
  java.io.IOException: Incompatible namespaceIDs in /hadoop21/hdfs/datadir: 
  namenode namespaceID = 515704843; datanode namespaceID = 572408927
  ..
Why? - the datanodes still carry the old namespaceID after the namenode is formatted.
Solution? - editing the <hdfs-data-path>/datadir/current/VERSION file and replacing the namespaceID with the namenode's new one (515704843 in this example) solves the problem. Make sure to change it on every datanode in the cluster.
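A minimal sketch of the fix, using the IDs from the error message above. The real data directory is whatever dfs.data.dir points to on each datanode; here the commands run against a mock VERSION file so they are safe to try as-is:

```shell
# Stand-in for /hadoop21/hdfs/datadir on a datanode (mock, for illustration)
DATADIR=$(mktemp -d)
mkdir -p "$DATADIR/current"
printf 'namespaceID=572408927\nstorageID=DS-1\n' > "$DATADIR/current/VERSION"

# Rewrite the datanode's namespaceID to the namenode's new one,
# then restart the datanode (with the real path, after stopping it first)
NEW_ID=515704843
sed -i "s/^namespaceID=.*/namespaceID=$NEW_ID/" "$DATADIR/current/VERSION"
grep '^namespaceID=' "$DATADIR/current/VERSION"
```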

WARNING: you will most likely lose the data in HDFS. Even though it is not deleted, it is not accessible under the new namespaceID.

To avoid such a tedious situation, be careful before formatting. Take a look at this

April 11, 2011

Common Hadoop HDFS exceptions with large files

Big data in HDFS means plenty of disk problems. First of all, make sure there is at least ~20-30% free space on each node. There are two other problems I faced recently:

all datanodes are bad
This error can be caused by too many open files; the per-process limit is 1024 by default. To increase it, use
ulimit -n newsize
For more information click!
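A quick sketch of checking and raising the limit; the limits.conf entries shown in the comments are the standard Linux mechanism for making the change permanent, not anything Hadoop-specific:

```shell
# Show the current soft limit on open file descriptors for this shell
ulimit -n

# Raising it for the session (uncomment to apply); going above the hard
# limit requires root or entries in /etc/security/limits.conf such as:
#   hadoop  soft  nofile  16384
#   hadoop  hard  nofile  16384
# ulimit -n 16384
```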

error in shuffle in fetcher#k
This is another problem - here is the full error log:
2011-04-11 05:59:45,744 WARN org.apache.hadoop.mapred.Child: 
Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: 
error in shuffle in fetcher#2
 at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:124)
 at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:362)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:416)
 at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
 at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: java.lang.OutOfMemoryError: Java heap space
 at org.apache.hadoop.io.BoundedByteArrayOutputStream.(BoundedByteArrayOutputStream.java:58)
 at org.apache.hadoop.io.BoundedByteArrayOutputStream.(BoundedByteArrayOutputStream.java:45)
 at org.apache.hadoop.mapreduce.task.reduce.MapOutput.(MapOutput.java:104)
 at org.apache.hadoop.mapreduce.task.reduce.MergeManager.unconditionalReserve(MergeManager.java:267)
 at org.apache.hadoop.mapreduce.task.reduce.MergeManager.reserve(MergeManager.java:257)
 at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:305)
 at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:251)
 at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:149)

One way to work around this problem is to make sure there are not too many map tasks over small input files. If possible, you can concatenate input files manually to create bigger chunks, or push Hadoop to combine multiple tiny input files for a single mapper. For more details, take a look here.
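For instance, a local sketch of the "concatenate small files" approach; the file names are made up for illustration, and in practice the merged chunks would then be uploaded to HDFS:

```shell
# Many tiny inputs -> one map task each; merging them first cuts the task count
WORK=$(mktemp -d)
for i in 0 1 2 3; do echo "record $i" > "$WORK/part-0000$i"; done

cat "$WORK"/part-* > "$WORK/chunk-00000"   # one bigger input file
wc -l < "$WORK/chunk-00000"                # 4 records now feed a single mapper
```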

Also, on the Hadoop discussion groups it is mentioned that the default value of the dfs.datanode.max.xcievers parameter, the upper bound on the number of files an HDFS DataNode can serve, is too low and causes this ShuffleError. In hdfs-site.xml, I set this value to 2048 and it worked in my case.
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>2048</value>
</property>

Update: The default value for dfs.datanode.max.xcievers was updated in this JIRA.

April 03, 2011

java.io.EOFException with Hadoop

My code runs smoothly on a smaller dataset, but whenever I run it on a larger one it fails with java.io.EOFException. I've been trying to figure out the problem.

11/03/31 01:13:55 INFO mapreduce.Job: 
  Task Id: attempt_201103301621_0025_m_000634_0, Status : FAILED
java.io.EOFException
 at java.io.DataInputStream.readFully(DataInputStream.java:197)
 at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:68)
 at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:106)
 at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1999)
 at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2131)
 ...
 ...
 ...
 at org.apache.hadoop.mapred.MapTask$
  NewTrackingRecordReader.nextKeyValue(MapTask.java:465)
 at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
 at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:90)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
 at org.apache.hadoop.mapreduce.lib.input.DelegatingMapper.run
  (Delegatin

So, an EOFException means something is wrong with your input files. If files are not written and closed correctly, this exception is thrown - the file system thinks there is more to read, but the number of bytes left is less than expected.
To solve the problem, dig into the input files and make sure they were created carefully, without any corruption. Also, if MultipleOutputs is used to prepare the input files, make sure it is closed at the end as well (its close() method, e.g. in the reducer's cleanup)!

March 24, 2011

Hadoop - out of disk space

Facing weird errors like
Exception running child : org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/spill0.out

simply means some of the nodes have run out of disk space. To check HDFS status and the available storage on each node: http://master-url:50070/dfsnodelist.jsp?whatNodes=LIVE
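It can also be checked from the shell on each node; /tmp below is just an example path - point df at your actual dfs.data.dir and mapred.local.dir partitions:

```shell
# Local free space on the partitions Hadoop writes to; map-side spills go
# to mapred.local.dir, HDFS blocks to dfs.data.dir
df -h /tmp
# On a live cluster, `hadoop dfsadmin -report` also prints per-datanode
# capacity and remaining space (requires a running HDFS)
```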

March 07, 2011

Hadoop - Java Heap Space Error

"Error: Java heap space" means the job is trying to allocate more memory than is available to the JVM.
How to work around it? (1) better configuration (2) look for unnecessarily allocated objects
Configuration

mapred.map.child.java.opts : heap size for map tasks
mapred.reduce.child.java.opts: heap size for reduce tasks

mapred.tasktracker.map.tasks.maximum: max map tasks can run simultaneously per node
mapred.tasktracker.reduce.tasks.maximum: max reduce tasks can run simultaneously per node

Make sure ((num_of_maps * map_heap_size) + (num_of_reducers * reduce_heap_size)) is not larger than the memory available on the node. The max number of mappers & reducers can also be tuned by looking at the available system resources.
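The check above is easy to script; the slot counts and heap sizes below are made-up example values, not recommendations:

```shell
MAP_SLOTS=4  ; RED_SLOTS=2      # mapred.tasktracker.{map,reduce}.tasks.maximum
MAP_HEAP=512 ; RED_HEAP=1024    # -Xmx values in MB from the *.child.java.opts
TOTAL=$(( MAP_SLOTS * MAP_HEAP + RED_SLOTS * RED_HEAP ))
echo "worst-case task heap: ${TOTAL} MB"   # keep below the node's physical RAM
```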

io.sort.factor: max # of streams to merge at once for sorting. Used both in map and reduce.

io.sort.mb: map side memory buffer size used while sorting
mapred.job.shuffle.input.buffer.percent: Reduce side buffer related - The percentage of memory to be allocated from the maximum heap size for storing map outputs during the shuffle

NOTE: Using fs.inmemory.size.mb is a very bad idea!
Unnecessary memory allocation

Simply look for the new keyword and make sure there is no unnecessary allocation. A very common tip is to use the set() method of Writable objects rather than allocating a new object in every map or reduce call.
Here is a simple count example that shows the trick:

public static class UrlReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  private final IntWritable sumw = new IntWritable();

  public void reduce(Text key, Iterable<IntWritable> vals, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : vals) {
      sum += val.get();
    }
    sumw.set(sum);  // reuse the same Writable instead of new IntWritable(sum)
    context.write(key, sumw);
  }
}

note: There are couple more tips here for resolving common errors in Hadoop.

December 03, 2010

make sure the start-dfs.sh script is executed on the master!

Everything looks fine when start-all.sh is executed, but no NameNode process appears in the jps output. Why? Also, when I check the logs, I see the following networking exceptions:

ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.net.BindException: Problem binding to hostname/ipaddress:host : Cannot assign requested address
        at org.apache.hadoop.ipc.Server.bind(Server.java:190)
        at org.apache.hadoop.ipc.Server$Listener.(Server.java:253)
        at org.apache.hadoop.ipc.Server.(Server.java:1026)
        at org.apache.hadoop.ipc.RPC$Server.(RPC.java:488)

When I try to do an ls on DFS, I get the following:
ipc.Client: Retrying connect to server:  hostname/ipaddress:host
...
...
Bad connection to FS. command aborted

I spent a lot of time trying to figure out the "networking" problem - checked whether the port was already in use, IPv4/IPv6 conflicts, etc.

In the end, I realized that I had been running the start-all.sh script on a random node. When it is executed on the master node, everything works just fine! Simple fix.

November 24, 2010

hadoop - wrong key class exception

Usually happens because of a mismatch between the Map or Reduce class signature and the job configuration settings.

But also be careful with the combiner! Check whether you are using the same class as both reducer and combiner. If the reducer's input key-value pair is not the same as its output key-value pair, then it cannot be used as a combiner - because the combiner's output becomes the input on the reducer side!

here is an example:
reducer input key val : < IntWritable, IntWritable >
reducer output key val: < Text, Text >

if this reducer is used as a combiner, then the combiner will output <Text, Text> and the reducer will receive <Text, Text> as input - and boom - wrong key class exception!

November 23, 2010

java.lang.InstantiationException hadoop

java.lang.InstantiationException definition:
Thrown when an application tries to create an instance of a class using the newInstance method in class Class, but the specified class object cannot be instantiated because it is an interface or is an abstract class.

I got this exception after setting the input format to FileInputFormat, which is an abstract class:
job.setInputFormatClass(FileInputFormat.class)

The default is TextInputFormat, and it can be used instead:
job.setInputFormatClass(TextInputFormat.class)

exception:
Exception in thread "main" java.lang.RuntimeException: java.lang.InstantiationException
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:123)
        at org.apache.hadoop.mapreduce.lib.input.MultipleInputs.getInputFormatMap(MultipleInputs.java:109)
        at org.apache.hadoop.mapreduce.lib.input.DelegatingInputFormat.getSplits(DelegatingInputFormat.java:58)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:401)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:418)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:338)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:960)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:976)
        at nyu.cs.webgraph.LinkGraphUrlIdReplacement.phase1(LinkGraphUrlIdReplacement.java:326)
        at nyu.cs.webgraph.LinkGraphUrlIdReplacement.main(LinkGraphUrlIdReplacement.java:351)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:192)
Caused by: java.lang.InstantiationException
        at sun.reflect.InstantiationExceptionConstructorAccessorImpl.newInstance(InstantiationExceptionConstructorAccessorImpl.java:48)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:121)
        ... 14 more