The first one, isSplitable, determines whether a file is splittable or not. The next three variables, mapred.min.split.size, mapred.max.split.size, and dfs.block.size, determine the actual split size used if the input is splittable. By default, the min split size is 0, the max split size is Long.MAX_VALUE, and the block size is 64MB. For the actual split size, minSplitSize and blockSize set the lower bound, and blockSize and maxSplitSize together set the upper bound. With the defaults, the split size is therefore equal to the block size. Here is the function used to calculate it: max(minSplitSize, min(maxSplitSize, blockSize))
Note: compressed input files (e.g. gzip) are not splittable, although there are patches available.
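
To make the formula concrete, here is a minimal standalone sketch in plain Java. It does not call Hadoop. The class name SplitSizeSketch, the method name, and the parameter order are illustrative assumptions. Only the formula itself and the default values (0, Long.MAX_VALUE, 64MB) come from the text above.

```java
public class SplitSizeSketch {
    // Hypothetical helper constant: one megabyte in bytes.
    static final long MB = 1024L * 1024L;

    // Applies the formula from the text:
    // max(minSplitSize, min(maxSplitSize, blockSize)).
    static long computeSplitSize(long minSplitSize, long maxSplitSize, long blockSize) {
        return Math.max(minSplitSize, Math.min(maxSplitSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 64 * MB; // default block size stated in the text

        // Defaults: min = 0, max = Long.MAX_VALUE, so the block size wins.
        System.out.println(computeSplitSize(0L, Long.MAX_VALUE, blockSize) / MB);   // prints 64

        // Raising the min above the block size forces larger splits.
        System.out.println(computeSplitSize(128 * MB, Long.MAX_VALUE, blockSize) / MB); // prints 128

        // Lowering the max below the block size forces smaller splits.
        System.out.println(computeSplitSize(0L, 32 * MB, blockSize) / MB);          // prints 32
    }
}
```

In other words, leaving both settings at their defaults gives one split per block, raising mapred.min.split.size above the block size produces fewer, larger splits, and lowering mapred.max.split.size below the block size produces more, smaller splits.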