Questions tagged [input-split]

35 questions
1 vote · 1 answer

Efficiency of NLineInputFormat's InputSplit calculations

I looked into the getSplitsForFile() function of NLineInputFormat and found that an InputStream is created for the input file, which is then iterated, creating a split every N lines. Is this efficient? Particularly when this read operation is happening on 1 node…
S Kr • 1,831 • 2 • 25 • 50
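
As background for the question above: NLineInputFormat is normally wired up as below, and getSplitsForFile() is what runs on the client at job submission time. A minimal sketch; the job name and paths are hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

public class NLineJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "nline-example"); // hypothetical job name

        // Each mapper receives exactly N input lines; getSplitsForFile()
        // opens the file and scans it line by line to find the byte
        // offsets of these split boundaries.
        NLineInputFormat.setNumLinesPerSplit(job, 1000);
        job.setInputFormatClass(NLineInputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        // ... mapper/reducer/output setup elided ...
    }
}
```

Note that this scan happens once, on the submitting client, which is exactly the single-node read the question is asking about.
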
1 vote · 0 answers

Splits in MapReduce jobs

I have an input file for which I need to customize the RecordReader. The problem here is that the data may get distributed across different input splits, and a different mapper may get data that should be consumed by the first mapper. For example, A…
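
For reference, the usual way to keep records intact across split boundaries (the same convention LineRecordReader follows) is: every reader except the first skips the partial record at the start of its split, and every reader may read past its split's end to finish the last record it started. A condensed sketch under those assumptions; the two record-parsing helpers are hypothetical placeholders that depend on the record format:

```java
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class BoundaryAwareRecordReader extends RecordReader<LongWritable, Text> {
    private FSDataInputStream in;
    private long start, end, pos;
    private final LongWritable key = new LongWritable();
    private final Text value = new Text();

    @Override
    public void initialize(InputSplit genericSplit, TaskAttemptContext ctx)
            throws IOException {
        FileSplit split = (FileSplit) genericSplit;
        start = split.getStart();
        end = start + split.getLength();
        FileSystem fs = split.getPath().getFileSystem(ctx.getConfiguration());
        in = fs.open(split.getPath());
        in.seek(start);
        pos = start;
        if (start != 0) {
            // Not the first split: discard the (possibly partial) first
            // record; the previous split's reader will have consumed it.
            pos += skipToNextRecordBoundary(); // hypothetical helper
        }
    }

    @Override
    public boolean nextKeyValue() throws IOException {
        // Keep reading while the record *starts* inside this split; the
        // record itself may extend past 'end' into the next split.
        if (pos >= end) return false;
        key.set(pos);
        pos += readOneRecord(value); // hypothetical helper: fills 'value'
        return true;
    }

    // Both helpers depend on how a record boundary is recognized in the
    // byte stream and are left as assumptions here.
    private long skipToNextRecordBoundary() throws IOException { return 0; }
    private long readOneRecord(Text v) throws IOException { return 0; }

    @Override public LongWritable getCurrentKey() { return key; }
    @Override public Text getCurrentValue() { return value; }
    @Override public float getProgress() {
        return end == start ? 1f : (pos - start) / (float) (end - start);
    }
    @Override public void close() throws IOException { in.close(); }
}
```
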
1 vote · 1 answer

Hadoop FileSplit reading

Assume a client application that uses a FileSplit object in order to read the actual bytes from the corresponding file. To do so, an InputStream object has to be created from the FileSplit, via code like: FileSplit split = ... // The FileSplit…
PNS • 19,295 • 32 • 96 • 143
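
A minimal sketch of the usual pattern for this: open the file through its FileSystem, seek to the split's start offset, and stop after getLength() bytes. A Configuration is assumed to be available:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public final class FileSplitReader {
    /** Opens a stream positioned at the first byte of the given split. */
    public static FSDataInputStream openSplit(FileSplit split, Configuration conf)
            throws IOException {
        Path path = split.getPath();
        FileSystem fs = path.getFileSystem(conf);
        FSDataInputStream in = fs.open(path);
        in.seek(split.getStart());  // jump to the split's start offset
        return in;                  // caller reads at most split.getLength() bytes
    }
}
```
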
1 vote · 0 answers

Hadoop map.input.start not a line boundary?

It seems that the map.input.start property isn't giving me the position of the start of a line (except, of course, the first map.input.start which is 0). Sometimes, map.input.start is somewhere in the middle of the first line of the mapper's input,…
Vyassa Baratham • 1,457 • 12 • 18
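
This is expected: map.input.start is a byte offset, and splits are computed purely by size, with no awareness of line boundaries. LineRecordReader realigns to the next full line at read time; condensed from the Hadoop source:

```java
// Inside LineRecordReader.initialize(), roughly:
if (start != 0) {
    // Not at the file start, so the first (partial) line belongs to the
    // previous split's reader; skip ahead to the next newline.
    start += in.readLine(new Text(), 0, maxBytesToConsume(start));
}
this.pos = start; // first whole line owned by this split
```
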
0 votes · 0 answers

Apache Crunch MapReduce job: setting input split size not working

I have the following scenario: multiple MapReduce jobs using Apache Crunch, scheduled using Oozie. Let's consider only one job for simplicity. What I want to achieve is to reduce the number of mappers of that job. The number of mappers…
Stefan Ss • 45 • 5
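
For plain MapReduce, the mapper count of a FileInputFormat-based job is steered through the split size properties below; whether Crunch passes these through unchanged depends on the pipeline setup, so treat this as the generic mechanism rather than a confirmed Crunch fix:

```java
import org.apache.hadoop.conf.Configuration;

public class SplitSizeConfig {
    public static Configuration withLargerSplits() {
        Configuration conf = new Configuration();
        // Values are in bytes. Raising the minimum split size above the
        // block size packs several HDFS blocks into one split, which
        // reduces the number of map tasks.
        conf.setLong("mapreduce.input.fileinputformat.split.minsize", 512L * 1024 * 1024);
        conf.setLong("mapreduce.input.fileinputformat.split.maxsize", 1024L * 1024 * 1024);
        return conf;
    }
}
```

The effective split size is max(minSize, min(maxSize, blockSize)), so it is the minimum, not the maximum, that has to grow to get fewer mappers.
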
0 votes · 1 answer

AttributeError: 'builtin_function_or_method' object has no attribute 'split' (3)

My code takes two inputs in one string inside a for loop, and I want to split that input to fill two variables. Here's my code: P = int(input()) #Principal amt T = int(input()) #Total tenure N1 = int(input()) #Number of slabs of interest rates by…
0 votes · 1 answer

MapReduce basics

I have a text file of 300 MB with a block size of 128 MB, so a total of 3 blocks of 128 + 128 + 44 MB would be created. Correct me if I'm wrong: for MapReduce, the default input split size is the same as the block size, i.e. 128 MB, and it can be configured. Now the record reader will read…
Boron • 1 • 1
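
For reference, FileInputFormat derives the split size from the block size plus the min/max split settings; this is the actual formula in FileInputFormat.computeSplitSize():

```java
// From org.apache.hadoop.mapreduce.lib.input.FileInputFormat:
protected long computeSplitSize(long blockSize, long minSize, long maxSize) {
    // With the defaults (minSize = 1, maxSize = Long.MAX_VALUE) this
    // collapses to blockSize, which is why split size == block size
    // unless explicitly configured otherwise.
    return Math.max(minSize, Math.min(maxSize, blockSize));
}
```
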
0 votes · 1 answer

InputSplits in MapReduce

I have just started learning MapReduce and have some queries I want answers to. Here goes: 1) Case 1: FileInputFormat as the input format, and a directory with multiple files to be processed as the input path. If I have n files, all of the files lesser…
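
One detail relevant to case 1: FileInputFormat never combines data from different files into one split, so n small files produce at least n splits and n map tasks. When that is the concern, CombineTextInputFormat packs many small files into each split; a minimal sketch (job name hypothetical):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;

public class SmallFilesJob {
    public static Job configure(Configuration conf) throws IOException {
        Job job = Job.getInstance(conf, "small-files"); // hypothetical job name
        job.setInputFormatClass(CombineTextInputFormat.class);
        // Cap each combined split at 128 MB so one mapper processes many
        // small files instead of one mapper per file.
        CombineTextInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);
        return job;
    }
}
```
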
0 votes · 0 answers

How and where is input split size mentioned or passed to an MR program?

I understand what input split size and block size mean. But what I am trying to understand is where and how the input split size is specified for an MR program… is it passed as a parameter while starting an MR job using (Hadoop jar MRPROGRAM…
samshers • 1 • 6 • 37 • 84
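
The usual route is Hadoop's generic options: a driver that runs through ToolRunner picks up -D properties from the command line, e.g. hadoop jar MyJob.jar MyDriver -D mapreduce.input.fileinputformat.split.maxsize=67108864 <in> <out> (the jar and class names here are hypothetical). A minimal driver sketch:

```java
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() already contains any -D properties from the command
        // line, including the split size settings.
        Job job = Job.getInstance(getConf(), "split-size-demo"); // hypothetical name
        // ... input/output/mapper/reducer setup elided ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new MyDriver(), args));
    }
}
```

Alternatively, the same properties can be set programmatically on the Configuration before creating the Job.
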
0 votes · 1 answer

Hadoop: how would input splits form if a file has only one record and the file size is more than the block size?

An example to explain the question: I have a file of size 500 MB (input.csv), and the file contains only one line (record). So how will the file be stored in HDFS blocks, and how will the input splits be computed?
Ankush Rathi • 622 • 1 • 6 • 26
0 votes · 1 answer

Input Splits in Hadoop

If the input file size is 200 MB, there will be 4 blocks/input splits, and each data node will have a mapper running on it. If all 4 input splits are on the same data node, will only one map task be executed? Or how does the number of map…
Harshi • 189 • 1 • 4 • 20
0 votes · 1 answer

Mapper not executing on the hostname returned from getLocations() of InputSplit in Hadoop

I have extended the InputSplit class of Hadoop to calculate my custom input split; however, while I am returning a particular host IP (i.e. a datanode IP) as a string from the overridden getLocations(), the map task for it is not being executed on that host IP…
Sushil Ks • 403 • 2 • 10 • 18
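
Two things commonly cause this: getLocations() is only a scheduling hint (YARN will place the task elsewhere if no container is free on the preferred node), and the returned strings are matched against node hostnames, so a raw IP string often fails the match. A sketch of such a split; the hostname is an assumed example:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.InputSplit;

public class CustomSplit extends InputSplit implements Writable {
    private long length;
    private String preferredHost; // e.g. "datanode-03.cluster.local" (hypothetical)

    @Override
    public long getLength() { return length; }

    @Override
    public String[] getLocations() throws IOException {
        // Return hostnames as known to the ResourceManager, not IPs;
        // this is a locality *hint*, not a placement guarantee.
        return new String[] { preferredHost };
    }

    // Custom InputSplits must be serializable so the framework can ship
    // them to the task; Writable is the standard mechanism.
    @Override public void write(DataOutput out) throws IOException {
        out.writeLong(length);
        out.writeUTF(preferredHost);
    }
    @Override public void readFields(DataInput in) throws IOException {
        length = in.readLong();
        preferredHost = in.readUTF();
    }
}
```
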
0 votes · 1 answer

Location of HadoopPartition

I have a dataset in a CSV file that occupies two blocks in HDFS and is replicated on two nodes, A and B. Each node has a copy of the dataset. When Spark starts processing the data, I have seen two ways in which Spark loads the dataset as input. It either…
0 votes · 1 answer

Jackson JsonParser: restart parsing in broken JSON

I am using Jackson to process JSON that comes in chunks in Hadoop. That means they are big files that are cut up into blocks (in my problem it's 128 MB, but it doesn't really matter). For efficiency reasons, I need it to be streaming (not possible to…
xmar • 1,729 • 20 • 48
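
One workable approach, analogous to how Hadoop's LineRecordReader realigns on newlines: if the documents are newline-delimited JSON, skip the raw stream to the first '\n' past the split start, then hand the rest to a streaming JsonParser. A sketch assuming NDJSON input:

```java
import java.io.IOException;
import java.io.InputStream;
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.ObjectMapper;

public class MidStreamJsonReader {
    /** Skips the partial document at the stream start, then parses whole ones. */
    public static void readFrom(InputStream raw, long splitStart) throws IOException {
        if (splitStart != 0) {
            // Resync: discard bytes up to and including the next '\n',
            // which for NDJSON is the start of the next whole document.
            int b;
            while ((b = raw.read()) != -1 && b != '\n') { /* skip */ }
        }
        ObjectMapper mapper = new ObjectMapper();
        JsonParser parser = new JsonFactory().createParser(raw);
        // Stream one top-level document at a time.
        while (parser.nextToken() == JsonToken.START_OBJECT) {
            // readValue consumes exactly one document from the stream.
            Object doc = mapper.readValue(parser, Object.class);
            // ... process doc ...
        }
        parser.close();
    }
}
```

If the JSON is not newline-delimited, resynchronization needs a format-specific marker to scan for; a bare '{' is ambiguous inside nested objects.
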
0 votes · 1 answer

Does the Hadoop job submitter take record boundaries into account while calculating splits?

This question is NOT a duplicate of: How does Hadoop process records split across block boundaries? I have one question regarding the input split calculation. As per the Hadoop guide: 1) the InputSplits respect record boundaries; 2) at the same time it…
user3105943 • 13 • 1 • 5