Questions tagged [hadoop-streaming]

Hadoop streaming is a utility that allows running map-reduce jobs using any executable that reads from standard input and writes to standard output.

Hadoop streaming is a utility that comes with the Hadoop distribution. The utility allows you to create and run map/reduce jobs with any executable or script as the mapper and/or the reducer; the script only has to read from standard input and write to standard output.

Hadoop provides an API to MapReduce that allows you to write your map and reduce functions in languages other than Java. Hadoop Streaming uses Unix standard streams as the interface between Hadoop and your program, so you can use any language that can read standard input and write to standard output to write your MapReduce program.

For example:

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
-input myInputDirs \
-output myOutputDir \
-mapper /bin/cat \
-reducer /bin/wc

Ruby Example:

hadoop jar $HADOOP_INSTALL/contrib/streaming/hadoop-*-streaming.jar \
-input input/ncdc/sample.txt \
-output output \
-mapper ch02/src/main/ruby/max_temperature_map.rb \
-reducer ch02/src/main/ruby/max_temperature_reduce.rb

Python Example:

hadoop jar $HADOOP_INSTALL/contrib/streaming/hadoop-*-streaming.jar \
-input input/ncdc/sample.txt \
-output output \
-mapper ch02/src/main/python/max_temperature_map.py \
-reducer ch02/src/main/python/max_temperature_reduce.py
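
These paths come from the max-temperature example in Hadoop: The Definitive Guide. For readers new to streaming, here is a minimal Python sketch of what such a mapper/reducer pair looks like; the fixed-width offsets follow that example's NCDC records and are otherwise illustrative:

#!/usr/bin/env python
# max_temperature_map.py: emit (year, temperature) from NCDC fixed-width records
import re
import sys

for line in sys.stdin:
    val = line.strip()
    year, temp, quality = val[15:19], val[87:92], val[92:93]
    if temp != "+9999" and re.match("[01459]", quality):
        print("%s\t%s" % (year, temp))

#!/usr/bin/env python
# max_temperature_reduce.py: streaming delivers keys sorted, so track each group's max
import sys

last_key, max_val = None, None
for line in sys.stdin:
    key, val = line.strip().split("\t")
    if last_key is not None and key != last_key:
        print("%s\t%s" % (last_key, max_val))
        max_val = None
    last_key = key
    max_val = int(val) if max_val is None else max(max_val, int(val))
if last_key is not None:
    print("%s\t%s" % (last_key, max_val))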
871 questions
-1
votes
1 answer

Hadoop MapReduce using 2 mappers and 1 reducer in C++

Following the instructions at this link, I implemented a wordcount program in C++ using a single mapper and a single reducer. Now I need to use two mappers and one reducer for the same problem. Can someone please help me in this regard?
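
Worth noting: the number of map tasks is determined by the number of input splits, not by the program, so getting two mappers is a matter of giving the job two splits (for example, two input files) rather than changing the code. For comparison, a hypothetical streaming wordcount pair in Python; a C++ executable would obey the same stdin/stdout contract:

#!/usr/bin/env python
# wc_map.py: emit (word, 1) for every word on stdin
import sys

for line in sys.stdin:
    for word in line.split():
        print("%s\t1" % word)

#!/usr/bin/env python
# wc_reduce.py: sum the counts for each key (streaming sorts keys first)
import sys

last, total = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").rsplit("\t", 1)
    if last is not None and word != last:
        print("%s\t%d" % (last, total))
        total = 0
    last = word
    total += int(count)
if last is not None:
    print("%s\t%d" % (last, total))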
-1
votes
1 answer

Transforming a JSON file in Hadoop

I have 100GB of JSON files where each row looks like this: {"field1":100, "field2":200, "field3":[{"in1":20, "in2":"abc"},{"in1":30, "in2":"xyz"}]} (It's actually a lot more complicated, but this will do as a small demo.) I want to process it to…
user1265125
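
Assuming the rows are newline-delimited JSON, a map-only streaming pass (run with -numReduceTasks 0) is one plausible shape for this; the flattened output format below is invented, since the question is truncated:

#!/usr/bin/env python
# json_flatten_map.py: emit one flat row per nested "field3" element
import json
import sys

for line in sys.stdin:
    rec = json.loads(line)
    for inner in rec.get("field3", []):
        print("%s\t%s\t%s\t%s" % (rec["field1"], rec["field2"],
                                  inner["in1"], inner["in2"]))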
-1
votes
1 answer

Bash on Hadoop Streaming

I have written a simple bash script. The exact code is here: ideone.com/8XQCjH

#!/bin/bash
if ! bzip2 -t "$file"
then
    printf '%s is corrupted\n' "$file"
    rm -f "$file"
    #echo "$file" "is corrupted" >> corrupted.log
else
    tar -xjvf…
prog_guy
-1
votes
1 answer

How do I implement Hadoop so it can process call detail record (CDR) data?

I have configured HDFS (namenode and datanode) and also HBase, and I have stored a CDR CSV file in HDFS. How can I map it into HBase and make it ready for processing?
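
One common route for getting a CSV file from HDFS into HBase is the ImportTsv MapReduce tool that ships with HBase; this is only a sketch, and the table name and column mapping here are invented:

# the target table (with column family 'cdr') must already exist in HBase
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
-Dimporttsv.separator=',' \
-Dimporttsv.columns=HBASE_ROW_KEY,cdr:caller,cdr:callee,cdr:duration \
cdr_table /user/hadoop/cdr.csv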
-1
votes
1 answer

How to save Word doc to HDFS

I am new to Hadoop and want to know the easiest way to save a Word document file so that it automatically gets sent to HDFS.
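
For a one-off copy, the filesystem shell is the easiest way in (paths here are illustrative); making it automatic would mean wrapping a command like this in a watcher script:

hadoop fs -mkdir -p /user/hadoop/docs
hadoop fs -put report.docx /user/hadoop/docs/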
-1
votes
2 answers

Implementing R programs in hadoop System

I have written mapper and reducer programs in R. I am using the Hadoop streaming utility to execute the R programs on Hadoop. My constraint is that I need to input 2 text files to the mapper program. How can I achieve this? Kindly assist at…
user2500875
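
Hadoop streaming accepts the -input option more than once, so both files can feed the same mapper. A sketch reusing the jar path from the examples above, with hypothetical script names:

hadoop jar $HADOOP_INSTALL/contrib/streaming/hadoop-*-streaming.jar \
-input input/file1.txt \
-input input/file2.txt \
-output output \
-mapper my_mapper.R \
-reducer my_reducer.R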
-1
votes
4 answers

New user SSH for Hadoop

Installing Hadoop on a single-node cluster: any idea why we need to create the following? Why do we need SSH access for a new user? Why should it be able to connect to its own user account? Why should I specify passwordless SSH for a new…
Surya
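
Background for this question: Hadoop's start-up scripts use ssh to launch the daemons, even when everything runs on one machine, which is why single-node guides set up passwordless SSH to localhost, typically along these lines:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost    # should now connect without a password prompt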
-1
votes
1 answer

Which is better for running recommendations on a Hadoop cluster, Apache Mahout or using R with Hadoop (via hadoop streaming/RHIPE/RHadoop etc)?

I am new to big data and looking for a good platform to perform recommendations, clustering and classification. I understand Mahout has many algorithms to do this. Also, R itself, being a very good analytical tool, is more than helpful for achieving…
Kiran Karanth
-1
votes
1 answer

How to process an Apache log file with Hadoop using Python

I am very new to Hadoop and unable to understand the concepts well. I followed this process: installed Hadoop by following the guide here, tried the basic examples in the tutorial here, and got the wordcount example in Python working fine with…
Shiva Krishna Bavandla
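
A streaming mapper over Apache access logs usually just applies a regular expression to each line and emits a key of interest. A sketch assuming the common log format; counting by status code is an illustrative choice:

#!/usr/bin/env python
# access_log_map.py: count requests per HTTP status code
import re
import sys

# common log format: host ident user [time] "request" status size
LOG = re.compile(r'(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)')

for line in sys.stdin:
    m = LOG.match(line)
    if m:
        print("%s\t1" % m.group(6))

Paired with a summing reducer like the wordcount one above, this yields a per-status request count.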
-2
votes
1 answer

Calculate average temperature in reducer

I am trying to write code that calculates the average temperature (reducer.py) based on NCDC…
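
The usual shape of such a reducer, sketched on the assumption that the mapper emits year<TAB>temperature pairs:

#!/usr/bin/env python
# reducer.py: average the values seen for each key (streaming sorts keys first)
import sys

def emit(key, total, count):
    if count:
        print("%s\t%.2f" % (key, total / count))

last, total, count = None, 0.0, 0
for line in sys.stdin:
    key, value = line.strip().split("\t")
    if last is not None and key != last:
        emit(last, total, count)
        total, count = 0.0, 0
    last = key
    total += float(value)
    count += 1
emit(last, total, count)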
-2
votes
1 answer

Spark 1.6: Store a dataframe into multiple CSV files in HDFS (partitioned by id)

I'm trying to save a DataFrame as CSV partitioned by id, using Spark 1.6 and Scala. The function partitionBy("id") doesn't give me the right result. My code is here: validDf.write .partitionBy("id") …
Spark
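
In Spark 1.6 the CSV writer came from the external spark-csv package, which reportedly did not honor partitionBy; a common workaround was one filtered write per key. A PySpark sketch of that workaround (the question uses Scala, and the output path is invented):

# one CSV directory per id; assumes validDf from the question and the spark-csv package
ids = [row.id for row in validDf.select("id").distinct().collect()]
for i in ids:
    (validDf.filter(validDf.id == i)
            .write
            .format("com.databricks.spark.csv")
            .save("/output/csv/id=%s" % i))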
-2
votes
1 answer

How to load a file into Pig with multiple delimiters?

I have the file tax_cal below that I want to load in Pig:

101,5|2;3|2
102,3|1;4.5|2;4|1
103,2|1;5|2;5.6|3

Desired output:

101,5|2,3|2
102,3|1,4.5|2,4|1
103,2|1,5|2,5.6|3

Further, I will pass this output file to a Python UDF to calculate the total price. How can I…
Harshit Kakkar
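
Since the desired output only replaces ';' with ',', a reducer-less streaming pass (-numReduceTasks 0) is one way to pre-process the file before loading it in Pig. A minimal Python sketch:

#!/usr/bin/env python
# delim_map.py: normalize ';' to ',' so downstream tools see a single delimiter
import sys

for line in sys.stdin:
    sys.stdout.write(line.replace(";", ","))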
-2
votes
2 answers

Passing a parameter in Hive is not working

Passing a parameter in Hive is not working for me. My code:

hive> set x='test variable';
hive> ${hiveconf:x};

I get this error: FAILED: Parse Error: line 1:0 cannot recognize input near ''test variable'' '' ''
Shetty
-2
votes
1 answer

I am researching HDFS failures. For this I need HDFS logs. Where can I download the logs?

I am researching HDFS failures. For this I need HDFS logs. Where can I download the logs?
-2
votes
7 answers

How to import/load a .csv file in Pig?

Let's suppose there is a tab-delimited text file (datetemp.txt). I want to load this text file in Pig for processing, but when I type the line below it gives me an error: grunt> inputfile= load '/training/pig/datetemp.txt' using PigStorage() As…
Prix