Questions tagged [hadoop-streaming]

Hadoop streaming is a utility that allows running map-reduce jobs using any executable that reads from standard input and writes to standard output.

Hadoop streaming is a utility that comes with the Hadoop distribution. The utility allows you to create and run map/reduce jobs with any executable or script as the mapper and/or the reducer; the script only has to read from standard input and write to standard output.

Hadoop provides an API to MapReduce that allows you to write your map and reduce functions in languages other than Java. Hadoop Streaming uses Unix standard streams as the interface between Hadoop and your program, so you can use any language that can read standard input and write to standard output to write your MapReduce program.

For example:

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
-input myInputDirs \
-output myOutputDir \
-mapper /bin/cat \
-reducer /bin/wc

Ruby Example:

hadoop jar $HADOOP_INSTALL/contrib/streaming/hadoop-*-streaming.jar \
-input input/ncdc/sample.txt \
-output output \
-mapper ch02/src/main/ruby/max_temperature_map.rb \
-reducer ch02/src/main/ruby/max_temperature_reduce.rb

Python Example:

hadoop jar $HADOOP_INSTALL/contrib/streaming/hadoop-*-streaming.jar \
-input input/ncdc/sample.txt \
-output output \
-mapper ch02/src/main/python/max_temperature_map.py \
-reducer ch02/src/main/python/max_temperature_reduce.py
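
These paths come from the max-temperature example in Hadoop: The Definitive Guide. For readers new to streaming, here is a minimal Python sketch of what such a mapper/reducer pair looks like; the fixed-width offsets follow that example's NCDC records and are otherwise illustrative:

#!/usr/bin/env python
# max_temperature_map.py: emit (year, temperature) from NCDC fixed-width records
import re
import sys

for line in sys.stdin:
    val = line.strip()
    year, temp, quality = val[15:19], val[87:92], val[92:93]
    if temp != "+9999" and re.match("[01459]", quality):
        print("%s\t%s" % (year, temp))

#!/usr/bin/env python
# max_temperature_reduce.py: streaming delivers keys sorted, so track each group's max
import sys

last_key, max_val = None, None
for line in sys.stdin:
    key, val = line.strip().split("\t")
    if last_key is not None and key != last_key:
        print("%s\t%s" % (last_key, max_val))
        max_val = None
    last_key = key
    max_val = int(val) if max_val is None else max(max_val, int(val))
if last_key is not None:
    print("%s\t%s" % (last_key, max_val))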
871 questions
-1
votes
1 answer

Hadoop MapReduce using 2 mappers and 1 reducer in C++

Following the instructions at this link, I implemented a wordcount program in C++ using a single mapper and a single reducer. Now I need to use two mappers and one reducer for the same problem. Can someone please help me in this regard?
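
Worth noting: the number of map tasks is determined by the number of input splits, not by the program, so getting two mappers is a matter of giving the job two splits (for example, two input files) rather than changing the code. For comparison, a hypothetical streaming wordcount pair in Python; a C++ executable would obey the same stdin/stdout contract:

#!/usr/bin/env python
# wc_map.py: emit (word, 1) for every word on stdin
import sys

for line in sys.stdin:
    for word in line.split():
        print("%s\t1" % word)

#!/usr/bin/env python
# wc_reduce.py: sum the counts for each key (streaming sorts keys first)
import sys

last, total = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").rsplit("\t", 1)
    if last is not None and word != last:
        print("%s\t%d" % (last, total))
        total = 0
    last = word
    total += int(count)
if last is not None:
    print("%s\t%d" % (last, total))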
-1
votes
1 answer

Transforming a JSON file in Hadoop

I have 100GB of JSON files where each row looks like this: {"field1":100, "field2":200, "field3":[{"in1":20, "in2":"abc"},{"in1":30, "in2":"xyz"}]} (It's actually a lot more complicated, but this will do as a small demo.) I want to process it to…
user1265125
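
Assuming the rows are newline-delimited JSON, a map-only streaming pass (run with -numReduceTasks 0) is one plausible shape for this; the flattened output format below is invented, since the question is truncated:

#!/usr/bin/env python
# json_flatten_map.py: emit one flat row per nested "field3" element
import json
import sys

for line in sys.stdin:
    rec = json.loads(line)
    for inner in rec.get("field3", []):
        print("%s\t%s\t%s\t%s" % (rec["field1"], rec["field2"],
                                  inner["in1"], inner["in2"]))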
-1
votes
1 answer

Bash on Hadoop Streaming

I have written a simple bash script. The exact code is here: ideone.com/8XQCjH

#!/bin/bash
if ! bzip2 -t "$file"
then
    printf '%s is corrupted\n' "$file"
    rm -f "$file"
    #echo "$file" "is corrupted" >> corrupted.log
else
    tar -xjvf…
prog_guy
-1
votes
1 answer

How do I implement Hadoop so it can process call detail record (CDR) data?

I have configured HDFS (namenode and datanode) and also HBase, and I have stored a CDR CSV file in HDFS. How can I map it into HBase and make it ready for processing?
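
One common route for getting a CSV file from HDFS into HBase is the ImportTsv MapReduce tool that ships with HBase; this is only a sketch, and the table name and column mapping here are invented:

# the target table (with column family 'cdr') must already exist in HBase
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
-Dimporttsv.separator=',' \
-Dimporttsv.columns=HBASE_ROW_KEY,cdr:caller,cdr:callee,cdr:duration \
cdr_table /user/hadoop/cdr.csv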
-1
votes
1 answer

How to save Word doc to HDFS

I am new to Hadoop and want to know the easiest way to save a Word document file so that it automatically gets sent to HDFS.
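
For a one-off copy, the filesystem shell is the easiest way in (paths here are illustrative); making it automatic would mean wrapping a command like this in a watcher script:

hadoop fs -mkdir -p /user/hadoop/docs
hadoop fs -put report.docx /user/hadoop/docs/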
-1
votes
2 answers

Implementing R programs in hadoop System

I have written mapper and reducer programs in R. I am using the Hadoop streaming utility to execute the R programs on Hadoop. My constraint is that I need to input 2 text files to the mapper program. How can I achieve this? Kindly assist at…
user2500875
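
Hadoop streaming accepts the -input option more than once, so both files can feed the same mapper. A sketch reusing the jar path from the examples above, with hypothetical script names:

hadoop jar $HADOOP_INSTALL/contrib/streaming/hadoop-*-streaming.jar \
-input input/file1.txt \
-input input/file2.txt \
-output output \
-mapper my_mapper.R \
-reducer my_reducer.R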
-1
votes
4 answers

New user SSH for Hadoop

Installing Hadoop on a single-node cluster: any idea why we need to create the following? Why do we need SSH access for a new user? Why should it be able to connect to its own user account? Why should I specify passwordless SSH for a new…
Surya
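
Background for this question: Hadoop's start-up scripts use ssh to launch the daemons, even when everything runs on one machine, which is why single-node guides set up passwordless SSH to localhost, typically along these lines:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost    # should now connect without a password prompt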
-1
votes
1 answer

Which is better for running recommendations on a Hadoop cluster, Apache Mahout or using R with Hadoop (via hadoop streaming/RHIPE/RHadoop etc)?

I am new to big data and looking for a good platform to perform recommendations, clustering and classification. I understand Mahout has many algorithms to do this. Also, R itself, being a very good analytical tool, is more than helpful for achieving…
Kiran Karanth
-1
votes
1 answer

How to process an Apache log file with Hadoop using Python

I am very new to Hadoop and unable to understand the concepts well. I followed this process: installed Hadoop by following the guide here, tried the basic examples in the tutorial here, and got the wordcount example in Python working fine with…
Shiva Krishna Bavandla
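
A streaming mapper over Apache access logs usually just applies a regular expression to each line and emits a key of interest. A sketch assuming the common log format; counting by status code is an illustrative choice:

#!/usr/bin/env python
# access_log_map.py: count requests per HTTP status code
import re
import sys

# common log format: host ident user [time] "request" status size
LOG = re.compile(r'(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)')

for line in sys.stdin:
    m = LOG.match(line)
    if m:
        print("%s\t1" % m.group(6))

Paired with a summing reducer like the wordcount one above, this yields a per-status request count.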
-2
votes
1 answer

Calculate average temperature in reducer

I am trying to write code that calculates the average temperature (reducer.py) based on NCDC…
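
The usual shape of such a reducer, sketched on the assumption that the mapper emits year<TAB>temperature pairs:

#!/usr/bin/env python
# reducer.py: average the values seen for each key (streaming sorts keys first)
import sys

def emit(key, total, count):
    if count:
        print("%s\t%.2f" % (key, total / count))

last, total, count = None, 0.0, 0
for line in sys.stdin:
    key, value = line.strip().split("\t")
    if last is not None and key != last:
        emit(last, total, count)
        total, count = 0.0, 0
    last = key
    total += float(value)
    count += 1
emit(last, total, count)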
-2
votes
1 answer

Spark 1.6: Store a dataframe into multiple CSV files in HDFS (partitioned by id)

I'm trying to save a DataFrame as CSV partitioned by id, using Spark 1.6 and Scala. The function partitionBy("id") doesn't give me the right result. My code is here: validDf.write .partitionBy("id") …
Spark
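
In Spark 1.6 the CSV writer came from the external spark-csv package, which reportedly did not honor partitionBy; a common workaround was one filtered write per key. A PySpark sketch of that workaround (the question uses Scala, and the output path is invented):

# one CSV directory per id; assumes validDf from the question and the spark-csv package
ids = [row.id for row in validDf.select("id").distinct().collect()]
for i in ids:
    (validDf.filter(validDf.id == i)
            .write
            .format("com.databricks.spark.csv")
            .save("/output/csv/id=%s" % i))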
-2
votes
1 answer

How to load a file into Pig with multiple delimiters?

I have the file tax_cal below that I want to load in Pig:

101,5|2;3|2
102,3|1;4.5|2;4|1
103,2|1;5|2;5.6|3

Desired output:

101,5|2,3|2
102,3|1,4.5|2,4|1
103,2|1,5|2,5.6|3

Further, I will pass this output file to a Python UDF to calculate the total price. How can I…
Harshit Kakkar
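
Since the desired output only replaces ';' with ',', a reducer-less streaming pass (-numReduceTasks 0) is one way to pre-process the file before loading it in Pig. A minimal Python sketch:

#!/usr/bin/env python
# delim_map.py: normalize ';' to ',' so downstream tools see a single delimiter
import sys

for line in sys.stdin:
    sys.stdout.write(line.replace(";", ","))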
-2
votes
2 answers

Passing a parameter in Hive is not working

Passing a parameter in Hive is not working for me. My code:

hive> set x='test variable';
hive> ${hiveconf:x};

I get this error: FAILED: Parse Error: line 1:0 cannot recognize input near ''test variable'' '' ''
Shetty
-2
votes
1 answer

I am researching HDFS failures. For this I need HDFS logs. Where can I download the logs?

I am researching HDFS failures. For this I need HDFS logs. Where can I download the logs?
-2
votes
7 answers

How to import/load a .csv file in Pig?

Let's suppose there is a tab-delimited text file (datetemp.txt). I want to load this text file in Pig for processing, but when I type the line below it gives me an error: grunt> inputfile= load '/training/pig/datetemp.txt' using PigStorage() As…
Prix