Questions tagged [mrjob]

Mrjob is a Python 2.5+ package that assists in the creation and running of Hadoop Streaming jobs.

Mrjob fully supports Amazon’s Elastic MapReduce (EMR) service, which allows one to buy time on a Hadoop cluster on an hourly basis. It also works with personal Hadoop clusters.

Mrjob can be installed with pip:

pip install mrjob
331 questions
0 votes, 0 answers

EMR and MRJOB: TERMINATED_WITH_ERRORS: The given SSH key name was invalid

I'm having trouble running an example mrjob job (https://github.com/Yelp/mrjob) with EMR on AWS. It generates the following error: Using configs in /home/ciceromoura/.mrjob.conf Creating temp directory…
0 votes, 0 answers

How do you determine a given word's line index using MapReduce techniques in MrJob?

I would like to create an inverted index using MapReduce techniques with MrJob. The inverted index for a given word x is defined as the line index or indices where x occurs in a given input text file. For example, say x is the word this and the…
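
A minimal plain-Python sketch of the inverted index described in this excerpt; it is not taken from the question or its answers. It assumes each record already carries its line index (supplied here by `enumerate`). With Hadoop Streaming a mapper normally sees only the line text, so how the index is obtained on a cluster is left open. The names `map_line`, `reduce_word`, and `build_inverted_index` are hypothetical; in mrjob the first two would roughly correspond to the `mapper` and `reducer` methods of an `MRJob` subclass.

```python
from collections import defaultdict


def map_line(line_index, line):
    """Map step: emit (word, line_index) for every word on the line."""
    for word in line.lower().split():
        yield word, line_index


def reduce_word(word, line_indices):
    """Reduce step: collapse one word's indices into a sorted, de-duplicated list."""
    return word, sorted(set(line_indices))


def build_inverted_index(lines):
    """Simulate map -> shuffle -> reduce locally (no Hadoop or mrjob needed)."""
    grouped = defaultdict(list)  # shuffle: group emitted values by key
    for line_index, line in enumerate(lines):
        for word, index in map_line(line_index, line):
            grouped[word].append(index)
    return dict(reduce_word(word, indices) for word, indices in grouped.items())


# Hypothetical sample input, one string per line of the text file.
index = build_inverted_index(["this is a test", "another line", "this again"])
print(index["this"])
```

Line indices here are zero-based; add 1 in `map_line` if one-based indices are wanted.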
0 votes, 0 answers

Error while executing MRJob on Hadoop using the Windows command line

I am trying to run an MRJob job on a Hadoop cluster from the Windows command line. It works when I write: Python C:\Users\salha\Documents\Thesis\Implementation\Jacobi_2classes.py C:\Users\salha\Documents\Thesis\Implementation\x.txt…
0 votes, 1 answer

Merge small files from S3 to create a 10 MB file

I am new to MapReduce. I have an S3 bucket that receives 3000 files every minute. I am trying to use MapReduce to merge these files into a file of between 10 and 100 MB. The Python code will use Mrjob and will run on AWS EMR. Mrjob's documentation…
0 votes, 1 answer

Need to count the number of documents in a particular directory using python - MapReduce

Please find below the program that I'm using. It compiles but does not produce any output. Please help me find the error. import gzip import warc import os from mrjob.job import MRJob class DocumentCounter(MRJob): def mapper(self, _, line): …
0 votes, 1 answer

How do I use the map reduce function in Python to determine a value?

Below is a list of data on foods you might find at a grocery store. The CSV file below denotes the city, food type, average price per pound, and the meal in which that food is consumed for a city in California. I need to determine using the Map…
0 votes, 0 answers

When running MRJob program, it can't create a directory and fails

I have the following program: from mrjob.job import MRJob from mrjob.step import MRStep class RatingsBreakdown(MRJob): def steps(self): return [ MRStep(mapper=self.mapper_get_ratings, …
calin.bule
0 votes, 1 answer

Why does python freeze when I use configure_options to send a file to my nodes using add_file_option?

I am trying to use the MRJob package in Python. I want to send a file (u.item) along with my code to all the nodes, so I use the configure_options function and add_file_option to tell Python that I am going to send a file in my command…
B.Badiei
0 votes, 1 answer

Edit environment variables inside Python for a bash script

My project, which uses MapReduce without Hadoop, is composed of two files: bash.sh and mapreduce.py. I would like to use environment variables to pass information between bash.sh and mapreduce.py. Within bash.sh I use export myvariable…
giupardeb
0 votes, 0 answers

Using mrjob in python to find the top 3 cities with the most revenue

I need help outputting the top 3 cities that have the most revenue. Right now I just have all the cities outputting with their total revenues but I need to restrict this output to be just the top 3. I have all cities outputting with their total…
Firestxne
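
A rough plain-Python sketch of the two-stage idea behind a "top 3" job, assuming the per-city totals are already being produced as the excerpt says. It is not the asker's code: the names `sum_revenue` and `top_n` and the sample records are hypothetical. In mrjob this would most likely become a second `MRStep` whose mapper emits every (total, city) pair under one shared key, so that a single reducer sees all totals and can keep only the largest ones.

```python
import heapq
from collections import defaultdict


def sum_revenue(records):
    """First stage: total revenue per city, like a reducer summing values per key."""
    totals = defaultdict(float)
    for city, revenue in records:
        totals[city] += revenue
    return totals


def top_n(totals, n=3):
    """Second stage: one reducer receives every (total, city) pair and keeps the n largest."""
    return heapq.nlargest(n, ((total, city) for city, total in totals.items()))


# Hypothetical sample data, not from the question.
records = [("A", 10.0), ("B", 5.0), ("A", 2.5), ("C", 7.0), ("D", 1.0), ("B", 20.0)]
top_three = top_n(sum_revenue(records))
print(top_three)
```

Putting the total first in each pair lets `heapq.nlargest` order by revenue without a custom key function.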
0 votes, 1 answer

How to read JSON string from a line in csv file?

I'm new to MapReduce and MRjob. I am trying to read a CSV file that I want to process using MRjob in Python, but it has about 5 columns with JSON strings (e.g. {}) or arrays of JSON strings (e.g. [{},{}]), some of them nested. My mapper so far…
Rabbir
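
One possible way to handle a line like the one described, sketched with only the standard library. It assumes the file follows ordinary CSV quoting (JSON fields wrapped in double quotes, inner quotes doubled) and that each record fits on one line; neither is confirmed by the excerpt. The helper `parse_line` and the sample line are hypothetical. A mapper could call such a helper on each incoming line rather than using `line.split(',')`, which would break on the commas inside the JSON.

```python
import csv
import json


def parse_line(line, json_columns):
    """Split one CSV line with the csv module, then decode the chosen columns as JSON."""
    row = next(csv.reader([line]))
    for i in json_columns:
        row[i] = json.loads(row[i])
    return row


# Hypothetical line: JSON fields are quoted, with inner quotes doubled per CSV rules.
line = '42,"{""name"": ""x"", ""tags"": [1, 2]}","[{""id"": 1}, {""id"": 2}]"'
row = parse_line(line, json_columns=[1, 2])
print(row[1]["tags"], row[2][1]["id"])
```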
0 votes, 2 answers

Standard deviation using mrjob in Python is showing the error "file has no attributes to run"

from mrjob.job import MRJob import statistics import sys class MRFindStdev(): def mapper(self, _, line): for number in line.split(','): yield number, float(number) def reducer(self, _, line): numbers =…
0 votes, 1 answer

Find max value with mrjob in Python

I would like to find the max value in a list using mrjob. When I run this, it always shows the error: No configs found; falling back on auto-configuration; No configs specified for inline runner. I'd like to know what this means. class…
Debo
0 votes, 0 answers

How to write a string without quotes to a file with MRJob?

I am using MRJob to yield values and write them to a file... I have the following, where I yield a string(boom) as a key and an int(sum) as a value: boom = str(", ".join(key)).strip('"') yield boom, sum But I get an output that writes quotation…
defoification
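
A small standard-library illustration of where the quotation marks probably come from. mrjob's default output protocol encodes keys and values as JSON, so a string key is quoted when it is written, regardless of any `.strip('"')` applied before the `yield`. The functions `json_style` and `raw_style` are hypothetical stand-ins for the two output styles, not mrjob APIs. mrjob ships raw protocols in `mrjob.protocol` (for example `RawValueProtocol`) that a job can select through its `OUTPUT_PROTOCOL` attribute; check the documentation for the version in use.

```python
import json


def json_style(key, value):
    """Mimic JSON-encoded output: key and value are each serialized, then tab-separated."""
    return json.dumps(key) + "\t" + json.dumps(value)


def raw_style(key, value):
    """Write the text as-is, which is the effect the question is after."""
    return "{}\t{}".format(key, value)


key, total = "a, b", 5  # hypothetical values
print(json_style(key, total))
print(raw_style(key, total))
```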
0 votes, 1 answer

How to calculate the average number from a text file with MRJob

I am a beginner with MrJob and am having trouble calculating the average of a text file of prime numbers. I am unsure at which part to apply arithmetic logic and also whether I should yield lists when using MrJob. The text file contains…
Eckersley
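
A minimal sketch of where the arithmetic could go, written in plain Python rather than mrjob. It assumes one number per line, which the truncated excerpt does not confirm, and the names `map_number` and `reduce_average` are hypothetical. The map step emits every number under a single shared key so that one reducer sees them all; the reduce step keeps a running sum and count and divides once at the end, so no lists need to be yielded.

```python
def map_number(line):
    """Map step: emit every number under one shared key so a single reducer sees them all."""
    yield None, float(line)


def reduce_average(values):
    """Reduce step: keep a running sum and count, then divide once at the end."""
    total = 0.0
    count = 0
    for value in values:
        total += value
        count += 1
    return total / count if count else None


# Hypothetical input: one prime per line.
lines = ["2", "3", "5", "7"]
values = [value for line in lines for _, value in map_number(line)]
average = reduce_average(values)
print(average)
```

If a combiner is added later, it should pass along (sum, count) pairs instead of partial averages, since averaging averages gives the wrong result when groups differ in size.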