Questions tagged [mrjob]

Mrjob is a Python 2.5+ package that assists in the creation and running of Hadoop Streaming jobs.

Mrjob fully supports Amazon’s Elastic MapReduce (EMR) service, which allows one to buy time on a Hadoop cluster on an hourly basis. It also works with personal Hadoop clusters.

Mrjob can be installed with pip:

pip install mrjob

331 questions

3 answers

Python Module Import Error "ImportError: No module named mrjob.job"

System: Mac OS X 10.6.5, Python 2.6. I am trying to run the Python script below: from mrjob.job import MRJob class MRWordCounter(MRJob): def mapper(self, key, line): for word in line.split(): yield word, 1 def reducer(self, word,…

python module path mrjob

asked Nov 16 '10 at 23:07

worker1138

2,071
5
29
36

0 answers

mrjob with JSON data

A friend and I are working on a rather large JSON file. We want to perform MapReduce on parts of this file as quickly as possible. As it appears to be hard to feed a JSON file directly into an mrjob job, we attempted to write the…

python json mrjob

asked Nov 19 '16 at 11:23

Superdids

2 answers

Getting error while running django_cron

When I try to run the cron job in Django using the command python manage.py runcrons, it shows the error below: $ python manage.py runcrons No handlers could be found for logger "django_cron" Does anyone have any idea about this…

django mrjob django-cron

asked Feb 28 '15 at 10:05

Akshath Kumar

1 answer

How to specifically determine input for each map step in MRJob?

I am working on a map-reduce job consisting of multiple steps. With mrjob, every step receives the previous step's output. The problem is that I don't want it to. What I want is to extract some information and use it in the second step against all of the input, and so on.…

python hadoop mapreduce mrjob

asked Sep 28 '14 at 06:20

Mehraban

3,164
4
37
60

1 answer

Why do I get "WindowsError [Error5] Access is denied" when running a Python file using mrjob?

I'm trying to use mrjob in a Python file and run it from the command line, but I keep getting an error log saying: C:\Users\Ni\Desktop>python si601lab6_sol.py pg1268.txt no configs found; falling back on auto-configuration no configs found;…

python command-line access-denied mrjob windowserror

asked Oct 19 '13 at 05:37

Ni Yan

1 answer

How can I use s3 object names as inputs to an MRJob mapper, but not the s3 objects themselves?

I'm missing something obvious about Yelp's mrjob library. Setting up an MRJob class is almost trivially easy. Running it over a file or stdin is just as easy. But how can I change the input to the job from a file, either local or in S3, to, say, keys…

python mapreduce boto elastic-map-reduce mrjob

asked May 16 '13 at 22:11

Christopher

42,720
11
81
99

1 answer

MRJob: Display intermediate values in map reduce

How can I display intermediate values (i.e., print a variable or a list) on the terminal while running a MapReduce program using the Python MRJob library?

python hadoop mapreduce mrjob

asked Jan 24 '13 at 12:42

Read Q

1,405
2
14
26

4 answers

How does mapreduce sort and shuffle work?

I am using Yelp's MRJob library for achieving map-reduce functionality. I know that map reduce has an internal sort and shuffle algorithm which sorts the values on the basis of their keys. So if I have the following results after the map phase (1, 24)…

hadoop mapreduce mrjob

asked Jan 16 '13 at 08:11

Read Q

1,405
2
14
26
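
The sort-and-shuffle step this question asks about can be sketched in plain Python. This is only an illustration of the idea (sort the mapper output by key, then hand each key and its grouped values to the reducer), not mrjob's or Hadoop's actual implementation. The pair `(1, 24)` comes from the excerpt; the other pairs and the names `shuffle_and_sort` and `sum_reducer` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def shuffle_and_sort(mapped_pairs):
    """Sort (key, value) pairs by key and group the values of each key."""
    # Sort on the key only. Python's sort is stable, so values of the same
    # key stay in the order the mapper emitted them in this sketch.
    ordered = sorted(mapped_pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def sum_reducer(key, values):
    """Hypothetical reducer: add up all values that share a key."""
    yield key, sum(values)


# Made-up mapper output; only (1, 24) appears in the question excerpt.
mapped = [(2, 10), (1, 24), (2, 5), (1, 3)]

grouped = list(shuffle_and_sort(mapped))
reduced = [pair for key, values in grouped for pair in sum_reducer(key, values)]

print(grouped)  # [(1, [24, 3]), (2, [10, 5])]
print(reduced)  # [(1, 27), (2, 15)]
```

On a real cluster the order of values within one key is generally not guaranteed unless value sorting is requested, and keys are typically compared in their serialized text form, so numeric keys may not come out in numeric order.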

2 answers

How can I allot more memory to a Python program? It's not consuming more than 64MB on 4GB RAM

I have a Python program running on some input data on 32-bit Ubuntu 12.04 with 4 GB of RAM. The time and space complexity of the program are both O(n). When the input data is around 100 KB, it completes execution in about 4 seconds with peak RAM consumption being…

python ubuntu memory-management mapreduce mrjob

asked Dec 24 '12 at 09:52

user1403483

1 answer

Can mrjob tasks output sets?

I tried outputting a Python set from a mapper in mrjob. I changed the function signatures of my combiners and reducers accordingly. However, I get this error: Counters From Step 1 Unencodable output: TypeError: 172804 When I change the sets to…

mrjob

asked Sep 23 '12 at 23:01

dangerChihuahua007

20,299
35
117
206
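
A likely cause of the "Unencodable output" error in this question can be reproduced with the standard library alone, assuming the job serializes keys and values as JSON (an assumption here; the excerpt does not say which protocol is in use). Python's `json` module cannot encode a `set`, while a list round-trips without trouble. The helper `encode_pair` is hypothetical and only mimics a tab-separated JSON line.

```python
import json


def encode_pair(key, value):
    """Hypothetical stand-in for a JSON-based protocol: key <TAB> value."""
    return json.dumps(key) + "\t" + json.dumps(value)


words = {"b", "a"}

# A set is not JSON-serializable, so encoding it raises TypeError.
try:
    encode_pair("k", words)
    set_encodable = True
except TypeError:
    set_encodable = False

# Emitting a sorted list instead works; the receiver can rebuild the set.
line = encode_pair("k", sorted(words))
decoded = set(json.loads(line.split("\t", 1)[1]))

print(set_encodable)  # False
print(line)
print(decoded == words)  # True
```

Sorting before emitting is a design choice rather than a requirement: it makes the encoded line deterministic, which helps when comparing outputs between runs.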

1 answer

Minimum AWS policy requirements to run an EMR job

I'd like to run an Elastic MapReduce job on data from the S3 bucket com.test.mybucket, using the MRJob Python framework. However, I have lots of other data in S3, and other EC2 instances that I don't want to touch. What is the minimum possible set of…

amazon-web-services elastic-map-reduce mrjob

asked Dec 06 '11 at 19:31

Kevin Burke

61,194
76
188
305

1 answer

MapReduce pairwise comparison of all lines in multiple files

I'm getting started with using Python's mrjob to convert some of my long-running Python programs into MapReduce Hadoop jobs. I've gotten the simple word count examples to work and I conceptually understand the 'text-classification' example. However,…

python mapreduce mrjob

asked Jul 10 '11 at 20:38

JudoWill

4,741
2
36
48

0 answers

Problem when using SORT_VALUES in a MapReduce job using mrjob (key-values are not sorted in the reducer input)

I want to create a MapReduce program whose reducer receives key-value pairs sorted by value. I'm using mrjob, whose SORT_VALUES parameter seemed to be ideal for the task. After setting this parameter to True, the reducer input is not sorted, for…

python hadoop mapreduce mrjob

asked May 21 '19 at 14:10

Agustin Caminero

1 answer

MRJob sort reducer output

Is there any way to sort the output of the reducer function using mrjob? I think that the input to the reducer function is sorted by key, and I tried to exploit this feature to sort the output using another reducer like below, where I know values have…

python sorting mapreduce mrjob

asked Dec 10 '18 at 14:42

Dandelion

1 answer

python mapreduce - Skipping the first line of the .csv in mapper

I am trying to do MapReduce in Python, and my CSV file looks like below: trip_id taxi_id pickup_time dropoff_time ... total 0 20117 2455.0 2013-05-05 09:45:00 50.44 1 44691 1779.0 2013-06-24 11:30:00 66.78 and my…

python csv hadoop mapreduce mrjob

asked May 28 '17 at 21:17

TTaa
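
One way to skip a CSV header inside a mapper is to recognize the header by its content rather than by its position, since a mapper on a cluster may not know which line of the file it is looking at. The sketch below is plain Python with a mapper-shaped generator. The file layout is an assumption (comma-separated, with `trip_id` first, `taxi_id` second and `total` last, as the truncated excerpt suggests), and the sample lines are abridged from the excerpt's rows.

```python
import csv

# Assumption: the header row starts with this column name.
HEADER_FIRST_FIELD = "trip_id"


def mapper(_, line):
    """Yield (taxi_id, total) for data rows; skip header and blank lines."""
    row = next(csv.reader([line]), [])
    if not row or row[0] == HEADER_FIRST_FIELD:
        return  # header or empty line: emit nothing
    # Assumed column positions: taxi_id is second, total is last.
    yield row[1], float(row[-1])


# Hypothetical input lines in the assumed layout.
lines = [
    "trip_id,taxi_id,pickup_time,total",
    "20117,2455.0,2013-05-05 09:45:00,50.44",
    "",
    "44691,1779.0,2013-06-24 11:30:00,66.78",
]

results = [pair for line in lines for pair in mapper(None, line)]
print(results)  # [('2455.0', 50.44), ('1779.0', 66.78)]
```

The same generator body could be used as the `mapper` method of a job class, where the first argument would be the key the framework passes in.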
