Questions tagged [mrjob]

Mrjob is a Python 2.5+ package that assists in the creation and running of Hadoop Streaming jobs.

Mrjob fully supports Amazon’s Elastic MapReduce (EMR) service, which allows one to buy time on a Hadoop cluster on an hourly basis. It also works with personal Hadoop clusters.

Mrjob can be installed with pip:

pip install mrjob
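
A minimal word-count job, as a sketch of what an mrjob script typically looks like (the class name, file name, and input paths below are illustrative, not from any question on this page):

from mrjob.job import MRJob
import re

WORD_RE = re.compile(r"[\w']+")

class MRWordCount(MRJob):

    def mapper(self, _, line):
        # Emit each word in the input line with a count of 1.
        for word in WORD_RE.findall(line):
            yield word.lower(), 1

    def reducer(self, word, counts):
        # Sum all counts emitted for the same word.
        yield word, sum(counts)

if __name__ == '__main__':
    MRWordCount.run()

Saved as word_count.py, a script like this can be run locally with python word_count.py input.txt, or on EMR with python word_count.py -r emr s3://bucket/path/, assuming AWS credentials have been configured in mrjob.conf.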
331 questions
0
votes
1 answer

using sqlite3dbm with mrjob for map reduce

I have a sqlite3dbm database which stores data in key-value pairs. I need to process it using mrjob. When I run my code as xyz.py my_db.db, the mapper function doesn't work properly. def mapper(k,val): for word in val: yield(word,k) I get null for k
user1525721
  • 336
  • 5
  • 12
0
votes
1 answer

"The location specified by MRJOB_CONF" in mrjob documentation

Which path is "The location specified by MRJOB_CONF" in the mrjob documentation? Link to mrjob doc: http://mrjob.readthedocs.org/en/latest/guides/configs-basics.html
user1403483
0
votes
1 answer

Some elementary doubts about running MapReduce programs using mrjob on Amazon EMR

I am new to mrjob and I am having problems getting the job to run on Amazon EMR. I will write them in sequential order. I can run an mrjob on my local machine. However, when I have mrjob.conf in /home/ankit/.mrjob.conf and in /etc/mrjob.conf, the job…
user1403483
0
votes
1 answer

Import module in MRJob on EMR

Simple question: I have a module headers.py which defines a couple of variables I need in my main MRJob script. I should be able to run the job with python MRMyJob -r emr --file=headers.py s3://input/data/path and then in my MRJob script (MRMyJob),…
Vyassa Baratham
  • 1,457
  • 12
  • 18
0
votes
2 answers

Error running python mrjob word count example

I'm trying to run the example word count map reduce task using mrjob. I get the following error: Traceback (most recent call last): File "mr.py", line 3, in from mrjob.job import MRJob File…
nickponline
  • 25,354
  • 32
  • 99
  • 167
0
votes
2 answers

hadoop with mrjob piping on shell

I have an issue regarding mrjob. I'm using a Hadoop cluster with 3 datanodes, one namenode, and one jobtracker. Starting from a nifty sample application, I wrote something like the following first_script.py: for i in range(1,2000000): …
Mad Joker
  • 1
  • 1
-1
votes
1 answer

Python with Hadoop project: how to build a reducer to concatenate pairs of values

I have a small MapReduce project, and since I am new to this I am running into a lot of difficulties, so I would appreciate the help. In this project, I have a file that contains the nation, year, and weight. I want to find, for each nation's year…
Frank shi
  • 25
  • 4
-1
votes
1 answer

I'm getting a list of lists in my reducer output rather than a paired value and I am unsure of what to change in my code

The code below gives me nearly the output I want, but not quite. def reducer(self, year, words): x = Counter(words) most_common = x.most_common(3) sorted(x, key=x.get, reverse=True) yield (year,…
CKZ
  • 37
  • 5
-1
votes
1 answer

How to sys.stderr.write into a json file in Python?

I am running a MapReduce job with the mrjob library and I want to record the execution time to a JSON file. I record the time with this code: from datetime import datetime import sys if __name__ == '__main__': start_time = datetime.now() …
huy
  • 1,648
  • 3
  • 14
  • 40
-1
votes
2 answers

Python command line loop

I'm running an mrjob Python script, and on the command line I can pass the number of cores for the system to use: python example_script.py --num-cores 5. I'm looking to run the script for n number of cores for a benchmarking performance test, i.e. I…
F.D
  • 767
  • 2
  • 10
  • 23
-1
votes
1 answer

How to use mrjob.cat to auto-decompress inputs?

I want to use MrJob to analyze a dataset without decompressing it on disk beforehand (it is 18 GB compressed but >3 TB uncompressed). How can I use mrjob.cat to auto-decompress the file and stream it to my mapper? There aren't any code samples.
crypdick
  • 16,152
  • 7
  • 51
  • 74
-1
votes
1 answer

How to integrate data with python code before running python program on command line

I have downloaded the MovieLens dataset from that hyperlink, ml-100k.zip (it is a movie and user information dataset and it is in the older datasets tab), and I have written the simple MapReduce code below: from mrjob.job import MrJob class…
pcpcne
  • 43
  • 2
  • 11
-1
votes
2 answers

Performing a mapreduce function in Python

I'm trying to learn a little bit of MapReduce in combination with Python. Now I have the following code running from a tutorial I'm doing. from mrjob.job import MRJob class SpendByCustomer(MRJob): def mapper(self, _, line): …
John Dwyer
  • 189
  • 2
  • 13
-1
votes
1 answer

MRJob using a different Python interpreter for local vs. hadoop

I'm using MRJob on machine A to launch MapReduce jobs on machines B_0 thru B_10. The job has dependencies that require it to be run not with the default /bin/python (i.e. the output of which python on machine A) but with /path/to/weird/python, which…
Eli Rose
  • 6,788
  • 8
  • 35
  • 55
-1
votes
3 answers

How can I run mrjob with no input file?

I have an mrjob program that just gets data from a SQL database, so I don't need to read a local file or any input file; however, mrjob forces me into 'reading from STDIN', so I just create an empty file as the input file. It's really ugly. Is there a way to run…