Questions tagged [mrjob]

Mrjob is a Python 2.5+ package that helps you write and run Hadoop Streaming jobs.

Mrjob fully supports Amazon’s Elastic MapReduce (EMR) service, which lets you buy time on a Hadoop cluster on an hourly basis. It also works with your own Hadoop cluster.

Mrjob can be installed with pip:

pip install mrjob

331 questions

2 answers

How to execute a job on a server on Lambda without waiting for the response?

I am trying to spawn a mapreduce job using the mrjob library from AWS Lambda. The job takes longer than the 5 minute Lambda time limit, so I want to execute a remote job. Using the paramiko package, I ssh'd onto the server and ran a nohup command to…

python amazon-web-services aws-lambda paramiko mrjob

asked May 25 '17 at 16:02

user2820906

1 answer

How does the statement (yield "lines", 1) work in mrjob's official documentation?

I'm trying to understand the official example for mrjob clearly: def mapper(self, _, line): yield "chars", len(line) yield "words", len(line.split()) yield "lines", 1 def reducer(self, key, values): yield key,…

python mrjob

asked May 24 '17 at 10:57

reddevilzs
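
For the question above, it may help to simulate the map, shuffle, and reduce phases in plain Python. This is only a sketch: the mapper mirrors the excerpt, but the reducer body is truncated in the excerpt, so summing the values is an assumption, and `run` is a hypothetical stand-in for what the framework does between the two phases.

```python
from collections import defaultdict


def mapper(line):
    # Mirrors the mapper in the excerpt: each yield emits one (key, value) pair.
    yield "chars", len(line)
    yield "words", len(line.split())
    yield "lines", 1


def reducer(key, values):
    # Assumption: the truncated reducer sums all values seen for a key.
    yield key, sum(values)


def run(lines):
    # Hypothetical stand-in for the shuffle: group mapper values by key.
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    # Call the reducer once per key with every value collected for it.
    results = {}
    for key in sorted(grouped):
        for out_key, total in reducer(key, grouped[key]):
            results[out_key] = total
    return results


if __name__ == "__main__":
    print(run(["hello world", "foo"]))
```

Under that assumption, every input line contributes a 1 under the key "lines", so the summed value for "lines" is simply the number of lines read.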

2 answers

MapReduce job container killed by Google Cloud Platform [Error code:143]

I tried to run a mapreduce job on a cluster in Google Cloud Platform using Python package mrjob as follows: python mr_script.py -r dataproc --cluster-id [CLUSTER-ID] [gs://DATAFILE_FOLDER] I can successfully run the very same script against the…

hadoop mapreduce google-cloud-platform google-cloud-dataproc mrjob

asked Apr 30 '17 at 04:13

vkc

1 answer

Parsing HTML .txt files in Hadoop via MapReduce using Python

I am very new to using the Hadoop platform and defining MapReduce functions, and I am having a difficult time trying to understand why this mapper is not working in my MapReduce script. I am trying to parse a collection of pages written as a string…

python parsing hadoop mapreduce mrjob

asked Apr 29 '17 at 02:48

Wilson

1 answer

Amazon EMR: when attaching an EBS volume to an instance, how can I be sure that the volume is used?

In my mrjob.conf I configure the additional volume: Instances.InstanceGroups.member.2.EbsConfiguration.EbsBlockDeviceConfigs.member.1.VolumeSpecification.SizeInGB: 250 …

amazon-web-services amazon-emr amazon-ebs mrjob

asked Apr 19 '17 at 20:12

mirt

1 answer

Python Error: No module named mrjob.job

I am running the following simple script from the book and getting the following error from mrjob.job import MRJob class MRWordCount(MRJob): def mapper(self, _, line): for word in line.split(): yield(word, 1) def reducer(self, word,…

python python-import mrjob

asked Apr 04 '17 at 08:49

muazfaiz

1 answer

Python: How can I index in MapReduce(MRJob)?

I want to index the result of reducer like this : 1 "EZmocAborM6z66rTzeZxzQ" 2 "FIk4lQQu1eTe2EpzQ4xhBA" 3 "myql3o3x22_ygECb8gVo7A" 4 "ojovtd9c8GIeDiB8e0mq2w" 5 "uVEoZmmL9yK0NMgadLL0CQ" My Python MRJob code : class MRUserDic(MRJob): …

python hadoop mapreduce mrjob

asked Mar 31 '17 at 03:45

user3595632
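
One way to picture the numbering asked for above, outside of mrjob, is to sort the distinct keys and enumerate them from 1. This is a minimal sketch that assumes all keys fit in memory on one machine; `index_keys` is a hypothetical helper, not part of mrjob, and on a cluster with several reducers a global index would need all keys to pass through a single place.

```python
def index_keys(keys):
    # Drop duplicates, sort for a stable order, then number from 1.
    return [(i, key) for i, key in enumerate(sorted(set(keys)), start=1)]


if __name__ == "__main__":
    sample = [
        "myql3o3x22_ygECb8gVo7A",
        "EZmocAborM6z66rTzeZxzQ",
        "uVEoZmmL9yK0NMgadLL0CQ",
        "FIk4lQQu1eTe2EpzQ4xhBA",
        "ojovtd9c8GIeDiB8e0mq2w",
        "EZmocAborM6z66rTzeZxzQ",  # duplicate, should be indexed only once
    ]
    for index, key in index_keys(sample):
        print(index, key)
```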

1 answer

How to run a MRJob in a local Hadoop Cluster with Hadoop Streaming?

I'm currently taking a Big Data Class, and one of my projects is to run my Mapper/Reducer on a Hadoop Cluster which is set up locally. I've been using Python along with the MRJob library for the class. Here is my current Python Code for the…

python hadoop mrjob

asked Mar 06 '17 at 00:50

J.Halon

1 answer

Hadoop Error: Error launching job , bad input path : File does not exist.Streaming Command Failed

I am running an MRJob on Hadoop cluster & I am getting the following error: No configs found; falling back on auto-configuration Looking for hadoop binary in $PATH... Found hadoop binary: /usr/local/hadoop/bin/hadoop Using Hadoop version…

python hadoop mrjob

asked Feb 27 '17 at 03:29

bhoots21304

0 answers

Hadoop Error "Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1"

mapper.py is working fine. I ran mapper.py on my cluster and stored its output in part-0.txt. Exactly like a word-count job, I am trying to count the occurrences of every distinct key stored in the part-0.txt file. I tried copy-pasting the code from…

python hadoop mrjob bigdata

asked Feb 23 '17 at 05:38

bhoots21304

2 answers

Running MapReduce from Jupyter Notebook

I am trying to run MapReduce from Jupyter Notebook on a dataset in a u.data file, but I keep receiving an error message that says "TypeError: 'str' object doesn't support item deletion". How can I make the code run successfully? The u.data…

python jupyter-notebook mrjob

asked Feb 19 '17 at 23:18

Fxs7576

1 answer

mrjob does not work on Amazon EMR 5.x, but does run on EMR 4.8.3

I'm using mrjob on Amazon EMR. It works without flaw on EMR 4.8.3, but when I run it on EMR 5.x (any of them), something goes bonkers in the hadoop streaming API and I just get a lot of errors. My mrjob program is a very simple program that does…

amazon-web-services amazon-emr mrjob

asked Jan 30 '17 at 21:03

vy32

1 answer

Top N Record MapReduce on Python

I am new to MapReduce and I have a very simple question. I solved the WordCount problem and now I want to change it to find the top N records in a text. I can sort all the words in the text, but I cannot take the last N values. First, I read text and send…

python mapreduce mrjob

asked Jan 20 '17 at 12:05

ugur
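
The top-N selection described above can be sketched in plain Python, independent of mrjob. This assumes the word counts fit in memory; `top_n_words` is a hypothetical helper. In an actual MapReduce job the counts would usually have to be gathered in one place first (for example, a single reducer in a later step) before the top N can be chosen, and that wiring is not shown here.

```python
import heapq
from collections import Counter


def top_n_words(lines, n):
    # Word-count phase: tally every whitespace-separated token.
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    # Selection phase: keep only the n most frequent words.
    # Ties are broken alphabetically so the result is deterministic.
    return heapq.nsmallest(n, counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    text = ["a b a", "b a c"]
    print(top_n_words(text, 2))
```

Selecting with a heap avoids sorting the full list of words when only the top few are needed.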

0 answers

Company name matching Common Crawl using mrjob

I have a list of company names and details such as phone number, address, email, etc. I want to get their company_url. We thought of using the Google API to make requests, but it turns out to be costly. After searching, I found Common_Crawl, which was somewhat close…

python mrjob common-crawl

asked Dec 21 '16 at 14:41

Python master

2 answers

Read the text from a file and sort according to numbers

I have a text file, say: cat 2 dog 4 bird 20 animal 3 I want to read this file and sort like this (according to numbers): cat 2 animal 3 dog 4 bird 20 Code tried so far: def txtsort(self, _, line): words = [] …

python python-2.7 mrjob

asked Dec 04 '16 at 23:27

bharath
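
The sort requested above can be sketched without mrjob. This assumes the file holds one "name number" pair per line (the excerpt shows them run together) and that everything fits in memory; `sort_by_number` is a hypothetical helper, not the asker's truncated `txtsort` method.

```python
def sort_by_number(lines):
    pairs = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # Split on the last run of whitespace so names may contain spaces.
        word, number = line.rsplit(None, 1)
        # Convert to int so that 20 sorts after 4, not before it as text would.
        pairs.append((int(number), word))
    pairs.sort()
    return ["%s %d" % (word, number) for number, word in pairs]


if __name__ == "__main__":
    for row in sort_by_number(["cat 2", "dog 4", "bird 20", "animal 3"]):
        print(row)
```

The integer conversion is the key step: comparing the numbers as strings would place "20" before "3".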
