Questions tagged [mrjob]

Mrjob is a Python 2.5+ package that assists in creating and running Hadoop Streaming jobs.

Mrjob fully supports Amazon’s Elastic MapReduce (EMR) service, which lets you buy time on a Hadoop cluster on an hourly basis. It also works with personal Hadoop clusters.

Mrjob can be installed with pip:

pip install mrjob

331 questions

1 answer

MRJob same key gets sent to different reducers

I have Hadoop 2.7.1 installed on a 3-machine cluster. I'm trying to run an inverted index mapreduce job using MRJob and Hadoop Streaming. Here's my configuration: MRJob.SORT_VALUES = True def steps(self): JOBCONF_STEP1 = { …

python hadoop partitioning hadoop-streaming mrjob

asked Jul 09 '16 at 01:51

Jack

1 answer

Error when running mapreduce function

I have the following dataframe userID movieID rating timestamp 1 1 9 12 1 2 10 13 I called this dataframe mapper1.txt and stored it in the same dir as this python file: from mrjob.job import MRJob class MRRatingCounter(MRJob): …

python mapreduce mrjob

asked Jun 09 '16 at 13:13

Frits Verstraten

1 answer

Bootstrapping dependencies on Amazon EMR with python Mrjob

I am trying to run a map reduce job on Amazon EMR with python Mrjob and I have some trouble installing dependencies. My mrjob code: from mrjob.job import MRJob import re from normalize import * from compute_features import * #Some code The…

python mrjob

asked May 20 '16 at 13:07

A. Rigoureau

1 answer

Python mrjob mapreduce how to preprocess the input file

I am trying to pre-process an XML file to extract certain nodes before putting it into mapreduce. I have the following code: from mrjob.compat import jobconf_from_env from mrjob.job import MRJob from mrjob.util import cmd_line, bash_wrap class…

python hadoop mrjob bigdata

asked Apr 06 '16 at 04:30

DigitalPig

1 answer

TotalOrderPartitioner and mrjob

How does one specify the TotalOrderPartitioner when using mrjob? Is this the default, or must it be specified explicitly? I've seen inconsistent behavior on different data sets.

hadoop-streaming mrjob hadoop-partitioning totalorderpartitioner

asked Feb 26 '16 at 04:30

vy32

1 answer

How to populate a postgresql database with Mrjob and Hadoop

I would like to populate a PostgreSQL database using a mapper with MrJob and Hadoop 2.7.1. I am currently using the following code: # -*- coding: utf-8 -*- #Script for storing the sparse data into a database by using Hadoop import psycopg2 import…

postgresql python-2.7 hadoop mrjob

asked Jan 13 '16 at 04:19

Nacho

1 answer

Running Map/Reduce python programs from Sublime Text 2

I just started a tutorial series on map reduce and Hadoop. The setup instructions call for using an IDE called Canopy with MRjob. I have installed both, and everything works. But if Canopy is just a Python IDE, couldn't I use anything in its…

python sublimetext2 mrjob

asked Jan 11 '16 at 19:22

StillLearningToCode

1 answer

Read multiple HDFS files or S3 files with mrjob?

I have a large amount of data stored in an HDFS system (or, alternatively, in Amazon S3). I want to process it using mrjob. Unfortunately, when I run mrjob and give it the HDFS file name or the containing directory name, I get an error. For example, here…

hadoop mrjob

asked Dec 07 '15 at 03:29

vy32

1 answer

Regular expressions in python map reduce: Counting words with «ñ» and accented vowels

I use a regular expression to match accented vowels and «ñ» in Spanish texts in the following way: WORD_REGEXP = re.compile(r"[a-zA-Záéíóúñ]+") Although it works fine with any string, when I execute the map reduce program, it doesn't…

python regex mapreduce mrjob

asked Dec 06 '15 at 08:57

Alvaro Fierro Clavero

0 answers

How to read images from Hadoop sequence file using opencv and MrJob?

I created a sequence file from a tar file full of images with tar-to-seq.jar. Now I want to create images out of the bytes from that sequence file and analyze them. I'm using opencv 3.0.0 and mrjob 0.5. I'm having trouble reading the image using…

opencv hadoop mrjob sequencefile

asked Dec 03 '15 at 13:49

Milos Miletic

2 answers

hadoop python job on snappy files produces 0 size output

When I run wordcount.py (python mrjob http://mrjob.readthedocs.org/en/latest/guides/quickstart.html#writing-your-first-job) using hadoop streaming on a text file, it gives me the output, but when the same job is run against .snappy files I get zero size…

hadoop hadoop-streaming mrjob

asked Nov 11 '15 at 12:03

user3369417

0 answers

MrJob's MRJob.set_up_logging() function deprecated?

I'm writing a wrapper to run MrJob jobs with, and it's working quite well but I'd like to be able to deliver the Python stack trace from the job if it throws an exception. Originally, when something went wrong (i.e. an assert in the job code fails)…

python python-2.7 hadoop mapreduce mrjob

asked Oct 02 '15 at 19:24

Eli Rose

1 answer

MrJob spends a lot of time Copying local files into hdfs

The problem I'm encountering is this: Having already put my input.txt (50MBytes) file into HDFS, I'm running python ./test.py hdfs:///user/myself/input.txt -r hadoop --hadoop-bin /usr/bin/hadoop It seems that MrJob spends a lot of time copying…

hadoop hdfs mrjob

asked Sep 27 '15 at 11:21

Nikos

0 answers

Running MapReduce job on hadoop remote cluster

I would like to know if there is a method for running a MapReduce job on a remote Hadoop cluster. At my university there is a cluster which has Hadoop installed, so I have been learning MapReduce for distributing machine learning jobs. However, I…

python hadoop mrjob

asked Sep 24 '15 at 21:00

Nacho

1 answer

socket.gaierror when trying to run emr using python mrjob

I am currently trying to learn mrjob and how to use it with AWS EMR, so please forgive me if this has already been asked (I searched many places but did not find the answer), and sorry if it is a silly question. This is my python script: from…

python emr mrjob

asked Sep 23 '15 at 07:28

The6thSense
