Questions tagged [mrjob]

Mrjob is a Python 2.5+ package that helps you write and run Hadoop Streaming jobs.

Mrjob fully supports Amazon’s Elastic MapReduce (EMR) service, which lets you rent time on a Hadoop cluster by the hour. It also works with your own Hadoop clusters.

Mrjob can be installed with pip:

pip install mrjob

331 questions

1 answer

How to iterate through lines in MRJob

I have a text document full of lines of tweets that I need to run a MapReduce job on. I am using Python and MRJob to do so with the following code: from mrjob.job import MRJob import re import datetime class exerciseOne(MRJob): def…

python mapreduce mrjob

asked Oct 16 '18 at 17:09

faboys

0 answers

How to access HDFS files directly in Python?

I am working with the Hadoop and Spark frameworks to cluster images. I am using Python as my programming language, and the MRJOB package for the MapReduce framework. My question is how to access HDFS files directly in Python. For example…

python python-2.7 apache-spark hadoop mrjob

asked Sep 10 '18 at 20:16

Alay Majmudar

1 answer

mrjob returned non-zero exit status 256

I'm new to map reduce and I'm trying to run a map reduce job using mrjob package of python. However, I encountered this error: ERROR:mrjob.launch:Step 1 of 1 failed: Command '['/usr/bin/hadoop', 'jar',…

python hadoop mrjob common-crawl

asked Aug 31 '18 at 04:16

kkesley

1 answer

mrjob add_file_arg() csv file

I'm having trouble understanding how to use the add_file_arg() for mrjob. I'm trying to pass a csv to my mapper with a person's attributes and find the attributes for each person in my mapper. This is my code thus far: class MRPeopleScores(MRJob): …

python python-3.x mrjob

asked May 18 '18 at 23:27

person10559

1 answer

Creating a new SparkContext for each SparkStep in MRJob/PySpark

I am new to PySpark and I'm trying to implement a multi-step EMR/Spark job using MRJob. Do I need to create a new SparkContext for each SparkStep, or can I share the same SparkContext across all SparkSteps? I tried to look up the MRJob manual but…

pyspark amazon-emr mrjob

asked Apr 04 '18 at 20:35

vkc

1 answer

How to prematurely terminate an MRJob reducer?

I want to use MapReduce to filter a huge dataset for rare entities satisfying some criteria. I could speed this up a lot by terminating reducers once they violate the criteria, since they will be computing on entities that I'm not interested in. To…

python mapreduce filtering reduce mrjob

asked Mar 31 '18 at 20:20

crypdick

1 answer

"Not a valid JAR" error when running a Hadoop example

I have recently started learning Hadoop. I am using a sandbox on VirtualBox. I downloaded a Python script that uses the mrjob framework and ran the following command: python RatingsBreakdown.py -r hadoop --hadoop-streaming-jar…

hadoop mapreduce hadoop-streaming mrjob

asked Jan 28 '18 at 12:04

Jacob

0 answers

Recreate Python dictionary results in MapReduce?

Can't get my head around why standard Python code produces an unexpected result when translated to MapReduce using mrjob. Example data from a .txt file: 1 12 1 14 1 15 1 16 1 18 1 12 2 11 2 11 2 13 3 12 3 15 3 11 3 10 This code creates…

python hadoop mapreduce mrjob

asked Dec 03 '17 at 21:57

RDJ

1 answer

Failed package installation in Python

I am trying to install the Mrjob package for Python and I get the following error: AJs-MacBook-Pro-13:~ aj$ conda install -c asmeurer mrjob Fetching package metadata ............. Solving package specifications: . UnsatisfiableError: The following…

python failed-installation mrjob

asked Dec 02 '17 at 06:22

aj31

1 answer

How to process rows from SQL query with MRJob

I am having a hard time figuring out how MRJob works. I am trying to make an SQL query and yield its rows, and the documentation does not explain this in detail. My code so far: # To be able to give db file as option. def…

python python-2.7 sqlite mrjob

asked Nov 06 '17 at 13:08

B1nd0

1 answer

mapreduce for word frequency in Python

I want my Python program to output a list of the ten most frequently used words and their associated word counts. I have to use mrjob (MapReduce) to create this program. I wrote a program that finds the frequency of the words and outputs them in…

python hadoop mapreduce mrjob

asked Oct 23 '17 at 23:39

Anna

0 answers

Run several jobs in a single file with mrjob

I have several jobs in separate .py files. These jobs perform different operations on a CSV file. Can I chain all these jobs into one file and save the final output to a CSV file?

python csv jobs mrjob

asked Sep 14 '17 at 22:53

hesse

0 answers

Using MRJOB to count bigrams: a type error occurs

I am new to writing MapReduce programs with Mrjob. I need to use Mrjob to count bigrams. Here is my code: import mrjob from mrjob.job import MRJob import re from itertools import islice, izip import itertools WORD_RE =…

python dictionary reduce cpu-word mrjob

asked Jul 19 '17 at 07:14

Rita Xia

1 answer

Java error: org.apache.hadoop.mapreduce.lib.partition.KeyFieldBasedPartitioner not org.apache.hadoop.mapred.Partitioner

Exception in thread "main" java.lang.RuntimeException: class org.apache.hadoop.mapreduce.lib.partition.KeyFieldBasedPartitioner not org.apache.hadoop.mapred.Partitioner at org.apache.hadoop.conf.Configuration.setClass(Configuration.java:2273) at…

python hadoop mrjob

asked Jul 17 '17 at 18:42

Raj

1 answer

-partitioner : class not found : org.apache.Hadoop.mapred.lib.KeyFieldBasedPartitioner

I am writing an MRjob job and want to partition my reducer output by key. I am using these options and get the following error. How do I use KeyFieldBasedPartitioner? Do I need to download something for this? My MRJOB job is written in Python. Step 1 of 1…

python hadoop-streaming mrjob

asked Jul 16 '17 at 16:31

Raj
