Questions tagged [mrjob]

mrjob is a Python 2.5+ package that helps you write and run Hadoop Streaming jobs.

mrjob fully supports Amazon’s Elastic MapReduce (EMR) service, which lets you buy time on a Hadoop cluster by the hour. It also works with your own Hadoop clusters.

mrjob can be installed with pip:

pip install mrjob
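
A minimal plain-Python sketch of the contract an mrjob job follows: the mapper yields (key, value) pairs, the framework groups values by key, and the reducer receives one key plus an iterator over its values. It deliberately does not import mrjob; run_local is a hypothetical stand-in for Hadoop's shuffle and sort, not part of the library.

```python
from collections import defaultdict
from typing import Iterable, Iterator, List, Tuple

Pair = Tuple[str, int]

def mapper(_key, line: str) -> Iterator[Pair]:
    # Emit (word, 1) for every whitespace-separated word, lowercased.
    for word in line.split():
        yield word.lower(), 1

def reducer(word: str, counts: Iterable[int]) -> Iterator[Pair]:
    # counts is an iterator over every value emitted for this word.
    yield word, sum(counts)

def run_local(lines: Iterable[str], map_fn, reduce_fn) -> List[Pair]:
    # Hypothetical stand-in for Hadoop's shuffle/sort: group mapper output by key.
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(None, line):
            groups[key].append(value)
    results: List[Pair] = []
    for key in sorted(groups):
        # Hand the reducer an iterator, not a list, as the framework would.
        results.extend(reduce_fn(key, iter(groups[key])))
    return results

if __name__ == "__main__":
    print(run_local(["the cat", "The dog"], mapper, reducer))
```

In a real job the two functions would be methods on a subclass of mrjob.job.MRJob (for example def mapper(self, _, line)), the script would end with MRWordCounter.run() under if __name__ == '__main__', and mrjob would take the place of run_local.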

331 questions

4 answers

Is there a good library that helps chain MapReduce jobs using Hadoop Streaming and Python?

This question answers part of my question, but not completely. How do I run a script that manages this? Is it run from my local filesystem? Where exactly do things like MrJob or Dumbo come into the picture? Are there any other alternatives? I am trying to run…

hadoop mapreduce machine-learning hadoop-streaming mrjob

asked Dec 17 '12 at 19:01

incogmind

2 answers

MRJob MR assign to Dictionary instead of Yield?

I'm new to MRJob and MR, and I was wondering about the traditional word count Python example for MRJob MR: from mrjob.job import MRJob class MRWordCounter(MRJob): def mapper(self, key, line): for word in line.split(): yield…

python dictionary mapreduce mrjob

asked Sep 25 '12 at 03:31

Michael
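
A sketch of the two styles the question contrasts, in plain Python rather than mrjob. The first mapper yields one pair per word; CombiningMapper (a hypothetical class mirroring mrjob's mapper_init, mapper and mapper_final hooks) accumulates counts in a dictionary and yields them only once its input is exhausted. Either way the data still has to be yielded: filling a dictionary alone sends nothing to the reducers.

```python
from collections import Counter
from typing import Iterator, List, Tuple

def yielding_mapper(lines: List[str]) -> Iterator[Tuple[str, int]]:
    # One (word, 1) pair per occurrence: simple, but emits many pairs.
    for line in lines:
        for word in line.split():
            yield word, 1

class CombiningMapper:
    # Hypothetical mirror of mapper_init / mapper / mapper_final in an MRJob.
    def init(self) -> None:
        self.counts: Counter = Counter()

    def map(self, line: str) -> None:
        # Accumulate instead of yielding; nothing is emitted yet.
        self.counts.update(line.split())

    def final(self) -> Iterator[Tuple[str, int]]:
        # Emit one pair per distinct word once this mapper's input is exhausted.
        yield from self.counts.items()

def run_combining(lines: List[str]) -> List[Tuple[str, int]]:
    # Drive the three hooks in the order the framework would call them.
    m = CombiningMapper()
    m.init()
    for line in lines:
        m.map(line)
    return sorted(m.final())
```

Because each mapper emits partial counts for its own share of the input, the reducer still needs to sum the values per word.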

1 answer

How come I cannot index into the values list of reduce?

I am using in-mapper combining in a Map Reduce job via the Python mrjob module. Because I wrote a mapper_final function that emits a single pair, I am sure that only a single key-value pair is emitted to my reducers. However, my reduce function is…

mapreduce mrjob

asked Sep 23 '12 at 20:43

dangerChihuahua007
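
In mrjob, the values argument of a reducer is a generator, so it can be iterated once but not indexed. A plain-Python sketch, with a hypothetical values_for_key generator standing in for what the framework passes in:

```python
from typing import Iterator, List

def values_for_key() -> Iterator[int]:
    # Stand-in for a reducer's values argument: a one-pass generator.
    yield from [42]

def reducer_first_value(values: Iterator[int]) -> int:
    # Generators do not support values[0]; take the first item with next() instead.
    return next(iter(values))

def reducer_as_list(values: Iterator[int]) -> List[int]:
    # Materialize when random access is needed; only safe for small value lists.
    return list(values)
```

list(values) is reasonable here because the question's mapper_final emits a single pair, so each key has very few values.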

2 answers

How do all the reducers come up with a single answer?

I am beginning to learn MapReduce with the mrjob Python package. The mrjob documentation lists the following snippet as an example MapReduce script. """The classic MapReduce job: count the frequency of words. """ from mrjob.job import MRJob import…

mapreduce mrjob

asked Sep 17 '12 at 14:22

dangerChihuahua007
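
Reducers do not each produce an answer for all the data: the partitioner routes every key to exactly one reducer, so each reducer finalizes only the keys it owns. To get one global answer (say, the most frequent word), a common pattern is a second step that re-keys every result to the same key, such as None, so a single reducer sees them all; in mrjob, multiple steps are declared by overriding steps() to return a list of mrjob.step.MRStep objects. The sketch below simulates this in plain Python; the character-code partitioner is an assumption standing in for Hadoop's hash partitioner.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def partition(pairs: List[Tuple[str, int]], num_reducers: int) -> Dict[int, Dict[str, List[int]]]:
    # Route each pair by its key so all values for one key land on one reducer.
    # A deterministic character-code sum stands in for Hadoop's key hash.
    buckets: Dict[int, Dict[str, List[int]]] = defaultdict(lambda: defaultdict(list))
    for key, value in pairs:
        buckets[sum(map(ord, key)) % num_reducers][key].append(value)
    return buckets

def step1_reduce(buckets: Dict[int, Dict[str, List[int]]]) -> List[Tuple[str, int]]:
    # Each reducer sums the counts for the keys it owns; no key is shared.
    return [(key, sum(values))
            for bucket in buckets.values()
            for key, values in bucket.items()]

def step2_single_answer(word_counts: List[Tuple[str, int]]) -> Tuple[int, str]:
    # Re-key everything to None so one reducer can pick the global maximum.
    funnel: Dict[None, List[Tuple[int, str]]] = defaultdict(list)
    for word, count in word_counts:
        funnel[None].append((count, word))
    return max(funnel[None])
```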

3 answers

Write some data (lines) from my mappers to separate directories depending on some logic in my mapper code

I am using mrjob for my EMR needs. How do I write some data (lines) from my mappers to "separate directories" depending on some logic in my mapper code that I can: tar gzip and upload to separate S3 buckets (depending on the directory name) after…

hadoop elastic-map-reduce mrjob

asked Jun 18 '12 at 21:59

newToFlume

2 answers

Is there a way to determine the filename passed to a map job in Hadoop/Dumbo/Mrjob?

All, I am working on creating an interface for dealing with some massive data and generating ARFF files for doing some machine learning with. I can currently collect the features, but I have no way of associating them with the files they were…

python hadoop mrjob

asked Apr 17 '12 at 03:35

sampwing
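
Hadoop Streaming exposes job configuration properties to mapper processes as environment variables, with dots replaced by underscores, so the current input file is typically available as mapreduce_map_input_file (Hadoop 2 and later) or map_input_file (older releases). mrjob also provides jobconf_from_env in mrjob.compat for this lookup; the sketch below reads os.environ directly, and the helper names are hypothetical.

```python
import os
from typing import Iterator, Mapping, Optional, Tuple

# Streaming-style names: 'mapreduce.map.input.file' (newer) and 'map.input.file' (older).
INPUT_FILE_VARS = ("mapreduce_map_input_file", "map_input_file")

def current_input_file(env: Mapping[str, str] = os.environ) -> Optional[str]:
    # Return the path this mapper is reading, or None outside Hadoop Streaming.
    for name in INPUT_FILE_VARS:
        if name in env:
            return env[name]
    return None

def mapper(_key, line: str,
           env: Mapping[str, str] = os.environ) -> Iterator[Tuple[Optional[str], str]]:
    # Key each line by its source file so collected features can be traced back.
    yield current_input_file(env), line
```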

1 answer

How to receive a list of dictionaries as an argument for a MRJob job?

I understand how to programmatically receive the output, as well as how to run an MRJob job. This is clearly explained here. However, I'm struggling to understand how to pass a list of dictionaries or any variables from another file into an MRJob job.…

python hadoop mrjob

asked Mar 27 '23 at 20:42

Kayer
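
One approach, sketched under assumptions: declare an option in configure_args() (in mrjob, via self.add_passthru_arg after calling the parent's configure_args) and pass the list of dictionaries as a JSON string, decoding it with json.loads inside the job from self.options. The sketch uses argparse to stand in for mrjob's option parsing, and the --records option name is hypothetical. For larger data, add_file_arg, which ships a file alongside the job, is usually the better fit.

```python
import argparse
import json
from typing import Any, Dict, List

def build_parser() -> argparse.ArgumentParser:
    # Stand-in for configure_args(); in an MRJob this would be
    # self.add_passthru_arg('--records', ...) (hypothetical option name).
    parser = argparse.ArgumentParser()
    parser.add_argument("--records", default="[]",
                        help="JSON-encoded list of dictionaries")
    return parser

def parse_records(argv: List[str]) -> List[Dict[str, Any]]:
    # Decode the JSON string back into Python objects inside the job.
    options = build_parser().parse_args(argv)
    return json.loads(options.records)

records = [{"name": "a", "n": 1}, {"name": "b", "n": 2}]
argv = ["--records", json.dumps(records)]
```

When launching from another Python file, the same argument list can be handed to the job's constructor, e.g. MRYourJob(args=argv), where MRYourJob is a hypothetical job class.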

0 answers

Splitting comma-separated data in Python

SOLVED: solution at the end of the question. I'm writing MapReduce code using MRJob in Python, and I have a CSV dataset; the following are a few rows from it. Column headings: Year Length Title Genre Actor Actress Director …

python split mapreduce mrjob

asked Mar 22 '23 at 21:53

hadi khan
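
A common pitfall with line.split(',') is that quoted fields such as titles may themselves contain commas. A sketch of a mapper that parses each line with Python's csv module instead; the column order (Year, Length, Title, Genre, ...) is taken from the question, and counting rows per genre is a hypothetical task chosen for illustration.

```python
import csv
from typing import Iterator, List, Tuple

def split_csv_line(line: str) -> List[str]:
    # csv.reader honours quoting, so a quoted field containing commas stays one field.
    return next(csv.reader([line]))

def mapper(_key, line: str) -> Iterator[Tuple[str, int]]:
    fields = split_csv_line(line)
    # Skip blank lines and the header row.
    if not fields or fields[0] == "Year":
        return
    # Genre is the fourth column in the question's layout.
    yield fields[3], 1
```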

0 answers

How do I sort the output of this MapReduce MRJob task?

I'm having trouble sorting the output of this MapReduce task. It has to be sorted by word, then by year. I have tried the following code, but it does not return sorted output. from mrjob.job import MRJob class Job(MRJob): def…

python mapreduce mrjob

asked Mar 19 '23 at 14:19

Grit 1000
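
Hadoop sorts keys within each reducer's input, but the combined output of several reducers is not one totally ordered list, and the values for a key arrive in no guaranteed order. A simple, if not very scalable, fix is a final step that sends every (word, year) pair to a single key so one reducer can sort them. A plain-Python sketch; the function names are hypothetical, and year is assumed to be an int (as strings, '999' would sort after '1999').

```python
from typing import Iterable, Iterator, Tuple

WordYear = Tuple[str, int]

def rekey_mapper(word: str, year: int) -> Iterator[Tuple[None, WordYear]]:
    # Send every pair to the same key (None) so a single reducer receives them all.
    yield None, (word, year)

def sort_reducer(_key: None, pairs: Iterable[WordYear]) -> Iterator[WordYear]:
    # Tuples compare element by element: word first, then year.
    for word, year in sorted(pairs):
        yield word, year
```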

0 answers

MRJob program not showing any output

I have implemented a Python program using mrjob to capture network packets and then plot the graph. from mrjob.job import MRJob import socket import struct import sys import time import matplotlib.pyplot as plt import pyshark class…

python mrjob

asked Feb 12 '23 at 10:41

Muhammad Hani

0 answers

mrjob configure_args() error: unrecognized arguments

I can't figure out what the error is in my case when creating an argument via add_file_arg() for mrjob. I'm trying to pass names from csv to my mapper and find attributes for each name in the mapper. This is my code so far: from mrjob.job import…

python hadoop mapreduce mrjob

asked Jan 18 '23 at 21:39

Berenika

0 answers

Does backtrader or backtesting.py work with mapreduce and/or mrjob?

Would it be possible to backtest using either backtesting.py or backtrader doing mapreduce with the mrjob library or another? Unsure if backtrader or backtesting.py works with mapreduce/mrjob or if we will have to write some extra code to use…

python mapreduce mrjob backtrader pybacktest

asked Dec 02 '22 at 12:49

Andy

1 answer

Conversion from String to Integer is not working while using MRJob

I'm writing a simple program that uses the mrjob library to map and reduce rows from a CSV file. One of the columns in each row is a yearID. This column is read in as a str by default. I need to convert it to an int so that I can compare it. For…

python python-3.x type-conversion mrjob

asked Nov 16 '22 at 05:37

Joe Cranney
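
Likely culprits are the CSV header row (int('yearID') raises ValueError) and blank fields; in Python 3, comparing an unconverted str with an int raises TypeError. A sketch that converts defensively; the assumption that yearID is the first column and the year >= 2000 filter are hypothetical.

```python
from typing import Iterator, Optional, Tuple

def parse_year(field: str) -> Optional[int]:
    # Return None for the header cell, blanks, or anything that is not all digits.
    field = field.strip()
    if not field.isdigit():
        return None
    return int(field)

def mapper(_key, line: str) -> Iterator[Tuple[int, int]]:
    # Hypothetical layout: yearID is the first comma-separated column.
    year = parse_year(line.split(",")[0])
    # Compare only after conversion, so int is compared with int.
    if year is not None and year >= 2000:
        yield year, 1
```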

0 answers

TypeError: cannot unpack non-iterable float object - MapReduce - mrjob

I'm testing a simple example to learn about MapReduce and mrjob. The goal is to sum up the logarithm of all the numbers and divide the count of all numbers by this summation. The code is pretty easy and straightforward: # mrMedian.py from mrjob.job…

python mapreduce hadoop-streaming mrjob

asked Nov 13 '22 at 15:54

Shahriar.M
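
Without the full code this is a guess, but the error typically means a mapper or reducer yielded a bare float where mrjob expects a (key, value) pair, since each yielded item is unpacked into two names. A plain-Python sketch of the computation the question describes (the count of numbers divided by the sum of their logarithms) with every yield shaped as a pair; the single None key is an assumption that funnels all values to one reducer.

```python
import math
from typing import Iterable, Iterator, Tuple

def mapper(_key, line: str) -> Iterator[Tuple[None, float]]:
    # Yield a (key, value) pair; yielding the bare float would make the
    # framework's "key, value = item" unpacking fail with this TypeError.
    for token in line.split():
        yield None, math.log(float(token))

def reducer(_key: None, logs: Iterable[float]) -> Iterator[Tuple[None, float]]:
    # Count and sum in one pass, since the values iterator can only be consumed once.
    count, total = 0, 0.0
    for value in logs:
        count += 1
        total += value
    # Note: this divides by zero if every input is 1 (all logarithms are 0).
    yield None, count / total
```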

1 answer

Run Python mrjob on a Hadoop cluster in Kubernetes

I'm exploring the Python package mrjob to run MapReduce jobs in Python. I've tried running it in my local environment and it works perfectly. I have Hadoop 3.3 running on a Kubernetes (GKE) cluster. So I also managed to run mrjob successfully in the…

kubernetes hadoop mrjob

asked Oct 24 '22 at 13:20

Thisara Watawana