Questions tagged [mrjob]

Mrjob is a Python 2.5+ package that helps you write and run Hadoop Streaming jobs.

Mrjob fully supports Amazon’s Elastic MapReduce (EMR) service, which lets you buy time on a Hadoop cluster by the hour. It also works with your own Hadoop cluster.

Mrjob can be installed with pip:

pip install mrjob
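
For orientation, a job is a single Python class with mapper, combiner and reducer methods. The sketch below loosely follows the quickstart word-count example from the mrjob documentation; the file and input names are arbitrary.

from mrjob.job import MRJob

class MRWordFrequencyCount(MRJob):
    # count characters, words and lines in the input

    def mapper(self, _, line):
        yield "chars", len(line)
        yield "words", len(line.split())
        yield "lines", 1

    def combiner(self, key, values):
        yield key, sum(values)

    def reducer(self, key, values):
        yield key, sum(values)

if __name__ == '__main__':
    MRWordFrequencyCount.run()

A local test run looks like python mr_word_count.py input.txt; swapping in -r hadoop or -r emr sends the same code to a cluster.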
331 questions
0
votes
0 answers

How to solve the error "Object of type function is not JSON serializable"

I have a mapper and reducer function as below. from mrjob.job import MRJob from mrjob.step import MRStep class SortNumMoviesDesc(MRJob): def steps(self): return [MRStep(mapper=self.mapper_retrieve_counts, reducer =…
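
A common cause of this error is yielding something mrjob cannot JSON-encode between steps, such as a method or generator object. The sketch below reuses the method names from the excerpt but otherwise rests on assumptions (tab-separated ratings data with the movie id in the second field); every yielded value stays a plain int, string or tuple.

from mrjob.job import MRJob
from mrjob.step import MRStep

class SortNumMoviesDesc(MRJob):
    def steps(self):
        return [
            MRStep(mapper=self.mapper_retrieve_counts,
                   reducer=self.reducer_sum_counts),
            MRStep(reducer=self.reducer_sort_counts),
        ]

    def mapper_retrieve_counts(self, _, line):
        # assumed tab-separated ratings data with the movie id in field 1
        fields = line.split('\t')
        yield fields[1], 1

    def reducer_sum_counts(self, movie_id, counts):
        # yield plain, JSON-serializable values; yielding a function or
        # generator object is what triggers the error in the title
        yield None, (sum(counts), movie_id)

    def reducer_sort_counts(self, _, count_movie_pairs):
        for count, movie_id in sorted(count_movie_pairs, reverse=True):
            yield movie_id, count

if __name__ == '__main__':
    SortNumMoviesDesc.run()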
0
votes
0 answers

Python MRJob Script Sorting Results - Top Ten Words Syllable Count

I am trying to make a job that takes in a text file, only processes words that are not in the STOPWORDS set, counts the number of syllables in each word, then returns the top 10 words with the most syllables, sorting the results. I believe…
Tony M
  • 13
  • 4
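
A minimal sketch of the counting part, assuming a crude syllable estimate (one syllable per run of vowels) and a small stand-in STOPWORDS set; ranking the top ten then needs a second step that funnels all pairs to one reducer, as sketched under a later question below.

import re
from mrjob.job import MRJob

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}  # assumed subset
WORD_RE = re.compile(r"[a-z']+")
VOWEL_GROUPS_RE = re.compile(r"[aeiouy]+")

class MRWordSyllables(MRJob):
    def mapper(self, _, line):
        for word in WORD_RE.findall(line.lower()):
            if word not in STOPWORDS:
                # crude estimate: one syllable per run of vowels
                yield word, len(VOWEL_GROUPS_RE.findall(word))

    def reducer(self, word, syllable_counts):
        # the estimate is the same for every occurrence, so keep one copy
        yield word, max(syllable_counts)

if __name__ == '__main__':
    MRWordSyllables.run()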
0
votes
0 answers

Python Hadoop mrjob: subprocess.CalledProcessError: Command returned non-zero exit status 1

I've recently been using the mrjob package on Python 3.7. I started Hadoop and created a wordaccount.py file, which calculates the frequency of each word in a .txt file. When I tried to run the file with python3 wordaccount.py -r hadoop…
yamato
  • 85
  • 1
  • 13
0
votes
1 answer

How to get the longest word with MRJob

I'm trying to find the longest word in the text file, from letter a to z. from mrjob.job import MRJob import re WORD_RE = re.compile(r"[\w']+") class MRWordFreqCount(MRJob): def mapper(self, _, line): for word in…
Phat Phat
  • 1
  • 1
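
One way to reduce this, reusing WORD_RE from the excerpt: emit every word under a single key and keep only the longest, breaking length ties alphabetically (the a-to-z part). A sketch:

import re
from mrjob.job import MRJob

WORD_RE = re.compile(r"[\w']+")

class MRLongestWord(MRJob):
    def mapper(self, _, line):
        for word in WORD_RE.findall(line):
            yield None, word.lower()

    def combiner(self, _, words):
        # keep only the local winner so less data crosses the network
        yield None, min(words, key=lambda w: (-len(w), w))

    def reducer(self, _, words):
        # longest word overall, ties broken alphabetically
        yield "longest word", min(words, key=lambda w: (-len(w), w))

if __name__ == '__main__':
    MRLongestWord.run()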
0
votes
0 answers

How to run mrjob with HDFS on Ubuntu?

I am setting up Hadoop 3.3.1 on Ubuntu. I can run a jar file with HDFS normally (using Eclipse, adding the additional Hadoop jar libraries, then exporting), and mrjob runs fine locally, but when I run mrjob with HDFS errors appear. > python mrjob1.py -r hadoop…
robocon20x
  • 175
  • 8
0
votes
1 answer

Calculate the median of a list of values in parallel using Hadoop MapReduce

I'm new to Hadoop and mrjob. I have a text file with data "id groupId value" on each line. I am trying to calculate the median of all values in the text file using Hadoop MapReduce, but I'm stuck when it comes to calculating only the median…
AdamA
  • 25
  • 6
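
A hedged sketch of one approach, assuming the "id groupId value" line format from the question: group the values by groupId and let each reducer sort its group and pick the middle element. This only works while a group's values fit in one reducer's memory; a true large-scale median needs a histogram or sampling approach.

from mrjob.job import MRJob

class MRMedianPerGroup(MRJob):
    def mapper(self, _, line):
        # assumed input format from the question: "id groupId value"
        _id, group_id, value = line.split()
        yield group_id, float(value)

    def reducer(self, group_id, values):
        vals = sorted(values)
        n = len(vals)
        if n % 2:
            median = vals[n // 2]
        else:
            median = (vals[n // 2 - 1] + vals[n // 2]) / 2
        yield group_id, median

if __name__ == '__main__':
    MRMedianPerGroup.run()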
0
votes
1 answer

Finding Top Ten Word Syllable Count

I am trying to make a job that takes in a text file, then counts the number of syllables in each word, then ultimately returns the top 10 words with the most syllables. I'm able to get all of the word/syllable pairs sorted in descending order,…
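
A sketch of the ranking part, assuming the same vowel-run syllable estimate as above: a second MRStep funnels every (syllables, word) pair to a single reducer, which keeps the ten largest with heapq.

import heapq
import re
from mrjob.job import MRJob
from mrjob.step import MRStep

WORD_RE = re.compile(r"[a-z']+")
VOWEL_GROUPS_RE = re.compile(r"[aeiouy]+")

class MRTopTenSyllableWords(MRJob):
    def steps(self):
        return [
            MRStep(mapper=self.mapper_count_syllables,
                   reducer=self.reducer_one_per_word),
            MRStep(reducer=self.reducer_top_ten),
        ]

    def mapper_count_syllables(self, _, line):
        for word in WORD_RE.findall(line.lower()):
            yield word, len(VOWEL_GROUPS_RE.findall(word))

    def reducer_one_per_word(self, word, counts):
        # funnel every (syllables, word) pair to a single key so one
        # reducer in the next step can rank them all
        yield None, (max(counts), word)

    def reducer_top_ten(self, _, syllable_word_pairs):
        for syllables, word in heapq.nlargest(10, syllable_word_pairs):
            yield word, syllables

if __name__ == '__main__':
    MRTopTenSyllableWords.run()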
0
votes
1 answer

How to write an MRJob Python program for matrix addition

I have been trying to make a simple matrix addition program with the MRJob library. With a separate mapper and reducer it works fine locally and on a Hadoop cluster; now I am trying to create this program in a single…
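
A single-class sketch, assuming each input line has the form "matrix_name i j value" (for example "A 0 1 3.5"): the mapper keys each entry by its coordinates and the reducer sums the entries from both matrices at that coordinate.

from mrjob.job import MRJob

class MRMatrixAdd(MRJob):
    def mapper(self, _, line):
        # assumed input format: "matrix_name i j value", e.g. "A 0 1 3.5"
        name, i, j, value = line.split()
        yield (int(i), int(j)), float(value)

    def reducer(self, coord, values):
        # each (i, j) cell receives one value from A and one from B
        yield coord, sum(values)

if __name__ == '__main__':
    MRMatrixAdd.run()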
0
votes
0 answers

My mapper function doesn't unpack all the values in Python

I have a file with lines like this: Name_of_country,somedata,max or min indicator,degree,other data. So it goes like this: France,xxx,TMAX,30,.... Germany,xxx,TMIN,40,.... France,xxx,TMIN,10,..... Now I tried this code I have written…
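
One way to make the unpacking robust, assuming the France,xxx,TMAX,30,... layout from the question: use Python 3 extended unpacking so trailing extra fields land in a catch-all variable instead of breaking the assignment.

from mrjob.job import MRJob

class MRMaxMinDegrees(MRJob):
    def mapper(self, _, line):
        # assumed format: country,somedata,TMAX|TMIN,degree,<anything else>
        # *rest absorbs trailing fields so the unpacking never fails
        country, _somedata, kind, degree, *rest = line.split(',')
        if kind in ('TMAX', 'TMIN'):
            yield (country, kind), int(degree)

    def reducer(self, key, degrees):
        country, kind = key
        pick = max if kind == 'TMAX' else min
        yield country, (kind, pick(degrees))

if __name__ == '__main__':
    MRMaxMinDegrees.run()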
0
votes
1 answer

Counting relative frequency with pairs and stripes MapReduce

I am new to Python and I want to use the MrJob package to count the relative frequency of word pairs. I wrote the code below but it doesn't produce correct output. Can you please help me with my mistakes? f(B|A) = count(A, B) / count(A) = count(A, B) / Σ_B' count(A, B') import re from collections…
Learner
  • 39
  • 6
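
A sketch of the stripes formulation of this computation, assuming adjacent words count as a pair: each mapper emits a small dictionary of neighbor counts per word, and the reducer merges the dictionaries and divides by the word's total to get f(neighbor | word).

import re
from collections import Counter
from mrjob.job import MRJob

WORD_RE = re.compile(r"[\w']+")

class MRRelativeFrequencyStripes(MRJob):
    def mapper(self, _, line):
        words = WORD_RE.findall(line.lower())
        for i, word in enumerate(words[:-1]):
            # neighbor here is just the next word; a wider window is a
            # straightforward extension
            yield word, {words[i + 1]: 1}

    def combiner(self, word, stripes):
        yield word, self._merge(stripes)

    def reducer(self, word, stripes):
        merged = self._merge(stripes)
        total = sum(merged.values())
        for neighbor, count in merged.items():
            # f(neighbor | word) = count(word, neighbor) / count(word)
            yield (word, neighbor), count / total

    @staticmethod
    def _merge(stripes):
        merged = Counter()
        for stripe in stripes:
            merged.update(stripe)
        return dict(merged)

if __name__ == '__main__':
    MRRelativeFrequencyStripes.run()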
0
votes
1 answer

How to count the same item with multiple parameters in mrjob in Python?

I'm trying to write a map-reduce function in Python. I have a file that contains product information, and I want to count the number of products that belong to the same category and have the same version, like this:
user17488887
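
A minimal sketch, assuming comma-separated product records with the category in the second field and the version in the third: key each record by the (category, version) pair and count.

from mrjob.job import MRJob

class MRCountByCategoryVersion(MRJob):
    def mapper(self, _, line):
        # assumed comma-separated product records: name,category,version,...
        fields = line.split(',')
        category, version = fields[1], fields[2]
        yield (category, version), 1

    def reducer(self, category_version, counts):
        yield category_version, sum(counts)

if __name__ == '__main__':
    MRCountByCategoryVersion.run()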
0
votes
1 answer

My code is outputting a tuple of values and I would like it to be individual pairs; I need help understanding how to modify it

def mapper(self, _, line): stop_words = set(["to", "a", "an", "the", "for", "in", "on", "of", "at", "over", "with", "after", "and", "from", "new", "us", "by", "as", "man", "up", "says", "in", "out", "is", "be", "are", "not", "pm", "am", "off",…
CKZ
  • 37
  • 5
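
If the goal is one key/value record per word rather than one big tuple, the fix is usually to yield inside the loop. A sketch with a trimmed version of the stop-word set from the excerpt:

from mrjob.job import MRJob

STOP_WORDS = {"to", "a", "an", "the", "for", "in", "on", "of", "at",
              "over", "with", "after", "and", "from"}  # trimmed from the question

class MRWordPairs(MRJob):
    def mapper(self, _, line):
        for word in line.lower().split():
            if word not in STOP_WORDS:
                # yield a separate (word, 1) pair per word instead of
                # accumulating everything into one tuple
                yield word, 1

    def reducer(self, word, counts):
        yield word, sum(counts)

if __name__ == '__main__':
    MRWordPairs.run()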
0
votes
1 answer

Write a job that counts the frequencies of words' first letters in a file, so if there are three words starting with "c" the answer would be "c 3"

I have the code below and get the word count, but I don't understand how to get the first-letter frequency of all the words. If there are three words starting with C in the file I would expect the outcome to be "C 3". I don't need to…
CKZ
  • 37
  • 5
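
A sketch of the first-letter variant of word count: key on the first letter instead of the whole word, so the reducer's sum is the number of words starting with that letter.

import re
from mrjob.job import MRJob

WORD_RE = re.compile(r"[A-Za-z']+")

class MRFirstLetterCount(MRJob):
    def mapper(self, _, line):
        for word in WORD_RE.findall(line):
            # key on the first letter instead of the whole word
            yield word[0].lower(), 1

    def combiner(self, letter, counts):
        yield letter, sum(counts)

    def reducer(self, letter, counts):
        yield letter, sum(counts)

if __name__ == '__main__':
    MRFirstLetterCount.run()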
0
votes
1 answer

Cannot run MapReduce job on AWS EMR Spark application

I am trying to run this example from mrjob about running a word count MapReduce job on AWS EMR. This is the word count code example from mrjob: from mrjob.job import MRJob class MRWordFrequencyCount(MRJob): def mapper(self, _, line): …
huy
  • 1,648
  • 3
  • 14
  • 40
0
votes
1 answer

How to import other Python modules and packages

I have the following project structure, work_directory: merge.py, a_package (i.e. a Python file merge.py and a directory a_package under the directory "work_directory"). I wrote a MapReduce job using MRJob in merge.py, in which I need to…
luw
  • 207
  • 3
  • 14
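
A hedged sketch of one way to make a_package importable on the cluster, assuming a recent mrjob release where the DIRS class attribute uploads a local directory into each task's working directory; some_module and key_for are hypothetical names standing in for whatever merge.py actually needs.

# merge.py -- a sketch; assumes the DIRS attribute of recent mrjob
# releases, which copies a local directory next to each running task
from mrjob.job import MRJob

class MRMerge(MRJob):
    DIRS = ['a_package']  # ship the package alongside the job

    def mapper(self, _, line):
        # import inside the task, once the uploaded copy is in place
        from a_package import some_module  # hypothetical module name
        yield some_module.key_for(line), line  # hypothetical helper

    def reducer(self, key, lines):
        yield key, list(lines)

if __name__ == '__main__':
    MRMerge.run()

Run as python merge.py -r hadoop input.txt (or -r emr); locally the package is already importable from the working directory.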