
I need help outputting only the top 3 cities by revenue. Right now my job outputs every city with its total revenue, but I need to restrict the output to just the three highest.

"""
 Python script to find the total sales revenue for each city
 using the MapReduce framework (mapper, combiner, and reducer functions) with the mrjob package
 4/14/17
"""
from mrjob.job import MRJob


class CityRevenue(MRJob):
    # each input line consists of city, productCategory, price, and paymentMode

    def mapper(self, _, line):
        # create a key-value pair with key: city and value: price
        line_cols = line.split(',')
        yield line_cols[0], float(line_cols[2])

    def combiner(self, city, prices):
        # partially sums each city's prices (performed at mapper nodes)
        yield city, sum(prices)

    def reducer(self, city, prices):
        # final sum per city at the reducer nodes, formatted as a dollar string
        yield city, '${:,.2f}'.format(sum(prices))


if __name__ == '__main__':
    CityRevenue.run()

Actual:

"Albuquerque"   "$1,208,490.13"
"Anaheim"       "$1,264,165.71"
"Anchorage"     "$1,191,057.61"
"Arlington"     "$1,229,375.89"
"Atlanta"       "$1,216,153.47"
"Aurora"        "$1,216,807.43"
"Austin"        "$1,208,925.21"
"Bakersfield"   "$1,211,742.46"
"Baltimore"     "$1,225,227.14"
"Baton Rouge"   "$1,214,852.71"
"Birmingham"    "$1,218,785.75"
"Boise"         "$1,216,941.64"
"Boston"        "$1,204,833.39"
"Buffalo"       "$1,190,531.15"
"Chandler"      "$1,192,263.53"
"Charlotte"     "$1,233,641.50"
"Chesapeake"    "$1,242,760.99"
"Chicago"       "$1,219,848.29"
"Chula Vista"   "$1,241,528.46"
"Cincinnati"    "$1,218,642.84"

Expected:

"Anaheim"       "$1,264,165.71"
"Chesapeake"    "$1,242,760.99"
"Chula Vista"   "$1,241,528.46"
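If the job itself does not have to produce the top 3, one low-effort option is to leave `CityRevenue` unchanged and pick the top 3 from its output afterwards. This is a minimal sketch, assuming each output line is `key<TAB>value` with both parts JSON-encoded (mrjob's default output protocol). Because the reducer emits totals as formatted strings such as `"$1,264,165.71"`, they have to be parsed back to floats before comparing. `parse_job_line` and `top_cities` are hypothetical helpers, not part of mrjob.

```python
import heapq
import json


def parse_job_line(line):
    """Turn one 'key<TAB>value' output line into (city, revenue as float)."""
    key, value = line.rstrip('\n').split('\t', 1)
    city = json.loads(key)        # '"Anaheim"' -> 'Anaheim'
    dollars = json.loads(value)   # '"$1,264,165.71"' -> '$1,264,165.71'
    revenue = float(dollars.lstrip('$').replace(',', ''))
    return city, revenue


def top_cities(lines, n=3):
    """Return the n (city, revenue) pairs with the highest revenue, largest first."""
    parsed = (parse_job_line(line) for line in lines if line.strip())
    return heapq.nlargest(n, parsed, key=lambda pair: pair[1])


# A few lines from the "Actual" output above, assumed to be tab-separated.
sample_output = [
    '"Albuquerque"\t"$1,208,490.13"',
    '"Anaheim"\t"$1,264,165.71"',
    '"Charlotte"\t"$1,233,641.50"',
    '"Chesapeake"\t"$1,242,760.99"',
    '"Chula Vista"\t"$1,241,528.46"',
]

for city, revenue in top_cities(sample_output):
    print('"{}"\t"${:,.2f}"'.format(city, revenue))
```

With real job output, `top_cities` could be fed `sys.stdin` and the job's output piped into it. This keeps the MapReduce job simple, but the final selection then happens outside Hadoop on a single machine.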
Firestxne
  • You can force the job to use one Reducer and keep a list of the top three cities there, rather than outputting all cities. However, this will be fairly inefficient if you're dealing with a lot of Reducer input records. – Ben Watson Apr 16 '19 at 15:52
  • How would I go about forcing the reducer method to run just once? And by list, do you mean a traditional list, []? – Firestxne Apr 16 '19 at 23:25
  • Hi, did you find a solution to this question? – Abdul Haseeb Sep 24 '20 at 12:32
