I am having a hard time figuring out how MRJob works. I am trying to run an SQL query and yield its rows, but the documentation does not explain this in detail.

My code so far:

# To be able to give db file as option.
def configure_options(self):
    super(MyClassName, self).configure_options()
    self.add_file_option('--database')

def mapper_init(self):
    # Making sqlite3 database available to mapper.
    self.sqlite_conn = sqlite3.connect(self.options.database)
    self.command= '''
        SELECT id
        FROM names
        '''

def mapper(self,_,val):        
    yield self.sqlite_conn.execute(self.command), 1

In the console I run:

python myfile.py text.txt --database=mydb.db

Here text.txt is an empty dummy file, so that the script does not wait for standard input.

I am expecting the output to be:

id1, 1
id2, 1

But now there is no output. What am I missing?

B1nd0

1 Answer

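I found the solution myself; I am posting it in case someone needs it later. The mapper has to fetch the rows from the cursor returned by execute() and yield them one by one, rather than yielding the cursor itself. In this example, the database path is given as a command-line option.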

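import sqlite3

from mrjob.job import MRJob


class MyClassName(MRJob):

    def configure_options(self):
        super(MyClassName, self).configure_options()
        # Accept the database file as a command-line option.
        self.add_file_option('--database')

    def mapper_init(self):
        # Make the sqlite3 database available to the mapper.
        self.sqlite_conn = sqlite3.connect(self.options.database)
        # "names" is the table from the question; an unquoted "table"
        # is a reserved word in SQLite and would be a syntax error.
        self.command = '''
            SELECT id
            FROM names
            '''

    def mapper(self, _, val):
        # execute() returns a cursor; iterate it to yield one pair per row.
        for row in self.sqlite_conn.execute(self.command):
            yield row[0], 1


if __name__ == '__main__':
    MyClassName.run()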

Running it from the command line:

python myfilename.py dummy.txt --database=mydatabase.db

Note that the dummy text file should contain exactly one line, because the mapper runs once for every line of input: an empty file produces no output at all, and a file with several lines repeats the query once per line.
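To see why the number of input lines matters, here is a minimal sketch that uses only the standard library and does not run MRJob itself. The in-memory database, the contents of the names table, and the run_mappers helper are assumptions made up for illustration; run_mappers only imitates the way a map phase calls the mapper once per input line.

```python
import sqlite3

# Hypothetical in-memory stand-in for mydatabase.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id TEXT)")
conn.executemany("INSERT INTO names (id) VALUES (?)", [("id1",), ("id2",)])


def mapper(_, line):
    """Same body as the mapper in the answer: iterate the cursor, emit one pair per row."""
    for row in conn.execute("SELECT id FROM names ORDER BY id"):
        yield row[0], 1


def run_mappers(input_lines):
    """Rough imitation of a map phase: call mapper once per input line."""
    pairs = []
    for line in input_lines:
        pairs.extend(mapper(None, line))
    return pairs


empty_file = run_mappers([])                   # mapper is never called
one_line = run_mappers(["dummy"])              # query runs once
two_lines = run_mappers(["dummy", "dummy"])    # query runs twice

print(empty_file)
print(one_line)
print(two_lines)
```

With an empty input list nothing is emitted, which matches the missing output in the question; with one line each id appears once, and with two lines every id is emitted twice.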
