I was messing around with a zip file cracker and decided to use the multiprocessing module to speed the process up. It was a complete pain since it was my first time using the module and I don't even fully understand it yet. However, I got it to work.

The problem is that it doesn't get through the whole word list; it stops at random points in the list, and if the password is found, it keeps going through the word list instead of stopping.

Does anyone know why it's exhibiting this behaviour?

Source Code For ZipFile Cracker

#!/usr/bin/env python3

import multiprocessing as mp
import zipfile # Handling the zipfile
import sys # Command line arguments, and quitting the application
import time # To calculate runtime

def usage(program_name):
    print("Usage: {0} <path to zipfile> <dictionary>".format(program_name))
    sys.exit(1)

def cracker(password):
    try:
        zFile.extractall(pwd=password)
        print("[+] Password Found! : {0}".format(password.decode('utf-8')))
        pool.close()
    except:
        pass

def main():
    global zFile
    global pool

    if len(sys.argv) < 3:
        usage(sys.argv[0])

    zFile = zipfile.ZipFile(sys.argv[1])

    print("[*] Started Cracking")

    startime = time.time()
    pool = mp.Pool()

    for i in open(sys.argv[2], 'r', errors='ignore'):
        pswd = bytes(i.strip('\n'), 'utf-8')
        pool.apply_async(cracker, (pswd,))

    print (pswd)
    runtime =  round(time.time() - startime, 5)
    print ("[*] Runtime:", runtime, 'seconds')
    sys.exit(0)

if __name__ == "__main__":
    main()
Michael Currie

3 Answers

You are terminating your program too early. To test this out, add a harmless time.sleep(10) in the cracker method and observe your program still terminating within a second.

Call join to wait for the pool to finish:

pool = mp.Pool()
for i in open(sys.argv[2], 'r', errors='ignore'):
    pswd = bytes(i.strip('\n'), 'utf-8')
    pool.apply_async(cracker, (pswd,))

pool.close()  # Indicate that no more data is coming
pool.join()   # Wait for pool to finish processing

runtime =  round(time.time() - startime, 5)
print ("[*] Runtime:", runtime, 'seconds')
sys.exit(0)
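A small way to see the same effect without a zip file is sketched below. It is only an illustration: slow_task and demo are made-up names, and it uses multiprocessing.pool.ThreadPool (threads, not processes) purely because it exposes the same apply_async/close/join interface and runs anywhere without extra setup. The point it shows is that apply_async returns immediately, and that the results are only guaranteed to exist after close() and join().

```python
import time
from multiprocessing.pool import ThreadPool  # same interface as multiprocessing.Pool


def slow_task(n):
    """Pretend to do some work, then return a result."""
    time.sleep(0.2)
    return n * n


def demo():
    pool = ThreadPool(2)
    # apply_async only queues the task and hands back an AsyncResult right away.
    results = [pool.apply_async(slow_task, (n,)) for n in range(4)]
    ready_before = [r.ready() for r in results]  # checked before any task can finish

    pool.close()  # no more tasks will be submitted
    pool.join()   # block until every queued task has finished
    ready_after = [r.ready() for r in results]
    values = [r.get() for r in results]
    return ready_before, ready_after, values
```

With mp.Pool the calls are the same, but the worker function has to live at module level and the pool should be created under the if __name__ == "__main__": guard.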

Additionally, once you find the right password, calling close just indicates that no more future tasks are coming - all tasks already submitted will still be done. Instead, call terminate to kill the pool without processing any more tasks.

Furthermore, depending on the implementation details of multiprocessing.Pool, the global variable pool may not be available in the worker when you need it (and its value isn't serializable anyway). To solve this problem, you can use a callback, which runs in the parent process, as in:

def cracker(password):
    try:
        zFile.extractall(pwd=password)
    except RuntimeError:
        return
    return password

def cb(found):
    if found:
        pool.terminate()
...
pool.apply_async(cracker, (pswd,), callback=cb)

Of course, since you now look at the result all the time, apply is not the right way to go. Instead, you can write your code using imap_unordered:

with open(sys.argv[2], 'r', errors='ignore') as passf, \
        mp.Pool() as pool:
    passwords = (line.strip('\n').encode('utf-8') for line in passf)
    for found in pool.imap_unordered(cracker, passwords):
        if found:
            break

Instead of using globals, you may also want to open the zip file (and create a ZipFile object) in each process, by using an initializer for the pool. Even better (and way faster), forgo all of the I/O in the first place and read just the bytes you need once and then pass them on to the children.

phihag
  • Thanks phihag, multiprocessing now works fine thanks to you, but I ran into another problem. When I run the application, a strange error is raised saying that zFile is not defined, even though I made it a global variable, and when I try to pass the zFile object to cracker, the function doesn't even run. I would appreciate your help. Thanks in advance. – Abdulrahman7ossam Jul 10 '15 at 23:00
  • @NightHawk: Please see my answer, specifically the advice to _explicitly pass resources to child processes_. – Rusty Shackleford Jul 10 '15 at 23:08
  • @NightHawk I've amended my answer as well, with more of an explanation. See the other answers for the full code. – phihag Jul 10 '15 at 23:41
  • I keep getting random errors when I use bigger word lists. One of them is: zlib.error: Error -3 while decompressing data: invalid distance too far back – Abdulrahman7ossam Jul 11 '15 at 00:51

phihag's answer is the correct solution.

I just wanted to provide an additional detail regarding calling terminate() when you've found the correct password. The pool variable in cracker() was not defined when I ran the code. So trying to invoke it from there simply threw an exception:

NameError: name 'pool' is not defined

(My fork() experience is weak, so I don't completely understand why the global zFile is copied to the child processes successfully while pool is not. Even if it were copied, it would not be the same pool in the parent process, right? So any methods invoked on it would have no effect on the real pool in the parent process. Regardless, I prefer this advice listed within the multiprocessing module's Programming guidelines section: Explicitly pass resources to child processes.)

My suggestion is to make cracker() return the password if it is correct, otherwise return None. Then pass a callback to apply_async() that records the correct password, as well as terminating the pool. Here's my take at modifying your code to do this:

#!/usr/bin/env python3

import multiprocessing as mp
import zipfile # Handling the zipfile
import sys # Command line arguments, and quitting the application
import time # To calculate runtime
import os

def usage(program_name):
    print("Usage: {0} <path to zipfile> <dictionary>".format(program_name))
    sys.exit(1)

def cracker(zip_file_path, password):
    print('[*] Starting new cracker (pid={0}, password="{1}")'.format(os.getpid(), password))

    try:
        time.sleep(1) # XXX: to simulate the task taking a bit of time
        with zipfile.ZipFile(zip_file_path) as zFile:
            zFile.extractall(pwd=bytes(password, 'utf-8'))
        return password
    except:
        return None

def main():
    if len(sys.argv) < 3:
        usage(sys.argv[0])

    print('[*] Starting main (pid={0})'.format(os.getpid()))

    zip_file_path = sys.argv[1]
    password_file_path = sys.argv[2]
    startime = time.time()
    actual_password = None

    with mp.Pool() as pool:
        def set_actual_password(password):
            nonlocal actual_password
            if password:
                print('[*] Found password; stopping future tasks')
                pool.terminate()
                actual_password = password

        with open(password_file_path, 'r', errors='ignore') as password_file:
            for pswd in password_file:
                pswd = pswd.strip('\n')
                pool.apply_async(cracker, (zip_file_path, pswd,), callback=set_actual_password)

        pool.close()
        pool.join()

    if actual_password:
        print('[*] Cracked password: "{0}"'.format(actual_password))
    else:
        print('[*] Unable to crack password')
    runtime =  round(time.time() - startime, 5)
    print("[*] Runtime:", runtime, 'seconds')
    sys.exit(0)

if __name__ == "__main__":
    main()
Rusty Shackleford

Here's an implementation of the advice from @phihag's and @Equality 7-2521's answers:

#!/usr/bin/env python3
"""Brute force zip password.

Usage: brute-force-zip-password <zip archive> <passwords>
"""
import sys
from multiprocessing import Pool
from time import monotonic as timer
from zipfile import ZipFile

def init(archive): # run at the start of a worker process
    global zfile
    zfile = ZipFile(open(archive, 'rb')) # open file in each process once

def check(password):
    assert password
    try:
        with zfile.open(zfile.infolist()[0], pwd=password):
            return password # assume success
    except Exception as e:
        if e.args[0] != 'Bad password for file':
            # assume all other errors happen after the password was accepted
            raise RuntimeError(password) from e

def main():
    if len(sys.argv) != 3:
        sys.exit(__doc__) # print usage

    start = timer()
    # decode passwords using the preferred locale encoding
    with open(sys.argv[2], errors='ignore') as file, \
         Pool(initializer=init, initargs=[sys.argv[1]]) as pool: # use all CPUs
        # check passwords encoded using utf-8
        passwords = (line.rstrip('\n').encode('utf-8') for line in file)
        passwords = filter(None, passwords) # filter empty passwords
        for password in pool.imap_unordered(check, passwords, chunksize=100):
            if password is not None:  # found
                print("Password: '{}'".format(password.decode('utf-8')))
                break
        else:
            sys.exit('Unable to find password')
    print('Runtime: %.5f seconds' % (timer() - start,))

if __name__=="__main__":
    main()

Note:

  • each worker process has its own ZipFile object and the zip file is opened once per process: it should make it more portable (Windows support) and improve time performance
  • the content is not extracted: check(password) tries to open and immediately closes an archive member on success: it is safer and it should improve time performance (no need to create directories, etc)
  • all errors except 'Bad password for file' while decrypting the archive member are assumed to happen after the password is accepted: the rationale is to avoid silencing unexpected errors -- each exception should be considered individually
  • check(password) expects nonempty passwords
  • chunksize parameter may drastically improve performance
  • a rare for/else syntax is used, to report cases when the password is not found
  • the with-statement calls pool.terminate() for you
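Since the for/else construct is uncommon, here is a minimal standalone illustration of how it behaves. The function first_match is a made-up helper, not part of the answer's code. The else block runs only when the loop ends without hitting break.

```python
def first_match(candidates, is_match):
    """Return the first candidate accepted by is_match, or None."""
    for candidate in candidates:
        if is_match(candidate):
            result = candidate
            break          # leaving via break skips the else block
    else:
        result = None      # runs only when the loop finished without break
    return result
```

In the answer's main(), the same shape is what lets sys.exit('Unable to find password') run only when no password in the list matched.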
jfs
  • Thank you, your help is appreciated – Abdulrahman7ossam Jul 11 '15 at 00:14
  • Sorry to bother you, but the script you wrote works fine with small word lists; with large word lists this error is raised: RuntimeError: File is encrypted, password required for extraction. Any thoughts? – Abdulrahman7ossam Jul 11 '15 at 00:29
  • The same error is now raised for any word list. – Abdulrahman7ossam Jul 11 '15 at 00:36
  • @NightHawk: the code works as is. I've tested it with millions of passwords. Make sure you haven't deleted the `pwd=` part. – jfs Jul 11 '15 at 00:51
  • I created a new file and copied and pasted the code you shared with me. Here is a screenshot highlighting the 'pwd=' bit and the error I get in the terminal. https://www.dropbox.com/s/i5xvtnbwl5nv0rx/Screen%20Shot%202015-07-11%20at%204.10.34%20AM.png?dl=0 Thanks again and sorry for being annoying <3 Note: the word list I used is just 500 words. – Abdulrahman7ossam Jul 11 '15 at 01:16
  • @NightHawk: your input has empty passwords. I've updated the code. – jfs Jul 11 '15 at 01:25
  • So sorry to do this, but I now get the error zlib.error: Error -3 while decompressing data: invalid distance too far back – Abdulrahman7ossam Jul 11 '15 at 01:34
  • @NightHawk: If the `zlib.error` can happen only for a valid password then catch it and return `password` or raise an exception mentioning the password. It seems unrelated to `multiprocessing`. – jfs Jul 11 '15 at 02:11
  • I upgraded Python to the latest version and everything runs great now. Without your code, the run time for a specific zip file with an exceptionally large word list was 120 seconds; now it only takes 30 seconds. Many thanks, sir. – Abdulrahman7ossam Jul 11 '15 at 02:43
  • @NightHawk: I've updated the code, to avoid trying to extract all archive members for each password. It should improve time performance. – jfs Jul 11 '15 at 10:05
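The zlib.error reports in the comments above look consistent with a wrong password occasionally getting past zipfile's initial password check and then failing during decompression. The thread never confirms this, so treat it as an assumption. Under that assumption, one way to act on the "catch it" suggestion is sketched below. try_password is a hypothetical helper, and the sketch assumes a deflate-compressed member. It reads the member to the end so that the CRC-32 check also runs, and it counts any decompression or CRC failure as a wrong password.

```python
import zipfile
import zlib


def try_password(archive, member, password):
    """Return True only if the whole member decrypts and passes its CRC check."""
    try:
        with archive.open(member, pwd=password) as f:
            while f.read(1 << 16):  # read to the end so the CRC-32 is verified
                pass
        return True
    except RuntimeError:
        # zipfile reports a rejected password with RuntimeError
        return False
    except (zlib.error, zipfile.BadZipFile):
        # assumption: garbage after a false "accept" means the password was wrong
        return False
```

Reading the whole member is slower than just opening it, so a possible compromise is to call a check like this only on candidates that the cheaper open-only test has already accepted.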