
I am learning how to crack zip files using dictionary attacks. This is the code:

import zipfile

def extractFile(zFile, password):
    try:
        # a wrong password makes extractall raise an exception
        zFile.extractall(pwd=password)
        print '[+] Found password ' + password + '\n'
    except Exception:
        pass

def main():
    zFile = zipfile.ZipFile('evil.zip')
    passFile = open('dictionary.txt')
    for line in passFile.readlines():
        password = line.strip('\n')
        extractFile(zFile, password)

if __name__ == '__main__':
    main()

Then I add threading:

import zipfile
from threading import Thread

def extractFile(zFile, password):
    try:
        zFile.extractall(pwd=password)
        print '[+] Found password ' + password + '\n'
    except Exception:
        pass

def main():
    zFile = zipfile.ZipFile('evil.zip')
    passFile = open('dictionary.txt')
    for line in passFile.readlines():
        password = line.strip('\n')
        t = Thread(target=extractFile, args=(zFile, password))
        t.start()

if __name__ == '__main__':
    main()

However, when I time the two programs, the first completes in about 90 seconds while the second takes nearly 300 seconds. The dictionary contains 459,026 entries. I am baffled as to why this happens. I also tried limiting the number of threads to 10, 20, and so on, but the plain loop is still faster every time. Can anybody explain why this is so? Also, is there any way to improve the program?

EDIT: I tried slicing as Ray suggested:

import sys
import zipfile
from threading import Thread

def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in xrange(0, len(l), n):
        yield l[i:i+n]

def extractFile(zFile, passwords):
    for password in passwords:
        try:
            zFile.extractall(pwd=password)
            print '[+] Found password ' + password + '\n'
            # SystemExit must not be swallowed by the except below;
            # note it only ends this worker thread, not the whole program
            sys.exit(0)
        except Exception:
            continue

def main():
    zFile = zipfile.ZipFile('evil.zip')
    with open('dictionary.txt', 'rb') as pass_file:
        passwords = [i.strip() for i in pass_file]
    passes = list(chunks(passwords, 10))
    for pas in passes:
        t = Thread(target=extractFile, args=(zFile, pas))
        t.start()

if __name__ == '__main__':
    main()

It still takes 3-4 minutes.


2 Answers


One reason this does not work properly: with multiprocessing you must open the zip file in each subprocess, otherwise you can be hurt by sharing file handles. Then create only a handful of subprocesses (say, 2 × the number of cores), and let each subprocess test multiple passwords.

Thus we get:

import zipfile
from multiprocessing import Process


def extract_file(passwords):
    # open the zip file inside the subprocess; don't share the handle
    with zipfile.ZipFile('evil.zip') as zipf:
        for password in passwords:
            try:
                zipf.extractall(pwd=password)
                print('[+] Found password {}\n'.format(password))
                return  # stop once the password is found
            except Exception:
                pass


def main():
    with open('dictionary.txt', 'rb') as pass_file:
        passwords = [i.strip() for i in pass_file]

    N_PROC = 8
    procs = []
    for i in range(N_PROC):
        # stride slicing gives process i every N_PROC-th password,
        # spreading the work evenly across the processes
        p = Process(target=extract_file, args=[passwords[i::N_PROC]])
        p.start()
        procs.append(p)
    for p in procs:
        p.join()


if __name__ == '__main__':
    main()

Can anybody explain why this is so?

I think that, in addition to the problem of the Global Interpreter Lock (GIL), you might be using the threads incorrectly.

Judging from the loop, you're starting a completely new thread for every password line in your file, i.e. just to make a single attempt. Starting a new thread for only a single attempt is, as you've discovered, expensive and does not work out as you expected. If you did this with multiprocessing, it would be even slower, because creating a completely new process for a single try is even more expensive than creating a thread.
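To see that per-task overhead for yourself, here is a small illustrative micro-benchmark (hypothetical, not from the original programs; the exact numbers will vary by machine):

import time
from threading import Thread

def attempt():
    pass  # stand-in for a single password attempt

# plain loop: 10000 direct calls
start = time.time()
for _ in range(10000):
    attempt()
print('plain loop: {:.2f}s'.format(time.time() - start))

# one thread per call: pays thread creation/startup cost every time
start = time.time()
for _ in range(10000):
    t = Thread(target=attempt)
    t.start()
    t.join()
print('one thread per call: {:.2f}s'.format(time.time() - start))

The second loop does the same work but pays thread setup and teardown for every call, which is exactly the cost the per-password threading above runs into.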

Is there any chance to improve the program at all?

I suggest you:

  • break up the passwords into several sub-lists/groups (i.e. slicing)
  • create a thread (or process) for each of these sub-lists
  • let each thread/process consume a group (i.e. make multiple attempts and get more out of them)

For example, if you have 100 lines in the file, you could break it up into 4 parts (i.e. 25 passwords per sub-list) and use these to feed 4 threads/processes (i.e. one for each sub-list).

Using multiprocessing here would be advantageous because you can avoid the GIL. However, keep in mind that you'd still have multiple processes accessing the same file simultaneously, so make sure you account for this when trying to extract the file, etc.

You should take care not to overwhelm your PC's cores. You might want to use a process pool (see the Python docs; a sketch follows below) and cap the number of processes you create at the number of cores in your PC (perhaps your_core_count - 1 to keep the machine responsive).

Then, as each process consumes a sub-list and terminates, a new process is created (or an existing one is re-assigned, if using a process pool) to handle yet another sub-list waiting in your queue. If one of the children completes successfully, you might want the parent process to kill all the other children to avoid unnecessary resource usage.
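Here is a minimal sketch of that pool approach, assuming the same evil.zip and dictionary.txt as above (the pool size and chunking are just starting points to measure and tune):

import zipfile
from multiprocessing import Pool, cpu_count

def try_chunk(passwords):
    # open the zip inside the worker process; don't share handles
    with zipfile.ZipFile('evil.zip') as zipf:
        for password in passwords:
            try:
                zipf.extractall(pwd=password)
                return password  # report the hit back to the parent
            except Exception:
                continue
    return None

def main():
    with open('dictionary.txt', 'rb') as f:
        passwords = [line.strip() for line in f]

    n_proc = max(cpu_count() - 1, 1)        # leave one core free
    size = -(-len(passwords) // n_proc)     # ceiling division: one chunk per worker
    chunks = [passwords[i:i + size] for i in range(0, len(passwords), size)]

    with Pool(n_proc) as pool:
        # results arrive as chunks finish; leaving the with-block
        # terminates any workers that are still running
        for found in pool.imap_unordered(try_chunk, chunks):
            if found is not None:
                print('[+] Found password {}'.format(found))
                break

if __name__ == '__main__':
    main()

Breaking out of the loop makes the with-block terminate the remaining workers, which gives you the "parent kills the other children" behaviour described above.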

  • I tried slicing as shown in the edit in the main question. Is this what you suggested? – Echchama Nayak Feb 13 '16 at 09:40
  • 1
    @Ekoji: Something along those lines, but not quite. I still think you should use *processes*, not threads, to get *parallelism* instead of concurrency, and that feeding only 10 passwords for each one is a waste. Use larger values. With your 400K+ entry dictionary, in a 4-core system, I'd start with ~100K entries per core (play around w/ values and measure). You must distribute the work *evenly* and make sure you give each process enough work to do. Until then, it'll probably be a waste. That's why your code is still slower than Antti's rewrite, based on my original set of suggestions. – code_dredd Feb 13 '16 at 10:57