
I am creating a rainbow table: a text file in which each line holds a string and its hash, separated by a space. The rainbow table looks like this:

j)O 3be44b195706cdd25e29d2b01a0e88d4
j)P a83079350701398672677a9ffe07108c
j)Q 2952c4654c127f2bb1086b75d8f1f986
j)R 6621ec6e1ba3c3669259894db8cde339
j)S 0442a2ee045e1913cd2eb094e8945399

I want to write a Python program that looks up a string and returns its hash, or vice versa.

I have made it search the whole document, but I want it to only search a specific column.

I switched to pandas, and now I can search a specific column, but I want it to find only exact matches:

import pandas as pd
from termcolor import colored

working_table = pd.read_csv('rainbow_table_md5.txt', sep=' ', names=["string", "hash"])
print(working_table['hash'].where(working_table['string'] == input(colored("String: ", 'cyan'))))

The code right now outputs this:

String: a
0           0cc175b9c0f1b6a831c399e269772661
1                                        NaN
2                                        NaN
                          ...
14094701                                 NaN
14094702                                 NaN
Name: hash, Length: 14094731, dtype: object

I don't need any lines other than the match in row 0.

Ideally I only need the hash as the output.
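For reference, the `NaN` rows come from `Series.where`, which keeps every row and fills the non-matches with `NaN`. Filtering with a boolean mask through `.loc` keeps only the matching rows instead. The sketch below is a minimal illustration, not the answer's method: the helpers `hash_for` and `string_for` and the sample data are hypothetical, `io.StringIO` stands in for `'rainbow_table_md5.txt'`, and it assumes no string in the table contains a space.

```python
import csv
import io

import pandas as pd

# Hypothetical sample in the "string hash" format from the question.
SAMPLE = """a 0cc175b9c0f1b6a831c399e269772661
j)O 3be44b195706cdd25e29d2b01a0e88d4
j)P a83079350701398672677a9ffe07108c
"""

working_table = pd.read_csv(
    io.StringIO(SAMPLE),
    sep=' ',
    names=["string", "hash"],
    dtype=str,                 # keep every value as text
    keep_default_na=False,     # don't turn strings like "NA" or "null" into NaN
    quoting=csv.QUOTE_NONE,    # treat '"' as an ordinary character
)


def hash_for(table, s):
    """Return the hash for string s, or None if s is not in the table."""
    # The boolean mask selects only the exactly matching rows.
    matches = table.loc[table["string"] == s, "hash"]
    return matches.iloc[0] if not matches.empty else None


def string_for(table, h):
    """Return the string whose hash is h, or None if h is not in the table."""
    matches = table.loc[table["hash"] == h, "string"]
    return matches.iloc[0] if not matches.empty else None


print(hash_for(working_table, "a"))  # 0cc175b9c0f1b6a831c399e269772661
```

Each call still scans the whole column, so for millions of repeated lookups a dictionary is the faster structure.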

  • If you want it in a table format, pandas is the way to go: `import pandas as pd; df = pd.read_csv(filename, sep=' ')`. It might be a lot of overhead for what you're trying to do, though. Please provide a minimal, complete, and verifiable example for your task – G. Anderson Mar 26 '19 at 22:01
  • Yes, please provide an MCVE. If you're checking for specific portions of the hash keys, the re module might give you what you need. – RockAndRoleCoder Mar 26 '19 at 22:32
  • Sorry for providing very little information. I updated the question. – UserBlackBox Mar 26 '19 at 22:52

1 Answer


You want "lookup" rather than "search", since only an exact match matters. Pandas might be overkill for this application. A pair of dictionaries suffices:

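class Rainbow:

    def __init__(self, infile, k=20):
        self.k = k
        # string -> full hex hash ("h" avoids shadowing the built-in hash())
        self.s_to_hash = {s: h
                          for s, h in self._read_tuples(infile)}
        # first k hex digits of the hash -> string (trades memory for collision risk)
        self.hash_to_s = {h[:k]: s
                          for s, h in self.s_to_hash.items()}

    @staticmethod
    def _read_tuples(infile):
        with open(infile) as fin:
            for line in fin:
                line = line.rstrip()
                if not line:
                    continue  # skip blank lines
                # Split on the last space: the hex hash never contains one,
                # but the string might.
                s, h = line.rsplit(' ', 1)
                yield s, h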

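An MD5 hex digest is 32 characters long. Choosing k < 32 keys the reverse dictionary on only the first k characters, which saves some memory at the (small) risk that two hashes sharing a k-character prefix collide; when that happens, the later entry silently overwrites the earlier one. Tune k up or down to taste, based on your memory, table size, and appetite for collision risk. Consider writing a getter function and then making hash_to_s private.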
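One way to follow that suggestion is sketched below. `RainbowLookup`, `from_file`, `hash_for`, and `string_for` are hypothetical names, not part of the answer above. The constructor accepts any iterable of lines so it can be tried without a file. The reverse getter checks the full hash after the prefix lookup, so a prefix collision never returns the wrong string (though an overwritten entry can still come back as a miss).

```python
class RainbowLookup:
    """Hypothetical variant of Rainbow with getters and private dictionaries."""

    def __init__(self, lines, k=20):
        self._k = k
        self._s_to_hash = {}
        for line in lines:
            line = line.rstrip()
            if not line:
                continue  # skip blank lines
            s, h = line.rsplit(" ", 1)  # the hex hash never contains a space
            self._s_to_hash[s] = h
        # Reverse index on the first k hex digits; on a prefix collision,
        # the later entry overwrites the earlier one.
        self._hash_to_s = {h[:k]: s for s, h in self._s_to_hash.items()}

    @classmethod
    def from_file(cls, path, k=20):
        """Build the lookup from a "string hash" text file."""
        with open(path) as fin:
            return cls(fin, k)

    def hash_for(self, s):
        """Exact lookup: string -> full hash, or None."""
        return self._s_to_hash.get(s)

    def string_for(self, h):
        """Exact lookup: full hash -> string, or None."""
        s = self._hash_to_s.get(h[:self._k])
        # Confirm the full hash so a mere prefix match is not reported.
        if s is not None and self._s_to_hash[s] == h:
            return s
        return None


table = RainbowLookup([
    "j)O 3be44b195706cdd25e29d2b01a0e88d4\n",
    "j)P a83079350701398672677a9ffe07108c\n",
])
print(table.hash_for("j)P"))                                # a83079350701398672677a9ffe07108c
print(table.string_for("3be44b195706cdd25e29d2b01a0e88d4"))  # j)O
```

With a real table you would call `RainbowLookup.from_file('rainbow_table_md5.txt')` instead.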

Storing the hashes as raw bytes rather than as ASCII hex digits would halve their payload, since each hex digit (one nybble) currently occupies a whole byte.
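A minimal sketch of that idea, assuming the hashes are well-formed hex strings. `to_digest` is a hypothetical helper around the built-in `bytes.fromhex`. Note that in CPython each object also carries fixed overhead, so the total saving per entry is less than the 2x seen in the payload.

```python
import sys


def to_digest(hex_hash):
    """Convert a 32-character hex MD5 string into its 16 raw bytes."""
    return bytes.fromhex(hex_hash)


hex_hash = "3be44b195706cdd25e29d2b01a0e88d4"
digest = to_digest(hex_hash)

print(len(hex_hash), len(digest))  # 32 16
print(digest.hex() == hex_hash)    # True: the conversion round-trips

# Reverse index keyed on raw digests; convert the query the same way.
hash_to_s = {
    to_digest("3be44b195706cdd25e29d2b01a0e88d4"): "j)O",
    to_digest("a83079350701398672677a9ffe07108c"): "j)P",
}
print(hash_to_s[to_digest("a83079350701398672677a9ffe07108c")])  # j)P

# Per-object sizes, including CPython's fixed object overhead.
print(sys.getsizeof(hex_hash), sys.getsizeof(digest))
```

Combined with the prefix trick, k hex digits correspond to `digest[:k // 2]` bytes (for even k).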

J_H