Parsing registry text dumps into pandas

Question

I have a registry text dump like the one below:

[HKLM\CurrentControlSet\somerandomthing]

[HKLM\CurrentControlSet\somerandomthing\somerandomsub]
"key1"=hex(7)"1234aa\
     123451234567788\
     124123412341234
     1243"
"key2":C:\randomlocaltioninmydrive
"key3"-somerandomstuffwithanyoutput

This goes on for days, with multiple duplicates and multiple different key/value pair types.

How can I put this data into a pandas DataFrame similar to the output below?

Path                                                  Type    Key   Value
HKLM\CurrentControlSet\somerandomthing\somerandomsub  hex(7)  key1  1234aa1234512345677881241234123412341243
HKLM\CurrentControlSet\somerandomthing\somerandomsub  N/A     key2  C:\randomlocaltioninmydrive
HKLM\CurrentControlSet\somerandomthing\somerandomsub  N/A     key3  somerandomstuffwithanyoutput

I have attempted to use configparser.RawConfigParser to no avail. This dataset is a raw hklm.txt file from a registry dump on a Windows box.

You could write a parser that converts the registry data into a format `pandas` can read. However, I'm sure someone else has already done this before. Perhaps [regparse](https://github.com/sysforensics/python-regparse) is what you need? — Aleksey Bilogur, Jan 20 '18 at 22:05
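
A minimal sketch of what such a hand-written parser could look like. It is not taken from regparse or from the answer below. It assumes the dump follows the usual regedit export shape, where each value line reads `"name"="string"` or `"name"=type:data` and a trailing backslash continues the value on the next line; the sample in the question uses other separators, so the two regular expressions would need adjusting for it. The names `parse_reg_dump`, `VALUE_RE` and `TYPED_RE` are made up for this illustration.

```python
import re

# Assumed line shape: "name"=value (or @=value for the default value),
# where value is a quoted string or type:data such as hex(7):31,00.
VALUE_RE = re.compile(r'^(?:"(?P<key>.*?)"|(?P<default>@))=(?P<raw>.*)$')
TYPED_RE = re.compile(r'^(?P<type>[a-z]+(?:\([0-9a-f]+\))?):(?P<data>.*)$')


def parse_reg_dump(text):
    """Return one dict per registry value with Path, Type, Key and Value."""
    rows = []
    path = None
    pending = ""
    for line in text.splitlines():
        line = line.strip()
        # A trailing backslash means the value continues on the next line.
        if line.endswith("\\"):
            pending += line[:-1]
            continue
        line, pending = pending + line, ""
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            path = line[1:-1]
            continue
        match = VALUE_RE.match(line)
        if match is None or path is None:
            continue  # skip file headers and anything unrecognised
        key = "@" if match.group("default") else match.group("key")
        raw = match.group("raw")
        typed = TYPED_RE.match(raw)
        if typed:
            value_type, value = typed.group("type"), typed.group("data")
        elif len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            # Plain string: drop the quotes, leave backslash escapes untouched.
            value_type, value = "N/A", raw[1:-1]
        else:
            value_type, value = "N/A", raw
        rows.append({"Path": path, "Type": value_type, "Key": key, "Value": value})
    return rows
```

The returned list of dicts can be passed straight to `pd.DataFrame(parse_reg_dump(text))`, which gives the Path, Type, Key and Value columns asked for in the question.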

score 0 · Accepted Answer · edited Nov 05 '19 at 12:20

Most efficient method so far to get this into a pandas DataFrame (edited the previous answer to make it better and usable):

import configparser

import pandas as pd

# Raw string so the backslashes in the registry paths are kept literally.
sample_config = r"""
[HKEY_LOCAL_MACHINE\SOFTWARE\ODBC]

[HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI]

[HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\XXX_DB]
"Driver"="C:\\Windows\\system32\\SQLSRV32.dll"
"Server"="192.168.1.1"
"Database"="AAA"
"LastUser"="bb"

[HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\xxx_db]
"Driver"="C:\\Windows\\system32\\sqlncli11.dll"
"Server"="10.8.1.3"
"Database"="XXX"
"LastUser"="DDD"
"""

# `config` is the parser object; allow_no_value accepts keys that have no value.
config = configparser.RawConfigParser(allow_no_value=True)
# Python 3 API; the Python 2 original used config.readfp(io.BytesIO(sample_config)).
config.read_string(sample_config)

# Collect one [section, key, value] row per registry value.
newlist = []
for section in config.sections():
    for key, val in config.items(section):
        newlist.append([section, key, val])

df = pd.DataFrame(newlist)

The above returns this:

0  HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\XXX_DB    "driver"   
1  HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\XXX_DB    "server"   
2  HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\XXX_DB  "database"   
3  HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\XXX_DB  "lastuser"   
4  HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\xxx_db    "driver"   
5  HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\xxx_db    "server"   
6  HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\xxx_db  "database"   
7  HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\xxx_db  "lastuser"   

                                        2  
0   "C:\\Windows\\system32\\SQLSRV32.dll"  
1                           "192.168.1.1"  
2                                   "AAA"  
3                                    "bb"  
4  "C:\\Windows\\system32\\sqlncli11.dll"  
5                              "10.8.1.3"  
6                                   "XXX"  
7                                   "DDD"
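
The keys in the output above come back lowercased ("driver" rather than "Driver") and still carry their double quotes, because the parser normalises option names by default. A small sketch of one way around that, assuming Python 3's `configparser`: overriding `optionxform` with `str` keeps the original case, and the quotes can be stripped while building the rows. The shortened `sample` and the Path/Key/Value labels are chosen here for illustration.

```python
import configparser

# Shortened, hypothetical sample in the same shape as the answer's input.
sample = r"""
[HKEY_LOCAL_MACHINE\SOFTWARE\ODBC\ODBC.INI\XXX_DB]
"Driver"="C:\\Windows\\system32\\SQLSRV32.dll"
"LastUser"="bb"
"""

config = configparser.RawConfigParser(allow_no_value=True)
# optionxform is applied to every option name; the default lowercases it.
config.optionxform = str
config.read_string(sample)

rows = []
for section in config.sections():
    for key, val in config.items(section):
        rows.append({
            "Path": section,
            "Key": key.strip('"'),
            # val is None for a key without a value (allow_no_value=True).
            "Value": val.strip('"') if val is not None else None,
        })
```

Passing `rows` to `pd.DataFrame(rows)` then yields named Path, Key and Value columns instead of 0, 1 and 2.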

What is `config`? Is it a variable or a file? Could you update your code so that a newbie can understand and use it directly without resorting to guesswork? Thank you. Honestly, I have no clue where or how to use `config.sections()`. — ihightower, Nov 05 '19 at 09:53
I have edited the answer to make it better. Hope I did it correctly! — ihightower, Nov 05 '19 at 10:06
