
I have a SQL script that is saved as binary data. I read it in the standard way:

 with open('data.sql', 'rb') as f:
     var = f.read()
     var_text = var.decode('utf-8', errors='replace')

When I print var_text, it shows as normal text:

 print(var_text)
 >>>> �-----------------------------------------------------------------------------
 -- Propensity MSF Managed Investing (MI) 2.0.0 r
    

But the variable itself is still in its byte representation, which means I can't run regex on the script. I need to save the text in its string form so I can search for patterns.

var_text
 >>>> '��-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00-\x00\r\x00\n\x00-\x00-\x00 \x00P\x00r\x00o\x00p\x00e\x00n\x00s\x00i\x00t\x00y\x00 \x00M\x00S\x00F\x00 \x00M\x00a\x00n\x00a\x00g\x00e\x00d\x00 \x00I\x00n\x00v\x00e\x00s\x00t\x00i\x00n\x00g\x00 \x00(\x00M\x00I\x00)\x00 \x002\x00.\x000\x00.\x000\x00 \x00r\x00'

I was under the assumption that decoding the bytes would do the trick, but no dice. How can I save the object as plain text?
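A minimal sketch of what the output above suggests is happening. It assumes the file is UTF-16 LE with a byte order mark, which would explain both the two leading replacement characters and the `\x00` after every letter; the sample bytes are a hypothetical stand-in for the start of `data.sql`.

```python
import codecs
import re

# Hypothetical stand-in for the first bytes of data.sql
# (assumption: the file is UTF-16 LE with a BOM).
raw = codecs.BOM_UTF16_LE + "-- Propensity".encode("utf-16-le")

# Same decode call as in the question.
decoded = raw.decode("utf-8", errors="replace")

print(type(decoded).__name__)  # the result is already a str, not bytes
print(repr(decoded[:6]))       # repr() exposes the replacement and NUL characters
print(re.search(r"Propensity", decoded))  # the NULs between letters defeat the pattern
```

Under that assumption the decode does produce a string; the regex fails because every letter is followed by a NUL character, which `print` renders invisibly.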

Todd Shannon
  • Can't you just `var_text.replace('\x00', '')`? – Axe319 Sep 16 '21 at 16:59
  • @Axe319 I only put this example of the text there for illustration. I could of course eliminate the dashes, but the entire file is a SQL script from which I'll be extracting table names and create statements with regex. There's no benefit to removing the dashes. – Todd Shannon Sep 16 '21 at 17:06
  • `\x00` is the way Python represents a null character. `var_text = var_text.replace('\x00', '')` just strips the null characters from the text; it doesn't touch the dashes. The text is already plain text. The real offender here is whatever is placing the null characters there in the first place, and there may be a good reason for them. – Axe319 Sep 16 '21 at 17:10
  • See https://stackoverflow.com/questions/38883476/how-to-remove-those-x00-x00/38883536 – Axe319 Sep 16 '21 at 17:11
  • 1
    Wow, that worked...can't believe I missed that. Thanks @Axe319 – Todd Shannon Sep 16 '21 at 17:33
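Stripping `\x00` works for ASCII-only text, but it leaves the two replacement characters at the start and would mangle any non-ASCII characters. If the file really is UTF-16 with a byte order mark (an assumption based on the output in the question, not something confirmed in the thread), decoding with the `utf-16` codec avoids both problems. The sample SQL and the table name below are hypothetical.

```python
import codecs
import re

# Hypothetical stand-in for the contents of data.sql
# (assumption: UTF-16 LE with a BOM).
sample_sql = "-- Propensity MSF\r\nCREATE TABLE dbo.accounts (id INT);\r\n"
raw = codecs.BOM_UTF16_LE + sample_sql.encode("utf-16-le")

# The 'utf-16' codec reads the BOM to pick the byte order, then drops it.
text = raw.decode("utf-16")

# Regex now works directly on the decoded text.
table_names = re.findall(r"CREATE\s+TABLE\s+([\w.]+)", text, flags=re.IGNORECASE)
print(table_names)  # ['dbo.accounts']
```

For the real file, `var.decode('utf-16')` on the bytes already read, or opening the file in text mode with `open('data.sql', encoding='utf-16')`, should give the same clean string under that assumption.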

0 Answers