SQL Query in pandas python

Question

I am running an SQL query against a pandas DataFrame in Python, using pandasql:

import pandas as pd
from pandas import DataFrame, read_csv
import numpy as np
from pandasql import sqldf
pysqldf=lambda q:sqldf(q,globals())
rolup = pysqldf(u"select MasterUserId,DeviceUsed,hcluster, count(MasterUserId) as Tot_Rec, sum(Visits),sum(PV),sum(TimeSpent) from clstrd_data group by MasterUserId,DeviceUsed,hcluster;")

Error:

sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

How do I switch to Unicode strings? I am using Python 2.7.

new code: import pandas as pd from pandas import DataFrame, read_csv import numpy as np from pandasql import sqldf pysqldf=lambda q:sqldf(q,globals()) rolup = pysqldf(u"select MasterUserId,DeviceUsed,hcluster, count(MasterUserId) as Tot_Rec, sum(Visits),sum(PV),sum(TimeSpent) from clstrd_data group by MasterUserId,DeviceUsed,hcluster;") — Pulkit Jha, Nov 19 '14 at 09:38
your "new code" is identical to what's in the question - so maybe delete your confusing comment here? — WestCoastProjects, Jul 22 '18 at 13:50

score 2 · Answer 1 · answered Nov 19 '14 at 08:58

According to the Python Unicode HOWTO:

In Python source code, Unicode literals are written as strings prefixed with the ‘u’ or ‘U’ character: u'abcdefghijk'

In other words, your script should read:

import pandas as pd
from pandas import DataFrame, read_csv
import numpy as np
from pandasql import sqldf
pysqldf=lambda q:sqldf(q,globals())
rolup = pysqldf(u"select MasterUserId,DeviceUsed,hcluster, count(MasterUserId) as Tot_Rec, sum(Visits),sum(PV),sum(TimeSpent) from clstrd_data group by MasterUserId,DeviceUsed,hcluster;")

Hope that helps.

hd1

Thanks @hd1. I have changed my code but I am still getting exactly the same error. – Pulkit Jha Nov 19 '14 at 09:12
Sir, I have just pasted your code, because the only difference I could see was the addition of u before the select statement. Here is the new code: import pandas as pd from pandas import DataFrame, read_csv import numpy as np from pandasql import sqldf pysqldf=lambda q:sqldf(q,globals()) rolup = pysqldf(u"select MasterUserId,DeviceUsed,hcluster, count(MasterUserId) as Tot_Rec, sum(Visits),sum(PV),sum(TimeSpent) from clstrd_data group by MasterUserId,DeviceUsed,hcluster;") – Pulkit Jha Nov 19 '14 at 09:34
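
Since the error persists even with a u-prefixed query, one possibility (an assumption; nothing in the thread confirms it) is that the 8-bit byte strings sqlite3 complains about are in the data of clstrd_data rather than in the query text. The sketch below uses only the standard sqlite3 module and made-up rows to show the general idea of decoding byte strings to Unicode before they reach SQLite. The helper to_text, the sample rows, and the UTF-8 encoding are all hypothetical; the real data may use a different encoding.

```python
import sqlite3


def to_text(value, encoding="utf-8"):
    """Decode byte strings to Unicode text; return other values unchanged."""
    if isinstance(value, bytes):  # on Python 2, bytes is the plain str type
        return value.decode(encoding)
    return value


# Hypothetical rows standing in for clstrd_data: (MasterUserId, DeviceUsed, Visits).
# The device names are byte strings, one of them containing non-ASCII UTF-8 bytes.
rows = [
    (1, b"mobile", 3),
    (1, b"mobile", 4),
    (2, u"caf\u00e9".encode("utf-8"), 5),
]

# Decode every byte string before handing the rows to SQLite.
clean_rows = [tuple(to_text(v) for v in row) for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute(
    u"CREATE TABLE clstrd_data (MasterUserId INTEGER, DeviceUsed TEXT, Visits INTEGER)"
)
conn.executemany(u"INSERT INTO clstrd_data VALUES (?, ?, ?)", clean_rows)

# A reduced version of the roll-up query from the question.
result = conn.execute(
    u"SELECT MasterUserId, DeviceUsed, COUNT(MasterUserId) AS Tot_Rec, SUM(Visits) "
    u"FROM clstrd_data GROUP BY MasterUserId, DeviceUsed ORDER BY MasterUserId"
).fetchall()
conn.close()
```

With a DataFrame, the same helper could presumably be applied to each string column (for example with Series.map(to_text)) before calling pysqldf, but that is untested against the asker's data.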
