
I am running an SQL query against a pandas DataFrame using pandasql:

import pandas as pd
from pandas import DataFrame, read_csv
import numpy as np
from pandasql import sqldf
pysqldf=lambda q:sqldf(q,globals())
rolup = pysqldf(u"select MasterUserId,DeviceUsed,hcluster, count(MasterUserId) as Tot_Rec, sum(Visits),sum(PV),sum(TimeSpent) from clstrd_data group by MasterUserId,DeviceUsed,hcluster;")

Error:

sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

How do I switch to Unicode strings? I am using Python 2.7.

Pulkit Jha
  • new code: import pandas as pd from pandas import DataFrame, read_csv import numpy as np from pandasql import sqldf pysqldf=lambda q:sqldf(q,globals()) rolup = pysqldf(u"select MasterUserId,DeviceUsed,hcluster, count(MasterUserId) as Tot_Rec, sum(Visits),sum(PV),sum(TimeSpent) from clstrd_data group by MasterUserId,DeviceUsed,hcluster;") – Pulkit Jha Nov 19 '14 at 09:38
  • your "new code" is identical to what's in the question - so maybe delete your confusing comment here? – WestCoastProjects Jul 22 '18 at 13:50

1 Answer


According to the Python Unicode HOWTO:

In Python source code, Unicode literals are written as strings prefixed with the ‘u’ or ‘U’ character: u'abcdefghijk'

In other words, your script should read:

import pandas as pd
from pandas import DataFrame, read_csv
import numpy as np
from pandasql import sqldf
pysqldf=lambda q:sqldf(q,globals())
rolup = pysqldf(u"select MasterUserId,DeviceUsed,hcluster, count(MasterUserId) as Tot_Rec, sum(Visits),sum(PV),sum(TimeSpent) from clstrd_data group by MasterUserId,DeviceUsed,hcluster;")

Hope that helps.
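
If the error persists even though the query literal already carries the u prefix, one possible cause (an assumption, not something confirmed in this thread) is that the DataFrame itself holds 8-bit byte strings, for example in a text column such as DeviceUsed, and those values are what sqlite3 rejects. A minimal sketch of decoding such values before running the query follows; the helper names to_unicode and decode_rows, the sample rows, and the UTF-8 encoding are all hypothetical.

```python
def to_unicode(value, encoding="utf-8"):
    """Decode a byte string to Unicode text; return anything else unchanged."""
    # On Python 2.7, `bytes` is an alias for `str`, so this catches 8-bit strings.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def decode_rows(rows, encoding="utf-8"):
    """Return a copy of `rows` (a list of dicts) with every byte-string value decoded."""
    return [
        {key: to_unicode(val, encoding) for key, val in row.items()}
        for row in rows
    ]


# Hypothetical sample rows standing in for clstrd_data (assumed UTF-8 bytes).
rows = [
    {"MasterUserId": 1, "DeviceUsed": b"M\xc3\xb3vil", "Visits": 3},
    {"MasterUserId": 2, "DeviceUsed": u"Desktop", "Visits": 5},
]
clean = decode_rows(rows)
```

On a DataFrame, the same helper could be applied to each text column with clstrd_data[col] = clstrd_data[col].map(to_unicode). The right encoding depends on how the source file was written; passing the encoding argument to read_csv when loading the data may avoid the byte strings in the first place.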

hd1
  • Thanks @hd1. I have changed my code but I am still getting exactly the same error. – Pulkit Jha Nov 19 '14 at 09:12
  • I have just pasted your code, because the only difference I could see was the addition of u before the select statement. Here is the new code: import pandas as pd from pandas import DataFrame, read_csv import numpy as np from pandasql import sqldf pysqldf=lambda q:sqldf(q,globals()) rolup = pysqldf(u"select MasterUserId,DeviceUsed,hcluster, count(MasterUserId) as Tot_Rec, sum(Visits),sum(PV),sum(TimeSpent) from clstrd_data group by MasterUserId,DeviceUsed,hcluster;") – Pulkit Jha Nov 19 '14 at 09:34