
I am very new to PandaSQL and have never used it before. Here is my code up until now:

import pandas as pd
from pandasql import sqldf
import numpy as np

tasks = pd.read_csv("C:/Users/RMahesh/Documents/TASKS_Final_2.csv", encoding='cp1252')
query = """SELECT Work Item Id, Parent Work Item Id, MAX(Remaining Work) 
FROM TASKS 
GROUP BY Work Item Id, Parent Work Item Id;"""

df = sqldf(query, locals())
print(df.head(5))

I am getting this error:

pandasql.sqldf.PandaSQLException: (sqlite3.OperationalError) near "Id": syntax error [SQL: 'SELECT Work Item Id, Parent Work Item Id, MAX(Remaining Work) \n'

Any help would be great!

Edit: After implementing some suggestions from other users below, here is my working code:

import pandas as pd
from pandasql import sqldf
import numpy as np
tasks = pd.read_csv("C:/Users/RMahesh/Documents/TASKS_Final_2.csv", encoding='cp1252',  low_memory=False)

query = """SELECT [Work Item Id], [Parent Work Item Id], MAX([Remaining Work]) 
FROM tasks 
GROUP BY [Work Item Id], [Parent Work Item Id];"""

print(sqldf(query, locals()))
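
The low_memory=False flag makes pandas infer each column's dtype from the whole file instead of chunk by chunk, which is what avoids the DtypeWarning mentioned in the comments below. As an alternative sketch (the column name used here is only a stand-in for whichever column actually has mixed types), the dtype could be declared explicitly on import:

import pandas as pd

# Hypothetical alternative to low_memory=False: declare the dtype of the
# mixed-type column (column 32 in the warning) up front.
tasks = pd.read_csv(
    "C:/Users/RMahesh/Documents/TASKS_Final_2.csv",
    encoding='cp1252',
    dtype={"Remaining Work": str},  # "Remaining Work" is a stand-in column name
)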
rmahesh
  • It looks like the problem is your SELECT statement, and you will probably have problems with the GROUP BY statement as well. I would test first with `SELECT * FROM tasks`. I am guessing the column names need to follow snake_case formatting: work_item_id – It_is_Chris Jun 12 '18 at 18:36
  • @Chris Thanks for getting back. I did exactly that as another user mentioned below and I am getting another error for some reason. – rmahesh Jun 12 '18 at 18:41

1 Answer


If you have column names that contain spaces, you have to quote them to make the SQL valid:

query = """SELECT `Work Item Id`, `Parent Work Item Id`, MAX(`Remaining Work`) 
FROM TASKS 
GROUP BY `Work Item Id`, `Parent Work Item Id`;"""

or

query = """SELECT [Work Item Id], [Parent Work Item Id], MAX([Remaining Work]) 
FROM TASKS 
GROUP BY [Work Item Id], [Parent Work Item Id];"""

depending on which SQL flavor PandaSQL expects.
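
For instance, a minimal sketch with a toy DataFrame (values made up for illustration) showing that both quoting styles run against the SQLite engine that pandasql uses under the hood:

import pandas as pd
from pandasql import sqldf

# Small stand-in for the real TASKS data.
tasks = pd.DataFrame({
    "Work Item Id": [1, 1, 2],
    "Parent Work Item Id": [10, 10, 20],
    "Remaining Work": [5.0, 3.0, 8.0],
})

backtick_query = """SELECT `Work Item Id`, MAX(`Remaining Work`)
FROM tasks GROUP BY `Work Item Id`;"""

bracket_query = """SELECT [Work Item Id], MAX([Remaining Work])
FROM tasks GROUP BY [Work Item Id];"""

# SQLite accepts both identifier-quoting styles, so both calls return the same result.
print(sqldf(backtick_query, locals()))
print(sqldf(bracket_query, locals()))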

zwer
  • This seemed to have worked. But I am still getting this message and am unsure of what to do: sys:1: DtypeWarning: Columns (32) have mixed types. Specify dtype option on import or set low_memory=False. – rmahesh Jun 12 '18 at 18:40
  • @rmahesh check this post out: https://stackoverflow.com/questions/24251219/pandas-read-csv-low-memory-and-dtype-options?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa – It_is_Chris Jun 12 '18 at 18:44
  • @rmahesh - That's unrelated with the SQL itself, it means that your CSV file has mixed data types in some of its columns, you'll have to tell Pandas how to convert those columns. Read [**this answer**](https://stackoverflow.com/a/27232309/7553525) for more info. – zwer Jun 12 '18 at 18:44
  • @zwer I have made an edit with the current code. I am still getting numerous different errors. – rmahesh Jun 12 '18 at 18:50
  • @rmahesh - Without a sample of your data and traceback of those errors it would be impossible for us to determine why they occur. They are certainly not related to this question which, I believe, is resolved by my answer above so create a new question to deal with further issues you might encounter. – zwer Jun 12 '18 at 18:53
  • @zwer I unfortunately cannot share any of the data as it would be a corporate violation. I can point out the specific areas of errors here: in to_sql, in _execute_insert, in execute, in _execute_on_connection, in _execute_clauseelement, in _execute_context, in _handle_dbapi_exception, in raise_from_cause, in reraise, in _execute_context, in do_execute – rmahesh Jun 12 '18 at 18:55
  • @rmahesh - This tells us nothing. Start a new question and at least post a traceback of the whole error. – zwer Jun 12 '18 at 19:20
  • @zwer Will do. I have to wait 90 minutes to make another post so it will be up shortly. – rmahesh Jun 12 '18 at 19:24