0

Update

I can run below sql query in netezza database, but it goes wrong in sqldf package in R

> sqldf("SELECT TEXT, 
+ VEH_MAKE_NM, 
+ NEW_USED_CD, 
+ PRODUCT, 
+ OVERALL_SUBV_IND, 
+ AS_OF_DATE, 
+ CATEGORY,
+ ROW_NUMBER() OVER(PARTITION BY TEXT, VEH_MAKE_NM, NEW_USED_CD, PRODUCT, OVERALL_SUBV_IND, AS_OF_DATE ORDER BY CATEGORY DESC) RN_CATEGORY,
+  SUBCATEGORY,
+ ROW_NUMBER() OVER(PARTITION BY TEXT, VEH_MAKE_NM, NEW_USED_CD, PRODUCT, OVERALL_SUBV_IND, AS_OF_DATE ORDER BY SUBCATEGORY DESC) RN_SUBCATEGORY
+ FROM output
+ --GROUP BY 1,2,3,4,5,6")
Error in sqliteSendQuery(con, statement, bind.data) : 
  error in statement: near "(": syntax error

I think it might because sqldf package doesn't support netezza SQL. Is there a netezza sql package in R?

Thanks

Community
  • 1
  • 1
WayToNinja
  • 285
  • 4
  • 14
  • Hi @vkp, tried both `==` and `=`, neither of them works... – WayToNinja Sep 16 '15 at 17:58
  • Same... I tried `' '`, `" "` and ` `, all did not work – WayToNinja Sep 16 '15 at 18:01
  • is `CATEGORY` a column name in one of the tables? and if you use `""` around it, it will be case-sensitive – Vamsi Prabhala Sep 16 '15 at 18:02
  • I changed `"CATEGORY"` into `CATEGORY` and it turns out that the problem is in these two rows: `ROW_NUMBER() OVER(PARTITION BY TEXT, VEH_MAKE_NM, NEW_USED_CD, PRODUCT, OVERALL_SUBV_IND, AS_OF_DATE ORDER BY CATEGORY DESC) RN_CATEGORY, SUBCATEGORY, ROW_NUMBER() OVER(PARTITION BY TEXT, VEH_MAKE_NM, NEW_USED_CD, PRODUCT, OVERALL_SUBV_IND, AS_OF_DATE ORDER BY SUBCATEGORY DESC) RN_SUBCATEGORY` – WayToNinja Sep 16 '15 at 19:38
  • 1
    The only sqldf backend that supports partition is PostgreSQL. – G. Grothendieck Sep 17 '15 at 19:44
  • When I use RPostgreSQL package, do I need to set a database as well? Or I can just use its function. Cause when I run the code... it seems like asking for a database – WayToNinja Sep 17 '15 at 23:39
  • See sqldf FAQ#12 on the sqldf home page: https://github.com/ggrothendieck/sqldf – G. Grothendieck Sep 18 '15 at 02:02
  • Since this posts sqlite added support for partition. – G. Grothendieck Jun 07 '22 at 13:09

1 Answers1

0

Step 1. Add row number column into output dataframe:

output['RN_CATEGORY'] = output.sort_values(['CATEGORY'], 
    ascending=False).groupby(['TEXT', 'VEH_MAKE_NM', 'NEW_USED_CD', 'PRODUCT', 
    'OVERALL_SUBV_IND', 'AS_OF_DATE']).cumcount() + 1

output['RN_SUBCATEGORY'] =output.sort_values(['SUBCATEGORY'], 
    ascending=False).groupby(['TEXT', 'VEH_MAKE_NM', 'NEW_USED_CD', 'PRODUCT', 
    'OVERALL_SUBV_IND', 'AS_OF_DATE']).cumcount() + 1

Step 2.

sqldf("SELECT TEXT, 
    VEH_MAKE_NM, 
    NEW_USED_CD, 
    PRODUCT, 
    OVERALL_SUBV_IND, 
    AS_OF_DATE, 
    CATEGORY,
    RN_CATEGORY,
    SUBCATEGORY,
    RN_SUBCATEGORY
  FROM output
--GROUP BY 1,2,3,4,5,6")
lemon
  • 14,875
  • 6
  • 18
  • 38
  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Jun 02 '22 at 13:26