0

I'm new to Polars. I'm trying to get data from MySQL using polars for one of my project. here I have made a connection and trying to retrieve some sample data from a table.

enter image description here

on executing this query it returns some of index data not the actual data from table. what could be the reason for it ? it should actually give an output as this:

enter image description here

please guide me how to make a connection in between MySQL and Polars. because i have about 250M records stored in a table on which i have to do some of data analysis I have tried this command using pandas, pymysql:

df_pol = pl.from_pandas(pd.read_sql(f'SELECT *FROM complete_list_rolled_up',con))

here I have got an Memory Error while reading data from data using pandas statement .

In an another case: using connectorX to retrieve data from MySQL as follows:

cx.read_sql(uri,'SELECT *FROM complete_list_rolled_up limit 5')

I'm able to read the same table using connectorX as above. but in output it shows b in every entry such as b'unknown', b'49499494' etc etc.

enter image description here

What could be the reasons for getting this kind of outputs reading the data from MySQL using polars/connectorX?

Any idea on making DB connections between MySQL and Polars?

kmdreko
  • 42,554
  • 6
  • 57
  • 106
myamulla_ciencia
  • 1,282
  • 1
  • 8
  • 30
  • Have you tried adding `protocol= text` in `read_sql` function as described here: https://github.com/sfu-db/connector-x/discussions/165 –  Aug 19 '22 at 18:27
  • Yes i have used protocol as text and but still the same output as indexes – myamulla_ciencia Aug 20 '22 at 01:26
  • That's odd.. Not sure if this is related or not but read_sql takes a connection object not a connection string. Try making a conne tion object with sqlalchemy and passing that connection variable to read_sql – rayad Aug 20 '22 at 20:11
  • ```my_sql_engine = create_engine( url="mysql+pymysql://{0}:{1}@{2}:{3}/{4}".format( db_user_name , db_pass_word, db_host, db_port, db_name )) ``` ```with my_sql_engine.connect() as con: pl.read_sql('SELECT *FROM bolton_complete_list_rolled_up limit 5',con)``` on executing it is gives an error: AttributeError: 'Connection' object has no attribute 'split' – myamulla_ciencia Aug 21 '22 at 06:09
  • @rayad do you have any updates/comments on it ? – myamulla_ciencia Aug 21 '22 at 21:06
  • @cbilot do you have any comments on it? Can you please try to connect mySql using poalrs and check how the output of table is ?? – myamulla_ciencia Aug 22 '22 at 05:09
  • Interesting error. Can't say for certain without the full traceback but I think that will be coming from within pandas itself. But before we do that i think you don't have to use .connect() in the with statement. Try removing it and try again – rayad Aug 22 '22 at 16:59
  • @rayad i tried without mentioning .connect() and this time it is giving an error: ``` AttributeError: __enter__ ``` – myamulla_ciencia Aug 23 '22 at 01:03
  • @rayad could you please connect to mysql at your end and test it ? – myamulla_ciencia Aug 23 '22 at 01:04
  • This could be caused by columns which are of VARBINARY type. What are the types of the columns in your MySQL table? Can you include the output of `DESCRIBE ` in your question? – Nick ODell Oct 13 '22 at 19:37

0 Answers0