Questions tagged [modin]

Modin is a project to speed up pandas workflows by changing just a single import statement.

Peruse the documentation at https://modin.readthedocs.io/.

80 questions
0
votes
1 answer

Modin pandas / modin.db_conn database connection error (Unsupported database library) (UPDATED)

When using pandas, I can connect to import sqlalchemy as db db.create_engine('sqlite:///C:\db\PositionTrackDB.db') Now, I am trying to replace pandas with modin.pandas and work with databases. But no matter what I try, I always get the error of an…
Dalalama231123
  • 101
  • 1
  • 1
  • 7
0
votes
1 answer

import modin.pandas causes ERROR: AttributeError: type object 'pyarrow.lib.Message' has no attribute '__reduce_cython__'

Issue I have installed conda install -c conda-forge modin When I import import modin.pandas as pd I get an error message Tried solutions Similar to but different framework, different use case and slightly different error message - “has no…
sogu
  • 2,738
  • 5
  • 31
  • 90
0
votes
0 answers

How do I fix modin pandas?

Basically my code in python is this: import os from distributed import Client client = Client() os.environ["MODIN_ENGINE"] = "dask" import modin.pandas as pd when I run it on python it says dask execution environment not yet initialized. What is…
0
votes
2 answers

Modin - ModuleNotFoundError: No module named 'ray'

I'm trying to use Modin on Databricks and getting this error I've tried both pip install modin[all] and pip install modin[ray] Firstly, the installation takes 15 minutes, which is weird. After installing, I'm doing import modin.pandas as md df =…
Vishal Balaji
  • 667
  • 3
  • 11
0
votes
4 answers

Is it possible to change similar libraries (Data Analysis) in Python within the same code?

I use the modin library for multiprocessing. While the library is great for faster processing, it fails at merge and I would like to revert to default pandas in between the code. I understand as per PEP 8: E402 conventions, import should be declared…
Rander
  • 94
  • 8
0
votes
1 answer

Using modin provides different results compared to Pandas default

I am getting different results when I use pandas within modin and when using pandas default print(selection_weights.head()) country league Win DNB O 1.5 U 4.5 0 Africa Africa Cup of Nations 3.68 1.86 5.2 …
Harshad
  • 25
  • 6
0
votes
1 answer

Improving Python code performance when comparing strings using Levenshtein in pandas

I have this code that functions properly and produces the result I am looking for: from thefuzz import fuzz import pandas as pd df = pd.read_csv('/folder/folder/2011_05-rc.csv', dtype=str, lineterminator='\n') df_compare = pd.DataFrame( …
Octner
  • 61
  • 8
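
For readers unfamiliar with the metric named in the question title above, here is a minimal pure-Python sketch of Levenshtein (edit) distance. It is not the implementation used by `thefuzz`, and `levenshtein` is a hypothetical helper name chosen for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, or substitutions needed to turn `a` into `b`."""
    # Keep the shorter string in `b` so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the empty prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]
```

Each comparison costs time proportional to the product of the two string lengths, which is one reason pairwise comparison over a large frame can be slow.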
0
votes
1 answer

Modin[dask] on Apple M1 chip

I have successfully installed modin[dask] with conda on my Apple M1 chip MacBook Pro, but when I run the code, I got the below errors: AttributeError: 'NoneType' object has no attribute 'ncores'. The below is pip list(Python 3.10.4): dask …
Joycode
  • 31
  • 1
0
votes
1 answer

How do I resolve localRayletDiedError when using Modin with pandas?

I am trying to use the Modin engine to process a large dataframe: df.head(20): Unnamed: 0 game score home_odds draw_odds away_odds country league datetime 0 0 Sport Recife - Imperatriz …
leonardo
  • 140
  • 10
0
votes
1 answer

How to figure out if a modin dataframe is going to fit in RAM?

I'm learning how to work with large datasets, so I'm using modin.pandas. I'm doing some aggregation, after which a 50GB dataset is hopefully going to become closer to 5GB in size - and now I need to check: if the df is small enough to fit in RAM, I…
0
votes
0 answers

KeyError making pandas dataframe

I am trying to find the equation of a function using a pandas dataframe. This has worked in the past on other projects; however, now nothing seems to work. I am aware that there might be easier ways to solve this, but I need this to work…
0
votes
1 answer

Pandas Modin ray library fails to startup

I am trying to accelerate my pandas data processing using modin import os os.environ["MODIN_ENGINE"] = "ray" import modin.pandas as pd df = pd.read_csv(r"C:\Users\Harshad\Documents\Files\Data\Pre-processed\data.csv", low_memory=False) I get the…
leonardo
  • 140
  • 10
0
votes
0 answers

Unable to catch Ray Task Errors when using modin pandas

I am trying to check if a floating column is actually an int column before converting it to string column, (exact use case: 123.00 needs to be 123, '123-4' needs to remain '123-4'). Code: # series: modin pandas Series try: # If it's…
Susmit
  • 336
  • 1
  • 4
  • 12
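
The conversion described in the question above (123.00 becomes 123, '123-4' stays '123-4') can be sketched per value in plain Python. This is an assumption-laden illustration rather than the asker's code or a Modin API; `to_clean_string` is a hypothetical helper name.

```python
def to_clean_string(value):
    """Render integer-valued numbers without a fractional part;
    return every other value as its ordinary string form."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        # Not numeric at all (for example '123-4'): keep it unchanged.
        return str(value)
    if number.is_integer():
        # 123.0 -> '123'; NaN and infinities are not integers and fall through.
        return str(int(number))
    return str(value)
```

Applied element by element (for example through `Series.map`), this handles each value locally instead of relying on catching an exception raised by a whole-column cast.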
0
votes
1 answer

Writing a dataset to multiple directories with modin and Ray pauses unexplainably

Problem: I am trying to perform IO operations with multiple directories using ray, modin (with ray backend) and python. The file writes pause, the memory and disk usages do not change at all, and the program is blocked. Setup: I have a ray actor set…
AvidJoe
  • 596
  • 5
  • 19
0
votes
1 answer

how to pass the sqlalchemyconnection string for oracle modin read_sql

UserWarning: To use parallel implementation of `read_sql`, pass the sqlalchemyconnection string instead of . I am getting this error using this string 'oracle://username:password@server:1521/SID' I also tried…
user1082748
  • 365
  • 1
  • 7
  • 18