Questions tagged [modin]

Modin is a project to speed up pandas workflows by changing only a single import statement.

Peruse the documentation at https://modin.readthedocs.io/.
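To illustrate the "single import statement" idea, here is a minimal sketch written against plain pandas so that it runs without Modin installed. The commented-out line is the swap the project describes; the small data frame is made up for illustration.

```python
import pandas as pd
# With Modin and an execution engine (such as Ray or Dask) installed,
# the project's stated usage is to replace the line above with:
# import modin.pandas as pd

# Illustrative data; nothing below changes when the import is swapped.
df = pd.DataFrame({"city": ["A", "B", "A", "C"], "sales": [10, 20, 30, 40]})
totals = df.groupby("city")["sales"].sum()
print(totals.to_dict())
```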

80 questions
1 vote, 0 answers

Modin pandas and Dask do nothing but hang

I am trying to work out why this just hangs with Modin but works fine with regular pandas: import modin.pandas as pd infile1 = 'D:\\test_files\\curves_crosstab.csv' infile2 = 'D:\\test_files\\8760_crosstab.csv' infilenames = [infile1,…
Shenanigator
1 vote, 1 answer

Merging two pandas DataFrames with modin.pandas gives ValueError

In an attempt to make my pandas code faster, I installed Modin and tried to use it. A merge of two DataFrames that had previously worked gave me the following error: ValueError: can not merge DataFrame with instance of type
Boris
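The error text in this excerpt is cut off, so its cause cannot be confirmed from here. One plausible cause, offered only as an assumption, is mixing objects from the two libraries (for example, passing a modin.pandas DataFrame to a plain pandas merge, or the reverse), since merge expects the other operand to be a DataFrame (or named Series) of its own kind. This minimal sketch uses plain pandas with made-up frames to show a working merge and the error raised for a non-DataFrame operand.

```python
import pandas as pd

# Hypothetical frames; with Modin, both would be created via modin.pandas
# so that merge() receives two objects of the same DataFrame type.
left = pd.DataFrame({"key": [1, 2, 3], "a": ["x", "y", "z"]})
right = pd.DataFrame({"key": [2, 3, 4], "b": [True, False, True]})

# Inner join on the shared "key" column: only keys 2 and 3 survive.
merged = left.merge(right, on="key", how="inner")
print(merged)

# A non-DataFrame operand (here a list) raises instead of merging.
# Older pandas raised ValueError; newer versions raise TypeError.
raised = False
try:
    left.merge([1, 2, 3], on="key")
except (TypeError, ValueError) as exc:
    raised = True
    print(type(exc).__name__, exc)
```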
1 vote, 1 answer

Modin read_csv issue

I'm attempting to read a CSV file using Modin, and it results in the following error. This issue seems to happen on all DataFrame operations: RayWorkerError: The worker died unexpectedly while executing this task. Python 3.7.3, Pandas 0.24.2, Modin…
DACW
1 vote, 1 answer

How to append a Modin pandas DataFrame to another?

I am performing calculations on large files (around 6 GB each) and came across Modin pandas, which I heard is optimized compared to pandas. I need to read a CSV file in chunks, perform calculations on each chunk, and append it to a big…
Underoos
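For the chunked workflow described above, the usual pandas pattern is to collect the processed chunks in a list and call pd.concat once at the end. DataFrame.append was deprecated in pandas 1.4 and removed in 2.0. This is a minimal sketch with plain pandas; an in-memory CSV stands in for the real 6 GB file, and the per-chunk calculation is hypothetical. Whether Modin handles chunksize in parallel is not assumed here.

```python
import io
import pandas as pd

# Stand-in for a large CSV file on disk (hypothetical data).
csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(10))

processed = []
# With chunksize, read_csv returns an iterator of DataFrames.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    # Hypothetical per-chunk calculation.
    chunk["value_doubled"] = chunk["value"] * 2
    processed.append(chunk)

# One concat at the end instead of appending repeatedly.
result = pd.concat(processed, ignore_index=True)
print(result.shape)
```

Concatenating once avoids re-copying the growing result on every iteration, which repeated appends would do.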
1 vote, 1 answer

Join two modin.pandas.DataFrame(s)

I have attempted to join/merge/concat two modin.pandas DataFrames and failed. Has anyone been successful in performing this operation? This is the modin-project's big-data implementation of pandas. The source is…
MyopicVisage
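For reference, the three operations named in this question behave as follows in plain pandas. Modin's stated goal is for the same calls to work after the import swap, but this sketch does not verify Modin's behavior. The data is made up.

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2]}, index=["r1", "r2"])
b = pd.DataFrame({"y": [3, 4]}, index=["r1", "r2"])

# join: align on the index and add b's columns to a.
joined = a.join(b)

# concat along rows (axis=0) stacks the frames; columns are unioned,
# so each frame gets NaN in the column it lacks.
stacked = pd.concat([a, b], axis=0)

# concat along columns (axis=1) matches the join here (same index).
side_by_side = pd.concat([a, b], axis=1)

print(joined, stacked, side_by_side, sep="\n\n")
```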
0 votes, 1 answer

Intel Server Unavailable after executing the code

I am on Intel DevCloud and using Intel oneAPI. This is my code so far: # first block of jupyter notebook import modin.pandas as pd # second block of jupyter notebook df = pd.read_csv('dataset/dataset.csv') df.head() # output of second…
Adarsh Wase
0 votes, 1 answer

Installing Modin Pandas in Linux (CentOS)

I am trying to install Modin on a shared Linux computer where I have access to the terminal and Jupyter. I created a virtual environment in Conda and followed the steps in the official Modin documentation. However, when I…
0 votes, 1 answer

Ray workers being killed because of OOM pressure

I am using Modin in combination with Ray to read a huge CSV file (56 GB with 1.5 billion rows). I sorted the data beforehand using the Linux sort command. The following code results in multiple workers being killed due to out-of-memory pressure, and I doubt that…
Ranger
0 votes, 1 answer

How to make this for loop ready for Pandas/Modin/Ray

I have a semi-complex for loop that, I guess, has to be applied row by row. I've read the information in, e.g., 1. However, I cannot wrap my head around how I would create a dictionary using these options. Running the current loop on the dataset…
Ranger
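The question's loop is not shown, so this is a hypothetical stand-in: a row-by-row loop that builds a dictionary, next to a vectorized equivalent. Whole-column operations such as groupby are generally what pandas (and, by extension, Modin's partitioned execution) can speed up, whereas a Python-level loop over rows runs one row at a time. The column names and data are made up.

```python
import pandas as pd

# Hypothetical data: a user id and an amount per row.
df = pd.DataFrame({"user": ["u1", "u2", "u1", "u3", "u2"],
                   "amount": [5, 3, 7, 1, 2]})

# Row-by-row loop: accumulate totals per user in a dict.
totals_loop = {}
for row in df.itertuples(index=False):
    totals_loop[row.user] = totals_loop.get(row.user, 0) + row.amount

# Vectorized equivalent: one groupby, then convert the result to a dict.
totals_vec = df.groupby("user")["amount"].sum().to_dict()

print(totals_loop == totals_vec)
```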
0 votes, 2 answers

Why does df.shift() not work when using modin?

In the following example code, I am trying to use df.shift(), which pandas normally executes flawlessly. However, when using Modin, .shift() stops working. Is there any way to fix this? import modin.pandas as pd import…
user15070504
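For comparison, this is what shift() does in plain pandas. The question's Modin code is truncated, so this sketch does not diagnose the failure; it only shows the expected behavior on made-up data.

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 11.0, 12.5, 12.0]})

# shift(1) moves values down one row; the first row becomes NaN.
df["prev_price"] = df["price"].shift(1)

# A common use: the row-over-row change.
df["change"] = df["price"] - df["prev_price"]
print(df)
```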
0 votes, 0 answers

Scalable way to get data ready for (or into) pandas or similar libraries

I have around 600 GB of CSV files, around 1 billion lines, stored in around 80 million text files. To perform further analysis, specifically network analysis, I first have to aggregate some of the data and then do the analysis building…
Ranger
0 votes, 1 answer

Reading a large sas7bdat file using Modin pandas: FactoryDispatcher.read_sas() takes 1 positional argument but 2 were given

I want to read a large file in a Jupyter notebook (I cannot read it using pandas because of memory constraints). The data file requires over 35 GB of memory, but my workspace has only 20 GB. Therefore, I tried to use Modin pandas instead but ran into…
0 votes, 0 answers

TypeError when using modin with pd.cut(df[column],300)

I first substitute Modin for pandas to distribute work over multiple cores: import modin.pandas as pd from modin.config import Engine Engine.put("dask") After initializing my DataFrame, I attempt to use: df['bins'] =…
bconsolvo
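For comparison, plain pandas pd.cut with an integer bin count splits the column's range into that many equal-width bins. The question's traceback is truncated, so this sketch does not diagnose the Modin TypeError. It uses made-up data and 3 bins instead of 300 to keep the output readable, and it omits the Engine.put("dask") configuration because that applies only to Modin.

```python
import pandas as pd

df = pd.DataFrame({"value": [1, 2, 3, 4, 5, 6, 7, 8, 9]})

# An integer `bins` argument means equal-width bins spanning the column's min..max.
df["bins"] = pd.cut(df["value"], 3)

# Count of rows falling in each bin, in bin order.
print(df["bins"].value_counts(sort=False))
```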
0 votes, 1 answer

TypeError: 'LocalFileOpener' object is not iterable

I have a huge dataset with millions of entries (it is a normal .csv file, and I get no errors when I load it with pandas). Pandas struggles to load the dataset, so I decided to use Modin, which apparently allows you to use multiple…
CozyCode
0 votes, 0 answers

String methods fail with Modin, but same work with Pandas

I'm trying to improve processing speed on several large log files in order to extract some metrics and then store them in a Postgres database. For now, I'm just attempting the first step, which is simply filtering only the relevant lines of the log after…
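The excerpt stops before showing the filter, so here is a hypothetical version of that first step in plain pandas: keep only the log lines that contain a marker, using the .str accessor. The log lines and the "ERROR" marker are made up. Passing regex=False treats the pattern as a literal substring, and na=False counts missing lines as non-matches so the boolean mask has no gaps.

```python
import pandas as pd

# Hypothetical raw log lines, one per row (None simulates a missing line).
logs = pd.DataFrame({"line": [
    "2024-01-01 INFO service started",
    "2024-01-01 ERROR disk full",
    None,
    "2024-01-02 ERROR timeout",
]})

# Literal substring match; missing lines count as non-matches.
mask = logs["line"].str.contains("ERROR", regex=False, na=False)
relevant = logs[mask]
print(relevant)
```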