
Hello, I am a bit new to Dask and I am trying to do the following.

I have a CSV file. Reading the file works fine:

import pandas
import os
import json

import math
import numpy as np

import dask
from dask.distributed import Client
# Import under the alias dd so the module is not shadowed by the DataFrame named df
import dask.dataframe as dd
import dask.multiprocessing
client = Client(n_workers=3, threads_per_worker=4, processes=False, memory_limit='2GB')

df = dd.read_csv("netflix_titles.csv")

Now I have a function:


def toupper(x):
    return x.upper()

I would like to apply this to a column. The issue is that I want to save the result back into the same column, and it seems I cannot do that.

df["title"].map(toupper).compute()

The line above works, but what I want is:


df["title"] = df["title"].map(toupper).compute()

ValueError: Not all divisions are known, can't align partitions. Please use set_index to set the index.

[Image: screenshot of the CSV file contents]

Soumil Nitin Shah

1 Answer

Try this after read_csv, without calling compute():

df.title = df.title.map(toupper)
df.to_csv("netflix_titles.csv", index=False, single_file=True)

to_csv has an optional argument compute whose default value is True, so you don't need to call compute() explicitly.

Ke Zhang
  • No, it still gives the error: Not all divisions are known, can't align partitions. Please use `set_index` to set the index. df.title = df["title"].map(toupper).compute() – Soumil Nitin Shah Jan 19 '21 at 12:58
  • @SoumilNitinShah Could you show me your CSV file, or maybe 10 lines of it? And which Python and Dask versions are you using? – Ke Zhang Jan 19 '21 at 15:03
  • Thanks for getting back. I am not sure if I can paste an image of how the CSV looks in a comment, but here is my Dask version: 2020.12.0 – Soumil Nitin Shah Jan 19 '21 at 22:24
  • I have added the image to the description, please check the image above. – Soumil Nitin Shah Jan 19 '21 at 22:30
  • @SoumilNitinShah Hey, I still cannot reproduce your error. However, I notice you are using Jupyter. Could you double-check that you execute the cells in the order you expect, and that you didn't rerun cells accidentally? – Ke Zhang Jan 20 '21 at 02:11
  • Hello sir, thanks for getting back. I still have the error. Here is my notebook with the CSV file on GitHub: https://github.com/soumilshah1995/Stackoverflow-issue- – Soumil Nitin Shah Jan 20 '21 at 12:27
  • I didn't call `compute()` in any part of my code. – Ke Zhang Jan 20 '21 at 15:59
  • First of all, I would like to say thanks for taking the time to look into the issue; very few people do that. My error is resolved. – Soumil Nitin Shah Jan 20 '21 at 19:38