I am sorting my data by student ID number. When I explicitly pass the argument inplace=True, I get the following error:

  ValueError: This Series is a view of some other array, to sort in-place you must create a copy

I want to save the sorted data to a file, so I cannot use inplace=False. I don't understand why this error occurs. Here is my code:

    df = pd.read_csv('/home/user/Documents/MOOC dataset test/students_info_assessment2.csv')
    df = df.id_student.sort_values(inplace=True)
    df = pd.DataFrame(df)
    df.to_csv('/home/user/Documents/MOOC dataset test/students_info_assessment_sorted.csv')

What should I do?

Sri991
  • @jezrael I saw the post but it does not explain why the error is occurring when I have explicitly given the argument for inplace=True – Sri991 Jul 25 '18 at 06:23
  • You have a DataFrame, which generally has multiple columns. If you select one column of that DataFrame (`df['id_student']`) and try to sort it in place, the IDs will be sorted, but if you have, say, the grades of those students in other columns, those will not be reordered, so you will end up with an incorrect DataFrame. The error prevents that mistake. – ayhan Jul 25 '18 at 06:30
  • The correct thing to do, if you are trying to sort by `id_student`, is either `df.sort_values(by='id_student', inplace=True)` or `df = df.sort_values(by='id_student')`. This ensures the corresponding rows are sorted together. – ayhan Jul 25 '18 at 06:32
  • OK, I did that, but now the resulting DataFrame is empty. Why is that? – Sri991 Jul 25 '18 at 06:33
  • You should either use the inplace argument or assign the result back, not both. `df.sort_values(by='id_student', inplace=True)` changes the original DataFrame but returns None. If you assign that result to `df`, your DataFrame will be None. I'd suggest deleting those lines and re-running your script with the correct code. – ayhan Jul 25 '18 at 06:39
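
The two points made in the comments above (sorting the whole frame keeps rows aligned, and `inplace=True` returns None) can be checked with a small sketch. The data and the `score` column are invented for illustration; only the `id_student` column name comes from the question.

```python
import pandas as pd

# Hypothetical data: only the id_student column name comes from the question.
df = pd.DataFrame({"id_student": [30, 10, 20], "score": [55, 90, 72]})

# Sorting the whole frame by one column moves complete rows together.
sorted_df = df.sort_values(by="id_student")
print(sorted_df["id_student"].tolist())  # [10, 20, 30]
print(sorted_df["score"].tolist())       # [90, 72, 55]

# With inplace=True the frame itself is modified and the call returns None,
# so assigning the result back would replace the frame with None.
result = df.sort_values(by="id_student", inplace=True)
print(result)                            # None
print(df["id_student"].tolist())         # [10, 20, 30]
```

Writing `df = df.sort_values(..., inplace=True)` therefore leaves `df` as None, which matches the `'NoneType' object has no attribute 'to_csv'` error reported in this thread.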

1 Answer

You can use this:

    df = df.sort_values(by=['id_student'])

Joe
  • Yes, but when I save the data to a new file it is unchanged; it is not sorted. That is why I am explicitly passing inplace=True, but then I get the error given in the question. – Sri991 Jul 25 '18 at 06:19
  • `df = df.sort_values(by=['id_student'])` and `df.sort_values(by=['id_student'], inplace=True)` are equivalent. I think you should remove the third line: `df = pd.DataFrame(df)` – Joe Jul 25 '18 at 06:22
  • When I remove the third line, I get this error: AttributeError: 'NoneType' object has no attribute 'to_csv'. – Sri991 Jul 25 '18 at 06:26
  • Try with: `df.sort_values('id_student', inplace=True)`. Maybe this can help https://stackoverflow.com/questions/36160019/sort-series-with-pandas-in-python – Joe Jul 25 '18 at 06:28
  • I tried the one above, `df.sort_values(by=['id_student'])`, but now the resulting DataFrame is empty! What is happening here? – Sri991 Jul 25 '18 at 06:35
  • Can you edit the question with the result of this line: `df = pd.read_csv('/home/user/Documents/MOOC dataset test/students_info_assessment2.csv')` and the desired output? – Joe Jul 25 '18 at 06:38
  • OK, I will post a new question – Sri991 Jul 25 '18 at 06:38
  • You can also edit this one – Joe Jul 25 '18 at 06:39
  • You get None because you did: `df = df.sort_values(by=['id_student'], inplace=True)`. Use either `df.sort_values(by=['id_student'], inplace=True)` or `df = df.sort_values(by=['id_student'])` – Joe Jul 25 '18 at 06:56
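
Putting the advice from this thread together, a minimal end-to-end sketch of read, sort, and save might look like the following. It uses in-memory buffers in place of the file paths from the question, the CSV contents and the `score` column are invented, and `index=False` is an optional choice that is not part of the original code.

```python
import io
import pandas as pd

# Stand-in for the CSV file in the question; the contents are invented.
csv_in = io.StringIO("id_student,score\n30,55\n10,90\n20,72\n")

df = pd.read_csv(csv_in)

# Assign the sorted frame back; do not combine assignment with inplace=True.
df = df.sort_values(by="id_student")

# index=False (optional) keeps the old row labels out of the output.
csv_out = io.StringIO()
df.to_csv(csv_out, index=False)
print(csv_out.getvalue())
```

With real files, the two buffers would be replaced by the input and output paths passed to `pd.read_csv` and `df.to_csv`.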