
Maybe a very naive question, but I am stuck on this: pandas.Series has a method sort_values, and there is an option to do it "in place" or not. I have Googled for it for a while, but I am still not clear about it. It seems this is assumed to be perfectly well known to everybody but me. Could anyone give me an illustrative explanation, for dummies, of how these two options differ from each other...?

Thank you for any assistance.

Om Prakash
Karel Macek
    The `inplace` parameter is a general pandas convention, not something specific to sort_values. You can see it in several methods, such as DataFrame.fillna and DataFrame.replace. Whenever `inplace` is set to True, the method modifies the existing DataFrame (and returns None), so you do not need to assign the result to a new variable. – Dileep Kumar Patchigolla Jan 21 '17 at 08:53
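A minimal sketch of the same idea with `Series.fillna`, one of the methods mentioned in the comment above. The sample Series is made up for illustration; the point is only the difference between getting a new object back and having the original modified (with `None` returned).

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# Without inplace: a new Series is returned and s is left untouched.
filled = s.fillna(0)
print(s.isna().sum())   # 1 (s still has its missing value)
print(filled.tolist())  # [1.0, 0.0, 3.0]

# With inplace=True: s itself is modified and the call returns None.
result = s.fillna(0, inplace=True)
print(result)           # None
print(s.tolist())       # [1.0, 0.0, 3.0]
```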

4 Answers


Here is an example. df1 will hold the sorted DataFrame, and df will remain intact:

import pandas as pd
from datetime import datetime as dt
df = pd.DataFrame(data=[22,22,3],
                  index=[dt(2016, 11, 10, 0), dt(2016, 11, 10, 13), dt(2016, 11, 13, 5)],
                  columns=['foo'])

df1 = df.sort_values(by='foo')  # returns a new, sorted DataFrame
print(df)   # original row order is unchanged
print(df1)  # rows sorted by 'foo'

In the case below, df itself will hold the sorted values:

import pandas as pd
from datetime import datetime as dt

df = pd.DataFrame(data=[22,22,3],
                  index=[dt(2016, 11, 10, 0), dt(2016, 11, 10, 13), dt(2016, 11, 13, 5)],
                  columns=['foo'])

df.sort_values(by='foo', inplace=True)  # sorts df itself; the call returns None
print(df)
Peter Pearman
Alexey Smirnov

As you can read in the sort_values documentation, the function returns a Series. However, it is a new Series rather than the original one.

For example:

import numpy as np
import pandas as pd

s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
print(s)  # values are random, so your output will differ
# a   -0.872271
# b    0.294317
# c   -0.017433
# d   -1.375316
# e    0.993197
# dtype: float64

s_sorted = s.sort_values()  # returns a new, sorted Series; s is unchanged

print(s_sorted)
# d   -1.375316
# a   -0.872271
# c   -0.017433
# b    0.294317
# e    0.993197
# dtype: float64

print(id(s_sorted))
# 127952880

print(id(s))
# 127724792

So s and s_sorted are different Series. But if you use inplace=True:

s.sort_values(inplace=True)  # sorts s itself; the call returns None
print(s)
# d   -1.375316
# a   -0.872271
# c   -0.017433
# b    0.294317
# e    0.993197
# dtype: float64

print(id(s))
# 127724792

This shows that s is still the same Series object: it was sorted in place, and no new Series is returned.
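The example above relies on random data and on comparing printed ids. A deterministic sketch of the same comparison (the three-element Series is made up) uses `is` instead, and also checks with `Series.equals` that both options end up with the same sorted data; only the object that holds it differs.

```python
import pandas as pd

s = pd.Series([3, 1, 2], index=['a', 'b', 'c'])

# Option 1: keep the original and bind the sorted copy to a new name.
s_sorted = s.sort_values()
print(s_sorted is s)         # False: a different object
print(s.index.tolist())      # ['a', 'b', 'c'] (s is unchanged)

# Option 2: sort s itself.
original_id = id(s)
s.sort_values(inplace=True)
print(id(s) == original_id)  # True: still the same object
print(s.index.tolist())      # ['b', 'c', 'a']

# Both options produce the same sorted values and index.
print(s.equals(s_sorted))    # True
```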

SSC
  • Not OP, but thanks a lot for your answer. Is it similar to the idea of View and Copy, or am I confusing two completely different problems? Thanks. – Bowen Liu Oct 25 '18 at 14:45
  • @BowenLiu I am not sure what you mean by View and Copy. From my understanding, `inplace` means the existing data is modified, and without `inplace`, the existing data is copied to another place and some action is performed on the new data. So the existing data is preserved. – SSC Oct 26 '18 at 02:22
  • Thank you so much. This is all I need to understand the concept and the process! – Bowen Liu Oct 26 '18 at 13:41
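To make SSC's description concrete, here is a small sketch (made-up data, reusing the 'foo' column from the first answer) in which a second name refers to the very same DataFrame object. Only the in-place sort modifies that shared object, so only it is visible through the second name.

```python
import pandas as pd

df = pd.DataFrame({'foo': [22, 22, 3]})
alias = df  # a second name bound to the same DataFrame object

# Non-inplace: a sorted copy is created; df (and therefore alias) is untouched.
sorted_copy = df.sort_values(by='foo')
print(alias['foo'].tolist())  # [22, 22, 3]

# Inplace: the shared object itself is reordered, so alias sees the change.
df.sort_values(by='foo', inplace=True)
print(alias['foo'].tolist())  # [3, 22, 22]
print(alias is df)            # True
```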

inplace=True sorts the actual Series (or DataFrame) itself and returns None.
inplace=False returns a new sorted object without changing the original.
If unspecified, inplace defaults to False.
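A short sketch of the default behaviour, plus a common pitfall that follows from it: because `sort_values(inplace=True)` returns `None`, assigning its result replaces the variable with `None`. The sample data is made up.

```python
import pandas as pd

s = pd.Series([3, 1, 2])

# inplace defaults to False, so this returns a new sorted Series.
new_s = s.sort_values()
print(new_s.tolist())  # [1, 2, 3]
print(s.tolist())      # [3, 1, 2] (original unchanged)

# Pitfall: with inplace=True the method returns None,
# so assigning its result throws the Series away.
t = pd.Series([3, 1, 2])
t = t.sort_values(inplace=True)
print(t)               # None
```

Use either `s = s.sort_values()` or `s.sort_values(inplace=True)`, but not both in one statement.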


"inplace=True" is more like a physical sort, while "inplace=False" is more like a logical sort. A physical sort means that the data set stored in the computer is sorted based on some keys; a logical sort means that the data set is still stored in its original order (as it was input/imported), and the sort only works on its index. A data set can have one or multiple logical indexes, but the physical index is unique.

Leon