1

I have created a datatable frame as,

DT_EX = dt.Frame({'sales':[103.07, 47.28, 162.15, 84.47, 44.97, 46.97, 34.99, 9.99, 29.99, 64.98],
                  'quantity':[6, 2, 8, 3, 3, 3, 1, 1, 1, 2],
                  'customer_lifecycle_status':['Lead','First time buyer','Active customer','Defecting customer','
                                             'Lead','First time buyer','Lead','Lead','Lead','Lead']})

Now I'm trying to select only 2 fields from the datatable as,

DT_EX[:, f.sales, f.quantity]

In this case, It is displaying the data from quantity to sales whereas it should display them in the specified order(sales,quantity). and here another observation from this output is that- quantity fields gets sorted in an ascending order.

Keeping this case a side, Now I have tried to pass the required fields in parenthesis as

DT_EX[:, (f.sales,f.quantity)]

Here It is now producing the correct output without any sorting/jumbled of fields

It's always recommended to pass the fields to be selected in parenthesis.

Finally, I would be interested to know what has happened in the first case? , would you please explain it clearly ?.

Pasha
  • 6,298
  • 2
  • 22
  • 34
myamulla_ciencia
  • 1,282
  • 1
  • 8
  • 30

1 Answers1

1

The primary syntax of datatable is

                            DT[i, j, by, ...]

That is, when you write a sequence of expressions in the square brackets, the first one is interpreted as i (row filter), the second as j (column selector), the third as by (group-by variable).

Normally, you would use a by() function to express a group-by condition, but the old syntax allowed to specify a bare column name in the third place in DT[], and it was interpreted as a group-by variable. Such use is considered deprecated nowadays, and may eventually get removed, but at least for now it is what it is.

Thus, when you write DT_EX[:, f.sales, f.quantity], the quantity column is interpreted as a group by condition (and since j does not have any reduction operations, it works essentially as a sort). Another effect of using a grouping variable is that it is moved to the front of the resulting frame, which essentially means you'll see columns (quantity, sales) in the "opposite" order of how they were listed.

If all you need is to select 2 columns from a frame, however, then you need to make sure those 2 columns are both within the j position in the list of arguments to DT[...]. This can be done via a list, or a tuple, or a dictionary:

DT_EX[:, [f.sales, f.quantity]]
DT_EX[:, (f.sales, f.quantity)]
DT_EX[:, {"SALES": f.sales, "QUANT": f.quantity}]
Pasha
  • 6,298
  • 2
  • 22
  • 34