I have a DataFrame
with columns "id" and "date". date is of format yyyy-mm-dd here is an example:
+---------+----------+
| item_id| ds|
+---------+----------+
| 25867869|2018-05-01|
| 17190474|2018-01-02|
| 19870756|2018-01-02|
|172248680|2018-07-29|
| 41148162|2018-03-01|
+---------+----------+
I want to create a new column, in which each date is associated with an integer starting from 1. such that the smallest(earliest) date gets integer 1 , next(2nd earliest date) gets assigned to 2 and so on..
I want my DataFrame
to look like this... :
+---------+----------+---------+
| item_id| ds| number|
+---------+----------+---------+
| 25867869|2018-05-01| 3|
| 17190474|2018-01-02| 1|
| 19870756|2018-01-02| 1|
|172248680|2018-07-29| 4|
| 41148162|2018-03-01| 2|
+---------+----------+---------+
Explanation:
2018 jan 02 date comes the earliest hence its number is 1. since there are 2 rows with same date, therefore 1 is located twice. after 2018-01-02 the next date comes as 2018-03-01 hence its number is 2 and so on... How can I create such column ?