1
receptor year month day hour hour.inc lat lon height pressure date
1 2018 1 3 19 0 31.768 -106.501 500.0 835.6 2018-01-03 19:00:00
1 2018 1 3 18 -1 31.628 -106.350 508.8 840.5 2018-01-03 18:00:00
1 2018 1 3 17 -2 31.489 -106.180 526.2 839.4 2018-01-03 17:00:00
1 2018 1 3 16 -3 31.372 -105.974 547.6 836.8 2018-01-03 16:00:00
1 2018 1 3 15 -4 31.289 -105.731 555.3 829.8 2018-01-03 15:00:00
1 2018 1 3 14 -5 31.265 -105.462 577.8 812.8 2018-01-03 14:00:00
1 2018 1 3 13 -6 31.337 -105.175 640.0 793.9 2018-01-03 13:00:00
1 2018 1 3 12 -7 31.492 -104.897 645.6 809.2 2018-01-03 12:00:00
1 2018 1 3 11 -8 31.671 -104.700 686.8 801.0 2018-01-03 11:00:00
1 2018 1 3 10 -9 31.913 -104.552 794.2 795.8 2018-01-03 10:00:00
2 2018 1 4 19 0 31.768 -106.501 500.0 830.9 2018-01-04 19:00:00
2 2018 1 4 18 -1 31.904 -106.635 611.5 819.5 2018-01-04 18:00:00
2 2018 1 4 17 -2 32.070 -106.749 709.7 808.0 2018-01-04 17:00:00
2 2018 1 4 16 -3 32.223 -106.855 787.3 794.9 2018-01-04 16:00:00

Above is what my dataframe looks like but I am trying to create a new column called date1 and will look like the frame below.

 receptor year month day hour hour.inc    lat      lon height pressure                date       date1
1         1 2018     1   3   19        0 31.768 -106.501  500.0    835.6 2018-01-03 19:00:00 2018-01-03 19:00:00
2         1 2018     1   3   18       -1 31.628 -106.350  508.8    840.5 2018-01-03 18:00:00 2018-01-03 19:00:00
3         1 2018     1   3   17       -2 31.489 -106.180  526.2    839.4 2018-01-03 17:00:00 2018-01-03 19:00:00
4         1 2018     1   3   16       -3 31.372 -105.974  547.6    836.8 2018-01-03 16:00:00 2018-01-03 19:00:00
5         1 2018     1   3   15       -4 31.289 -105.731  555.3    829.8 2018-01-03 15:00:00 2018-01-03 19:00:00
6         1 2018     1   3   14       -5 31.265 -105.462  577.8    812.8 2018-01-03 14:00:00 2018-01-03 19:00:00
7         1 2018     1   3   13       -6 31.337 -105.175  640.0    793.9 2018-01-03 13:00:00 2018-01-03 19:00:00
8         1 2018     1   3   12       -7 31.492 -104.897  645.6    809.2 2018-01-03 12:00:00 2018-01-03 19:00:00
9         1 2018     1   3   11       -8 31.671 -104.700  686.8    801.0 2018-01-03 11:00:00 2018-01-03 19:00:00
10        1 2018     1   3   10       -9 31.913 -104.552  794.2    795.8 2018-01-03 10:00:00 2018-01-03 19:00:00
38        2 2018     1   4   19        0 31.768 -106.501  500.0    830.9 2018-01-04 19:00:00 2018-01-04 19:00:00
39        2 2018     1   4   18       -1 31.904 -106.635  611.5    819.5 2018-01-04 18:00:00 2018-01-04 19:00:00
40        2 2018     1   4   17       -2 32.070 -106.749  709.7    808.0 2018-01-04 17:00:00 2018-01-04 19:00:00
41        2 2018     1   4   16       -3 32.223 -106.855  787.3    794.9 2018-01-04 16:00:00 2018-01-04 19:00:00

Disregard the index furthest to the left. I want to match the receptor (Ex:1,2) with the first occurrence of the date (Ex: 2018-01-03 19:00:00,2018-01-04 19:00:00) and then repeat till the receptor changes.

I'm working in R so I'd like to find a solution in R but I could also use a python solution and make use of the Reticulate package in R.

Pygirl
  • 12,969
  • 5
  • 30
  • 43
bob0901
  • 65
  • 5

3 Answers3

1

Using data.table you can try

library(data.table)
setDT(df) #converting into data.frame 
df[,date1 := date[1],receptor] # taking the first date per receptor
df

#Output 

    receptor year month day hour hour.inc    lat      lon height pressure                date               date1
 1:        1 2018     1   3   19        0 31.768 -106.501  500.0    835.6 2018-01-03 19:00:00 2018-01-03 19:00:00
 2:        1 2018     1   3   18       -1 31.628 -106.350  508.8    840.5 2018-01-03 18:00:00 2018-01-03 19:00:00
 3:        1 2018     1   3   17       -2 31.489 -106.180  526.2    839.4 2018-01-03 17:00:00 2018-01-03 19:00:00
 4:        1 2018     1   3   16       -3 31.372 -105.974  547.6    836.8 2018-01-03 16:00:00 2018-01-03 19:00:00
 5:        1 2018     1   3   15       -4 31.289 -105.731  555.3    829.8 2018-01-03 15:00:00 2018-01-03 19:00:00
 6:        1 2018     1   3   14       -5 31.265 -105.462  577.8    812.8 2018-01-03 14:00:00 2018-01-03 19:00:00
 7:        1 2018     1   3   13       -6 31.337 -105.175  640.0    793.9 2018-01-03 13:00:00 2018-01-03 19:00:00
 8:        1 2018     1   3   12       -7 31.492 -104.897  645.6    809.2 2018-01-03 12:00:00 2018-01-03 19:00:00
 9:        1 2018     1   3   11       -8 31.671 -104.700  686.8    801.0 2018-01-03 11:00:00 2018-01-03 19:00:00
10:        1 2018     1   3   10       -9 31.913 -104.552  794.2    795.8 2018-01-03 10:00:00 2018-01-03 19:00:00
11:        2 2018     1   4   19        0 31.768 -106.501  500.0    830.9 2018-01-04 19:00:00 2018-01-04 19:00:00
12:        2 2018     1   4   18       -1 31.904 -106.635  611.5    819.5 2018-01-04 18:00:00 2018-01-04 19:00:00
13:        2 2018     1   4   17       -2 32.070 -106.749  709.7    808.0 2018-01-04 17:00:00 2018-01-04 19:00:00
14:        2 2018     1   4   16       -3 32.223 -106.855  787.3    794.9 2018-01-04 16:00:00 2018-01-04 19:00:00
Chriss Paul
  • 1,101
  • 6
  • 19
  • This is clean. Really impressed my Team Lead today. I don't work with extremely large datasets so I don't use Datatable much. But this is the second solution I have seen to a problem using DT. I did not take credit for this btw. Said that boi Chriss Paul got me. – bob0901 Mar 24 '21 at 22:27
  • Thank you @bob0901. Sometimes I find `data.table` so handy for this kind of problems that is always my first choice, and actually I am able to handle large datasets with it too. – Chriss Paul Mar 24 '21 at 23:07
0

Try filling location of the unchanged value with np.nan and changed value location with date (of that index) and then simply do forward fill using .ffill()

df.receptor.shift().ne(df.receptor) will give you where the receptor value changes. compare the previous and current value to see the change.

df['date1'] = np.where(df.receptor.shift().ne(df.receptor), df.date, np.nan)
df.date1 = df.date1.ffill()

receptor year month day hour hour.inc lat lon height pressure date date1
0 1 2018 1 3 19 0 31.768 -106.501 500.0 835.6 2018-01-03 19:00:00 2018-01-03 19:00:00
1 1 2018 1 3 18 -1 31.628 -106.350 508.8 840.5 2018-01-03 18:00:00 2018-01-03 19:00:00
2 1 2018 1 3 17 -2 31.489 -106.180 526.2 839.4 2018-01-03 17:00:00 2018-01-03 19:00:00
3 1 2018 1 3 16 -3 31.372 -105.974 547.6 836.8 2018-01-03 16:00:00 2018-01-03 19:00:00
4 1 2018 1 3 15 -4 31.289 -105.731 555.3 829.8 2018-01-03 15:00:00 2018-01-03 19:00:00
5 1 2018 1 3 14 -5 31.265 -105.462 577.8 812.8 2018-01-03 14:00:00 2018-01-03 19:00:00
6 1 2018 1 3 13 -6 31.337 -105.175 640.0 793.9 2018-01-03 13:00:00 2018-01-03 19:00:00
7 1 2018 1 3 12 -7 31.492 -104.897 645.6 809.2 2018-01-03 12:00:00 2018-01-03 19:00:00
8 1 2018 1 3 11 -8 31.671 -104.700 686.8 801.0 2018-01-03 11:00:00 2018-01-03 19:00:00
9 1 2018 1 3 10 -9 31.913 -104.552 794.2 795.8 2018-01-03 10:00:00 2018-01-03 19:00:00
10 2 2018 1 4 19 0 31.768 -106.501 500.0 830.9 2018-01-04 19:00:00 2018-01-04 19:00:00
11 2 2018 1 4 18 -1 31.904 -106.635 611.5 819.5 2018-01-04 18:00:00 2018-01-04 19:00:00
12 2 2018 1 4 17 -2 32.070 -106.749 709.7 808.0 2018-01-04 17:00:00 2018-01-04 19:00:00
13 2 2018 1 4 16 -3 32.223 -106.855 787.3 794.9 2018-01-04 16:00:00 2018-01-04 19:00:00
Pygirl
  • 12,969
  • 5
  • 30
  • 43
  • 1
    Be sure to avoid period qualifiers for column referencing to avoid conflict with attributes (i.e., `df['receptor']`, `df['date1']`). See warning in [Attribute Access](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#attribute-access) docs. – Parfait Mar 24 '21 at 17:33
0

Consider base R's ave after calculating a Date column to return first date time per date grouping using head:

df <- within(df, {
  date_short <- as.Date(substr(as.character(date), 1, 10), origin="1970-01-01")
  first_dt_hour <- ave(date, date_short, FUN=function(x) head(x, 1))
  rm(date_short)   # DROP HELPER COLUMN
})

print(df)
#    receptor year month day hour hour.inc    lat      lon height pressure                date       first_dt_hour
# 1         1 2018     1   3   19        0 31.768 -106.501  500.0    835.6 2018-01-03 19:00:00 2018-01-03 19:00:00
# 2         1 2018     1   3   18       -1 31.628 -106.350  508.8    840.5 2018-01-03 18:00:00 2018-01-03 19:00:00
# 3         1 2018     1   3   17       -2 31.489 -106.180  526.2    839.4 2018-01-03 17:00:00 2018-01-03 19:00:00
# 4         1 2018     1   3   16       -3 31.372 -105.974  547.6    836.8 2018-01-03 16:00:00 2018-01-03 19:00:00
# 5         1 2018     1   3   15       -4 31.289 -105.731  555.3    829.8 2018-01-03 15:00:00 2018-01-03 19:00:00
# 6         1 2018     1   3   14       -5 31.265 -105.462  577.8    812.8 2018-01-03 14:00:00 2018-01-03 19:00:00
# 7         1 2018     1   3   13       -6 31.337 -105.175  640.0    793.9 2018-01-03 13:00:00 2018-01-03 19:00:00
# 8         1 2018     1   3   12       -7 31.492 -104.897  645.6    809.2 2018-01-03 12:00:00 2018-01-03 19:00:00
# 9         1 2018     1   3   11       -8 31.671 -104.700  686.8    801.0 2018-01-03 11:00:00 2018-01-03 19:00:00
# 10        1 2018     1   3   10       -9 31.913 -104.552  794.2    795.8 2018-01-03 10:00:00 2018-01-03 19:00:00
# 38        2 2018     1   4   19        0 31.768 -106.501  500.0    830.9 2018-01-04 19:00:00 2018-01-04 19:00:00
# 39        2 2018     1   4   18       -1 31.904 -106.635  611.5    819.5 2018-01-04 18:00:00 2018-01-04 19:00:00
# 40        2 2018     1   4   17       -2 32.070 -106.749  709.7    808.0 2018-01-04 17:00:00 2018-01-04 19:00:00
# 41        2 2018     1   4   16       -3 32.223 -106.855  787.3    794.9 2018-01-04 16:00:00 2018-01-04 19:00:00

Data

data <- ' receptor year month day hour hour.inc    lat      lon height pressure                date
1         1 2018     1   3   19        0 31.768 -106.501  500.0    835.6 "2018-01-03 19:00:00" 
2         1 2018     1   3   18       -1 31.628 -106.350  508.8    840.5 "2018-01-03 18:00:00" 
3         1 2018     1   3   17       -2 31.489 -106.180  526.2    839.4 "2018-01-03 17:00:00" 
4         1 2018     1   3   16       -3 31.372 -105.974  547.6    836.8 "2018-01-03 16:00:00"
5         1 2018     1   3   15       -4 31.289 -105.731  555.3    829.8 "2018-01-03 15:00:00"
6         1 2018     1   3   14       -5 31.265 -105.462  577.8    812.8 "2018-01-03 14:00:00"
7         1 2018     1   3   13       -6 31.337 -105.175  640.0    793.9 "2018-01-03 13:00:00"
8         1 2018     1   3   12       -7 31.492 -104.897  645.6    809.2 "2018-01-03 12:00:00"
9         1 2018     1   3   11       -8 31.671 -104.700  686.8    801.0 "2018-01-03 11:00:00"
10        1 2018     1   3   10       -9 31.913 -104.552  794.2    795.8 "2018-01-03 10:00:00"
38        2 2018     1   4   19        0 31.768 -106.501  500.0    830.9 "2018-01-04 19:00:00"
39        2 2018     1   4   18       -1 31.904 -106.635  611.5    819.5 "2018-01-04 18:00:00"
40        2 2018     1   4   17       -2 32.070 -106.749  709.7    808.0 "2018-01-04 17:00:00"
41        2 2018     1   4   16       -3 32.223 -106.855  787.3    794.9 "2018-01-04 16:00:00"'

df <- read.table(text=data, 
                 colClasses=c(rep("integer", 7), rep("numeric", 4), "POSIXct"),
                 header=TRUE)
Parfait
  • 104,375
  • 17
  • 94
  • 125