I have the following dataframe in R:
library(dplyr)
library(tsibble)
library(fpp3)   # also attaches tidyr, which provides drop_na()

usda <- read.csv("https://raw.githubusercontent.com/rhozon/datasets/master/usda_data_stovwflw.csv",
                 header = TRUE, sep = ";") |>
  mutate(
    # Build a year-month string; rows matching neither condition become NA
    Dates = case_when(
      CalendarYear - MarketYear == 1 ~ paste(CalendarYear, "-", Month),
      CalendarYear - MarketYear == 0 ~ paste(CalendarYear, "-", Month)
    ),
    Dates = gsub(" ", "", Dates)   # remove the spaces that paste() inserts
  ) |>
  drop_na() |>
  mutate(
    Dates = yearmonth(Dates)
  ) |>
  arrange(Dates) |>
  select(
    Dates,
    AttributeDescription,
    Value
  ) |>
  glimpse()
Rows: 3,898
Columns: 3
$ Dates <mth> 2010 jan, 2010 jan, 2010 jan, 2010 jan, 2010 jan, 2010 jan, 2010 jan, 2010 jan, 2010 jan, 2010 jan, 2010 jan, 2010 jan, 2010 jan, 2010 jan, 2010 jan, 20…
$ AttributeDescription <chr> "Production", "Area Harvested", "Yield", "Imports", "Exports", "Ending Stocks", "Total Distribution", "Beginning Stocks", "FSI Consumption", "TY Imports…
$ Value <int> 334052, 32225, 10, 254, 52072, 44817, 376810, 42504, 138945, 300, 279921, 0, 376810, 52000, 140976, 334052, 32225, 42504, 43674, 50802, 282334, 376810, …
Now I'm trying to filter the data by one of the attributes:
usda |> filter(AttributeDescription == "Production")
Dates AttributeDescription Value
1 2010 jan Production 334052
2 2010 feb Production 334052
3 2010 mar Production 333533
4 2010 apr Production 333533
5 2010 may Production 339614
6 2010 may Production 333011
7 2010 jun Production 339614
8 2010 jun Production 333011
9 2010 jul Production 336438
10 2010 jul Production 333011
...
As you can see, 2010 may is repeated (and so are 2010 jun and 2010 jul), each time with a different value.
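One way to confirm exactly which rows count as repeats is base R's duplicated() applied to the (Dates, AttributeDescription) pair. The sketch below assumes a small hypothetical toy data frame, usda_toy, built from the ten "Production" rows printed above, with Dates kept as plain strings so it runs without downloading the CSV.

```r
# Hypothetical toy data: the ten "Production" rows printed above,
# with Dates stored as plain strings instead of yearmonth
usda_toy <- data.frame(
  Dates = c("2010 jan", "2010 feb", "2010 mar", "2010 apr", "2010 may",
            "2010 may", "2010 jun", "2010 jun", "2010 jul", "2010 jul"),
  AttributeDescription = "Production",
  Value = c(334052, 334052, 333533, 333533, 339614,
            333011, 339614, 333011, 336438, 333011)
)

# TRUE for every row whose (Dates, AttributeDescription) pair
# has already appeared higher up in the data
is_repeat <- duplicated(usda_toy[c("Dates", "AttributeDescription")])

# Show only the repeated rows
usda_toy[is_repeat, ]
```

Only the second occurrence of each month is flagged, so the first row of each pair (reading top to bottom) is treated as the one to keep.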
How can I filter this data frame so that, for each month, only the first occurrence (reading from top to bottom) is kept and the repeats below it are discarded, separately for each variable in the AttributeDescription column?
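A possible approach, sketched here under the assumption that the current row order defines which row is "first", is dplyr's distinct() on both Dates and AttributeDescription with .keep_all = TRUE. The toy tibble below is hypothetical: the "Production" rows printed above plus the first "Area Harvested" value from the glimpse() output, to show that the same month is kept once per attribute.

```r
library(dplyr)

# Hypothetical toy data: printed "Production" rows plus one
# "Area Harvested" row for the same month as the first row
usda_toy <- tibble(
  Dates = c("2010 jan", "2010 feb", "2010 mar", "2010 apr", "2010 may",
            "2010 may", "2010 jun", "2010 jun", "2010 jul", "2010 jul",
            "2010 jan"),
  AttributeDescription = c(rep("Production", 10), "Area Harvested"),
  Value = c(334052, 334052, 333533, 333533, 339614,
            333011, 339614, 333011, 336438, 333011,
            32225)
)

# distinct() keeps the first row it meets for each
# (Dates, AttributeDescription) pair; .keep_all = TRUE retains Value too
usda_first <- usda_toy |>
  distinct(Dates, AttributeDescription, .keep_all = TRUE)

usda_first
```

On the real data this would presumably be `usda |> distinct(Dates, AttributeDescription, .keep_all = TRUE)`. Because each (Dates, AttributeDescription) pair is then unique, the result should also be convertible with `as_tsibble(key = AttributeDescription, index = Dates)`, which refuses duplicated key/index combinations.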