I have a Parquet dataset stored in S3, and I want to read it and filter on the partition field so that I get its unique values. I tried the following, but "unique" cannot be applied as a filter operator.
Here's my attempt:
import awswrangler as wr

query_fecha_dato = "{0}fecha_dato={1}/".format(param.delivery["output_path"], fecha_dato_formato)
print(query_fecha_dato)
df_fecha_datos = wr.s3.read_parquet(path=query_fecha_dato, dataset=True, filters=[('fecha_dato', 'unique', fecha_dato)])
print(df_fecha_datos.head(5))
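For context: a filter predicate is a (column, op, value) tuple, and the operator set documented for pyarrow's Parquet reader is `=`/`==`, `!=`, `<`, `>`, `<=`, `>=`, `in` and `not in`; there is no `unique`. A filter only decides which rows or partitions are kept and never deduplicates them. The minimal stdlib sketch below mimics that selection on hypothetical partition dicts, assuming `filters` is forwarded to pyarrow as a flat AND list of tuples; `matches` and `_OPS` are illustrative names, not library APIs.

```python
import operator

# Operators documented for pyarrow-style DNF filters; "unique" is not one of them.
_OPS = {
    "=": operator.eq,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
    "in": lambda value, options: value in options,
    "not in": lambda value, options: value not in options,
}


def matches(partition, filters):
    """Return True if a partition dict satisfies every (column, op, value) tuple (AND semantics)."""
    for column, op, value in filters:
        if op not in _OPS:
            raise ValueError(f"unsupported filter operator: {op!r}")
        if not _OPS[op](partition[column], value):
            return False
    return True


# Hypothetical partitions, as parsed from paths like .../fecha_dato=2022-08-06/
partitions = [
    {"fecha_dato": "2022-08-05"},
    {"fecha_dato": "2022-08-06"},
    {"fecha_dato": "2022-08-06"},
]

selected = [p for p in partitions if matches(p, [("fecha_dato", "==", "2022-08-06")])]
print(len(selected))  # 2: both matching entries are kept; nothing is deduplicated
```

Passing `("fecha_dato", "unique", ...)` to this sketch raises `ValueError`, which mirrors why the real call cannot express "distinct values" as a filter.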
I expected to see only the partition column "fecha_dato", but it shows the following instead:
   nro_de_pedido  nro_de_negocio  ...  nrootchex  ingest_date
0     2006968078       635922336  ...         -1   2022-08-06
1     2006968079       635912195  ...         -1   2022-08-06
2     2006968080       635921361  ...         -1   2022-08-06
3     2006968081       635922792  ...         -1   2022-08-06
4     2006968082       635922368  ...         -1   2022-08-06
What I want is only the partition column "fecha_dato", without duplicates.
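One way to get just the distinct "fecha_dato" values without reading any Parquet data is to list the object paths under the dataset root and parse the Hive-style `fecha_dato=<value>` path segment. The sketch below assumes the paths have already been listed, for example with `wr.s3.list_objects(<dataset root>)`, which returns full `s3://` paths; the function name and the sample paths are hypothetical.

```python
from urllib.parse import unquote


def distinct_partition_values(paths, column):
    """Collect the distinct values of a Hive-style partition column from object paths."""
    prefix = column + "="
    values = set()
    for path in paths:
        for segment in path.split("/"):
            if segment.startswith(prefix):
                # Hive-style writers may URL-encode special characters in values.
                values.add(unquote(segment[len(prefix):]))
    return sorted(values)


# Hypothetical listing; in practice this could come from wr.s3.list_objects(<dataset root>).
paths = [
    "s3://bucket/output/fecha_dato=2022-08-05/part-0.parquet",
    "s3://bucket/output/fecha_dato=2022-08-06/part-0.parquet",
    "s3://bucket/output/fecha_dato=2022-08-06/part-1.parquet",
]

print(distinct_partition_values(paths, "fecha_dato"))
# ['2022-08-05', '2022-08-06']
```

The listing has to start at the dataset root: listing under a single `fecha_dato=...` prefix can only ever yield that one value. If the data is read anyway, calling pandas `unique()` or `drop_duplicates()` on the partition column of the resulting DataFrame gives the same result, at the cost of downloading every file.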