
I'm using TFX (more precisely, TensorFlow Data Validation) with the infer_schema method documented at https://www.tensorflow.org/tfx/data_validation/api_docs/python/tfdv/infer_schema. It generates a schema describing the column types from statistics computed over a CSV file.

It works well for FLOAT, BYTES, categorical features and so on, but I would also like it to detect dates. I haven't found anything about this in the tutorials or guides. The generated schema proto can represent dates (see TimeDomain in https://github.com/tensorflow/metadata/blob/master/tensorflow_metadata/proto/v0/schema.proto), so the schema format itself should not be the obstacle.

I tried a CSV file in the following format (non-US date format, day/month/year), and the date column is recognized as BYTES:

date, amount
15/08/2001, 0.3120682494
16/08/2001, 0.9310268917
17/08/2001, 0.902986235

The code is the same as in the tutorial, so more or less:

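import tensorflow_data_validation as tfdv

# Compute statistics from the CSV, infer a schema from them, then display it.
train_stats = tfdv.generate_statistics_from_csv(data_location="/content/csv_with_dates.csv")
schema = tfdv.infer_schema(statistics=train_stats)
tfdv.display_schema(schema=schema)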

which displays:

Feature name  Type   Presence  Valency  Domain
'date'        BYTES  required           -
'amount'      FLOAT  required           -

Can I make infer_schema detect the date column, and if so, how?

desertnaut
Pixou

1 Answer


Not at the moment; maybe in an upcoming version. If you check the link you mentioned, you'll find that features support only the following types (dates are not included):

enum FeatureType {
  TYPE_UNKNOWN = 0;
  BYTES = 1;
  INT = 2;
  FLOAT = 3;
  STRUCT = 4;
}
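
Since none of these types is a date, one possible workaround is to check the column yourself before or after running TFDV. The following is only a minimal sketch using the Python standard library, not a TFDV feature: the helper names looks_like_date_column and detect_date_columns are hypothetical, and the format "%d/%m/%Y" is an assumption based on the sample data in the question.

```python
import csv
import io
from datetime import datetime


def looks_like_date_column(values, fmt="%d/%m/%Y"):
    """Hypothetical helper: True if every non-empty value parses with fmt."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return False  # an empty column is not treated as a date column
    for value in cleaned:
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            return False
    return True


def detect_date_columns(csv_text, fmt="%d/%m/%Y"):
    """Hypothetical helper: names of the columns whose values all match fmt."""
    # skipinitialspace drops the blank that follows each comma in the sample.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    columns = {name: [] for name in (reader.fieldnames or [])}
    for row in reader:
        for name, value in row.items():
            if name in columns:  # ignore surplus fields on malformed rows
                columns[name].append(value or "")
    return [name for name, vals in columns.items()
            if looks_like_date_column(vals, fmt)]


# Sample data copied from the question.
SAMPLE = """date, amount
15/08/2001, 0.3120682494
16/08/2001, 0.9310268917
17/08/2001, 0.902986235
"""

print(detect_date_columns(SAMPLE))  # prints ['date']
```

The column names found this way could then be used to annotate the inferred schema by hand, for example through the TimeDomain message mentioned in the question; that step is not shown here because it depends on the TFDV version in use.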
Amine_h