I'm running a simple penguin pipeline in interactive mode with a train/eval split. The Transform step runs, but I can't get per-split post-transform statistics artifacts.
Inside the dedicated artifact folder /tmp/tfx-penguin_custom_INTERACTIVE-nq5dn56x/Transform/post_transform_stats/5, I have just one FeaturesStats.pb, but no Split-train and Split-eval subfolders with a FeaturesStats.pb inside each.
However, I do have those subfolders inside the artifact folder for the transformed examples (/tmp/tfx-penguin_custom_INTERACTIVE-nq5dn56x/Transform/transformed_examples/5/).
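To make the comparison between the two folders reproducible, here is a minimal sketch of how I check the layout. It uses only the standard library; the helper names `list_stats_files` and `has_per_split_stats` are my own and not part of TFX, and the `Split-<name>/FeaturesStats.pb` layout is the one I expected to find, not something I have confirmed for this artifact.

```python
from pathlib import Path


def list_stats_files(artifact_dir, filename="FeaturesStats.pb"):
    """Return the sorted relative paths of every stats file under artifact_dir."""
    root = Path(artifact_dir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob(filename)
        if p.is_file()
    )


def has_per_split_stats(artifact_dir, splits=("train", "eval")):
    """True if Split-<name>/FeaturesStats.pb exists for every requested split."""
    found = set(list_stats_files(artifact_dir))
    return all(f"Split-{name}/FeaturesStats.pb" in found for name in splits)
```

Called on the post_transform_stats/5 folder above, `list_stats_files` should report the single top-level FeaturesStats.pb I described, and `has_per_split_stats` tells me whether the split subfolders are present.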
Here is how I define the Transform component, explicitly providing the splits and also setting disable_statistics=False:
    from tfx import v1 as tfx
    from tfx.proto import transform_pb2

    transform = tfx.components.Transform(
        examples=example_gen.outputs['examples'],
        schema=schema_gen.outputs['schema'],
        disable_statistics=False,
        splits_config=transform_pb2.SplitsConfig(
            analyze=['train'], transform=['train', 'eval']),
        module_file=_transformer_module_file)
I went through the docstring and even the __init__ of the component (https://github.com/tensorflow/tfx/blob/master/tfx/components/transform/component.py). As far as I can tell, I haven't forgotten or misconfigured anything, but I was puzzled by the following comment, which mentions a location for the stats that I cannot find:
    disable_statistics: If True, do not invoke TFDV to compute pre-transform
      and post-transform statistics. When statistics are computed, they will
      will be stored in the `pre_transform_feature_stats/` and
      `post_transform_feature_stats/` subfolders of the `transform_graph`
      export.
For now, my workaround is to explicitly disable statistics in the Transform component and to define, right after it, a dedicated statistics component that works on the transformed example splits. It would have been great to get the per-split statistics from the Transform component directly.
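For reference, here is a minimal sketch of that workaround. It assumes that StatisticsGen accepts the `transformed_examples` output of Transform like any other Examples artifact; the wrapper function `build_transform_with_separate_stats` and its parameter names are hypothetical and only there to keep the snippet self-contained.

```python
def build_transform_with_separate_stats(example_gen, schema_gen, module_file):
    """Hypothetical helper: Transform without stats, plus StatisticsGen on its output."""
    # Imported inside the function so the sketch can be defined without TFX installed.
    from tfx import v1 as tfx
    from tfx.proto import transform_pb2

    transform = tfx.components.Transform(
        examples=example_gen.outputs['examples'],
        schema=schema_gen.outputs['schema'],
        disable_statistics=True,  # skip Transform's own pre/post-transform stats
        splits_config=transform_pb2.SplitsConfig(
            analyze=['train'], transform=['train', 'eval']),
        module_file=module_file)

    # Assumption: StatisticsGen computes statistics for each split of the
    # Examples artifact it receives, here the transformed train/eval splits.
    post_transform_stats = tfx.components.StatisticsGen(
        examples=transform.outputs['transformed_examples'])

    return transform, post_transform_stats
```

In the interactive context, both returned components are then run one after the other, Transform first.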
Thanks for any help