I get the following error when I run context.run(example_gen). I would like to understand what it means and how to avoid it. Please advise, and thanks in advance!

Error: RuntimeError: Files in same split /home/jupyter/.../data/* have different header.

The data is a CSV file with the header "A,B,C,D".

import os

# Import paths below are assumed for the older TFX release this snippet targets
# (csv_input lives in tfx.utils.dsl_utils there).
from tfx.components import CsvExampleGen
from tfx.proto import example_gen_pb2
from tfx.utils.dsl_utils import csv_input

base_dir = '/home/jupyter/.../data/'
# Input has a single split: 'input_dir/*'.
# Output has 2 splits: train:eval = 3:1.
output = example_gen_pb2.Output(
            split_config=example_gen_pb2.SplitConfig(splits=[
                example_gen_pb2.SplitConfig.Split(name='train', hash_buckets=3),
                example_gen_pb2.SplitConfig.Split(name='eval', hash_buckets=1)
            ]))

examples = csv_input(base_dir)
example_gen = CsvExampleGen(input=examples, output_config=output)
LLTeng

1 Answer

We had the same error. In our case the directory also contained hidden files; more precisely, a Jupyter notebook checkpoint directory.

To fix this issue, make sure the directory contains only .csv files and no other files, hidden ones included.
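A quick way to check is to list everything in the data directory, hidden entries included, and compare the first line of each CSV file. The sketch below uses only the Python standard library; inspect_data_dir is a hypothetical helper name, not part of TFX, and it only looks at the top level of the directory.

```python
import os


def inspect_data_dir(data_dir):
    """Hypothetical helper (not a TFX API): report entries that are not
    plain .csv files and collect the header line of each .csv file."""
    unexpected = []  # hidden entries, subdirectories, non-.csv files
    headers = {}     # file name -> first line of the file
    for name in sorted(os.listdir(data_dir)):  # listdir includes hidden entries
        path = os.path.join(data_dir, name)
        if os.path.isdir(path) or name.startswith('.') or not name.endswith('.csv'):
            unexpected.append(name)
            continue
        with open(path, newline='') as f:
            headers[name] = f.readline().strip()
    return unexpected, headers
```

Calling inspect_data_dir(base_dir) returns the list of unexpected entries and a dict of headers. If the list is non-empty (for example, it contains .ipynb_checkpoints), or the header values are not all identical, that would be consistent with the error above.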

Credit goes to this comment on GitHub.

Pieter