How can I write the following code in Java? Given a list of records (dicts) in Java, how can I write Beam code that writes them to TFRecord files as serialized tf.train.Example protos? There are plenty of examples of this in Python, such as the one below; how can I express the same logic in Java?

import tensorflow as tf
import apache_beam as beam
from apache_beam.runners.interactive import interactive_runner
from apache_beam.coders import ProtoCoder

class Foo(beam.DoFn):
  def process(self, element, *args, **kwargs):
    import tensorflow as tf

    foo = element.get('foo')
    bar = element.get('bar')

    feature = {
      "foo":
        tf.train.Feature(bytes_list=tf.train.BytesList(value=[foo.encode('utf-8')])),
      "bar":
        tf.train.Feature(bytes_list=tf.train.BytesList(value=[bar.encode('utf-8')]))
    }
    example_proto = tf.train.Example(features=tf.train.Features(feature=feature))
    yield example_proto

p = beam.Pipeline(runner=interactive_runner.InteractiveRunner())

records = p | "Create records" >> beam.Create([{'foo': 'abc', 'bar': 'pqr'} for _ in range(10)])
tf_examples = records | "Convert to tf examples" >> beam.ParDo(Foo())
tf_examples | "Dump Records" >> beam.io.WriteToTFRecord(file_path_prefix="./output/data-",
                                                    coder=ProtoCoder(tf.train.Example),
                                                    file_name_suffix='.tfrecord', num_shards=2)

p.run()
Jayendra Parmar
  • You can refer to https://www.tensorflow.org/install/lang_java to use TensorFlow in Java, and to https://beam.apache.org/get-started/quickstart-java/ to write a Java pipeline. – Ankur Apr 16 '20 at 21:04
  • I can't believe writing a TFRecord from a PCollection is so involved. I have this issue too, and this is the only piece of code I found (at the time of writing) on how to do this in Python. Is there not a better, more direct way to convert from one to the other? Have you found one? Thanks – DarioB Apr 06 '22 at 10:04

1 Answer

I have attempted this in Java, but I am still running into some issues. The new question is here: Write tfrecords from beam pipeline?
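For reference, here is a minimal sketch of how the Python pipeline from the question might look in Java. It is not a verified solution and rests on a few assumptions: the Beam Java SDK plus a runner (for example the direct runner) are on the classpath, and the tf.train.Example protobuf classes come from a TensorFlow proto artifact that exposes them under `org.tensorflow.example` (the package name differs between TensorFlow Java releases, so adjust the imports to your version). The class name `WriteTfRecords` and the helper `toExample` are made up for illustration. The key point is that Beam's `TFRecordIO.write()` consumes a `PCollection<byte[]>`, so each Example is serialized with `toByteArray()` before the write step.

```java
import com.google.protobuf.ByteString;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.ByteArrayCoder;
import org.apache.beam.sdk.coders.MapCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.TFRecordIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
// Assumption: package used by the org.tensorflow:proto artifact; newer releases may differ.
import org.tensorflow.example.BytesList;
import org.tensorflow.example.Example;
import org.tensorflow.example.Feature;
import org.tensorflow.example.Features;

public class WriteTfRecords {

  // Hypothetical helper: builds a tf.train.Example with one UTF-8 bytes feature
  // per map entry (the Python version reads only "foo" and "bar").
  static Example toExample(Map<String, String> record) {
    Features.Builder features = Features.newBuilder();
    for (Map.Entry<String, String> field : record.entrySet()) {
      BytesList value = BytesList.newBuilder()
          .addValue(ByteString.copyFromUtf8(field.getValue()))
          .build();
      features.putFeature(field.getKey(), Feature.newBuilder().setBytesList(value).build());
    }
    return Example.newBuilder().setFeatures(features).build();
  }

  // Counterpart of the Python DoFn "Foo": emits the serialized Example bytes,
  // because TFRecordIO.write() expects byte[] elements.
  static class ToSerializedExample extends DoFn<Map<String, String>, byte[]> {
    @ProcessElement
    public void processElement(@Element Map<String, String> record, OutputReceiver<byte[]> out) {
      out.output(toExample(record).toByteArray());
    }
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Ten identical records, mirroring beam.Create([...]) in the Python snippet.
    List<Map<String, String>> records = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
      Map<String, String> record = new HashMap<>();
      record.put("foo", "abc");
      record.put("bar", "pqr");
      records.add(record);
    }

    p.apply("Create records",
            Create.of(records).withCoder(MapCoder.of(StringUtf8Coder.of(), StringUtf8Coder.of())))
        .apply("Convert to tf examples", ParDo.of(new ToSerializedExample()))
        .setCoder(ByteArrayCoder.of())
        .apply("Dump Records",
            TFRecordIO.write().to("./output/data-").withSuffix(".tfrecord").withNumShards(2));

    p.run().waitUntilFinish();
  }
}
```

Unlike the Python version, no proto coder is passed to the sink here: the serialization happens inside the DoFn, and the write step only frames the raw bytes as TFRecord entries.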

Jayendra Parmar