
I have a huge database stored in Bigtable in GCP, and I am migrating the Bigtable data from one GCP account to another using Dataflow. When I created a job to export the Bigtable data to sequence files, it produced 3000 sequence files in the destination bucket. It is not practical to create a separate Dataflow job for each of the 3000 sequence files, so is there a way to reduce the number of sequence files, or a way to pass all 3000 sequence files at once to a Dataflow job template in GCP?

We have two sequence files and wanted to upload the data sequentially, one file after the other (10 rows and one column), but the result actually uploaded is 5 rows and 2 columns.

avnshrai

1 Answer


The sequence files should have some sort of pattern to their naming, e.g. gs://mybucket/somefolder/output-1, gs://mybucket/somefolder/output-2, gs://mybucket/somefolder/output-3, etc.

When running the Cloud Storage SequenceFile to Bigtable Dataflow template, set the sourcePattern parameter to a wildcard that matches that prefix, e.g. gs://mybucket/somefolder/output-* or gs://mybucket/somefolder/*.
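
For example, launching the import from the gcloud CLI might look like the sketch below. The sourcePattern value follows the answer above; the job name, project, region, instance, and table are placeholders, and the template path and Bigtable parameter names (bigtableProject, bigtableInstanceId, bigtableTableId) are assumptions to verify against the public template's documentation.

    # Launch the public SequenceFile-to-Bigtable import template once,
    # pointing sourcePattern at all 3000 files via a wildcard.
    gcloud dataflow jobs run import-sequencefiles \
        --project=my-destination-project \
        --region=us-central1 \
        --gcs-location=gs://dataflow-templates/latest/GCS_SequenceFile_to_Cloud_Bigtable \
        --parameters="bigtableProject=my-destination-project,bigtableInstanceId=my-instance,bigtableTableId=my-table,sourcePattern=gs://mybucket/somefolder/output-*"

Because the template expands the wildcard itself, a single job reads every matching file; there is no need for one job per sequence file.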

Billy Jacobson
  • Thank you for your response. It is working, but I am not getting the output as expected: we have two sequence files and wanted to upload the data sequentially, one file after the other (10 rows and one column), but the result actually uploaded is 5 rows and 2 columns – avnshrai Sep 07 '21 at 13:36
  • added an image for your reference – avnshrai Sep 07 '21 at 13:42
  • If your row keys have the same name, then they will be treated as the same row (see the sketch after these comments). Are you sure you have 10 unique row keys? – Billy Jacobson Sep 07 '21 at 13:45
  • thanks for your help, now it's working perfectly :-) – avnshrai Sep 07 '21 at 14:28
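
To illustrate the row-key point from the comments: in Bigtable, writes that share a row key are merged into a single row, with each write contributing columns to it. A minimal sketch with the cbt CLI, using hypothetical project, instance, table, column family, and row-key names:

    # Two writes to the same row key end up as one row with two columns.
    cbt -project my-project -instance my-instance set my-table user#001 cf1:col_a=from-file-1
    cbt -project my-project -instance my-instance set my-table user#001 cf1:col_b=from-file-2
    # Reading the row back shows a single row "user#001" holding both columns.
    cbt -project my-project -instance my-instance lookup my-table user#001

So ten input rows importing as five rows with two columns each suggests the two sequence files shared five row keys rather than containing ten distinct ones.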