I'm using the Data Fusion product on Google Cloud Platform.

Am I supposed to enter a regular expression in the Regex Path Filter field in the Advanced section of the GCS properties? For example: /[0-9]

However, when I enter a value in the Regex Path Filter and run the data pipeline, I get this message: "Output records have not been generated for stage GCS. Please verify your logic, or try sending more data."

I would appreciate an example of what to write in the Regex Path Filter section.

Thank you for reading.

Quack

1 Answer

There is currently an open issue in CDAP to update the documentation for the Regex Path Filter field, here.

According to this documentation, the Regex Path Filter is used only to filter files, and it does so with a regular expression.

For example, you can write gs://data_directory/*/file_prefix* to filter the files by file prefix, or gs://data_directory/.*\.csv to filter the files by extension. The Path field, by contrast, points to the GCS directory, such as gs://data_directory.
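To see why the shape of the filter matters, here is a minimal Python sketch. It assumes the filter has to match the entire object path, which is an assumption and not something the documentation above confirms. Python's `re.fullmatch` stands in for the plugin's matching, and the object paths and the `filter_paths` helper are hypothetical.

```python
import re

# Hypothetical object paths, not taken from a real bucket.
paths = [
    "gs://data_directory/report.csv",
    "gs://data_directory/notes.txt",
    "gs://data_directory/2020/report.csv",
]


def filter_paths(pattern, candidates):
    """Keep only the candidates that the pattern matches in full.

    Assumption: the filter is applied to the whole path, so a pattern
    that covers only the tail of the path matches nothing.
    """
    regex = re.compile(pattern)
    return [p for p in candidates if regex.fullmatch(p)]


# Pattern that spells out the full path, as in the example above.
csv_only = filter_paths(r"gs://data_directory/.*\.csv", paths)

# Pattern that covers only the tail of the path.
suffix_only = filter_paths(r"/.*\.txt", paths)

print(csv_only)
print(suffix_only)
```

Under that assumption, the full-path pattern keeps both CSV files, while the suffix-only pattern keeps nothing, which would be one possible reason for a source stage producing no output records.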

Kabilan Mohanraj
Alexandre Moraes
  • Can't I filter the data_directory? – Quack Oct 15 '20 at 01:00
  • To filter for txt files only, I set the filter value to gs://data_directory/.*\.txt. But I keep getting the message below. "Output records have not been generated for stage "GCS". Please verify your logic, or try sending more data." – Quack Oct 15 '20 at 06:05
  • Can you describe more about your process? Did you add the filter in the GCS Source Properties? – Alexandre Moraes Oct 16 '20 at 12:27
  • Thank you for your response. First, I entered `/.*\.txt` in the **Regex Path Filter** entry box of the GCS properties. The intention is to filter txt files only. But I don't think it's working well. **Path** is listed as `gs:data_directory/` . @Alexandre Moraes – Quack Oct 17 '20 at 12:12
  • I understand, so you are using the GCS connector as a source. Can you set **Path** to gs://data_directory ? Also, can you share the error from your failed pipeline? – Alexandre Moraes Oct 19 '20 at 08:12
  • No errors appear. However, when I create a data pipeline, I cannot see the values of Out and In. My hunch is that the data pipeline is not actually being carried out. – Quack Oct 20 '20 at 02:34
  • If I understood correctly, your pipeline executes successfully, right? Can you elaborate on your process? Also, is the output of the pipeline correct? – Alexandre Moraes Oct 20 '20 at 08:01