
I am creating a Data Fusion pipeline to load CSV data from GCS to BigQuery. For my use case, I need to create a property macro and provide its value at runtime. I need to understand how to pass the schema file as a macro to the BigQuery sink. If I simply pass the JSON schema file path as the macro value, I get the following error:

java.lang.IllegalArgumentException: Invalid schema: Use JsonReader.setLenient(true) to accept malformed JSON at line 1 column 1

Mustaquim

2 Answers


There is currently no way to use the contents of a file as a macro value, though there is a JIRA open for something like this (https://issues.cask.co/browse/CDAP-15424). The schema contents themselves are expected to be set as the macro value. The UI currently doesn't handle these types of macro values very well (https://issues.cask.co/browse/CDAP-15423), so I would suggest setting it through the REST endpoint (https://docs.cdap.io/cdap/6.0.0/en/reference-manual/http-restful-api/preferences.html#H2290), where the app name is the pipeline name.
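As a rough illustration, a minimal sketch of setting that preference with Java 11's built-in HTTP client might look like the following. The host, port, namespace, pipeline name ("gcs-to-bq"), and macro key ("schemaKey") are all placeholders for your deployment; the point is that the preference value is the schema JSON contents, not a path to the file.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Paths;

public class SetSchemaPreference {

  public static void main(String[] args) throws Exception {
    // The macro value must be the schema JSON itself, not a path to it.
    String schema = Files.readString(Paths.get("schema.json"));

    // Preferences are a JSON map of key -> value, so the schema string has
    // to be escaped to embed it as a JSON string value.
    String body = "{\"schemaKey\": " + quoteJson(schema) + "}";

    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create(
            "http://localhost:11015/v3/namespaces/default/apps/gcs-to-bq/preferences"))
        .header("Content-Type", "application/json")
        .PUT(HttpRequest.BodyPublishers.ofString(body))
        .build();

    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println("Preferences update returned " + response.statusCode());
  }

  // Quote and escape a raw string as a JSON string literal.
  private static String quoteJson(String s) {
    return '"' + s.replace("\\", "\\\\")
                  .replace("\"", "\\\"")
                  .replace("\n", "\\n")
                  .replace("\r", "\\r")
                  .replace("\t", "\\t") + '"';
  }
}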

Alternatively, you can make your pipeline a little more generic by writing an Action plugin that looks something like:

@Override
public void run(ActionContext context) throws Exception {
  // Read the schema JSON from wherever it lives; readFileContents() is left
  // to your implementation.
  String schema = readFileContents();
  // Publish the contents as a runtime argument for downstream stages.
  context.getArguments().setArgument(key, schema);
}

The plugin would be the first stage in your pipeline, allowing subsequent stages to use ${key} as a macro that is replaced with the actual schema at runtime.
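For reference, the substitution works because the sink declares the property as macro-enabled in its plugin config. A sketch of what such a config class looks like (class and field names here are illustrative, not the BigQuery sink's actual source):

import io.cdap.cdap.api.annotation.Description;
import io.cdap.cdap.api.annotation.Macro;
import io.cdap.cdap.api.annotation.Name;
import io.cdap.cdap.api.plugin.PluginConfig;

public class SinkConfig extends PluginConfig {
  @Name("schema")
  @Macro // permits setting this property to ${key}, resolved at runtime
  @Description("Schema of the records to write.")
  private String schema;
}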

Albert Shau

If you are using a BatchSink, you can read the runtime argument in prepareRun, for example:

@Override
public void prepareRun(BatchSinkContext context) {
  // Fail fast with a clear message if the argument was never set.
  String token =
      Objects.requireNonNull(
          context.getArguments().get("token"),
          "Argument Setter has failed in initializing the \"token\" argument.");
  HTTPSinkConfig.setToken(token);
}
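Applied to the schema use case in the question, the same pattern would look something like the sketch below, assuming the Action plugin from the first answer stored the schema contents under "key", and using CDAP's Schema.parseJson (io.cdap.cdap.api.data.schema.Schema) to parse it:

@Override
public void prepareRun(BatchSinkContext context) throws Exception {
  // The argument value is the schema JSON published by the Action plugin.
  String schemaJson =
      Objects.requireNonNull(
          context.getArguments().get("key"),
          "The \"key\" argument was not set by an earlier stage.");
  Schema schema = Schema.parseJson(schemaJson);
  // ... use the parsed schema to configure the sink's output ...
}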
Zahid Khan