
I have a Samza job that I am trying to run on a YARN cluster using

./bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file:///home/anshu/samzaJob.properties

The job triggers and runs fine with this configuration.

After the job has started, I have some application-specific configuration (in the form of separate properties files) that I am trying to load with the Apache Commons Configuration library. For this, I created an appconfig folder and am trying to read all the files in that folder:

CONFIGURATION_FILE_PATH = System.getProperty("user.dir") + "/config/appconfig";

This works fine on my local box, but when the job runs on the YARN cluster, the path resolves to

/var/lib/hadoop-yarn/data/samza-yarn/usercache/anshu/appcache/application_1462311090906_0973/container_e19_1462311090906_0973_01_000003/config/appconfig

which is not correct.

How should I find the correct path to load the file from? Or is there any other way this can be done?

Ansh

1 Answer


Well, it looks like the way I was trying to do this was not correct.

It worked on my local box because the path given for the properties files was correct and the files actually resided there. On the YARN cluster, building an absolute path this way did not work: System.getProperty("user.dir") returns the working directory of the Samza container, and if the properties files are not at that location, loading fails.
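To see where a relative config path will actually land, it can help to resolve it against user.dir and log the result before trying to read anything. This is only a minimal sketch: the class name WorkingDirCheck and its methods are hypothetical helpers, not part of Samza or Commons Configuration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic helper (not a Samza API).
final class WorkingDirCheck {
    private WorkingDirCheck() {}

    // Resolves a relative path against the JVM working directory ("user.dir").
    // Inside a YARN container this is the container's own directory.
    static Path resolveAgainstWorkingDir(String relativePath) {
        return Paths.get(System.getProperty("user.dir"))
                .resolve(relativePath)
                .normalize();
    }

    // Returns the resolved path plus a note on whether the directory exists,
    // suitable for a log line at job startup.
    static String describe(String relativePath) {
        Path resolved = resolveAgainstWorkingDir(relativePath);
        return resolved + (Files.isDirectory(resolved) ? " (exists)" : " (missing)");
    }
}
```

Logging WorkingDirCheck.describe("config/appconfig") at startup would show the container path from the question and whether the folder is really there.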

A better approach is to put the files at a location that is on the classpath, so that they are available wherever the job runs, and load them using

ClassLoader loader = Thread.currentThread().getContextClassLoader();
InputStream resourceStream = loader.getResourceAsStream(propertiesFilePath);
FileConfiguration configuration = new PropertiesConfiguration();
configuration.load(resourceStream);
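One thing the snippet above does not handle is a missing resource: getResourceAsStream returns null in that case, which only surfaces later as a confusing error. Below is a minimal, self-contained sketch of the same classpath lookup with a null check. It swaps in java.util.Properties for Commons Configuration so that it runs without extra dependencies; the class name ClasspathConfigLoader is hypothetical.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical helper; uses java.util.Properties as a stand-in for
// Commons Configuration's PropertiesConfiguration.
final class ClasspathConfigLoader {
    private ClasspathConfigLoader() {}

    // Loads a properties file through the given class loader.
    // resourcePath is relative to the classpath root, e.g. "appconfig/app.properties".
    static Properties load(ClassLoader loader, String resourcePath) throws IOException {
        try (InputStream in = loader.getResourceAsStream(resourcePath)) {
            if (in == null) {
                // getResourceAsStream signals "not found" with null, not an exception.
                throw new FileNotFoundException("Not on classpath: " + resourcePath);
            }
            Properties props = new Properties();
            props.load(in);
            return props;
        }
    }

    // Convenience overload using the thread context class loader, falling back
    // to this class's own loader if no context loader is set.
    static Properties load(String resourcePath) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathConfigLoader.class.getClassLoader();
        }
        return load(loader, resourcePath);
    }
}
```

With the files packaged into the job's jar or lib directory, a call such as ClasspathConfigLoader.load("appconfig/app.properties") (an assumed resource name) would not depend on the container's working directory.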

Ansh