
I have an HDFS cluster with Active and Standby NameNodes. Sometimes when the cluster gets restarted, the NameNodes exchange roles: the Standby becomes Active, and vice versa.

I also have a NiFi flow with a PutParquet processor writing files to this HDFS cluster. The processor's Directory property is set to "hdfs://${namenode}/some/path", where the ${namenode} variable has a value like "first.namenode.host.com:8020".

Now, when the cluster gets restarted and the active NameNode changes to "second.namenode.host.com:8020", the configuration in NiFi is not updated, so the processor still tries to use the old NameNode address and an exception is thrown (I don't remember the exact error text, but I don't think it matters for this question).

And now the question: how can I track this event in NiFi, so that the PutParquet processor configuration is updated automatically when the HDFS configuration changes?

NiFi version is 1.6.0, HDFS version is 2.6.0-cdh5.8.3

megazlo

3 Answers


I haven't confirmed this, but I thought that with HA HDFS (Active and Standby NNs) you'd have the HA properties set in your *-site.xml files (probably core-site.xml) and would refer to the "cluster name", which the Hadoop client then resolves into a list of NameNodes that it tries to connect to. If that's the case, try the cluster name (see the core-site.xml file on the cluster) rather than a hardcoded NN address.
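
For reference, a minimal sketch of what those HA properties typically look like in hdfs-site.xml; the nameservice name "mycluster" is a placeholder, and the hostnames are the ones from the question:

    <!-- hdfs-site.xml: logical nameservice resolved by the Hadoop client -->
    <property>
      <name>dfs.nameservices</name>
      <value>mycluster</value>
    </property>
    <property>
      <name>dfs.ha.namenodes.mycluster</name>
      <value>nn1,nn2</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn1</name>
      <value>first.namenode.host.com:8020</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn2</name>
      <value>second.namenode.host.com:8020</value>
    </property>
    <property>
      <name>dfs.client.failover.proxy.provider.mycluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

With a configuration like this, the Directory property could be "hdfs://mycluster/some/path", and the client figures out which NameNode is currently active.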

mattyb

Two things that you could do:

  • If you know the IP addresses or hostnames of the two NameNodes, you can try this: route the failure relationship of PutParquet either to an UpdateAttribute processor that changes the directory value (if you're using NiFi Expression Language for the Directory property), or to another PutParquet processor whose Directory is configured with the standby NameNode. A rough sketch of the first option follows this list.
  • You could also use PutHDFS, but I'm not sure whether PutParquet offers better performance than PutHDFS.
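
A sketch of the first option, reusing the ${namenode} variable from the question (the standby address shown is only illustrative):

    PutParquet        Directory: hdfs://${namenode}/some/path
        failure  -> UpdateAttribute
    UpdateAttribute   sets attribute: namenode = second.namenode.host.com:8020
        success  -> back to PutParquet (the flowfile attribute now overrides the stale value on retry)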
Sivaprasanna Sethuraman
  • PutHDFS and PutParquet both have a Directory property, and I think both expect "hdfs://${namenode}/some/path" as the value of this property, so what are the differences between them? And what is the output format of PutHDFS? How do I force it to write Parquet files? I don't see any properties for that... – megazlo Sep 14 '18 at 13:23
  • You should be able to specify the directory as "/some/path" and it will be resolved against the root of the default file system, based on whatever is configured in core-site.xml. I think you only need the hdfs:// prefix if you are writing to a filesystem other than the default. – Bryan Bende Sep 14 '18 at 13:26

It seems I have solved my problem, and it turned out not to be a "problem" at all :) Here is the solution: httpfs error Operation category READ is not supported in state standby.

I didn't have to track the change of the active NameNode manually within NiFi; instead, I just had to configure my Hadoop client properly via core-site.xml so that it resolves the actual NameNode automatically.

So the solution is simply to set the property "fs.defaultFS" in core-site.xml to the value of the property "dfs.nameservices" from hdfs-site.xml (in my case, "fs.defaultFS" in core-site.xml pointed to the host of the currently active NameNode, "first.namenode.host.com:8020").
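
For example, assuming the nameservice defined in hdfs-site.xml is named "mycluster" (a placeholder name), the change in core-site.xml looks roughly like this:

    <!-- core-site.xml: before (points at whichever NameNode happened to be active) -->
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://first.namenode.host.com:8020</value>
    </property>

    <!-- core-site.xml: after (points at the nameservice from dfs.nameservices) -->
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://mycluster</value>
    </property>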

I say "seems" because I have not tested this solution yet. But using this approach I can write to HDFS cluster without setting active hanemode address anywhere in NiFi. I just set it to use some "nameservice" rather then actual address, so I think if actual address changes - probably this does not affect NiFi, and Hadoop client handles this event.

Later I'm going to test it.

Thanks to @mattyb for the idea!

megazlo