
I built a 3-host NiFi cluster. It's working, but I cannot put it behind a load balancer; all I can do is connect directly to a single host. Has anybody set up a NiFi cluster behind a load balancer? In particular, how do you handle the certificate issue?

ozw1z5rd

1 Answer


Are you trying to load balance the UI interaction or a specific processor/input source? With NiFi's Zero-Master Clustering (ZMC), available in 1.0.0+, you can connect to the UI of any connected node and monitor and modify the flow. If you are trying to load balance input data, I would suggest one of two approaches: have a single point-of-entry processor that runs on the primary node (see the excerpt below) and then distributes the data throughout the cluster, or, if you really need load balancing at the point of entry for performance, set up HAProxy or another front-end load balancer (even round-robin DNS) pointing at all the available nodes.
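A front-end balancer has to pick a node for every incoming request. The sketch below contrasts two common policies: plain round-robin, which spreads consecutive requests from one browser across different nodes, and source-IP affinity ("sticky" routing, e.g. HAProxy's `balance source`), which keeps a given client on one node. It only models the selection logic and is not balancer configuration; the node addresses are hypothetical. Whether the NiFi UI needs affinity depends on how your version handles login sessions, so treat that as something to verify (the round-robin policy was one of the problems reported in the comments below).

```python
import hashlib
from itertools import cycle

# Hypothetical node addresses; substitute your own NiFi hosts and HTTPS port.
NODES = ["nifi1.example.com:8443", "nifi2.example.com:8443", "nifi3.example.com:8443"]


def round_robin(nodes):
    """Return an endless iterator that hands out nodes in turn, ignoring the client."""
    return cycle(nodes)


def sticky(nodes, client_ip):
    """Map a client to a fixed node by hashing its source IP (source-IP affinity)."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


if __name__ == "__main__":
    rr = round_robin(NODES)
    # One browser making three requests lands on three different nodes.
    print([next(rr) for _ in range(3)])
    # With affinity, every request from the same client goes to the same node.
    print({sticky(NODES, "10.0.0.42") for _ in range(3)})
```

Hashing the source IP is the simplest form of affinity; cookie-based stickiness is an alternative when many clients share one IP behind a NAT.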

If you are trying to load balance work across the cluster, Remote Process Groups (site-to-site), which transmit data between nodes, automatically balance data across the available NiFi nodes.

From the NiFi Admin Guide:

Primary Node: Every cluster has one Primary Node. On this node, it is possible to run "Isolated Processors" (see below). ZooKeeper is used to automatically elect a Primary Node. If that node disconnects from the cluster for any reason, a new Primary Node will automatically be elected. Users can determine which node is currently elected as the Primary Node by looking at the Cluster Management page of the User Interface.

Isolated Processors: In a NiFi cluster, the same dataflow runs on all the nodes. As a result, every component in the flow runs on every node. However, there may be cases when the DFM would not want every processor to run on every node. The most common case is when using a processor that communicates with an external service using a protocol that does not scale well. For example, the GetSFTP processor pulls from a remote directory, and if the GetSFTP Processor runs on every node in the cluster and each node tries simultaneously to pull from the same remote directory, there could be race conditions. Therefore, the DFM could configure the GetSFTP on the Primary Node to run in isolation, meaning that it only runs on that node. It could pull in data and - with the proper dataflow configuration - load-balance it across the rest of the nodes in the cluster. Note that while this feature exists, it is also very common to simply use a standalone NiFi instance to pull data and feed it to the cluster. It just depends on the resources available and how the Administrator decides to configure the cluster.
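The GetSFTP example in the excerpt can be made concrete with a small simulation that counts remote fetches under the two layouts. This is a toy model rather than NiFi code: the node names, the file list, and the fixed primary are hypothetical (in a real cluster ZooKeeper elects the primary), and the round-robin hand-off stands in for whatever distribution mechanism the flow actually uses, such as a Remote Process Group.

```python
from itertools import cycle

# Hypothetical cluster members and remote files; "node1" is assumed to be
# the node ZooKeeper elected as primary.
NODES = ["node1", "node2", "node3"]
PRIMARY = "node1"
REMOTE_FILES = ["a.csv", "b.csv", "c.csv", "d.csv"]


def every_node_pulls(nodes, files):
    """Every node lists and fetches the same directory: each file is fetched once per node."""
    return [(node, f) for node in nodes for f in files]


def primary_pulls_then_distributes(nodes, primary, files):
    """Only the primary fetches; the files are then handed out round-robin across the cluster."""
    fetches = [(primary, f) for f in files]  # isolated processor: one fetch per file
    targets = cycle(nodes)
    # Downstream processing is spread across all nodes, including the primary.
    assignments = {f: next(targets) for _, f in fetches}
    return fetches, assignments


if __name__ == "__main__":
    print(len(every_node_pulls(NODES, REMOTE_FILES)))
    fetches, assignments = primary_pulls_then_distributes(NODES, PRIMARY, REMOTE_FILES)
    print(len(fetches), assignments)
```

With three nodes and four files, the first layout performs twelve fetches against the remote server while the second performs four and still spreads the work across the cluster.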

Andy
  • Hello Andy, I'm trying to load balance the UI, but there are some issues with the balancer. When a request goes through the balancer, I'm no longer able to log in. Also, the cluster is behind nifi.thiscompanydomain.com, and the certificate the browser sees depends on which host the balancer selects. Since each host has its own certificate, this is causing problems. – ozw1z5rd Oct 14 '16 at 08:00
  • Understood. I would need more details on the "unable to login" part - i.e. what authentication mechanism is in place, etc. - but the certificate issue may be easily resolvable -- can you add `nifi.thiscompanydomain.com` as a SAN entry in the NiFi certs? – Andy Oct 14 '16 at 16:45
  • Authentication is LDAP. I checked, and the balancer routes each connection to a different host, which is making a mess. As for SAN, yes, that could be a good solution. – ozw1z5rd Oct 17 '16 at 06:42
  • LDAP authentication should work out of the box -- ensure that the manager credentials are correct in `$NIFI_HOME/conf/login-identity-providers.xml` and that provider is referenced by `nifi.security.user.login.identity.provider=` in `$NIFI_HOME/conf/nifi.properties`. See [NiFi Admin Guide - User Authentication](https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#lightweight-directory-access-protocol-ldap). – Andy Oct 17 '16 at 17:12
  • Yes, LDAP authentication works. The problems were the certificate (thanks for the hint about the SAN field) and the round-robin policy in the balancer. – ozw1z5rd Oct 17 '16 at 19:24
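The SAN fix works because the browser checks the name it connected to (the balancer's name) against the certificate presented by whichever node answers. The sketch below models that check with a simplified matcher (exact names plus a single left-most `*` label, loosely following RFC 6125); it is not a complete TLS hostname verifier. The per-node host names are hypothetical; only `nifi.thiscompanydomain.com` comes from the thread. Each node's own name is kept in its SAN list as well, since the nodes are still addressed directly.

```python
BALANCER = "nifi.thiscompanydomain.com"


def hostname_matches(hostname, pattern):
    """Simplified check: exact match, or '*' standing for exactly one left-most label."""
    host_labels = hostname.lower().rstrip(".").split(".")
    pat_labels = pattern.lower().rstrip(".").split(".")
    if len(host_labels) != len(pat_labels):
        return False
    if pat_labels[0] == "*":
        return host_labels[1:] == pat_labels[1:]
    return host_labels == pat_labels


def cert_covers(hostname, san_dns_names):
    """True if any DNS entry in the certificate's SAN list matches the hostname."""
    return any(hostname_matches(hostname, name) for name in san_dns_names)


def uncovered(certs, name):
    """Nodes whose certificate would fail the browser's check for `name`."""
    return sorted(node for node, sans in certs.items() if not cert_covers(name, sans))


# Hypothetical node certificates: first with only their own names, then with
# the balancer name added as an extra SAN entry.
BEFORE = {f"nifi{i}.thiscompanydomain.com": [f"nifi{i}.thiscompanydomain.com"] for i in (1, 2, 3)}
AFTER = {node: sans + [BALANCER] for node, sans in BEFORE.items()}

if __name__ == "__main__":
    print(uncovered(BEFORE, BALANCER))  # every node fails for the balancer name
    print(uncovered(AFTER, BALANCER))   # no node fails once the SAN entry is present
```

Adding the entry means reissuing each node's certificate; once every certificate lists the balancer name, it no longer matters which node the balancer picks.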