Can CHEF have different databags in different datacenters

Question

Chef 12 has the notion of organizations, and environments.

How would you logically model different data centers?

For example, suppose you have two data centers, each with 2 environments.

us-datacenter  
 \_ production
 \_ stage

eu-datacenter  
 \_ production
 \_ stage

Each environment has web servers that need to point at different databases

us-prod-web01 => us-prod-db01
eu-prod-web01 => eu-prod-db01

us-stage-web01 => us-stage-db01
eu-stage-web01 => eu-stage-db01

The obvious answer would be to create nested databags which contain the IP addresses of the correct databases.

$datacenter/$environment/web
$datacenter/$environment/database

However, databags have a strict 2 level hierarchy, and there is no concept of a 'datacenter' that I can find.

How could this best be modeled? The two approaches I can think of are:

Use 2 level databags

$environment/us-web
$environment/us-database
$environment/eu-web
$environment/eu-database

This has the downside of putting a lot of databags in one directory. Even if you use the GUI with hosted Chef, that's still a lot of databags to scroll through. (8 data centers * 4 environments * (6 types of webservers + 2 types of database servers) = 256 entries, which is a lot.)

Put a data center variable in the role / node attributes and write a helper method to look up the correct databag.

This was suggested on IRC, however writing a helper method seems very complex (being new to chef, and having never done anything like this). Also it seems like it is a lot of reinventing the wheel for a use case that should be very common. For example, this can be done simply in puppet+hiera
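For what it's worth, such a helper can be quite small. Below is a minimal sketch in plain Ruby, assuming the "$environment/$datacenter-$type" layout from the first approach and a node attribute named `datacenter` (both the attribute name and the `DatacenterBag` module are hypothetical, not something Chef provides). It only computes which bag and item to read; the actual read would go through Chef's `data_bag_item`.

```ruby
# Hypothetical helper: works out which data bag and item a node should read.
# Layout assumed: bag = environment, item id = "<datacenter>-<type>".
module DatacenterBag
  # Returns [bag_name, item_id] for the given environment, datacenter and type.
  def self.coordinates(environment, datacenter, type)
    [environment, "#{datacenter}-#{type}"]
  end
end

# In a recipe this might be used roughly as follows
# (node['datacenter'] is an assumed attribute set by a role or on the node):
#   bag, item = DatacenterBag.coordinates(node.chef_environment, node['datacenter'], 'database')
#   db = data_bag_item(bag, item)

p DatacenterBag.coordinates('production', 'us', 'database')
```

Keeping the naming rule in one method means a later change to the layout only touches one place.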

Surely I'm not the first person using chef to have different settings in different datacenters.

score 1 · Answer 1 · answered Jun 23 '15 at 04:45

If you're at the level of planning separate datacenters, you're likely to run into two issues:

One chef installation being too slow to support everything - you'll probably end up with one per DC.
A lot of databags for various purposes. At that point the GUI and manual editing stop being an issue. It will be easier to just synchronise the whole repository on every change instead of dealing with single files or doing anything in the GUI.

What you may find useful (if you have similar services in each DC) is to use full names for each environment - that is stuff like "eu-test", "eu-prod", etc. and invert the hierarchy, so you end up with

/ database
  - eu-stage
  - eu-prod
  - us-prod
/ web
  ...

and access /service/$environment.
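A rough sketch of that lookup, in plain Ruby so it runs without a Chef server: the `DATA_BAGS` hash and the `lookup` method are hypothetical stand-ins for real data bags, the `host` key is an assumed item field, and the host names are the ones from the question. In an actual recipe the equivalent call would be `data_bag_item('database', node.chef_environment)`.

```ruby
# In-memory stand-in for data bags: bag name -> item id -> item.
# One bag per service, one item per full environment name.
DATA_BAGS = {
  'database' => {
    'us-prod'  => { 'id' => 'us-prod',  'host' => 'us-prod-db01' },
    'eu-prod'  => { 'id' => 'eu-prod',  'host' => 'eu-prod-db01' },
    'us-stage' => { 'id' => 'us-stage', 'host' => 'us-stage-db01' },
    'eu-stage' => { 'id' => 'eu-stage', 'host' => 'eu-stage-db01' }
  }
}.freeze

# Mimics the (bag, item) shape of Chef's data_bag_item lookup.
# Raises KeyError if the bag or the item does not exist.
def lookup(bag, item)
  DATA_BAGS.fetch(bag).fetch(item)
end

# A node whose environment is "eu-stage" would resolve its database like this:
puts lookup('database', 'eu-stage')['host']
```

Because the environment name already encodes the datacenter, no extra attribute or helper is needed to pick the right item.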

I believe that databags are also separate between organisations, so if your us and eu are completely different, then yes it could be useful. If they're similar, then there's probably no point in that kind of separation.

PS: If you're just starting with chef, don't call your service "web" - you're likely to get another webserver with a different role soon... Give it a more meaningful name. Same for "database".

score 0 · Answer 2 · answered Nov 10 '15 at 20:34

The solution I ended up implementing is documented on my blog

Basically I have 3 kinds of roles, and 2 environments. I apply multiple roles to a server based on its location and type.

service roles (roleA)
override roles (override-roleA)
datacenter roles (datacenter-us-west)
environments (prod, stage)
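To illustrate the idea of stacking roles, here is a simplified model in plain Ruby. It is only a sketch: the attribute names and values are hypothetical, the merge order is an assumption, and it does not reproduce Chef's full attribute precedence rules. In real role files the attributes would be declared with `default_attributes` or `override_attributes`.

```ruby
# Recursively merges two attribute hashes; values from `over` win on conflict.
def deep_merge(base, over)
  base.merge(over) do |_key, a, b|
    a.is_a?(Hash) && b.is_a?(Hash) ? deep_merge(a, b) : b
  end
end

# Hypothetical attributes contributed by each kind of role.
service_role    = { 'app' => { 'name' => 'roleA', 'log_level' => 'info' } }
datacenter_role = { 'datacenter' => 'us-west' }
override_role   = { 'app' => { 'log_level' => 'debug' } }

# Apply the layers in order: service, then datacenter, then override.
merged = [service_role, datacenter_role, override_role].reduce do |acc, layer|
  deep_merge(acc, layer)
end

p merged
```

The datacenter role contributes only location-specific settings, so the same service role can be reused unchanged in every datacenter.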

The hierarchy ends up looking like this:

Example role
