
I'd like your advice on a setup I'm implementing to allow multiple hosts to share occasionally changing data on shared iSCSI storage. I'm using GFS2 to share access to an LVM2 logical volume on the iSCSI target, and I'd rather avoid the complexity of setting up a cluster with Corosync etc.

I've formatted the filesystem with locking set to lock_nolock and a single journal. A single node would be tasked with performing periodic updates, which typically means copying new files into the volume without changing existing ones, and all the other nodes would mount it with spectator,ro. According to the man page, a spectator mount does the following:

Mount this filesystem using a special form of read-only mount. The mount does not use one of the filesystem's journals. The node is unable to recover journals for other nodes.
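
For reference, this is roughly how I'm creating and mounting the filesystem; the device path and mount point below are placeholders, not my exact configuration:

# Create the filesystem once, with no cluster locking and a single journal:
mkfs.gfs2 -p lock_nolock -j 1 /dev/vg_shared/lv_shared

# On the single writer node:
mount -t gfs2 /dev/vg_shared/lv_shared /mnt/shared

# On every reader node:
mount -t gfs2 -o spectator,ro /dev/vg_shared/lv_shared /mnt/shared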

Can I reasonably expect this setup to be stable and performant? Any gotchas I should pay attention to?

Can I assume that attempting to mount R/W from multiple hosts will fail since the filesystem only has a single journal?

  • Sounds like NFS is a much easier option – eckes Apr 26 '18 at 16:22
  • It would be, but the storage array I obtained 2nd hand only supports iSCSI, ruling out NFS unfortunately. – Isac Casapu Apr 26 '18 at 16:28
  • OK, it depends on the network, but you could have one node mount a local filesystem on iSCSI and export it via NFS to all the others. – eckes Apr 26 '18 at 16:30
  • The shared storage needs to provide high-throughput access to a whole grid of Hadoop Spark / Yarn processing nodes. Doing what you suggested would create a serious bottleneck. I am in fact considering a setup like the one you propose for another shared volume with more modest bandwidth requirements. – Isac Casapu Apr 26 '18 at 16:35
  • Sounds like a job for replicated HDFS then. – eckes Apr 26 '18 at 16:36
  • The compute nodes I have at hand have very little internal storage, since they only take 2.5" disks, and fitting them with sufficient internal storage to store local copies of the data would be prohibitively expensive. Once we gain more experience with Hadoop and build up scale, we might consider something like HDFS in the future. – Isac Casapu Apr 26 '18 at 16:40
  • 1
    My advice: Don't avoid the complexity of a CRM and communication layer like corosync and pacemaker. I get that pacemaker is legitimately complicated, but not implementing fencing and resource management is going to cause way more problems and headaches than just biting the bullet and learning how to use pacemaker. – Spooler Apr 26 '18 at 16:51

1 Answer


I've implemented the setup above and it works fine, with one major limitation: the hosts that mount R/O have no way of knowing that the shared volume has changed. After performing updates from the host that has write access, I need to manually sync the filesystem and then force the reading clients to drop their cached dentries and inodes with a command like echo -n 2 | sudo -n /bin/dd of=/proc/sys/vm/drop_caches. Note that if existing file content might change, you need to write 3 instead of 2 to also drop the page cache.
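
In other words, after every update cycle I run roughly the following sequence; the mount point is a placeholder:

# On the R/W node, after copying new files in:
sync -f /mnt/shared

# On each R/O node (use 3 instead of 2 if existing file contents may have changed):
echo -n 2 | sudo -n /bin/dd of=/proc/sys/vm/drop_caches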

Another issue I sometimes encounter is that R/O clients fail to mount the shared storage with 'permission denied'. To resolve this I need to unmount the volume from the R/W node, mount it on any R/O nodes that experience the issue, and then mount it again on the R/W node.
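
Roughly, the workaround looks like this; device path and mount point are again placeholders:

# On the R/W node:
umount /mnt/shared

# On each R/O node that got 'permission denied':
mount -t gfs2 -o spectator,ro /dev/vg_shared/lv_shared /mnt/shared

# Back on the R/W node:
mount -t gfs2 /dev/vg_shared/lv_shared /mnt/shared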

Below is an Ansible role that accomplishes this:

---
# Resolve symlinks so we operate on the actual mount the data lives on
- name: Determine the canonical path of the shared-data directory
  set_fact:
    shared_dir_real_path: "{{ shared_dir_path | realpath }}"

- debug:
    msg: "Manually forcing flushing and re-read of directories on volume at {{ shared_dir_path }} (real path: {{ shared_dir_real_path }})."
    verbosity: 1

- name: Determine the shared-dir mount point
  command: "/usr/bin/env stat -c '%m' {{ shared_dir_real_path }}"
  register: shared_dir_mount_point
  changed_when: False

- name: Determine the mount point's filesystem type and mount options
  set_fact:
    "shared_dir_mount_{{ item }}": "{{ ansible_mounts | selectattr('mount', 'equalto', shared_dir_mount_point.stdout) | map(attribute=item) | join(',') }}"
  with_items:
    - fstype
    - options

- name: Verify the shared-dir is mounted as GFS2
  assert:
    that: "shared_dir_mount_fstype == 'gfs2'"

- name: Determine the access mode of the shared-data directory
  set_fact:
    shared_dir_access_flags: "{{ ['ro', 'rw'] | intersect(shared_dir_mount_options.split(',')) }}"

- name: Verify access-mode sanity
  assert:
    that: shared_dir_access_flags | length == 1

# Only the node mounted R/W syncs the filesystem
- name: Sync the shared filesystem
  command: "sudo -n /bin/sync -f {{ shared_dir_real_path }}"
  args:
    warn: false # Silence the warning about using sudo instead of 'become'; it is deliberate here
  when: "'rw' in shared_dir_access_flags"

# The nodes mounted R/O drop their cached dentries and inodes
- name: Force re-load of directory inodes
  shell: "echo -n 2 | sudo -n /bin/dd of=/proc/sys/vm/drop_caches"
  when: "'ro' in shared_dir_access_flags"