
We host a client’s Oracle VMs on our Nutanix platform. To date, whenever their VMs require more space they have us add an additional vDisk which they then add to the VG in order to expand the required LV. The reason they’re doing it this way is because they don’t know how to expand a disk and its partitions inside Linux without rebooting the OS.

Of course, it is entirely possible to expand a disk in Linux and grow the partition, LV and filesystem while the OS is running, and in my opinion this is the preferred method in terms of keeping things simple and linear. However, I don’t know enough about LVM on a pooled storage backend to justify this from a performance perspective.
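
For context, the online expansion I’m referring to looks roughly like this (a sketch only; the device, partition number and the rootvg/rootlv names are illustrative, and growpart comes from the cloud-utils package):

    # Tell the kernel the disk (already grown in the hypervisor) has a new size
    echo 1 > /sys/block/sda/device/rescan
    # Grow partition 3 in place, without a reboot
    growpart /dev/sda 3
    # Grow the PV, then the LV and its filesystem in one go
    pvresize /dev/sda3
    lvextend --extents +100%FREE --resizefs rootvg/rootlv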

So my question is:

How would multiple vDisks for a single LV impact I/O performance for an Oracle DB compared to using a single large vDisk on a virtualisation platform where storage from multiple physical disks is pooled together?

2 Answers


I would suggest you avoid using partitions, because partitions are a) awkward to resize, which can require a reboot, and b) liable to a geometry change when resized past a limit of about 500 GB. Multiple disks also cause confusion as to which guest disk corresponds to which disk in your hypervisor.

What I've tended to do is have a SYS disk and DATA disk(s). The SYS disk I treat like a regular disk (with some partitioning), but the DATA disk (say /dev/sdb) I leave unpartitioned and use directly as a Physical Volume (PV).

Let's say you have a directory /srv that is mounted as ext4 from /dev/mapper/DATA-srv (the LV 'srv' in the VG 'DATA').

If I want to add 100GB to that logical volume, I tend to do the following (with no reboot needed):

  1. Resize the underlying disk (sdb) in the hypervisor
  2. echo 1 > /sys/block/sdb/device/rescan to cause the kernel to rescan that SCSI device (the 'echo - - -' form is for host-wide scans via /sys/class/scsi_host/hostX/scan). (not sure if this is needed for devices such as /dev/vd*)
  3. dmesg | tail should show the kernel has picked up the capacity change.
  4. pvresize /dev/sdb will cause the PV to adjust its size automatically (it will report '1 physical volume(s) resized, 0 not resized' or similar).
  5. vgs DATA will now show that it has some free space.
  6. Resize the LV you want ('srv' in my example) using lvresize --extents +100%FREE --resizefs DATA/srv

(I typed all these from memory, so if I got that all correct, it's a testament to how repeatable the process is.)
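
Put together, the whole procedure is a short copy-paste sequence (a sketch under the same assumptions: /dev/sdb is the unpartitioned PV, the VG is 'DATA', the LV is 'srv'):

    # Disk already resized in the hypervisor
    echo 1 > /sys/block/sdb/device/rescan    # kernel picks up the new capacity
    dmesg | tail                             # confirm the capacity change
    pvresize /dev/sdb                        # grow the PV to fill the disk
    vgs DATA                                 # VFree should now show free space
    lvresize --extents +100%FREE --resizefs DATA/srv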

Note that I said --resizefs, which assumes the filesystem you're using is capable of online resizing (e.g. ext4 and XFS).
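
If you'd rather grow the filesystem as a separate step (or your lvresize predates --resizefs), the online-grow tools are filesystem-specific; a sketch for the two I mentioned:

    lvextend --extents +100%FREE DATA/srv   # grow the LV only
    resize2fs /dev/DATA/srv                 # ext4: operates on the device node
    xfs_growfs /srv                         # XFS: operates on the mount point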

I should say that this is perhaps not COMMON practice... but in my experience it was MUCH BETTER than what we had before with partitions. The problematic servicing tends to be when the SYS (partitioned) disk needs work, but in this design I told our engineers that should be a sign to create a DATA disk and refactor the storage (something that does tend to need an outage to cut over). When I specified this practice, I was trying to streamline our operations towards an experience closer to what you see in cloud services.

Beware: 'Best Practice' in this case is overly informed by physical kit, and in terms of VMs is well due for a rethink. It's perhaps a bit too early to know what 'Best Practice' is in these new paradigms; I would settle for 'Good Consistent Local Practice' that makes it easy to meet your servicing requirements.

I should also warn you that the disk may appear empty to the likes of 'fdisk'. Use 'lsblk' to see what storage is actually in use (and install it by default). This is why Oracle don't recommend using ASM on unpartitioned disks; although perhaps fdisk/parted is smart enough to recognise an unpartitioned disk that is a physical volume (I don't know myself).
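
For example, where fdisk -l reports no partition table on such a disk, any of these will show the PV signature (a sketch, assuming the bare PV is /dev/sdb):

    lsblk -f /dev/sdb          # FSTYPE column shows LVM2_member
    blkid /dev/sdb             # prints TYPE="LVM2_member"
    wipefs --no-act /dev/sdb   # lists on-disk signatures without modifying them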

With regard to performance, when we moved our Oracle Database workloads into VMs we had some discussion with our VMware admins around this. In this case, they had some dedicated storage for it (with other optimisations; I can't remember what, although I do recall they disabled snapshots). They were also keen that we didn't end up with a bunch of virtual SCSI devices on the same virtual bus; but I don't know to what extent that matters today.

In short, from a performance point of view, if Nutanix don't have a reference architecture for your version of Oracle DB and your version and configuration of the storage and virtual infrastructure, then you'd have to benchmark and compare. Tools such as bonnie++ may (still?) be useful. You should also care more about whether ASM is going to be used, or whether regular datafiles will be used.
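
These days I'd reach for fio rather than bonnie++; as a sketch, running something like the following against a test file on each candidate layout (one large vDisk vs. several striped vDisks) gives a first-order comparison. All parameters are illustrative; the 8k block size loosely matches a common Oracle block size:

    fio --name=oltp-sim --filename=/srv/fio.test --size=4g \
        --rw=randrw --rwmixread=70 --bs=8k \
        --ioengine=libaio --iodepth=32 --direct=1 \
        --runtime=120 --time_based --group_reporting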

Cameron Kerr
  • Thank you very much for the time you took to write that detailed answer Cameron. I should probably have mentioned that I indeed know how to resize a disk, partition, PV, LV and filesystem while the OS is running. It’s our client who doesn’t know how to do this. Each time they request more storage on their servers, we have to add an additional vDisk for them instead of increasing the size of an existing vDisk because they don’t know how to expand a disk inside Linux without rebooting the OS, which is of course easily done by just rescanning the SCSI bus. – Reginald Greyling Apr 09 '21 at 13:22
  • So this is why I asked my question, as I told them it’s possible to expand a disk inside Linux while the OS is running and I would help them do so, however, they asked me for a reason to expand an existing disk compared to adding an additional one. So apart from keeping things simple and linear, I was wondering whether this might impact IO performance. While waiting for an answer to this question, I actually reached out to Nutanix and asked for their recommendation. As it turns out, Nutanix, in fact, recommends more vDisks, and when it comes to Oracle, a minimum of 8 striped for one LV. – Reginald Greyling Apr 09 '21 at 13:26
  • One last comment with regard to partitioning disks for LVM: a lot of sysadmins still recommend partitioning the disk, mostly because of the management issues an unpartitioned PV can create, even though careless partitioning can result in poorly-aligned volumes/filesystems. Any other OS that looks at an unpartitioned PV will not recognize the LVM metadata and will display the disk as free, so it is likely it will be overwritten. – Reginald Greyling Apr 09 '21 at 13:40

How would multiple vDisks for a single LV impact I/O performance for an Oracle DB compared to using a single large vDisk on a virtualisation platform where storage from multiple physical disks is pooled together?

The performance bottlenecks and limitations that I can think of off-hand are:

  • The uplink to the storage network that connects the hypervisor to the storage back-end.
    The same uplink is used regardless of whether one or many vDisks are accessed, so no difference there.

  • Assigned IO limits.
    Most virtualisation stacks allow the platform to set quotas/limits on guests to prevent a single runaway VM from causing resource starvation for the other VMs running concurrently on the same hypervisor.
    If IO limits are assigned at the level of the guest, then using a single large vDisk or many smaller vDisks makes little difference.
    When IO limits are assigned per block device, each additional vDisk gives you additional IO quota and extra (potential) performance.

  • Maximum number of block devices.
    There is an upper limit to the number of volumes/vDisks that can be attached to a single VM, which imposes an upper limit on how far you can keep resizing by adding more vDisks.
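
For completeness, combining several vDisks into a single striped LV (along the lines of the Nutanix recommendation mentioned in the comments above) would look roughly like this; a sketch assuming four vDisks and hypothetical names:

    # Initialise each vDisk as an unpartitioned PV and pool them into one VG
    pvcreate /dev/sd{b,c,d,e}
    vgcreate DATA /dev/sd{b,c,d,e}
    # One LV striped across all four PVs; stripe size is workload-dependent
    lvcreate --name oradata --extents 100%FREE --stripes 4 --stripesize 64k DATA
    mkfs.xfs /dev/DATA/oradata

Note that, as I understand it, extending a striped LV later requires free extents on as many PVs as there are stripes, which is one practical answer to the question of when to add another vDisk rather than grow the existing ones.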

Bob
  • Thanks for the answer Herman. As I explained to @cameron, I reached out to Nutanix support to find out whether they have any Oracle recommendations, and they, in fact, recommend a minimum of 8 vDisks striped for an LV on which to store the database datafiles. But this then leaves the question, at what point would you look at adding another vDisk compared to expanding the existing ones? I was quite disappointed with this recommendation because it makes managing storage so much more complicated. – Reginald Greyling Apr 09 '21 at 13:44