Today, I want to take a look at over-provisioning in Proxmox: what over-provisioning is, which resources within Proxmox you can over-provision, and how to monitor the level of over-provisioning you're doing to make sure you're not pushing your system too far, or maybe to figure out that you have extra resources on your system that you could give to additional VMs. I also want to note that while this video is going to be about Proxmox, I'm going to stick to general concepts, so most of the ideas will be applicable to other hypervisors as well. So let's get into a really quick overview of what over-provisioning is. Over-provisioning is the term for assigning more resources to the VMs on your system than are actually available on the host. A quick example of this is CPU cores.
So for example, if I have eight VMs, each of which has four CPU cores assigned to it, and the physical host only has 16 CPU cores, there are more cores assigned to VMs than actually exist on the system, and thus I am over-provisioning resources. The whole theory of over-provisioning is built upon the idea that VMs spend a lot of their time at relatively low levels of load. So why allocate all the cores they could potentially need when you could buy fewer cores and have the VMs share the resources? Having these VMs share resources means you can buy significantly less hardware, saving you money, and also saving you rack space, power, and the other things hardware requires, and likely also saving you on other costs like licensing or the personnel needed to maintain additional servers. Now, the disadvantage of over-provisioning is that you need to do a little more planning and setup, because if your VMs are over-provisioned too much, performance can drop off, and latency and other issues can crop up that make your services not work as well as you want.
But because of the huge cost savings you can potentially get, it's typically worth it to look at your workload and see if setting up over-provisioning, or changing how much of it you do, can save you more money or make your systems easier to maintain. Now, I want to take a closer look at the assumption that over-provisioning is built upon: that VMs aren't using all the resources available to them at any given time. I've looked at my personal servers, as well as a lot of other people's home servers and small and medium business servers, and found this pattern to generally be true.
A lot of VMs are running while using far from all of their available resources. And instead of leaving those resources sitting unused in a running server, being able to consolidate those VMs onto less hardware and have fewer unused resources can be really nice. Now, the thing is, this isn't true for every workload. You can take a quick look at your VMs, or at your existing systems if you're moving to VMs, and figure out: is that what my workloads really look like? An easy way is to take a look at something like Task Manager on your system. On the system I'm currently using for screen recording, I can see I'm barely touching the CPU, for example. And if I take a look at some of my VMs within Proxmox, I can see that these VMs aren't using that much CPU either.
So it looks like there's actually quite a bit of room for me to over-provision. Now, one other question would be: why not just assign the VMs fewer resources to begin with? One reason I find is that it's really nice to have extra resources available for bursty workloads or when I need to change things, like having updates run faster when I need to run them, or being ready so that if a lot of people log in at a specific time, the VMs can handle the additional load without slowing down as much, because they can essentially use the extra free resources on the host. Now, there are some workloads that you probably don't want to over-provision. Some of those are things like compression or video encoding or other compute tasks, which essentially run as fast as the resources you give them.
So while you could over-provision them, they're just going to run slower, and in that case buying more hardware makes sense because you can get the job done faster. The other kind of workload that you probably don't want to over-provision, or at least not too much, is latency-critical tasks. For a lot of the servers I see used, for example things like DNS servers or Active Directory login servers, a little bit of extra response time won't be noticed by a user and won't cause a noticeable change. But some workloads are latency-critical.
And if those take too long, it can cause major issues. In those cases, I'd take a look at latency metrics and make sure you're not pushing the system too hard, or potentially even set that workload up on bare metal or give it pinned resources so you can be sure the latency won't shoot up too much. Are your resources like a lot of the other systems I've seen, where the average utilization is really low across your VMs or physical hosts? Let me know in the comments below.
I'm curious if my assumption is true for my viewers' workloads as well. Let's first take a look at over-provisioning CPU resources. One misconception I've heard online is that when you assign something like CPU cores to a VM, it effectively takes those cores from the CPU and pins them to that VM so they can't be used for anything else.
The misconception is that, for example, if I have an eight-core CPU, I can only assign eight cores across my different VMs. And generally, by default in most hypervisors, that is not true: the VMs will effectively share CPU resources. I have a video on CPU pinning if you want to see how to set things up so that specific VMs have access to specific cores and nothing else can use them. But in general, when a VM needs its CPU, when Task Manager inside it shows CPU being used, for example, it uses a CPU on the host.
But when a VM is not using its CPU, it's not using the host CPU, and that host CPU is free to run other tasks, like other VMs or whatever host tasks it needs to. And generally, hypervisors are pretty smart about bouncing work between CPUs and running your VMs on the optimal CPU cores so that everything performs at its best. Sometimes the kernel gets it wrong, but generally it's pretty good at this. Now, sometimes I hear a rule of thumb for how much you should over-provision your system, as in how many virtual CPU cores you can assign to your VMs for every real CPU core. One number I've heard around a bit is four to one: for every four virtual CPU cores across your VMs, you should have one physical core on your system.
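If you want to check where you currently stand, here's a minimal sketch, assuming a standard Proxmox node with the qm tool available, that adds up the vCPUs assigned across all VMs on the node and compares that to the host's thread count. The defaults it falls back to (one socket, one core) are assumptions on my part.

# Rough sketch: total vCPUs assigned to VMs on this node vs. host threads
total=0
for id in $(qm list | awk 'NR>1 {print $1}'); do
  cores=$(qm config "$id" | awk '/^cores:/ {print $2}')
  sockets=$(qm config "$id" | awk '/^sockets:/ {print $2}')
  total=$(( total + ${cores:-1} * ${sockets:-1} ))
done
echo "vCPUs assigned: $total"
echo "Host threads:   $(nproc)"

Dividing the first number by the second gives you your current over-provisioning ratio to compare against that four-to-one guideline.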
And actually, let's take a look at real resource usage right now and try to get an idea of how well that might work. Taking a look at the host for my personal home servers, what I like to do is set the graph to show the maximum over a longer period of time, like a week or a month. What the maximum view does is show me the peaks in that period rather than the average load. You want your system to still feel snappy and work well under its normal peaks, not be pushed to its limits under average load, because then when a slightly higher load comes along, it's going to struggle and slow everything down. One thing also to think about is whether your tasks are correlated when they hit their peak loads. For example, at the start of a business day people are logging in and using the network more, so your network monitoring VM is doing more, your file server is starting to do more along with a lot of login tasks, and maybe DNS has to handle a lot more traffic.
And if all of those are being pushed at the same time, you're going to get a bigger peak on the host, so you can't just assume their peaks are going to be randomly distributed. Taking a look at the host CPU on my system, I see it's relatively low, typically in the 12, 15, maybe 20% range most of the time.
I've had some higher usage periods in the past, but generally it appears pretty low. One other thing I want to note is that even if your system is at 50% usage, it doesn't mean you have as much headroom free as you're currently using. One example of why is hyperthreading. The kernel will first spread your VMs across one thread per physical core, and that will appear as 50% usage on your CPU. But when you then max out your CPU and start using the second thread on each physical core, you're going to get maybe 30% more performance instead of the 100% more you might expect from using the other half of your CPU.
But this is still a generally good indication that I'm not using too much CPU and have some headroom left. The other thing I want to take a look at is my different VMs and their CPU usage, just to get a feel for how much I can over-provision them.
One example I have here is my router VM with six CPUs assigned, and its peaks sit around 20%, occasionally spiking up to 50% or so. So it's clearly not using all of its CPUs and seems to be doing fine. As an example of some of the very low-load VMs I've seen, this is a WireGuard VM that isn't used too frequently; most of the time it's at less than 2%, and that's even with the maximum view.
So this VM is doing relatively little compared to a lot of the other ones. Looking across these different VMs, I can see how much they bounce around and what their average usage is. I think the general rule of thumb you see online, expect about four to one for virtual CPUs assigned to VMs versus real cores, makes sense, but it's built on the idea that you're going to see loads like this: relatively low, less than 20% usage most of the time. So I would take a look at your workloads, and if they already exist on physical machines, check what their average CPU usage is when they're being used. If they're sitting at about 25% load on average or less, four to one makes sense.
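One quick way to sanity-check this on the host itself, as a rough sketch, is to compare the load average against the number of CPU threads:

uptime    # 1-, 5- and 15-minute load averages for the host
nproc     # number of CPU threads the host has
# As a rough guide, a sustained load average well above the thread count
# means runnable tasks are queuing up and VMs are waiting for CPU time.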
If you take a look at your host and see that host CPU usage is starting to go up, or that things like the server's load average are also going up, that's probably a sign you're pushing your CPUs too hard. And especially if you start to see VM performance drop off, you're pushing it too hard: you either want to get a faster CPU, add more systems, or reduce the workload on your host. The next thing I want to take a look at when it comes to over-provisioning is memory.
Memory is often listed as one of the most limiting factors for the number of VMs you can run on a host. Typically you'll run into memory limitations well before CPU limitations. As I saw when looking at CPU, my per-VM CPU usage is often quite low, but those VMs still need quite a bit of memory.
And that means memory is going to be the limiting factor most of the time. Even though you can do a bit of over-provisioning and optimization of the memory on your host, you'll typically still want to get as much memory as you can if you want to run a lot of VMs. So let's talk a little bit about those tricks. The first thing I would start with, and kind of the simplest, is to limit the amount of memory each VM has assigned to it.
If you give a VM memory above and beyond what it really needs, the guest OS will often try to use as much of it as it can, because inside a single machine the assumption is that unused memory is wasted memory, so you might as well use it for caching or something like that. The problem is that inside a VM that's not really the case if you actually want to over-provision your memory, because there are potentially other VMs, or caching on the host, that could make better use of that RAM. So the easiest, simplest approach is to just reduce the amount of RAM assigned to a VM if it doesn't need it. Now, the more automatic way to handle this is the ballooning device within Proxmox. If I go under Hardware and then Memory, I can see the ballooning device option under Advanced, and it's checked by default.
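As a minimal sketch of the same settings from the command line (the VM ID and sizes here are just illustrative), you can set a maximum and a minimum memory amount per VM with qm, which is what the memory and ballooning options in the GUI map to:

# Give VM 100 up to 4 GiB, but let the balloon driver shrink it toward 1 GiB
qm set 100 --memory 4096 --balloon 1024
# Setting --balloon 0 disables the ballooning device entirely for that VM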
What this device will try to do is grab up the memory the VM isn't actively using and make it available to the host again. I can see it in use on some of my VMs if I go under Summary and take a look at memory usage: I've assigned this one a total of four gigabytes, but it's not quite using all of it. This VM has an interesting sawtooth pattern, and my WireGuard VM, for example, is only using about 870 megabytes out of its total two gigabytes. I have seen some issues with this, most notably on Windows.
If you want to use the ballooning device on Windows, you have to have the guest agent and the VirtIO drivers installed, which, unlike on a lot of Linux distros, typically aren't there by default. I've also, especially with older Windows versions, seen the balloon just chew up free memory, which can make Windows think it doesn't have free memory available, and then it starts swapping out instead of the balloon simply giving memory back, which really isn't optimal. It depends on your workload.
You might want to do some tweaking, but if you want to optimize your memory usage to the maximum, I would say make sure that ballooning device is enabled, then take a look at the summary and confirm the guest actually supports it and that less memory is being used. One other note: even without the ballooning device, a freshly started VM won't use much host memory by default, because it has no processes to hand memory out to, but as time goes on it will use more and more of its allocation. So on the VM level, assign less memory if the VM doesn't need it, and then use the ballooning device so it can dynamically hand memory back to the host when it doesn't need everything assigned to it. Now, there are also some optimizations you can do on the host itself to get the most out of your memory. Swap devices are kind of an interesting thing in Proxmox, and I might do a full video on them, but if you have a lot of VMs that you aren't actively using and that aren't super performance-critical, and you can't add more RAM to the system, swap might be worth looking into.
You want an SSD for swap, as mechanical storage is way too slow to get anything usable out of it, but I've found it possible to have a good number of VMs partially swapped out and still get a usable experience. If you're using something like a mini PC or a laptop-based system with soldered-in memory where more RAM isn't an option, swap might let you run more VMs and could be worth looking into to get the most out of your hardware. Now, what I like to do on my host is open the shell, and my favorite tool for this is dstat, and take a look at the paging columns right here.
That tells you how much data is actively being moved into and out of swap. So even though my system right now, if I look at free -h, has a total of 4 GB of swap used, that doesn't really worry me. What worries me is if swap has to be actively used in order to keep VMs running. If those paging numbers start to go up, you're likely going to have VM memory constantly going in and out of swap, which causes a huge performance drop-off. I'd be very careful about relying on swap on any performance-critical host, because if those VMs start getting pushed out to swap, performance can just tank, and that's going to be very bad.
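If you don't have dstat installed, a couple of standard tools give you roughly the same picture; a quick sketch:

free -h          # how much swap is allocated in total right now
vmstat 5         # watch the si/so columns: pages swapped in/out per second
dstat --page 5   # dstat's paging view, if the package is installed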
Overall, memory is probably one of the least over-provisionable resources for VMs within Proxmox, and I would really try to get more memory in your host or shrink your workload, but if you have to, limiting what the VMs are given and setting up swap might be worth it to get the most out of the memory you have. Now let's talk about over-provisioning storage. I'm going to split storage over-provisioning into two subcategories: disk I/O and bandwidth, and the amount of space used, as in gigabytes or terabytes on a drive. Starting off with disk bandwidth and throughput: a drive only has so much bandwidth available, and if you put multiple virtual disk images on a single disk, they have to compete for that limited amount.
But again, I'll pull out the PC I'm currently using for screen recording as an example: most of the time, when you're just running an OS, the disk isn't being pushed that hard, so you don't need all of the speed the SSD in your system likely has available. You can typically get away with running quite a few VMs on standard SSDs without any real issue, though this of course depends on your exact workload. I will note that's true for SSDs; for mechanical hard drives, you're likely not going to have a great experience running multiple VMs on one spinning disk, because they just can't handle that many IOPS, and if something wants to run an update, for example, that's probably going to hurt the performance of everything else and really slow down the system. Now let's dive into the Proxmox interface and the command line to get a better feel for which VMs are using the most disk I/O, how to figure out if you're pushing the disk I/O on your system too hard, and, if you are, which drives are the actual issue. For the overall question, am I limited by disk I/O on my system, I first like to take a look at the I/O delay on the host. This is a measurement of how much time the system spends waiting for disks to complete their work, and on my system right now I can see it's less than 1%, almost rounding down to zero at times, which means I/O is essentially not a limitation for what I'm currently doing on this server.
But if I look at the past month, I can see there were some periods where I was pushing my I/O very hard on this system and it was a limitation, because this blue line just shoots up. In recent times it looks like it's been fairly tame almost all of the time, so I/O is likely not a limitation on this system. If I'm in a situation like I was a month ago where the I/O delay was shooting up, what I like to do is open up the shell on the system and use tools like iostat -xm 1. This is part of the sysstat package, so apt install sysstat to get it on your system. What this does that the I/O wait metric doesn't is show you which disks you're actually spending your time waiting on. And as I can see right now, the percent utilization on all of my disks is essentially nothing. But as I/O operations pop up, it sometimes shoots up on a particular disk, and if I see a disk with continually high utilization, I'm probably limited by that disk. Then I'll take a look at the device name and try to figure out which file system and VMs are stored on that disk.
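As a quick sketch of what that looks like and the columns I'd keep an eye on (the device names are just examples):

apt install sysstat    # provides iostat
iostat -xm 1           # extended stats, megabytes, refreshed every second
# Columns worth watching per device (e.g. sda, nvme0n1):
#   %util             - how busy the disk is; sustained values near 100% mean saturation
#   r/s, w/s          - read and write IOPS
#   rMB/s, wMB/s      - throughput
#   r_await, w_await  - average wait per request in milliseconds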
One other tool that might come in handy to help figure out what's going on with I/O is htop, which in newer versions has a disk I/O view. If you use Tab to switch over to the I/O tab, you can see which VMs or other processes on the system are doing the most disk I/O, to find out which VMs might need to be optimized or moved to other storage or other hosts. In this example right now, it looks like VMs 131 and 118 are pushing the disk the hardest on my system in terms of bandwidth, but in both cases it's in the kilobytes, so it's almost a non-issue in terms of total bandwidth. I will note that while Proxmox does show bandwidth for VMs in the summary tab for disk I/O, bandwidth and disk load aren't completely correlated: a lot of sequential I/O, like a backup job, won't use that much of the drive's time, while a lot of random I/O isn't much bandwidth but requires a lot of the disk's time to process. The general rule of thumb is, if I start seeing I/O wait or disk utilization going up, I'm going to either want a faster disk, maybe moving from SATA to an NVMe or Optane drive, or add or tweak caching on the system.
If I'm using something like a ZFS RAID 10 style array, maybe try adding a few more mirror pairs, as that spreads the I/O across more disks, or perhaps just add more hosts to spread the I/O load across. Now let's take a look at over-provisioning the space used on disks. When you set up a virtual disk on a VM, there's a size parameter in the settings, and that parameter is the maximum amount of space the VM can use.
But typically, a VM won't be using all of the space on its disk, and if you do thin provisioning, that unused space stays free on the host, available for other VMs or anything else that needs storage on the system. The danger of doing this too aggressively is that if your VMs do go and use that space, you can run out of free space on the host, which will pause the VMs on the system and cause a lot of problems. You really don't want something like that to happen, as it can stop all of your VMs.
Using thick provisioning, where you allocate all of the space for that disk image up front so nothing else can use it even if the VM isn't using it, helps make sure you don't accidentally have VMs pause because the host ran out of space. I do want to note that even with thick provisioning you can still run out of space if you use snapshots, because snapshots use additional space to store previous states. So if you aren't careful and keep old snapshots around, you can still run into problems. What I like to do on a system that's been running for a while is occasionally take a look at the usage graph and get a feel for what my trends are.
On this system right now, it's effectively flat, so I'm not using or freeing up very much space. But if it were trending upward, I'd want to look into getting more disks so I have free space available when I need it in the future, and figure out the timeframe in which I'd need to get those disks in. But right now it looks flat.
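If you want to spot-check the same numbers from the shell, here's a rough sketch (the second command only applies if the storage is ZFS):

pvesm status                    # usage and free space for every configured storage
zfs list -o name,used,avail     # per-dataset usage and remaining space on ZFS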
Either way, I can kind of forget about this storage for probably another month if I'm not adding or removing many VMs. And if you have an alerting system, setting up alerts for when you get too low on space is a really good idea, because you don't want to be in a situation where you run out of space; it's an ugly scenario. Now let's take a look at configuring thin versus thick provisioning. Unfortunately, because Proxmox supports many different types of storage, each with its own settings, this can get kind of complicated for each storage type. Starting off with probably the simplest, ZFS: if I select the ZFS storage, I can enable or disable thin provisioning.
Now, I will note that this setting only affects disks created after you've changed it; it doesn't affect any of the existing disks on that storage. If you want to change existing disks, you can use the zfs set command shown on screen here, and that will change whether an existing disk is thickly or thinly provisioned. I do want to note that ZFS isn't doing traditional thick provisioning where it actually writes out all the data for a virtual disk on the drive; it's basically saying in metadata, make sure I always have this much free space available for this disk, and it reserves that amount of space.
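Since the exact command is only shown on screen in the video, here's a rough sketch of what it looks like; the pool and disk names are just examples:

# Thin-provision an existing VM disk by dropping its space reservation
zfs set refreservation=none rpool/data/vm-100-disk-0
# Go back to thick-style provisioning: reserve the volume's full size again
zfs set refreservation=auto rpool/data/vm-100-disk-0
# The GUI checkbox itself corresponds to the "sparse" flag on the zfspool entry
# in /etc/pve/storage.cfg and only applies to newly created disks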
In that sense it's technically not true thick provisioning, but it behaves like thick provisioning would. Now, when it comes to other types of storage, there are a lot of possibilities, because there's a huge list of storage types you can use in Proxmox. Starting with directory storage, these use either raw or qcow2 files, which in my experience are generally thinly provisioned and only use the minimal amount of space they need. LVM is thick-provisioned, LVM-thin is thinly provisioned, and NFS and SMB/CIFS, I believe, use the same raw and qcow2 files as directory storage, just on network storage, and are generally going to be thinly provisioned.
But I think this can depend on how the file system on the other end is handling some of those parameters. That covers most of the local storage you'd want to be using on Proxmox. There are a lot of different parameters when it comes to the other storage types. I believe Ceph is generally going to be thinly provisioned, and iSCSI is thickly provisioned from Proxmox's point of view, but you can set up thin provisioning on the SAN you're connecting to.
So this gets really complicated. Let me know in the comments below if you have a specific case you want me to look into and I'll try to get a feel for what's going on there. Now, beyond setting up thin provisioning, there are still more ways to get the most out of your storage within Proxmox. I want to start with linked clones. If I have a template of a VM within Proxmox, I can do a normal clone of it to create another VM to use.
This can be really nice if I want something like a Windows Server VM that already has all my applications and management set up on it, so it's a lot faster than doing a full install. But when I clone a template into a new VM, I have the option to do a linked clone. Essentially, what this does is, when I do the initial clone, it just links the new VM's disk to the template I cloned.
If I make any changes, it stores those separately. What this means is I can use very minimal storage by only storing the changes I've made from that template rather than a full copy. So if I do a quick one here, this clone uses effectively no space on disk right now, as I haven't made any changes. But as I start the VM up and begin using it, it'll start using more space as those changes get stored separately. Now, one disadvantage is that I can't just delete the template, because it's relied upon by the VMs that are linked clones of it. But this can be a good way to save some disk space if you have, for example, a lot of VMs that are fairly similar.
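From the command line, a linked clone looks roughly like this; the VM IDs and name are just examples, and the source has to be a template on storage that supports linked clones:

# Clone template 9000 into a new VM 120; omitting --full makes it a linked clone
qm clone 9000 120 --name web-test
# A full, independent copy instead would be: qm clone 9000 121 --full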
Now let's take a look at some ways I can use the file system to save a little more space on disk. Using ZFS, or other file systems and SAN solutions that support these features, I can set up compression. A lot of OS files compress reasonably well, so storing all of these VMs on compressed storage can save a moderate amount of space. And especially on slower storage, where your CPU can compress data much faster than your disk can store it, the extra space is almost free, and you can even gain some speed, since the CPU can compress the data faster than the disk could store it uncompressed. In Proxmox, this is enabled by default if you're using ZFS storage, but a lot of other options like LVM, ext4, XFS and others don't support it.
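For ZFS specifically, here's a quick sketch of checking what compression is doing and enabling it on a dataset; the pool and dataset names are examples:

zfs get compression,compressratio rpool/data    # current algorithm and achieved ratio
zfs set compression=lz4 rpool/data              # enable lightweight compression
# Only data written after enabling it gets compressed; existing blocks stay as-is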
If you're using something like a SAN solution for your storage, it can often support compression as well. One other technique to look at is deduplication. This can be done in ZFS as well as on a lot of SAN solutions, though I will note ZFS has some big caveats around performance with deduplication enabled and around making sure you have enough RAM available.
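If you're considering it on ZFS, one way to estimate whether dedup would even pay off before turning it on is a simulated run against an existing pool (the pool name is an example):

zdb -S rpool    # simulates dedup on the pool and prints an estimated dedup ratio
# Ratios close to 1.00x mean dedup would mostly cost RAM without saving much space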
But if you meet those requirements and have a lot of similar files, like OS files from multiple installs, you can save a lot of disk space by running deduplication instead of storing full copies of all that data. And if you do have a workload where it makes sense, the additional RAM that deduplication requires can easily be paid for by the disks you no longer need to store all of the data. One other component to take a very quick look at when it comes to over-provisioning is networking. In most networking setups in Proxmox and other hypervisors, you're going to be sharing a certain number of NICs on the system, which effectively have less bandwidth available than all of the VMs could use at the same time. Now, this often isn't an issue, because if you're limited by your internet bandwidth, for example, that's likely slower than the ports on your board anyway, so they won't be the limitation for your workload.
But if, for example, you have a lot of VMs that want to talk to a lot of physical clients on your network, be aware that you're going to be sharing the one-gigabit, ten-gigabit, or whatever-speed network ports you have between all of your VMs. If this is becoming a problem and you're running into issues from maxing out your network bandwidth, one idea to look at is setting up rate limiting on certain VMs if they're chewing up too much bandwidth. For example, if you have a low-priority, high-bandwidth task like a backup job, maybe limit it to half of your network speed when other VMs are competing for it, so the higher-priority tasks stay fast and aren't being crowded out by that low-priority, high-bandwidth task; I'll show a quick example of the command for this just below. CPU, memory, disk, and networking are the main resources that are shared within Proxmox and whose over-provisioning is managed within Proxmox.
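Here's that rate-limiting example as a minimal sketch; the VM ID, bridge, and the 50 MB/s figure are just illustrative, and the same limit is available in the GUI on the VM's network device:

# Cap VM 105's first network interface at roughly 50 MB/s
qm set 105 --net0 virtio,bridge=vmbr0,rate=50
# Note: --net0 replaces the whole NIC definition, so re-specify the model, bridge,
# and MAC address you already use if you want to keep them; rate is in MB/s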
You can give other resources to a VM, for example USB or PCIe passthrough, but I don't think of those as being over-provisioned, since you have to assign them to a specific VM and they can't be shared. There are some other examples, like GPUs or some network cards with SR-IOV, where the device essentially presents multiple virtual devices to the guests and those are shared, but that's typically managed by the device you're putting in, not within Proxmox. So I'm going to leave that out of the scope of this video, but let me know if you want me to take a look at SR-IOV for networking or graphics cards in the future and I'll work on videos covering those topics. How are you setting up over-provisioning on your Proxmox system, and how happy are you with how it's set up? Let me know in the comments below; I'm curious what your experiences with it are. Thanks for watching this video and I'm looking forward to reading your comments.
Also, I've been setting up memberships for this channel; there should be a Join button below if you want a little bit of behind-the-scenes content, and I'll try to post some of these videos a little early too if you want a sneak preview. Thanks for watching.