SMAPIv3 - Feedback & Bug reports
tjkreidl Ambassador 📣
@olivierlambert said in SMAPIv3 - Feedback & Bug reports:
You need to get rid of SMAPIv1 concepts. If you meant "iSCSI block" support, the answer for right now is: no.
It's a brand new approach, so we'll take time to find the best one and avoid all the mess that SMAPIv1 had on block devices (non-thin, race conditions, etc.).
I think the next big "device type" support might be raw (passing a whole disk to the guest without any extra layer).
Ages ago (in the 1980s), I experimented with raw disk I/O on VAX systems using QIO calls. Yes, it's fast, but it also doesn't take bad blocks or deteriorating disk sectors into account. I can't recall offhand if there was a way to at least update bad block lists or if you had to start from scratch.
Are there better mechanisms these days to handle such things as read/write errors and re-allocation to good blocks if bad blocks are detected on a running system?
Andrew Top contributor 💪
@tjkreidl In days gone by, drives used to have a bad sector list printed on the case (SMD/MFM/RLL). It would also be stored on the drive for quick reference. When you formatted the drive, the software would use the bad sector list and then add to it during formatting tests. These sectors were "allocated" in the filesystem so they would not be used for normal storage. DOS and Unix support a hidden bad block list for this.
As time progressed, the controllers got smarter and bad sector avoidance moved from the OS to the controllers. The systems were able to map out bad blocks into spare sectors or tracks. As the controllers became integrated onto the drives (SCSI, IDE, etc.), the drives mapped out bad sectors automatically, hid this from the OS, and offered a contiguous range of good blocks to the OS. This is why systems have moved to LBA and don't use Head/Track/Sector.
So data block X is always data block X even if the drive moved it somewhere else; the OS does not know or care.
This contiguous whole disk range of good blocks exists today with flash storage and is automatically and dynamically handled by the flash controllers. As the flash blocks fail (or just get near failure) and get reallocated the spare block count decreases. When spare blocks reach 0 (zero, none) most flash drives force a read-only mode and the device has reached end of life. Hard drives also have a limited number of spare blocks. SMART tools can be used to check how healthy a drive is.
So today RAW drive/storage devices are not really raw but managed by the device and storage controller (flash, SATA, SAS, RAID, etc) to provide good blocks. I/O failure is very bad as it indicates a true unrecoverable failure and time to replace the drive.
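The remapping described above can be sketched as a toy model. This is only an illustration: the class name `ToyDrive`, its methods, and the block counts are made up for the example and do not correspond to any real drive firmware or controller API. It shows a logical block keeping its address while the physical block behind it changes, and the drive going read-only once the spares run out.

```python
class ToyDrive:
    """Toy model (hypothetical) of a drive hiding bad-block remapping behind LBAs."""

    def __init__(self, data_blocks, spare_blocks):
        # Logical block address -> physical block; starts as the identity mapping.
        self.lba_to_phys = {lba: lba for lba in range(data_blocks)}
        # Physical blocks held in reserve, numbered after the data blocks.
        self.spares = list(range(data_blocks, data_blocks + spare_blocks))
        self.store = {}  # physical block -> data
        self.read_only = False

    def write(self, lba, data):
        if self.read_only:
            raise IOError("drive is read-only: no spare blocks left")
        self.store[self.lba_to_phys[lba]] = data

    def read(self, lba):
        return self.store.get(self.lba_to_phys[lba])

    def retire(self, lba):
        """Treat the physical block behind `lba` as failing and remap it to a spare."""
        if not self.spares:
            self.read_only = True
            raise IOError("no spare blocks left to remap to")
        old = self.lba_to_phys[lba]
        new = self.spares.pop(0)
        if old in self.store:
            # Move the data; the caller keeps using the same LBA.
            self.store[new] = self.store.pop(old)
        self.lba_to_phys[lba] = new
        if not self.spares:
            # Spare count reached zero: this toy drive forces read-only mode.
            self.read_only = True


drive = ToyDrive(data_blocks=4, spare_blocks=1)
drive.write(2, b"hello")
drive.retire(2)                 # block 2 now lives on the single spare
print(drive.read(2))            # same LBA, same data
print(drive.lba_to_phys[2])     # physical location has changed
print(drive.read_only)          # last spare used, so writes are refused
```

From the OS side, only `read` and `write` with an LBA are visible; the mapping table and spare pool stay internal to the device, which is the point being made about "raw" storage not being truly raw.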
tjkreidl Ambassador 📣
@Andrew Thank you for that, much appreciated. Although I was aware of this process for SSD drives, I did not know that spinning disks had become that much smarter in the interim (~40 years!). But in any case, raw drives are very powerful if you have decent code to access them and the overhead can be appreciably less than with formatted drives.
@olivierlambert Hi. I'm also eager to see how the new v3 is progressing. From my company's point of view, being able to compact VDIs using guest trim/unmap is very valuable, as it minimises storage space usage and improves backup/restore speeds.
A big blog post is coming soon. I need to check with @matiasvl about trim passing via raw tapdisk datapath.
Please let us know when we can test that new zfs-ng!
SMAPI v3 looks very exciting. Unfortunately, tapdisk is still at the bottom, and it has one, but very serious, limitation: no IO/bandwidth limit ;(
It's not obvious/100% sure that tapdisk is the bottleneck
Hmm, if we created a volume plugin that combines Linux cgroups (IOPS/bandwidth limit) + a filesystem (ZFS block device - zvol), that would be one possible workaround no matter what's at the bottom.
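For reference, a minimal sketch of what the cgroup side of that idea could look like, assuming cgroup v2 with the `io` controller enabled. The helper names, the cgroup directory, and the `8:16` device number are placeholders for illustration; this is not an existing SMAPIv3 plugin, and the thread does not establish whether such a limit would be effective on the Xen storage datapath.

```python
from pathlib import Path


def io_max_line(major, minor, rbps=None, wbps=None, riops=None, wiops=None):
    """Build one line for a cgroup v2 io.max file; None means 'max' (no limit)."""
    limits = {"rbps": rbps, "wbps": wbps, "riops": riops, "wiops": wiops}
    parts = [
        f"{key}={'max' if value is None else value}"
        for key, value in limits.items()
    ]
    return f"{major}:{minor} " + " ".join(parts)


def apply_io_limit(cgroup_dir, line):
    """Write the limit into <cgroup_dir>/io.max (needs root and an existing cgroup)."""
    Path(cgroup_dir, "io.max").write_text(line + "\n")


# Placeholder device 8:16, limited to 2 MiB/s reads and 120 write IOPS.
line = io_max_line(8, 16, rbps=2 * 1024 * 1024, wiops=120)
print(line)
# apply_io_limit("/sys/fs/cgroup/example-group", line)  # hypothetical cgroup path
```

The limit only affects processes placed in that cgroup, so any real plugin would also have to decide which process actually issues the I/O to the zvol.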
I think you are oversimplifying how the storage is working in Xen. It's not KVM.
See https://xcp-ng.org/blog/2022/07/27/grant-table-in-xen/ for more details.