Good morning all! Hope the Xen winter meetup was great for those who joined!
We migrated from XCP-ng 8.2.1 to 8.3 this week. On 8.2.1 we had everything working like a charm, with no more CBT issues. On 8.3, however, we seem to be running into some frustrating problems whose root cause I can't really put a finger on. So I think it's important to share them with you all, so we can see where it goes wrong and how we could fix them.
The first one is related to NBD/XAPI. What I found is that in some rare cases one of a VM's VDIs gets stuck attached to the NBD host. I had 3 overnight after restarting the dedicated hosts, but now I see new ones on different servers. We have set the NBD limit to 1 connection, but the strange thing is that in one case I could see 4 NBD connections (or 4 VDIs connected, I'm not sure which): 3 were disconnected and 1 was left. Could it be that the software sometimes does not respect the limit and sets up multiple connections?
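For anyone who wants to check their own pool for the same thing, here is a minimal Python sketch that lists the VDIs currently attached to each control domain. It assumes a stuck VDI shows up as a VBD on the host's control domain (dom0), which I have not confirmed is the case for every stuck NBD attachment. The function names are my own; only the `xe vm-list` and `xe vbd-list` calls are real CLI commands. It has to run somewhere the `xe` CLI is available, for example on the pool master.

```python
import subprocess
from typing import Callable, Dict, List


def run_xe(args: List[str]) -> str:
    """Run the xe CLI with the given arguments and return its stdout, stripped."""
    completed = subprocess.run(
        ["xe", *args], check=True, capture_output=True, text=True
    )
    return completed.stdout.strip()


def split_minimal(output: str) -> List[str]:
    """Split the comma-separated output of `xe ... --minimal` into a list.

    Empty output means no results. Placeholder values such as
    "<not in database>" are dropped.
    """
    items = [item.strip() for item in output.split(",")]
    return [item for item in items if item and not item.startswith("<")]


def vdis_attached_to_control_domains(
    run: Callable[[List[str]], str] = run_xe,
) -> Dict[str, List[str]]:
    """Map each control domain UUID to the VDI UUIDs it has VBDs for.

    Control domains without any VBDs are left out of the result.
    """
    result: Dict[str, List[str]] = {}
    dom0_uuids = split_minimal(
        run(["vm-list", "is-control-domain=true", "params=uuid", "--minimal"])
    )
    for dom0 in dom0_uuids:
        vdis = split_minimal(
            run(["vbd-list", f"vm-uuid={dom0}", "params=vdi-uuid", "--minimal"])
        )
        if vdis:
            result[dom0] = vdis
    return result


if __name__ == "__main__":
    for dom0, vdis in vdis_attached_to_control_domains().items():
        print(f"control domain {dom0}: {len(vdis)} VDI(s) attached: {', '.join(vdis)}")
```

A VDI that keeps appearing in this list while no backup job is running would be a candidate for the stuck state described above. The `run` parameter is only there so the logic can be tried without a real host.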
The second issue I found is that CBT is sometimes broken on a VM. In that case you can run a full backup over and over again, but it will not work until you disable CBT and enable it again, which somehow forces a new CBT chain to be set up. Otherwise, for some reason, it remains broken. Would it be an option to let the software handle this automatically? Be aware that I found some cases where disabling CBT on a running VM caused a map_duplicate_key error. This only happened when I disabled CBT manually on that VDI; when I did an online storage migration (which does the same thing, if I am correct) it worked without this issue. I'm not sure what the difference is, but if the same error occurs when the software does it automatically, it could cause your VM to be destroyed ;-).
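The manual disable/enable cycle described above can be sketched in Python roughly like this. It is only an illustration of the workaround, not how the backup software does or should do it. The `reset_cbt` helper is a hypothetical name of mine; the `xe vdi-disable-cbt`, `xe vdi-enable-cbt` and `xe vdi-param-get` commands are the standard CLI calls. Given the map_duplicate_key error mentioned above, the sketch deliberately stops as soon as the disable step fails and never tries to enable CBT afterwards.

```python
import subprocess
from typing import Callable, List


def run_xe(args: List[str]) -> str:
    """Run the xe CLI and return stdout, stripped.

    Raises subprocess.CalledProcessError if xe exits with an error.
    """
    completed = subprocess.run(
        ["xe", *args], check=True, capture_output=True, text=True
    )
    return completed.stdout.strip()


def reset_cbt(vdi_uuid: str, run: Callable[[List[str]], str] = run_xe) -> bool:
    """Disable and re-enable CBT on one VDI, then report whether CBT is on.

    Any error from the disable step propagates to the caller, so the
    enable step is not attempted on a VDI left in an unknown state.
    """
    run(["vdi-disable-cbt", f"uuid={vdi_uuid}"])
    run(["vdi-enable-cbt", f"uuid={vdi_uuid}"])
    state = run(["vdi-param-get", f"uuid={vdi_uuid}", "param-name=cbt-enabled"])
    return state == "true"


if __name__ == "__main__":
    import sys

    # Usage: python reset_cbt.py <vdi-uuid>
    if len(sys.argv) != 2:
        sys.exit("usage: reset_cbt.py <vdi-uuid>")
    enabled = reset_cbt(sys.argv[1])
    print("CBT enabled" if enabled else "CBT still disabled")
```

After a reset like this the next backup run would presumably be a full one, since the old CBT chain is gone. I would try it on a test VM first because of the error described above.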
I hope someone can point me in the right direction on these subjects! I will also open a ticket with support to investigate, but I wanted to share my new experiences here. Is anyone else seeing the same issues?