
XCP-ng 8.3 freezes and/or reboots on HGR-SAP3 by OVH / using ZFS on RAID 50 / with deduplication

yann_m:

Sorry, the title:

XCP-ng 8.3 freezes and/or reboots on HGR-SAP3 by OVH

olivierlambert (Vates 🪐 Co-Founder CEO):

        Hi,

You need to add more memory to your Dom0.

yann_m @olivierlambert:

          @olivierlambert

Thanks.

Is it just a lack of memory in my Dom0?
What memory size do you recommend?

          This is my production environment, so I'll have to wait before I can do that.

olivierlambert (Vates 🪐 Co-Founder CEO):

Because ZFS eats a lot of RAM, go with 32 GiB in the Dom0.
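
For reference, a minimal sketch of how Dom0 memory is usually raised on XCP-ng, using the stock xen-cmdline helper (the 32 GiB value follows the advice above; a host reboot is required for it to take effect):

    # Set Dom0 memory to 32 GiB (applies at the next host reboot)
    /opt/xensource/libexec/xen-cmdline --set-xen dom0_mem=32768M,max:32768M

    # Check the resulting Xen boot parameters
    /opt/xensource/libexec/xen-cmdline --list-xen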

yann_m @olivierlambert:

@olivierlambert
Thanks a lot.

I'll test it as soon as I can restart my host.
I've read a lot of messages saying that putting too much memory in the Dom0 is counterproductive.
              I'll keep you posted.

olivierlambert (Vates 🪐 Co-Founder CEO):

                It is, if you don't use ZFS. ZFS is a memory hog.
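
If the ZFS ARC is what eats the memory, it can also be capped explicitly rather than left to grow. A minimal sketch, assuming ZFS on Linux inside Dom0 (the 16 GiB cap is purely illustrative, not a recommendation):

    # /etc/modprobe.d/zfs.conf -- cap the ZFS ARC at 16 GiB (value in bytes)
    options zfs zfs_arc_max=17179869184

    # Or apply the same cap at runtime, without a reboot
    echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_max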

yann_m:

Last night, I gave the Dom0 32 GiB.

For information, since the restart it's been consuming approximately 17 GiB.

I'll see whether the increased load expected this weekend and the next makes it crash.
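
To check where that memory actually goes, a rough sketch (assuming the ZFS on Linux kstats are exposed in Dom0):

    # Overall Dom0 memory usage
    free -g

    # Current ARC size, read from the kernel stats
    awk '/^size / {printf "ARC: %.1f GiB\n", $3 / 2^30}' /proc/spl/kstat/zfs/arcstats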

olivierlambert (Vates 🪐 Co-Founder CEO):

Great, keep us posted 🙂

yann_m:

                      Unfortunately, I'm really pessimistic about my technological choice.

Maybe I should never have installed ZFS on it, or perhaps I should never have enabled deduplication?

I didn't have time to post this before, but Wednesday morning, after some major backups, I lost all my VMs for 3-5 minutes (unreachable, but no crash) due to overactivity on tapdisk.

Luckily, they all came back afterwards.

So I haven't lost anything except my sleep.

Here's an example from one of my 14 logs (one per VM); I've anonymized the UUIDs.

                      Sep 24 04:02:03: [04:02:03.958] tapdisk[728741]: log start, level 0
                      Sep 24 05:27:00: [05:27:00.943] tapdisk_vbd_check_progress: vhd:/var/run/sr-mount/myZFSraid50/diskVM1.vhd: watchdog timeout: pending requests idle for 624 seconds
                      Sep 24 05:27:00: [05:27:00.943] tapdisk_vbd_debug: vhd:/var/run/sr-mount/myZFSraid50/diskVM1.vhd: state: 0x00000000, new: 0x00, pending: 0x06, failed: 0x00, completed: 0x00, last activity: 1758683796.081829, errors: 0x0000, retries: 0x00c2, received: 0x000f9b42, returned: 0x000f9b3c, kicked: 0x0439e111
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: /var/run/sr-mount/myZFSraid50/diskVM1.vhd: QUEUED: 0x000871bb, COMPLETED: 0x000871b5, RETURNED: 0x000871b4
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: WRITES: 0x000437db, AVG_WRITE_SIZE: 28.221727
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: READS: 0x000439e0, AVG_READ_SIZE: 23.678925
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: ALLOCATED REQUESTS: (352 total)
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 92: vreq: xenvbd-14-768.67.0, err: 0, op: 2, lsec: 0x02d74e6f, flags: 0, this: 0x1b02ed0, next: (nil), tx: (nil)
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 150: vreq: xenvbd-14-768.67.0, err: 0, op: 2, lsec: 0x01b7f720, flags: 0, this: 0x1b07580, next: (nil), tx: (nil)
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 190: vreq: xenvbd-14-768.67.0, err: 0, op: 2, lsec: 0x01879d78, flags: 0, this: 0x1b0a640, next: (nil), tx: (nil)
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 220: vreq: xenvbd-14-768.67.0, err: 0, op: 2, lsec: 0x02d74ec7, flags: 0, this: 0x1b0cad0, next: (nil), tx: (nil)
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 235: vreq: xenvbd-14-768.67.0, err: 0, op: 2, lsec: 0x02d74ee7, flags: 0, this: 0x1b0dd18, next: (nil), tx: (nil)
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 259: vreq: xenvbd-14-768.67.0, err: 0, op: 2, lsec: 0x18f628e2, flags: 10, this: 0x1b0fa58, next: (nil), tx: 0x1af8a68
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: BITMAP CACHE:
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 0: blk: 0x2d74, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1afb228, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 1: blk: 0x34f2b, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1afb080, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 2: blk: 0x1bdb, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1afaed8, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 3: blk: 0xd237, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1afad30, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 4: blk: 0x072f, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1afab88, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 5: blk: 0x0740, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1afa9e0, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 6: blk: 0x8a08, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1afa838, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 7: blk: 0x0773, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1afa690, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 8: blk: 0x3ee8, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1afa4e8, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 9: blk: 0x1b7f, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1afa340, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 10: blk: 0x070e, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1afa198, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 11: blk: 0x11512, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1af9ff0, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 12: blk: 0x26ca, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1af9e48, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 13: blk: 0xe6f3, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1af9ca0, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 14: blk: 0x0fc2, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1af9af8, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 15: blk: 0x1c900, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1af9950, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 16: blk: 0x96ac, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1af97a8, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 17: blk: 0x0790, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1af9600, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 18: blk: 0x0734, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1af9458, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 19: blk: 0x0752, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1af92b0, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 20: blk: 0x0732, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1af9108, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 21: blk: 0x1c1bd, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1af8f60, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 22: blk: 0xfc4c, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1af8db8, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 23: blk: 0x1d0d, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1af8c10, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 24: blk: 0x18f62, status: 0x0000000a, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 8, in use: 1, tx: 0x1af8a68, tx_error: 0, started: 1, finished: 1, status: 1, reqs: 0x1b0fa58, nreqs: 1
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 25: blk: 0x1960b, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1af88c0, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 26: blk: 0x134b, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1af8718, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 27: blk: 0x1bbb, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1af8570, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 28: blk: 0x1879, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1af83c8, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 29: blk: 0x076a, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1af8220, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 30: blk: 0xb73b, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1af8078, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: 31: blk: 0x0731, status: 0x00000000, q: (nil), qnum: 0, w: (nil), wnum: 0, locked: 0, in use: 0, tx: 0x1af7ed0, tx_error: 0, started: 0, finished: 0, status: 0, reqs: (nil), nreqs: 0
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: BAT: status: 0x00000000, pbw_blk: 0x0000, pbw_off: 0x00000000, tx: (nil)
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: /run/sr-mount/myZFSraid50/disk_an_other_VM.vhd: QUEUED: 0x00099bd9, COMPLETED: 0x00099bd9, RETURNED: 0x00099bd9
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: WRITES: 0x00000000, AVG_WRITE_SIZE: 0.000000
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: READS: 0x00099bd9, AVG_READ_SIZE: 23.188253
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: ALLOCATED REQUESTS: (352 total)
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: BITMAP CACHE:
                      Sep 24 05:27:00: [05:27:00.943] vhd_debug: BAT: status: 0x00000000, pbw_blk: 0x0000, pbw_off: 0x00000000, tx: (nil)
                      Sep 24 05:27:01: [05:27:01.910] watchdog_cleared: vhd:/var/run/sr-mount/myZFSraid50/diskVM1.vhd: watchdog timeout: requests were blocked
                      Sep 24 06:03:03: [06:03:03.461] tapdisk_vbd_check_progress: vhd:/var/run/sr-mount/myZFSraid50/diskVM1.vhd: watchdog timeout: pending requests idle for 11 seconds
                      Sep 24 06:03:04: [06:03:04.400] watchdog_cleared: vhd:/var/run/sr-mount/myZFSraid50/diskVM1.vhd: watchdog timeout: requests were blocked
                      Sep 24 06:10:47: [06:10:47.027] tapdisk_vbd_check_progress: vhd:/var/run/sr-mount/myZFSraid50/diskVM1.vhd: watchdog timeout: pending requests idle for 11 seconds
                      Sep 24 06:10:48: [06:10:48.416] watchdog_cleared: vhd:/var/run/sr-mount/myZFSraid50/diskVM1.vhd: watchdog timeout: requests were blocked
                      Sep 24 06:27:16: [06:27:16.740] tapdisk_vbd_check_progress: vhd:/var/run/sr-mount/myZFSraid50/diskVM1.vhd: watchdog timeout: pending requests idle for 100 seconds
                      Sep 24 06:27:18: [06:27:18.465] watchdog_cleared: vhd:/var/run/sr-mount/myZFSraid50/diskVM1.vhd: watchdog timeout: requests were blocked
                      

[Screenshot attachment: 2025-09-24 08_45_26-Backup — Mozilla Firefox.png]

Finally, when I say massive backups: even then, my hypervisor is only at about 50% of its capacity.

And this weekend, I'm thinking of skipping all the scheduled backup tasks to be on the safe side.

I don't know what to think anymore.
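
To tell whether the pool itself stalls while tapdisk requests sit idle, a hedged starting point is to watch pool and device latency during a backup window (standard ZFS and sysstat tools, nothing XCP-ng-specific):

    # Per-vdev throughput and latency, refreshed every 5 seconds
    zpool iostat -v 5

    # Utilisation and await times of the underlying block devices
    iostat -x 5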

olivierlambert (Vates 🪐 Co-Founder CEO):

Probably worth trying regular local ext SRs (on mdadm if needed) and seeing how it behaves. The KISS principle is very strong in general.
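
For the record, a minimal sketch of creating such a local ext SR on an mdadm array (the device name /dev/md0 is an assumption, and sr-create will wipe it):

    # Find the host UUID
    xe host-list

    # Create a local ext SR on the array (destroys existing data on /dev/md0)
    xe sr-create host-uuid=<host-uuid> type=ext content-type=user \
        name-label="Local EXT SR" device-config:device=/dev/md0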

yann_m @olivierlambert:

Unfortunately, this is also my conclusion.

                          This is what I used to do with smaller servers.
                          This is the first time with such a "huge server" + hardware RAID + ZFS with dedup.

                          Thanks
                          Have a nice day

olivierlambert (Vates 🪐 Co-Founder CEO):

It will be a lot better with the future SMAPIv3 drivers (which might include ZFS, if we decide to officially support it).

yann_m:

Do you think the problem could be due to ZFS deduplication?

I just saw that it's possible to disable deduplication "without restarting".
I think my problem is due to excessive disk writes (adding then deleting a snapshot, then coalescing?).
With deduplication off, would there be less disk load?

Or am I wrong?

olivierlambert (Vates 🪐 Co-Founder CEO):

In general, the consensus at least is to NOT use deduplication with ZFS. It's probably one cause of the issue, yes. It's not worth the RAM it needs.
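
To put a number on what deduplication costs here, a hedged sketch (the pool name myZFSraid50 is taken from the log paths above):

    # Pool overview; the DEDUP column shows the achieved ratio
    zpool list myZFSraid50

    # Dedup table (DDT) histogram; its size drives the RAM overhead
    zpool status -D myZFSraid50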

yann_m:

Yes, I should probably create a new post?
To discuss, and "to survive": how to make my current server usable again,

• starting by testing with deduplication turned off (see the sketch below).
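
A minimal sketch of that first step (the dataset layout is an assumption; note that dedup=off only affects new writes, so existing entries stay in the DDT until the blocks are rewritten):

    # See which datasets currently have dedup enabled
    zfs get -r dedup myZFSraid50

    # Disable it going forward; no restart needed
    zfs set dedup=off myZFSraid50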
olivierlambert (Vates 🪐 Co-Founder CEO):

Let's continue here, you can always rename your thread 🙂

yann_m:

                                      Okay, but I can't edit the subject.

olivierlambert (Vates 🪐 Co-Founder CEO):

You can edit your first post, which contains the title.

yann_m @olivierlambert:

@olivierlambert said in XCP-ng 8.3 freezes and/or reboots on HGR-SAP3 by OVH:

                                          You can edit your first post, which contains the title

Thanks a lot.

I'll wait until next week to disable deduplication (never on a Friday 😉).
