XCP-ng
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Groups
    • Register
    • Login

    Full backup - VM failed in first attempt but successful in second attempt

    Scheduled Pinned Locked Moved Backup
    23 Posts 3 Posters 1.5k Views 3 Watching
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • hbognerH Offline
      hbogner @hbogner
      last edited by

      With zstd compression backup fails on the first run, lasts 11 hours, Transfer: 1.81 TiB.

      With no compression backup runs ok in the first run, lasts 15 hours, Transfer: 5.12 TiB.

      I'd like to keep it compressed in the long run 🙂

      Upgraded to latest: "Xen Orchestra(commit 74e6f), Master(commit 74e6f), Your Xen Orchestra is up to date"

      Started new compressed backup and it failed again:

      root@backup:~# grep "xo-server" /var/log/syslog
      2024-09-13T09:24:17.518727+02:00 backup xo-server[243615]: 2024-09-13T07:24:17.516Z xo:backups:worker INFO starting backup
      2024-09-13T11:11:56.199073+02:00 backup xo-server[243615]: 2024-09-13T09:11:56.195Z xo:backups:worker WARN possibly unhandled rejection {
      2024-09-13T11:11:56.199261+02:00 backup xo-server[243615]:   error: BodyTimeoutError: Body Timeout Error
      2024-09-13T11:11:56.199310+02:00 backup xo-server[243615]:       at Timeout.onParserTimeout [as callback] (/opt/xen-orchestra/node_modules/undici/lib/dispatcher/client-h1.js:626:28)
      2024-09-13T11:11:56.199337+02:00 backup xo-server[243615]:       at Timeout.onTimeout [as _onTimeout] (/opt/xen-orchestra/node_modules/undici/lib/util/timers.js:22:13)
      2024-09-13T11:11:56.199362+02:00 backup xo-server[243615]:       at listOnTimeout (node:internal/timers:581:17)
      2024-09-13T11:11:56.199387+02:00 backup xo-server[243615]:       at process.processTimers (node:internal/timers:519:7) {
      2024-09-13T11:11:56.199408+02:00 backup xo-server[243615]:     code: 'UND_ERR_BODY_TIMEOUT'
      2024-09-13T11:11:56.199428+02:00 backup xo-server[243615]:   }
      2024-09-13T11:11:56.199450+02:00 backup xo-server[243615]: }
      2024-09-13T11:11:57.748715+02:00 backup xo-server[243615]: 2024-09-13T09:11:57.745Z xo:backups:AbstractVmRunner WARN writer step failed {
      2024-09-13T11:11:57.748812+02:00 backup xo-server[243615]:   error: BodyTimeoutError: Body Timeout Error
      2024-09-13T11:11:57.748837+02:00 backup xo-server[243615]:       at Timeout.onParserTimeout [as callback] (/opt/xen-orchestra/node_modules/undici/lib/dispatcher/client-h1.js:626:28)
      2024-09-13T11:11:57.748858+02:00 backup xo-server[243615]:       at Timeout.onTimeout [as _onTimeout] (/opt/xen-orchestra/node_modules/undici/lib/util/timers.js:22:13)
      2024-09-13T11:11:57.748878+02:00 backup xo-server[243615]:       at listOnTimeout (node:internal/timers:581:17)
      2024-09-13T11:11:57.748898+02:00 backup xo-server[243615]:       at process.processTimers (node:internal/timers:519:7) {
      2024-09-13T11:11:57.748917+02:00 backup xo-server[243615]:     code: 'UND_ERR_BODY_TIMEOUT'
      2024-09-13T11:11:57.748937+02:00 backup xo-server[243615]:   },
      2024-09-13T11:11:57.748961+02:00 backup xo-server[243615]:   step: 'writer.run()',
      2024-09-13T11:11:57.748981+02:00 backup xo-server[243615]:   writer: 'FullRemoteWriter'
      2024-09-13T11:11:57.749000+02:00 backup xo-server[243615]: }
      
      1 Reply Last reply Reply Quote 0
      • olivierlambertO Offline
        olivierlambert Vates 🪐 Co-Founder CEO
        last edited by

        I think the issue might not be XO but XCP-ng with zstd.

        hbognerH 1 Reply Last reply Reply Quote 0
        • hbognerH Offline
          hbogner @olivierlambert
          last edited by

          @olivierlambert you may be right, while digging thru logs on xcpng I found following in xensource.log:

          Sep 13 11:18:22 clientxcpng xapi: [ warn||4289975 HTTPS <--IP ADRESS OF XO SERVER->:::80|[XO] VM export R:cd1a9a8d151f|pervasiveext] finally: Error while running cleanup after failure of main function: (Failure "Compression via zcat failed: exit code 25")
          Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] [XO] VM export R:cd1a9a8d151f failed with exception Server_error(CLIENT_ERROR, [ INTERNAL_ERROR: [ Unix.Unix_error(Unix.EPIPE, "write", "") ] ])
          Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] Raised Server_error(CLIENT_ERROR, [ INTERNAL_ERROR: [ Unix.Unix_error(Unix.EPIPE, "write", "") ] ])
          Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 1/18 xapi Raised at file ocaml/xapi/stream_vdi.ml, line 129
          Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 2/18 xapi Called from file ocaml/xapi/stream_vdi.ml, line 263
          Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 3/18 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
          Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 4/18 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 35
          Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 5/18 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
          Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 6/18 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 35
          Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 7/18 xapi Called from file ocaml/xapi/stream_vdi.ml, line 240
          Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 8/18 xapi Called from file list.ml, line 110
          Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 9/18 xapi Called from file ocaml/xapi/export.ml, line 653
          Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 10/18 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
          Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 11/18 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 35
          Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 12/18 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
          Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 13/18 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 35
          Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 14/18 xapi Called from file ocaml/xapi/server_helpers.ml, line 92
          Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 15/18 xapi Called from file ocaml/xapi/server_helpers.ml, line 113
          Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 16/18 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
          Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 17/18 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 35
          Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 18/18 xapi Called from file lib/backtrace.ml, line 177
          Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace]
          Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] VM.export D:1c0a6106f079 failed with exception Server_error(CLIENT_ERROR, [ INTERNAL_ERROR: [ Unix.Unix_error(Unix.EPIPE, "write", "") ] ])
          Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] Raised Server_error(CLIENT_ERROR, [ INTERNAL_ERROR: [ Unix.Unix_error(Unix.EPIPE, "write", "") ] ])
          Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 1/1 xapi Raised at file (Thread 4289975 has no backtrace table. Was with_backtraces called?, line 0
          Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace]
          Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||xapi_http] Unhandled Api_errors.Server_error(CLIENT_ERROR, [ INTERNAL_ERROR: [ Unix.Unix_error(Unix.EPIPE, "write", "") ] ])
          

          Is this any help?

          1 Reply Last reply Reply Quote 0
          • olivierlambertO Offline
            olivierlambert Vates 🪐 Co-Founder CEO
            last edited by olivierlambert

            @hbogner said in Full backup - VM failed in first attempt but successful in second attempt:

            Compression via zcat failed: exit code 25

            Yes, it is useful. I wonder if you have enough RAM or disk space in your Dom0 🤔

            1 Reply Last reply Reply Quote 0
            • hbognerH Offline
              hbogner @hbogner
              last edited by

              @hbogner said in Full backup - VM failed in first attempt but successful in second attempt:

              @olivierlambert Dom0 memory is set to 12GiB of 256GiB available on the host.

              12GB should be enough for 9 VM's.

              Disk should also be ok:

              Filesystem          Size  Used Avail Use% Mounted on
              devtmpfs            5.9G   32K  5.9G   1% /dev
              tmpfs               5.9G  532K  5.9G   1% /dev/shm
              tmpfs               5.9G   11M  5.9G   1% /run
              tmpfs               5.9G     0  5.9G   0% /sys/fs/cgroup
              /dev/md127p1         18G  2.1G   15G  13% /
              xenstore            5.9G     0  5.9G   0% /var/lib/xenstored
              /dev/md127p5        3.9G  337M  3.3G  10% /var/log
              tmpfs               1.2G     0  1.2G   0% /run/user/0
              

              This is a Windows VM, with separate 100GB disk just for pagefile, could that be an issue while compressing?

              1 Reply Last reply Reply Quote 0
              • olivierlambertO Offline
                olivierlambert Vates 🪐 Co-Founder CEO
                last edited by

                I don't know but I'm suspecting something going bad with ZSTD compression. Can you monitor your disk space during the export?

                hbognerH 2 Replies Last reply Reply Quote 0
                • hbognerH Offline
                  hbogner @olivierlambert
                  last edited by

                  @olivierlambert Tested it just now, no disk usage change on host(sample interval was 1 minute).
                  8 VM's finish ok, just this 1 VM fails on first run.

                  1 Reply Last reply Reply Quote 0
                  • hbognerH Offline
                    hbogner @olivierlambert
                    last edited by

                    @olivierlambert for now I "fixed" the problem by isolating this VM into separate backup job which doesn't overlap with any other backups and it finishes without error.

                    We'll try it again(smart mode VM group select) in a few months to see if new updates change anything.

                    1 Reply Last reply Reply Quote 0
                    • olivierlambertO Offline
                      olivierlambert Vates 🪐 Co-Founder CEO
                      last edited by

                      Even with compression? So it could be the dom0 overloaded with too many zstd pipes or something, interesting feedback.

                      hbognerH 1 Reply Last reply Reply Quote 0
                      • hbognerH Offline
                        hbogner @olivierlambert
                        last edited by

                        @olivierlambert yes, with compression.

                        In other 8 VMs there are larger VMs(larger diskd, more disks, more ram, more cpu), so export has higher load than this one VM, but only this one fails if it runs with concurrency more than 1.

                        Compression is a requirement for this situation because backups are synced to remote location over internet.

                        1 Reply Last reply Reply Quote 0
                        • olivierlambertO Offline
                          olivierlambert Vates 🪐 Co-Founder CEO
                          last edited by

                          I understand your requirements, I'm trying to pinpoint the issue. For now, everything converge on dom0 load when having many zstd processes at once.

                          1 Reply Last reply Reply Quote 0
                          • First post
                            Last post