XCP-ng
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Groups
    • Register
    • Login
    1. Home
    2. hbogner
    Offline
    • Profile
    • Following 0
    • Followers 0
    • Topics 1
    • Posts 12
    • Groups 0

    hbogner

    @hbogner

    0
    Reputation
    7
    Profile views
    12
    Posts
    0
    Followers
    0
    Following
    Joined
    Last Online

    hbogner Unfollow Follow

    Latest posts made by hbogner

    • RE: Full backup - VM failed in first attempt but successful in second attempt

      @olivierlambert yes, with compression.

      In other 8 VMs there are larger VMs(larger diskd, more disks, more ram, more cpu), so export has higher load than this one VM, but only this one fails if it runs with concurrency more than 1.

      Compression is a requirement for this situation because backups are synced to remote location over internet.

      posted in Backup
      hbognerH
      hbogner
    • RE: Full backup - VM failed in first attempt but successful in second attempt

      @olivierlambert for now I "fixed" the problem by isolating this VM into separate backup job which doesn't overlap with any other backups and it finishes without error.

      We'll try it again(smart mode VM group select) in a few months to see if new updates change anything.

      posted in Backup
      hbognerH
      hbogner
    • RE: Full backup - VM failed in first attempt but successful in second attempt

      @olivierlambert Tested it just now, no disk usage change on host(sample interval was 1 minute).
      8 VM's finish ok, just this 1 VM fails on first run.

      posted in Backup
      hbognerH
      hbogner
    • RE: Full backup - VM failed in first attempt but successful in second attempt

      @hbogner said in Full backup - VM failed in first attempt but successful in second attempt:

      @olivierlambert Dom0 memory is set to 12GiB of 256GiB available on the host.

      12GB should be enough for 9 VM's.

      Disk should also be ok:

      Filesystem          Size  Used Avail Use% Mounted on
      devtmpfs            5.9G   32K  5.9G   1% /dev
      tmpfs               5.9G  532K  5.9G   1% /dev/shm
      tmpfs               5.9G   11M  5.9G   1% /run
      tmpfs               5.9G     0  5.9G   0% /sys/fs/cgroup
      /dev/md127p1         18G  2.1G   15G  13% /
      xenstore            5.9G     0  5.9G   0% /var/lib/xenstored
      /dev/md127p5        3.9G  337M  3.3G  10% /var/log
      tmpfs               1.2G     0  1.2G   0% /run/user/0
      

      This is a Windows VM, with separate 100GB disk just for pagefile, could that be an issue while compressing?

      posted in Backup
      hbognerH
      hbogner
    • RE: Full backup - VM failed in first attempt but successful in second attempt

      @olivierlambert you may be right, while digging thru logs on xcpng I found following in xensource.log:

      Sep 13 11:18:22 clientxcpng xapi: [ warn||4289975 HTTPS <--IP ADRESS OF XO SERVER->:::80|[XO] VM export R:cd1a9a8d151f|pervasiveext] finally: Error while running cleanup after failure of main function: (Failure "Compression via zcat failed: exit code 25")
      Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] [XO] VM export R:cd1a9a8d151f failed with exception Server_error(CLIENT_ERROR, [ INTERNAL_ERROR: [ Unix.Unix_error(Unix.EPIPE, "write", "") ] ])
      Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] Raised Server_error(CLIENT_ERROR, [ INTERNAL_ERROR: [ Unix.Unix_error(Unix.EPIPE, "write", "") ] ])
      Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 1/18 xapi Raised at file ocaml/xapi/stream_vdi.ml, line 129
      Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 2/18 xapi Called from file ocaml/xapi/stream_vdi.ml, line 263
      Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 3/18 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
      Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 4/18 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 35
      Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 5/18 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
      Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 6/18 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 35
      Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 7/18 xapi Called from file ocaml/xapi/stream_vdi.ml, line 240
      Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 8/18 xapi Called from file list.ml, line 110
      Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 9/18 xapi Called from file ocaml/xapi/export.ml, line 653
      Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 10/18 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
      Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 11/18 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 35
      Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 12/18 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
      Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 13/18 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 35
      Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 14/18 xapi Called from file ocaml/xapi/server_helpers.ml, line 92
      Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 15/18 xapi Called from file ocaml/xapi/server_helpers.ml, line 113
      Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 16/18 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
      Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 17/18 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 35
      Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 18/18 xapi Called from file lib/backtrace.ml, line 177
      Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace]
      Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] VM.export D:1c0a6106f079 failed with exception Server_error(CLIENT_ERROR, [ INTERNAL_ERROR: [ Unix.Unix_error(Unix.EPIPE, "write", "") ] ])
      Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] Raised Server_error(CLIENT_ERROR, [ INTERNAL_ERROR: [ Unix.Unix_error(Unix.EPIPE, "write", "") ] ])
      Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace] 1/1 xapi Raised at file (Thread 4289975 has no backtrace table. Was with_backtraces called?, line 0
      Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||backtrace]
      Sep 13 11:18:22 clientxcpng xapi: [error||4289975 :::80||xapi_http] Unhandled Api_errors.Server_error(CLIENT_ERROR, [ INTERNAL_ERROR: [ Unix.Unix_error(Unix.EPIPE, "write", "") ] ])
      

      Is this any help?

      posted in Backup
      hbognerH
      hbogner
    • RE: Full backup - VM failed in first attempt but successful in second attempt

      With zstd compression backup fails on the first run, lasts 11 hours, Transfer: 1.81 TiB.

      With no compression backup runs ok in the first run, lasts 15 hours, Transfer: 5.12 TiB.

      I'd like to keep it compressed in the long run 🙂

      Upgraded to latest: "Xen Orchestra(commit 74e6f), Master(commit 74e6f), Your Xen Orchestra is up to date"

      Started new compressed backup and it failed again:

      root@backup:~# grep "xo-server" /var/log/syslog
      2024-09-13T09:24:17.518727+02:00 backup xo-server[243615]: 2024-09-13T07:24:17.516Z xo:backups:worker INFO starting backup
      2024-09-13T11:11:56.199073+02:00 backup xo-server[243615]: 2024-09-13T09:11:56.195Z xo:backups:worker WARN possibly unhandled rejection {
      2024-09-13T11:11:56.199261+02:00 backup xo-server[243615]:   error: BodyTimeoutError: Body Timeout Error
      2024-09-13T11:11:56.199310+02:00 backup xo-server[243615]:       at Timeout.onParserTimeout [as callback] (/opt/xen-orchestra/node_modules/undici/lib/dispatcher/client-h1.js:626:28)
      2024-09-13T11:11:56.199337+02:00 backup xo-server[243615]:       at Timeout.onTimeout [as _onTimeout] (/opt/xen-orchestra/node_modules/undici/lib/util/timers.js:22:13)
      2024-09-13T11:11:56.199362+02:00 backup xo-server[243615]:       at listOnTimeout (node:internal/timers:581:17)
      2024-09-13T11:11:56.199387+02:00 backup xo-server[243615]:       at process.processTimers (node:internal/timers:519:7) {
      2024-09-13T11:11:56.199408+02:00 backup xo-server[243615]:     code: 'UND_ERR_BODY_TIMEOUT'
      2024-09-13T11:11:56.199428+02:00 backup xo-server[243615]:   }
      2024-09-13T11:11:56.199450+02:00 backup xo-server[243615]: }
      2024-09-13T11:11:57.748715+02:00 backup xo-server[243615]: 2024-09-13T09:11:57.745Z xo:backups:AbstractVmRunner WARN writer step failed {
      2024-09-13T11:11:57.748812+02:00 backup xo-server[243615]:   error: BodyTimeoutError: Body Timeout Error
      2024-09-13T11:11:57.748837+02:00 backup xo-server[243615]:       at Timeout.onParserTimeout [as callback] (/opt/xen-orchestra/node_modules/undici/lib/dispatcher/client-h1.js:626:28)
      2024-09-13T11:11:57.748858+02:00 backup xo-server[243615]:       at Timeout.onTimeout [as _onTimeout] (/opt/xen-orchestra/node_modules/undici/lib/util/timers.js:22:13)
      2024-09-13T11:11:57.748878+02:00 backup xo-server[243615]:       at listOnTimeout (node:internal/timers:581:17)
      2024-09-13T11:11:57.748898+02:00 backup xo-server[243615]:       at process.processTimers (node:internal/timers:519:7) {
      2024-09-13T11:11:57.748917+02:00 backup xo-server[243615]:     code: 'UND_ERR_BODY_TIMEOUT'
      2024-09-13T11:11:57.748937+02:00 backup xo-server[243615]:   },
      2024-09-13T11:11:57.748961+02:00 backup xo-server[243615]:   step: 'writer.run()',
      2024-09-13T11:11:57.748981+02:00 backup xo-server[243615]:   writer: 'FullRemoteWriter'
      2024-09-13T11:11:57.749000+02:00 backup xo-server[243615]: }
      
      posted in Backup
      hbognerH
      hbogner
    • RE: Full backup - VM failed in first attempt but successful in second attempt

      @olivierlambert said in Full backup - VM failed in first attempt but successful in second attempt:

      Also check your Dom0 memory, I have the feeling the zstd compression is failing and leaving the export opened until XO timeouts

      When I disable compression backup is successful in the first run, but it lasts longer. This VM is finished, but larger ones are still at it, they'll finish tomorrow, estimated by XO in tasks menu.

      posted in Backup
      hbognerH
      hbogner
    • RE: Full backup - VM failed in first attempt but successful in second attempt

      @rtjdamen Only XCP-ng 8.2

      posted in Backup
      hbognerH
      hbogner
    • RE: Full backup - VM failed in first attempt but successful in second attempt

      @olivierlambert Dom0 memory is set to 12GiB of 256GiB available on the host.

      Failed/successful again yesterday:

      2024-09-08T12:15:00.729026+02:00 backup xo-server[186436]: 2024-09-08T10:15:00.726Z xo:backups:worker INFO starting backup
      2024-09-08T13:24:53.065460+02:00 backup xo-server[186436]: 2024-09-08T11:24:53.061Z xo:backups:worker WARN possibly unhandled rejection {
      2024-09-08T13:24:53.065709+02:00 backup xo-server[186436]:   error: BodyTimeoutError: Body Timeout Error
      2024-09-08T13:24:53.065746+02:00 backup xo-server[186436]:       at Timeout.onParserTimeout [as callback] (/opt/xen-orchestra/node_modules/undici/lib/dispatcher/client-h1.js:626:28)
      2024-09-08T13:24:53.065775+02:00 backup xo-server[186436]:       at Timeout.onTimeout [as _onTimeout] (/opt/xen-orchestra/node_modules/undici/lib/util/timers.js:22:13)
      2024-09-08T13:24:53.065799+02:00 backup xo-server[186436]:       at listOnTimeout (node:internal/timers:581:17)
      2024-09-08T13:24:53.065822+02:00 backup xo-server[186436]:       at process.processTimers (node:internal/timers:519:7) {
      2024-09-08T13:24:53.065843+02:00 backup xo-server[186436]:     code: 'UND_ERR_BODY_TIMEOUT'
      2024-09-08T13:24:53.065864+02:00 backup xo-server[186436]:   }
      2024-09-08T13:24:53.065894+02:00 backup xo-server[186436]: }
      2024-09-08T13:24:53.612370+02:00 backup xo-server[186436]: 2024-09-08T11:24:53.609Z xo:backups:AbstractVmRunner WARN writer step failed {
      2024-09-08T13:24:53.612468+02:00 backup xo-server[186436]:   error: BodyTimeoutError: Body Timeout Error
      2024-09-08T13:24:53.612494+02:00 backup xo-server[186436]:       at Timeout.onParserTimeout [as callback] (/opt/xen-orchestra/node_modules/undici/lib/dispatcher/client-h1.js:626:28)
      2024-09-08T13:24:53.612516+02:00 backup xo-server[186436]:       at Timeout.onTimeout [as _onTimeout] (/opt/xen-orchestra/node_modules/undici/lib/util/timers.js:22:13)
      2024-09-08T13:24:53.612538+02:00 backup xo-server[186436]:       at listOnTimeout (node:internal/timers:581:17)
      2024-09-08T13:24:53.612558+02:00 backup xo-server[186436]:       at process.processTimers (node:internal/timers:519:7) {
      2024-09-08T13:24:53.612578+02:00 backup xo-server[186436]:     code: 'UND_ERR_BODY_TIMEOUT'
      2024-09-08T13:24:53.612599+02:00 backup xo-server[186436]:   },
      2024-09-08T13:24:53.612630+02:00 backup xo-server[186436]:   step: 'writer.run()',
      2024-09-08T13:24:53.612658+02:00 backup xo-server[186436]:   writer: 'FullRemoteWriter'
      2024-09-08T13:24:53.612681+02:00 backup xo-server[186436]: }
      2024-09-08T22:53:59.126305+02:00 backup xo-server[186436]: 2024-09-08T20:53:59.125Z xo:backups:worker INFO backup has ended
      2024-09-08T22:53:59.272277+02:00 backup xo-server[186436]: 2024-09-08T20:53:59.270Z xo:backups:worker INFO process will exit {
      2024-09-08T22:53:59.272333+02:00 backup xo-server[186436]:   duration: 38338542920,
      2024-09-08T22:53:59.272372+02:00 backup xo-server[186436]:   exitCode: 0,
      2024-09-08T22:53:59.272403+02:00 backup xo-server[186436]:   resourceUsage: {
      2024-09-08T22:53:59.272424+02:00 backup xo-server[186436]:     userCPUTime: 34156788569,
      2024-09-08T22:53:59.272449+02:00 backup xo-server[186436]:     systemCPUTime: 4578502065,
      2024-09-08T22:53:59.272476+02:00 backup xo-server[186436]:     maxRSS: 73808,
      2024-09-08T22:53:59.272500+02:00 backup xo-server[186436]:     sharedMemorySize: 0,
      2024-09-08T22:53:59.272524+02:00 backup xo-server[186436]:     unsharedDataSize: 0,
      2024-09-08T22:53:59.272545+02:00 backup xo-server[186436]:     unsharedStackSize: 0,
      2024-09-08T22:53:59.272565+02:00 backup xo-server[186436]:     minorPageFault: 2821906,
      2024-09-08T22:53:59.272595+02:00 backup xo-server[186436]:     majorPageFault: 0,
      2024-09-08T22:53:59.272622+02:00 backup xo-server[186436]:     swappedOut: 0,
      2024-09-08T22:53:59.272643+02:00 backup xo-server[186436]:     fsRead: 21304,
      2024-09-08T22:53:59.272674+02:00 backup xo-server[186436]:     fsWrite: 3902768784,
      2024-09-08T22:53:59.272702+02:00 backup xo-server[186436]:     ipcSent: 0,
      2024-09-08T22:53:59.272723+02:00 backup xo-server[186436]:     ipcReceived: 0,
      2024-09-08T22:53:59.272743+02:00 backup xo-server[186436]:     signalsCount: 0,
      2024-09-08T22:53:59.272773+02:00 backup xo-server[186436]:     voluntaryContextSwitches: 111664073,
      2024-09-08T22:53:59.272802+02:00 backup xo-server[186436]:     involuntaryContextSwitches: 65186
      2024-09-08T22:53:59.272829+02:00 backup xo-server[186436]:   },
      2024-09-08T22:53:59.272859+02:00 backup xo-server[186436]:   summary: { duration: '11h', cpuUsage: '101%', memoryUsage: '72.08 MiB' }
      2024-09-08T22:53:59.272884+02:00 backup xo-server[186436]: }
      

      I'll now try with concurrency 3, to see how it acts with less load.

      posted in Backup
      hbognerH
      hbogner
    • RE: Full backup - VM failed in first attempt but successful in second attempt

      Full report after first fail and second success:

       Retry the VM backup due to an error
      
          attempt
          1
          error
          "Body Timeout Error"
      
          Snapshot
          Start: 2024-09-02 14:06
          End: 2024-09-02 14:08
          full
              transfer
              Start: 2024-09-02 14:08
              End: 2024-09-02 15:16
              Duration: an hour
              Error: Body Timeout Error
          Start: 2024-09-02 14:08
          End: 2024-09-02 15:16
          Duration: an hour
          Error: Body Timeout Error
          Snapshot
          Start: 2024-09-03 00:07
          End: 2024-09-03 00:08
          full
              transfer
              Start: 2024-09-03 00:08
              End: 2024-09-03 01:17
              Duration: an hour
              Size: 17.62 GiB
              Speed: 4.34 MiB/s
          Start: 2024-09-03 00:08
          End: 2024-09-03 01:17
          Duration: an hour
      
      Start: 2024-09-02 14:06
      End: 2024-09-03 01:17
      Duration: 11 hours
      Type: full
      

      This is the rest of syslog regarding full backup, delta backup job has started and finished while this full backup was running.

      root@backup:~# grep "xo-server" /var/log/syslog
      2024-09-03T01:17:34.085428+02:00 backup xo-server[152409]: 2024-09-02T23:17:34.084Z xo:backups:worker INFO backup has ended
      2024-09-03T01:17:34.242886+02:00 backup xo-server[152409]: 2024-09-02T23:17:34.240Z xo:backups:worker INFO process will exit {
      2024-09-03T01:17:34.242985+02:00 backup xo-server[152409]:   duration: 40292717752,
      2024-09-03T01:17:34.243018+02:00 backup xo-server[152409]:   exitCode: 0,
      2024-09-03T01:17:34.243045+02:00 backup xo-server[152409]:   resourceUsage: {
      2024-09-03T01:17:34.243067+02:00 backup xo-server[152409]:     userCPUTime: 34830207909,
      2024-09-03T01:17:34.243098+02:00 backup xo-server[152409]:     systemCPUTime: 5032889601,
      2024-09-03T01:17:34.243121+02:00 backup xo-server[152409]:     maxRSS: 75204,
      2024-09-03T01:17:34.243155+02:00 backup xo-server[152409]:     sharedMemorySize: 0,
      2024-09-03T01:17:34.243180+02:00 backup xo-server[152409]:     unsharedDataSize: 0,
      2024-09-03T01:17:34.243200+02:00 backup xo-server[152409]:     unsharedStackSize: 0,
      2024-09-03T01:17:34.243220+02:00 backup xo-server[152409]:     minorPageFault: 3320295,
      2024-09-03T01:17:34.243243+02:00 backup xo-server[152409]:     majorPageFault: 0,
      2024-09-03T01:17:34.243266+02:00 backup xo-server[152409]:     swappedOut: 0,
      2024-09-03T01:17:34.243286+02:00 backup xo-server[152409]:     fsRead: 4144,
      2024-09-03T01:17:34.243308+02:00 backup xo-server[152409]:     fsWrite: 3906893520,
      2024-09-03T01:17:34.243335+02:00 backup xo-server[152409]:     ipcSent: 0,
      2024-09-03T01:17:34.243357+02:00 backup xo-server[152409]:     ipcReceived: 0,
      2024-09-03T01:17:34.243377+02:00 backup xo-server[152409]:     signalsCount: 0,
      2024-09-03T01:17:34.243396+02:00 backup xo-server[152409]:     voluntaryContextSwitches: 113572473,
      2024-09-03T01:17:34.243424+02:00 backup xo-server[152409]:     involuntaryContextSwitches: 80368
      2024-09-03T01:17:34.243451+02:00 backup xo-server[152409]:   },
      2024-09-03T01:17:34.243477+02:00 backup xo-server[152409]:   summary: { duration: '11h', cpuUsage: '99%', memoryUsage: '73.44 MiB' }
      2024-09-03T01:17:34.243501+02:00 backup xo-server[152409]: }
      

      @olivierlambert yes, it is full backup using zstd. Not using compression is problematic because it's a lot of data, but I will test it.

      @rtjdamen There were no VDI's attached to control domain.

      Difference between XO before upgrade and the latest XO is error message in web interface:

      Now: "Body Timeout Error"
      Before: "VDI must be free or attached to exactly one VM"

      Before previous upgrade ago it was: "Body Timeout Error"

      But the syslog error references to the same lines.

      PS. This VM in not the largest one nor the smallest one, but just this one fails, the rest are OK.

      posted in Backup
      hbognerH
      hbogner