Stuck tasks within XO (rrd_updates)?
-
Hi guys!
I don't know if this is XO-related or XCP-ng-related, so feel free to move this to the respective sub-board if needed.
I am running XO from its sources (xo-server 5.73.0 and xo-web 5.76.0) as well as XCP-ng with all latest patches applied.
After patching my XCP-ng servers with the latest updates released a few days ago (and rebooting all servers afterwards, of course!), I discovered tasks within XO that seem to be stuck.
In the meantime there are already several of these identical tasks. As we can see, they all affect the same host, which is also the pool master.
Any clue why these tasks don't seem to be able to finish?
Thanks!
-
Restart the toolstack and report back if the problem persists.
-
Hi @olivierlambert,
thanks for your prompt reply.
After restarting the toolstack on the pool master, the processes are indeed gone. I will monitor this and report back if these processes reappear after a while without finishing!
Thanks!
-
It did not take long: one of these tasks is visible again at 0% progress and looks to be stuck again.
-
Update #2: Now we already have two of these tasks there...
-
Number three is now there...
-
The count of these processes has risen to 9.
-
@tanjix if you run:
xe task-list
on the CLI, do you also see them?
Do you know what RRDs are?
You might check your log files in /var/log to see if there is anything suspicious.
MM
-
@jedimarcus said in Stuck tasks within XO (rrd_updates)?:
@tanjix if you run:
xe task-list
on the CLI, do you also see them?
I will check that once these tasks are back; I cleaned the task list up by restarting the toolstack again.
@jedimarcus said in Stuck tasks within XO (rrd_updates)?:
Do you know what RRDs are?
As far as I know, RRDs are used for all sorts of graphs, so I assume some kind of statistics (performance data or similar?) is being collected for drawing.
-
@jedimarcus said in Stuck tasks within XO (rrd_updates)?:
@tanjix if you run:
xe task-list
on the cli, do you also see them?
Yes, they are also visible in the task-list:
[13:43 vmcluster01 ~]# xe task-list
uuid ( RO)                : d5f93708-20c2-5e97-057d-c5faf38c3a62
          name-label ( RO): [XO] Xapi#getResource /rrd_updates
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.000

uuid ( RO)                : de1a6f96-e1c0-b1ee-a753-62dadd9f6a6c
          name-label ( RO): [XO] Xapi#getResource /rrd_updates
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.000

uuid ( RO)                : 76c71fa9-2ce8-9e8a-dbc4-7d119ee8190c
          name-label ( RO): [XO] Xapi#getResource /rrd_updates
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.000

uuid ( RO)                : 8b89dfdd-b16b-8adf-8f80-1b2bd639afbb
          name-label ( RO): [XO] Xapi#getResource /rrd_updates
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.000

uuid ( RO)                : 6d0a8ee3-b2e2-6aa4-8f03-3abfe4504e70
          name-label ( RO): [XO] Xapi#getResource /rrd_updates
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.000

uuid ( RO)                : ba1709d6-e8bc-8c87-9dff-90118dc38f8f
          name-label ( RO): [XO] Xapi#getResource /rrd_updates
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.000

uuid ( RO)                : 9d3cf6b0-2744-981a-099c-40cb2cb3cbf9
          name-label ( RO): [XO] Xapi#getResource /rrd_updates
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.000

uuid ( RO)                : 903039a0-584c-4a93-ee60-969a6aea1211
          name-label ( RO): [XO] Xapi#getResource /rrd_updates
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.000

uuid ( RO)                : a30802ca-0652-8a84-6866-8e22f5ad441f
          name-label ( RO): [XO] Xapi#getResource /rrd_updates
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.000

uuid ( RO)                : 42be10a5-c054-0c4f-4f42-a65be0c4599b
          name-label ( RO): [XO] Xapi#getResource /rrd_updates
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.000

uuid ( RO)                : 6ee1df6d-e634-4133-bf4b-6fdc462f6fc2
          name-label ( RO): [XO] Xapi#getResource /rrd_updates
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.000

uuid ( RO)                : 58ff3b74-6d7c-f76a-7f93-184f4e061025
          name-label ( RO): [XO] Xapi#getResource /rrd_updates
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.000

uuid ( RO)                : db37bd55-b599-c21c-92e5-dff6199f3f93
          name-label ( RO): [XO] Xapi#getResource /rrd_updates
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.000

uuid ( RO)                : 925c6d8a-9fe7-b338-4ba8-5d0b2396f8d1
          name-label ( RO): [XO] Xapi#getResource /rrd_updates
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.000

uuid ( RO)                : b577365f-8e9c-41b9-225f-278f3e12464d
          name-label ( RO): [XO] Xapi#getResource /rrd_updates
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.000

[13:43 vmcluster01 ~]#
Their number has risen to 19 in the meantime...
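For anyone who wants to track how fast these tasks pile up, here is a minimal Python sketch that parses `xe task-list` style text and picks out the pending `/rrd_updates` tasks. The function names (`parse_task_list`, `stuck_rrd_tasks`) and the regular expression are my own assumptions based on the output layout shown above, not anything provided by XO or XCP-ng; the sample text is just two records copied from that output.

```python
import re

# Matches lines such as "uuid ( RO)      : <value>" or "   status ( RO): pending".
# The layout is assumed from the output pasted in this thread.
FIELD_RE = re.compile(r"^\s*([\w-]+) \( *\w+\)\s*: ?(.*)$")


def parse_task_list(text):
    """Split `xe task-list` style output into one dict per task."""
    tasks = []
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if not match:
            continue  # shell prompt lines and blank separators
        key, value = match.group(1), match.group(2).strip()
        if key == "uuid":
            tasks.append({})  # a uuid line starts a new record
        if tasks:
            tasks[-1][key] = value
    return tasks


def stuck_rrd_tasks(tasks):
    """Return the UUIDs of pending tasks whose label ends in /rrd_updates."""
    return [
        task["uuid"]
        for task in tasks
        if task.get("status") == "pending"
        and task.get("name-label", "").endswith("/rrd_updates")
    ]


# Two records copied from the output above, used only as sample input.
SAMPLE = """\
uuid ( RO)                : d5f93708-20c2-5e97-057d-c5faf38c3a62
          name-label ( RO): [XO] Xapi#getResource /rrd_updates
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.000

uuid ( RO)                : de1a6f96-e1c0-b1ee-a753-62dadd9f6a6c
          name-label ( RO): [XO] Xapi#getResource /rrd_updates
    name-description ( RO):
              status ( RO): pending
            progress ( RO): 0.000
"""

if __name__ == "__main__":
    for uuid in stuck_rrd_tasks(parse_task_list(SAMPLE)):
        print(uuid)
```

On a host, the text could come from running `xe task-list` through `subprocess.run(..., capture_output=True, text=True)` instead of `SAMPLE`. The resulting UUIDs could in principle be passed to `xe task-cancel uuid=<uuid>`, but whether these particular tasks can be cancelled that way has not been established in this thread.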
-
@tanjix Check audit.log and xensource.log in /var/log.
Do you see any errors there?
-
@jedimarcus said in Stuck tasks within XO (rrd_updates)?:
@tanjix Check audit.log and xensource.log in /var/log.
Do you see any errors there?
[13:52 vmcluster01 ~]# cat /var/log/audit.log | grep rrd
(...)
Jan 25 13:27:02 vmcluster01 xapi: [20210125T12:27:02.152Z|audit||252297 INET :::80|handler:http/rrd_updates D:750a7f18dcb2|audit] ('trackid=c62d355204466eab132e8a9db671552d' 'LOCAL_SUPERUSER' '' 'ALLOWED' 'OK' 'HTTP' 'http/rrd_updates' ())
Jan 25 13:32:09 vmcluster01 xapi: [20210125T12:32:09.367Z|audit||253127 INET :::80|handler:http/rrd_updates D:ef6cdab4202a|audit] ('trackid=c62d355204466eab132e8a9db671552d' 'LOCAL_SUPERUSER' '' 'ALLOWED' 'OK' 'HTTP' 'http/rrd_updates' ())
Jan 25 13:37:17 vmcluster01 xapi: [20210125T12:37:17.497Z|audit||254056 INET :::80|handler:http/rrd_updates D:901de089dd50|audit] ('trackid=c62d355204466eab132e8a9db671552d' 'LOCAL_SUPERUSER' '' 'ALLOWED' 'OK' 'HTTP' 'http/rrd_updates' ())
Jan 25 13:42:30 vmcluster01 xapi: [20210125T12:42:30.082Z|audit||254876 INET :::80|handler:http/rrd_updates D:e9be8bc70f19|audit] ('trackid=c62d355204466eab132e8a9db671552d' 'LOCAL_SUPERUSER' '' 'ALLOWED' 'OK' 'HTTP' 'http/rrd_updates' ())
Jan 25 13:47:31 vmcluster01 xapi: [20210125T12:47:31.442Z|audit||255789 INET :::80|handler:http/rrd_updates D:f29b7c671ac1|audit] ('trackid=c62d355204466eab132e8a9db671552d' 'LOCAL_SUPERUSER' '' 'ALLOWED' 'OK' 'HTTP' 'http/rrd_updates' ())
[13:52 vmcluster01 ~]#
Nothing suspicious here.
[13:54 vmcluster01 ~]# cat /var/log/xensource.log | grep rrd
(...)
Jan 25 13:27:02 vmcluster01 xapi: [debug||252297 INET :::80|Get RRD updates. D:42a316633c15|xapi_services] hand_over_connection GET /rrd_updates to /var/lib/xcp/xcp-rrdd.forwarded
Jan 25 13:27:56 vmcluster01 xcp-rrdd: [ info||0 monitor_write|main|rrdd_server] Failed to process plugin metrics file: xcp-rrdd-gpumon ((Invalid_argument\x0A "Cstruct.blit_to_bytes src=[0,0](0) dst=[11] src-off=0 len=11"))
Jan 25 13:29:09 vmcluster01 xcp-rrdd: [ info||7 ||rrdd_main] GC live_words = 644162
Jan 25 13:29:09 vmcluster01 xcp-rrdd: [ info||7 ||rrdd_main] GC heap_words = 1372160
Jan 25 13:29:09 vmcluster01 xcp-rrdd: [ info||7 ||rrdd_main] GC free_words = 727987
Jan 25 13:32:09 vmcluster01 xapi: [debug||253127 INET :::80|Get RRD updates. D:c5389f6a1534|xapi_services] hand_over_connection GET /rrd_updates to /var/lib/xcp/xcp-rrdd.forwarded
Jan 25 13:32:09 vmcluster01 xcp-rrdd: [ info||7 ||rrdd_main] GC live_words = 811402
Jan 25 13:32:09 vmcluster01 xcp-rrdd: [ info||7 ||rrdd_main] GC heap_words = 1372160
Jan 25 13:32:09 vmcluster01 xcp-rrdd: [ info||7 ||rrdd_main] GC free_words = 560736
Jan 25 13:35:09 vmcluster01 xcp-rrdd: [ info||7 ||rrdd_main] GC live_words = 580752
Jan 25 13:35:09 vmcluster01 xcp-rrdd: [ info||7 ||rrdd_main] GC heap_words = 1577984
Jan 25 13:35:09 vmcluster01 xcp-rrdd: [ info||7 ||rrdd_main] GC free_words = 997217
Jan 25 13:37:17 vmcluster01 xapi: [debug||254056 INET :::80|Get RRD updates. D:3badea6de893|xapi_services] hand_over_connection GET /rrd_updates to /var/lib/xcp/xcp-rrdd.forwarded
Jan 25 13:38:10 vmcluster01 xcp-rrdd: [ info||7 ||rrdd_main] GC live_words = 506708
Jan 25 13:38:10 vmcluster01 xcp-rrdd: [ info||7 ||rrdd_main] GC heap_words = 1577984
Jan 25 13:38:10 vmcluster01 xcp-rrdd: [ info||7 ||rrdd_main] GC free_words = 1071264
Jan 25 13:41:10 vmcluster01 xcp-rrdd: [ info||7 ||rrdd_main] GC live_words = 837407
Jan 25 13:41:10 vmcluster01 xcp-rrdd: [ info||7 ||rrdd_main] GC heap_words = 1577984
Jan 25 13:41:10 vmcluster01 xcp-rrdd: [ info||7 ||rrdd_main] GC free_words = 740562
Jan 25 13:42:30 vmcluster01 xapi: [debug||254876 INET :::80|Get RRD updates. D:5cdf089c4396|xapi_services] hand_over_connection GET /rrd_updates to /var/lib/xcp/xcp-rrdd.forwarded
Jan 25 13:44:10 vmcluster01 xcp-rrdd: [ info||7 ||rrdd_main] GC live_words = 505863
Jan 25 13:44:10 vmcluster01 xcp-rrdd: [ info||7 ||rrdd_main] GC heap_words = 1577984
Jan 25 13:44:10 vmcluster01 xcp-rrdd: [ info||7 ||rrdd_main] GC free_words = 1072109
Jan 25 13:47:10 vmcluster01 xcp-rrdd: [ info||7 ||rrdd_main] GC live_words = 505881
Jan 25 13:47:10 vmcluster01 xcp-rrdd: [ info||7 ||rrdd_main] GC heap_words = 1577984
Jan 25 13:47:10 vmcluster01 xcp-rrdd: [ info||7 ||rrdd_main] GC free_words = 1072083
Jan 25 13:47:31 vmcluster01 xapi: [debug||255789 INET :::80|Get RRD updates. D:49078b4a7579|xapi_services] hand_over_connection GET /rrd_updates to /var/lib/xcp/xcp-rrdd.forwarded
Jan 25 13:49:22 vmcluster01 xcp-rrdd: [ info||0 monitor_write|main|rrdd_server] Failed to process plugin metrics file: xcp-rrdd-gpumon ((Invalid_argument\x0A "Cstruct.blit_to_bytes src=[0,0](0) dst=[11] src-off=0 len=11"))
Jan 25 13:50:10 vmcluster01 xcp-rrdd: [ info||7 ||rrdd_main] GC live_words = 641704
Jan 25 13:50:10 vmcluster01 xcp-rrdd: [ info||7 ||rrdd_main] GC heap_words = 1577984
Jan 25 13:50:10 vmcluster01 xcp-rrdd: [ info||7 ||rrdd_main] GC free_words = 936269
Jan 25 13:52:40 vmcluster01 xapi: [debug||256729 INET :::80|Get RRD updates. D:d0acbc896c0c|xapi_services] hand_over_connection GET /rrd_updates to /var/lib/xcp/xcp-rrdd.forwarded
Jan 25 13:53:10 vmcluster01 xcp-rrdd: [ info||7 ||rrdd_main] GC live_words = 505866
Jan 25 13:53:10 vmcluster01 xcp-rrdd: [ info||7 ||rrdd_main] GC heap_words = 1577984
Jan 25 13:53:10 vmcluster01 xcp-rrdd: [ info||7 ||rrdd_main] GC free_words = 1072106
Here we have some more interesting things:
Jan 25 13:49:22 vmcluster01 xcp-rrdd: [ info||0 monitor_write|main|rrdd_server] Failed to process plugin metrics file: xcp-rrdd-gpumon ((Invalid_argument\x0A "Cstruct.blit_to_bytes src=[0,0](0) dst=[11] src-off=0 len=11"))
Any idea what to do here?
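Since most of the `grep rrd` output is routine GC statistics and connection hand-overs, a small filter makes the unusual lines easier to spot. The following is only a sketch: the list of "noise" markers is my own assumption, taken from the lines pasted above, and `interesting_rrd_lines` is a hypothetical helper name.

```python
# Substrings of routine messages seen in the xensource.log excerpt above.
# This list is an assumption and may need extending on other hosts.
NOISE_MARKERS = (
    "GC live_words",
    "GC heap_words",
    "GC free_words",
    "hand_over_connection",
)


def interesting_rrd_lines(lines):
    """Yield rrd-related lines that are not routine GC or hand-over messages."""
    for line in lines:
        if "rrd" not in line:
            continue  # same selection as `grep rrd`
        if any(marker in line for marker in NOISE_MARKERS):
            continue
        yield line.rstrip("\n")


# Three lines copied from the log excerpt above, used only as sample input.
SAMPLE_LINES = [
    r'Jan 25 13:47:31 vmcluster01 xapi: [debug||255789 INET :::80|Get RRD updates. D:49078b4a7579|xapi_services] hand_over_connection GET /rrd_updates to /var/lib/xcp/xcp-rrdd.forwarded',
    r'Jan 25 13:49:22 vmcluster01 xcp-rrdd: [ info||0 monitor_write|main|rrdd_server] Failed to process plugin metrics file: xcp-rrdd-gpumon ((Invalid_argument\x0A "Cstruct.blit_to_bytes src=[0,0](0) dst=[11] src-off=0 len=11"))',
    r'Jan 25 13:50:10 vmcluster01 xcp-rrdd: [ info||7 ||rrdd_main] GC live_words = 641704',
]

if __name__ == "__main__":
    for line in interesting_rrd_lines(SAMPLE_LINES):
        print(line)
```

To run it against the real log, iterate over `open("/var/log/xensource.log")` instead of `SAMPLE_LINES`.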
-
I have the same entries in my 8.2 installation but I don't have the stuck tasks.
-
@tanjix Have you applied the ca-certificates update from the xcp-ng-testing repo? If so, maybe that is the source of your problem.
-
@danp said in Stuck tasks within XO (rrd_updates)?:
@tanjix Have you applied the ca-certificates update from the xcp-ng-testing repo? If so, maybe that is the source of your problem.
Negative, I will do that and report later on. Thanks!
-
@tanjix I would hold off installing it for now. My concern was that the update was the source of the issue, but you ruled that out since it hasn't been installed yet.
-
Maybe I found the problem.
In XO's log section I found an entry like:
Hostname/IP does not match certificate's altnames: IP: a.b.c.d is not in the cert's list:
The detailed log says at the bottom:
"code": "ERR_TLS_CERT_ALTNAME_INVALID",
"url": "https://a.b.c.d/rrd_updates?cf=AVERAGE&host=true&interval=5&json=true&start=1611607291&session_id=OpaqueRef%3A7601e913-a96c-419e-a5ab-be65255ab3d7&task_id=OpaqueRef%3Aecd831ec-bd4f-4a40-82c4-f15059ffb377",
"message": "Hostname/IP does not match certificate's altnames: IP: a.b.c.d is not in the cert's list: ",
In this case, a.b.c.d is the IP address of my pool master.
I installed TLS certificates (a wildcard certificate, not self-signed but from a trusted authority) so that all of my hosts are reachable over SSL/TLS (I did that through XO --> Home --> Hosts --> <host> --> Advanced, at the bottom of the page).
Of course, this only works if requests are made with the hostname and not with the IP address.
If I open the URL from above and replace the IP address with the hostname of the pool master, it seems to work.
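To illustrate why the hostname works and the IP address does not, here is a simplified Python sketch of the usual name-matching rules: a DNS entry such as a wildcard only matches hostnames, while an IP address has to appear in the certificate as an IP entry. This is my own simplification and not the actual check performed by Node.js or XO; the names `example.com` and `192.0.2.10` are placeholders, not my real values.

```python
import ipaddress


def is_ip(host):
    """Return True if host is an IPv4 or IPv6 literal."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def dns_name_matches(pattern, host):
    """Match a DNS entry, allowing '*' only as the whole left-most label."""
    p_labels = pattern.lower().split(".")
    h_labels = host.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False  # a wildcard covers exactly one label
    if p_labels[0] == "*":
        return p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


def host_matches_cert(host, dns_sans, ip_sans=()):
    """Simplified check of a requested host against a certificate's alt names."""
    if is_ip(host):
        # IP addresses are only compared with IP entries, never with DNS names.
        addr = ipaddress.ip_address(host)
        return any(addr == ipaddress.ip_address(ip) for ip in ip_sans)
    return any(dns_name_matches(pattern, host) for pattern in dns_sans)


if __name__ == "__main__":
    wildcard = ["*.example.com"]  # placeholder for the wildcard certificate
    print(host_matches_cert("vmcluster01.example.com", wildcard))
    print(host_matches_cert("192.0.2.10", wildcard))
```

With a wildcard-only certificate the IP entry list is empty, which would be consistent with the "is not in the cert's list:" message ending in an empty list.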
So, how can this be fixed so that these rrd_updates calls are made with the hostname instead of the IP address?
Or did I do something wrong with the certificates?
Thanks!
-
@olivierlambert is there any update available on this issue?
Thanks!
-
This is normal XAPI behavior. I don't think there's a simple fix for that.
-
@olivierlambert So, how can I get rid of these error messages and of the many stuck tasks that pile up after a while?
I guess restarting the toolstack every hour is not the preferred way...
Thanks!