ISO SR disconnected and crashed my hosts
-
I have 50 hosts running XCP-ng, and all of them use an SMB/CIFS SR to access the ISOs for new installations when necessary. Two days ago, all of the servers lost access to the repository and the ISO SR became locked. I was unable to create, shut down or start any VM on any host, and I couldn't detach the SR either through XCP-ng Center or the CLI. So I ran umount -l /run/sr-mount/UUID followed by xe-toolstack-restart, after which I was able to detach the SR and create, shut down and start VMs again. What's happening now is that the CPU load is very high, and I don't want to have to restart my 50 servers. Is there anything I can do to avoid having to turn them off?
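Before reaching for umount -l, it can help to confirm that a mount point is really hung rather than just slow. The sketch below is a hypothetical helper (not part of XCP-ng or its tools) that lists the directory in a background thread and gives up after a timeout. It is kept compatible with both Python 2.7 and Python 3, on the assumption that the dom0 may still ship an older interpreter.

```python
import os
import threading


def is_mount_responsive(path, timeout=5.0):
    """Return True if os.listdir(path) succeeds within `timeout` seconds.

    Hypothetical helper, not part of XCP-ng. The listing runs in a daemon
    thread so a hung CIFS mount cannot block the caller; if the mount is hung,
    that thread may stay stuck in the kernel and is simply abandoned.
    """
    result = {}

    def worker():
        try:
            os.listdir(path)
            result["ok"] = True
        except OSError:
            result["ok"] = False

    t = threading.Thread(target=worker)
    t.daemon = True  # set as an attribute so this also works on Python 2.7
    t.start()
    t.join(timeout)
    # Timed out (no result yet) or failed with an error: treat as unresponsive.
    return result.get("ok", False)
```

For example, is_mount_responsive("/run/sr-mount/<SR-UUID>") returning False would suggest the share is not answering. If the worker thread does get stuck, the Python process itself may be unable to exit until the share responds again, which is the same behaviour as the stuck processes discussed in this thread.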
-
Hi,
If you use remote storage for VM disks or ISOs, you must make sure the connection is stable. Otherwise, having issues is rather normal. It's like physically cutting the SATA cable to your DVD/CD drive. You need to understand why you lost the connection to your SMB/CIFS share in the first place.
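One quick way to start investigating is to check, from each host, whether the file server still accepts TCP connections on the standard SMB port (445). This is a minimal sketch under that assumption; the helper name is hypothetical, and a successful connection only shows that the port is open, not that CIFS authentication or the session itself works.

```python
import socket


def smb_port_reachable(host, port=445, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper: this only proves the SMB port is accepting connections;
    it does not authenticate or exercise the CIFS session itself.
    """
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, OSError):
        # Covers refused connections, timeouts and name-resolution failures.
        return False
    conn.close()
    return True
```

Run periodically against the share's server (for this thread, smb_port_reachable("10.40.2.235")), it can help show whether the outages come from the network path or from the server side.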
-
OK, olivierlambert.
Is there anything I can do about the hosts? They have all been experiencing high CPU consumption since this problem.
Even after things returned to normal, these processes are still there and I can't kill them:
3874 ? Ds 0:00 /usr/bin/python /opt/xensource/sm/ISOSR <methodCall><methodName>sr_scan</methodName><params><param><value><struct><member><name>host_ref</name><value>OpaqueRef:e64c8226-4c5d-431b-b7d7-e47b0d657348</value></member><member><name>command</name><value>sr_scan</value></member><member><name>args</name><value><array><data/></array></value></member><member><name>device_config</name><value><struct><member><name>SRmaster</name><value>true</value></member><member><name>username</name><value>fmcudi\luiz.avelino</value></member><member><name>vers</name><value>3.0</value></member><member><name>cifspassword_secret</name><value>83ff228e-4d7c-12d5-e685-fae07066b3ac</value></member><member><name>iso_path</name><value>/ISO</value></member><member><name>location</name><value>//10.40.2.235/repositorio</value></member><member><name>type</name><value>cifs</value></member></struct></value></member><member><name>session_ref</name><value>OpaqueRef:77a8540e-8149-4889-8148-ff5ea9a2d7f1</value></member><member><name>sr_ref</name><value>OpaqueRef:72cda44d-eccb-4b11-8400-1cc9ad6e400b</value></member><member><name>sr_uuid</name><value>6a09db76-744c-4123-18ef-7e423d0bcad6</value></member><member><name>subtask_of</name><value>DummyRef:|32004b14-6e20-4a66-bfa5-f2dd6b3472d9|SR.scan</value></member></struct></value></param></params></methodCall>
14091 ? D 0:00 \_ df -h
14722 ? D 0:00 \_ mount -o remount /run/sr-mount/6a09db76-744c-4123-18ef-7e423d0bcad6
22275 ? Ds 0:00 /usr/bin/python /opt/xensource/sm/ISOSR <methodCall><methodName>sr_scan</methodName><params><param><value><struct><member><name>host_ref</name><value>OpaqueRef:e64c8226-4c5d-431b-b7d7-e47b0d657348</value></member><member><name>command</name><value>sr_scan</value></member><member><name>args</name><value><array><data/></array></value></member><member><name>device_config</name><value><struct><member><name>SRmaster</name><value>true</value></member><member><name>username</name><value>fmcudi\luiz.avelino</value></member><member><name>vers</name><value>3.0</value></member><member><name>cifspassword_secret</name><value>83ff228e-4d7c-12d5-e685-fae07066b3ac</value></member><member><name>iso_path</name><value>/ISO</value></member><member><name>location</name><value>//10.40.2.235/repositorio</value></member><member><name>type</name><value>cifs</value></member></struct></value></member><member><name>session_ref</name><value>OpaqueRef:5bf06125-0518-43cc-99dc-8d0c235a0e84</value></member><member><name>sr_ref</name><value>OpaqueRef:72cda44d-eccb-4b11-8400-1cc9ad6e400b</value></member><member><name>sr_uuid</name><value>6a09db76-744c-4123-18ef-7e423d0bcad6</value></member><member><name>subtask_of</name><value>DummyRef:|73124cc8-678a-406d-8bca-4a22ad1178a4|SR.scan</value></member></struct></value></param></params></methodCall>
25031 ? D 0:00 \_ mount.cifs //10.40.2.235/repositorio /var/run/sr-mount/6a09db76-744c-4123-18ef-7e423d0bcad6 -o cache=none,vers=3.0,domain=fmcudi
27964 ? D 0:00 \_ mount.cifs //10.40.2.235/repositorio /var/run/sr-mount/6a09db76-744c-4123-18ef-7e423d0bcad6 -o cache=none,vers=3.0,domain=fmcudi
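The D in the STAT column means uninterruptible sleep: these processes are blocked inside the kernel (here, waiting on the CIFS mount) and generally ignore signals, even SIGKILL, until the kernel call they are in returns. That is why they cannot be killed. A lazy unmount (umount -l) does not release them either, since it only detaches the mount point from the directory tree and defers cleanup until the filesystem is no longer busy. The sketch below, with hypothetical helper names, reads /proc/<pid>/stat and /proc/<pid>/wchan to list such processes along with the kernel function each one is waiting in.

```python
import os


def parse_stat(text):
    """Parse the contents of /proc/<pid>/stat into (pid, comm, state).

    The command name is wrapped in parentheses and may itself contain spaces
    or ')', so split on the LAST ')' rather than on whitespace.
    """
    head, _, tail = text.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    state = tail.split()[0]
    return int(pid_str), comm, state


def d_state_processes(proc="/proc"):
    """Return (pid, comm, wchan) for every process currently in state D."""
    found = []
    try:
        entries = os.listdir(proc)
    except OSError:
        return found  # no procfs available (not Linux)
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (IOError, OSError, ValueError, IndexError):
            continue  # process exited during the scan, or file unreadable
        if state != "D":
            continue
        try:
            with open(os.path.join(proc, entry, "wchan")) as f:
                wchan = f.read().strip()
        except (IOError, OSError):
            wchan = "?"
        found.append((pid, comm, wchan))
    return found
```

On an affected host, most entries would be expected to show CIFS-related wait channels such as the smb2_reconnect and cifs_get_smb_ses visible in the listings in this thread.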
-
Double-check that your hosts are fully up to date. Which version are you running? You haven't provided much useful information about your environment in your first post.
-
They are all on XCP-ng 8.2 and are not up to date.
-
You should really start by bringing all your hosts up to date, rebooting after the updates and checking whether it happens again, while also working on the connectivity problem with your network shares.
-
The connection problem is already fixed. I don't want to have to update or restart the hosts at this time.
-
Then reboot during your next maintenance window.
-
The connectivity problem with the ISO SR has been fixed, but the server load is still a little high and there are also several processes stuck in uninterruptible sleep (state D):
top - 12:29:57 up 273 days, 13:48, 2 users, load average: 166.86, 166.80, 166.56
Tasks: 763 total, 2 running, 618 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.1 us, 1.2 sy, 0.0 ni, 96.1 id, 0.2 wa, 0.0 hi, 0.1 si, 1.2 st
KiB Mem : 8110880 total, 6877372 free, 783152 used, 450356 buff/cache

1 22275 root Ds 0.0 ISOSR -
22691 22701 root Ds 0.0 sadc smb2_reconnect
23099 23104 root Ds 0.0 sadc smb2_reconnect
23232 23243 root Ds 0.0 sadc smb2_reconnect
23259 23264 root Ds 0.0 sadc smb2_reconnect
23284 23292 root Ds 0.0 sadc smb2_reconnect
23381 23386 root Ds 0.0 sadc smb2_reconnect
23977 23983 root Ds 0.0 sadc smb2_reconnect
24062 24074 root Ds 0.0 sadc smb2_reconnect
24065 24077 root Ds 0.0 sadc smb2_reconnect
24089 24094 root Ds 0.0 sadc smb2_reconnect
24120 24125 root Ds 0.0 sadc smb2_reconnect
24792 24797 root Ds 0.0 sadc smb2_reconnect
24959 24970 root Ds 0.0 sadc smb2_reconnect
25018 25028 root Ds 0.0 sadc smb2_reconnect
1 25031 root D 0.0 mount.cifs cifs_get_smb_ses
25195 25200 root Ds 0.0 sadc smb2_reconnect
25279 25289 root Ds 0.0 sadc smb2_reconnect
25742 25747 root Ds 0.0 sadc smb2_reconnect
26013 26018 root Ds 0.0 sadc smb2_reconnect
26137 26142 root Ds 0.0 sadc smb2_reconnect
26266 26276 root Ds 0.0 sadc smb2_reconnect
26494 26505 root Ds 0.0 sadc smb2_reconnect
26519 26524 root Ds 0.0 sadc smb2_reconnect
26975 26981 root Ds 0.0 sadc smb2_reconnect
27006 27014 root Ds 0.0 sadc smb2_reconnect
27636 27641 root Ds 0.0 sadc smb2_reconnect
1 27964 root D 0.0 mount.cifs cifs_get_smb_ses
27966 27983 root Ds 0.0 sadc smb2_reconnect
28127 28139 root Ds 0.0 sadc smb2_reconnect
28138 28143 root Ds 0.0 sadc smb2_reconnect
28182 28187 root Ds 0.0 sadc smb2_reconnect
28339 28350 root Ds 0.0 sadc smb2_reconnect
28937 28942 root Ds 0.0 sadc smb2_reconnect
29031 29036 root Ds 0.0 sadc smb2_reconnect
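Note the combination of a load average around 166 with 96.1% idle CPU. On Linux the load average counts tasks in uninterruptible sleep (D) as well as runnable ones, so many tasks blocked on the CIFS share can push the load up while the CPU does almost nothing. The high load here does not mean the CPU is busy. The sketch below tallies D-state lines per kernel wait channel from a listing like the one above. It assumes the columns are PPID, PID, USER, STAT, %CPU, COMMAND, WCHAN (for example, the output of ps -eo ppid,pid,user,stat,pcpu,comm,wchan); that column order is inferred from the listing, and the helper name is hypothetical.

```python
from collections import Counter


def tally_d_state(lines):
    """Count processes whose STAT column starts with 'D', grouped by WCHAN.

    Assumes whitespace-separated columns: PPID PID USER STAT %CPU COMMAND WCHAN
    (inferred from the listing above; adjust the indexes for other layouts).
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # blank or truncated line
        stat, wchan = fields[3], fields[6]
        if stat.startswith("D"):  # header line has STAT == "STAT", so it is skipped
            counts[wchan] += 1
    return counts


# A few lines copied from the listing above.
sample = """\
1 22275 root Ds 0.0 ISOSR -
22691 22701 root Ds 0.0 sadc smb2_reconnect
1 25031 root D 0.0 mount.cifs cifs_get_smb_ses
23099 23104 root Ds 0.0 sadc smb2_reconnect
""".splitlines()

print(tally_d_state(sample).most_common())
```

Grouping by wait channel makes it easy to see that almost everything is blocked in the CIFS client (smb2_reconnect, cifs_get_smb_ses) rather than consuming CPU.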
-
I've already suggested the solution. Now it's up to you to either live with those processes or decide to reboot (ideally after applying updates, because it's very dangerous NOT to be up to date).