Subcategories

  • All Xen related stuff

    582 Topics
    6k Posts
    marcoi
    Is there any way to block the iGPU from being used by XCP-ng during boot? I have it set up for passthrough, but it fails under the VM. I think it's because XCP-ng is still displaying the console screen and doesn't want to give it up.
  • The integrated web UI to manage XCP-ng

    23 Topics
    331 Posts
    P
    @coolsport00 Here is a hint in XO 5 when the newly created VM is running. [image: 1756487404025-c5c5e02c-937d-47ff-a257-b5aaddba23de-image.png]
  • Section dedicated to migrations from VMware, Hyper-V, Proxmox, etc. to XCP-ng

    102 Topics
    1k Posts
    sid
    @cichy I know this isn't as easy as what you're asking for, but I wrote some terrible Python code. It relies on health checks being defined as VM tags, or at least on the management agent being detected. For example, in my Terraform code I have these tags on a test postgres instance and test nginx instances respectively:

```hcl
# postgres
tags = [
  "bootOrder/agent-detect-timeout=45",
  "bootOrder/ip=${jsonencode("auto")}",
  "bootOrder/healtcheck/tcp=${jsonencode({ "port" : 5432, })}",
]

# nginx
tags = [
  "bootOrder/agent-detect-timeout=45",
  "bootOrder/ip=${jsonencode("auto")}",
  "bootOrder/healtcheck/http=${jsonencode({ "port" : 80, "scheme" : "http", "path" : "/" })}",
]
```

    Then the actual Python:

```python
#!/usr/bin/env python3
import urllib3
import json
import os
import sys
import socket
import time
import logging

logging.basicConfig(level=logging.INFO)

BOOT_ORDER = [
    # Postgres
    ["55e88cb4-0c50-8384-2149-cf73e40b8c8e"],
    # nginx
    ["ba620f01-69d1-ddd8-b1d4-c256abe07e05", "bbe333bd-380a-1f94-4052-881c763b6177"],
]

DEFAULT_AGENT_DETECT_TIMEOUT_SECONDS = 60


class HealthCheck:
    def __init__(self, target: str, config: dict) -> None:
        self.type = "base"
        self.target = target
        self.config = config
        self.timeout = 3
        self.retry_max_count = 5
        self.retry_cur_count = 0
        self.retry_sleep = 10

    def _retry(self):
        if self.retry_cur_count == 0:
            logging.info("Starting %s healtcheck against %s", self.type, self.target)
            self.retry_cur_count += 1
            return True
        if self.retry_cur_count == self.retry_max_count:
            logging.warning("Failed Healtcheck of type %s for %s", self.type, self.target)
            return False
        time.sleep(self.retry_sleep)
        self.retry_cur_count += 1
        return True


class TCPHealthCheck(HealthCheck):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.type = "TCP"

    def run(self):
        port = self.config.get("port")
        while self._retry():
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
                sock.settimeout(self.timeout)
                success = sock.connect_ex((self.target, port)) == 0
                if success:
                    return True
        return False


class HttpHealthCheck(HealthCheck):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.type = "HTTP"

    def run(self):
        while self._retry():
            assert_hostname = self.config.get("tls_verification", True)
            http = urllib3.PoolManager(
                cert_reqs="CERT_REQUIRED" if assert_hostname else "CERT_NONE",
            )
            scheme = self.config.get("scheme", "http")
            port = self.config.get("port", 80)
            path = self.config.get("path", "").lstrip("/")
            url = f"{scheme}://{self.target}:{port}/{path}"
            response = http.request("GET", url, timeout=self.timeout)
            if response.status >= 200 and response.status < 300:
                return True
        return False


class XoaClient:
    def __init__(self, base_url: str, token: str) -> None:
        self.base_url = base_url.rstrip("/")
        self.tags_prefix = "bootOrder/"
        self.token = token
        self.http = urllib3.PoolManager()
        self.headers = {
            "Content-Type": "application/json",
            "Cookie": f"token={self.token}",
        }
        self._vm_cache = {}

    def vm_ip(self, uuid):
        vm_tags = self._extract_vm_tags(uuid)
        ip = vm_tags.get("ip", "auto")
        if ip != "auto":
            return ip
        return self._get_vm(uuid).get("mainIpAddress")

    def vm_healthcheck(self, uuid):
        vm_tags = self._extract_vm_tags(uuid)
        tcp = vm_tags.get("healtcheck/tcp")
        http = vm_tags.get("healtcheck/http")
        return tcp, http

    def _get_vm(self, uuid: str):
        url = f"{self.base_url}/rest/v0/vms/{uuid}"
        # if url in self._vm_cache:
        #     return self._vm_cache[url]
        response = self.http.request("GET", url, headers=self.headers)
        result = self._handle_json_response(response)
        self._vm_cache[url] = result
        return result

    def _extract_vm_tags(self, uuid: str) -> dict:
        dict_tags = {}
        tags = self._get_vm(uuid).get("tags")
        for tag in tags:
            if tag.startswith(self.tags_prefix):
                k, v = tag.split("=", 1)
                k = k[len(self.tags_prefix):]
                dict_tags[k] = json.loads(v)
        return dict_tags

    def start_vm(self, uuid: str):
        if self._get_vm(uuid).get("power_state") == "Running":
            return
        url = f"{self.base_url}/rest/v0/vms/{uuid}/actions/start?sync=true"
        response = self.http.request("POST", url, headers=self.headers)
        if response.status != 204:
            raise Exception(f"HTTP {response.status}: {response.data.decode('utf-8')}")
        return

    def management_agent_detected(self, uuid: str) -> bool:
        return self._get_vm(uuid).get("managementAgentDetected")

    def vm_agent_detection_timeout(self, uuid: str, default_seconds: int = 60) -> int:
        tags = self._extract_vm_tags(uuid)
        return tags.get("agent-detect-timeout", default_seconds)

    def _handle_json_response(self, response):
        if response.status >= 200 and response.status < 300:
            return json.loads(response.data.decode("utf-8"))
        else:
            raise Exception(f"HTTP {response.status}: {response.data.decode('utf-8')}")


if __name__ == "__main__":
    xoa_url = os.getenv("XOA_URL")
    xoa_token = os.getenv("XOA_TOKEN")
    if not xoa_url:
        logging.fatal("Missing XOA_URL environment variable")
        sys.exit(1)
    if not xoa_token:
        logging.fatal("Missing XOA_TOKEN environment variable")
        sys.exit(1)

    client = XoaClient(xoa_url, xoa_token)

    group_number = 1
    for boot_group in BOOT_ORDER:
        logging.info("Starting to boot group %s, length %s", group_number, len(boot_group))
        # These should be booted in parallel, but aren't
        for uuid in boot_group:
            client.start_vm(uuid)
            timeout = client.vm_agent_detection_timeout(
                uuid=uuid,
                default_seconds=DEFAULT_AGENT_DETECT_TIMEOUT_SECONDS,
            )
            mad = False
            for n in range(timeout):
                mad = client.management_agent_detected(uuid)
                if mad:
                    break
                time.sleep(1)
            if not mad:
                raise Exception(f"No management agent detected in host {uuid}")
            target = client.vm_ip(uuid)
            tcp, http = client.vm_healthcheck(uuid)
            if tcp:
                hc = TCPHealthCheck(target=target, config=tcp)
                hc.run()
            if http:
                hc = HttpHealthCheck(target=target, config=http)
                hc.run()
            logging.info("All healthchecks passed for %s", target)
        group_number += 1
```

    It'll boot each VM in order and wait for its agent to be detected, then wait for all its health checks to pass before moving on to the next VM. This is by no means production-ready code, but it might be a decent solution.
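    A side note on the tag convention above: each bootOrder/* tag carries JSON after the first "=", which is why jsonencode() appears in the Terraform. A minimal standalone sketch of that parsing step (the tag values here are illustrative, mirroring what _extract_vm_tags does):

```python
import json

TAGS_PREFIX = "bootOrder/"

def extract_boot_tags(tags):
    """Parse 'bootOrder/<key>=<json>' tags into a dict keyed by <key>."""
    parsed = {}
    for tag in tags:
        if tag.startswith(TAGS_PREFIX):
            key, value = tag.split("=", 1)  # split only on the first '='
            parsed[key[len(TAGS_PREFIX):]] = json.loads(value)
    return parsed

# Example tags as they might appear on a VM (made-up values)
tags = [
    "bootOrder/agent-detect-timeout=45",
    'bootOrder/ip="auto"',
    'bootOrder/healtcheck/tcp={"port": 5432}',
    "some-unrelated-tag",
]
print(extract_boot_tags(tags))
# {'agent-detect-timeout': 45, 'ip': 'auto', 'healtcheck/tcp': {'port': 5432}}
```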
Finally a systemd timer would be set up on the XOA instance to auto-run this script on boot.
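    To sketch that last step (the unit names, script path, and environment values below are placeholders I made up, not from the original post), a oneshot service plus a timer on the XOA VM could look like:

```ini
# /etc/systemd/system/vm-boot-order.service  (hypothetical name/path)
[Unit]
Description=Boot VMs in order with health checks
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
# Hypothetical values; the script reads these two variables.
Environment=XOA_URL=https://xoa.example.com
Environment=XOA_TOKEN=changeme
ExecStart=/usr/local/bin/boot-order.py

# /etc/systemd/system/vm-boot-order.timer  (hypothetical name/path)
[Unit]
Description=Run the boot-order script shortly after boot

[Timer]
OnBootSec=2min

[Install]
WantedBy=timers.target
```

    It would then be enabled with `systemctl enable vm-boot-order.timer`.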
  • Hardware related section

    125 Topics
    1k Posts
    K
    @DustinB Hmm - just got done running memtest86+: 4 passes, all 14 tests, no RAM errors. I wonder what would cause this error? I'll probably just save the config and reinstall. So strange.
  • The place to discuss new additions into XCP-ng

    241 Topics
    3k Posts
    yann
    @olivierlambert updating the README will be quick enough... but if the sig is indeed mandatory, we need to set up something for this first... and autosigning from a CI requires doing that on a trusted runner rather than on GitLab-provided ones, so that requires some provisioning and IT work first.
  • 0 Votes
    35 Posts
    2k Views
    olivierlambert
    Then try to find anything happening around that time on other hosts, equipment, storage and so on.
  • XCP-ng Kubernetes MicroK8s

    0 Votes
    3 Posts
    363 Views
    nathanael-h
    Hello @msupport, we published a step-by-step guide; read more in the announcement here: https://xcp-ng.org/forum/post/94268
  • NFS multipathing configuration

    xcp-ng nfs xenorchestra
    0 Votes
    9 Posts
    684 Views
    B
    Great, thank you!
  • 0 Votes
    3 Posts
    152 Views
    F
    @Danp yes, I ran "yum update" to be sure, but "nothing to upgrade" on the pool master. I tried with the storage (iSCSI) NIC configured and without it, but the pool join freezes. It seems some "SESSION" persists (maybe referring to the slave host previously configured?) or there is some incoherence in the pool database... From /var/log/xensource on the slave host when trying to join the pool: "session_check D:520c5b4e5b36 failed with exception Server_error(SESSION_INVALID, "
  • Automating VM configurations after mass VMware imports

    0 Votes
    9 Posts
    512 Views
    olivierlambert
    Thanks, this is helpful. We'll discuss that with the @Team-DevOps and try to get things implemented!
  • Commvault backups failing for a VM with large disks

    0 Votes
    2 Posts
    338 Views
    olivierlambert
    To me it sounds like a Commvault issue. If you want some investigation on the Vates side, I would recommend opening a support ticket.
  • How to Re-attach an SR

    Solved
    0 Votes
    20 Posts
    1k Views
    tjkreidl
    @olivierlambert Agreed. The Citrix forum used to be very active, but especially since Citrix was taken over, https://community.citrix.com has had far less activity, sadly. It's still gratifying that a lot of the functionality is common to both platforms, although as XCP-ng evolves, there will be continually less commonality.
  • Rolling Pool Update - not possible to resume a failed RPU

    0 Votes
    13 Posts
    718 Views
    Tristis Oris
    @olivierlambert During RPU - yes. I mean a manual update in case of failure.
  • Alpine Template Problem

    0 Votes
    7 Posts
    331 Views
    ?
    For anything older than the branches still shown in https://pkgs.alpinelinux.org (from v3.0 to v3.12), the packages should be downloaded from the CDN: https://dl-cdn.alpinelinux.org/alpine. But as mentioned above, anything older than 3 releases from the latest current one (v3.21) is end of life and should not be used for more than testing.
  • 8.3 Cannot boot from CD Rom

    0 Votes
    19 Posts
    2k Views
    olivierlambert
    Reping @stormi
  • sr iso disconnect and crashed my hosts

    0 Votes
    11 Posts
    617 Views
    olivierlambert
    I already suggested the solution; now it's up to you to live with those processes or to decide to reboot (ideally after doing updates, because it's very dangerous to NOT be up to date).
  • Install XCP-ng in old HP ProLiant DL160 G6 (gen 6)

    0 Votes
    9 Posts
    668 Views
    S
    @john.c Yeah - like I said it is a good step in the right direction. Just doesn't solve my particular storage related problems.
  • Citrix tools after version 9.0 removed quiesced snapshot

    0 Votes
    2 Posts
    224 Views
    TeddyAstie
    @vkeven The XCP-ng 8.1 release notes say VSS and quiesced snapshot support was removed, because it never worked correctly and caused more harm than good. Note that Windows guest tools version 9 (the default for recent versions of Windows if you install Citrix drivers) already removed VSS support, even for older versions of CH / XCP-ng. I am not sure whether this VSS feature is bound to the PV drivers, or if it also needs hypervisor support. Either way, it is not recommended to stay on an old version of the guest agent.
  • Diagnosing frequent crashes on host

    0 Votes
    15 Posts
    997 Views
    T
    @olivierlambert said in Diagnosing frequent crashes on host: "Maybe there's a usage that's slightly different since when it was 'more solid', and now it's triggered more easily. Is your XCP-ng fully up to date?" No; as I said originally, I'm still on 8.2.1. I have been concerned about moving to 8.3 because it's a new installation and I don't want to screw it up, but I'm willing to accept that it's the right thing to do.
  • Script to auto mount USBs on Boot/Reboot. Monitoring Multiple UPS

    0 Votes
    7 Posts
    809 Views
    olivierlambert
    Ping @stormi so we track this somewhere internally
  • Grub looking for /dev/vda instead of /dev/xvda

    0 Votes
    1 Posts
    142 Views
    No one has replied
  • Storage migration logs

    0 Votes
    2 Posts
    152 Views
    olivierlambert
    Hi, check the task view; you'll see the duration of the process there.
  • reboot of host does it stop or kill running VM's?

    0 Votes
    14 Posts
    2k Views
    N
    Could someone elaborate on the procedure to have all VMs on a host shut down properly upon XCP-ng host shutdown, please?

    I tried from the host prompt: "xe host-disable" then "xe host-shutdown", and from XOA: Host → shutdown, with the warning "This will shutdown your host without evacuating its VMs. Do you want to continue?"; rightly so, the host seemingly becomes unavailable (ping to its IP stops).

    But then what happens is very odd: first the VM on it still pings for a couple of minutes (yes, after the host stops answering pings), then the VM stops pinging, but AFAICS XCP-ng is not OFF. Awkwardly, I have access to the iDRAC8 Enterprise license on the machine XCP-ng runs on, and I can't see the proper status of XCP-ng from it. AFAIK it's not pinging, but it doesn't seem OFF either; at least the iDRAC shows it ON, and upon power cycling and reconnecting to the VM, the logs show it wasn't cleanly shut down.

    NB: the VM has xen-guest-agent running within a container, but from what I gathered, the agent in Linux guests has no role in VM shutdown. See https://xcp-ng.org/forum/topic/10631/understanding-xe-guest-utilities/16

    Also, I double-checked Proxmox: it does cleanly shut down VMs, either with a "shutdown -h now" command or when triggered from the GUI, and that's with a VM that has the Proxmox guest agent installed. In any case, it would be nice to have XCP-ng/XOA be able to do the same.
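    For what it's worth, one way to approach the sequence described above is to shut the guests down cleanly before halting the host. A rough, untested sketch using the standard `xe` CLI from dom0 (it assumes the host's name-label matches its hostname, and that the guests respond to ACPI/guest-tools shutdown):

```shell
#!/bin/sh
# Sketch: cleanly stop resident VMs before halting an XCP-ng host.
# Run as root in dom0.
host_uuid=$(xe host-list name-label="$(hostname)" --minimal)

# Stop new VMs from being started on this host.
xe host-disable uuid="$host_uuid"

# Cleanly shut down every running guest resident on this host.
for vm in $(xe vm-list resident-on="$host_uuid" is-control-domain=false \
            power-state=running --minimal | tr ',' ' '); do
    xe vm-shutdown uuid="$vm"
done

# Only then halt the host itself.
xe host-shutdown uuid="$host_uuid"
```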
  • ACPI Error: SMBus/IPMI/GenericSerialBus

    0 Votes
    5 Posts
    403 Views
    Forza
    @dinhngtu Yes, looks like it. I stopped Netdata and the problem went away. But it is strange it started after the latest set of updates.
  • Migrate windows from Xeon Silver to older Xeon or AMD?

    Solved
    0 Votes
    3 Posts
    246 Views
    G
    @olivierlambert I was looking for a way to mark this solved, can't find it. I haven't moved things, but after migrating my big lab to my mini-lab, I'm confident that the warm migration is the way to go. It was fast and seamless as long as you have the right network adapters set up. I had to fool with one of my networks to make a VM function, but that was certainly something I overlooked while setting up the mini-lab. A little testing before moving the VMs should make this go easily if using the old servers is the option for this project.