Subcategories

  • All Xen related stuff

    582 Topics
    6k Posts
    marcoiM
Is there any way to block the iGPU from being used by XCP-ng during boot? I have it set up for passthrough, but it fails under the VM. I think it's because XCP-ng is still displaying the console screen on it and doesn't want to give it up.
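One common approach for this (a sketch, not verified against this exact setup): hide the iGPU from dom0 via pciback so XCP-ng never claims it for the console. The PCI address below is only an example for a typical Intel iGPU; check yours with lspci first. Note that dom0 loses the local display output after this, so make sure you have SSH or serial access.

```shell
# Find the iGPU's PCI address (often 00:02.0 for Intel integrated graphics)
lspci | grep -i vga

# Tell dom0 to hide that device at boot (XCP-ng's documented passthrough method);
# replace 0000:00:02.0 with the address lspci reported, then reboot the host
/opt/xensource/libexec/xen-cmdline --set-dom0 "xen-pciback.hide=(0000:00:02.0)"
```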
  • The integrated web UI to manage XCP-ng

    23 Topics
    331 Posts
    P
@coolsport00 Here is a hint in XO 5 when the newly created VM is running. [screenshot attached]
  • Section dedicated to migrations from VMWare, HyperV, Proxmox etc. to XCP-ng

    102 Topics
    1k Posts
    sidS
@cichy I know this isn't as easy as what you're asking for, but I wrote some terrible Python code. It relies on health checks being defined as VM tags, or at least on the management agent being detected. For example, in my terraform code I have these tags on a test postgres instance and test nginx instances respectively:

```hcl
# postgres
tags = [
  "bootOrder/agent-detect-timeout=45",
  "bootOrder/ip=${jsonencode("auto")}",
  "bootOrder/healtcheck/tcp=${jsonencode({
    "port" : 5432,
  })}",
]

# nginx
tags = [
  "bootOrder/agent-detect-timeout=45",
  "bootOrder/ip=${jsonencode("auto")}",
  "bootOrder/healtcheck/http=${jsonencode({
    "port" : 80,
    "scheme" : "http",
    "path" : "/"
  })}",
]
```

Then the actual Python:

```python
#!/usr/bin/env python3
import urllib3
import json
import os
import sys
import socket
import time
import logging

logging.basicConfig(level=logging.INFO)

BOOT_ORDER = [
    # Postgres
    ["55e88cb4-0c50-8384-2149-cf73e40b8c8e"],
    # nginx
    ["ba620f01-69d1-ddd8-b1d4-c256abe07e05", "bbe333bd-380a-1f94-4052-881c763b6177"],
]

DEFAULT_AGENT_DETECT_TIMEOUT_SECONDS = 60


class HealthCheck:
    def __init__(self, target: str, config: dict) -> None:
        self.type = "base"
        self.target = target
        self.config = config
        self.timeout = 3
        self.retry_max_count = 5
        self.retry_cur_count = 0
        self.retry_sleep = 10

    def _retry(self):
        if self.retry_cur_count == 0:
            logging.info("Starting %s healtcheck against %s", self.type, self.target)
            self.retry_cur_count += 1
            return True
        if self.retry_cur_count == self.retry_max_count:
            logging.warning("Failed healtcheck of type %s for %s", self.type, self.target)
            return False
        time.sleep(self.retry_sleep)
        self.retry_cur_count += 1
        return True


class TCPHealthCheck(HealthCheck):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.type = "TCP"

    def run(self):
        port = self.config.get("port")
        while self._retry():
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
                sock.settimeout(self.timeout)
                success = sock.connect_ex((self.target, port)) == 0
                if success:
                    return True
        return False


class HttpHealthCheck(HealthCheck):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.type = "HTTP"

    def run(self):
        while self._retry():
            assert_hostname = self.config.get("tls_verification", True)
            http = urllib3.PoolManager(
                cert_reqs="CERT_REQUIRED" if assert_hostname else "CERT_NONE",
            )
            scheme = self.config.get("scheme", "http")
            port = self.config.get("port", 80)
            path = self.config.get("path", "").lstrip("/")
            url = f"{scheme}://{self.target}:{port}/{path}"
            response = http.request("GET", url, timeout=self.timeout)
            if response.status >= 200 and response.status < 300:
                return True
        return False


class XoaClient:
    def __init__(self, base_url: str, token: str) -> None:
        self.base_url = base_url.rstrip("/")
        self.tags_prefix = "bootOrder/"
        self.token = token
        self.http = urllib3.PoolManager()
        self.headers = {
            "Content-Type": "application/json",
            "Cookie": f"token={self.token}",
        }
        self._vm_cache = {}

    def vm_ip(self, uuid):
        vm_tags = self._extract_vm_tags(uuid)
        ip = vm_tags.get("ip", "auto")
        if ip != "auto":
            return ip
        return self._get_vm(uuid).get("mainIpAddress")

    def vm_healthcheck(self, uuid):
        vm_tags = self._extract_vm_tags(uuid)
        tcp = vm_tags.get("healtcheck/tcp")
        http = vm_tags.get("healtcheck/http")
        return tcp, http

    def _get_vm(self, uuid: str):
        url = f"{self.base_url}/rest/v0/vms/{uuid}"
        # if url in self._vm_cache:
        #     return self._vm_cache[url]
        response = self.http.request("GET", url, headers=self.headers)
        result = self._handle_json_response(response)
        self._vm_cache[url] = result
        return result

    def _extract_vm_tags(self, uuid: str) -> dict:
        dict_tags = {}
        tags = self._get_vm(uuid).get("tags")
        for tag in tags:
            if tag.startswith(self.tags_prefix):
                k, v = tag.split("=", 1)
                k = k[len(self.tags_prefix):]
                dict_tags[k] = json.loads(v)
        return dict_tags

    def start_vm(self, uuid: str):
        if self._get_vm(uuid).get("power_state") == "Running":
            return
        url = f"{self.base_url}/rest/v0/vms/{uuid}/actions/start?sync=true"
        response = self.http.request("POST", url, headers=self.headers)
        if response.status != 204:
            raise Exception(f"HTTP {response.status}: {response.data.decode('utf-8')}")
        return

    def management_agent_detected(self, uuid: str) -> bool:
        return self._get_vm(uuid).get("managementAgentDetected")

    def vm_agent_detection_timeout(self, uuid: str, default_seconds: int = 60) -> int:
        tags = self._extract_vm_tags(uuid)
        return tags.get("agent-detect-timeout", default_seconds)

    def _handle_json_response(self, response):
        if response.status >= 200 and response.status < 300:
            return json.loads(response.data.decode("utf-8"))
        else:
            raise Exception(f"HTTP {response.status}: {response.data.decode('utf-8')}")


if __name__ == "__main__":
    xoa_url = os.getenv("XOA_URL")
    xoa_token = os.getenv("XOA_TOKEN")
    if not xoa_url:
        logging.fatal("Missing XOA_URL environment variable")
        sys.exit(1)
    if not xoa_token:
        logging.fatal("Missing XOA_TOKEN environment variable")
        sys.exit(1)

    client = XoaClient(xoa_url, xoa_token)

    group_number = 1
    for boot_group in BOOT_ORDER:
        logging.info("Starting to boot group %s, length %s", group_number, len(boot_group))
        # These should be booted in parallel, but aren't
        for uuid in boot_group:
            client.start_vm(uuid)
            timeout = client.vm_agent_detection_timeout(
                uuid=uuid,
                default_seconds=DEFAULT_AGENT_DETECT_TIMEOUT_SECONDS,
            )
            mad = False
            for n in range(timeout):
                mad = client.management_agent_detected(uuid)
                if mad:
                    break
                time.sleep(1)
            if not mad:
                raise Exception(f"No management agent detected in host {uuid}")
            target = client.vm_ip(uuid)
            tcp, http = client.vm_healthcheck(uuid)
            if tcp:
                hc = TCPHealthCheck(target=target, config=tcp)
                hc.run()
            if http:
                hc = HttpHealthCheck(target=target, config=http)
                hc.run()
            logging.info("All healthchecks passed for %s", target)
        group_number += 1
```

It'll boot each VM in order and wait for its agent to be detected, then wait for all its health checks to pass before moving on to the next VM. This is by no means production-ready code, but it might be a decent solution.
Finally, a systemd timer would be set up on the XOA instance to auto-run this script on boot.
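If you just want to adapt the tag convention without the full script, here is a minimal, self-contained sketch of how the `bootOrder/` tags decode into a config dict (it mirrors the script's `_extract_vm_tags`; the sample tags are the ones the terraform snippet would set, and the `healtcheck` spelling is kept to match the original tags):

```python
import json

TAGS_PREFIX = "bootOrder/"

def extract_boot_tags(tags):
    """Decode 'bootOrder/<key>=<json value>' tags into a dict keyed by <key>."""
    decoded = {}
    for tag in tags:
        if tag.startswith(TAGS_PREFIX):
            key, value = tag.split("=", 1)
            # strip the prefix; the value part is JSON (thanks to jsonencode in terraform)
            decoded[key[len(TAGS_PREFIX):]] = json.loads(value)
    return decoded

# Example: the tags terraform would set on the postgres VM
tags = [
    "bootOrder/agent-detect-timeout=45",
    'bootOrder/ip="auto"',
    'bootOrder/healtcheck/tcp={"port": 5432}',
]
config = extract_boot_tags(tags)
print(config["healtcheck/tcp"]["port"])  # → 5432
```

Tags without the `bootOrder/` prefix are simply ignored, so these can coexist with any other tags already on the VM.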
  • Hardware related section

    125 Topics
    1k Posts
    K
@DustinB Hmm - just got done running memtest86+ - 4 passes, all 14 tests. No RAM errors. I wonder what would cause this error? I'll probably just save the config and reinstall. So strange.
  • The place to discuss new additions into XCP-ng

    241 Topics
    3k Posts
    yannY
@olivierlambert Updating the README will be quick enough... but if the sig is indeed mandatory, we need to set up something for this first... and autosigning from CI requires doing it on a trusted runner rather than on the gitlab-provided ones, so that needs some provisioning and IT work first.
  • update via yum or via xoa?

    5
    0 Votes
    5 Posts
    712 Views
    robytR
@bleader said in update via yum or via xoa?: yes, you're basically doing an RPU manually. But it is indeed odd that the process is stuck at 0%; installing the patches should be fairly fast. No errors in the logs? I have another install where I'll run "install all patches" and "install all". Then I'll use yum update and see whether the speed is the same or not.
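For reference, a sketch of the manual per-host update flow being compared here (the standard XCP-ng CLI path; run on the pool master first, then the other hosts):

```shell
# Install all pending updates from the XCP-ng repos
yum update

# Then restart the toolstack (or reboot the host if the update notes require it)
xe-toolstack-restart
```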
  • Pool Tasks Don't Complete? Major Issues...

    6
    0 Votes
    6 Posts
    878 Views
    O
@Danp Yes to both. I've probably restarted the toolstack at least a dozen times, mostly to clear hung tasks. I did notice some weird issues with a secondary SR being disconnected on xcp02 (one of 10 hosts, 9 after I ejected and forgot xcp01), but there are no disks on it. It wasn't being used for anything at all (yet), and it's fine on all the rest. That does lead me to think maybe it was a power bump that rebooted a switch or something, though. Maybe it caused some kind of hangup with xcp01 and xcp02, and since xcp01 was the pool master, it cascaded to the other issues I've seen? Could that cause the VMs that were originally running on xcp02 to die and not be recoverable easily?
  • Ubuntu 24.04 VMs not reporting IP addresses to XCP-NG 8.2.1

    7
    5
    0 Votes
    7 Posts
    2k Views
    J
    I just tried to install Ubuntu 24.04 to test it out, and I experienced the same problem with it not recognizing the IP address. I was first using the Ubuntu-provided package (xe-guest-utilities=7.20.2-0ubuntu1), which was failing. I then tried the package I had been using with my Ubuntu 22.04 servers that used to be part of the XCP-ng guest-tools.iso (xe-guest-utilities_7.20.0-9_amd64.deb) and had the same results. I mounted my current guest-tools.iso, which now has xe-guest-utilities_7.30.0-11_amd64.deb, and installed it. Now it was retrieving the IP address correctly. I'm not sure why the OP was still having trouble with that version (I'm using UEFI instead of BIOS, but I wouldn't think that would matter). I went ahead and tried out the Rust-based tools mentioned (xen-guest-agent_0.4.0_amd64.deb), and it was properly getting the IP address as well. I'm guessing there's some incompatibility (probably with the 6.x kernel) that was fixed between 7.20 and 7.30 (intentionally or accidentally). Given how much the Linux tools have changed over the years and the fact that they're not used for PV drivers anymore, is there a particular reason to use one over the other (legacy vs Rust)? What features do they really provide now? Is it just CPU/memory/disk/network status?
  • A task keeps popping up every second or so

    7
    1
    0 Votes
    7 Posts
    395 Views
    olivierlambertO
    In any case, you can ignore it.
  • VM migration is blocked during backup whereas no backup in progress

    5
    0 Votes
    5 Posts
    576 Views
    henri9813H
@Danp Okay, you're a genius. I upgraded this morning after my backup, so that could explain my issue. The mentioned thread is exactly my issue, but I didn't find it when I was searching. Thanks for everything!
  • can't start vm after host disconnect

    29
    0 Votes
    29 Posts
    6k Views
    olivierlambertO
    No, from the XCP-ng point of view, the VM is still running without any interruption.
  • XCP-ng Documentation - Roadmap

    Solved xcp-ng doc roadmap question out of date questions
    6
    0 Votes
    6 Posts
    738 Views
    J
    @olivierlambert @Marc-pezin Thank you very much for sorting this out!
  • problem with export or moving VM between pools

    18
    0 Votes
    18 Posts
    1k Views
    N
    @nick-lloyd I like your interpretation.
  • Upgrading to 2.5GB NICs and Troubleshooting Driver Issues on XCP-ng

    6
    0 Votes
    6 Posts
    1k Views
    A
    @aghering You could try compiling it yourself on 8.1... I don't have an 8.1 test/build system.
  • Optimization of Virtual Machines

    4
    0 Votes
    4 Posts
    477 Views
    M
    @NerdsOrder66 Ah, good question... My most recent environment included a Windows Server with 100+ RDP users. There wasn't anything special for optimization there aside from giving it a bunch of RAM/CPU resources, but YMMV.
  • Recurring crashes on VM

    3
    0 Votes
    3 Posts
    360 Views
    T
    @olivierlambert No, the only thing I get with xl dmesg on the host, for some time back, are random brief reports of individual CPUs running above temperature threshold and then being clocked down, and then resolving. Nothing else.
  • VHD import fails

    1
    0 Votes
    1 Posts
    125 Views
    No one has replied
  • import vhd

    6
    0 Votes
    6 Posts
    711 Views
    olivierlambertO
I suppose you mean OVA, right? Anyway, good news!
  • Very scary host reboot issue

    60
    0 Votes
    60 Posts
    22k Views
    M
@olivierlambert said in Very scary host reboot issue: I am very, very busy so I don't have time to do a search myself, but maybe someone else around with a few minutes could point you to the blog post talking about this. Edit: luckily, found it in a few seconds: https://xcp-ng.org/blog/2024/01/26/january-2024-security-update/ Thanks. I'll check this out.
  • PVHv2 - how to configure VM

    4
    0 Votes
    4 Posts
    554 Views
    olivierlambertO
Indeed, but it's a bit old, and I would say that for many reasons HVM with PV drivers is still the way to go for classical server virtualization.
  • Unable to unblock a vm for reversion of snapshot

    7
    0 Votes
    7 Posts
    826 Views
    D
    @Danp I tried some of the xe commands listed in that post like xe vm-param-clear and xe vm-param-remove and wasn't successful.
  • Problems with existing pool, problems migrating to new pool

    Solved
    12
    0 Votes
    12 Posts
    893 Views
    S
    @tjkreidl Yeah, thanks. 12 hours, 68 VDIs to coalesce down to 10. Quite the improvement.
  • Oops! We removed busybox

    5
    0 Votes
    5 Posts
    454 Views
    olivierlambertO
Interpreting vulnerability scanners is a hard task. They often scream about "common cases", but remember that XCP-ng is an appliance, so there are many cases where the findings don't apply. Happy to help you further via our pro support to address your concerns in detail.
  • Rebuild boot / OS drive

    community os rebuild disaster-rec
    3
    0 Votes
    3 Posts
    632 Views
    G
@Danp So, that's the "best" way? Back up the metadata with XOA, rebuild the OS drive, add the "new" server into XOA, and restore the metadata to the new install? (I'm not doubting it is; I just want to be sure we're understanding each other fully.) Seems straightforward - but there's a ton of things I've done over the years that "seemed" pretty straightforward, turned out to be anything but, and at least occasionally I found I had no way back.