Running CR job directly between nodes instead of through XOA Appliance possible?
-
Hello,
I have configured a Continuous Replication job for a VM on a pool with local storage between 2 hosts within the same pool.
These hypervisors have a migration network, which we also configured as the "Default migration network".
The issue is that this migration network uses a 2x 10Gbit bonded connection, while the XOA appliance only has a 1Gbit connection.
The CR job now uses the XOA appliance as a proxy to transfer the data from one host in the pool to another host in the same pool over a slow network connection, even though a fast network is available outside of the XOA appliance.
Is it possible to run this CR job directly between the two hosts over the migration network, instead of sending the traffic through the XOA appliance?
Cheers,
Niels
-
Add a VIF to your XOA in that network; that way the traffic can flow over your 2x10G network.
-
Hi,
Thanks for your response. Unfortunately the XOA appliance itself is not connected with a 10Gbit connection to the hypervisors so that is not possible.
However, I deployed a XOA Proxy ( https://xen-orchestra.com/docs/proxy.html ) in an attempt to do it.
So I have now done the following:
- The "Primary" XOA appliance (outside of the hypervisors it is managing) has a 1Gbit connection to the hypervisors.
- The Proxy XOA Appliance (a VM on the hypervisor it is managing) has a VIF in the management-VLAN (eth0). The primary XOA appliance can reach the Proxy over this (1Gbit) network
- The Proxy appliance also has an eth1 which has a VIF connected to our (10 Gbit) migration network. This is the network I want to run the backup over.
So this proxy appliance now has the management LAN on eth0 and the migration LAN on eth1. I want to run this backup over the eth1 network. However, it currently still goes over eth0 and not eth1. How would I configure it to go over eth1?
The docs do mention adding a secondary network ( https://xen-orchestra.com/docs/proxy.html#adding-a-network-card-to-a-proxy ) so I assume it is possible somehow, but not sure how.
It also says:
As XOA uses the first IP address reported by XAPI to contact the proxy appliance, you may have to switch the network card order if you want your proxy to be connected through a specific IP address.
However, since the primary XOA is only on the management (1Gbit) LAN, traffic has to go over eth0 for Primary XOA <-> Proxy and then over eth1 for Proxy <-> Backup Target.
Cheers,
Niels
-
What do you mean? There's no "artificial network limitation speed" in a VM. There's no 1G or 10G NIC, it's paravirtualized, so it's up to the speed of your host CPU.
If you add a VIF in a VM on a 10G network, then you are in a 10G network in your VM.
-
@olivierlambert said in Running CR job directly between nodes instead of through XOA Appliance possible?:
What do you mean? There's no "artificial network limitation speed" in a VM. There's no 1G or 10G NIC, it's paravirtualized, so it's up to the speed of your host CPU.
If you add a VIF in a VM on a 10G network, then you are in a 10G network in your VM.
Hi Olivier,
We keep XOA physically separated (on a separate xcp-ng pool) from all the pools XOA is managing.
This means the physical connection between XOA and the xcp-ng pool where we run a CR job between 2 hosts is 1Gbit.
Using the proxy we can eliminate that, but we still cannot eliminate the management network; the entire xcp-ng management network runs through separate switches dedicated for management purposes and they are 1Gbit. So to give a better idea from perspective of the XOA Proxy instance:
hypervisor01
Management IP: 10.40.1.10/24 (Connected through 1Gbit NIC with switch)
Migration IP: 10.40.4.10/24 (Connected through 10Gbit NIC with switch)
hypervisor02
Management IP: 10.40.1.11/24 (Connected through 1Gbit NIC with switch)
Migration IP: 10.40.4.11/24 (Connected through 10Gbit NIC with switch)
The Primary XOA instance is only able to reach 10.40.1.0/24 (as this is the management network).
So we created a XOA Proxy within the xcp-ng pool it is managing.
I.e.
Primary XOA Instance on "remote" location:
IP: 192.168.1.10/24 (1Gbit NIC) -> default gateway with routes to management network -> 10.40.1.100 [XOA Proxy]
XOA Proxy (VM within xcp-ng pool it is managing):
IP: 10.40.1.100/24 (1Gbit NIC) (for management purposes; so the Primary XOA instance can reach it)
IP2 (separate 10Gbit NIC): 10.40.4.100/24
So since we want to go from hypervisor01 to hypervisor02 over the 10.40.4.0/24 network what we need is something like this:
Primary XOA Instance [192.168.1.10] -> Proxy XOA [10.40.1.100] -> hypervisor01 [10.40.4.10] AND hypervisor02 [10.40.4.11]
What it currently does is:
Primary XOA Instance [192.168.1.10] -> Proxy XOA [10.40.1.100] -> hypervisor01 [10.40.1.10] AND hypervisor02 [10.40.1.11]
However, the physical NIC for that 10.40.1.0/24 network is only 1Gbit so traffic going from hypervisor01 <-> hypervisor02 is limited to the 1Gbit physical connection there.
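To make the difference between the two paths above concrete, here is a minimal Python sketch. It only restates the addressing from this post: the subnets and link speeds are the ones listed above, and the helper names (classify, bottleneck) are made up for illustration and are not part of XOA.

```python
import ipaddress

# Networks and physical link speeds (Gbit/s) as described in this post.
NETWORKS = {
    ipaddress.ip_network("10.40.1.0/24"): ("management", 1),
    ipaddress.ip_network("10.40.4.0/24"): ("migration", 10),
}

def classify(address):
    """Return (network name, link speed) for an address, or ("unknown", None)."""
    ip = ipaddress.ip_address(address)
    for network, (name, speed) in NETWORKS.items():
        if ip in network:
            return name, speed
    return "unknown", None

def bottleneck(addresses):
    """Slowest link speed among the given host addresses (None if any is unknown)."""
    speeds = [classify(a)[1] for a in addresses]
    if None in speeds:
        return None
    return min(speeds)

# Host addresses the proxy talks to today versus the ones we would like it to use.
current = ["10.40.1.10", "10.40.1.11"]
wanted = ["10.40.4.10", "10.40.4.11"]

print("current path bottleneck:", bottleneck(current), "Gbit/s")
print("wanted path bottleneck:", bottleneck(wanted), "Gbit/s")
```

The point is simply that the host address the proxy connects to decides which physical NIC carries the export and import traffic.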
We originally decided to get dedicated switches intended only for management purposes. As we never needed fast connections there these are just simple 1Gbit switches. We also have dedicated cables going to 10Gbit switches intended for backup/migration traffic and that is why we want to choose the network it should use.
We had the same issue a little while ago with normal VM migrations. A few months back XOA added the option to choose which migration network to use (instead of the default management network, which is the slow 1Gbit network), so we thought this was resolved and that we could do the same for backups, but it seems not.
If it is impossible, I guess we can route the 10.40.4.0/24 network from the Primary XOA appliance to the proxy in order to force it to use that network for the connection between the proxy and hypervisor01 / hypervisor02. I just thought that would not be needed, as it was possible to add another VIF to the proxy instance. I thought it would also be possible to add logic so the XOA instance knows which VIF to use when there are multiple IPs to reach the underlying destination of the backup.
-
Okay so to update on this: We have now configured the XOA Proxy with only an IP in the 10Gbit network (10.40.4.x) and added routing so our primary XOA Instance can reach it.
Unfortunately the XOA Proxy keeps trying to talk with the wrong IP, so now we just get timeouts when running the CR Job:
Error: connect ETIMEDOUT 10.40.1.11:443
This is of course logical because now the XOA Proxy has IP 10.40.4.100 and cannot reach the 10.40.1.x network.
The docs mention:
TIP As XOA uses the first IP address reported by XAPI to contact the proxy appliance, you may have to switch the network card order if you want your proxy to be connected through a specific IP address.
I'm not sure if this is related, as XOA can reach the proxy appliance. It's simply the Proxy Appliance that is using the wrong IP to reach the xcp-ng host. It's trying to reach it through 10.40.1.11 while it should connect to 10.40.4.11.
Is it at all possible to change the IP the XOA Proxy uses to connect to the xcp-ng host?
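For anyone wanting to confirm the same symptom from inside the proxy VM, a small Python sketch along these lines could be used. It is only a plain TCP connect test with a timeout; the function name can_connect is made up for illustration, and it assumes Python 3 is available wherever it is run.

```python
import socket

def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False
```

Calling can_connect("10.40.1.11") and can_connect("10.40.4.11") from the proxy would show which of the two host addresses it can actually reach on port 443.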
-
Try to use the other IP (in the 10G network) in Settings/server to connect to your pool.
-
@olivierlambert said in Running CR job directly between nodes instead of through XOA Appliance possible?:
Try to use the other IP (in the 10G network) in Settings/server to connect to your pool.
Hmm, this seems to alter the behaviour a little, but it is still not working properly.
When I start the backup job it'll make a new snapshot and I see a task being created:
"[XO] Exporting content of VDI <name>"
This task is also visible in the xe task-list output.
However, it stays at 0%, and I'm not sure why. I let it run for 30 minutes just now and then cancelled it to retry. To cancel it I had to restart the toolstack as well, otherwise it wouldn't let me cancel it.
I just restarted it and will let it run for a while to see if it eventually times out with an error, but so far it stays at 0% without an obvious reason.
-
The task is still pending at 0%.
I don't know the exact internals of Xen and how it interacts with XOA during a migration, but I think something like this is happening:
- XOA requests a snapshot of the VM to backup through a CR job
- Once snapshot is done, XOA wants to export this VDI and import it on the "backup destination"
- For this XOA creates two tasks, which I also see in the logs:
Jul 26 12:25:45 ede-vmh001 xapi: [ info||1073 HTTPS 10.40.4.100->:::80|task.create D:c09776f47706|taskhelper] task [XO] Exporting content of VDI <_redacted_name_> R:5e4624062d88 (uuid:ef07e18c-77f9-43a6-c98e-4fd1dfea279d) created (trackid=7dd5a690c5e4ff9992370100036107df) by task D:c09776f47706
- This is the task we see as "Pending" in the xe task-list output.
- Afterwards XOA will try to download the VDI like this:
Jul 26 12:25:45 ede-vmh001 xapi: [debug||1074 :::80|VDI.export_raw_vdi D:0595aad00e33|importexport] HTTP 302 redirect to: https://10.40.1.27/export_raw_vdi/?format=vhd&vdi=OpaqueRef:571bcabb-37e1-48d7-804c-3dae3490faf9&base=OpaqueRef:f94bf1b1-8312-44da-aa8f-05fa170a01a4&session_id=OpaqueRef:a7fa043d-9f70-4360-ad85-a45c7f725479&task_id=OpaqueRef:a37a5481-ff73-4e99-a655-50edfff11668
- I guess (?) that XOA will try to follow this redirect but fail, since the XOA appliance cannot reach the 10.40.1.0/24 subnet.
- The failure is not handled correctly, and as a result the task stays stuck until it times out or the toolstack is restarted.
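The guess about the redirect can be checked with a short Python sketch. The URL below is shortened from the log line above, and the assumption (from the current setup) is that the proxy can only reach 10.40.4.0/24; the function name is made up for illustration.

```python
import ipaddress
from urllib.parse import urlsplit

def redirect_host_reachable(location, reachable_networks):
    """Return (host, True/False) depending on whether the redirect target
    lies inside one of the networks the client is assumed to reach."""
    host = urlsplit(location).hostname
    ip = ipaddress.ip_address(host)
    reachable = any(ip in ipaddress.ip_network(n) for n in reachable_networks)
    return host, reachable

# Redirect target from the xapi log (query string shortened).
location = "https://10.40.1.27/export_raw_vdi/?format=vhd"

# Assumption: the proxy only has an address in the migration network.
host, ok = redirect_host_reachable(location, ["10.40.4.0/24"])
print(host, ok)  # prints: 10.40.1.27 False
```

So even though the initial request goes to a 10.40.4.x address, the redirect points the client back at a management address it has no route to.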
Again I'm guessing a bit in how XOA works here when migrating. I was able to find https://github.com/vatesfr/xen-orchestra/blob/c6f22f4d758f6a557fa2afd347f269febe9b88cc/packages/xo-server/src/xapi/index.mjs#L1672 which suggests that the export_raw_vdi is at least something used by XOA and I also found https://github.com/xapi-project/xen-api/blob/550ff27340167239b33aa41b2f517c7771df9213/ocaml/xapi/export_raw_vdi.ml#L131 which is part of xapi. I don't know what exactly it means but I think it suggests the 302 is at least not an error but normal behaviour.
I'm not sure where the 302 itself comes from. I would guess it's a XenServer-internal thing to redirect to the management IP, but I am not sure.
That suggests that unless it is possible to disable this redirect, what I want is basically not possible.
Am I sort of close here, or way off with my guesses?
-
Any opinion @julien-f ?