ECONNREFUSED when creating SDN network
-
Hi,
Using XO from source, last update two days ago.
I have some scripts to provision self-service, which are working well.
But two days ago, I put a node in maintenance mode. After restarting it, maintenance mode was automatically disabled. I enabled it again because I had not finished my work.
After that, when I create an SDN network via xo-cli, I get this message:
✖ JsonRpcError: connect ECONNREFUSED X.X.X.X:6640
    at Peer._callee$ (/home/uga/.nvm/versions/node/v22.17.0/lib/node_modules/xo-cli/node_modules/json-rpc-peer/dist/index.js:139:44)
    at Peer.<anonymous> (/home/uga/.nvm/versions/node/v22.17.0/lib/node_modules/xo-cli/node_modules/@babel/runtime/helpers/regeneratorRuntime.js:52:18)
    at Generator.<anonymous> (/home/uga/.nvm/versions/node/v22.17.0/lib/node_modules/xo-cli/node_modules/@babel/runtime/helpers/regenerator.js:52:51)
    at Generator.next (/home/uga/.nvm/versions/node/v22.17.0/lib/node_modules/xo-cli/node_modules/@babel/runtime/helpers/regeneratorDefine.js:17:23)
    at asyncGeneratorStep (/home/uga/.nvm/versions/node/v22.17.0/lib/node_modules/xo-cli/node_modules/@babel/runtime/helpers/asyncToGenerator.js:3:17)
    at _next (/home/uga/.nvm/versions/node/v22.17.0/lib/node_modules/xo-cli/node_modules/@babel/runtime/helpers/asyncToGenerator.js:17:9)
    at /home/uga/.nvm/versions/node/v22.17.0/lib/node_modules/xo-cli/node_modules/@babel/runtime/helpers/asyncToGenerator.js:22:7
    at new Promise (<anonymous>)
    at Peer.<anonymous> (/home/uga/.nvm/versions/node/v22.17.0/lib/node_modules/xo-cli/node_modules/@babel/runtime/helpers/asyncToGenerator.js:14:12)
    at Peer.exec (/home/uga/.nvm/versions/node/v22.17.0/lib/node_modules/xo-cli/node_modules/json-rpc-peer/dist/index.js:182:20) {
  code: -32000,
  data: {
    address: 'X.X.X.X',
    code: 'ECONNREFUSED',
    errno: -111,
    message: 'connect ECONNREFUSED X.X.X.X:6640',
    name: 'Error',
    port: 6640,
    stack: 'Error: connect ECONNREFUSED X.X.X.X:6640\n' +
      '    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1637:16)\n' +
      '    at TCPConnectWrap.callbackTrampoline (node:internal/async_hooks:130:17)',
    syscall: 'connect'
  }
}
I have checked all three nodes. On the first and second, port 6640 is listening:
tcp LISTEN 0 10 0.0.0.0:6640 0.0.0.0:* users:(("ovsdb-server",pid=1559,fd=20))
But NOT on the third node, even though on that node the openvswitch service is up and ovsdb-server is running:
openvswitch.service - Open vSwitch
   Loaded: loaded (/usr/lib/systemd/system/openvswitch.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/openvswitch.service.d
           └─local.conf, slice.conf
   Active: active (running) since mer. 2025-10-01 12:00:35 CEST; 22h ago
  Process: 44006 ExecStop=/usr/share/openvswitch/scripts/ovs-ctl stop (code=exited, status=0/SUCCESS)
  Process: 44042 ExecStart=/usr/share/openvswitch/scripts/ovs-start (code=exited, status=0/SUCCESS)
   CGroup: /control.slice/openvswitch.service
           ├─44085 ovsdb-server: monitoring pid 44086 (healthy)
           ├─44086 ovsdb-server /run/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --ssl-ciphers=AES256-GCM-SHA38:A...
           ├─44100 ovs-vswitchd: monitoring pid 44101 (healthy)
           └─44101 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor
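For reference, the same per-node check can be done from another machine (for example the XO host) by simply trying to open a TCP connection to port 6640 on each node. The following is only a minimal sketch: the function names are made up for illustration, and the host addresses in the usage note are placeholders, not values from this pool.

```python
import socket


def port_is_open(host, port=6640, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    A refused connection (ECONNREFUSED), a timeout, or an unreachable
    host all raise OSError subclasses and are reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_hosts(hosts, port=6640):
    """Map each host (hypothetical list supplied by the caller) to its port status."""
    return {host: port_is_open(host, port) for host in hosts}
```

Calling check_hosts(["192.0.2.11", "192.0.2.12", "192.0.2.13"]) with the real management addresses of the three nodes should then show False only for the node that refuses the connection.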
And if I check in XO, the networks are created on every node:
xe pif-list uuid=bb594bff-0d27-5477-8e13-94691411fb18
uuid ( RO)                  : bb594bff-0d27-5477-8e13-94691411fb18
                device ( RO): tunnel1234
                   MAC ( RO): 3a:ac:6d:14:86:7a
    currently-attached ( RO): true
                  VLAN ( RO): -1
          network-uuid ( RO): 4ee84cf6-8c8a-9b5f-5038-1fd980b22d1e
             host-uuid ( RO): baeeaf84-c362-4057-999c-1c8fc57f3f33
So if the networks are created on the host, what is the impact of the connection being refused?
-
Not sure, let me add @Team-XAPI-Network
-
You're facing 2 different issues:
1. it seems the SDN Controller plugin cannot reach OVSDB
2. the SDN Controller plugin does not clean up when there is an issue
For 2., there is a work item to fix this on the XO team side. To clarify what happens: your request reached XAPI and a network was created on the pool. The SDN plugin then tried to actually establish the tunnel, but failed because it could not reach OVSDB. That is the point where you get the error, but by then the network has already been created on your pool(s).
You can confirm that the tunnel is not established using ovs-vsctl show. When the network is established, you should see something that looks like:
Bridge xapi6
    Controller "pssl:"
    fail_mode: standalone
    Port vif1.3
        Interface vif1.3
    Port xapi6
        Interface xapi6
            type: internal
    Port xapi6_port1
        Interface xapi6_iface1
            type: gre
            options: {key="11", remote_ip="192.168.1.220"}
Of course it could be vxlan instead of gre, but the remote_ip part is the important point. Here I believe you won't see that. And you'll have to manually remove these networks from your pool(s).
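If you have many bridges, looking for remote_ip by eye gets tedious. As a rough sketch (not something the SDN Controller plugin provides), the text printed by ovs-vsctl show can be scanned for interfaces that carry a remote_ip option. The function name is hypothetical, and the parsing assumes the indented layout shown above.

```python
import re

# Matches lines such as "        Interface xapi6_iface1"
IFACE_RE = re.compile(r"^\s*Interface\s+(\S+)")
# Matches remote_ip="192.168.1.220" inside an "options: {...}" line
REMOTE_RE = re.compile(r'remote_ip="?([^",}\s]+)"?')


def tunnel_endpoints(ovs_show_output):
    """Return {interface name: remote_ip} for every tunnel interface found.

    An empty dict means no interface has a remote_ip option, i.e. no
    tunnel appears to be established on this host.
    """
    endpoints = {}
    current_iface = None
    for line in ovs_show_output.splitlines():
        iface = IFACE_RE.match(line)
        if iface:
            current_iface = iface.group(1).strip('"')
            continue
        remote = REMOTE_RE.search(line)
        if remote and current_iface is not None:
            endpoints[current_iface] = remote.group(1)
    return endpoints
```

On a host, the input could come from subprocess.run(["ovs-vsctl", "show"], capture_output=True, text=True).stdout. For the example output above, the result would contain the single entry xapi6_iface1 with 192.168.1.220.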
Regarding 1. and the connection refused error, that probably means port 6640 was not opened in the firewall. This should have been done automatically by XAPI.
You can check if that's the case:
# iptables-save | grep 6640
-A xapi-INPUT -p tcp -m conntrack --ctstate NEW -m tcp --dport 6640 -j ACCEPT
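If you want to run that check on several hosts, it can be scripted. Here is a small sketch that looks through iptables-save output for an ACCEPT rule on a given destination port. The function name is invented for this example, and it only inspects the rule text; it does not know which chain XAPI is expected to use.

```python
def has_accept_rule(iptables_save_output, port=6640):
    """Return True if some appended rule accepts traffic to --dport <port>."""
    for line in iptables_save_output.splitlines():
        tokens = line.split()
        # Only consider rule lines ("-A <chain> ..."), not table headers or comments.
        if not tokens or tokens[0] != "-A":
            continue
        if "--dport" not in tokens:
            continue
        idx = tokens.index("--dport")
        dport = tokens[idx + 1] if idx + 1 < len(tokens) else None
        if dport == str(port) and tokens[-2:] == ["-j", "ACCEPT"]:
            return True
    return False
```

A result of False for port 6640 would correspond to the rule being missing from the grep output above.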
If you don't have that rule, you can search your /var/log/xensource.log* for openvswitch-config-update and see if there are any errors there.
-
@bleader Hi, thank you for your response.
However, here is what I can see:
- The iptables rule is there:
iptables-save | grep 6640
-A xapi-INPUT -p tcp -m conntrack --ctstate NEW -m tcp --dport 6640 -j ACCEPT
- Service is started and running :
systemctl status openvswitch
● openvswitch.service - Open vSwitch
   Loaded: loaded (/usr/lib/systemd/system/openvswitch.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/openvswitch.service.d
           └─local.conf, slice.conf
   Active: active (running) since mer. 2025-10-01 12:00:35 CEST; 1 day 22h ago
  Process: 44006 ExecStop=/usr/share/openvswitch/scripts/ovs-ctl stop (code=exited, status=0/SUCCESS)
  Process: 44042 ExecStart=/usr/share/openvswitch/scripts/ovs-start (code=exited, status=0/SUCCESS)
   CGroup: /control.slice/openvswitch.service
           ├─44085 ovsdb-server: monitoring pid 44086 (healthy)
           ├─44086 ovsdb-server /run/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --ssl-ciphers=AES256-GCM-SHA38:A...
           ├─44100 ovs-vswitchd: monitoring pid 44101 (healthy)
           └─44101 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor
- Process ovsdb-server is running :
ps -aux | grep ovsdb-server
root 44085 0.0 0.0 44128 556 ? S<s oct.01 0:00 ovsdb-server: monitoring pid 44086 (healthy)
root 44086 0.2 0.0 52252 12544 ? S< oct.01 6:52 ovsdb-server /run/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --ssl-ciphers=AES256-GCM-SHA38:AES256-SHA256:AES256-SHA:AES128-GCM-SHA256:AES128-SHA256:AES128-SHA --ssl-protocols=TLSv1.2 --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach --monitor
But, port 6640 is not listening :
tcp LISTEN 0 1 127.0.0.1:5900 0.0.0.0:* users:(("vncterm",pid=1322,fd=3))
tcp LISTEN 0 128 0.0.0.0:111 0.0.0.0:* users:(("rpcbind",pid=1265,fd=8))
tcp LISTEN 0 128 0.0.0.0:22 0.0.0.0:* users:(("sshd",pid=1158,fd=3))
tcp LISTEN 0 64 0.0.0.0:36183 0.0.0.0:*
tcp LISTEN 0 5 0.0.0.0:10809 0.0.0.0:* users:(("xapi-nbd",pid=2864,fd=6))
tcp LISTEN 0 1 127.0.0.1:9500 0.0.0.0:* users:(("vncterm",pid=1322,fd=4))
tcp LISTEN 0 128 127.0.0.1:8125 0.0.0.0:* users:(("netdata",pid=2783309,fd=61))
tcp LISTEN 0 128 0.0.0.0:19999 0.0.0.0:* users:(("netdata",pid=2783309,fd=7))
tcp LISTEN 0 128 0.0.0.0:48863 0.0.0.0:* users:(("rpc.statd",pid=5012,fd=9))
tcp LISTEN 0 64 [::]:44169 [::]:*
tcp LISTEN 0 128 [::]:111 [::]:* users:(("rpcbind",pid=1265,fd=11))
tcp LISTEN 0 128 *:80 *:* users:(("xapi",pid=2859,fd=11))
tcp LISTEN 0 128 [::]:22 [::]:* users:(("sshd",pid=1158,fd=4))
tcp LISTEN 0 128 *:443 *:* users:(("stunnel",pid=3140,fd=9))
tcp LISTEN 0 128 [::]:19999 [::]:* users:(("netdata",pid=2783309,fd=8))
tcp LISTEN 0 128 [::]:57023 [::]:* users:(("rpc.statd",pid=5012,fd=11))
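To compare nodes without reading the whole list, a socket listing like the one above can be reduced to the set of listening ports. This is a sketch only: the function name is made up, and it assumes ss-style lines where the local address is the first column ending in a numeric port (which command produced the listing is an assumption on my side).

```python
def listening_ports(ss_output):
    """Return the set of local TCP ports found on LISTEN lines."""
    ports = set()
    for line in ss_output.splitlines():
        if "LISTEN" not in line:
            continue
        for token in line.split():
            # Local addresses look like 0.0.0.0:6640, [::]:22 or *:80.
            # The peer column ends in ":*", so it never parses as a number.
            _addr, sep, port = token.rpartition(":")
            if sep and port.isdigit():
                ports.add(int(port))
                break  # first match is the local address; skip the rest
    return ports
```

With this, 6640 in listening_ports(output) would be True for the first two nodes and False for the listing above.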
And yes, you're right: on the third node (which raises this error), there is no "options" line on the bridges.
I've tried to stop/start ovs and stop/start sdn-controller. No change.