ECONNREFUSED when creating SDN network

Kptainflintt

Hi,

Using XO from source, last update two days ago.

I have some scripts to provision self-service, and they are working well.

But two days ago, I put a node into maintenance mode. After restarting it, maintenance mode was automatically disabled. I enabled it again because I hadn't finished my work.

After that, when I create an SDN network via xo-cli, I get this message:

✖ JsonRpcError: connect ECONNREFUSED X.X.X.X:6640
    at Peer._callee$ (/home/uga/.nvm/versions/node/v22.17.0/lib/node_modules/xo-cli/node_modules/json-rpc-peer/dist/index.js:139:44)
    at Peer.<anonymous> (/home/uga/.nvm/versions/node/v22.17.0/lib/node_modules/xo-cli/node_modules/@babel/runtime/helpers/regeneratorRuntime.js:52:18)
    at Generator.<anonymous> (/home/uga/.nvm/versions/node/v22.17.0/lib/node_modules/xo-cli/node_modules/@babel/runtime/helpers/regenerator.js:52:51)
    at Generator.next (/home/uga/.nvm/versions/node/v22.17.0/lib/node_modules/xo-cli/node_modules/@babel/runtime/helpers/regeneratorDefine.js:17:23)
    at asyncGeneratorStep (/home/uga/.nvm/versions/node/v22.17.0/lib/node_modules/xo-cli/node_modules/@babel/runtime/helpers/asyncToGenerator.js:3:17)
    at _next (/home/uga/.nvm/versions/node/v22.17.0/lib/node_modules/xo-cli/node_modules/@babel/runtime/helpers/asyncToGenerator.js:17:9)
    at /home/uga/.nvm/versions/node/v22.17.0/lib/node_modules/xo-cli/node_modules/@babel/runtime/helpers/asyncToGenerator.js:22:7
    at new Promise (<anonymous>)
    at Peer.<anonymous> (/home/uga/.nvm/versions/node/v22.17.0/lib/node_modules/xo-cli/node_modules/@babel/runtime/helpers/asyncToGenerator.js:14:12)
    at Peer.exec (/home/uga/.nvm/versions/node/v22.17.0/lib/node_modules/xo-cli/node_modules/json-rpc-peer/dist/index.js:182:20) {
  code: -32000,
  data: {
    address: 'X.X.X.X',
    code: 'ECONNREFUSED',
    errno: -111,
    message: 'connect ECONNREFUSED X.X.X.X:6640',
    name: 'Error',
    port: 6640,
    stack: 'Error: connect ECONNREFUSED X.X.X.X:6640\n' +
      '    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1637:16)\n' +
      '    at TCPConnectWrap.callbackTrampoline (node:internal/async_hooks:130:17)',
    syscall: 'connect'
  }
}

I have checked all three nodes. On the first and second, port 6640 is listening:

tcp                         LISTEN                        0                             10                                                          0.0.0.0:6640                                                      0.0.0.0:*                            users:(("ovsdb-server",pid=1559,fd=20))

But NOT on the third node, even though the openvswitch service is up there and ovsdb-server is running:

openvswitch.service - Open vSwitch
   Loaded: loaded (/usr/lib/systemd/system/openvswitch.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/openvswitch.service.d
           └─local.conf, slice.conf
   Active: active (running) since mer. 2025-10-01 12:00:35 CEST; 22h ago
  Process: 44006 ExecStop=/usr/share/openvswitch/scripts/ovs-ctl stop (code=exited, status=0/SUCCESS)
  Process: 44042 ExecStart=/usr/share/openvswitch/scripts/ovs-start (code=exited, status=0/SUCCESS)
   CGroup: /control.slice/openvswitch.service
           ├─44085 ovsdb-server: monitoring pid 44086 (healthy)
           ├─44086 ovsdb-server /run/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --ssl-ciphers=AES256-GCM-SHA38:A...
           ├─44100 ovs-vswitchd: monitoring pid 44101 (healthy)
           └─44101 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor

And if I check on XO, the networks are created on every node:

xe pif-list uuid=bb594bff-0d27-5477-8e13-94691411fb18
uuid ( RO)                  : bb594bff-0d27-5477-8e13-94691411fb18
                device ( RO): tunnel1234
                   MAC ( RO): 3a:ac:6d:14:86:7a
    currently-attached ( RO): true
                  VLAN ( RO): -1
          network-uuid ( RO): 4ee84cf6-8c8a-9b5f-5038-1fd980b22d1e
             host-uuid ( RO): baeeaf84-c362-4057-999c-1c8fc57f3f33

So if the networks are created on the host, what is the impact of the connection being refused?

olivierlambert

Not sure, let me add @Team-XAPI-Network

bleader

@Kptainflintt

You're facing 2 different issues:

1. It seems the SDN Controller plugin cannot reach OVSDB.
2. The SDN Controller plugin does not clean up when there is an issue.

For 2., there is a work item to fix this on the XO team side. Just to clarify, what happens is this: your request reached XAPI and a network was created on the pool; then the SDN plugin tried to actually establish the tunnel but failed because it cannot reach OVSDB. At that point you get the error, but the network has already been created on your pool(s).

You can confirm that the tunnel is not established using ovs-vsctl show. When the network is established, you should see something that looks like:

    Bridge xapi6
        Controller "pssl:"
        fail_mode: standalone
        Port vif1.3
            Interface vif1.3
        Port xapi6
            Interface xapi6
                type: internal
        Port xapi6_port1
            Interface xapi6_iface1
                type: gre
                options: {key="11", remote_ip="192.168.1.220"}

Of course it could be vxlan instead of gre, but the remote_ip part is the important point. Here I believe you won't see that. And you'll have to manually remove these networks from your pool(s).
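
To avoid reading the whole ovs-vsctl show output by eye, the remote_ip check can be scripted. The following is only a sketch: the function name list_tunnel_endpoints is made up for illustration, and it assumes the output layout shown above (a "Bridge" line followed by indented "options: {... remote_ip="..."}" lines).

```shell
# Hypothetical helper: reads `ovs-vsctl show` output on stdin and prints
# each bridge name followed by the remote_ip of every tunnel found under it.
# A bridge with no tunnel interface produces no output at all.
list_tunnel_endpoints() {
    awk '
        # Remember the bridge we are currently inside (strip optional quotes).
        $1 == "Bridge" { bridge = $2; gsub(/"/, "", bridge) }
        # Tunnel interfaces carry a remote_ip in their options line.
        /remote_ip=/ {
            match($0, /remote_ip="[^"]*"/)
            ip = substr($0, RSTART + 11, RLENGTH - 12)
            print bridge, ip
        }
    '
}
```

On a host, this would be used as: ovs-vsctl show | list_tunnel_endpoints. An SDN bridge that is missing from the result has no tunnel configured on that host.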

Regarding 1. and the connection refused error, that probably means port 6640 was not opened in the firewall. This should have been done automatically by XAPI.

You can check if that's the case:

# iptables-save  | grep 6640
-A xapi-INPUT -p tcp -m conntrack --ctstate NEW -m tcp --dport 6640 -j ACCEPT

If you don't have that rule you can search your /var/log/xensource.log* for openvswitch-config-update and see if there are any errors there.
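
As a rough sketch of that log search: the helper below keeps only the lines that mention the plugin and also look like failures. The function name plugin_errors and the keywords (error, fail, exception) are assumptions made for illustration, not a known log format. Compressed rotated logs would need zgrep instead of grep.

```shell
# Hypothetical helper: from the log files given as arguments, print the
# lines that mention openvswitch-config-update and also contain a word
# suggesting a failure. The keyword list is a guess, adjust as needed.
plugin_errors() {
    grep -h 'openvswitch-config-update' "$@" | grep -i -E 'error|fail|exception'
}
```

It would be called as: plugin_errors /var/log/xensource.log*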

Kptainflintt

@bleader Hi, thank you for your response.

However, here is what I can see:

The iptables rule is there:

iptables-save  | grep 6640
-A xapi-INPUT -p tcp -m conntrack --ctstate NEW -m tcp --dport 6640 -j ACCEPT

Service is started and running :

systemctl status openvswitch
● openvswitch.service - Open vSwitch
   Loaded: loaded (/usr/lib/systemd/system/openvswitch.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/openvswitch.service.d
           └─local.conf, slice.conf
   Active: active (running) since mer. 2025-10-01 12:00:35 CEST; 1 day 22h ago
  Process: 44006 ExecStop=/usr/share/openvswitch/scripts/ovs-ctl stop (code=exited, status=0/SUCCESS)
  Process: 44042 ExecStart=/usr/share/openvswitch/scripts/ovs-start (code=exited, status=0/SUCCESS)
   CGroup: /control.slice/openvswitch.service
           ├─44085 ovsdb-server: monitoring pid 44086 (healthy)
           ├─44086 ovsdb-server /run/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --ssl-ciphers=AES256-GCM-SHA38:A...
           ├─44100 ovs-vswitchd: monitoring pid 44101 (healthy)
           └─44101 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor

Process ovsdb-server is running :

ps -aux | grep ovsdb-server
root       44085  0.0  0.0  44128   556 ?        S<s  oct.01   0:00 ovsdb-server: monitoring pid 44086 (healthy)
root       44086  0.2  0.0  52252 12544 ?        S<   oct.01   6:52 ovsdb-server /run/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --ssl-ciphers=AES256-GCM-SHA38:AES256-SHA256:AES256-SHA:AES128-GCM-SHA256:AES128-SHA256:AES128-SHA --ssl-protocols=TLSv1.2 --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach --monitor

But, port 6640 is not listening :

tcp                         LISTEN                        0                              1                                                           127.0.0.1:5900                                                       0.0.0.0:*                            users:(("vncterm",pid=1322,fd=3))                            
tcp                         LISTEN                        0                              128                                                           0.0.0.0:111                                                        0.0.0.0:*                            users:(("rpcbind",pid=1265,fd=8))                            
tcp                         LISTEN                        0                              128                                                           0.0.0.0:22                                                         0.0.0.0:*                            users:(("sshd",pid=1158,fd=3))                               
tcp                         LISTEN                        0                              64                                                            0.0.0.0:36183                                                      0.0.0.0:*                                                                                         
tcp                         LISTEN                        0                              5                                                             0.0.0.0:10809                                                      0.0.0.0:*                            users:(("xapi-nbd",pid=2864,fd=6))                           
tcp                         LISTEN                        0                              1                                                           127.0.0.1:9500                                                       0.0.0.0:*                            users:(("vncterm",pid=1322,fd=4))                            
tcp                         LISTEN                        0                              128                                                         127.0.0.1:8125                                                       0.0.0.0:*                            users:(("netdata",pid=2783309,fd=61))                        
tcp                         LISTEN                        0                              128                                                           0.0.0.0:19999                                                      0.0.0.0:*                            users:(("netdata",pid=2783309,fd=7))                         
tcp                         LISTEN                        0                              128                                                           0.0.0.0:48863                                                      0.0.0.0:*                            users:(("rpc.statd",pid=5012,fd=9))                          
tcp                         LISTEN                        0                              64                                                               [::]:44169                                                         [::]:*                                                                                         
tcp                         LISTEN                        0                              128                                                              [::]:111                                                           [::]:*                            users:(("rpcbind",pid=1265,fd=11))                           
tcp                         LISTEN                        0                              128                                                                 *:80                                                               *:*                            users:(("xapi",pid=2859,fd=11))                              
tcp                         LISTEN                        0                              128                                                              [::]:22                                                            [::]:*                            users:(("sshd",pid=1158,fd=4))                               
tcp                         LISTEN                        0                              128                                                                 *:443                                                              *:*                            users:(("stunnel",pid=3140,fd=9))                            
tcp                         LISTEN                        0                              128                                                              [::]:19999                                                         [::]:*                            users:(("netdata",pid=2783309,fd=8))                         
tcp                         LISTEN                        0                              128                                                              [::]:57023                                                         [::]:*                            users:(("rpc.statd",pid=5012,fd=11))
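
Rather than scanning the full listing, the same check can be reduced to a yes/no answer per host. This is a minimal sketch: the function name listens_on_port is hypothetical, and it only assumes that listening sockets appear as lines containing LISTEN with a local address ending in ":<port>", as in the output above.

```shell
# Hypothetical helper: reads `ss` output on stdin and succeeds (exit 0)
# if some listening socket has a local address ending in ":<port>".
listens_on_port() {
    port="$1"
    awk -v p="$port" '
        /LISTEN/ {
            # The peer column is always ":*" for listeners, so only the
            # local address column can end with the port number.
            for (i = 1; i <= NF; i++)
                if ($i ~ (":" p "$")) found = 1
        }
        END { exit !found }
    '
}
```

For example: ss -ltn | listens_on_port 6640 && echo "6640 open" || echo "6640 closed"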

And yes, you're right: on the third node (which raises this error), there is no "options" line on the bridges.

I've tried stopping and starting OVS and the sdn-controller plugin. No change.

Kptainflintt

@bleader Hi,

After a restart of the entire host, port 6640 is now listed when I run ss.

But unfortunately, the tunnels are not working: every VM on this host loses connectivity to the others in the same SDN network.

Example with a ping between two hosts:

2025-10-09T12:22:54.781Z|00026|tunnel(handler1)|WARN|receive tunnel port not found (arp,tun_id=0x1f1,tun_src=192.0.0.1,tun_dst=192.0.0.3,tun_ipv6_src=::,tun_ipv6_dst=::,tun_gbp_id=0,tun_gbp_flags=0,tun_tos=0,tun_ttl=64,tun_erspan_ver=0,gtpu_flags=0,gtpu_msgtype=0,tun_flags=key,in_port=33,vlan_tci=0x0000,dl_src=56:30:10:5c:4d:ad,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=192.168.10.10,arp_tpa=192.168.10.20,arp_op=1,arp_sha=56:30:10:5c:4d:ad,arp_tha=00:00:00:00:00:00)
2025-10-09T12:22:54.781Z|00027|ofproto_dpif_upcall(handler1)|INFO|Dropped 61 log messages in last 59 seconds (most recently, 1 seconds ago) due to excessive rate
2025-10-09T12:22:54.781Z|00028|ofproto_dpif_upcall(handler1)|INFO|received packet on unassociated datapath port 33
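
The "receive tunnel port not found" warning carries the tunnel key and both endpoints, which can be pulled out for comparison with the key and remote_ip values that ovs-vsctl show reports on the receiving host. The sketch below is illustrative only: the function name unmatched_tunnels is made up, and it assumes the tun_id, tun_src and tun_dst fields appear consecutively as in the lines above. The key is printed in decimal on the assumption that the tunnel options use a decimal key.

```shell
# Hypothetical helper: reads ovs-vswitchd log lines on stdin and, for each
# "receive tunnel port not found" warning, prints the tunnel key in decimal
# followed by the tunnel source and destination addresses.
unmatched_tunnels() {
    grep 'receive tunnel port not found' |
    sed -n 's/.*tun_id=\(0x[0-9a-fA-F]*\),tun_src=\([^,]*\),tun_dst=\([^,]*\),.*/\1 \2 \3/p' |
    while read -r id src dst; do
        # printf converts the hexadecimal tun_id to decimal.
        printf '%d %s %s\n' "$id" "$src" "$dst"
    done
}
```

It would be fed from the log file, for example: unmatched_tunnels < /var/log/openvswitch/ovs-vswitchd.log | sort -u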

If I migrate the VM from the third host to another one, the network comes back.

This is very strange, because the network I chose for the test is one of the first ones created, not the last one, so it worked before and no longer does. I don't understand why, or what to do...