Docker Swarm - Can't connect to running services

docker

#1

Hi,

I am using ResinOS (2.0.6) for the Intel Edison, with the custom resinOS Docker (17.03.1-resin https://github.com/resin-os/docker/tree/17.03.1-resin).

I am running into problems when using Docker Swarm. With the default build of ResinOS, Docker services fail to create VXLAN interfaces. When I add VXLAN support to the kernel, the services start, but my containers are not reachable through the published ports.

#How to reproduce this behavior

I’ve set up a minimal node.js server which listens on port 80.

Docker image: https://hub.docker.com/r/djb7/edison-expressjs-hello-world/
Code: https://github.com/djbb7/edison-expressjs-hello-world

My Intel Edison board has the IP 10.100.57.92

‘docker run’ works

If I ssh into the board and run the following, everything works as expected:

root@resin:~# docker run -d -p 80:80  djb7/edison-expressjs-hello-world
<container id>
root@resin:~# curl 10.100.57.92
Hello World, I'm a container running on resinOS
root@resin:~# docker stop $(docker ps -q)
<container id>

I can even access the web server from other computers in the same network.

‘docker service create’ fails

However, when I try to run the same image using swarm commands, the containers are not able to start due to an error creating vxlan interfaces:

root@resin:~# docker swarm init --advertise-addr 10.100.57.92
Swarm initialized: current node (pr22zwevmsb62zv6jfe1uxlmf) is now a manager.
...

root@resin:~# docker service create -p 80:80 djb7/edison-expressjs-hello-world
<service id>


root@resin:~# docker service ps relaxed_hypatia --no-trunc

ID                         NAME                   IMAGE                                                                                                             NODE   DESIRED STATE  CURRENT STATE                     ERROR                                                                                                                                 PORTS
1t2w7su0dm7r29yb3pxfffvpb  relaxed_hypatia.1      djb7/edison-expressjs-hello-world:latest@sha256:4a7e9aad85d3fd2af6b60914a78890d50103d93d61d1e9d5956e41bd5e01deca  resin  Ready          Preparing less than a second ago
w3cjzfjo1vodl24w7z0fco8kf   \_ relaxed_hypatia.1  djb7/edison-expressjs-hello-world:latest@sha256:4a7e9aad85d3fd2af6b60914a78890d50103d93d61d1e9d5956e41bd5e01deca  resin  Shutdown       Failed less than a second ago     "starting container failed: subnet sandbox join failed for "10.255.0.0/16": error creating vxlan interface: operation not supported"
xaezmi61sla1noz3bev3kpj0l   \_ relaxed_hypatia.1  djb7/edison-expressjs-hello-world:latest@sha256:4a7e9aad85d3fd2af6b60914a78890d50103d93d61d1e9d5956e41bd5e01deca  resin  Shutdown       Failed 5 seconds ago              "starting container failed: subnet sandbox join failed for "10.255.0.0/16": error creating vxlan interface: operation not supported"
ykvo1u7ar48o5nbdjv5swt6mh   \_ relaxed_hypatia.1  djb7/edison-expressjs-hello-world:latest@sha256:4a7e9aad85d3fd2af6b60914a78890d50103d93d61d1e9d5956e41bd5e01deca  resin  Shutdown       Failed 11 seconds ago             "starting container failed: subnet sandbox join failed for "10.255.0.0/16": error creating vxlan interface: operation not supported"
olj25mbvcnqsfjypawbic1trb   \_ relaxed_hypatia.1  djb7/edison-expressjs-hello-world:latest@sha256:4a7e9aad85d3fd2af6b60914a78890d50103d93d61d1e9d5956e41bd5e01deca  resin  Shutdown       Failed 16 seconds ago             "starting container failed: subnet sandbox join failed for "10.255.0.0/16": error creating vxlan interface: operation not supported"
txt9kuj0jzysj1ucazab0jkib   \_ relaxed_hypatia.1  djb7/edison-expressjs-hello-world:latest@sha256:4a7e9aad85d3fd2af6b60914a78890d50103d93d61d1e9d5956e41bd5e01deca  resin  Shutdown       Failed 21 seconds ago             "starting container failed: subnet sandbox join failed for "10.255.0.0/16": error creating vxlan interface: operation not supported"

#Possible cause

I suspect that this has something to do with the network configuration, possibly at the kernel level. So I ran the check-config.sh script that comes with Docker (https://github.com/resin-os/docker/blob/17.03.1-resin/contrib/check-config.sh). It reports several kernel options as missing (CONFIG_NETFILTER_XT_MATCH_IPVS, CONFIG_IP_NF_NAT, CONFIG_IPVLAN, CONFIG_VXLAN, CONFIG_DUMMY):

root@resin:/mnt/data/resin-data# ./check-config.sh
info: reading kernel config from /proc/config.gz ...

Generally Necessary:
- cgroup hierarchy: properly mounted [/sys/fs/cgroup]
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled
- CONFIG_BRIDGE: enabled
- CONFIG_BRIDGE_NETFILTER: enabled
- CONFIG_NF_NAT_IPV4: enabled
- CONFIG_IP_NF_FILTER: enabled
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled
- CONFIG_NETFILTER_XT_MATCH_IPVS: missing
- CONFIG_IP_NF_NAT: missing
- CONFIG_NF_NAT: enabled
- CONFIG_NF_NAT_NEEDED: enabled
- CONFIG_POSIX_MQUEUE: enabled
- CONFIG_DEVPTS_MULTIPLE_INSTANCES: enabled

Optional Features:
- CONFIG_USER_NS: missing
- CONFIG_SECCOMP: enabled
- CONFIG_CGROUP_PIDS: missing
- CONFIG_MEMCG_SWAP: missing
- CONFIG_MEMCG_SWAP_ENABLED: missing
- CONFIG_MEMCG_KMEM: missing
- CONFIG_RESOURCE_COUNTERS: enabled
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: enabled
- CONFIG_IOSCHED_CFQ: enabled
- CONFIG_CFQ_GROUP_IOSCHED: missing
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: missing
- CONFIG_NET_CLS_CGROUP: missing
- CONFIG_NETPRIO_CGROUP: missing
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: enabled
- CONFIG_IP_VS: missing
- CONFIG_IP_VS_NFCT: missing
- CONFIG_IP_VS_RR: missing
- CONFIG_EXT3_FS: enabled
- CONFIG_EXT3_FS_XATTR: enabled
- CONFIG_EXT3_FS_POSIX_ACL: enabled
- CONFIG_EXT3_FS_SECURITY: enabled
- CONFIG_EXT4_FS: enabled
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: missing
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled
      - CONFIG_CRYPTO_GCM: missing
      - CONFIG_CRYPTO_SEQIV: enabled
      - CONFIG_CRYPTO_GHASH: missing
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: enabled
      - CONFIG_XFRM_ALGO: enabled
      - CONFIG_INET_ESP: enabled
      - CONFIG_INET_XFRM_MODE_TRANSPORT: enabled
  - "ipvlan":
    - CONFIG_IPVLAN: missing
  - "macvlan":
    - CONFIG_MACVLAN: enabled
    - CONFIG_DUMMY: missing
- Storage Drivers:
  - "aufs":
    - CONFIG_AUFS_FS: enabled
  - "btrfs":
    - CONFIG_BTRFS_FS: missing
    - CONFIG_BTRFS_FS_POSIX_ACL: missing
  - "devicemapper":
    - CONFIG_BLK_DEV_DM: missing
    - CONFIG_DM_THIN_PROVISIONING: missing
  - "overlay":
    - CONFIG_OVERLAY_FS: missing
  - "zfs":
    - /dev/zfs: missing
    - zfs command: missing
    - zpool command: missing

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000
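
To re-check only the options that matter for Swarm after each rebuild, without reading the whole check-config.sh report, a small helper along these lines might be enough. This is only a sketch and not part of Docker or resinOS: `missing_opts` is a hypothetical name, and it assumes the kernel config has first been decompressed to a plain file.

```shell
# Print each requested option that is neither built in (=y) nor a module (=m).
# $1: path to an uncompressed kernel config; remaining args: CONFIG_* names.
# (missing_opts is a hypothetical helper, not an existing tool.)
missing_opts() {
  config_file=$1
  shift
  for opt in "$@"; do
    grep -Eq "^${opt}=(y|m)$" "$config_file" || echo "$opt"
  done
}

# Example use on the device (the output path is arbitrary):
#   zcat /proc/config.gz > /tmp/kernel.config
#   missing_opts /tmp/kernel.config CONFIG_VXLAN CONFIG_IPVLAN CONFIG_DUMMY \
#     CONFIG_IP_NF_NAT CONFIG_NETFILTER_XT_MATCH_IPVS
```

An empty result means every listed option is present as `=y` or `=m`; anything printed is either unset or absent from the config.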

#Attempted solution
I recompiled resinOS with the missing CONFIG_* options. I did so by adding them to layers/meta-resin-edison/recipes-kernel/linux/files/defconfig and to RESIN_CONFIGS_DEPS[docker] in layers/meta-resin/meta-resin-common/classes/kernel-resin.bbclass. After re-flashing, I ran the check-config.sh script again:

root@resin:/mnt/data/resin-data# ./check-config.sh
info: reading kernel config from /proc/config.gz ...

Generally Necessary:
- cgroup hierarchy: properly mounted [/sys/fs/cgroup]
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_DEVPTS_MULTIPLE_INSTANCES: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled
- CONFIG_BRIDGE: enabled
- CONFIG_BRIDGE_NETFILTER: enabled
- CONFIG_NF_NAT_IPV4: enabled
- CONFIG_IP_NF_FILTER: enabled
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled
- CONFIG_NF_NAT: enabled
- CONFIG_NF_NAT_NEEDED: enabled
- CONFIG_POSIX_MQUEUE: enabled

Optional Features:
- CONFIG_USER_NS: missing
- CONFIG_SECCOMP: enabled
- CONFIG_CGROUP_PIDS: missing
- CONFIG_MEMCG_SWAP: missing
- CONFIG_MEMCG_SWAP_ENABLED: missing
- CONFIG_MEMCG_KMEM: missing
- CONFIG_RESOURCE_COUNTERS: enabled
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: enabled
- CONFIG_IOSCHED_CFQ: enabled
- CONFIG_CFQ_GROUP_IOSCHED: missing
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: missing
- CONFIG_NET_CLS_CGROUP: missing
- CONFIG_NETPRIO_CGROUP: missing
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: enabled
- CONFIG_IP_VS: enabled (as module)
- CONFIG_EXT3_FS: enabled
- CONFIG_EXT3_FS_XATTR: enabled
- CONFIG_EXT3_FS_POSIX_ACL: enabled
- CONFIG_EXT3_FS_SECURITY: enabled
- CONFIG_EXT4_FS: enabled
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: enabled (as module)
    Optional (for secure networks):
    - CONFIG_XFRM_ALGO: enabled
    - CONFIG_XFRM_USER: enabled
  - "ipvlan":
    - CONFIG_IPVLAN: missing
  - "macvlan":
    - CONFIG_MACVLAN: enabled
    - CONFIG_DUMMY: enabled
- Storage Drivers:
  - "aufs":
    - CONFIG_AUFS_FS: enabled
  - "btrfs":
    - CONFIG_BTRFS_FS: missing
  - "devicemapper":
    - CONFIG_BLK_DEV_DM: missing
    - CONFIG_DM_THIN_PROVISIONING: missing
  - "overlay":
    - CONFIG_OVERLAY_FS: missing
  - "zfs":
    - /dev/zfs: missing
    - zfs command: missing
    - zpool command: missing

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000
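
Since the report lists CONFIG_VXLAN and CONFIG_IP_VS as "enabled (as module)", it may also be worth confirming that the modules are actually loaded at runtime, because Swarm's overlay networking relies on VXLAN and its routing mesh relies on IPVS. The following is a rough sketch; `module_loaded` is a hypothetical helper, and the optional second argument exists only so the function can be pointed at a saved copy of the module list.

```shell
# Succeed if the named kernel module appears in a /proc/modules-style listing.
# $1: module name (e.g. vxlan); $2: listing file, defaults to /proc/modules.
# Note: options built into the kernel (=y) never show up in /proc/modules.
module_loaded() {
  grep -q "^$1 " "${2:-/proc/modules}"
}

# Example use on the device:
#   for m in vxlan ip_vs; do
#     module_loaded "$m" || echo "$m is not loaded"
#   done
```

If a module turns out not to be loaded, `modprobe vxlan` or `modprobe ip_vs` before creating the service would be the obvious next thing to try.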

As the check-config.sh output shows, the ‘Generally Necessary’ options are now present, as well as CONFIG_VXLAN. For some reason CONFIG_IPVLAN is still missing, even though it is set to =y in the defconfig and listed in kernel-resin.bbclass. I then tried to run my simple hello world server again:

root@resin:/mnt/data/resin-data# docker swarm init --advertise-addr 10.100.57.92
Swarm initialized: current node (5qf4hvfpt9o8ss8tjr9ul4xvo) is now a manager.

To add a worker to this swarm, run the following command:

    docker swarm join \
    --token SWMTKN-1-0ki53aas597wxdnp9w54wdcwpb9iiujog63vr1hjrn7qisuq3e-2yherw73m9a9lh6cuyizcj8dx \
    10.100.57.92:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

root@resin:/mnt/data/resin-data# docker service create -p 80:80 djb7/edison-expressjs-hello-world
441w76skiv8bgr2f1xiwxaphb

root@resin:/mnt/data/resin-data# docker service ls
ID            NAME             MODE        REPLICAS  IMAGE
441w76skiv8b  youthful_mclean  replicated  1/1       djb7/edison-expressjs-hello-world:latest

root@resin:/mnt/data/resin-data# docker service inspect youthful_mclean --pretty

ID:    441w76skiv8bgr2f1xiwxaphb
Name:    youthful_mclean
Service Mode:  Replicated
 Replicas:  1
Placement:
UpdateConfig:
 Parallelism:  1
 On failure:  pause
 Max failure ratio: 0
ContainerSpec:
 Image:    djb7/edison-expressjs-hello-world:latest@sha256:4a7e9aad85d3fd2af6b60914a78890d50103d93d61d1e9d5956e41bd5e01deca
Resources:
Endpoint Mode:  vip
Ports:
 PublishedPort 80
  Protocol = tcp
  TargetPort = 80

root@resin:/mnt/data/resin-data# curl http://10.100.57.92
curl: (7) Failed to connect to 10.100.57.92 port 80: Connection refused

This time the service is running (1/1 replicas), but connections to the published port are refused, so it’s simply not possible to reach the service from outside the container. Sorry for the lengthy post, I tried to be as explicit as possible.

Maybe I am missing something? Is there a way to make services accessible to the outside world by exposing ports? Hope you can help me solve this.

Thanks

Edit: 1) CONFIG_IP_NF_NAT is also not being enabled, even though it is listed in kernel-resin.bbclass. The latest check-config.sh script, available from the moby/moby repository, lists it as generally necessary.
2) If I don’t publish any ports when I create the service, I can ping the outside network (e.g. google.com) from inside the service container. If I publish ports, ping no longer works.
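
One thing that might explain options such as CONFIG_IPVLAN or CONFIG_IP_NF_NAT not showing up despite being in the defconfig: if the kernel source used for the build does not define a symbol at all (for instance because it was introduced in a later kernel release), the entry is dropped when the final configuration is generated. Whether that applies to the Edison kernel has not been verified here. The sketch below checks a kernel source tree for a symbol definition; `kconfig_defines` is a hypothetical helper and the source path in the example is a placeholder.

```shell
# Succeed if any Kconfig file under a kernel source tree defines the symbol.
# $1: path to the kernel source tree
# $2: symbol name without the CONFIG_ prefix (e.g. IPVLAN)
kconfig_defines() {
  find "$1" -type f -name 'Kconfig*' \
    -exec grep -El "^(menu)?config[[:space:]]+$2[[:space:]]*\$" {} + | grep -q .
}

# Example use in the build environment (path is a placeholder):
#   for sym in VXLAN IPVLAN IP_NF_NAT; do
#     kconfig_defines /path/to/kernel-source "$sym" || echo "$sym is not defined"
#   done
```

A symbol that is defined but still ends up unset usually points to an unmet dependency instead, which the `depends on` lines next to its `config` entry would show.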


#3

Just a small update: I think one of the issues might be that the ingress network used by the swarm conflicts with the addresses that the wireless access point assigns to the devices. On Docker 17.05 it is possible to replace the default ingress network with a custom one, but since we are using Docker 17.03 I will try changing the host’s network addresses instead.
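
A quick way to test the suspected conflict is to check whether the ingress subnet from the error messages (10.255.0.0/16) overlaps the range handed out by the access point. The helpers below are a hypothetical sketch (`ip_to_int` and `cidrs_overlap` are made-up names), and the 10.0.0.0/8 LAN range in the example is only an assumption, since the actual netmask is not stated above.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# The function body runs in a subshell so the IFS change stays local.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Succeed (exit status 0) if two CIDR ranges share at least one address.
# $1, $2: ranges in a.b.c.d/len form
cidrs_overlap() (
  net1=$(ip_to_int "${1%/*}"); len1=${1#*/}
  net2=$(ip_to_int "${2%/*}"); len2=${2#*/}
  # Compare both network addresses under the shorter (less specific) prefix.
  if [ "$len1" -lt "$len2" ]; then len=$len1; else len=$len2; fi
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  [ $(( net1 & mask )) -eq $(( net2 & mask )) ]
)

# Example use (the LAN range is an assumption):
#   cidrs_overlap 10.0.0.0/8 10.255.0.0/16 && echo "ingress overlaps the LAN"
```

The LAN range to plug in can be read from the prefix length that `ip addr show` reports for the wireless interface.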