Problem with Wifi disconnecting


#1

We have a BeagleBone Black running beaglebone-debian:jessie-20161228 and Resin OS 1.24.0

We are connecting to an AP by writing a file to /host/var/lib/connman:

Type=wifi
SSID=4D6F6F31
Passphrase=keithlliomorfyddindeg
Nameservers=8.8.8.8, 8.8.4.4
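
As a rough illustration of how a file like the one above could be generated from a script, here is a minimal sketch. The `SSID=` value is the hex encoding of the network name (`4D6F6F31` decodes to `Moo1`). The function names, the `wifi.config` file name, and the `[service_wifi]` group header are assumptions for illustration: the listing above does not show a header or file name, but connman's provisioning format normally expects a `[service_...]` group and a `.config` suffix.

```shell
#!/bin/sh
# Hypothetical helpers; names and file layout are assumptions, not from the post.

# Hex-encode an SSID as used by the SSID= key above (Moo1 -> 4D6F6F31).
ssid_to_hex() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n' | tr 'a-f' 'A-F'
}

# Write a provisioning file.
# $1: target directory, $2: plain-text SSID, $3: passphrase
write_wifi_config() {
    cat > "$1/wifi.config" <<EOF
[service_wifi]
Type=wifi
SSID=$(ssid_to_hex "$2")
Passphrase=$3
Nameservers=8.8.8.8,8.8.4.4
EOF
}
```

Inside the container this might be called as `write_wifi_config /host/var/lib/connman "Moo1" "<passphrase>"`.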

After a while (could be minutes or hours) the system disconnects and doesn’t reconnect.

I used a serial cable to read the logs and got the following:

[38829.495300] cfg80211: World regulatory domain updated:
[38829.524066] cfg80211:  DFS Master region: unset
[38829.554782] cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
[38829.591292] cfg80211:   (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)
[38829.616143] cfg80211:   (2457000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)
[38829.644180] cfg80211:   (2474000 KHz - 2494000 KHz @ 20000 KHz), (N/A, 2000 mBm), (N/A)
[38829.668536] cfg80211:   (5170000 KHz - 5250000 KHz @ 80000 KHz, 160000 KHz AUTO), (N/A, 2000 mBm), (N/A)
[38829.698008] cfg80211:   (5250000 KHz - 5330000 KHz @ 80000 KHz, 160000 KHz AUTO), (N/A, 2000 mBm), (0 s)
[38829.731496] cfg80211:   (5490000 KHz - 5730000 KHz @ 160000 KHz), (N/A, 2000 mBm), (0 s)
[38829.759575] cfg80211:   (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 2000 mBm), (N/A)
[38829.784466] cfg80211:   (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 0 mBm), (N/A)
[38829.825399] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready

We have 5 copies of the hardware - all show the same symptoms. We also have various other things attached to this AP and they all stay attached.

Any ideas how we can diagnose this problem?


#2

What wifi dongle are you using?

I’m guessing that connman does not reconnect after a disconnection like this. You might need to run a check in the container for network connectivity and restart connman if there isn’t any. We are testing the setup you described now, to see whether we get any disconnects locally. If you go down the restart-script route, I would use dbus to communicate with the system connman, and probably systemd to run the checks automatically. Before that, though, we need more information about what is really going wrong.


#3

It’s a Realtek RTL8188CUS

Thanks for looking.

Yeah, I’m not averse to monitoring and then reconnecting. I already run a service that switches on an LED when the wifi is connected, so this shouldn’t be too much extra work if I knew the dbus command.


#4

That seems like a pretty common component. I will check whether there’s any internal knowledge about it!

In the meantime, this is not a recommendation, just a possible workaround. resinOS 2.x switches to NetworkManager, which is a lot more reliable and feature-rich for networking.

connman & dbus

I got some dbus info from a gist:

The relevant part is:

# Turn off wifi
dbus-send --system --dest=net.connman \
          --print-reply /net/connman/technology/wifi net.connman.Technology.SetProperty \
          string:Powered variant:boolean:false

# Turn on wifi
dbus-send --system --dest=net.connman \
          --print-reply /net/connman/technology/wifi net.connman.Technology.SetProperty \
          string:Powered variant:boolean:true

This needs one more setting: an environment variable that tells dbus-send where to find the control socket (see the docs for more info):

# on resinOS 1.x
DBUS_SYSTEM_BUS_ADDRESS=unix:path=/host_run/dbus/system_bus_socket
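
With that variable set, the same bus can also be used to ask connman whether it thinks it is connected. The following is only a sketch: it calls `net.connman.Manager.GetProperties` on the root object and pulls the `State` value (`offline`, `idle`, `ready` or `online`) out of the text that `dbus-send --print-reply` prints. The function names are made up for illustration, and the parsing assumes the usual layout where the value appears on the line after the `"State"` key.

```shell
#!/bin/sh
# resinOS 1.x socket path, as given in the post above.
export DBUS_SYSTEM_BUS_ADDRESS=unix:path=/host_run/dbus/system_bus_socket

# Read a dbus-send reply on stdin and print the value following the "State" key.
# Assumes the value is the first quoted string on the next line.
parse_state() {
    awk -F'"' '
        /string "State"/ { found = 1; next }
        found { print $2; exit }
    '
}

# Ask connman's Manager object for its properties and print the global state.
connman_state() {
    dbus-send --system --dest=net.connman --print-reply \
        / net.connman.Manager.GetProperties | parse_state
}
```

A monitoring script could then test something like `[ "$(connman_state)" = "online" ]` before deciding whether to act.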

Also, if it was me, I’d probably set up network monitoring and reconnect as a systemd service, along the lines of connman-reconnect, except using the above dbus-send commands to restart the wifi service.
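
A minimal sketch of what such a monitoring script might look like, built on the dbus-send commands above. Everything beyond those commands is an assumption for illustration: the function names, the ping target, the intervals, and the rule of power-cycling only after several consecutive failed checks. Treat it as a starting point, not a tested implementation.

```shell
#!/bin/sh
# Hypothetical wifi watchdog sketch; names and thresholds are illustrative.

# resinOS 1.x socket path, as given in the post above.
export DBUS_SYSTEM_BUS_ADDRESS="${DBUS_SYSTEM_BUS_ADDRESS:-unix:path=/host_run/dbus/system_bus_socket}"

DBUS_SEND="${DBUS_SEND:-dbus-send}"      # overridable, e.g. DBUS_SEND=echo for a dry run
CHECK_HOST="${CHECK_HOST:-8.8.8.8}"      # assumed probe target
CHECK_INTERVAL="${CHECK_INTERVAL:-60}"   # seconds between checks
MAX_FAILURES="${MAX_FAILURES:-3}"        # consecutive failures before acting
RESTART_PAUSE="${RESTART_PAUSE:-5}"      # seconds to wait between off and on

# $1: "true" or "false"; sets the Powered property of connman's wifi technology.
set_wifi_power() {
    $DBUS_SEND --system --dest=net.connman --print-reply \
        /net/connman/technology/wifi net.connman.Technology.SetProperty \
        string:Powered "variant:boolean:$1"
}

# Turn wifi off, wait briefly, then turn it back on.
restart_wifi() {
    set_wifi_power false
    sleep "$RESTART_PAUSE"
    set_wifi_power true
}

# Succeeds if a single ping to the probe target gets a reply.
is_online() {
    ping -c 1 -W 5 "$CHECK_HOST" >/dev/null 2>&1
}

# Check connectivity forever; power-cycle wifi after MAX_FAILURES misses in a row.
watch_wifi() {
    failures=0
    while true; do
        if is_online; then
            failures=0
        else
            failures=$((failures + 1))
            if [ "$failures" -ge "$MAX_FAILURES" ]; then
                restart_wifi
                failures=0
            fi
        fi
        sleep "$CHECK_INTERVAL"
    done
}
```

To run it as a service, the script would end with a call to `watch_wifi` and be started by a systemd unit in the container. Requiring several consecutive failures is meant to avoid power-cycling the interface over a single lost ping.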

:construction: Just some work-in-progress thoughts :construction:


#5

@keithejc what kind of network workload is your container doing? Knowing that will help us try to recreate the issue. Also, is your wifi dongle one of the ones with an external antenna? I ask because there are some known issues with wifi interference on dongles that have their chip antenna in the same plane as the board. I think this link has some info on it: https://learn.adafruit.com/setting-up-wifi-with-beaglebone-black/hardware#hdmi-port-interference

In general, the suggestion above of switching the connection manager off and on is a really dangerous action, and if it goes wrong you can brick your device. So it’s better to fix the underlying issue. If we can get a nice way to reproduce the issue and pop a serial cable on the device, we can definitely see what is causing the connection to drop.


#6

Not much network usage - serving a web page for controlling the device using crossbar.io

We have an external antenna and the hdmi port is disabled as we’re using the GPIO for controlling our hardware - LEDs and switching power to some other hardware.

I have a serial cable on the device - here is a complete log from starting up to losing connection:

https://drive.google.com/open?id=0B-WKra-gmr3ldmhLbkszRHV2X1U


#7

Wow, that’s pretty interesting. I haven’t ever had this issue with any of my BBB devices, but I never set the wifi from the /host/var/lib/connman directory. Is there a specific reason you are configuring the device wifi this way?

It also might be worth checking out the new resinOS 2.0 images on dashboard.resinstaging.io, specifically 2.0.0-beta10.rev1. I have been testing that for the last two days and it’s been connected and operational non-stop. Note that in the 2.0 image we have switched from connman to NetworkManager. It’s also worth mentioning that in mid-March we will be releasing 2.0 as the recommended production version and urging users to start transitioning to it as soon as they can.


#8

I did it that way because that was all I knew - I reverse engineered resin wifi connect.
Our web portal has a UI that sets up a list of APs that the device can connect to (the device is mobile, so we need a list, not just one).

Yeah, I guess we’ll have to move to 2.0 and I’ll have to refactor my code to support NetworkManager.


#9

We had the same problem on devices with low signal quality (far away from the wireless AP).

We solved it the hard way: if it disconnects after 1 hour, we restart the wireless using dbus.