Device comes up but doesn't work


#1

My Raspberry Pi 1 is in a strange state. I can manually power cycle it and it comes up online on the Resin dashboard, but the service doesn’t start. The restart and reboot buttons also give an error:
Request error: tunneling socket could not be established, cause=socket hang up
The public URL also doesn’t work.

However, I can still SSH into the host OS and run commands, for example:

root@55ef69e:~# journalctl -fn 100 -u resin-supervisor
-- Logs begin at Tue 2018-02-27 17:41:30 UTC. --
Jul 18 14:07:02 55ef69e systemd[1]: Starting Resin supervisor...
Jul 18 14:07:03 55ef69e balena[1049]: resin_supervisor
Jul 18 14:07:03 55ef69e systemd[1]: Started Resin supervisor.
Jul 18 14:07:40 55ef69e healthdog[1056]: [56B blob data]
Jul 18 14:07:40 55ef69e healthdog[1056]: [59B blob data]
Jul 18 14:07:45 55ef69e healthdog[1056]: [87B blob data]
Jul 18 14:07:45 55ef69e healthdog[1056]: [79B blob data]
Jul 18 14:07:52 55ef69e healthdog[1056]: 2018/07/18 14:07:52 main.go:41: Resin Go Supervisor starting
Jul 18 14:07:52 55ef69e healthdog[1056]: 2018/07/18 14:07:52 main.go:32: Starting HTTP server on /var/run/resin/gosuper.sock
Jul 18 14:09:33 55ef69e healthdog[1056]: [2018-07-18T14:09:33.348Z] Event: Supervisor start {}
Jul 18 14:09:33 55ef69e healthdog[1056]: [2018-07-18T14:09:33.751Z] Starting pubnub logger
Jul 18 14:09:44 55ef69e healthdog[1056]: [2018-07-18T14:09:44.211Z] Unhandled rejection TypeError:Cannot read property 'slice' of null
Jul 18 14:09:44 55ef69e healthdog[1056]:     at t (/usr/src/app/dist/app.js:1:730120)
Jul 18 14:09:44 55ef69e healthdog[1056]:     at Zt (/usr/src/app/dist/app.js:1:8073)
Jul 18 14:09:44 55ef69e healthdog[1056]:     at Function.Ja [as map] (/usr/src/app/dist/app.js:1:42834)
Jul 18 14:09:44 55ef69e healthdog[1056]:     at Object.t.envArrayToObject (/usr/src/app/dist/app.js:1:730145)
Jul 18 14:09:44 55ef69e healthdog[1056]:     at e.exports.e.fromContainer (/usr/src/app/dist/app.js:1:744249)
Jul 18 14:09:44 55ef69e healthdog[1056]:     at u (/usr/src/app/dist/app.js:1:75045)
Jul 18 14:09:44 55ef69e healthdog[1056]:     at A._settlePromiseFromHandler (/usr/src/app/dist/app.js:1:783032)
Jul 18 14:09:44 55ef69e healthdog[1056]:     at A._settlePromise (/usr/src/app/dist/app.js:1:783832)
Jul 18 14:09:44 55ef69e healthdog[1056]:     at A._settlePromise0 (/usr/src/app/dist/app.js:1:784531)
Jul 18 14:09:44 55ef69e healthdog[1056]:     at A._settlePromises (/usr/src/app/dist/app.js:1:785858)
Jul 18 14:09:44 55ef69e healthdog[1056]:     at s._drainQueue (/usr/src/app/dist/app.js:1:788861)
Jul 18 14:09:44 55ef69e healthdog[1056]:     at s._drainQueues (/usr/src/app/dist/app.js:1:788922)
Jul 18 14:09:44 55ef69e healthdog[1056]:     at Immediate.drainQueues (/usr/src/app/dist/app.js:1:787133)
Jul 18 14:09:44 55ef69e healthdog[1056]:     at runCallback (timers.js:574:20)
Jul 18 14:09:44 55ef69e healthdog[1056]:     at tryOnImmediate (timers.js:554:5)
Jul 18 14:09:44 55ef69e healthdog[1056]:     at processImmediate [as _immediateCallback] (timers.js:533:5)

Running ip addr gives me this:

root@55ef69e:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel qlen 1000
    link/ether b8:27:eb:42:91:9b brd ff:ff:ff:ff:ff:ff
    inet 192.168.8.101/24 brd 192.168.8.255 scope global dynamic eth0
       valid_lft 85653sec preferred_lft 85653sec
    inet6 fe80::a8be:f323:acb8:7f62/64 scope link
       valid_lft forever preferred_lft forever
3: resin-dns: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue qlen 1000
    link/ether 1a:2e:76:b0:51:f4 brd ff:ff:ff:ff:ff:ff
    inet 10.114.102.1/24 scope global resin-dns
       valid_lft forever preferred_lft forever
    inet6 fe80::182e:76ff:feb0:51f4/64 scope link
       valid_lft forever preferred_lft forever
4: resin-vpn: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc fq_codel qlen 100
    link/[65534]
    inet 10.240.37.153 peer 52.4.252.97/32 scope global resin-vpn
       valid_lft forever preferred_lft forever
    inet6 fe80::984e:ffff:5915:65a3/64 scope link
       valid_lft forever preferred_lft forever
5: balena0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
    link/ether 02:42:b8:1c:17:55 brd ff:ff:ff:ff:ff:ff
    inet 10.114.101.1/24 scope global balena0
       valid_lft forever preferred_lft forever
6: supervisor0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
    link/ether 02:42:60:4f:f1:0a brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global supervisor0
       valid_lft forever preferred_lft forever
7: br-b068b6a8c920: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
    link/ether 02:42:1a:6e:24:e3 brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.1/16 scope global br-b068b6a8c920
       valid_lft forever preferred_lft forever
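To check the same thing without reading the whole listing, for example whether resin-vpn has link (LOWER_UP) and holds an address, as it does here, the output can be parsed. A minimal Python sketch, assuming the plain-text `ip addr` layout shown above; `parse_ip_addr` and the trimmed sample are illustrative only and not part of any resin tooling.

```python
import re

# Header lines look like "4: resin-vpn: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 ..."
IFACE_RE = re.compile(r"^\d+: ([^:]+): <([^>]*)>")
# IPv4 lines look like "    inet 10.240.37.153 peer 52.4.252.97/32 ..." ("inet6" lines do not match)
INET_RE = re.compile(r"^\s+inet (\S+)")


def parse_ip_addr(text):
    """Return {interface: {"flags": set of link flags, "inet": [IPv4 addresses]}}."""
    interfaces = {}
    current = None
    for line in text.splitlines():
        header = IFACE_RE.match(line)
        if header:
            current = header.group(1)
            interfaces[current] = {"flags": set(header.group(2).split(",")), "inet": []}
            continue
        addr = INET_RE.match(line)
        if addr and current is not None:
            interfaces[current]["inet"].append(addr.group(1))
    return interfaces


# Trimmed copy of the output above.
SAMPLE = """\
4: resin-vpn: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc fq_codel qlen 100
    link/[65534]
    inet 10.240.37.153 peer 52.4.252.97/32 scope global resin-vpn
       valid_lft forever preferred_lft forever
5: balena0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue
    link/ether 02:42:b8:1c:17:55 brd ff:ff:ff:ff:ff:ff
    inet 10.114.101.1/24 scope global balena0
"""

ifaces = parse_ip_addr(SAMPLE)
vpn = ifaces["resin-vpn"]
print("LOWER_UP" in vpn["flags"], vpn["inet"])  # True ['10.240.37.153']
```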

The supervisor does indeed seem unhealthy:

root@55ef69e:~# balena ps -a
CONTAINER ID        IMAGE                          COMMAND                  CREATED             STATUS                      PORTS               NAMES
08bc27ea0b56        2b09725d2576                   "/usr/bin/entry.sh..."   About an hour ago   Up 20 seconds                                  main_370772_539856
eef64d9001fd        resin/rpi-supervisor:v7.1.14   "/sbin/init"             2 months ago        Up 16 minutes (unhealthy)                      resin_supervisor
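If you want to spot this from a script rather than by eye, the STATUS column of the default table output can be scanned for the "(unhealthy)" marker. A minimal Python sketch, assuming columns are separated by two or more spaces as in the listing above; `unhealthy_containers` is a hypothetical helper.

```python
import re


def unhealthy_containers(ps_output):
    """Return names of containers whose STATUS column reports "(unhealthy)"."""
    names = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        # Columns in the default table output are separated by two or more spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if any("(unhealthy)" in field for field in fields):
            names.append(fields[-1])  # NAMES is the last column
    return names


SAMPLE = """\
CONTAINER ID        IMAGE                          COMMAND                  CREATED             STATUS                      PORTS               NAMES
08bc27ea0b56        2b09725d2576                   "/usr/bin/entry.sh..."   About an hour ago   Up 20 seconds                                  main_370772_539856
eef64d9001fd        resin/rpi-supervisor:v7.1.14   "/sbin/init"             2 months ago        Up 16 minutes (unhealthy)                      resin_supervisor
"""

print(unhealthy_containers(SAMPLE))  # ['resin_supervisor']
```

Splitting on runs of spaces rather than fixed offsets keeps this working when column widths change, as long as no single field contains two consecutive spaces.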

Does anyone have a suggestion for how I can get the Pi working normally again? It is installed at a remote location, so reflashing the SD card is not an option.


#4

If you SSH into the host OS and run md5sum --quiet -c /resinos.fingerprint, it will confirm whether or not the running supervisor has become corrupted. It is normal for files such as the hostname, .rnd and the time sync configuration to differ from the versions shipped with the OS.
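For reference, the check md5sum performs here is simple: recompute each listed file's MD5 and compare it with the digest recorded next to its path. A minimal Python sketch, assuming /resinos.fingerprint uses md5sum's standard "<digest>  <path>" line format (this thread does not confirm the format); `failed_entries` is a hypothetical helper. On the device itself md5sum is the practical tool; this only illustrates the comparison.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def failed_entries(fingerprint_path):
    """Return paths whose current MD5 differs from the one recorded in the fingerprint.

    Assumes each line uses md5sum's "<hex digest>  <path>" layout.
    Missing or unreadable files are reported as failures too.
    """
    failed = []
    for line in Path(fingerprint_path).read_text().splitlines():
        if not line.strip():
            continue
        expected, path = line.split(None, 1)
        path = path.lstrip("*")  # md5sum marks binary-mode entries with a leading '*'
        try:
            if md5_of(path) != expected.lower():
                failed.append(path)
        except OSError:
            failed.append(path)
    return failed
```

Usage would be along the lines of `failed_entries("/resinos.fingerprint")`, which should list the same paths that md5sum reports as FAILED.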


#5
root@55ef69e:~# md5sum --quiet -c /resinos.fingerprint
/etc/hostname: FAILED
/etc/machine-id: FAILED
/etc/systemd/timesyncd.conf: FAILED
/home/root/.rnd: FAILED
md5sum: WARNING: 4 computed checksums did NOT match

This really doesn’t look good, so I guess I’m stuck with a corrupt SD card. Is there any way I can force an fsck on it?


#6

That is actually a perfectly good result!
Between the OS image being published and it running on your device, the hostname, machine-id, timesyncd.conf and .rnd files are expected to change as the device picks up its name, ID, latest time and pseudo-random seed, respectively.
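Put differently, a FAILED line only matters if it names a file outside this set. A small Python sketch that filters `md5sum --quiet -c` output accordingly; the EXPECTED_TO_CHANGE set is copied from this reply and is an assumption here, not an official resinOS list, and `unexpected_failures` is a hypothetical helper.

```python
# Files that, per the reply above, legitimately change after the OS image is built.
# Assumption: taken from this thread, not from resinOS documentation.
EXPECTED_TO_CHANGE = {
    "/etc/hostname",
    "/etc/machine-id",
    "/etc/systemd/timesyncd.conf",
    "/home/root/.rnd",
}


def unexpected_failures(md5sum_output):
    """Return FAILED paths from `md5sum --quiet -c` output that are not expected to change."""
    failed = []
    for line in md5sum_output.splitlines():
        # Failure lines look like "/etc/hostname: FAILED" (or "...: FAILED open or read").
        path, sep, status = line.rpartition(": ")
        if sep and status.startswith("FAILED"):
            failed.append(path)
    return [p for p in failed if p not in EXPECTED_TO_CHANGE]


OUTPUT = """\
/etc/hostname: FAILED
/etc/machine-id: FAILED
/etc/systemd/timesyncd.conf: FAILED
/home/root/.rnd: FAILED
md5sum: WARNING: 4 computed checksums did NOT match
"""

print(unexpected_failures(OUTPUT))  # []
```

An empty list for the output in #5 matches the conclusion here: only the expected files differ.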