Supervisor killing local build process


#1

On both of the Raspberry Pi 3 devices I have set up with resin-Shopscreen-2.3.0+rev1-dev-v6.1.3, everything worked at first. But after putting a device into local mode and pushing a few local builds of different projects, the app starts getting killed, and local builds are no longer possible because they exit with status 137. It looks like the supervisor is killing both the apps and the local builds (exit code 137 usually means that the process received a kill signal). When I kill the supervisor it just gets restarted, but when I docker pause it, I am able to push local applications again. As soon as I unpause the supervisor, the app gets killed again.
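As a rough illustration of why 137 points at a kill signal: shells conventionally report "terminated by signal N" as exit status 128 + N, so 137 corresponds to signal 9 (SIGKILL). A minimal sketch follows; the function name describe_exit is made up for this example and is not part of any resin tooling.

```shell
# Decode a shell exit status. Values above 128 conventionally mean
# "terminated by signal (status - 128)".
# describe_exit is a hypothetical helper name used only for illustration.
describe_exit() {
  status=$1
  if [ "$status" -gt 128 ]; then
    sig=$((status - 128))
    # kill -l <number> prints the signal name for that number
    echo "signal $sig ($(kill -l "$sig"))"
  else
    echo "exit code $status"
  fi
}

describe_exit 137   # prints: signal 9 (KILL)
```

This only says that the process was killed with SIGKILL, not who sent the signal; that part still has to come from the supervisor logs.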

There are also several additional problems that seem to be related:

  • Cannot reboot the device from the resin admin; the error flash message is “Request error: [object Object]”
  • Same error for shutting down the device
  • Even manual reboots do not change the behavior

And this log output in the supervisor:

[2017-10-13T08:43:31.100Z] Scheduling another update attempt due to failure:  30000 Error: 1 error: (HTTP code 404) no such container - No such container: null
    at /usr/src/app/dist/app.js:564:23484
    at r (/usr/src/app/dist/app.js:1:2435)
    at i._settlePromiseFromHandler (/usr/src/app/dist/app.js:294:42283)
    at i._settlePromise (/usr/src/app/dist/app.js:294:43083)

Can someone give me a hint as to what might be causing this behavior? I am trying out resin.io for a deployment of kiosk systems, but this issue does not inspire much confidence in the system.


#3

Yes, code 137 effectively always means the supervisor is killing your local build. There are a few reasons the supervisor will do that:

  1. you haven’t turned on local mode (but it sounds like you have here)
  2. you’ve never previously had a git pushed build deployed to this device
  3. the supervisor state has got corrupted somehow

We’re looking at making 2 optional in the future, but not yet. There are fixes en route for a couple of the ways you can hit 3 (most commonly by running out of memory on the device while the supervisor is doing something); I think those will be handled in a new ResinOS release coming soon.

Have you definitely previously git pushed at least once to these devices? If so, then perhaps your supervisor state has got corrupted somehow. I think @pcarranzav might have some hints on how to debug that (and on how to collect debug info, so we can make sure we permanently fix the underlying issues on devices that can cause this).


#4

Yes, local mode is definitely turned on! For the first device I even followed the getting started tutorial, so a git push to the resin remote was one of the first steps. Is there anything I can do about 3? Restarting the device doesn’t help, so is there any way to reset the supervisor?

Btw, this is coming from the device logs:

13.10.17 10:58:38 (+0200) Failed to kill application ‘registry2.resin.io/XXX’ due to '[object Object]'
13.10.17 10:58:38 (+0200) Failed to update application ‘registry2.resin.io/XXX’ due to ‘[object Object]’

I guess this is also because of the supervisor problem, but the terminal in the resin admin cannot connect either:

Spawning shell...
SSH session disconnected
SSH reconnecting...
Spawning shell...
SSH reconnecting...
Spawning shell...
SSH reconnecting...

#5

@MichaelR this is indeed a supervisor bug, and it was fixed in https://github.com/resin-io/resin-supervisor/pull/496. We actually saw at our Athens hackathon that it affects local mode when you reboot after entering local mode: the supervisor is then unable to actually switch into local mode, so it keeps killing your containers.

The fix will come out in the next ResinOS release. The workaround, in the meantime, is to run:

systemctl stop resin-supervisor
rm /resin-data/resin-supervisor/database.sqlite
systemctl start resin-supervisor

This clears the supervisor’s local state, which it will recover when it starts again.
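
If you would rather keep the old database around (for example to attach it as debug info), a variant of the same three steps might look like the sketch below. The database path and unit name are the ones from the commands above; the DRY_RUN switch, the .bak suffix and the function names are assumptions made up for this example, and moving the file instead of deleting it is a variation on the workaround, not the workaround itself.

```shell
# Sketch only: same three steps as the workaround, but the database is
# moved aside instead of deleted.
# DRY_RUN, the .bak suffix and the function names are hypothetical.
DB=/resin-data/resin-supervisor/database.sqlite

run() {
  # With DRY_RUN=1, print the command instead of executing it.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

reset_supervisor_state() {
  run systemctl stop resin-supervisor
  run mv "$DB" "$DB.bak"          # keep the old state for inspection
  run systemctl start resin-supervisor
}
```

Calling it as DRY_RUN=1 reset_supervisor_state only prints the three commands, which is a cheap way to check them before running reset_supervisor_state for real on the device.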


#6

Wow, thanks for the fast answers @pcarranzav and @pimterry! Looking forward to the next ResinOS release!