BeagleBone Green Read-only Filesystem Lockup


#1

Hi,

I’m trying to set up an application on the BeagleBone Green using resinOS 2.0.0-beta.1. Unfortunately the system has been freezing before the application successfully installs (using rdt push). Digging a little bit I found many messages like the following in the kernel log at the moment when the system freezes:

[ 2181.584874] mmcblk1: unknown error -1 sending read/write command, card status 0x900
[ 2181.598593] EXT4-fs warning (device mmcblk1p6): ext4_end_bio:329: I/O error -5 writing to inode 24729 (offset 0 size 3117056 starting block 407538)
[ 2186.837059] mmcqd/1: page allocation failure: order:2, mode:0x208c020
[ 2186.844391] CPU: 0 PID: 79 Comm: mmcqd/1 Not tainted 4.4.21+ #1
[ 2186.844396] Hardware name: Generic AM33XX (Flattened Device Tree)
[ 2186.844437] [<c0015e09>] (unwind_backtrace) from [<c001259d>] (show_stack+0x11/0x14)
[ 2186.844456] [<c001259d>] (show_stack) from [<c03f7925>] (dump_stack+0x69/0x78)
[ 2186.844478] [<c03f7925>] (dump_stack) from [<c0100a4d>] (warn_alloc_failed+0xb5/0xf0)
[ 2186.844493] [<c0100a4d>] (warn_alloc_failed) from [<c0103393>] (__alloc_pages_nodemask+0x693/0x7d4)
[ 2180.317996]  slab_reclaimable:26074 slab_unreclaimable:4506
[ 2180.317996]  mapped:6297 shmem:3258 pagetables:242 bounce:0
[ 2180.317996]  free:6500 free_pcp:160 free_cma:4084
[ 2180.401309] Normal free:25860kB min:2764kB low:3452kB high:4144kB active_anon:82200kB inactive_anon:12484kB active_file:127196kB ina
ctive_file:127244kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:522240kB managed:502920kB mlocked:0kB dirty:23440kB w
riteback:3044kB mapped:25188kB shmem:13032kB slab_reclaimable:104324kB slab_unreclaimable:18024kB kernel_stack:1032kB pagetables:968kB
unstable:0kB bounce:0kB free_pcp:728kB local_pcp:728kB free_cma:16336kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 2180.505049] lowmem_reserve[]: 0 0 0
[ 2180.512810] Normal: 2540*4kB (UMEC) 654*8kB (UMC) 299*16kB (C) 125*32kB (C) 22*64kB (C) 2*128kB (C) 0*256kB 0*512kB 0*1024kB 0*2048k
B 0*4096kB 0*8192kB = 25840kB
[ 2180.545917] 66868 total pagecache pages
[ 2180.558227] 0 pages in swap cache
[ 2180.567947] Swap cache stats: add 0, delete 0, find 0/0
[ 2180.581299] Free swap  = 0kB
[ 2180.588373] Total swap = 0kB
[ 2180.597346] 130560 pages RAM
[ 2180.604372] 0 pages HighMem/MovableOnly
[ 2180.614986] 4830 pages reserved
[ 2180.618152] 6144 pages cma reserved
[ 2180.633814] edma 49000000.edma: edma_prep_slave_sg: Failed to allocate a descriptor
[ 2180.653850] omap_hsmmc 481d8000.mmc: prep_slave_sg() failed
[ 2180.667957] omap_hsmmc 481d8000.mmc: MMC start dma failure
[ 2180.692921] mmcblk1: unknown error -1 sending read/write command, card status 0x900
[ 2180.708693] blk_update_request: I/O error, dev mmcblk1, sector 3256208
[ 2180.725511] blk_update_request: I/O error, dev mmcblk1, sector 3256216
[ 2180.739970] blk_update_request: I/O error, dev mmcblk1, sector 3256224
[ 2180.758633] blk_update_request: I/O error, dev mmcblk1, sector 3256232
[ 2180.774702] blk_update_request: I/O error, dev mmcblk1, sector 3256240
[ 2180.785432] blk_update_request: I/O error, dev mmcblk1, sector 3256248
[ 2180.805465] blk_update_request: I/O error, dev mmcblk1, sector 3256256
[ 2180.820204] blk_update_request: I/O error, dev mmcblk1, sector 3256264
[ 2180.838379] blk_update_request: I/O error, dev mmcblk1, sector 3256272
[ 2180.853675] blk_update_request: I/O error, dev mmcblk1, sector 3256280
[ 2180.875371] EXT4-fs warning (device mmcblk1p6): ext4_end_bio:329: I/O error -5 writing to inode 24729 (offset 0 size 3117056 startin
g block 407282)
[ 2180.901033] Buffer I/O error on device mmcblk1p6, logical block 234994
[ 2180.901058] Buffer I/O error on device mmcblk1p6, logical block 234995
[ 2180.901067] Buffer I/O error on device mmcblk1p6, logical block 234996
[ 2180.901076] Buffer I/O error on device mmcblk1p6, logical block 234997
[ 2180.901085] Buffer I/O error on device mmcblk1p6, logical block 234998
[ 2180.774702] blk_update_request: I/O error, dev mmcblk1, sector 3256240
[ 2180.785432] blk_update_request: I/O error, dev mmcblk1, sector 3256248
[ 2180.805465] blk_update_request: I/O error, dev mmcblk1, sector 3256256
[ 2180.820204] blk_update_request: I/O error, dev mmcblk1, sector 3256264
[ 2180.838379] blk_update_request: I/O error, dev mmcblk1, sector 3256272
[ 2180.853675] blk_update_request: I/O error, dev mmcblk1, sector 3256280

It goes on further with similar errors. Eventually it gets to:

[ 2187.267171] mmcblk1: unknown error -1 sending read/write command, card status 0x900
[ 2187.277157] blk_update_request: 502 callbacks suppressed
[ 2187.277164] blk_update_request: I/O error, dev mmcblk1, sector 2464968
[ 2187.289417] blk_update_request: I/O error, dev mmcblk1, sector 2464976
[ 2187.304340] blk_update_request: I/O error, dev mmcblk1, sector 2464984
[ 2187.313436] blk_update_request: I/O error, dev mmcblk1, sector 2464992
[ 2187.322545] blk_update_request: I/O error, dev mmcblk1, sector 2465000
[ 2187.331761] blk_update_request: I/O error, dev mmcblk1, sector 2465008
[ 2187.340798] blk_update_request: I/O error, dev mmcblk1, sector 2465016
[ 2187.349947] blk_update_request: I/O error, dev mmcblk1, sector 2465024
[ 2187.359918] blk_update_request: I/O error, dev mmcblk1, sector 2465032
[ 2187.369165] blk_update_request: I/O error, dev mmcblk1, sector 2465040
[ 2187.387979] Aborting journal on device mmcblk1p6-8.
[ 2187.394174] EXT4-fs error (device mmcblk1p6) in ext4_reserve_inode_write:5117: Journal has aborted
[ 2187.411897] EXT4-fs error (device mmcblk1p6): ext4_journal_check_start:56: Detected aborted journal
[ 2187.419440] EXT4-fs (mmcblk1p6): Delayed block allocation failed for inode 153273 at logical offset 0 with max blocks 1 with error 30
[ 2187.423066]
[ 2187.423073] EXT4-fs (mmcblk1p6): Remounting filesystem read-only
[ 2187.490799] EXT4-fs (mmcblk1p6): This should not happen!! Data will be lost

It happens at different stages of deploying the application, but almost always happens before the application is successfully installed. Seems like it might have to do with swapping? I came across this similar thread elsewhere on the net: Flashing SD image to eMMC fails with I/O errors.

Anybody have ideas on why this is happening and how to resolve it?


#2

Also just for fun, here’s the end of one of my most recent runs:

[ 2214.209489] aufs au_plink_put:434:docker[875]: pseudo-link is not flushed
ysRq : HELP :�KNJ&����ѡb) crash(c) terminat�i+��
                                                ɥ kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(��.�+       �ݕɽ���o) show-regi�͡t) unmount(u) show-blocked-tasks(w) dump-ftrace-buffer(z)                                  +����r�������sysrq: Sysrq�ZWVi��͡l) show-memory-usage(m)�o��er(z)
�loglevel(0-9) reboot(b) crash(c)�|u��T-tasks(n) poweroff(o) show-registe��ƽ���ٕ��0-9) reboot(b) crash(c) ���Q�tasks(n) poweroff(o)

It made me feel a twinge of sympathy for the struggling system, which is now blinking pitifully in stasis on my desk.


#3

Hey @jmptable… Thats a nasty one, its something we have been debugging and working of for a little while. We did a thorough investigation of the issue and found that the reason it was getting filesystem errors was due to the eMMC subsystem being unable to fulfil read/write requests coming from the filesystem. And the reason the eMMC subsystem was failing was because it couldn’t allocate memory, and the reason it couldn’t allocate memory was due to memory fragmentation. This was indicated by dmesg logs like the following…

    [ 3511.434696] edma 49000000.edma: edma_prep_slave_sg: Failed to allocate a descriptor
    [ 3511.442410] omap_hsmmc 481d8000.mmc: prep_slave_sg() failed
    [ 3511.448149] omap_hsmmc 481d8000.mmc: MMC start dma failure
    [ 3511.922619] mmcblk0: unknown error -1 sending read/write command, card status 0x900
    [ 3511.930544] blk_update_request: 1839 callbacks suppressed
    [ 3511.936050] blk_update_request: I/O error, dev mmcblk0, sector 6357616

…and it was further verified by inspecting /proc/buddyinfo.
This issue seems to be happening at least since 2011[1] on beaglebones and keeps happening even with new kernels[2]. The suggested workaround in most threads was to set the sysctl parameter vm.min_free_kbytes to 8192[3]. This causes the kernel to more aggressively reclaim memory and reduce fragmentation. However, this didn’t help and we were still hitting the problem on your device.

You can try the following to fix it:

add the following contents in /etc/sysctl.d/fix-mmc-bbb.conf

vm.min_free_kbytes=8192
vm.dirty_ratio=5
vm.dirty_background_ratio=10

Then reboot the device.

We hope to release new images with a bunch of fixes for this in the coming week or so.

[1] https://bugs.launchpad.net/ubuntu/+source/linux-ti-omap4/+bug/746137
[2] https://groups.google.com/forum/#!topic/beagleboard/tdCUVMicDrk
[3] http://www.keypressure.com/blog/yay-fedora-23-on-beaglebone-black/


#4

Excellent, I’ll try the fix and make use of the new image as soon as it’s out.

Really appreciate the fast, thorough response.

Edit: just FYI seems like the fix didn’t take. Verified that the kernel parameters were changed to the values you suggested (after editing the config and rebooting) with sysctl -a. Failure was the same with the filesystem being remounted RO partway through the docker image build on the device.


#5

@jmptable that’s a bummer. It might help to go really big with vm.min_free_kbytes but obviously this is not idea going forward. I’m really hoping the new beta.8 version of resinOS will be much much more resilient to this, its a really annoying bug and almost makes the beaglebone unusable :frowning:


#6

Any update on this issue?


#7

hey @jisenhart, this issue should now be resolved on the 2.0.0-rc4 version of resin.io, for all the Beaglebone variants.