doublehp
Posts: 77
Joined: Wed May 02, 2012 1:11 am

broken filesystem, e2fsck can not fix things

Mon Apr 10, 2017 6:01 am

Hello.

rPi 2B.

16GB Sandisk Ultra. Distro is from KeiDei (for their screens) (Raspbian modified with their kernel and video driver).

After 6 months of use and 30 days of uptime, the system does not seem to be running fine; from the live system I got this dmesg:

Code: Select all

[    6.965717] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[    7.163500] random: nonblocking pool is initialized
[    7.582733] i2c /dev entries driver
[    7.678094] bcm2708 watchdog, heartbeat=10 sec (nowayout=0)
[    7.782183] NET: Registered protocol family 10
[   11.862970] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup
[   11.863529] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   13.546531] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   13.547011] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xC1E1
[   16.161762] Adding 102396k swap on /var/swap.  Priority:-1 extents:2 across:2101240k SSFS
[498736.174939] bcm2708_rng_init=bd250000
[5958899.587538] mmc0: Timeout waiting for hardware interrupt.
[5958899.826741] mmcblk0: error -110 transferring data, sector 8802312, nr 8, cmd response 0x900, card status 0xc00
[6214016.917598] mmc0: Controller never released inhibit bit(s).
[6214016.918089] ------------[ cut here ]------------
[6214016.918125] WARNING: CPU: 0 PID: 54 at drivers/mmc/host/bcm2835-mmc.c:476 bcm2835_mmc_transfer_dma+0x19c/0x1d4()
[6214016.918135] Modules linked in: bcm2708_rng ipv6 bcm2708_wdog i2c_dev snd_bcm2835 evdev joydev cp210x snd_soc_bcm2708_i2s regmap_mmio snd_soc_core snd_compress snd_pcm_dmaengine snd_pcm snd_seq snd_seq_device snd_timer snd pl2303 i2c_bcm2708 usbserial
[6214016.918226] CPU: 0 PID: 54 Comm: mmcqd/0 Not tainted 3.18.9-v7 #7
[6214016.918261] [<80016850>] (unwind_backtrace) from [<800127ac>] (show_stack+0x20/0x24)
[6214016.918285] [<800127ac>] (show_stack) from [<8052796c>] (dump_stack+0x88/0xd4)
[6214016.918309] [<8052796c>] (dump_stack) from [<80024c4c>] (warn_slowpath_common+0x7c/0xa0)
[6214016.918330] [<80024c4c>] (warn_slowpath_common) from [<80024c9c>] (warn_slowpath_null+0x2c/0x34)
[6214016.918348] [<80024c9c>] (warn_slowpath_null) from [<80418630>] (bcm2835_mmc_transfer_dma+0x19c/0x1d4)
[6214016.918366] [<80418630>] (bcm2835_mmc_transfer_dma) from [<80418cf8>] (bcm2835_mmc_request+0xa8/0xc0)
[6214016.918385] [<80418cf8>] (bcm2835_mmc_request) from [<8040206c>] (mmc_start_request+0xd4/0xf8)
[6214016.918404] [<8040206c>] (mmc_start_request) from [<804029f4>] (mmc_start_req+0x2a0/0x350)
[6214016.918424] [<804029f4>] (mmc_start_req) from [<80411a60>] (mmc_blk_issue_rw_rq+0xd0/0xb30)
[6214016.918444] [<80411a60>] (mmc_blk_issue_rw_rq) from [<804125cc>] (mmc_blk_issue_rq+0x10c/0x4a4)
[6214016.918462] [<804125cc>] (mmc_blk_issue_rq) from [<804131f0>] (mmc_queue_thread+0xb4/0x14c)
[6214016.918480] [<804131f0>] (mmc_queue_thread) from [<8003f9b0>] (kthread+0xdc/0xf8)
[6214016.918498] [<8003f9b0>] (kthread) from [<8000ef28>] (ret_from_fork+0x14/0x20)
[6214016.918510] ---[ end trace 462636b9436f318a ]---
[6214016.918583] mmcblk0: unknown error -5 sending read/write command, card status 0x900
[6214016.918638] blk_update_request: I/O error, dev mmcblk0, sector 2781256
[6214016.918668] blk_update_request: I/O error, dev mmcblk0, sector 2781264
[6214016.918683] blk_update_request: I/O error, dev mmcblk0, sector 2781272
[6214016.918698] blk_update_request: I/O error, dev mmcblk0, sector 2781280
[6214016.918713] blk_update_request: I/O error, dev mmcblk0, sector 2781288
[6214016.918728] blk_update_request: I/O error, dev mmcblk0, sector 2781296
[6214016.918742] blk_update_request: I/O error, dev mmcblk0, sector 2781304
[6214016.918777] blk_update_request: I/O error, dev mmcblk0, sector 2781312
[6214016.918792] blk_update_request: I/O error, dev mmcblk0, sector 2781320
[6214016.918806] blk_update_request: I/O error, dev mmcblk0, sector 2781328
[6214016.918926] Aborting journal on device mmcblk0p2-8.
[6214016.919031] EXT4-fs error (device mmcblk0p2) in ext4_reserve_inode_write:4758: Journal has aborted
[6214026.925747] mmc0: Timeout waiting for hardware interrupt.
[6214026.925907] mmcblk0: error -110 transferring data, sector 122880, nr 8, cmd response 0x900, card status 0xc00
[6214026.926268] EXT4-fs error (device mmcblk0p2): ext4_journal_check_start:56: Detected aborted journal
[6214026.926292] EXT4-fs (mmcblk0p2): Remounting filesystem read-only
[6214036.945761] mmc0: Timeout waiting for hardware interrupt.
[6214036.949016] mmcblk0: error -110 transferring data, sector 2744320, nr 8, cmd response 0x900, card status 0xc00
[6214046.965771] mmc0: Timeout waiting for hardware interrupt.
[6214046.969725] mmcblk0: error -110 transferring data, sector 122880, nr 8, cmd response 0x900, card status 0xc00
[6214046.970040] EXT4-fs error (device mmcblk0p2): ext4_journal_check_start:56: Detected aborted journal
[6214046.976087] EXT4-fs error (device mmcblk0p2): ext4_journal_check_start:56: Detected aborted journal
[6214046.976196] EXT4-fs error (device mmcblk0p2): ext4_journal_check_start:56: Detected aborted journal
[6214046.976232] EXT4-fs (mmcblk0p2): ext4_writepages: jbd2_start: 1024 pages, ino 71059; err -30
[6214046.976395] EXT4-fs error (device mmcblk0p2) in ext4_dirty_inode:4877: Journal has aborted
[6214057.005778] mmc0: Timeout waiting for hardware interrupt.
[6214057.005966] mmcblk0: error -110 transferring data, sector 122880, nr 8, cmd response 0x900, card status 0xc00
[6214067.025842] mmc0: Timeout waiting for hardware interrupt.
[6214067.029053] mmcblk0: error -110 transferring data, sector 160192, nr 8, cmd response 0x900, card status 0xc00
[6214067.029337] blk_update_request: 10 callbacks suppressed
[6214067.029354] blk_update_request: I/O error, dev mmcblk0, sector 160192
[6214067.029376] Buffer I/O error on dev mmcblk0p2, logical block 4664, lost async page write
[6214077.045815] mmc0: Timeout waiting for hardware interrupt.
[6214077.048986] mmcblk0: error -110 transferring data, sector 160216, nr 8, cmd response 0x900, card status 0xc00
[6214077.049277] blk_update_request: I/O error, dev mmcblk0, sector 160216
[6214077.049300] Buffer I/O error on dev mmcblk0p2, logical block 4667, lost async page write
So I put the card in another computer. smartctl can't find anything: the command fails to find any headers.
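For reading the dmesg output above: the negative numbers in the mmc and ext4 lines are Linux errno values. A minimal sketch that names the three codes appearing in this log (the function name `errno_name` is made up for illustration, and the table only covers these three values):

```shell
#!/bin/sh
# Hypothetical helper: map the errno values seen in the log to their Linux names.
errno_name() {
    case "${1#-}" in            # accept "110" as well as "-110"
        5)   echo "EIO (input/output error)" ;;
        30)  echo "EROFS (read-only file system)" ;;
        110) echo "ETIMEDOUT (connection timed out)" ;;
        *)   echo "unknown" ;;
    esac
}

for code in -110 -5 -30; do
    printf '%s -> %s\n' "$code" "$(errno_name "$code")"
done
```

Read this way, "error -110" is the controller timing out while waiting for the card, and "err -30" is a consequence of the filesystem having been remounted read-only.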

Code: Select all

# e2fsck -a /dev/sdk2
e2fsck: Bad magic number in super-block while trying to open /dev/sdk2
/dev/sdk2:
The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 
# e2fsck -b 8193 /dev/sdk2
e2fsck 1.41.12 (17-May-2010)
e2fsck: Bad magic number in super-block while trying to open /dev/sdk2
 
The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
I made a copy of the partition with dd. The copy went fine without any error (neither in the dd console nor in syslog/dmesg).

So I ended up with

Code: Select all

e2fsck -y /dev/sdk2
and this produced 3348 various files in lost+found/ (files, scripts, links, and folders). In other words, I assume the system can't boot in this state.

What is the probability that the card is physically damaged?

If I restore the partition backup, is there another way to fsck without losing so much?

I do have proper backups, but before using them I need to know whether the uSD card is still good (yes, I will run a full badblocks scan at some point), and whether there is a faster way to fix this filesystem without a full reinstall.

That rPi was doing average work; I took care to put all very busy files (one write per second) in /dev/shm, and to record data on the uSD only once every 5 min.
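A side note on the dmesg output above: the kernel reports whole-device sectors (512 bytes) for mmcblk0 but filesystem blocks for mmcblk0p2. A minimal sketch of the conversion, assuming 4096-byte blocks and a root partition that starts at sector 122880. That start value is an assumption, picked because it makes both "lost async page write" lines agree; `fdisk -l` on the card would confirm it. The function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: convert a whole-device sector number to a filesystem
# block number inside a partition.
# ASSUMPTIONS: 512-byte sectors, 4096-byte blocks, partition start 122880.
sector_to_block() {
    sector=$1
    part_start=${2:-122880}
    sectors_per_block=$((4096 / 512))
    echo $(( (sector - part_start) / sectors_per_block ))
}

sector_to_block 160192   # 4664, matches "logical block 4664"
sector_to_block 160216   # 4667, matches "logical block 4667"
sector_to_block 122880   # 0, the block that holds the primary superblock
```

Under that assumption, the repeated timeouts at sector 122880 hit block 0 of the root filesystem, which is where the primary ext4 superblock lives.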

unixcommando
Posts: 16
Joined: Sun Dec 04, 2016 6:08 pm

Re: broken filesystem, e2fsck can not fix things

Mon Apr 10, 2017 2:46 pm

E2FS is really robust and e2fsck can do a really nice job of fixing a damaged file system. But as you point out you have bad superblocks. The suggestion of using 8193 as an alternate is just a suggestion, you can get a list of all alternate superblocks with mke2fs -n /dev/sdk2, then try alternate blocks until you get a good one or you've run out.

As for, "Can the SD card be bad?" Sure, they're only meant to be written to so many times before failure. I've had some last for a long time and others die young.

Have you tried booting from it? I've never tried what you did in copying a damaged file system, since DD does a block to block transfer we can presume all the errors were faithfully copied and corrected by e2fsck but only booting will know for sure. Also, I'm assuming this file system is either /boot or / since you don't identify it and it isn't obvious to me with all the spurious dmesg information if it's there. If this isn't /boot or / then there should be no reason it shouldn't boot. If it is /boot or / you should be able to view it as an ordinary SD card in another computer.

Have you looked at alternative solutions to running servers from SD cards? The WD Pi drive is a good inexpensive solution as are SSD drives, not only more robust but also faster.

-Bob

doublehp
Posts: 77
Joined: Wed May 02, 2012 1:11 am

Re: broken filesystem, e2fsck can not fix things

Mon Apr 10, 2017 7:59 pm

Quick reply before bed.

No, I suspect bad blocks; I do not have any proof, and quick tests have not found any.

After the fsck I did yesterday, it did not boot. I see a kernel log, but can't read the bottom of the screen.

sda1 (/boot) is fine; sda2 (/) is the broken one.

Never heard of alternative drives. A 16GB uSD is way too much for me; I think I use only 3GB; say that ... 8GB is the max I will ever need in an rPi. Do you have links for the things you talk about? How do you plug them in? If they are USB, I presume you still need a uSD boot zone.

Thanks

scruss
Posts: 2542
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON

Re: broken filesystem, e2fsck can not fix things

Mon Apr 10, 2017 8:19 pm

Looks like it's very corrupted. If the superblock can't be found, it's usually beyond repair. Reformat with the SD association tool, check available capacity with F3 or somesuch, then reimage.
‘Remember the Golden Rule of Selling: “Do not resort to violence.”’ — McGlashan.

doublehp
Posts: 77
Joined: Wed May 02, 2012 1:11 am

Re: broken filesystem, e2fsck can not fix things

Tue Apr 11, 2017 8:57 am

unixcommando wrote:E2FS is really robust and e2fsck can do a really nice job of fixing a damaged file system. But as you point out you have bad superblocks. The suggestion of using 8193 as an alternate is just a suggestion, you can get a list of all alternate superblocks with mke2fs -n /dev/sdk2, then try alternate blocks until you get a good one or you've run out.
This is an rPi forum: sda2 is always /, and it has been resized by resize2fs.
-n Causes mke2fs to not actually create a filesystem, but display
what it would do if it were to create a filesystem. This can be
used to determine the location of the backup superblocks for a
particular filesystem, so long as the mke2fs parameters that
were passed when the filesystem was originally created are used
again. (With the -n option added, of course!)
Would resize2fs put backups at the same place as mkfs? If not ... useless. Worth a try: making a dozen copies of the disk to try each of the backups.

Why isn't fsck trying them automatically?
unixcommando wrote:As for, "Can the SD card be bad?" Sure, they're only meant to be written to so many times before failure. I've had some last for a long time and others die young.
Sandisk Ultra: 10y warranty ... This is not bargain-price crap.

Also, when I have bad blocks on a platter disk, I usually lose files one by one: one file gets corrupted, and even when the file table goes bad, the backup is usually good. Here I lost 3300 files and directories, and the backup is also bad. The closest thing I ever saw to this was a memory corruption that made the filesystem driver go mad, or some random process writing directly to the block device. (I once found a complete email, with headers, in /dev/sda in place of the boot block and partition table. The email had been generated by cron and should have been sent via exim. The machine had been installed 8h earlier, so it was very fresh, and after a few reboots the BIOS complained that the disk did not contain any valid boot block. I have no explanation for how an email could be written to the very first sector of a disk.) And when a disk has bad blocks, doing a full read as I did usually generates a few messages in the console and syslog (about bad CRCs or unreadable sectors). And since uSD cards do not handle SMART ...
unixcommando wrote:Have you tried booting from it? I've never tried what you did in copying a damaged file system, since DD does a block to block transfer we can presume all the errors were faithfully copied and corrected by e2fsck but only booting will know for sure. Also, I'm assuming this file system is either /boot or / since you don't identify it and it isn't obvious to me with all the spurious dmesg information if it's there. If this isn't /boot or / then there should be no reason it shouldn't boot. If it is /boot or / you should be able to view it as an ordinary SD card in another computer.
I do it very often. This kind of raw copy usually brings hardware issues to the surface; in fact, for *ALL* disks, it's recommended to perform a full read of the disk once a month, to force the disk to read the whole platter and check CRCs; this is HIGHLY recommended if you have RAID (in the case of RAID, do not read the raw disks, but the RAID volume). But I am not used to working with uSD, and because of the reserved spare blocks, I fear the rotation algorithm may hide a really broken block from me: a block that at some point contained a bad transistor, but whose error never reaches the kernel because of silent rotation. I don't know which algorithm the Sandisk Ultra uses.
unixcommando wrote:Have you looked at alternative solutions to running servers from SD cards? The WD Pi drive is a good inexpensive solution as are SSD drives, not only more robust but also faster.
WD Pi drive just looks like a classic flash drive; the cheapest one starts on a uSD and then does a pivot_root to USB ... why don't they just build reliable uSD cards? Or is USB required to be able to use SMART?

How do you plug in an SSD? I dislike USB for storage.

The problem is that my rPis make INTENSE use of USB; very very very intense. And using USB storage is probably incompatible with my application. I am using USB serial devices, which always use bulk transfers (by definition in the USB 1.0 specification); so if a storage transaction takes long, I may lose frames (bulk is to USB what UDP is to IPv4: you have good hope that your data reaches its destination, but when it does not, you don't get a single warning).

doublehp
Posts: 77
Joined: Wed May 02, 2012 1:11 am

Re: broken filesystem, e2fsck can not fix things

Tue Apr 11, 2017 9:15 am

Code: Select all

# mkfs -n /dev/sdf2
mke2fs 1.41.12 (17-May-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
944704 inodes, 3774464 blocks
188723 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=3867148288
116 block groups
32768 blocks per group, 32768 fragments per group
8144 inodes per group
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208

Code: Select all

s=/mnt/big/tmp/sdk2_rpi-03
for i in 32768 98304 163840 229376 294912 819200 884736 1605632 2654208 ; do
    cp -a "$s" "$s"."$i"                   # work on a fresh copy of the image
    e2fsck -b "$i" -y "$s"."$i"            # repair using backup superblock $i
    mkdir /tmp/tmp."$i"
    mount "$s"."$i" /tmp/tmp."$i"
    echo "***ls $i"
    ls /tmp/tmp."$i"/lost+found/ | wc      # count the recovered orphans
    umount /tmp/tmp."$i"
done >/tmp/loggg 2>&1
For now, the first four backups give the same result: 3348 files in lost+found. I will leave the script running, but I don't think any backup will be better than these ones. I lost hope.
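On the question of where the backups live: the list printed by mkfs -n above follows the sparse_super rule, where backup superblocks sit at the start of block group 1 and of every group whose number is a power of 3, 5 or 7. A minimal sketch that recomputes the list, assuming 116 groups of 32768 blocks as reported above and a first data block of 0 (the function name is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: list backup superblock locations under sparse_super.
# Arguments: number of block groups, blocks per group.
# ASSUMPTION: first data block is 0 (the case for block sizes above 1024).
backup_superblocks() {
    groups=$1
    per_group=$2
    list="1"
    for base in 3 5 7; do
        g=$base
        while [ "$g" -lt "$groups" ]; do
            list="$list $g"
            g=$((g * base))
        done
    done
    # Sort the group numbers, then turn each one into a block number.
    out=""
    for g in $(printf '%s\n' $list | sort -n); do
        out="$out${out:+ }$((g * per_group))"
    done
    echo "$out"
}

backup_superblocks 116 32768
```

Since the positions depend only on the group number and the group size, they should not move when a filesystem is grown, as long as the block size and the blocks per group match the original.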

unixcommando
Posts: 16
Joined: Sun Dec 04, 2016 6:08 pm

Re: broken filesystem, e2fsck can not fix things

Tue Apr 11, 2017 3:25 pm

doublehp wrote: This is an rPi forum: sda2 is always /, and it has been resized by resize2fs.
You identify /dev/sdk2 not /dev/sda2. Are you trying to argue with me or make me look stupid? Not a great idea for someone looking for help.
doublehp wrote: Would resize2fs put backups at the same place as mkfs? If not ... useless. Worth a try: making a dozen copies of the disk to try each of the backups.

Why isn't fsck trying them automatically?
If you want to learn perhaps being combative isn't the best approach. There are things going on here you clearly don't understand and based on the attitude you present here I have neither the time nor the inclination to educate you. There are plenty of documents on the web regarding Unix File Systems. Please avail yourself of them.
unixcommando wrote:As for, "Can the SD card be bad?" Sure, they're only meant to be written to so many times before failure. I've had some last for a long time and others die young.
doublehp wrote:Sandisk Ultra: 10y warranty ... This is not bargain-price crap.
I didn't say it was, but SD cards have limits; 10y warranty notwithstanding, some will last longer than 10y and some won't. It's mostly about write cycles. There's also the issue of counterfeit SD cards. Are you sure you didn't get one?

I'm going to stop here. In my estimation you are not worth helping.

-Bob

scruss
Posts: 2542
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON

Re: broken filesystem, e2fsck can not fix things

Tue Apr 11, 2017 3:30 pm

Filesystem corruption can also come about from a weak power supply. If you're running multiple USB serial devices, make sure that the Raspberry Pi is getting enough power.

There's a chance you're trying to do too much with a small computer. All IO on the Raspberry Pi goes through the USB controller, and it can be a real bottleneck.

doublehp
Posts: 77
Joined: Wed May 02, 2012 1:11 am

Re: broken filesystem, e2fsck can not fix things

Sun Apr 16, 2017 8:12 pm

You identify /dev/sdk2 not /dev/sda2. Are you trying to argue with me or make me look stupid?
I was lucky to be able to run dmesg inside the machine while it was already broken; so, in some logs, it's sda2. As stated, the machine never rebooted, so all fsck runs were done from another workstation where the uSD reader is sdk. sdk remains sda for the boot argument, fstab, and other "live system" related settings.
There are plenty of documents on the web regarding Unix File Systems. Please avail yourself of them.
I have quoted the man page, which states that mke2fs -n "may not be useful". You don't know what I may or may not know about filesystems. If fsck can try alternate blocks automatically, why doesn't it just do it? If it can't, it's probably because it's not possible, so why mention this solution? If it can be done manually, why not add an option to automate it? Yes, I was combative ... against ext2 and the stupid e2fsprogs tools! Is it bad to be angry because of an unreliable filesystem?

This is the 3rd time I have had a major issue on ext* since 2000 (complete system loss due to a broken filesystem, without any hardware explanation). I have been trying to stop using ext* since 2004. XFS was not a success; I have been trying ZFS since 2014.

rPis not allowing the use of SMART is becoming a big drawback for me. They are not reliable in the long term. Manufacturers release counterfeit cards ... and I have always had trouble with very expensive cards bought from well-known sellers (I am not buying Samsung from eBay).
Filesystem corruption can also come about from a weak power supply. If you're running multiple USB serial devices, make sure that the Raspberry Pi is getting enough power.
USB serial adapters should not use much.

But supply could be a potential problem.
There's a chance you're trying to do too much with a small computer. All IO on the Raspberry Pi goes through the USB controller, and it can be a real bottleneck.
My Pis are doing constant work, but in the end, they are not using more than 15% CPU.

I have again over-estimated the reliability of a cheap device; but the next level is 10x more expensive, and would consume 10x more power.

I have already run badblocks on the card; it found nothing wrong. Reinstallation is under way.

doublehp
Posts: 77
Joined: Wed May 02, 2012 1:11 am

Re: broken filesystem, e2fsck can not fix things

Wed Jul 05, 2017 9:24 am

2 months later, same problem again ... the problem is probably the supply. I had changed it a few weeks before the first issue, and it is happening again with the same supply. badblocks was fine.

doublehp
Posts: 77
Joined: Wed May 02, 2012 1:11 am

Re: broken filesystem, e2fsck can not fix things

Sun Jul 23, 2017 2:43 pm

I was overbooked these last days. I have reinstalled the rPi using a different SD, and soldered more capacitors on the PCB.

Adding capacitors on a single USB male plug did not work because at boot it causes an over-current, and triggers some protection. This simple fix was impossible for me.

This rPi is supplied via POE, so, if it burns again, it will mean the issue is related to the POE adapter, and that my SD was good.

Other rPis are also using POE, but via a different method. The unstable pi is using an adapter that directly produces 5V on a uUSB plug. The other ones use a POE-to-12V, then 12-5 adapters. POE is handy to make all small devices like rPis work on the same UPS; it would be much more complicated for me to bring UPS-240V to each rPis to be able to use a legacy 240 USB supply.

scruss
Posts: 2542
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON

Re: broken filesystem, e2fsck can not fix things

Sun Jul 23, 2017 5:23 pm

The µUSB POE adaptors I've seen deliver up to 10 W, which is barely enough if everything's working and stable.

doublehp
Posts: 77
Joined: Wed May 02, 2012 1:11 am

Re: broken filesystem, e2fsck can not fix things

Sat Nov 04, 2017 3:45 pm

On the same rPi, the filesystem is broken again. This is getting on my nerves. I had soldered large capacitors onto the supply (one electrolytic and one large ceramic), bought a new SD card (Sandisk Ultra, white-grey), configured fstrim ... I don't know what to do.

The only external devices on this rPi are two USB-serial dongles; don't tell me they need more than 30mA each ...

Other rPis with similar configuration (hard and soft) work fine.

scruss
Posts: 2542
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON

Re: broken filesystem, e2fsck can not fix things

Sat Nov 04, 2017 10:56 pm

I'm not sure if soldering extra capacitors will help much. Are you using a dedicated power supply (so, not a phone charger + USB cable) that's able to deliver at least 2 A? What is the voltage seen across the Raspberry Pi's test points?

You can check the maximum current that a USB device is allowed to draw using:

Code: Select all

lsusb -D /dev/bus/usb/00x/00y | grep MaxPower
The two USB-serial cables on this machine are allowed to draw around 100 mA each.

doublehp
Posts: 77
Joined: Wed May 02, 2012 1:11 am

Re: broken filesystem, e2fsck can not fix things

Sat Nov 04, 2017 11:39 pm

ATM, the Pi is dead until Wednesday. Amazon should have delivered a High Endurance card this morning, but the postman was lazy and thought he could take a day off, so delivery is delayed until the middle of next week.

I soldered the capacitor across the Pi's ceramic input capacitor, on the 5V side. This cannot affect the behaviour of the circuit or weaken it in any way, because it is almost at the power plug, before the 3V regulator.

The PSU is a POE adapter; I forgot which model. Probably about this model:
https://www.ebay.fr/itm/Noir-Actif-Poe- ... SwAPVZGlTW

From memory, the voltage on USB plugs was above 4.8V (maybe 5.0 or 5.1, I forgot).

But, I don't know what to do next.

Sandisk told me that Ultra cards lose their warranty when inserted in rPis, because the cards are not designed to be bootable storage (I can provide chat logs on this topic). So I am trying High Endurance; not sure it will be better. Also, I tried
https://www.amazon.fr/gp/product/B01DNV ... QH2NG2V087
but, not sure it will be better than classic SD cards. This item is said to be a true MMC; so, it *should* have better quality, in particular about wear leveling, and garbage collection. Also, I have configured all my Pis to use fstrim since last month. And obviously not enough.

The pi that died last month, the one for which I bought the true MMC card ... is using the official expensive white PSU from the official store. And still, the card died (FS corruption).

Of course, when I run badblocks, cards always pretend to be 100% fine.

An acceptable solution for me would be to find a reliable USB storage; something below 30€, and that would not need a second supply, and possibly, not too bulky. All I need is a *reliable* 4GB storage. I have found some true SSD USB storage devices, but, they were using USB signals over HE10 connectors; they were not using the official USB plug. I could probably buy http://wdlabs.wd.com/products/wd-pidriv ... ion/#flash ... but it's still "1-year Limited Warranty" ... does not sound better than Sandisk.

doublehp
Posts: 77
Joined: Wed May 02, 2012 1:11 am

Re: broken filesystem, e2fsck can not fix things

Tue Nov 14, 2017 4:59 pm

Code: Select all

# lsusb
Bus 001 Device 002: ID 0424:9514 Standard Microsystems Corp.
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 003: ID 0424:ec00 Standard Microsystems Corp.
Bus 001 Device 004: ID 10c4:ea60 Cygnal Integrated Products, Inc. CP210x UART Bridge / myAVR mySmartUSB light
Bus 001 Device 005: ID 067b:2303 Prolific Technology, Inc. PL2303 Serial Port

Code: Select all

# for i in /dev/bus/usb/*/* ; do lsusb -D $i | grep MaxPower ; done
    MaxPower                0mA
    MaxPower                2mA
    MaxPower                2mA
    MaxPower              100mA
    MaxPower              100mA
At the output of the serial adapter: 5.03V. So probably a bit more at the input of the pi.
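To total the declared budget, a minimal sketch that sums the MaxPower lines (the function name is made up for illustration; the sample input is copied from the output above):

```shell
#!/bin/sh
# Sum the MaxPower values (in mA) found in lsusb output read from stdin.
sum_maxpower() {
    awk '/MaxPower/ { sub(/mA/, "", $2); total += $2 } END { print total + 0 }'
}

# Sample input copied from the output above. On a live system one could pipe in:
#   for i in /dev/bus/usb/*/* ; do lsusb -D "$i" ; done
sum_maxpower <<'EOF'
    MaxPower                0mA
    MaxPower                2mA
    MaxPower                2mA
    MaxPower              100mA
    MaxPower              100mA
EOF
```

For the five devices listed this gives 204 mA. That is the budget declared in the USB descriptors, an upper bound rather than a measured draw.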

The Sandisk High Endurance has now been in this rPi (number 03) for one week, and the eMMC+adapter in number 05 for one month.

Next month, I will receive an Orange Pi PLUS; the PLUS versions include eMMC soldered at the factory, for only 5€ more than the bare version. It should be much more reliable, and much cheaper than another Pi.

scruss
Posts: 2542
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON

Re: broken filesystem, e2fsck can not fix things

Wed Nov 15, 2017 6:45 pm

I see you're using a PL2303-based serial adapter. My experience with them on Raspberry Pi has not been great. My ones work okay in undemanding serial applications (talking to an X10 controller a couple of times a day) but heavy transfers would sometimes cause kernel panics. I've had more luck with FTDI and HL-340-based adaptors.

(other users of this forum report deep unhappiness with HL-340 serial devices on the Raspberry Pi. For me, they just work, and I can't see anything I'm doing differently with them.)

doublehp
Posts: 77
Joined: Wed May 02, 2012 1:11 am

Re: broken filesystem, e2fsck can not fix things

Thu Nov 16, 2017 1:14 pm

I have had problems with other serial adapters.

For this project (which is duplicated 8 times), I need very specific serial settings to comply with an old industrial standard (forgot the name), where, in short, the port needs to be 1200 baud, 7 bits, with parity, and something else inverted. I have tried several serial ports; integrated serial ports usually work fine, but USB dongles usually fail to apply the stty settings properly, and only this very specific http://DX.com/p/149859 reference does the job. And it's cheap.
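For reference, the kind of settings described above can be sketched with stty. This is only a sketch under assumptions: the device path /dev/ttyUSB0 and the choice of even parity are placeholders (the description only says "with parity"), the inverted signal is not covered, and the wrapper function is made up for illustration. It prints the command instead of running it, so it can be reviewed before touching a port.

```shell
#!/bin/sh
# Hypothetical wrapper: build an stty command for 1200 baud, 7 data bits, parity.
# ASSUMPTIONS: the device path and the parity type are placeholders.
serial_setup_cmd() {
    dev=${1:-/dev/ttyUSB0}
    parity=${2:-even}
    case "$parity" in
        even) pflags="parenb -parodd" ;;
        odd)  pflags="parenb parodd" ;;
        *)    echo "unsupported parity: $parity" >&2; return 1 ;;
    esac
    # 1200: baud rate; cs7: 7 data bits; parenb: enable the parity bit
    echo "stty -F $dev 1200 cs7 $pflags"
}

serial_setup_cmd                 # print the command for review
# eval "$(serial_setup_cmd)"     # uncomment to apply it to a real port
```

Running `stty -F <device> -a` afterwards shows whether the driver actually accepted cs7 and parenb, which is one way to spot a dongle that silently ignores them.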

And yes, I am using it intensively: I am doing continuous reception, at full rate, all day long, for a French electric meter. I have two Pis doing this which have not failed since December 2016; this Pi is handling two meters (thus has two serial adapters), and thus does twice as much work as the other ones.

So, the Pi which works twice as much ... crashes more frequently. And a very violent KP (kernel panic) could explain a FS corruption.

The main purpose of my pis is to receive data over serial adapters; but in some case, the work can be done using the pi internal port (avoiding the USB dongle, when a pi reads only one meter).

As you see, I take your words seriously, and will think more about that aspect. Maybe change filesystem, and try to compare the reliability of my pi depending on how many USB dongles they have (0, 1 or 2).

In particular http://dx.com/p/398436 was unable to receive data. I don't know if the problem is inherent to the chipset, or if the PCB around it could impact; I did not dig very much on the topic, because I only have a scope, and digging would be easier with a (forgot the name) data analyser. I lost only 12€, so, spent 12€ buying a new pack of dongles, and things worked immediately. I am not accusing the chipset, because the problem could be due to some badly tuned capacitor on THIS item. I just know that all 398436 have failed, while all 149859 work fine.

I have kept the last broken SD card, and an image of the previous one; so, if you know how to analyse the cause of the FS corruption, we could study them to check whether the corruption could be due to a KP (serial driver writing data into the memory space of the ext3 or disk driver, and the next sync corrupting the disk). It would definitely make sense.
Last edited by doublehp on Thu Nov 16, 2017 5:24 pm, edited 1 time in total.

scruss
Posts: 2542
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON

Re: broken filesystem, e2fsck can not fix things

Thu Nov 16, 2017 5:03 pm

Ah: the 7-bit protocol might be a problem for the HL-340. I'm lucky enough to only need 8N1 serial transfer.

Even multiple 1200 baud inputs running flat out shouldn't tax a Raspberry Pi. Though I don't know about the data processing you need to do, you might be able to do this on an embedded board like an Arduino or ESP-8266. They don't have filesystems to crash, and are strictly real-time.

I've worked with meter data before (used to run medium-sized renewable energy generation stations, all grid connected) and metering standards were always a lot of work. Some of the hardware, even on new installations, had to use 25 year old comms boards, as they're the regional standard here. Sigh.

doublehp
Posts: 77
Joined: Wed May 02, 2012 1:11 am

Re: broken filesystem, e2fsck can not fix things

Thu Nov 16, 2017 5:27 pm

Most of the time, the average idle time is 92%, and the load is 1.35. Every 5 min, cron does heavy work and uses over 50%. My code is not perfect, but I have optimised the most important loops for speed. It's not what I would call intense work.

doublehp
Posts: 77
Joined: Wed May 02, 2012 1:11 am

Re: broken filesystem, e2fsck can not fix things

Sat Dec 23, 2017 12:14 am

CH340G ( http://dx.com/p/398436 ) was not able to talk with my electric meter (1200B 7b and some other funny things). I have used PL2303 because it was cheaper, and working (better) ( http://dx.com/p/149859 ). Since you say the filesystem corruption could be related with the 2303 driver, I will try to plug an FTDI; if it works correctly with my device, then I will leave it for long term.

But since the bug happens only every 6-12 months (with continuous use of the serial port: I am reading it all day long), it will take about 2 years to know whether the FTDI is better than the 2303.

It could be that another 340 board would work with my device; I could compare several models if you think their driver may be more stable.

I have to admit that the corruption always happened on rPi that were using a PL2303 (3 crashed out of 4 devices in production; one of them crashed twice); I never had FS corruption on rPi which did not have one (2 in production are working without a PL2303). So ... these statistics point in favour of your guess.

The computation is very light, and could EASILY fit on an Arduino. But I am too lazy to learn the Arduino language ...

Another method would be simpler for me, and for the same price: Arduinos cost me 4€, while an Orange Pi Zero costs me 6€. An Orange Pi can be used either like an rPi (insert an SD card and boot Armbian), or, if I am brave, by burning a small system into the SPI-EEPROM (saving the cost of an SD card).

doublehp
Posts: 77
Joined: Wed May 02, 2012 1:11 am

Re: broken filesystem, e2fsck can not fix things

Sun Feb 04, 2018 1:38 pm

Another system has broken: while the system was live, I found this in syslog (or messages, I never know):

Code: Select all

2018-01-24T12:52:27.998410+01:00 rpi-08-locataire-42-etage-droit kernel: [86481.017530] EXT4-fs (mmcblk0p2): error count since last fsck: 4
2018-01-24T12:52:27.998469+01:00 rpi-08-locataire-42-etage-droit kernel: [86481.017553] EXT4-fs (mmcblk0p2): initial error at time 1516708266: htree_dirblock_to_tree:987: inode 257980: block 1056762
2018-01-24T12:52:27.998482+01:00 rpi-08-locataire-42-etage-droit kernel: [86481.017574] EXT4-fs (mmcblk0p2): last error at time 1516771509: htree_dirblock_to_tree:987: inode 258157: block 1056784
[...]
2018-01-25T12:54:15.508480+01:00 rpi-08-locataire-42-etage-droit kernel: [172989.684696] EXT4-fs (mmcblk0p2): error count since last fsck: 8
2018-01-25T12:54:15.508537+01:00 rpi-08-locataire-42-etage-droit kernel: [172989.684732] EXT4-fs (mmcblk0p2): initial error at time 1516708266: htree_dirblock_to_tree:987: inode 257980: block 1056762
2018-01-25T12:54:15.508568+01:00 rpi-08-locataire-42-etage-droit kernel: [172989.684758] EXT4-fs (mmcblk0p2): last error at time 1516861398: htree_dirblock_to_tree:987: inode 258158: block 1056785
[...]
2018-01-26T12:56:03.038416+01:00 rpi-08-locataire-42-etage-droit kernel: [259498.385475] EXT4-fs (mmcblk0p2): error count since last fsck: 9
2018-01-26T12:56:03.038477+01:00 rpi-08-locataire-42-etage-droit kernel: [259498.385498] EXT4-fs (mmcblk0p2): initial error at time 1516708266: htree_dirblock_to_tree:987: inode 257980: block 1056762
2018-01-26T12:56:03.038490+01:00 rpi-08-locataire-42-etage-droit kernel: [259498.385535] EXT4-fs (mmcblk0p2): last error at time 1516944329: htree_dirblock_to_tree:987: inode 258157: block 1056784
I moved the SD card to a workstation, mounted it, and copied the content (cp -a /mnt/tmp /mnt/archive/), and I spotted this in the logs:

Code: Select all

2018-02-04T14:07:29+01:00 uranus kernel: EXT4-fs error (device sdk2): htree_dirblock_to_tree: bad entry in directory #258043: rec_len is too small for name_len - block=1056765offset=0(0), inode=258043, rec_len=12, name_len=7
2018-02-04T14:07:30+01:00 uranus kernel: EXT4-fs error (device sdk2): htree_dirblock_to_tree: bad entry in directory #257980: inode out of bounds - block=1056762offset=180(180), inode=537128899, rec_len=44, name_len=33
2018-02-04T14:07:30+01:00 uranus kernel: EXT4-fs error (device sdk2): ext4_lookup: deleted inode referenced: 256958
2018-02-04T14:07:30+01:00 uranus kernel: EXT4-fs error (device sdk2): htree_dirblock_to_tree: bad entry in directory #257965: directory entry across blocks - block=1056761offset=80(80), inode=257969, rec_len=65564, name_len=18
2018-02-04T14:07:30+01:00 uranus kernel: EXT4-fs error (device sdk2): ext4_lookup: deleted inode referenced: 126749
2018-02-04T14:07:30+01:00 uranus kernel: EXT4-fs error (device sdk2): htree_dirblock_to_tree: bad entry in directory #257950: rec_len is too small for name_len - block=1056760offset=92(92), inode=258018, rec_len=32, name_len=30
2018-02-04T14:07:30+01:00 uranus kernel: EXT4-fs error (device sdk2): htree_dirblock_to_tree: bad entry in directory #257995: rec_len is too small for name_len - block=1056763offset=352(352), inode=258009, rec_len=16, name_len=16
2018-02-04T14:07:30+01:00 uranus kernel: EXT4-fs error (device sdk2): ext4_lookup: deleted inode referenced: 256983
2018-02-04T14:07:30+01:00 uranus kernel: EXT4-fs error (device sdk2): ext4_lookup: deleted inode referenced: 520142
2018-02-04T14:07:30+01:00 uranus kernel: EXT4-fs error (device sdk2): htree_dirblock_to_tree: bad entry in directory #258022: inode out of bounds - block=1056764offset=180(180), inode=268693486, rec_len=28, name_len=17
2018-02-04T14:07:30+01:00 uranus kernel: EXT4-fs error (device sdk2): ext4_lookup: deleted inode referenced: 782317
2018-02-04T14:07:32+01:00 uranus kernel: EXT4-fs error (device sdk2): htree_dirblock_to_tree: bad entry in directory #258054: directory entry across blocks - block=1056767offset=56(56), inode=258089, rec_len=65568, name_len=24
That Pi was still using a PL2303. There had been an issue with a service 10 days earlier: it started to generate a huge amount of logs (which were lost because they were in /dev/shm, to avoid killing the SD card; I have since updated that part so that some of them are now kept in /root), which led to a reboot (scripts monitor how fast the logs grow; if they grow too fast, I reboot automatically). After that automated reboot, most services were running fine, but some refused to start. It took me 10 days to figure out that the Pi was broken.

The ext4 messages above (on uranus) came out during this manual backup:

Code: Select all

# cp -a /media/Hmmm_-_sdk2 /media/boot_-_sdk1 rpi-08-locataire-42-etage-droit_2018-02/
cp: cannot stat `/media/Hmmm_-_sdk2/usr/share/php5/common/pd/,ini': No such file or directory
cp: cannot stat `/media/Hmmm_-_sdk2/usr/share/keymaps/i386/include/mac-linux-keys-bare.inc/gz': No such file or directory
cp: cannot stat `/media/Hmmm_-_sdk2/usr/share/keymaps/i386/include/linux-with-alt-and-alt\'r.inc.g{': Input/output error
cp: cannot stat `/media/Hmmm_-_sdk2/usr/share/keymaps/i386/azerty/&.': Input/output error
cp: cannot stat `/media/Hmmm_-_sdk2/usr/share/keymaps/mac/mac-ifook-de.kmap.gz': Input/output error
cp: cannot stat `/media/Hmmm_-_sdk2/usr/share/keymaps/mac/mac-macBook-fr.kmip.gz': Input/output error
cp: cannot stat `/media/Hmmm_-_sdk2/usr/share/keymaps/sun/sunt6-uk.kíap.gz': Input/output error
cp: will not create hard link `rpi-08-locataire-42-etage-droit_2018-02/Hmmm_-_sdk2/usr/share/console/&.' to directory `rpi-08-locataire-42-etage-droit_2018-02/Hmmm_-_sdk2/usr/share'
cp: will not create hard link `rpi-08-locataire-42-etage-droit_2018-02/Hmmm_-_sdk2/usr/share/console/*' to directory `rpi-08-locataire-42-etage-droit_2018-02/Hmmm_-_sdk2/usr/share/console'
cp: will not create hard link `rpi-08-locataire-42-etage-droit_2018-02/Hmmm_-_sdk2/usr/share/doc/libterm-readkey-perl/.' to directory `rpi-08-locataire-42-etage-droit_2018-02/Hmmm_-_sdk2/usr/share/doc/libterm-readkey-perl'
cp: will not create hard link `rpi-08-locataire-42-etage-droit_2018-02/Hmmm_-_sdk2/usr/share/doc/minicom/dxampleó/*.' to directory `rpi-08-locataire-42-etage-droit_2018-02/Hmmm_-_sdk2/usr/share/doc/minicom'
Now trying to dig:

Code: Select all

# fsck -f /dev/sdk2
fsck from util-linux 2.24.1
e2fsck 1.41.12 (17-May-2010)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Directory inode 200856, block #0, offset 0: directory corrupted
Salvage<y>? yes

Missing '.' in directory inode 200856.
Fix<y>? yes

Setting filetype for entry '.' in ??? (200856) to 2.
Missing '..' in directory inode 200856.
Fix<y>? yes

Setting filetype for entry '..' in ??? (200856) to 2.
Directory inode 257038, block #0, offset 12: directory corrupted
Salvage<y>? yes

Directory inode 257687, block #0, offset 196: directory corrupted
Salvage<y>? yes

Invalid inode number for '.' in directory inode 257688.
Fix<y>? yes

Directory inode 257688, block #0, offset 12: directory corrupted
Salvage<y>? yes

First entry '.^@^@' (inode=257689) in directory inode 257689 (???) should be '.'
Fix<y>? yes

Setting filetype for entry '.' in ??? (257689) to 2.
Entry '..' in ??? (257689) has invalid inode #: 50589336.
Clear<y>? yes

Directory inode 257690, block #0, offset 0: directory corrupted
Salvage<y>? yes

Directory entry for '.' in ??? (257690) is big.
Split<y>? yes

Missing '..' in directory inode 257690.
Fix<y>? yes

Setting filetype for entry '..' in ??? (257690) to 2.
Entry '..' in ??? (257704) has invalid inode #: 16924916.
Clear<y>? yes

Invalid inode number for '.' in directory inode 257715.
Fix<y>? yes

Entry 'plymo}th' in /usr/share/initramfs-tools/scripts/panic (257715) references inode 256692 in group 31 where _INODE_UNINIT is set.
Fix<y>? yes

Entry 'plymo}th' in /usr/share/initramfs-tools/scripts/panic (257715) has an incorrect filetype (was 65, should be 0).
Fix<y>? yes

Directory inode 257722, block #0, offset 12: directory corrupted
Salvage<y>? yes

Second entry 'chqngelog.ez' (inode=257724) in directory inode 257722 should be '..'
Fix<y>? yes
The puzzling part is that the corrupted files and directories are not frequently used ones; the frequent work happens in /root and /var, but most of the damage occurs in /usr, which has not been updated or upgraded in any way since installation (months ago).

I ran "e2fsck -f -y /dev/sdk2" 5 times; the first 3 runs found problems. After remounting the volume, lost+found is not empty:

Code: Select all

# ls /media/Hmmm_-_sdk2/lost+found/
#197792  #257689  #257734  #257954  #257985  #258005  #258018  #258041  #258057  #258180  #258193  #258217  #258268
#197793  #257696  #257738  #257955  #257988  #258006  #258019  #258042  #258058  #258181  #258194  #258218  #258269
#197795  #257697  #257739  #257956  #257990  #258007  #258020  #258044  #258059  #258182  #258202  #258219  #258270
#197796  #257698  #257740  #257960  #257991  #258008  #258021  #258045  #258060  #258183  #258207  #258257  #258298
#197797  #257705  #257745  #257961  #257992  #258009  #258023  #258046  #258061  #258184  #258208  #258258  #258313
#197798  #257706  #257746  #257962  #257993  #258010  #258027  #258047  #258062  #258185  #258209  #258260  #258323
#197877  #257707  #257747  #257963  #257994  #258011  #258028  #258048  #258063  #258186  #258210  #258261  #260856
#197878  #257708  #257763  #257964  #257996  #258012  #258029  #258049  #258064  #258187  #258211  #258262  #260861
#257679  #257709  #257764  #257972  #257998  #258013  #258030  #258050  #258175  #258188  #258212  #258263  #260875
#257680  #257710  #257765  #257977  #258001  #258014  #258031  #258051  #258176  #258189  #258213  #258264  #260880
#257681  #257716  #257766  #257981  #258002  #258015  #258034  #258052  #258177  #258190  #258214  #258265  #260884
#257682  #257723  #257767  #257982  #258003  #258016  #258037  #258055  #258178  #258191  #258215  #258266
#257688  #257724  #257951  #257983  #258004  #258017  #258040  #258056  #258179  #258192  #258216  #258267
and according to my personal criteria, this means "volume is dead, format it all".
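
The repeated e2fsck runs can be scripted. This is only a minimal sketch: the function name fsck_until_clean and the limit of 5 passes are assumptions, and it relies on e2fsck's documented exit status (0 means no errors, 1 or 2 mean errors were corrected, 4 and above mean a failure). It is safer to run it against an image of the card than against the card itself.

```shell
#!/bin/sh
# Sketch: repeat a filesystem check until one pass reports no errors.
# Assumption: the checker follows e2fsck's exit-status convention
# (0 = clean, 1 or 2 = errors were corrected, 4 and above = failure).

# fsck_until_clean MAX_PASSES COMMAND [ARGS...]
# Prints how many passes were needed and returns 0 once a pass is clean;
# returns 1 if the checker fails or MAX_PASSES is exhausted.
fsck_until_clean() {
    max_passes=$1
    shift
    pass=1
    while [ "$pass" -le "$max_passes" ]; do
        status=0
        "$@" || status=$?
        if [ "$status" -eq 0 ]; then
            echo "clean after $pass pass(es)"
            return 0
        elif [ "$status" -ge 4 ]; then
            echo "checker failed with status $status on pass $pass" >&2
            return 1
        fi
        pass=$((pass + 1))
    done
    echo "still finding errors after $max_passes passes" >&2
    return 1
}

# Example (on an unmounted partition, or better, on a copy of it):
#   fsck_until_clean 5 e2fsck -f -y /dev/sdk2
```

A clean final pass only says the metadata is consistent again; files moved to lost+found still have to be judged by hand.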

So I end up with the same issue as previously, but I have gained two data points:
- 10 days before the volume died, one service started to break and generate logs
- after the automated reboot, "EXT4-fs (mmcblk0p2): error count since last fsck" grew by 1 to 4 units per day.


Complete logs:

Code: Select all

# cat messages.1 | grep EXT4-fs 
2018-01-23T12:51:06.016420+01:00 rpi-08-locataire-42-etage-droit kernel: [    2.645804] EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null)
2018-01-23T12:51:06.017346+01:00 rpi-08-locataire-42-etage-droit kernel: [    7.511608] EXT4-fs (mmcblk0p2): re-mounted. Opts: commit=60
2018-01-24T12:52:27.998410+01:00 rpi-08-locataire-42-etage-droit kernel: [86481.017530] EXT4-fs (mmcblk0p2): error count since last fsck: 4
2018-01-24T12:52:27.998469+01:00 rpi-08-locataire-42-etage-droit kernel: [86481.017553] EXT4-fs (mmcblk0p2): initial error at time 1516708266: htree_dirblock_to_tree:987: inode 257980: block 1056762
2018-01-24T12:52:27.998482+01:00 rpi-08-locataire-42-etage-droit kernel: [86481.017574] EXT4-fs (mmcblk0p2): last error at time 1516771509: htree_dirblock_to_tree:987: inode 258157: block 1056784
2018-01-25T12:54:15.508480+01:00 rpi-08-locataire-42-etage-droit kernel: [172989.684696] EXT4-fs (mmcblk0p2): error count since last fsck: 8
2018-01-25T12:54:15.508537+01:00 rpi-08-locataire-42-etage-droit kernel: [172989.684732] EXT4-fs (mmcblk0p2): initial error at time 1516708266: htree_dirblock_to_tree:987: inode 257980: block 1056762
2018-01-25T12:54:15.508568+01:00 rpi-08-locataire-42-etage-droit kernel: [172989.684758] EXT4-fs (mmcblk0p2): last error at time 1516861398: htree_dirblock_to_tree:987: inode 258158: block 1056785
2018-01-26T12:56:03.038416+01:00 rpi-08-locataire-42-etage-droit kernel: [259498.385475] EXT4-fs (mmcblk0p2): error count since last fsck: 9
2018-01-26T12:56:03.038477+01:00 rpi-08-locataire-42-etage-droit kernel: [259498.385498] EXT4-fs (mmcblk0p2): initial error at time 1516708266: htree_dirblock_to_tree:987: inode 257980: block 1056762
2018-01-26T12:56:03.038490+01:00 rpi-08-locataire-42-etage-droit kernel: [259498.385535] EXT4-fs (mmcblk0p2): last error at time 1516944329: htree_dirblock_to_tree:987: inode 258157: block 1056784
2018-01-27T12:57:50.558441+01:00 rpi-08-locataire-42-etage-droit kernel: [346007.088816] EXT4-fs (mmcblk0p2): error count since last fsck: 10
2018-01-27T12:57:50.558508+01:00 rpi-08-locataire-42-etage-droit kernel: [346007.088838] EXT4-fs (mmcblk0p2): initial error at time 1516708266: htree_dirblock_to_tree:987: inode 257980: block 1056762
2018-01-27T12:57:50.558521+01:00 rpi-08-locataire-42-etage-droit kernel: [346007.088889] EXT4-fs (mmcblk0p2): last error at time 1517030736: htree_dirblock_to_tree:987: inode 258157: block 1056784
# cat messages | grep EXT4-fs 
2018-01-28T12:59:38.078450+01:00 rpi-08-locataire-42-etage-droit kernel: [432515.809898] EXT4-fs (mmcblk0p2): error count since last fsck: 11
2018-01-28T12:59:38.078512+01:00 rpi-08-locataire-42-etage-droit kernel: [432515.809920] EXT4-fs (mmcblk0p2): initial error at time 1516708266: htree_dirblock_to_tree:987: inode 257980: block 1056762
2018-01-28T12:59:38.078551+01:00 rpi-08-locataire-42-etage-droit kernel: [432515.809941] EXT4-fs (mmcblk0p2): last error at time 1517117130: htree_dirblock_to_tree:987: inode 258157: block 1056784
2018-01-29T13:01:25.598430+01:00 rpi-08-locataire-42-etage-droit kernel: [519024.498595] EXT4-fs (mmcblk0p2): error count since last fsck: 12
2018-01-29T13:01:25.598495+01:00 rpi-08-locataire-42-etage-droit kernel: [519024.498617] EXT4-fs (mmcblk0p2): initial error at time 1516708266: htree_dirblock_to_tree:987: inode 257980: block 1056762
2018-01-29T13:01:25.598508+01:00 rpi-08-locataire-42-etage-droit kernel: [519024.498638] EXT4-fs (mmcblk0p2): last error at time 1517203529: htree_dirblock_to_tree:987: inode 258157: block 1056784
2018-01-30T13:03:13.108594+01:00 rpi-08-locataire-42-etage-droit kernel: [605533.179754] EXT4-fs (mmcblk0p2): error count since last fsck: 16
2018-01-30T13:03:13.108652+01:00 rpi-08-locataire-42-etage-droit kernel: [605533.179813] EXT4-fs (mmcblk0p2): initial error at time 1516708266: htree_dirblock_to_tree:987: inode 257980: block 1056762
2018-01-30T13:03:13.108665+01:00 rpi-08-locataire-42-etage-droit kernel: [605533.179838] EXT4-fs (mmcblk0p2): last error at time 1517289926: htree_dirblock_to_tree:987: inode 258157: block 1056784
2018-01-31T13:05:00.638445+01:00 rpi-08-locataire-42-etage-droit kernel: [692041.856768] EXT4-fs (mmcblk0p2): error count since last fsck: 17
2018-01-31T13:05:00.638518+01:00 rpi-08-locataire-42-etage-droit kernel: [692041.856790] EXT4-fs (mmcblk0p2): initial error at time 1516708266: htree_dirblock_to_tree:987: inode 257980: block 1056762
2018-01-31T13:05:00.638530+01:00 rpi-08-locataire-42-etage-droit kernel: [692041.856811] EXT4-fs (mmcblk0p2): last error at time 1517376328: htree_dirblock_to_tree:987: inode 258157: block 1056784
2018-02-01T13:06:48.148619+01:00 rpi-08-locataire-42-etage-droit kernel: [778550.577667] EXT4-fs (mmcblk0p2): error count since last fsck: 18
2018-02-01T13:06:48.148705+01:00 rpi-08-locataire-42-etage-droit kernel: [778550.577711] EXT4-fs (mmcblk0p2): initial error at time 1516708266: htree_dirblock_to_tree:987: inode 257980: block 1056762
2018-02-01T13:06:48.148718+01:00 rpi-08-locataire-42-etage-droit kernel: [778550.577737] EXT4-fs (mmcblk0p2): last error at time 1517462734: htree_dirblock_to_tree:987: inode 258157: block 1056784
2018-02-02T13:08:35.678444+01:00 rpi-08-locataire-42-etage-droit kernel: [865059.295129] EXT4-fs (mmcblk0p2): error count since last fsck: 22
2018-02-02T13:08:35.678553+01:00 rpi-08-locataire-42-etage-droit kernel: [865059.295151] EXT4-fs (mmcblk0p2): initial error at time 1516708266: htree_dirblock_to_tree:987: inode 257980: block 1056762
2018-02-02T13:08:35.678567+01:00 rpi-08-locataire-42-etage-droit kernel: [865059.295200] EXT4-fs (mmcblk0p2): last error at time 1517556895: htree_dirblock_to_tree:987: inode 258158: block 1056785
2018-02-03T13:10:23.198439+01:00 rpi-08-locataire-42-etage-droit kernel: [951568.040311] EXT4-fs (mmcblk0p2): error count since last fsck: 23
2018-02-03T13:10:23.198515+01:00 rpi-08-locataire-42-etage-droit kernel: [951568.040334] EXT4-fs (mmcblk0p2): initial error at time 1516708266: htree_dirblock_to_tree:987: inode 257980: block 1056762
2018-02-03T13:10:23.198528+01:00 rpi-08-locataire-42-etage-droit kernel: [951568.040355] EXT4-fs (mmcblk0p2): last error at time 1517635527: htree_dirblock_to_tree:987: inode 258157: block 1056784
Note that the time lapse between reports is exactly 1d+107s (one day, one minute, and 47s), starting after boot time. Which service runs at this frequency? Does the EXT4 driver include a daily report?
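
To follow the error count without reading the raw log, the lines above can be reduced to one row per report. This is a minimal sketch, assuming the rsyslog line format shown above and GNU date; the function names error_counts, report_growth and epoch_to_utc are made up for illustration. The daily cadence would be consistent with the ext4 driver printing its own summary about once every 24 hours while the error count is non-zero, but treat that as an assumption to check against the kernel source.

```shell
#!/bin/sh
# Sketch: summarise "error count since last fsck" reports from a syslog file.
# Assumption: lines look like the rsyslog output quoted above.

# error_counts: read log lines on stdin, print "<date> <uptime_seconds> <count>".
error_counts() {
    sed -n 's/^\([0-9-]*\)T.*kernel: \[ *\([0-9.]*\)] EXT4-fs ([^)]*): error count since last fsck: \([0-9][0-9]*\).*/\1 \2 \3/p'
}

# report_growth: same input; also show how much the count grew and how many
# seconds of kernel uptime passed since the previous report.
report_growth() {
    error_counts | awk '
        NR == 1 { printf "%s count=%d\n", $1, $3 }
        NR > 1  { printf "%s count=%d (+%d after %.0f s)\n", $1, $3, $3 - prev_count, $2 - prev_uptime }
        { prev_uptime = $2; prev_count = $3 }'
}

# epoch_to_utc SECONDS: decode the "initial error at time N" value (GNU date).
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%dT%H:%M:%SZ'
}

# Example:
#   report_growth < messages.1
#   epoch_to_utc 1516708266
```

Decoding the "initial error at time" value tells when the first error was recorded, which can then be compared with the reboot time in the log.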

The point where my script detected an issue:

Code: Select all

2018-01-23T12:50:01.525255+01:00 rpi-08-locataire-42-etage-droit logger: /usr/local/bin/edf-teleinformation-cron.sh : some log file drew very big; rebooting.
2018-01-23T12:50:01.540868+01:00 rpi-08-locataire-42-etage-droit logger: 1#011/dev/shm/TeleInformation/cron.Locataire42EtageDroit.dblive.tmp
2018-01-23T12:50:01.542270+01:00 rpi-08-locataire-42-etage-droit logger: 1#011/dev/shm/TeleInformation/cron.schema.Locataire42EtageDroit.main
2018-01-23T12:50:01.542861+01:00 rpi-08-locataire-42-etage-droit logger: 1#011/dev/shm/TeleInformation/cron.schema.Locataire42EtageDroit.tmp
2018-01-23T12:50:01.543605+01:00 rpi-08-locataire-42-etage-droit logger: 0#011/dev/shm/TeleInformation/do_cron-Locataire42EtageDroit
2018-01-23T12:50:01.544062+01:00 rpi-08-locataire-42-etage-droit logger: 21#011/dev/shm/TeleInformation/Locataire42EtageDroit
2018-01-23T12:50:01.544522+01:00 rpi-08-locataire-42-etage-droit logger: 0#011/dev/shm/TeleInformation/probe
2018-01-23T12:50:01.545182+01:00 rpi-08-locataire-42-etage-droit logger: 0#011/dev/shm/TeleInformation/show-all
2018-01-23T12:50:01.620023+01:00 rpi-08-locataire-42-etage-droit rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="476" x-info="http://www.rsyslog.com"] exiting on signal 15.
2018-01-23T12:51:06.012487+01:00 rpi-08-locataire-42-etage-droit rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="471" x-info="http://www.rsyslog.com"] start
2018-01-23T12:51:06.013684+01:00 rpi-08-locataire-42-etage-droit kernel: [    0.000000] Booting Linux on physical CPU 0x0
2018-01-23T12:51:06.013702+01:00 rpi-08-locataire-42-etage-droit kernel: [    0.000000] Initializing cgroup subsys cpuset
2018-01-23T12:51:06.013713+01:00 rpi-08-locataire-42-etage-droit kernel: [    0.000000] Initializing cgroup subsys cpu
2018-01-23T12:51:06.013722+01:00 rpi-08-locataire-42-etage-droit kernel: [    0.000000] Initializing cgroup subsys cpuacct
2018-01-23T12:51:06.013730+01:00 rpi-08-locataire-42-etage-droit kernel: [    0.000000] Linux version 4.4.11-v7 ([email protected]) (gcc version 4.8.3 20140303 (prerelease) (crosstool-NG linaro-1.13.1+bzr2650 - Linaro GCC 2014.03) ) #47 SMP Thu Jun 16 21:57:07 CST 2016
The code that detects issues:

Code: Select all

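                # Size in MB of each entry under the temp dir; reboot if any of them exceeds 20 MB.
                /usr/bin/du -lsm "${TeleInformation_tmp}/"* | /usr/bin/awk '{print $1}' | while read -r line ; do [ "$line" -gt 20 ] 2>/dev/null && {
                        date
                        echo "There seem to be a big log; rebooting."
                        # Record the reason and the per-entry sizes in syslog before rebooting.
                        echo "$0 : some log file drew very big; rebooting." | /usr/bin/logger
                        /usr/bin/du -lsm "${TeleInformation_tmp}/"* | /usr/bin/logger
                        /sbin/reboot
                        }
                done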
I have added a new line to back up those logs before the reboot (copying them from /dev/shm into /root). But they will not help.
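
For illustration only, a minimal sketch of such a backup step; it is not the line actually added to the script. The function name backup_log_tails, its arguments and the 1000-line default are assumptions. Keeping only the tail of each file limits how much a runaway log writes to the SD card.

```shell
#!/bin/sh
# Sketch: keep the last lines of each log from a tmpfs directory before a reboot.
# Assumption: the logs are plain text files directly under SRC_DIR.

# backup_log_tails SRC_DIR DEST_ROOT [LINES]
# Writes the last LINES lines (default 1000) of every regular file in SRC_DIR
# to DEST_ROOT/<UTC timestamp>/ and prints the directory that was created.
backup_log_tails() {
    src=$1
    dest_root=$2
    lines=${3:-1000}
    stamp=$(date -u '+%Y%m%dT%H%M%SZ')
    dest="$dest_root/$stamp"
    mkdir -p "$dest" || return 1
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        tail -n "$lines" "$f" > "$dest/$(basename "$f")"
    done
    sync    # flush to the card before the reboot that follows
    echo "$dest"
}

# Example, just before /sbin/reboot:
#   backup_log_tails /dev/shm/TeleInformation /root/log-backups 1000
```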

The most useful thing to do is to stop using the PL2303 ASAP.

Note that for a few weeks now, all my machines have been running fstrim weekly.

Edit: while trying to stop using the PL2303, my application was not detected on /dev/ttyAMA0, so I am using an FTDI on this rpi-08. I know for sure that another rPi was able to communicate with my application directly via the internal serial port; maybe it was another board version, or it was using another Raspbian image ... I am not digging into this detail now.

doublehp
Posts: 77
Joined: Wed May 02, 2012 1:11 am

Re: broken filesystem, e2fsck can not fix things

Sat Mar 03, 2018 10:44 pm

Filesystem crashed again with an FT232 (FTDI).

scruss
Posts: 2542
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON

Re: broken filesystem, e2fsck can not fix things

Sun Mar 04, 2018 4:28 pm

Would logrotate help you? It allows logs to be mailed to another user/another machine, so at least you wouldn't lose old logs when the Pi crashed. But I'm still mystified why a 1200 baud connection seems to soak your equipment.

doublehp
Posts: 77
Joined: Wed May 02, 2012 1:11 am

Re: broken filesystem, e2fsck can not fix things

Sun Mar 04, 2018 6:55 pm

I am not going to send rotated logs over email, but will use a network syslogger instead (any local syslog can send its output to a network server; it's the best way to track hard disk and NFS issues; with most hardware problems, the network card is usually the last working component, and usually lasts much longer than the video card).
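
Since the Pi logs above show rsyslogd, forwarding can be a single rule in its configuration. A minimal sketch, assuming rsyslog's classic forwarding syntax ("@host:port" for UDP, "@@host:port" for TCP); the helper name rsyslog_forward_rule, the host name and the target file name in the example are hypothetical.

```shell
#!/bin/sh
# Sketch: build an rsyslog rule that forwards every message to a log server.
# Assumption: classic rsyslog syntax, "@host:port" = UDP, "@@host:port" = TCP.

# rsyslog_forward_rule HOST [PORT] [udp|tcp]
rsyslog_forward_rule() {
    host=$1
    port=${2:-514}
    proto=${3:-udp}
    case "$proto" in
        udp) printf '%s\n' "*.* @${host}:${port}" ;;
        tcp) printf '%s\n' "*.* @@${host}:${port}" ;;
        *)   echo "unknown protocol: $proto" >&2; return 1 ;;
    esac
}

# Example (hypothetical host and file name), then restart rsyslog:
#   rsyslog_forward_rule logserver.example 514 tcp > /etc/rsyslog.d/90-forward.conf
```

UDP keeps working with no connection state on a flaky machine, while TCP avoids silently dropped messages; which one fits depends on the network.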
