[Discuss] Update on Raspberry Pi4 ZFS Problems

Kent Borg kentborg at borg.org
Sat Sep 3 16:18:01 EDT 2022


On 9/3/22 12:58, Kent Borg wrote:
> For what I am doing, the slow ports will be plenty fast, and I need
> to get this thing working. I'm going to try to plow forward.

I decided to try *one* more experiment before installing and 
configuring mail server software: I used fdisk to repartition the two 
disks, a single Linux partition on each, put XFS on each, re-plugged 
them into a fast USB port, mounted them, and fired up my file-copying 
torture test, in stereo:

   Copy in /usr, then run 8 backgrounded "rsync -a"s to make copies,
   do that on both disks at once. Once all of that was done, move
   each gaggle of directories into a single directory, then fire up 7
   more backgrounded "rsync -a"s to copy it, again on both disks at
   once…
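
(For the record, a rough sketch of that test as shell commands; the
device names, mount points, and copy-directory names here are my
stand-ins for illustration, not exactly what I typed:)

   # Assumes the two disks came up as /dev/sda and /dev/sdb, each
   # already fdisk'ed down to a single Linux partition:
   mkfs.xfs -f /dev/sda1
   mkfs.xfs -f /dev/sdb1
   mkdir -p /mnt/a /mnt/b
   mount /dev/sda1 /mnt/a
   mount /dev/sdb1 /mnt/b

   # Round one: seed each disk with /usr, then 8 backgrounded copies.
   for m in /mnt/a /mnt/b; do
       cp -a /usr "$m/usr"
       for i in $(seq 8); do
           rsync -a "$m/usr/" "$m/copy$i/" &
       done
   done
   wait

   # Round two: gather the gaggle into one directory, 7 more copies.
   for m in /mnt/a /mnt/b; do
       mkdir "$m/gaggle"
       mv "$m/usr" "$m"/copy* "$m/gaggle/"
       for i in $(seq 7); do
           rsync -a "$m/gaggle/" "$m/big$i/" &
       done
   done
   wait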

The amount of RAM "used" seems to be pretty stable at around 2.5GB...no, 
I guess it is climbing slowly (those 42 long-lived rsync processes, 
which would be 7 rsyncs x 2 disks x 3 processes each since a local 
rsync forks into a generator, a sender, and a receiver, are maybe 
leaking a little, or maybe efficiently using RAM). Somewhat less than 
5GB for "buff/cache", which seems stable, or actually falling. (Using 
less cache as the cache gradually synchronizes the processes…? And, as 
they synchronize and performance details change, the various things 
rsync does to optimize performance might legitimately need more RAM. 
But "used" keeps climbing; smells like a leak to me.)
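
(Those numbers are from watching free(1), nothing fancier:)

   # Refresh every 5 seconds; "used" and "buff/cache" are the
   # columns I mean above:
   watch -n 5 free -h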

Watching /var/log/syslog for some time now, I see boring stuff slowly 
go by...until I finally see an error!

  Sep  3 12:28:01 la kernel: [ 4188.087156] NOHZ tick-stop error:
  Non-RCU local softirq work is pending, handler #10!!!

And looking back in syslog I see three more of those in the last few 
days, always handler #10, too. I guess doing NOHZ on every supported CPU 
is hard to get right.
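
(If you want to hunt for these yourself, something like this will
find them, rotated logs included:)

   # zgrep reads the gzipped rotations too:
   zgrep "NOHZ tick-stop error" /var/log/syslog*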


But I'm not getting any I/O errors: XFS, talking to spinning WD disks, 
over fast USB, on a Pi 4, seems solid as hell.

But ZFS can't do it. ZFS might be mature and production-ready and 
reliable as hell...but not on this hardware with this OS.


I guess it is back to XFS on top of Linux software RAID 1 for this 
project.
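
That would look something like this (device names and the mount point
are assumptions, same as above):

   # Mirror the two partitions, put XFS on the mirror:
   mdadm --create /dev/md0 --level=1 --raid-devices=2 \
       /dev/sda1 /dev/sdb1
   mkfs.xfs /dev/md0
   mount /dev/md0 /mnt/mail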


My test is coming up on putting 300GB onto these disks, I've seen two 
hourly cron messages go by in syslog, and still no I/O errors; time to 
hit send on this message.


-kb, the Kent who is disappointed it doesn't work, and that he had to 
spend so much time to get to that conclusion.



