Living life on the edge

Whoosh! There goes my redundancy! Fingers crossed I don’t lose another disk before the replacements arrive!

-------------------
Tue Nov 29 17:01:01 [bash:5.1.16 jobs:0 error:0 time:6361]
root@love:/home/jj5
# zpool status
  pool: data
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 0B in 1 days 10:20:21 with 0 errors on Mon Nov 14 10:44:22 2022
config:

        NAME                     STATE     READ WRITE CKSUM
        data                     DEGRADED     0     0     0
          mirror-0               DEGRADED     0     0     0
            sda                  ONLINE       0     0     0
            9460704850353196665  OFFLINE      0     0     0  was /dev/sdb1
          mirror-1               DEGRADED     0     0     0
            2467357469475118468  OFFLINE      0     0     0  was /dev/sdc1
            sdd                  ONLINE       0     0     0
        cache
          nvme0n1p4              ONLINE       0     0     0

errors: No known data errors

  pool: fast
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: resilvered 2.70G in 00:01:35 with 0 errors on Tue Nov 29 13:28:02 2022
config:

        NAME         STATE     READ WRITE CKSUM
        fast         ONLINE       0     0     0
          mirror-0   ONLINE       0     0     0
            sde      ONLINE       0     0     0
            sdf      ONLINE       0     0     1
        cache
          nvme0n1p3  ONLINE       0     0     0

errors: No known data errors

  pool: temp
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:21:59 with 0 errors on Sun Nov 13 00:46:02 2022
config:

        NAME         STATE     READ WRITE CKSUM
        temp         ONLINE       0     0     0
          nvme0n1p5  ONLINE       0     0     0

errors: No known data errors
-------------------

I love being a programmer

My ZFS RAID array is resilvering. It’s a long recovery process. A report on progress looks like this:

Every 10.0s: zpool status                                             love: Tue May  4 22:32:27 2021

  pool: data
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun May  2 20:26:52 2021
        1.89T scanned out of 5.03T at 11.0M/s, 83h19m to go
        967G resilvered, 37.54% done
config:

        NAME                       STATE     READ WRITE CKSUM
        data                       DEGRADED     0     0     0
          mirror-0                 ONLINE       0     0     0
            sda                    ONLINE       0     0     0
            sdb                    ONLINE       0     0     0
          mirror-1                 DEGRADED     0     0     0
            replacing-0            DEGRADED     0     0     0
              4616223910663615641  UNAVAIL      0     0     0  was /dev/sdc1/old
              sdc                  ONLINE       0     0     0  (resilvering)
            sdd                    ONLINE       0     0     0
        cache
          nvme0n1p4                ONLINE       0     0     0

errors: No known data errors

That "83h19m to go" wasn't in units I could easily grok; what I wanted to know was how many days. Lucky for me, I'm a computer programmer!

First I wrote watch-zpool-status.php:

#!/usr/bin/env php
<?php

main( $argv );

function main( $argv ) {

  // slurp the whole zpool status report from stdin
  $status = stream_get_contents( STDIN );

  if ( ! preg_match( '/, (\d+)h(\d+)m to go/', $status, $matches ) ) {

    die( "could not parse zpool status.\n" );

  }

  $hours = intval( $matches[ 1 ] );
  $minutes = intval( $matches[ 2 ] );

  $minutes += $hours * 60;

  $days = $minutes / 60.0 / 24.0;

  $report = number_format( $days, 2 );

  echo "days remaining: $report\n";

}

And then I wrote watch-zpool-status.sh to run it:

#!/bin/bash

watch -n 10 'zpool status | ./watch-zpool-status.php'

So now it reports that there are 3.47 days remaining. Good to know!
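For what it's worth, the same parse-and-convert step can be done in a single awk program instead of PHP; this is just an alternative sketch, shown here against a sample line from the resilver report rather than live zpool output:

```shell
#!/bin/sh
# Find "NNhMMm to go" in zpool status output and convert it to days.
# Uses only POSIX awk (match/RSTART/RLENGTH/split), no GNU extensions.
echo "1.89T scanned out of 5.03T at 11.0M/s, 83h19m to go" |
  awk '{
    if (match($0, /[0-9]+h[0-9]+m to go/)) {
      # split "83h19m to go" on the h and m separators
      split(substr($0, RSTART, RLENGTH), p, /[hm]/)
      printf "days remaining: %.2f\n", (p[1] * 60 + p[2]) / 60 / 24
    }
  }'
# prints: days remaining: 3.47
```

In the wrapper script you would pipe `zpool status` into the awk program instead of the `echo` used here for illustration.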