System freeze

My backup Kubuntu workstation recently began freezing shortly after boot. The problem recurred for a week or so and then I just switched the damn thing off. I have now picked this up again to try and fix it, because I would like to make use of this machine’s disks.

I ran MemTest86, and that surfaced no errors. Seems as though the RAM is okay and probably the CPU too.

The first thing I tried after the memory test was to remove the WD Blue 250GB M.2 SATA SSD, which was in use as a ZFS cache. And this seems to have fixed the problem! Can’t be sure yet, but so far so good. Will continue to keep an eye on things…
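
For anyone who would rather detach a cache device from its pool in software before pulling the SSD, here is a minimal dry-run sketch. The pool name `tank` and the device path are placeholders (the post names neither), and the `run` helper is just an illustration that prints each command and only executes it when `APPLY=1` is set.

```shell
#!/usr/bin/env bash
# Dry-run sketch: detach an L2ARC cache device from a pool before pulling the SSD.
# POOL and CACHE_DEV are hypothetical placeholders; substitute the real values.
POOL="${POOL:-tank}"
CACHE_DEV="${CACHE_DEV:-/dev/disk/by-id/EXAMPLE-cache-ssd}"

# Print each command; only execute it when APPLY=1 is set.
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

remove_cache() {
    run zpool status "$POOL"               # cache devices are listed under "cache"
    run zpool remove "$POOL" "$CACHE_DEV"  # detach the cache device from the pool
    run zpool status "$POOL"               # confirm the device is no longer listed
}

remove_cache
```

Run as-is, this only prints the three commands, which makes it easy to double-check the device name before running it for real with `APPLY=1`.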

Here are some photos for those playing along at home:

Resolving ZFS issue on ‘trick’

I’m gonna follow these instructions to replace a disk in one of my ZFS arrays on my workstation ‘trick’. I have ordered myself a new 6TB Seagate Barracuda hard drive for $179. I hope nobody minds if I make a few notes for myself here…

-------------------
Mon Sep 12 19:24:53 [bash:5.0.17 jobs:0 error:0 time:179]
root@trick:/home/jj5
# zpool status data
  pool: data
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub canceled on Mon Sep 12 19:24:53 2022
config:

        NAME                                     STATE     READ WRITE CKSUM
        data                                     DEGRADED     0     0     0
          mirror-0                               DEGRADED     0     0     0
            scsi-SATA_ST6000VN0041-2EL_ZA16N49H  DEGRADED 11.1K     0 54.6K  too many errors
            scsi-SATA_ST6000VN0041-2EL_ZA16N4ZH  ONLINE       0     0     0

errors: No known data errors
-------------------
Mon Sep 12 19:40:13 [bash:5.0.17 jobs:0 error:0 time:1099]
root@trick:/home/jj5
# zdb
data:
    version: 5000
    name: 'data'
    state: 0
    txg: 2685198
    pool_guid: 1339265133722772877
    errata: 0
    hostid: 727553668
    hostname: 'trick'
    com.delphix:has_per_vdev_zaps
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: 1339265133722772877
        create_txg: 4
        children[0]:
            type: 'mirror'
            id: 0
            guid: 802431090802465148
            metaslab_array: 256
            metaslab_shift: 34
            ashift: 12
            asize: 6001160355840
            is_log: 0
            create_txg: 4
            com.delphix:vdev_zap_top: 129
            children[0]:
                type: 'disk'
                id: 0
                guid: 9301639020686187487
                path: '/dev/disk/by-id/scsi-SATA_ST6000VN0041-2EL_ZA16N49H-part1'
                devid: 'ata-ST6000VN0041-2EL11C_ZA16N49H-part1'
                phys_path: 'pci-0000:00:17.0-ata-3'
                whole_disk: 1
                DTL: 28906
                create_txg: 4
                com.delphix:vdev_zap_leaf: 130
                degraded: 1
                aux_state: 'err_exceeded'
            children[1]:
                type: 'disk'
                id: 1
                guid: 4734211194602915183
                path: '/dev/disk/by-id/scsi-SATA_ST6000VN0041-2EL_ZA16N4ZH-part1'
                devid: 'ata-ST6000VN0041-2EL11C_ZA16N4ZH-part1'
                phys_path: 'pci-0000:00:17.0-ata-4'
                whole_disk: 1
                DTL: 28905
                create_txg: 4
                com.delphix:vdev_zap_leaf: 131
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
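
The GUID of the failing disk can be read off the zdb dump by eye, but a small filter makes it harder to grab the wrong one. This is only a sketch: the function name `vdev_guid_for` is made up for illustration, and it relies on `guid:` appearing before `path:` within each `children[]` entry, as it does in the dump above.

```shell
#!/usr/bin/env bash
# Sketch: print the guid of the leaf vdev whose path contains a given serial.
# Reads zdb-style config text on stdin.
# Usage (assumed): zdb | vdev_guid_for ZA16N49H
vdev_guid_for() {
    awk -v serial="$1" '
        # Remember the most recent guid; keep it as text, since these
        # values are too large to survive conversion to a floating-point number.
        $1 == "guid:" { guid = $2 "" }
        # When a path line mentions the serial, emit the guid seen just before it.
        $1 == "path:" && index($2, serial) { print guid }
    '
}
```

Matching on the drive serial rather than on position in the listing means the result does not depend on which child of the mirror happens to be listed first.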

I think the commands I’m gonna need are:

# zpool offline data 9301639020686187487 # take the failing disk (ZA16N49H) offline by GUID
# zpool status data
# shutdown # and replace disk
# zpool replace data 9301639020686187487 /dev/disk/by-id/scsi-SATA_ST6000DM003-2CY1_WSB076SN # resilver onto the new disk
# zpool status data
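
The same plan can be wrapped in a dry-run script, so the exact commands can be reviewed before anything touches the pool. This is a sketch, not a tested procedure: the pool name, GUID and new disk path are taken from the notes above, while the `run` helper, the function names and the `APPLY=1` switch are my own additions for illustration.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the disk swap planned above. Nothing runs unless APPLY=1.
POOL="data"
OLD_GUID="9301639020686187487"   # failing ZA16N49H, from the zdb output
NEW_DISK="/dev/disk/by-id/scsi-SATA_ST6000DM003-2CY1_WSB076SN"

# Print each command; only execute it when APPLY=1 is set.
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

# Step 1: run before powering down to swap the disk.
before_shutdown() {
    run zpool offline "$POOL" "$OLD_GUID"
    run zpool status "$POOL"
}

# Step 2: run after the new disk is installed and the machine is back up.
after_swap() {
    # When applying for real, stop early if the new disk is not visible.
    if [ "${APPLY:-0}" = "1" ] && [ ! -e "$NEW_DISK" ]; then
        echo "new disk not found: $NEW_DISK" >&2
        return 1
    fi
    run zpool replace "$POOL" "$OLD_GUID" "$NEW_DISK"
    run zpool status "$POOL"     # check here for resilver progress
}

case "${1:-plan}" in
    before) before_shutdown ;;
    after)  after_swap ;;
    plan)   before_shutdown; after_swap ;;
esac
```

With no arguments it prints the whole plan. The intended use is `APPLY=1 ./swap.sh before`, then the physical swap, then `APPLY=1 ./swap.sh after` (the script name is hypothetical).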

Ethernet on ‘trick’

Note to self: I’ve disabled my second NIC, enp7s0, for now. I can enable it when its cable arrives.

-------------------
Mon Mar 28 16:34:31 [bash:5.0.17 jobs:0 error:0 time:1505]
root@trick:/home/jj5
# cat /etc/netplan/00-installer-config.yaml
# This is the network config written by 'subiquity'
network:
  ethernets:
    enp10s0:
      addresses:
      - 10.3.2.5/16
      gateway4: 10.3.1.1
      nameservers:
        addresses:
        - 10.1.1.113
        search: []
    #enp7s0:
    #  addresses:
    #  - 10.1.2.5/16
    #  nameservers:
    #    addresses: []
    #    search: []
  version: 2

-------------------
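
When the cable does arrive, re-enabling the NIC should just be a matter of uncommenting the enp7s0 stanza. A minimal sketch of that as a filter follows, assuming the commented lines keep exactly the four-space indent shown above. The function name is made up for illustration, and the follow-up commands in the comments are a suggestion rather than something I have run on ‘trick’.

```shell
#!/usr/bin/env bash
# Sketch: uncomment the enp7s0 stanza in a netplan file read from stdin.
# Only lines between "    #enp7s0:" and "  version:" are touched, and only
# the "#" that follows the four-space indent is removed.
uncomment_enp7s0() {
    sed -E '/^ {4}#enp7s0:/,/^ {2}version:/ s/^( {4})#/\1/'
}

# Suggested use (not executed here):
#   f=/etc/netplan/00-installer-config.yaml
#   uncomment_enp7s0 < "$f" > /tmp/netplan.new
#   diff -u "$f" /tmp/netplan.new     # review the change
#   cp /tmp/netplan.new "$f"
#   netplan try                       # applies, and rolls back unless confirmed
```

Using `netplan try` rather than `netplan apply` gives a way back if the new configuration cuts off network access.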