How to mirror the [RedHat] Linux root (/) partition?

by Andy Polyakov <appro@fy.chalmers.se>, November 2000


To begin with, I have to refer you to the Software RAID HOWTO for background information... Now that you're back, I want to point out that even though the instructions below are RedHat specific (e.g. they depend on kernel support for RAID autodetection(*), on raidtools being available during installation/rescue, and on lilo being patched to support RAID), you should be able to adapt them to other Linux flavors as well. Moreover, if the root partition is followed by a swap partition (or any other partition you can sacrifice and shrink by a couple of cylinders), it's perfectly possible to adapt the instructions and mirror the current root partition without reinstalling the OS or shuffling any data around. The latter, by the way, is what distinguishes this procedure from the ones described in section 4 of the already mentioned HOWTO.


  1. start RedHat installation procedure and when it comes to partitioning, choose the [fdisk] option:
    # fdisk /dev/sda
    
    The number of cylinders for this disk is set to 1106.
    There is nothing wrong with that, but this is larger than 1024,
    and could in certain setups cause problems with:
    1) software that runs at boot time (e.g., LILO)
    2) booting and partitioning software from other OSs
       (e.g., DOS FDISK, OS/2 FDISK)
    
    Command (m for help): p
    
    Disk /dev/sda: 255 heads, 63 sectors, 1106 cylinders
    Units = cylinders of 16065 * 512 bytes
    
    

    First of all, note how large one cylinder is (the Units line above). The catch is that it should be large enough to accommodate the so-called RAID persistent superblock, which takes up to 127.5KB, or 255 * 512 bytes, at the end of the partition. Well, I find it very hard to believe that you'll find a disk with cylinders smaller than 255 sectors, but anyway...

  2. partition the disk leaving gaps(**) of 1 cylinder (or more, enough to accommodate the multi-device superblock mentioned above) between partitions as well as between the last partition and the end of the disk (or the extended partition edge(s) for that matter):

       Device Boot    Start       End    Blocks   Id  System
    /dev/sda1   *         1       131   1052226   83  Linux
    /dev/sda2           133       165    265072+  82  Linux swap
    /dev/sda3           167      1105   7542517+  83  Linux
    

  3. proceed with installation and bring the system up;

  4. create an initial ramdisk with preloaded RAID-1 module:
    # mkinitrd --preload raid1 /boot/initrd-`uname -r`.img `uname -r`
    

    and have lilo load the ramdisk:

    # cat /etc/lilo.conf
    boot=/dev/sda
    map=/boot/map
    install=/boot/boot.b
    #prompt
    timeout=50
    linear
    default=linux
    
    image=/boot/vmlinuz-2.2.16-3
            label=linux
            initrd=/boot/initrd-2.2.16-3.img
            read-only
            root=/dev/sda1
    # lilo
    Added linux *
    

    Needless to say, compiling a customized kernel (beware (*)) with the RAID-1 code permanently compiled in will do the trick as well. In either case it's more than appropriate to reboot at this point in order to verify that the system is bootable and the raid1.o module is preloaded;

  5. now reboot (yes, again) from the CD choosing linux rescue option;

  6. start by creating the device nodes you are going to need:
    # cat /proc/partitions
    major minor  #blocks  name
    
       8     0    8886750 sda
       8     1    1052226 sda1
       8     2     265072 sda2
       8     3    7542517 sda3
    # mknod /dev/sda  b 8 0
    # mknod /dev/sda1 b 8 1
    # mknod /dev/sda2 b 8 2
    

  7. invoke fdisk, delete all the partitions and recreate 'em with the very same starting cylinders, but without any gaps between partitions, and toggle the partition system ids to fd (compare with the table above):
       Device Boot    Start       End    Blocks   Id  System
    /dev/sda1   *         1       132   1060258+  fd  Linux raid autodetect
    /dev/sda2           133       166    273105   fd  Linux raid autodetect
    /dev/sda3           167      1106   7550550   fd  Linux raid autodetect
    
    Command (m for help): w
    

  8. create /etc/raidtab (with pico, arrrgh!) and initialize RAID devices:
    # cat /etc/raidtab
    raiddev	/dev/md0
    	raid-level	1
    	nr-raid-disks	2
    	nr-spare-disks	0
    	chunk-size	4
    	persistent-superblock	1
    	device		/dev/sda1
    	raid-disk	0
    	device		/dev/sdp1
    	failed-disk	1
    raiddev	/dev/md1
    	raid-level	1
    	nr-raid-disks	2
    	nr-spare-disks	0
    	chunk-size	4
    	persistent-superblock	1
    	device		/dev/sda2
    	raid-disk	0
    	device		/dev/sdp2
    	failed-disk	1
    # grep md\$ /proc/devices
      9 md
    # mknod /dev/md0 b 9 0
    mknod: /dev/md0: File exists
    # mkraid --force /dev/md0
    # mknod /dev/md1 b 9 1
    # mkraid --force /dev/md1
    # cat /proc/mdstat
    Personalities : [raid1] 
    read_ahead 1024 sectors
    md0 : active raid1 sda1[0] 1060160 blocks [2/1] [U_]
    md1 : active raid1 sda2[0] 273024 blocks [2/1] [U_]
    unused devices: <none>
    

    Well, it's not much of a RAID-1, huh? No, as you can see there are no mirrors yet. It looks meaningless at first glance, but it's still useful (well, in my opinion) during installation and especially during an upgrade, when you might choose to detach the old mirror partition and keep it as a "roll-back-quickly" backup.

  9. at this point you should be able to mount the root file system and copy /etc/raidtab to it:
    # mount /dev/md0 /tmp/a
    # cp -i /etc/raidtab /tmp/a/etc/raidtab.root
    

  10. modify fstab replacing /dev/sdX with corresponding /dev/mdY:
    # cat /tmp/a/etc/fstab
    /dev/md0                /                       ext2    defaults        1 1
    /dev/md1                swap                    swap    defaults        0 0
    ...
    

  11. modify lilo.conf accordingly and execute lilo:
    # cat /tmp/a/etc/lilo.conf
    boot=/dev/md0
    map=/boot/map
    install=/boot/boot.b
    #prompt
    timeout=50
    linear
    default=linux
    
    image=/boot/vmlinuz-2.2.16-3
            label=linux
            initrd=/boot/initrd-2.2.16-3.img
            read-only
            root=/dev/md0
    # /tmp/a/sbin/lilo -r /tmp/a
    boot = /dev/sda, map = /boot/map.0801
    Added linux *
    

  12. do unmount the file system and do stop the RAID devices:
    # umount /tmp/a
    # raidstop /dev/md0
    # raidstop /dev/md1
    

  13. reboot and verify that the system comes up with / mounted on /dev/md0;

  14. throw the mirror partitions in:
    # raidhotadd -c /dev/null /dev/md0 /dev/sdb1
    # raidhotadd -c /dev/null /dev/md1 /dev/sdb2
    # cat /proc/mdstat
    Personalities : [raid1] 
    read_ahead 1024 sectors
    md0 : active raid1 sdb1[1] sda1[0] 1060160 blocks [2/2] [UU]
    md1 : active raid1 sdb2[1] sda2[0] 273024 blocks [2/2] [UU]
    unused devices: <none>
    

    It's definitely a good idea to wait until recovery disappears from the mdX line before you proceed with mdX+1 if some of the partitions involved reside on the same physical disk. For better performance, that is...

  15. rerun lilo (and configure your BIOS to fall back to /dev/sdb when looking for bootable devices);
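
The waiting advice in step 14 can be scripted. What follows is only a sketch under assumptions: md_busy and wait_for_md are hypothetical helper names (they are not part of raidtools), and the pattern assumes that a rebuilding array mentions "recovery" or "resync" somewhere in its /proc/mdstat entry. The exact wording differs between kernel versions, so check what your own /proc/mdstat prints and adjust accordingly.

```shell
#!/bin/sh
# md_busy ARRAY [FILE]
# Succeeds (exit 0) while ARRAY's entry in an mdstat-style FILE still mentions
# "recovery" or "resync". FILE defaults to /proc/mdstat.
# ASSUMPTION: the kernel marks a rebuilding array with one of those two words.
md_busy() {
    awk -v md="$1" '
        $1 == md && $2 == ":" { inside = 1 }   # the entry for our array starts here
        $1 != md && $2 == ":" { inside = 0 }   # some other entry starts here
        inside && /recovery|resync/ { busy = 1 }
        END { exit !busy }
    ' "${2:-/proc/mdstat}"
}

# wait_for_md ARRAY [FILE]
# Polls every 10 seconds until ARRAY no longer looks busy.
wait_for_md() {
    while md_busy "$1" "${2:-/proc/mdstat}"; do
        sleep 10
    done
}
```

With these defined, calling wait_for_md md0 between the two raidhotadd commands of step 14 would hold back the second rebuild until the first one is finished.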


Just a few words about the upgrade procedure. The key phrase is "break the mirror before upgrading." Depending on how smart the upgrade program is, you might also have to change the partition system ids to a non-RAID type as well as fix up /etc/fstab. If you plan to rebuild some of the file systems, you also want to shrink the corresponding partition(s) by one cylinder (unless (**) below comes true). I hope you get my drift...
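
Whether one cylinder is really enough depends on the geometry fdisk reports in step 1. Here is a minimal sketch of that check, assuming you feed it the number of 512-byte sectors per cylinder (the 16065 from the Units line); cylinder_holds_superblock and gap_cylinders are hypothetical names made up for this illustration.

```shell
#!/bin/sh
# Worst-case space lost at the end of a partition, in 512-byte sectors
# (the 255 * 512 bytes = 127.5KB quoted in step 1).
MD_WORST_CASE_SECTORS=255

# cylinder_holds_superblock SECTORS_PER_CYLINDER
# Succeeds if a single cylinder covers the worst case.
cylinder_holds_superblock() {
    [ "$1" -ge "$MD_WORST_CASE_SECTORS" ]
}

# gap_cylinders SECTORS_PER_CYLINDER
# Prints how many whole cylinders are needed to cover the worst case
# (integer division, rounded up).
gap_cylinders() {
    echo $(( (MD_WORST_CASE_SECTORS + $1 - 1) / $1 ))
}
```

For the disk shown in step 1, gap_cylinders 16065 prints 1, which is why a single cylinder is all that gets sacrificed per partition there.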


(*)
Note that support for RAID autodetection-n-start makes its first appearance only in some 2.3.x network kernel release, while RedHat provides this functionality already in their customized 2.2.x (x > 12) kernels. However, even if your 2.2.x kernel "vendor" doesn't support this (and you don't feel like adopting the RedHat patches), you still have the option to pass the first RAID-1 partition as the root kernel argument in /etc/lilo.conf and modify your startup script(s) so that at least /dev/md0 gets "spun up" before / is remounted read-write on it:
    # cat /etc/lilo.conf
    boot=/dev/sda
    map=/boot/map
    install=/boot/boot.b
    #prompt
    timeout=50
    linear
    default=linux
    
    image=/boot/vmlinuz-2.2.16-3
            label=linux
            initrd=/boot/initrd-2.2.16-3.img
            read-only
            root=/dev/sda1
    # cat /etc/fstab
    /dev/md0                /                       ext2    defaults        1 1
    ...

In this case you also want to run lilo twice, once per disk, in order to disseminate the boot block, e.g. lilo -b /dev/sdb (keep in mind that /dev/sdb becomes /dev/sda if the original /dev/sda fails and the system is rebooted).
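
Running lilo once per disk is easy to forget after a kernel update, so it may be worth wrapping. The sketch below is an illustration only: install_boot_blocks and the DRY_RUN variable are hypothetical, invented here. By default the function merely prints the commands it would run; set DRY_RUN=0 once you trust the output.

```shell
#!/bin/sh
# install_boot_blocks DISK...
# Runs "lilo -b DISK" once per disk so that each of them carries a boot block.
# Only prints the commands unless DRY_RUN=0 is set in the environment.
install_boot_blocks() {
    for disk in "$@"; do
        if [ "${DRY_RUN:-1}" = 0 ]; then
            lilo -b "$disk" || return 1
        else
            echo "lilo -b $disk"
        fi
    done
}
```

For example, install_boot_blocks /dev/sda /dev/sdb prints the two lilo invocations, one per line, in the order given.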


(**)
Well, if mke2fs and mkswap limited themselves to (sizeof(partition)/64K-1)*64K, then we wouldn't have to waste a cylinder for every partition we want to mirror. It shouldn't be impossible either to make e2fsck (silently?) "convert" even existing filesystems by creating a hidden file covering the area used by the md superblock... Something to consider for the next e2fsprogs revision :-)
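
The formula above is easy to try out on the numbers from this very document. A small sketch, assuming sizes are given in 1KB blocks as fdisk prints them (drop the trailing "+"); md_usable_kb is a hypothetical helper name.

```shell
#!/bin/sh
# md_usable_kb PARTITION_KB
# Applies (sizeof(partition)/64K - 1) * 64K using integer division,
# with both input and output expressed in 1KB blocks.
md_usable_kb() {
    echo $(( ($1 / 64 - 1) * 64 ))
}
```

md_usable_kb 1060258 prints 1060160 and md_usable_kb 273105 prints 273024, which are exactly the block counts /proc/mdstat reports for md0 and md1 in step 8.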