IDE Raid, ServerWorks OSB4 chip: Stable and Fast?

Update: 02.12.27: DON'T DO IT! I was getting data corruption when copying MP3s. I tried it without raid and lvm. The only thing that fixed it was to skip the hdparm call that changed the hard drive modes. I'm now back to a non-DMA mode. It's slow as hell, but *very* stable. The main goal here is a large, safe storage area, so the slowdown is not a big deal, but it is disappointing.
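(On my box, "skipping the hdparm" is what puts the drives back in the safe mode. If you already turned DMA on, you can check and switch it back off per drive, roughly like this:)

# check the current DMA setting for each drive
/sbin/hdparm -d /dev/hd{a,b,c,d}

# and turn DMA back off if it was enabled
/sbin/hdparm -d0 /dev/hd{a,b,c,d}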

My new raid setup is still shaking out gremlins for me to thonk with a stick. The raid and lvm components of the 2.4 kernel seem pretty reliable; the ServerWorks OSB4 chipset driver is just plain scary. Whether the bugs are in the chip or the driver, I don't know, but UDMA doesn't work right, and without it performance suffers. Here are my notes and workarounds for getting good DMA performance on IDE raid using ServerWorks IDE controllers.

Init script

I can't figure out how to set the drives to MDMA2 (multiword DMA mode 2) from boot. hdparm -K1 doesn't work, nor does idebus=34, and anything else I've tried results in UDMA33 mode, which leads to hard locks. So I'm using a startup script. The kernel enables the raid arrays automagically (before init), so we have to bring them down before changing hdparm settings. The script runs in rcS.d right after modules, but before the lvm. So here's my boot script.


---------------------------------------------------------------------
#!/bin/bash

# This script is an unfortunate necessity because of a problem with the
# OSB4 chipset and linux. UDMA33 locks the computer. The best transfer
# mode that isn't UDMA33 is MDMA2. Still very fast. I can't figure out
# how to set it at the lilo prompt and the -K1 switch to hdparm doesn't
# remember settings across reboots. So here is a custom hand job.
# -X34 is MDMA2 (32 plus the mode)
# -d1 DMA on

# If the raid1 arrays are up and syncing, then hdparm can be destructive,
# so bring them down first.
echo "JIM: stopping ide raid arrays"
/sbin/raidstop /dev/md1
/sbin/raidstop /dev/md2

# do the hdparm magic. I love this tool!
echo "JIM: setting ide drives to MDMA2"
/sbin/hdparm -X34 -d1 /dev/hd{a,b,c,d}

# bring the arrays back up. They'll sync faster and be ready for lvm
echo "JIM: restarting ide raid arrays"
/sbin/raidstart /dev/md1
/sbin/raidstart /dev/md2
---------------------------------------------------------------------
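(The "runs in rcS.d right after modules, but before the lvm" bit is just a symlink with a sequence number that sorts between those two scripts. The script name and S-number below are whatever I happened to pick, so adjust for your box:)

# the script above saved as /etc/init.d/ide-mdma2 (name is arbitrary);
# pick an S-number that sorts after the modules script and before lvm
ln -s ../init.d/ide-mdma2 /etc/rcS.d/S22ide-mdma2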

Performance for RAID syncing

"Normal Mode" (not sure what to call it, PIO4?). I get pitiful performance about 1000K/sec with 50-70% CPU(s) utilization. I/O starvation, but raidsyncing very reliable.

UDMA33 (UDMA2) Mode I got about 8000K/sec(!) at 30% cpu. But who cares if the system locks in 30 seconds.

MWDMA2 (MDMA2) Mode I get about 5000K/sec. But only like 5% cpu utilized! Very reliable! The cost to benefit winner!
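(For reference, the sync rates above are just what the md driver reports while the arrays rebuild; you can eyeball it yourself with something like:)

# the md driver reports the resync rate and ETA here
cat /proc/mdstat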


styx:~# hdparm -tT /dev/hd{a,b,c,d}

/dev/hda:
 Timing buffer-cache reads:   128 MB in  0.58 seconds =220.69 MB/sec
 Timing buffered disk reads:  64 MB in  5.45 seconds = 11.74 MB/sec

/dev/hdb:
 Timing buffer-cache reads:   128 MB in  0.59 seconds =216.95 MB/sec
 Timing buffered disk reads:  64 MB in  4.36 seconds = 14.68 MB/sec

/dev/hdc:
 Timing buffer-cache reads:   128 MB in  0.59 seconds =216.95 MB/sec
 Timing buffered disk reads:  64 MB in  5.48 seconds = 11.68 MB/sec

/dev/hdd:
 Timing buffer-cache reads:   128 MB in  0.58 seconds =220.69 MB/sec
 Timing buffered disk reads:  64 MB in  4.55 seconds = 14.07 MB/sec

styx:~# hdparm -i /dev/hd{a,b,c,d}

/dev/hda:

 Model=SAMSUNG SV1204H, FwRev=RK100-12, SerialNo=0504J1ETB06345
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=34902, SectSize=554, ECCbytes=4
 BuffType=DualPortCache, BuffSize=2048kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=234493056
 IORDY=yes, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4 
 DMA modes: mdma0 mdma1 *mdma2 udma0 udma1 udma2 udma3 udma4 udma5 
 AdvancedPM=no WriteCache=enabled
 Drive Supports : fastATA-1 ATA-2 ATA-3 ATA-4 ATA-5 ATA-6 


/dev/hdb:

 Model=Maxtor 4W100H6, FwRev=AAH01310, SerialNo=W6H265ZC
 Config={ Fixed }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=57
 BuffType=DualPortCache, BuffSize=2048kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=195711264
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4 
 DMA modes: mdma0 mdma1 *mdma2 udma0 udma1 udma2 udma3 udma4 udma5 
 AdvancedPM=yes: disabled (255) WriteCache=enabled
 Drive Supports : ATA/ATAPI-6 T13 1410D revision 0 : ATA-1 ATA-2 ATA-3 ATA-4 ATA-5 ATA-6 


/dev/hdc:

 Model=SAMSUNG SV1204H, FwRev=RK100-12, SerialNo=0504J1ETB01647
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=34902, SectSize=554, ECCbytes=4
 BuffType=DualPortCache, BuffSize=2048kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=234493056
 IORDY=yes, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4 
 DMA modes: mdma0 mdma1 *mdma2 udma0 udma1 udma2 
 AdvancedPM=no WriteCache=enabled
 Drive Supports : fastATA-1 ATA-2 ATA-3 ATA-4 ATA-5 ATA-6 


/dev/hdd:

 Model=WDC WD1000BB-00CCB0, FwRev=22.04A22, SerialNo=WD-WMA9P1031305
 Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
 RawCHS=16383/16/63, TrkSize=57600, SectSize=600, ECCbytes=40
 BuffType=DualPortCache, BuffSize=2048kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=195371568
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4 
 DMA modes: mdma0 mdma1 *mdma2 udma0 udma1 udma2 udma3 udma4 udma5 
 AdvancedPM=no WriteCache=enabled
 Drive Supports : Reserved : ATA-1 ATA-2 ATA-3 ATA-4 ATA-5 

Notes

I was getting this error with UDMA33 right before the system hard-locked:
"Serverworks OSB4 in impossible state. disable udma or if you are using seagate then try switching disk types on this controller. Please report this event to osb4-bug@ide.cabal.tm."

Alan Cox says to go to MWDMA2 because of these hard locks.

People bitching and discussing the 4-bit shift problem.

Look under the DMA mode BIOS setting for MWDMA2. I can't use it because BIOS IDE is disabled to facilitate SCSI boot.

From the hdparm man page:
"[...]where -X34 is used to select multiword DMA mode2 transfers[...]"
hdparm -X34 -d1 /dev/hd{a,b,c,d}
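(The -X numbering works the same way for the other DMA flavors, per the man page: add 32 to the mode number for multiword DMA, or 64 for UltraDMA. So, for example -- don't actually do the first one on an OSB4:)

hdparm -X66 -d1 /dev/hda    # 64 + 2 = UDMA2 (UDMA33), the mode that hard-locks this chipset
hdparm -X34 -d1 /dev/hda    # 32 + 2 = MDMA2, the mode I'm actually using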

7 Comments

One problem I just ran into is that the superblocks on my raid arrays got screwed up because of my screwing around with the ide controller across successive reboots. I had to drop the box a couple of times, which led to out-of-sync raid disks. The way you handle this is to mark the worst disk as failed in raidtab and then do a mkraid --force.


It worked. I didn't lose any data. But one lame side effect that is taking a lot of my time is the disk sizes. The disk that I marked as bad on the second raid1 array is smaller than the other disk. So my mkraid --force made an array that was too big to incorporate that other disk!


The only non-destructive way I can see around this is to back up the data elsewhere (thankfully I got my data back with the failed-disk trick). So, I'm backing up to a spanned drive (raid0) on windows. Then I'll recreate the whole damn lot of raid1 arrays and logical volumes. This time, though, I'm going to make certain that the partitions are the exact same size.
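For the record, the failed-disk trick amounts to an /etc/raidtab stanza roughly like this (the partition names here are made up for illustration, not my actual layout; see the howto text pasted below for the full procedure):

# /etc/raidtab stanza for the degraded re-create (hypothetical partitions)
raiddev /dev/md2
    raid-level              1
    nr-raid-disks           2
    persistent-superblock   1
    chunk-size              4
    device                  /dev/hdb1
    raid-disk               0
    device                  /dev/hdd1
    failed-disk             1

Then mkraid --force /dev/md2 rewrites the superblocks without kicking off a resync onto the disk you marked as failed.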


Anybody who loses BOTH disks of a raid array should read the Software RAID HOWTO and this thread (http://www.geocrawler.com/archives/3/57/2000/6/50/3936208/). I'm going to paste the contents into the next reply because I think the concise information on recovering from a corrupted superblock is so important.

I said I'd post it...


FROM: rvt
DATE: 06/24/2000 12:57:30
SUBJECT: two disk failure, proposed howto changes

Hello Jakob,

I recently had a two disk failure, and found your howto as well as a message
from Martin Bene very helpful in resolving this. Below you find a modified
version of chapter 6.1 of your faq, where I merged your and Martin's version
in order to make things somewhat more detailed and explicit.

The new version may make the procedure for recovery more clear for people
experiencing the problem. And in that situation, they of course are happy
about all help they can get...

The two-disk-failed situation seems to happen relatively often (due to
controller/hardware failure/hickups), and this is where raid is effectively
more dangerous than non-raid (one large disk). A more automated and fool-proof
tool for resolving this might be the ideal solution (but more than I can
deliver currently).

If someone on the mailing list finds some mistake (am I really right about the
spare-disk?) or has an improved version, please post!

========= my proposed howto version:

6.1 Recovery from a multiple disk failure

The scenario is:

A controller dies and takes two disks offline at the same time,
All disks on one scsi bus can no longer be reached if a disk dies,
A cable comes loose...
In short: quite often you get a temporary failure of several disks at once;
afterwards the RAID superblocks are out of sync and you can no longer init
your RAID array.
One thing left: rewrite the RAID superblocks by mkraid --force

To get this to work, you'll need to have an up to date /etc/raidtab - if it
doesn't EXACTLY match devices and ordering of the original disks this won't
work.

Look at the syslog produced by trying to start the array, you'll see the event
count for each superblock. Usually it's best to leave out the disk with the
lowest event count, i.e. the one that failed first (by using "failed-disk").

It's important that you replace "raid-disk" by "failed-disk" for that drive in
your raidtab. If you mkraid without that "failed-disk"-change, the recovery
thread will kick in immediately and start rebuilding the parity blocks. If you
got something wrong this will definitely kill your data. So, you mark one disk
as failed and create the array in degraded mode (the kernel won't try to
recover/resync the array then).

With "failed-disk" you can specify exactly which disks you want to be active
and perhaps try different combinations for best results. BTW, only mount the
filesystem read-only while trying this out...

If you have a spare-disk, you should mark that as "failed-disk", too.

* Check your raidtab against the info you get in the logs from the failed
startup (correct sequence of partitions).
* mark one of the disks with the lowest event count as a "failed-disk"
instead of "raid-disk" in /etc/raidtab
* recreate the raid superblocks using mkraid
* try to mount readonly, check if all is OK
* if it doesn't work, recheck raidtab, perhaps mark a different drive as
failed, go back to the mkraid step.
* unmount, so you can fsck your raid drive (which you probably want to do)
* add the last disk using raidhotadd
* mount normally
* remove the failed-disk stuff from your raidtab.

========= your original version at
http://www.ostenfeld.dk/~jakob/Software-RAID.HOWTO/

6.1 Recovery from a multiple disk failure

The scenario is:

A controller dies and takes two disks offline at the same time,
All disks on one scsi bus can no longer be reached if a disk dies,
A cable comes loose...
In short: quite often you get a temporary failure of several disks at once;
afterwards the RAID superblocks are out of sync and you can no longer init
your RAID array.
One thing left: rewrite the RAID superblocks by mkraid --force

To get this to work, you'll need to have an up to date /etc/raidtab - if it
doesn't EXACTLY match devices and ordering of the original disks this won't
work.

Look at the syslog produced by trying to start the array, you'll see the event
count for each superblock; usually it's best to leave out the disk with the
lowest event count, i.e. the oldest one.

If you mkraid without failed-disk, the recovery thread will kick in
immediately and start rebuilding the parity blocks - not necessarily what you
want at that moment.

With failed-disk you can specify exactly which disks you want to be active and
perhaps try different combinations for best results. BTW, only mount the
filesystem read-only while trying this out... This has been successfully used
by at least two guys I've been in contact with.

OMG! 'split' is the unix tool of the week. It splits a file or stream into chunks of the specified byte count. Basically you can tar, pipe it to split, and get a huge tar file broken into chunks. Then you can cat the chunks back together and untar them.


create (in 100 meg chunks)

tar -cvf - /some/directory | split -b 104857600 - /place/prefix.tar.part.


restore

cat /place/prefix.tar.part.* | tar -xvf -


Now I wish they'd just prefix them numerically, but at least they sort easily.
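Depending on your version of GNU split, there may be a -d switch that does exactly that (numeric suffixes); something like:

tar -cvf - /some/directory | split -d -b 104857600 - /place/prefix.tar.part.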

I ditched the raidtools2 and mdadm debian packages and am now using a compiled mdadm. Handy program. I use --follow --scan with mdadm in an init script to email an address specified in my mdadm.conf file (my cellphone). So I get a page if a disk fails. Handy, that.
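Roughly, the setup looks like this (the address is obviously a placeholder, and the mdadm path is wherever your compiled copy landed):

# in /etc/mdadm.conf:
#   MAILADDR 5551234567@pager.example.com

# in the init script -- run the monitor in the background:
/usr/local/sbin/mdadm --follow --scan &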

I've not sampled the other revisions of the ServerWorks chipsets, but the OSB4 that I have is a piece of shiznite under linux. It's not supported under 2.2, and there is a tale of woe from Alan Cox on the kernel mailing list. Maybe it will come around in a more modern kernel. I've moved to a Promise card.

I agree it is the shiznite!

Wiit

---
theshiznite.com - the site for free thinkers

Anybody heard about this new RAID 6 shiznite?

Wiit

---
theshiznite.com - the site for free thinkers