Recovering a software RAID after a SMART error


2010-3-19 01:40

A server running CentOS 4.5, with two HDDs in a software RAID 1, apparently developed a bad sector on one of its drives. When the OS was rebooted, the S.M.A.R.T. daemon (smartd) sent the following error mail:

Subject: SMART error (CurrentPendingSector) detected on host: server1

This email was generated by the smartd daemon running on:

   host name: server1
  DNS domain: [Unknown]
  NIS domain: (none)

The following warning/error was logged by the smartd daemon:

Device: /dev/hdc, 1 Currently unreadable (pending) sectors

For details see host's SYSLOG (default: /var/log/messages).

You can also use the smartctl utility for further investigation.
No additional email messages about this problem will be sent.
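The mail suggests smartctl for further investigation. As a minimal sketch, the pending-sector count can be pulled out of the `smartctl -A` attribute table. The helper name `pending_sectors` is hypothetical, and it assumes the usual table layout where RAW_VALUE is the 10th column.

```sh
#!/bin/sh
# Hypothetical helper: print the raw value of SMART attribute 197
# (Current_Pending_Sector) from `smartctl -A` output read on stdin.
# Assumes the usual smartctl table layout, where RAW_VALUE is column 10.
# Exits with status 1 if the attribute is not present.
pending_sectors() {
    awk '$2 == "Current_Pending_Sector" { print $10; found = 1 }
         END { if (!found) exit 1 }'
}

# Usage (as root, with smartmontools installed):
#   smartctl -A /dev/hdc | pending_sectors
```

A non-zero value means the drive still has sectors it could not read and has not yet remapped.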

Running dmesg shows errors like the following:

dmesg
hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=4326080, sector=4326076
ide: failed opcode was: unknown
end_request: I/O error, dev hdc, sector 4326076
raid1: Disk failure on hdc3, disabling device.
        Operation continuing on 1 devices
raid1: hdc3: rescheduling sector 20656
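To spot which partitions md has kicked out without reading the whole log, a small filter over the kernel messages can help. This is only a sketch: `failed_raid_members` is a hypothetical name, and it relies on the "Disk failure on X," wording seen in the log here, which may differ on other kernel versions.

```sh
#!/bin/sh
# Hypothetical helper: list partitions that md reported as failed, based on
# kernel messages like "raid1: Disk failure on hdc3, disabling device."
# Reads dmesg-style text on stdin; the message wording is an assumption
# taken from this log and can vary between kernel versions.
failed_raid_members() {
    sed -n 's/.*Disk failure on \([^,]*\),.*/\1/p' | sort -u
}

# Usage:
#   dmesg | failed_raid_members
```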

The kernel log shows an uncorrectable read error on /dev/hdc, after which md disabled the /dev/hdc3 partition. Checking /proc/mdstat:

# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 hdd2[1] hdc2[0]
      2048192 blocks [2/2] [UU]

md1 : active raid1 hdd3[1] hdc3[2](F)
      78260544 blocks [2/1] [_U]

md0 : active raid1 hdd1[1] hdc1[0]
      104320 blocks [2/2] [UU]

unused devices: <none>
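The degraded state can also be detected from /proc/mdstat by script, for example from a cron job. A minimal sketch, assuming the layout shown above: each "mdN : ..." line is followed by a status line ending in something like `[_U]`, where `_` marks a missing member. The function name `degraded_arrays` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: report md arrays running degraded.
# The status line after each "mdN : ..." line ends in e.g. "[_U]";
# an "_" there means a member is missing. Members tagged "(F)" on the
# array line are printed as failed. Reads /proc/mdstat by default,
# or the file given as $1 ("-" for stdin).
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ {
            array = $1; failed = ""
            for (i = 3; i <= NF; i++)
                if ($i ~ /\(F\)$/) {
                    dev = $i; sub(/\[.*/, "", dev)   # "hdc3[2](F)" -> "hdc3"
                    failed = failed " " dev
                }
            next
        }
        array != "" && /blocks/ {
            if ($NF ~ /_/) print array ":" failed   # "_" marks a missing member
            array = ""
        }
    ' "${1:-/proc/mdstat}"
}

# Usage:
#   degraded_arrays        # prints "md1: hdc3" for the state shown above
```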

md1 is now running degraded on a single disk ([2/1] [_U], with hdc3 marked (F)). Next, check the partition with e2fsck:

# e2fsck -p /dev/hdc3
/dev/hdc3: clean, 296347/9797632 files, 4469144/19565136 blocks
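e2fsck does not touch the bad sector itself. One way to see whether it is still unreadable is to read that single sector directly with dd. This is a sketch under assumptions: `read_sector` is a hypothetical name, sectors are assumed to be 512 bytes, and the LBA is the whole-disk sector number from the "end_request: I/O error, dev hdc, sector 4326076" line.

```sh
#!/bin/sh
# Hypothetical helper: try to read one 512-byte sector with dd.
# A non-zero exit status means dd failed (e.g. an I/O error on a
# pending bad sector, or the device could not be opened).
# Assumes 512-byte sectors; run as root against the whole disk.
read_sector() {
    dev=$1 lba=$2
    dd if="$dev" of=/dev/null bs=512 skip="$lba" count=1 2>/dev/null
}

# Usage:
#   read_sector /dev/hdc 4326076 && echo readable || echo "read error"
```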

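e2fsck reports the filesystem as "clean". With -p this only means the filesystem's clean flag is set, so no full check was run. Since nothing looks wrong, use mdadm to remove the partition from the array and add it back, which makes md rebuild the mirror: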

# mdadm /dev/md1 -r /dev/hdc3
mdadm: hot removed /dev/hdc3
# mdadm /dev/md1 -a /dev/hdc3
mdadm: hot added /dev/hdc3
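The same two steps can be wrapped in a small function, sketched below with a dry-run switch so the commands can be reviewed first. `readd_member` and `DRY_RUN` are hypothetical names. `-r` and `-a` are the short forms of mdadm's `--remove` and `--add`; note that mdadm only removes a member that is already marked faulty, as hdc3 is here.

```sh
#!/bin/sh
# Hypothetical helper: remove a faulty member from an md array and add it
# back so md resyncs it. mdadm refuses to remove an active (non-faulty)
# member. Set DRY_RUN=1 to print the commands instead of running them.
readd_member() {
    array=$1 member=$2
    run=""
    [ "${DRY_RUN:-0}" = 1 ] && run="echo"
    $run mdadm "$array" --remove "$member" || return 1   # same as -r
    $run mdadm "$array" --add "$member"                 # same as -a
}

# Usage (as root):
#   DRY_RUN=1 readd_member /dev/md1 /dev/hdc3   # show the commands
#   readd_member /dev/md1 /dev/hdc3             # run them
```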

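/proc/mdstat now shows the array rebuilding. Once the progress reaches 100%, recovery is complete. The rebuild rewrites every sector of hdc3, which gives the drive the chance to rewrite or reallocate the pending sector.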
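To follow the rebuild without reading /proc/mdstat by hand, the percentage can be extracted from the "recovery = 12.6% (...)" line under the array. A minimal sketch; `rebuild_progress` is a hypothetical name, and it assumes the progress line sits below its "mdN : ..." line as in current kernels.

```sh
#!/bin/sh
# Hypothetical helper: print the recovery/resync percentage for one array
# (e.g. "5.3") from /proc/mdstat, or nothing if no rebuild is running.
# $1 = array name such as md1; $2 = file to read ("-" for stdin).
rebuild_progress() {
    awk -v md="$1" '
        $1 ~ /^md[0-9]+$/ { cur = $1 }          # track which array we are in
        cur == md && /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) { sub(/%$/, "", $i); print $i; exit }
        }
    ' "${2:-/proc/mdstat}"
}

# Usage: print progress once a minute until the rebuild finishes
#   while p=$(rebuild_progress md1) && [ -n "$p" ]; do echo "md1: $p%"; sleep 60; done
```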

The procedure above applies to software RAID built on static partitions. If you use LVM, the recovery procedure is different.

Category: Linux
Keywords: HDD, RAID, SMART


futuremix