Log Diary

Work logs, diary, and notes

Investigating ata errors

The machine occasionally freezes with the following errors.

Jun 29 18:42:18 node2 kernel: [156554.787521] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jun 29 18:42:18 node2 kernel: [156554.787555] ata1.00: failed command: WRITE DMA
Jun 29 18:42:18 node2 kernel: [156554.787585] ata1.00: cmd ca/00:08:21:d1:f9/00:00:00:00:00/e3 tag 0 dma 4096 out
Jun 29 18:42:18 node2 kernel: [156554.787586]          res 40/00:03:00:fe:00/00:00:00:00:00/b0 Emask 0x4 (timeout)
Jun 29 18:42:18 node2 kernel: [156554.787676] ata1.00: status: { DRDY }
Jun 29 18:42:18 node2 kernel: [156554.787710] ata1.00: hard resetting link
Jun 29 18:42:18 node2 kernel: [156555.107483] ata1.01: hard resetting link
Jun 29 18:42:23 node2 kernel: [156560.623472] ata1.00: link is slow to respond, please be patient (ready=0)
Jun 29 18:42:28 node2 kernel: [156564.823474] ata1.00: SRST failed (errno=-16)
Jun 29 18:42:28 node2 kernel: [156564.823512] ata1.00: hard resetting link
Jun 29 18:42:28 node2 kernel: [156565.143483] ata1.01: hard resetting link
Jun 29 18:42:33 node2 kernel: [156570.659473] ata1.00: link is slow to respond, please be patient (ready=0)
Jun 29 18:42:38 node2 kernel: [156574.859473] ata1.00: SRST failed (errno=-16)
Jun 29 18:42:38 node2 kernel: [156574.859511] ata1.00: hard resetting link
Jun 29 18:42:38 node2 kernel: [156575.179483] ata1.01: hard resetting link
Jun 29 18:42:44 node2 kernel: [156580.695472] ata1.00: link is slow to respond, please be patient (ready=0)
Jun 29 18:42:49 node2 kernel: [156585.679562] ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun 29 18:42:49 node2 kernel: [156585.679611] ata1.01: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jun 29 18:42:49 node2 kernel: [156585.711980] ata1.00: configured for UDMA/133
Jun 29 18:42:49 node2 kernel: [156585.727732] ata1.01: configured for UDMA/100
Jun 29 18:42:49 node2 kernel: [156585.730438] ata1.00: device reported invalid CHS sector 0
Jun 29 18:42:49 node2 kernel: [156585.730474] ata1: EH complete
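
Side note: the two devices on ata1 come back at different link speeds (SStatus 123 vs. 113). A minimal sketch for decoding those values, assuming the usual SATA SStatus layout (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11) and that the kernel prints the register in hex. The function name `decode_sstatus` is just made up for this note.

```python
# Speed codes in the SPD field of the SATA SStatus register.
SPEEDS = {1: "1.5 Gbps", 2: "3.0 Gbps", 3: "6.0 Gbps"}


def decode_sstatus(hex_text):
    """Split an SStatus value (hex string as printed in the log) into its fields."""
    value = int(hex_text, 16)
    det = value & 0xF          # device detection state
    spd = (value >> 4) & 0xF   # negotiated link speed
    ipm = (value >> 8) & 0xF   # interface power management state
    return {"det": det, "speed": SPEEDS.get(spd, "unknown"), "ipm": ipm}


# The two values seen in the log above.
for raw in ("123", "113"):
    print(raw, decode_sstatus(raw))
```

This only confirms what the "SATA link up" lines already say about the speed. It does not explain the timeouts.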

There is some discussion of this around here.

Gaetan Cambier 2010-05-14 16:08:26 EDT
i have found an solution :
add the option line to grub to disable ncq : libata.force=noncq

for me, with this, i have no froze

https://bugzilla.redhat.com/show_bug.cgi?id=549981

https://forums.ubuntulinux.jp/viewtopic.php?id=9813


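For now, in /etc/default/grub, I will change

GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=1024M"

this line to

GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=1024M libata.force=noncq"

this, and see how it goes. I am not sure whether the option can actually go here. Looking at the boot log, there are no libata entries when the Xen guest boots, so this may be the wrong place.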

GRUB_CMDLINE_LINUX_DEFAULT="maxcpus=1"

Or should it be appended here instead? I will put it in both and see what happens.
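
To see which variable actually reached the dom0 kernel after a reboot, /proc/cmdline can be checked. As far as I understand (not verified on this box), GRUB_CMDLINE_XEN_DEFAULT is passed to the Xen hypervisor and GRUB_CMDLINE_LINUX_DEFAULT to the dom0 kernel, so only the latter should show up there. A rough sketch; the helper names are made up for this note.

```python
def has_kernel_option(cmdline, option):
    """Return True if `option` appears as a whole word on the kernel command line."""
    return option in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read().strip()


if __name__ == "__main__":
    try:
        found = has_kernel_option(read_cmdline(), "libata.force=noncq")
        print("libata.force=noncq present:", found)
    except OSError:
        # Not on Linux, or /proc is not mounted.
        print("/proc/cmdline is not readable on this system")
```

If the option is missing, update-grub probably was not run after editing /etc/default/grub, or the option landed on the wrong command line.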


Update, 7/1
The errors still occur. Since other servers with the same hardware show no errors, maybe the cable or the controller is bad...


Update, 7/5
While reseating the cable I also moved it to a different port, but...

Jul  3 20:45:57 node2 kernel: [174098.017175] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul  3 20:45:57 node2 kernel: [174098.024583] ata3.00: failed command: WRITE DMA
Jul  3 20:45:57 node2 kernel: [174098.031975] ata3.00: cmd ca/00:08:38:b0:61/00:00:00:00:00/e1 tag 0 dma 4096 out
Jul  3 20:45:57 node2 kernel: [174098.031976]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul  3 20:45:57 node2 kernel: [174098.063013] ata3.00: status: { DRDY }
Jul  3 20:45:57 node2 kernel: [174098.071262] ata3: hard resetting link
Jul  3 20:46:02 node2 kernel: [174103.585138] ata3: link is slow to respond, please be patient (ready=0)
Jul  3 20:46:07 node2 kernel: [174108.065138] ata3: SRST failed (errno=-16)
Jul  3 20:46:07 node2 kernel: [174108.073841] ata3: hard resetting link
Jul  3 20:46:12 node2 kernel: [174113.589137] ata3: link is slow to respond, please be patient (ready=0)
Jul  3 20:46:17 node2 kernel: [174118.069138] ata3: SRST failed (errno=-16)
Jul  3 20:46:17 node2 kernel: [174118.078491] ata3: hard resetting link
Jul  3 20:46:22 node2 kernel: [174123.593138] ata3: link is slow to respond, please be patient (ready=0)
Jul  3 20:46:25 node2 kernel: [174126.113200] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jul  3 20:46:25 node2 kernel: [174126.145649] ata3.00: configured for UDMA/133
Jul  3 20:46:25 node2 kernel: [174126.155127] ata3.00: device reported invalid CHS sector 0
Jul  3 20:46:25 node2 kernel: [174126.164583] ata3: EH complete
Jul  3 20:50:57 node2 kernel: [174398.017175] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul  3 20:50:57 node2 kernel: [174398.026546] ata3.00: failed command: WRITE DMA
Jul  3 20:50:57 node2 kernel: [174398.035913] ata3.00: cmd ca/00:10:10:30:1c/00:00:00:00:00/e0 tag 0 dma 8192 out
Jul  3 20:50:57 node2 kernel: [174398.035914]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul  3 20:50:57 node2 kernel: [174398.073717] ata3.00: status: { DRDY }
Jul  3 20:50:57 node2 kernel: [174398.083271] ata3: hard resetting link
Jul  3 20:51:02 node2 kernel: [174403.597137] ata3: link is slow to respond, please be patient (ready=0)
Jul  3 20:51:07 node2 kernel: [174408.021138] ata3: SRST failed (errno=-16)
Jul  3 20:51:07 node2 kernel: [174408.030249] ata3: hard resetting link
Jul  3 20:51:07 node2 kernel: [174408.505205] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jul  3 20:51:07 node2 kernel: [174408.537648] ata3.00: configured for UDMA/133
Jul  3 20:51:07 node2 kernel: [174408.546387] ata3.00: device reported invalid CHS sector 0
Jul  3 20:51:07 node2 kernel: [174408.555024] ata3: EH complete

Is the cable itself bad, or the connector on the HDD side?
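
To keep track of how long each freeze lasts and how far apart they are, here is a small sketch that pairs each "exception" line with the following "EH complete" on the same port, using the kernel timestamps in brackets. The sample lines are copied from the 7/3 log above; the regex and the `summarize` helper are my own and only tested against this log format.

```python
import re

# Matches e.g. "kernel: [174098.017175] ata3.00: exception ..." and "ata3: EH complete".
LINE = re.compile(r"kernel: \[\s*(\d+\.\d+)\] (ata\d+)(?:\.\d+)?: (.*)")


def summarize(lines):
    """Return (port, exception_time, seconds_until_EH_complete) per exception."""
    events = []
    open_start = {}
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue  # continuation lines such as "res 40/00:..." carry no port
        stamp, port, msg = float(m.group(1)), m.group(2), m.group(3)
        if msg.startswith("exception"):
            open_start[port] = stamp
        elif msg.startswith("EH complete") and port in open_start:
            start = open_start.pop(port)
            events.append((port, start, stamp - start))
    return events


SAMPLE = [
    "Jul  3 20:45:57 node2 kernel: [174098.017175] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen",
    "Jul  3 20:45:57 node2 kernel: [174098.071262] ata3: hard resetting link",
    "Jul  3 20:46:25 node2 kernel: [174126.164583] ata3: EH complete",
    "Jul  3 20:50:57 node2 kernel: [174398.017175] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen",
    "Jul  3 20:51:07 node2 kernel: [174408.555024] ata3: EH complete",
]

events = summarize(SAMPLE)
for port, start, took in events:
    print(f"{port}: exception at {start:.6f}, recovered after {took:.1f} s")

# Time between consecutive exceptions, in seconds.
gaps = [later[1] - earlier[1] for earlier, later in zip(events, events[1:])]
print("gaps between exceptions (s):", [round(g, 3) for g in gaps])
```

In the 7/3 log the two exceptions are exactly 300 seconds apart by the kernel timestamps (20:45:57 and 20:50:57). Two events are too few to call it a pattern, but it is worth watching whether later errors keep the same interval.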