ATA error investigation
The machine occasionally freezes with the following errors.
Jun 29 18:42:18 node2 kernel: [156554.787521] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jun 29 18:42:18 node2 kernel: [156554.787555] ata1.00: failed command: WRITE DMA
Jun 29 18:42:18 node2 kernel: [156554.787585] ata1.00: cmd ca/00:08:21:d1:f9/00:00:00:00:00/e3 tag 0 dma 4096 out
Jun 29 18:42:18 node2 kernel: [156554.787586] res 40/00:03:00:fe:00/00:00:00:00:00/b0 Emask 0x4 (timeout)
Jun 29 18:42:18 node2 kernel: [156554.787676] ata1.00: status: { DRDY }
Jun 29 18:42:18 node2 kernel: [156554.787710] ata1.00: hard resetting link
Jun 29 18:42:18 node2 kernel: [156555.107483] ata1.01: hard resetting link
Jun 29 18:42:23 node2 kernel: [156560.623472] ata1.00: link is slow to respond, please be patient (ready=0)
Jun 29 18:42:28 node2 kernel: [156564.823474] ata1.00: SRST failed (errno=-16)
Jun 29 18:42:28 node2 kernel: [156564.823512] ata1.00: hard resetting link
Jun 29 18:42:28 node2 kernel: [156565.143483] ata1.01: hard resetting link
Jun 29 18:42:33 node2 kernel: [156570.659473] ata1.00: link is slow to respond, please be patient (ready=0)
Jun 29 18:42:38 node2 kernel: [156574.859473] ata1.00: SRST failed (errno=-16)
Jun 29 18:42:38 node2 kernel: [156574.859511] ata1.00: hard resetting link
Jun 29 18:42:38 node2 kernel: [156575.179483] ata1.01: hard resetting link
Jun 29 18:42:44 node2 kernel: [156580.695472] ata1.00: link is slow to respond, please be patient (ready=0)
Jun 29 18:42:49 node2 kernel: [156585.679562] ata1.00: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun 29 18:42:49 node2 kernel: [156585.679611] ata1.01: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jun 29 18:42:49 node2 kernel: [156585.711980] ata1.00: configured for UDMA/133
Jun 29 18:42:49 node2 kernel: [156585.727732] ata1.01: configured for UDMA/100
Jun 29 18:42:49 node2 kernel: [156585.730438] ata1.00: device reported invalid CHS sector 0
Jun 29 18:42:49 node2 kernel: [156585.730474] ata1: EH complete
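To compare incidents across ports, the relevant fields can be pulled out of the kernel log mechanically. The following is a minimal Python sketch, not a definitive tool: the function name summarize_ata_errors and the regular expressions are assumptions, written only against the message formats that appear in the log above. For each exception it reports the device, the failed command, and whether any NCQ tags were active (SAct nonzero).

```python
import re

# Patterns written against the message formats seen in the log above.
# Assumption: other kernel versions may word these messages differently.
EXC_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] (ata\d+(?:\.\d+)?): exception "
    r"Emask (0x[0-9a-f]+) SAct (0x[0-9a-f]+)"
)
CMD_RE = re.compile(r"(ata\d+(?:\.\d+)?): failed command: (.+?)\s*$")


def summarize_ata_errors(lines):
    """Return one dict per 'exception' line, with the failed command attached."""
    events = []
    for line in lines:
        m = EXC_RE.search(line)
        if m:
            events.append({
                "time": float(m.group(1)),      # kernel timestamp in seconds
                "device": m.group(2),           # e.g. "ata1.00"
                "sact": int(m.group(4), 16),    # bitmap of active NCQ tags
                "command": None,
            })
            continue
        m = CMD_RE.search(line)
        # Attach the command to the most recent exception on the same device.
        if m and events and events[-1]["device"] == m.group(1):
            events[-1]["command"] = m.group(2)
    return events


sample = [
    "Jun 29 18:42:18 node2 kernel: [156554.787521] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen",
    "Jun 29 18:42:18 node2 kernel: [156554.787555] ata1.00: failed command: WRITE DMA",
]
for ev in summarize_ata_errors(sample):
    ncq = "NCQ tags active" if ev["sact"] else "no NCQ tags active"
    print(ev["device"], ev["command"], ncq)
```

In the log above, SAct is 0x0 and the failed command is WRITE DMA rather than WRITE FPDMA QUEUED, which suggests NCQ was not in use when the timeout occurred.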
There is some discussion of this problem here:
Gaetan Cambier 2010-05-14 16:08:26 EDT
i have found an solution :
add the option line to grub to disable ncq : libata.force=noncq
for me, with this, i have no froze
https://bugzilla.redhat.com/show_bug.cgi?id=549981
https://forums.ubuntulinux.jp/viewtopic.php?id=9813
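For now, in /etc/default/grub, I will change this line:
GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=1024M"
to this:
GRUB_CMDLINE_XEN_DEFAULT="dom0_mem=1024M libata.force=noncq"
and see how it goes. I am not sure whether the option can go here. Looking at the boot log, there are no libata entries when the Xen guest boots, so this may be the wrong place.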
GRUB_CMDLINE_LINUX_DEFAULT="maxcpus=1"
Should I add it here instead? I will put it in both and see what happens.
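Whichever variable is used, it is possible to check after a reboot whether the option actually reached the running dom0 kernel. As far as I know, GRUB passes GRUB_CMDLINE_XEN_DEFAULT to the Xen hypervisor and GRUB_CMDLINE_LINUX_DEFAULT to the Linux kernel, so the latter is the likelier home for a libata option, but that should be confirmed on the machine itself. The following is a minimal Python sketch; has_kernel_param is a hypothetical helper name, and it assumes a Linux system where /proc/cmdline is readable.

```python
from pathlib import Path


def has_kernel_param(cmdline: str, param: str) -> bool:
    """True if `param` appears as a whole whitespace-separated token.

    Exact match only: a combined value such as
    libata.force=noncq,1.5Gbps would not be detected by this sketch.
    """
    return param in cmdline.split()


if __name__ == "__main__":
    proc = Path("/proc/cmdline")
    if proc.exists():
        cmdline = proc.read_text()
        found = has_kernel_param(cmdline, "libata.force=noncq")
        print("libata.force=noncq present:", found)
```

If the option is missing from /proc/cmdline, the kernel never saw it, whatever /etc/default/grub says; the GRUB configuration also has to be regenerated before an edit takes effect.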
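Update 7/1
The errors are still occurring. Other servers with the same hardware do not show them, so maybe the cable or the controller is bad...
Update 7/5
While reseating the cable I also moved it to a different port, but...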
Jul 3 20:45:57 node2 kernel: [174098.017175] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 3 20:45:57 node2 kernel: [174098.024583] ata3.00: failed command: WRITE DMA
Jul 3 20:45:57 node2 kernel: [174098.031975] ata3.00: cmd ca/00:08:38:b0:61/00:00:00:00:00/e1 tag 0 dma 4096 out
Jul 3 20:45:57 node2 kernel: [174098.031976] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 3 20:45:57 node2 kernel: [174098.063013] ata3.00: status: { DRDY }
Jul 3 20:45:57 node2 kernel: [174098.071262] ata3: hard resetting link
Jul 3 20:46:02 node2 kernel: [174103.585138] ata3: link is slow to respond, please be patient (ready=0)
Jul 3 20:46:07 node2 kernel: [174108.065138] ata3: SRST failed (errno=-16)
Jul 3 20:46:07 node2 kernel: [174108.073841] ata3: hard resetting link
Jul 3 20:46:12 node2 kernel: [174113.589137] ata3: link is slow to respond, please be patient (ready=0)
Jul 3 20:46:17 node2 kernel: [174118.069138] ata3: SRST failed (errno=-16)
Jul 3 20:46:17 node2 kernel: [174118.078491] ata3: hard resetting link
Jul 3 20:46:22 node2 kernel: [174123.593138] ata3: link is slow to respond, please be patient (ready=0)
Jul 3 20:46:25 node2 kernel: [174126.113200] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jul 3 20:46:25 node2 kernel: [174126.145649] ata3.00: configured for UDMA/133
Jul 3 20:46:25 node2 kernel: [174126.155127] ata3.00: device reported invalid CHS sector 0
Jul 3 20:46:25 node2 kernel: [174126.164583] ata3: EH complete
Jul 3 20:50:57 node2 kernel: [174398.017175] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 3 20:50:57 node2 kernel: [174398.026546] ata3.00: failed command: WRITE DMA
Jul 3 20:50:57 node2 kernel: [174398.035913] ata3.00: cmd ca/00:10:10:30:1c/00:00:00:00:00/e0 tag 0 dma 8192 out
Jul 3 20:50:57 node2 kernel: [174398.035914] res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 3 20:50:57 node2 kernel: [174398.073717] ata3.00: status: { DRDY }
Jul 3 20:50:57 node2 kernel: [174398.083271] ata3: hard resetting link
Jul 3 20:51:02 node2 kernel: [174403.597137] ata3: link is slow to respond, please be patient (ready=0)
Jul 3 20:51:07 node2 kernel: [174408.021138] ata3: SRST failed (errno=-16)
Jul 3 20:51:07 node2 kernel: [174408.030249] ata3: hard resetting link
Jul 3 20:51:07 node2 kernel: [174408.505205] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jul 3 20:51:07 node2 kernel: [174408.537648] ata3.00: configured for UDMA/133
Jul 3 20:51:07 node2 kernel: [174408.546387] ata3.00: device reported invalid CHS sector 0
Jul 3 20:51:07 node2 kernel: [174408.555024] ata3: EH complete
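It may be worth checking whether the timeouts recur at a regular interval, since that would point toward something periodic rather than a random fault. The following is a small Python sketch that uses only the kernel timestamps copied from the two "exception ... frozen" lines in the 7/3 log; gaps is a hypothetical helper name.

```python
def gaps(timestamps):
    """Return the differences between consecutive timestamps, in seconds."""
    # Rounded to microseconds, the resolution of the kernel timestamps.
    return [round(b - a, 6) for a, b in zip(timestamps, timestamps[1:])]


# Kernel timestamps of the two "exception ... frozen" lines in the 7/3 log.
exceptions = [174098.017175, 174398.017175]
print(gaps(exceptions))
```

By these timestamps the two exceptions are exactly 300 seconds apart. Two samples are far too few to conclude anything, and reading a 5-minute periodic task into this is speculation, but it is something to watch for in later logs.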
Is the cable itself, or the connector on the HDD side, bad?