
I have a server that kernel panics every once in a while, and I'm trying to capture all of its output so I can debug it further.

It is running Red Hat 4, so the kernel is only 2.6.9, which is too old for kexec or kdump. Instead, I have a serial cable between two servers, with screen running on each one attached to /dev/ttyS0.

Whenever the problematic server kernel panics, the output is completely garbled. It looks as if something else is being written out at the same time, and the ASCII values are being added to (or subtracted from) each other to produce a different value. For example:

invAlidoperanD: 0000 [1]SMP
odqles liNked in8 nfs nfsd expoppfshocKd nFs_acl Arport_pc lp parpopp aupofs0 i0c_dEv i0c_cOre sunppc ds yEnta_soCketpcmCia_corE dm_mipror Dm_modqttOn bAptery aCmd5 ipv4 joydetehcI_hC` uhci_Hcd hw_rAndoi bLx2 ext3jbd Ata_piix liBatacciss mptsCsihiptsas Mptspi mptsCsi MptbAse sd_Mod scsi_moD
PId: 6839 cottaInteD 2.6.9-15.smp
RIP: 0010:[]{unmap_hugepAge_ranGe+32}
RSP:0018:00000102067d9c38 EFLAGS: 00210006
AX: 086d344780c22Ef1 RX:0000010001079360 RCX: 086d304780a22ecf
RDX: 086d304780c22ef1 RSI: 0000000000200022DI8 000001018d919068
RBP8 086d344780c2"Ef1 R08:0000000000000000 B09: 00000000fffDfffA

As a result, this information is nearly worthless to me, especially since the addresses and register values themselves may be corrupted.
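Before writing the output off entirely, it may be worth checking whether the corruption is systematic. Below is a minimal Python 3 sketch that lines up a garbled token against what it most likely was and reports, for each differing character, which bits were cleared and which were set. The expected strings are an assumption: standard Linux module names (hw_random, bnx2, autofs4, and so on) read off the mangled "Modules linked in" list and aligned character for character. `diff_bits` and `PAIRS` are hypothetical names used only for this illustration.

```python
#!/usr/bin/env python3
"""Report which bits differ between expected text and serial-console output."""


def diff_bits(expected, received):
    """Return (index, expected_char, received_char, cleared, set) for each mismatch.

    `cleared` holds bits that were 1 in the expected byte but 0 on the wire;
    `set` holds bits that were 0 but arrived as 1. Inputs must already be
    aligned character for character (same length).
    """
    if len(expected) != len(received):
        raise ValueError("expected and received must be the same length")
    diffs = []
    for i, (e, r) in enumerate(zip(expected, received)):
        if e != r:
            eb, rb = ord(e), ord(r)
            diffs.append((i, e, r, eb & ~rb, rb & ~eb))
    return diffs


# Assumed originals (standard module names) paired with the garbled tokens
# from the oops above; the pairing is a guess, not a certainty.
PAIRS = [
    ("hw_random", "hw_rAndoi"),
    ("bnx2", "bLx2"),
    ("autofs4", "aupofs0"),
    ("battery", "bAptery"),
    ("sunrpc", "sunppc"),
    ("dm_mirror", "dm_mipror"),
    ("i2c_dev", "i0c_dEv"),
]

if __name__ == "__main__":
    for expected, received in PAIRS:
        for i, e, r, cleared, set_bits in diff_bits(expected, received):
            print("%-10s pos %d: %r (0x%02x) -> %r (0x%02x)  cleared=0x%02x set=0x%02x"
                  % (expected, i, e, ord(e), r, ord(r), cleared, set_bits))
```

For these seven pairs, every difference is one or more bits going from 1 to 0 (0x20, 0x04, 0x02, or 0x22 for bnx2), and no bit is ever set. That fits the "subtracted" half of the theory above rather than values being added, though seven samples are a hint, not proof.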

I sent 10 paragraphs of Lorem Ipsum text into /dev/ttyS0, and it came out exactly as it went in. Out of curiosity I also tried reversing the serial cable; nothing changed.
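One limitation of that test: it went through the ordinary tty layer using whatever line settings the port happened to have at that moment, so it may not reflect the settings the kernel console uses when it prints an oops. Lorem Ipsum is also mostly lowercase letters. A slightly tighter variant is to pin the line settings explicitly and send every printable ASCII character. The Python 3 sketch below assumes 115200 baud, 8 data bits, no parity, 1 stop bit, chosen to match the `screen /dev/ttyS0 115200` session on the receiving end. `make_8n1` and `send_pattern` are hypothetical helpers. The Python shipped with Red Hat 4 is much older than Python 3, so this may need porting before it runs on the sending server.

```python
#!/usr/bin/env python3
"""Send a printable-ASCII test pattern out a serial port at explicit 8N1 settings."""
import os
import termios


def make_8n1(attrs, speed=termios.B115200):
    """Return a copy of a termios.tcgetattr() list set to 8 data bits,
    no parity, 1 stop bit, at `speed` (input and output)."""
    iflag, oflag, cflag, lflag, _ispeed, _ospeed, cc = attrs
    cflag &= ~(termios.CSIZE | termios.PARENB | termios.CSTOPB)  # drop size, parity, 2 stop bits
    cflag |= termios.CS8 | termios.CLOCAL | termios.CREAD         # 8 data bits, ignore modem lines
    iflag &= ~termios.ISTRIP                                       # don't strip the 8th bit
    return [iflag, oflag, cflag, lflag, speed, speed, list(cc)]


def send_pattern(device="/dev/ttyS0", repeats=10):
    """Write every printable ASCII character (0x20-0x7e), `repeats` times."""
    line = bytes(range(0x20, 0x7F)) + b"\r\n"
    fd = os.open(device, os.O_WRONLY | os.O_NOCTTY)
    try:
        termios.tcsetattr(fd, termios.TCSANOW, make_8n1(termios.tcgetattr(fd)))
        for _ in range(repeats):
            os.write(fd, line)
        termios.tcdrain(fd)  # wait until the bytes have actually left the UART
    finally:
        os.close(fd)
```

Calling `send_pattern()` on the sending server while screen listens on the other end gives a known reference string; the bit-diff approach above can then be applied to anything that arrives changed.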

I'm at a loss for what else to try, both to diagnose the kernel panics and to clean up the output so that we can be sure we're looking at correct information.

VxJasonxV
  • The corruption is remarkably regular. I'm inclined to believe that the kernel's internal serial port setting is different from whatever you're using when you open ttyS0 yourself. Might try installing minicom and using it to open ttyS0 with various data/stop bits and parity settings and send text to the other computer while you keep the same settings on the receiving end. If you get the data to garble the same way, then that tells you what settings the other side should use to read the line. – DerfK May 05 '11 at 19:20
  • I thought that too, because many letters are just being capitalized (-32 values), which could mean spaces being injected, but that's not the whole story. hw_random -> hw_rAndoi. bnx2 -> bLx2. not tainted -> cottaInteD. You mention "the kernel's internal serial port setting". I don't suppose there's a place to read that setting from? Perhaps in `/proc/sys` or elsewhere? – VxJasonxV May 05 '11 at 20:03
  • The kernel console setting is a boot parameter http://www.mjmwired.net/kernel/Documentation/serial-console.txt Come to think of it, the missing characters are mostly the ones immediately after the corrupt characters. Could be something like sending at 10800 baud while reading at 9600 (at 1/8th too fast, 7 of the characters could be "close enough" to the clock, then two get read at once and it drifts close enough again) but that's a pretty weird setting. – DerfK May 05 '11 at 20:29
  • Oh yeah. I completely forgot that I added that to `grub.conf`. Now that you've reminded me of that, I remember all the concerns I had when I set it up. I added `console=ttyS0,115200` before `console=tty0`. And when I run screen, I run `screen /dev/ttyS0 115200`. My concern is: I added the /dev/ttyS0 line to grub.conf in BOTH servers, the receiving (functioning) server, and the sending (kernel-panicking) server. Should I undo it on the functioning server? – VxJasonxV May 05 '11 at 21:03
  • That's a good question, I've never plugged two active consoles into each other. – DerfK May 06 '11 at 01:43
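On where to read the setting: the running kernel's full command line, including any `console=` options, is exposed in /proc/cmdline rather than under /proc/sys. The Python 3 sketch below pulls the `console=` entries out of a command line and fills in defaults following the `BBBBPNF` option format described in the serial-console.txt document linked above (speed, parity n/o/e, data bits, optional `r` for RTS flow control, default 9600n8). `parse_consoles` is a hypothetical helper, and the parsing is deliberately simplified; it does not try to cover every console driver's option syntax.

```python
#!/usr/bin/env python3
"""Extract serial console settings from a kernel command line."""
import re

# ttyS<n>[,<baud>[<parity>[<bits>[r]]]] as described in serial-console.txt.
_CONSOLE_RE = re.compile(
    r"^(?P<dev>[^,]+)(?:,(?P<baud>\d+)(?P<parity>[noe])?(?P<bits>\d)?(?P<flow>r)?)?$"
)


def parse_consoles(cmdline):
    """Return (device, baud, parity, bits, rts_flow) for each console= option.

    Serial (ttyS*) entries get the documented 9600n8 defaults filled in;
    other devices such as tty0 get None for the serial-only fields.
    """
    consoles = []
    for token in cmdline.split():
        if not token.startswith("console="):
            continue
        m = _CONSOLE_RE.match(token[len("console="):])
        if m is None:
            continue  # unrecognised syntax; skip rather than guess
        dev = m.group("dev")
        if dev.startswith("ttyS"):
            consoles.append((
                dev,
                int(m.group("baud") or 9600),
                m.group("parity") or "n",
                int(m.group("bits") or 8),
                m.group("flow") == "r",
            ))
        else:
            consoles.append((dev, None, None, None, None))
    return consoles
```

Running `parse_consoles(open("/proc/cmdline").read())` on each server should show whether that kernel has claimed ttyS0 as a console and at what speed, which is useful context for deciding whether the receiving server should keep its own `console=ttyS0,115200` entry.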
