
While experimenting with PCI driver development, I crashed my kernel. Now the OS boots and then crashes again, because it tries to load my faulty driver. How can I fix this? Ideally I would prevent my driver from loading, so that I can log in to the system in a 'safe' mode and then fix the driver, or at least uninstall it from the system.

UPDATE After reading http://docs.oracle.com/cd/E36784_01/pdf/E36801.pdf and other docs, it appears that the steps are as follows:

1) boot from the Solaris CD

2) select Shell

3) zpool import -R /a rpool

4) zfs mount rpool/ROOT/zfsBE

5) cd /a and remove the faulty driver from usr/kernel/drv/ (i.e. /a/usr/kernel/drv/)
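
Written out as a script, the steps above would look roughly like this. It is only a dry-run sketch under the assumptions in the list: the pool is named rpool, the boot environment dataset is rpool/ROOT/zfsBE, and mydrv is a placeholder for the driver's file name. By default it only prints the commands; setting DRY_RUN=0 in the rescue shell would actually run them.

```shell
#!/bin/sh
# Dry-run sketch of the rescue steps. Prints commands unless DRY_RUN=0.
# Assumed names: pool "rpool", BE dataset "rpool/ROOT/zfsBE",
# driver file "mydrv" (placeholder, not a real driver name).
DRY_RUN="${DRY_RUN:-1}"
ALTROOT="/a"
POOL="rpool"
BE_DATASET="rpool/ROOT/zfsBE"
DRIVER="mydrv"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rescue_steps() {
    # Import the root pool with /a as the alternate root.
    run zpool import -R "$ALTROOT" "$POOL"
    # Mount the boot environment's root dataset.
    run zfs mount "$BE_DATASET"
    # Remove the faulty driver from the mounted root. On 64-bit x86 the
    # binary may sit in usr/kernel/drv/amd64/ instead.
    run rm "$ALTROOT/usr/kernel/drv/$DRIVER"
}

rescue_steps
```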

I'm not entirely sure, and I don't want to screw up the system again, so I would like to get confirmation from Solaris gurus.

UPDATE2 The above fix scenario almost worked for me. I was able to import rpool, which automatically mounted /export/home under /a. That allowed me to delete my faulty driver, since earlier I had made a soft link to it from /usr/kernel/drv. After that I was able to reboot and start Solaris with no issues and no error messages, so I didn't even run fsck. What didn't work was zfs mount rpool/ROOT/solaris, which is the root FS; I wanted it mounted so that I could delete the link from /usr/kernel/drv. The error message said I should be doing this with mount.
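
If that error was the usual "legacy mountpoint" one, the root dataset would have to be mounted with mount -F zfs rather than zfs mount. A dry-run sketch of what that might look like, assuming the dataset is rpool/ROOT/solaris; the mount point /tmp/rootbe and the file name mydrv are made up for illustration. beadm mount would be another route, but I have not tried it here.

```shell
#!/bin/sh
# Dry-run sketch: mount a root dataset that "zfs mount" refuses to mount.
# Prints commands unless DRY_RUN=0.
# Assumptions: dataset "rpool/ROOT/solaris"; /tmp/rootbe and "mydrv"
# are hypothetical names chosen for this example.
DRY_RUN="${DRY_RUN:-1}"
ROOT_DS="rpool/ROOT/solaris"
MNT="/tmp/rootbe"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

mount_root_be() {
    # Inspect the dataset; mountpoint=legacy would explain the error.
    run zfs get mountpoint,canmount "$ROOT_DS"
    # Use a separate, empty mount point so nothing under /a is covered.
    run mkdir -p "$MNT"
    # Mount the dataset explicitly with mount(1M).
    run mount -F zfs "$ROOT_DS" "$MNT"
    # The stale link can then be removed from the mounted root.
    run rm "$MNT/usr/kernel/drv/mydrv"
}

mount_root_be
```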

BTW, for now I copy my driver to /tmp and create a link /usr/kernel/drv/amd64/mydrv, so in case of a crash the system reboots and cleans the /tmp partition. I will need to invest some time in beadm later on.
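
Roughly, the staging step looks like this. It is a sketch only: stage_driver is a name made up for this example, and the paths in the usage comment are illustrative.

```shell
#!/bin/sh
# stage_driver SRC STAGE_DIR LINK_DIR
# Copy the driver binary into STAGE_DIR (e.g. /tmp) and point a symlink
# in LINK_DIR (e.g. /usr/kernel/drv/amd64) at the copy. If the copy
# disappears on reboot, the link is left dangling.
stage_driver() {
    src=$1
    stage_dir=$2
    link_dir=$3
    name=$(basename "$src")
    # Stop if the copy fails, so no link is created to a missing file.
    cp "$src" "$stage_dir/$name" || return 1
    # -f replaces a link left over from an earlier run.
    ln -sf "$stage_dir/$name" "$link_dir/$name"
}

# Example (hypothetical build path):
#   stage_driver ./mydrv /tmp /usr/kernel/drv/amd64
```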

Thanks.

Mark
  • This is why you should use beadm to make an additional boot environment, so you just need to choose a different grub menu entry on boot to get back to a working kernel. That only helps if you do it before you install a broken driver though. – alanc Jul 15 '15 at 21:09
  • @alanc. Indeed. The first thing I learned early on in the development of my first kernel module was "Have a plan to reboot without that module." – Andrew Henle Jul 15 '15 at 22:07
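
To make the beadm suggestion concrete, here is a minimal dry-run sketch, assuming a hypothetical boot environment name pre-driver-test. It prints the commands unless DRY_RUN=0.

```shell
#!/bin/sh
# Dry-run sketch: create a fallback boot environment before testing a driver.
# Prints commands unless DRY_RUN=0.
# Assumption: "pre-driver-test" is a made-up BE name.
DRY_RUN="${DRY_RUN:-1}"
BE_NAME="pre-driver-test"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

backup_be() {
    # Clone the currently active boot environment before touching drivers.
    run beadm create "$BE_NAME"
    # Confirm the new BE exists; it should also show up as a GRUB entry.
    run beadm list
}

backup_be

# If the driver later panics the kernel at boot, pick the backup entry in
# GRUB. Once booted into it, "beadm activate pre-driver-test" makes it the
# default for subsequent boots.
```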

1 Answer

Boot from a CD/DVD, or read around page 81 of this document, assuming you're running on x86 hardware.

Basically, you need to get around the loading of your device. If you don't have a safe-mode GRUB option (again assuming x86), or if the safe mode still loads your driver, it's probably a lot easier to just boot a Solaris CD/DVD, mount/import your root pool, and remove your driver from the file system.

Andrew Henle
  • Thanks for the response. This is an x86 machine. What is a safe-mode GRUB option, and how could it help? Could you also show how to mount/import my root pool once I've booted from the Solaris CD? – Mark Jul 15 '15 at 20:47
  • Please see my updated original message, specifically the commands that I found to import and mount the root FS. Thanks. – Mark Jul 15 '15 at 21:43
  • Yes, that's pretty much the easiest way if you don't get a failsafe or "safe boot" option. Solaris 10 would provide such an option on x86, though I haven't found any documentation for it on Solaris 11 - see http://docs.oracle.com/cd/E26505_01/html/E29492/ggqdn.html. And as @alanc mentioned in his comment, creating a backup boot environment is another method. See http://docs.oracle.com/cd/E36784_01/html/E36803/index.html for how to do that. Be aware that one side effect of multiple boot environments on a single ZFS root pool is a nasty mess of file system clones and snapshots. – Andrew Henle Jul 15 '15 at 22:04
  • I updated my original post. Basically it worked for me. The crash dump was created in /var/crash; can I safely delete it with rm, or should some special command be used? – Mark Jul 16 '15 at 14:41
  • @Mark - Just `rm` the file(s) there. You know why your machine crashed. See `man dumpadm` to see how to control the generation of kernel crash dump files, including disabling their creation. – Andrew Henle Jul 17 '15 at 00:14