[Orca-users] SE Toolkit 3.4 SEGV on missing disks

David Michaels dragon at raytheon.com
Sat Aug 26 08:31:12 PDT 2006


Brian --

This issue has come up in the past, and there's a lot of info about it 
in the mail archives.  I found this message that indicates that 
recompiling SE for Solaris 10 will help:

http://www.orcaware.com/pipermail/orca-users/2006-March/004847.html

However, recompiling didn't work for someone else (the next message in 
that thread); they were running Sun Cluster 3.1, which may have 
contributed to their problem.

You can find the SE Toolkit source on sunfreeware.com at 
http://www.sunfreeware.com/setoolkit.html (the package includes the 
source code as well as a Solaris 10 binary, but you may still find that 
you need to recompile).

Also, even though you have the latest Orca snapshot, make sure that the 
orcallator.se you are running came from that snapshot, and is not the 
1.37 version linked in the download section of the orcaware.com/orca page.

As a last resort, you can turn off disk data collection in the 
orcallator.se file itself; that way you at least keep gathering the 
other data until the problem gets resolved.
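For what it's worth, the debug output below shows the crash happening right 
after a readdir() call, when the returned pointer is dereferenced and the 
entry count is compared against a fixed table size. I don't have the 
disks.se source in front of me, so this is only a minimal C sketch of that 
kind of directory walk with the two guards whose absence would explain a 
SIGSEGV: a NULL check on readdir() and a bounds check before storing into 
the table. The names (scan_disks, MAX_DISKS) are hypothetical, not the 
actual SE code:

```c
/* Sketch only -- NOT the actual disks.se logic.  Mirrors the walk shown
 * in the se -d output: skip "." and "..", keep entries ending in "s0",
 * and store them in a fixed-size table. */
#include <dirent.h>
#include <stdio.h>
#include <string.h>

#define MAX_DISKS 128          /* hypothetical table size */

int scan_disks(const char *path)
{
    DIR *dirp = opendir(path);
    if (dirp == NULL) {
        perror(path);
        return -1;
    }

    char names[MAX_DISKS][256];
    int count = 0;
    struct dirent *ld;

    /* readdir() returns NULL at end-of-directory (and on error);
     * dereferencing the result without checking is one way to get
     * exactly the fault shown in the truss output below. */
    while ((ld = readdir(dirp)) != NULL) {
        if (strcmp(ld->d_name, ".") == 0 || strcmp(ld->d_name, "..") == 0)
            continue;

        /* keep only whole-disk entries ending in "s0" */
        size_t len = strlen(ld->d_name);
        if (len < 2 || strcmp(ld->d_name + len - 2, "s0") != 0)
            continue;

        if (count == MAX_DISKS)   /* stop before overrunning the table */
            break;
        snprintf(names[count], sizeof(names[count]), "%s", ld->d_name);
        count++;
    }

    closedir(dirp);
    return count;
}
```

On one of the affected machines you could point something like this at 
/dev/dsk and see whether a plain C walk survives the stale entries, which 
would help pin the bug on the SE interpreter side rather than the device 
tree itself.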

Cheers!
--Dragon

Brian Poole wrote:
> Hello,
>
> I've just recently rolled out the latest Orca snap & SE Toolkit 3.4 to
> ~300 servers. Overall it went very smoothly, but unfortunately 5 of them
> are hitting seg faults when Orca is started. Trussing it shows the seg
> fault occurs while it is doing kstats on the disks. All 5 are
> database servers that now have "missing disks" as we did a SAN
> migration so the old SAN's disks are no longer available. Rebuilding
> the device tree is not something we are eager to jump into as it
> issues bus resets and I'm hesitant to do that on production DB servers
> given the stale device entries haven't caused any problems except now
> with SE/Orca. The servers that are having the problem are all Fujitsu
> PP650s running Solaris 10 3/05 but I believe the issue has more to do
> with the missing disks than the platform as we have identical servers
> without this issue.
>
> The problem I am having looks identical to the one referenced here
> (same truss output, same debug disks.se output):
> http://marc.theaimsgroup.com/?l=orca-users&m=114417824631600&w=2
>
> Can anyone provide more information as to where this bug lies as well
> as options for fixing it (preferably besides rebuilding the device
> tree)?
>
> I'm happy to provide more information as needed, just let me know what
> you need. Here are snippets from both truss and se -d disks.se:
>
> if (dp.d_name<c3t8d24s4> == <.> || dp.d_name<c3t8d24s4> == <..>)
> if (!(dp.d_name<c3t8d24s4> =~ <s0$>))
> ld = readdir(dirp<18446744071543194240>)
> if (count<31> == GLOBAL_diskinfo_size<101>)
> dp = *((dirent_t *) ld<18446744071543217872>)
> if (dp.d_name<c3t8d24s5> == <.> || dp.d_name<c3t8d24s5> == <..>)
> if (!(dp.d_name<c3t8d24s5> =~ <s0$>))
> ld = readdir(dirp<18446744071543194240>)
> if (count<31> == GLOBAL_diskinfo_size<101>)
> dp = *((dirent_t *) ld<18446744071543217904>)
> Segmentation Fault (core dumped)
>
> 29617:  ioctl(4, KSTAT_IOC_READ, "sd3935,err")          = 610619
> 29617:  ioctl(4, KSTAT_IOC_CHAIN_ID, 0x00000000)        = 610619
> 29617:  ioctl(4, KSTAT_IOC_READ, "sd1971,err")          = 610619
> 29617:  ioctl(4, KSTAT_IOC_CHAIN_ID, 0x00000000)        = 610619
> 29617:  ioctl(4, KSTAT_IOC_READ, "sd1972,err")          = 610619
> 29617:      Incurred fault #6, FLTBOUNDS  %pc = 0xFFFFFFFF7DF0092C
> 29617:        siginfo: SIGSEGV SEGV_MAPERR addr=0xFFFFFFFF7EE06000
> 29617:      Received signal #11, SIGSEGV [default]
> 29617:        siginfo: SIGSEGV SEGV_MAPERR addr=0xFFFFFFFF7EE06000
>
> Thank you,
>
> Brian
> _______________________________________________
> Orca-users mailing list
> Orca-users at orcaware.com
> http://www.orcaware.com/mailman/listinfo/orca-users
>   