[Orca-users] SE Toolkit 3.4 SEGV on missing disks

Tue Aug 29 22:20:20 PDT 2006

I tried recompiling SE from source with both Sun Forte 5.8 and GCC 3.4.6
on Solaris 10, then ran the resulting se binary on one of the servers
having this issue. The problem still exists. I plan to try GCC 4.1.1
tomorrow as a last attempt, but it seems rather unlikely to help at this
point.

I have confirmed that we are definitely using the latest orcallator.se
from the snap.

Thanks for the suggestions though.

On 8/26/06, David Michaels <dragon at raytheon.com> wrote:
> Brian --
>
> This issue has come up in the past, and there's a lot of info about it
> in the mail archives.  I found this message that indicates that
> recompiling SE for Solaris 10 will help:
>
> http://www.orcaware.com/pipermail/orca-users/2006-March/004847.html
>
> However, recompiling didn't work for someone else (the next message in
> the thread). They are using Sun Cluster 3.1, though, which may have
> contributed to the problem.
>
> You can find the SE Toolkit source on sunfreeware.com at
> http://www.sunfreeware.com/setoolkit.html (it comes with the source
> code, as well as a Solaris 10 binary, but you might still find that you
> need to recompile).
>
> Also, though you have the latest Orca snap, make sure that the
> orcallator.se you are using came from that snapshot, and is not the 1.37
> linked in the download section of the Orcaware.com/orca page.
>
> As a last resort, you can turn off disk data collection in the
> orcallator.se file itself, so you can at least get the other information
> gathered until the problem gets resolved.
>
> Cheers!
> --Dragon
>
> Brian Poole wrote:
> > Hello,
> >
> > I've just recently rolled out the latest Orca snap & SE Toolkit 3.4 to
> > ~300 servers. Overall it went very smoothly, but unfortunately 5 of
> > them are hitting seg faults when Orca is started. Trussing it shows the
> > seg fault occurs while it is reading the kstats for the disks. All 5 are
> > database servers that now have "missing disks" as we did a SAN
> > migration so the old SAN's disks are no longer available. Rebuilding
> > the device tree is not something we are eager to jump into as it
> > issues bus resets and I'm hesitant to do that on production DB servers
> > given the stale device entries haven't caused any problems except now
> > with SE/Orca. The servers that are having the problem are all Fujitsu
> > PP650s running Solaris 10 3/05 but I believe the issue has more to do
> > with the missing disks than the platform as we have identical servers
> > without this issue.
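
One way to see which device entries are stale without touching the device tree is to look for links whose targets have gone away. The following is only a minimal sketch. It assumes the stale entries show up as dangling symbolic links under /dev/dsk, which has not been confirmed on these servers, and the helper name find_dangling_links is made up for illustration.

```python
import os


def find_dangling_links(directory):
    """Return the sorted names of symlinks in `directory` whose targets are missing."""
    dangling = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # islink() inspects the link itself; exists() follows it to the target,
        # so a link that is present but points nowhere is "dangling".
        if os.path.islink(path) and not os.path.exists(path):
            dangling.append(name)
    return sorted(dangling)
```

Calling find_dangling_links("/dev/dsk") on an affected server and on a healthy one would show whether the two actually differ in this respect. It only reads the directory, so it should not trigger any bus resets.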
> >
> > The problem I am having looks identical to the one referenced here
> > (same truss output, same debug disks.se output):
> > http://marc.theaimsgroup.com/?l=orca-users&m=114417824631600&w=2
> >
> > Can anyone provide more information as to where this bug lies as well
> > as options for fixing it (preferably besides rebuilding the device
> > tree)?
> >
> > I'm happy to provide more information as needed, just let me know what
> > you need. Here are snippets from se -d disks.se, followed by truss:
> >
> > if (dp.d_name<c3t8d24s4> == <.> || dp.d_name<c3t8d24s4> == <..>)
> > if (!(dp.d_name<c3t8d24s4> =~ <s0$>))
> > ld = readdir(dirp<18446744071543194240>)
> > if (count<31> == GLOBAL_diskinfo_size<101>)
> > dp = *((dirent_t *) ld<18446744071543217872>)
> > if (dp.d_name<c3t8d24s5> == <.> || dp.d_name<c3t8d24s5> == <..>)
> > if (!(dp.d_name<c3t8d24s5> =~ <s0$>))
> > ld = readdir(dirp<18446744071543194240>)
> > if (count<31> == GLOBAL_diskinfo_size<101>)
> > dp = *((dirent_t *) ld<18446744071543217904>)
> > Segmentation Fault (core dumped)
> >
> > 29617:  ioctl(4, KSTAT_IOC_READ, "sd3935,err")          = 610619
> > 29617:  ioctl(4, KSTAT_IOC_CHAIN_ID, 0x00000000)        = 610619
> > 29617:  ioctl(4, KSTAT_IOC_READ, "sd1971,err")          = 610619
> > 29617:  ioctl(4, KSTAT_IOC_CHAIN_ID, 0x00000000)        = 610619
> > 29617:  ioctl(4, KSTAT_IOC_READ, "sd1972,err")          = 610619
> > 29617:      Incurred fault #6, FLTBOUNDS  %pc = 0xFFFFFFFF7DF0092C
> > 29617:        siginfo: SIGSEGV SEGV_MAPERR addr=0xFFFFFFFF7EE06000
> > 29617:      Received signal #11, SIGSEGV [default]
> > 29617:        siginfo: SIGSEGV SEGV_MAPERR addr=0xFFFFFFFF7EE06000
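
With ~300 servers it can help to pull the interesting line out of a truss log automatically. The sketch below scans truss output for the last KSTAT_IOC_READ seen before the first fault. The regular expressions are written against the lines quoted above and are an assumption about the format in general. The function name last_kstat_before_fault is hypothetical. The last read completed before the fault is only a hint about where to look, not proof of which kstat caused it.

```python
import re

# Patterns modelled on the truss lines quoted in this thread (an assumption).
KSTAT_READ = re.compile(r'ioctl\(\d+, KSTAT_IOC_READ, "([^"]+)"\)')
FAULT = re.compile(r'Incurred fault #\d+, (\w+)')


def last_kstat_before_fault(lines):
    """Return (kstat_name, fault_name) for the first fault seen, or None if no fault."""
    last_read = None
    for line in lines:
        read = KSTAT_READ.search(line)
        if read:
            # Remember the most recent kstat that was read.
            last_read = read.group(1)
            continue
        fault = FAULT.search(line)
        if fault:
            return last_read, fault.group(1)
    return None


sample = [
    '29617:  ioctl(4, KSTAT_IOC_READ, "sd1971,err")          = 610619',
    '29617:  ioctl(4, KSTAT_IOC_CHAIN_ID, 0x00000000)        = 610619',
    '29617:  ioctl(4, KSTAT_IOC_READ, "sd1972,err")          = 610619',
    '29617:      Incurred fault #6, FLTBOUNDS  %pc = 0xFFFFFFFF7DF0092C',
]
print(last_kstat_before_fault(sample))  # ('sd1972,err', 'FLTBOUNDS')
```

Because the patterns use search rather than match, the same function works on lines that still carry the "> > " quoting from a mail archive.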
> >
> > Thank you,
> >
> > Brian
> > _______________________________________________
> > Orca-users mailing list
> > Orca-users at orcaware.com
> > http://www.orcaware.com/mailman/listinfo/orca-users