[Orca-users] Orcallator - Segmentation Fault

Dmitry Berezin dberezin at surfside.rutgers.edu
Fri Sep 8 07:05:56 PDT 2006


Can you rebuild the device tree on this server and try running Orcallator again?

 

  -Dmitry.

 

-----Original Message-----
From: Biju Joseph [mailto:biju.joseph at gmail.com] 
Sent: Friday, September 08, 2006 9:49 AM
To: Dmitry Berezin
Cc: Cockcroft, Adrian; Brian Poole; orca-users at orcaware.com
Subject: Re: [Orca-users] Orcallator - Segmentation Fault

 

Today I tried to run orcallator on a different machine that has no cluster
software but does have EMC disks attached and VxVM installed. I get the same
problem (Segmentation Fault).

But as mentioned earlier, it installed successfully on machines that do not
have EMC SAN disks.

Any solution would be highly appreciated.

Thanks,

Biju

On 9/8/06, Dmitry Berezin <dberezin at acs.rutgers.edu> wrote: 

It is failing while dereferencing, but the pointer is not null:

dp = *((dirent_t *) ld<4281687800>)
Segmentation Fault (core dumped)

-Dmitry.
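
One possible reading of the dbx output quoted further down (an inference, not something confirmed in this thread): ld<4281687800> is 0xff355ef8, which matches the `area` argument of member_fill, and d_name is shown with var_offset 10 and var_dimension 256. A full-size copy of d_name would then span 0xff355f02 to 0xff356002, just past the fault address 0xff356000. If so, the pointer is valid but the fixed-size structure copy runs off the end of the mapped readdir buffer, which would explain why a null test does not help. The C sketch below shows the alternative of copying only the terminated name; `copy_entry_name` is a hypothetical helper, not part of SE or orcallator.

```c
#include <dirent.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy only the NUL-terminated name out of a
 * readdir() entry instead of copying the whole fixed-size structure.
 * Returns the name length, or -1 if the entry is NULL or the
 * destination buffer is too small.
 */
long copy_entry_name(const struct dirent *entry, char *dst, size_t dst_len)
{
    size_t name_len;

    if (entry == NULL || dst == NULL || dst_len == 0)
        return -1;

    /* Read only up to the terminator; do not assume that a full
     * d_name array is mapped behind the entry. */
    name_len = strlen(entry->d_name);
    if (name_len >= dst_len)
        return -1;

    memcpy(dst, entry->d_name, name_len + 1);   /* include the trailing NUL */
    return (long)name_len;
}
```

Whether the same idea can be expressed in an SE script, or needs a change in the interpreter's struct_fill path, is not something the thread settles.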


> -----Original Message-----
> From: orca-users-bounces+dberezin=acs.rutgers.edu at orcaware.com 
> [mailto:orca-users-bounces+dberezin=acs.rutgers.edu at orcaware.com] On
> Behalf Of Cockcroft, Adrian
> Sent: Thursday, September 07, 2006 2:32 PM 
> To: Brian Poole
> Cc: Dmitry Berezin; orca-users at orcaware.com; Biju Joseph
> Subject: Re: [Orca-users] Orcallator - Segmentation Fault
>
> OK, so it's failing while walking the directory tree. I can see that the
> renew is already in place a line or so earlier.
>
> It's dereferencing a directory structure that isn't there, so a test
> needs to be added to skip this if readdir returns something bad. It's
> already testing for null, so there is something bad happening between
> the null test and the actual usage of the dirp.
>
> http://docs.sun.com/app/docs/doc/819-2243/6n4i099g0?q=readdir&a=view
>
> I'm not sure how to fix this, maybe a second test for null immediately
> before it's de-referenced?
>
> Adrian 
>
> -----Original Message-----
> From: Brian Poole [mailto:pooleb at gmail.com]
> Sent: Thursday, September 07, 2006 10:39 AM
> To: Cockcroft, Adrian
> Cc: Dmitry Berezin; Biju Joseph; orca-users at orcaware.com
> Subject: Re: [Orca-users] Orcallator - Segmentation Fault
>
> Here is all of the information I've been able to gather on the crash
> (SE Toolkit 3.4 on Solaris 10). I compiled it fresh using Forte with
> debugging enabled. I took a quick look at trying to find where the
> problem actually lies but was unable to come up with anything useful. 
>
> Here is the output of running disks.se with debug:
>
> # /opt/RICHPse/bin/se.sparcv9 -d /opt/RICHPse/examples/disks.se
> if (count<31> == GLOBAL_diskinfo_size<101>) 
> dp = *((dirent_t *) ld<4281687704>)
> if (dp.d_name<c3t8d24s3> == <.> || dp.d_name<c3t8d24s3> == <..>)
> if (!(dp.d_name<c3t8d24s3> =~ <s0$>))
> ld = readdir(dirp<4281664128>) 
> if (count<31> == GLOBAL_diskinfo_size<101>)
> dp = *((dirent_t *) ld<4281687736>)
> if (dp.d_name<c3t8d24s4> == <.> || dp.d_name<c3t8d24s4> == <..>)
> if (!(dp.d_name<c3t8d24s4> =~ <s0$>))
> ld = readdir(dirp<4281664128>)
> if (count<31> == GLOBAL_diskinfo_size<101>)
> dp = *((dirent_t *) ld<4281687768>)
> if (dp.d_name<c3t8d24s5> == <.> || dp.d_name<c3t8d24s5> == <..>)
> if (!(dp.d_name<c3t8d24s5> =~ <s0$>))
> ld = readdir(dirp<4281664128>)
> if (count<31> == GLOBAL_diskinfo_size<101>) 
> dp = *((dirent_t *) ld<4281687800>)
> Segmentation Fault (core dumped)
>
> So tracking that back shows the segfault occurs on line 215 of
> include/diskinfo.se:
>
>     for (ld = readdir(dirp); ld != 0; ld = readdir(dirp)) { 
>       // grow the array if needed
>       if (count == GLOBAL_diskinfo_size) {
>         GLOBAL_diskinfo_size += 4;
>         GLOBAL_disk_info = renew GLOBAL_disk_info[GLOBAL_diskinfo_size];
>       }
>       dp = *((dirent_t *) ld);     <---------
>
> Also the truss output:
>
> # truss -fo /tmp/truss.log /opt/RICHPse/bin/se.sparcv9 /opt/RICHPse/examples/disks.se
> # tail -15 /tmp/truss.log
> 5967:   ioctl(4, KSTAT_IOC_READ, "sd3547,err")          = 701015
> 5967:   ioctl(4, KSTAT_IOC_CHAIN_ID, 0x00000000)        = 701015
> 5967:   ioctl(4, KSTAT_IOC_READ, "sd2146,err")          = 701015 
> 5967:   ioctl(4, KSTAT_IOC_CHAIN_ID, 0x00000000)        = 701015
> 5967:   ioctl(4, KSTAT_IOC_READ, "sd2177,err")          = 701015
> 5967:   ioctl(4, KSTAT_IOC_CHAIN_ID, 0x00000000)        = 701015 
> 5967:   ioctl(4, KSTAT_IOC_READ, "sd3935,err")          = 701015
> 5967:   ioctl(4, KSTAT_IOC_CHAIN_ID, 0x00000000)        = 701015
> 5967:   ioctl(4, KSTAT_IOC_READ, "sd1971,err")          = 701015 
> 5967:   ioctl(4, KSTAT_IOC_CHAIN_ID, 0x00000000)        = 701015
> 5967:   ioctl(4, KSTAT_IOC_READ, "sd1972,err")          = 701015
> 5967:       Incurred fault #6, FLTBOUNDS  %pc = 0xFF2E08EC 
> 5967:         siginfo: SIGSEGV SEGV_MAPERR addr=0xFF356000
> 5967:       Received signal #11, SIGSEGV [default]
> 5967:         siginfo: SIGSEGV SEGV_MAPERR addr=0xFF356000
>
> And perhaps more indicative, the trace: 
>
> # /opt/SUNWspro/bin/dbx /opt/RICHPse/bin/se.sparcv9 core
> For information about new features see `help changes'
> To remove this message, put `dbxenv suppress_startup_message 7.5' in
> your .dbxrc 
> Reading se.sparcv9
> core file header read successfully
> Reading ld.so.1
> Reading libkvm.so.1
> Reading libkstat.so.1
> Reading libdl.so.1
> Reading libelf.so.1
> Reading libgen.so.1
> Reading libm.so.2
> Reading libsocket.so.1
> Reading libnsl.so.1
> Reading libc.so.1
> Reading libc_psr.so.1
> Reading libmp.so.2
> Reading libmd5.so.1
> Reading libscf.so.1
> Reading libdoor.so.1
> Reading libuutil.so.1
> Reading librt.so.1
> Reading libaio.so.1
> program terminated by signal SEGV (no mapping at the fault address)
> 0xff2e08ec: _memcpy+0x042c:     ldd      [%o1], %c2 
> Current function is member_fill
> dbx: warning: can't find file "/tmp/se-src/run.c"
> dbx: warning: see `help finding-files'
> (dbx) where
>   [1] _memcpy(0x129938, 0xff356000, 0x8, 0xfffffffa, 0x4, 0x1), at 0xff2e08ec
> =>[2] member_fill(vp = 0x1297f0, area = 0xff355ef8 "", bias = 0), line 994 in "run.c"
>   [3] struct_fill(vp = 0x1296b0, area = 0xff355ef8 "", bias = 0), line 1043 in "run.c"
>   [4] run_indirection(sp = 0xffbfc4b8), line 1308 in "run.c"
>   [5] run_call(sp = 0xffbfc4b8), line 1608 in "run.c"
>   [6] resolve_expression(vp = 0xffbfcae0, ep = 0x129620, runit = 1), line 2892 in "run.c"
>   [7] run_assign(sp = 0x127530), line 1675 in "run.c"
>   [8] run_statement_list(lp = 0x127510), line 513 in "run.c"
>   [9] run_for(sp = 0x12c078), line 2538 in "run.c"
>   [10] run_statement_list(lp = 0x127330), line 513 in "run.c"
>   [11] run_for(sp = 0x12c0b8), line 2538 in "run.c"
>   [12] run_statement_list(lp = 0x121208), line 513 in "run.c"
>   [13] run_block(bp = 0x133288), line 402 in "run.c"
>   [14] run_call(sp = 0xffbfcec8), line 1625 in "run.c"
>   [15] resolve_expression(vp = 0xffbfd450, ep = 0x13cd80, runit = 1), line 2892 in "run.c"
>   [16] resolve_l_expression(ep = 0x13ae18), line 2659 in "run.c"
>   [17] run_if(sp = 0x13cf88), line 523 in "run.c"
>   [18] run_statement_list(lp = 0x13cf88), line 513 in "run.c"
>   [19] run_block(bp = 0x1426f8), line 402 in "run.c"
>   [20] se_run(argc = 1, argv = 0x74b88), line 366 in "run.c"
>   [21] main(argc = 2, argv = 0xffbffcc4), line 542 in "main.c"
> *vp = {
>     var_flags      = VF_MEMBER
>     var_special    = 0
>     var_type       = VAR_CHAR
>     var_struct     = (nil)
>     var_name       = 0xc44f0 "d_name" 
>     var_qname      = (nil)
>     var_attach_lib = (nil)
>     var_address    = (nil)
>     var_initial    = (nil)
>     var_un         = {
>         var_string  = 0x129840 "c3t8d24s6" 
>         var_digit   = 1218624
>         var_udigit  = 1218624U
>         var_ldigit  = 5233950226120704LL
>         var_uldigit = 5233950226120704ULL
>         var_rdigit  = 2.5859149987693e-308 
>         var_user    = 0x129840
>         var_array   = 0x129840
>     }
>     var_dimension  = 256
>     var_subscript  = (nil)
>     var_instances  = (nil)
>     var_offset     = 10 
>     var_parent     = 0xffbfd588
>     var_next       = (nil)
> }
>
> I would be more than happy to provide any additional information on
> the problem you might need. Feel free to contact me directly on this 
> issue.
>
> Thank you,
>
> Brian
>
> On 9/7/06, Cockcroft, Adrian <acockcroft at ebay.com> wrote:
> > It should still be possible to avoid the crash by checking for a null at
> > the right point.
> >
> > Is it crashing in kstat read of the iostats, or the devinfo name mapping
> > at startup?
> >
> > Adrian
> >
> > -----Original Message-----
> > From: Dmitry Berezin [mailto:dberezin at surfside.rutgers.edu]
> > Sent: Thursday, September 07, 2006 8:43 AM
> > To: Cockcroft, Adrian; 'Biju Joseph'; orca-users at orcaware.com
> > Subject: RE: [Orca-users] Orcallator - Segmentation Fault
> >
> > Adrian,
> >
> > I believe that the actual problem is not with the array sizes, but has to
> > do with "stale" disk devices. SE segfaults when it tries to access a
> > device that is not currently present on the system. That is why the
> > problem is usually seen on clustered systems with shared storage, or on
> > systems with BCV devices that frequently change their state to offline. A
> > number of people have previously reported that rebuilding the device tree
> > fixed the problem.
> >
> > I have not had time to look at the code, so I do not know whether this
> > could be solved by changing the scripts or whether SE itself has to be
> > patched.
> > 
> >   -Dmitry.
> >
> >
> > > -----Original Message-----
> > > From: orca-users-bounces+dberezin=acs.rutgers.edu at orcaware.com
> > > [mailto:orca-users-bounces+dberezin=acs.rutgers.edu at orcaware.com] On
> > > Behalf Of Cockcroft, Adrian
> > > Sent: Thursday, September 07, 2006 11:13 AM 
> > > To: Biju Joseph; orca-users at orcaware.com
> > > Subject: Re: [Orca-users] Orcallator - Segmentation Fault
> > >
> > > Years ago I fixed the code that looks at disks to resize the array
> > > dynamically. I guess that this code got overwritten at some point, but
> > > it's a simple fix; it just doesn't look much like C code...
> > >
> > > You can use the "renew" keyword to make a new array that is bigger and
> > > contains the same items, so figure out where it's indexing into the
> > > disk array, check the index, and renew the array to be size+10 or
> > > something. There's example code in the generic SE disk class, which for
> > > some reason orcallator doesn't seem to use?
> > >
> > > I'm not currently working on a Solaris box, so it will take me a while
> > > to get a setup I could test this fix on, probably a few weeks, when I
> > > get back from a business trip.
> > >
> > > Adrian
> > >
> > > -----Original Message-----
> > > From: orca-users-bounces+acockcroft=ebay.com at orcaware.com on behalf
> > > of Biju Joseph
> > > Sent: Thu 9/7/2006 7:28 AM
> > > To: orca-users at orcaware.com
> > > Subject: [Orca-users] Orcallator - Segmentation Fault
> > >
> > > Hello All,
> > >
> > > I am trying to start orcallator on two nodes of a VCS cluster (4.1)
> > > with VxVM 4.1. The database is on EMC disks. Orcallator gives a
> > > segmentation fault.
> > >
> > > The RICHPse version is 3.4 (03:59 PM 01/05/05). I tried orcallator.se
> > > 1.36 and 1.37; both give the same problem.
> > >
> > > The same combination works on non-clustered systems. All systems run
> > > Solaris 10.
> > >
> > > Can any of you help?
> > >
> > > Appreciate your help.
> > >
> > > Regards
> > > Biju K Joseph
> > > +91-9866116298
> > >
> > > _______________________________________________
> > > Orca-users mailing list
> > > Orca-users at orcaware.com
> > > http://www.orcaware.com/mailman/listinfo/orca-users

 
