[Orca-users] Re: Orcallator dumping core

Blair Zajac blair at akamai.com
Fri Feb 9 14:28:55 PST 2001


Patrick,

Can you send the last 50 to 100 lines of the truss output and of the
se -d output, plus a copy of /etc/mnttab taken right when se crashes,
so we can check that the description below fits?

I think the problem is with the last line shown here in
include/mnt_class.se:

    if (initial == 1) {
      input = fopen("/etc/mnttab", "r");
      if (input == 0) {
        number$ = -1;
        return;
      }
      initial = 0;
      return;
    }
    if (number$ == last) {
      return;
    }
    last = number$;
    if (last == 0) {
      fseek(input, 0, SEEK_SET);
    }
    if (fgets(buf, sizeof(buf), input) == nil) {
      fseek(input, 0, SEEK_SET);
      number$ = -1;
      return;
    }
    strcpy(strchr(buf, '\n'), "");

According to the SE users manual, this code should never be run unless
the user is entirely sure that buf will contain a '\n':

  while(fgets(buf, sizeof(buf), stdin) != nil) {
    strcpy(strchr(buf, '\n'), "");
    puts(buf);
  }

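In this case, the result of the "strchr" call is never assigned to a
variable or checked; its return value is passed straight to the
"strcpy" function.  When buf contains a new-line, strcpy copies the
string "" onto it and so turns it into the null character.  When buf
has no new-line, strchr returns nil and strcpy writes through a nil
pointer, which matches the SIGSEGV at addr=0x00000000 in your truss
output.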
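
Two common ways for buf to end up without a '\n' are a line longer than
the buffer and a final line with no terminating new-line.  The sketch
below shows both.  It is plain C rather than SE, and
read_chunk_has_newline is a made-up helper name for illustration, not
something found in mnt_class.se:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read one chunk from `in` with fgets() and report
 * whether that chunk contains a new-line.
 *
 * Returns  1 if buf holds a '\n',
 *          0 if it does not (line was cut off at size-1 bytes, or the
 *            file's last line has no terminating new-line),
 *         -1 if fgets() returned NULL (end of file or read error).
 */
int read_chunk_has_newline(FILE *in, char *buf, size_t size)
{
    assert(in != NULL && buf != NULL && size > 1);

    /* fgets() stores at most size-1 bytes plus the terminating '\0'. */
    if (fgets(buf, (int)size, in) == NULL)
        return -1;

    /* strchr() returns NULL when no '\n' is present in buf. */
    return strchr(buf, '\n') != NULL;
}
```

With an 8-byte buffer, a 16-character line comes back as two 7-byte
chunks with no '\n' before the chunk that finally carries it, and any
code that passes strchr's result on unchecked would fail on the first
chunk.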

There are several things to do here.

1) Change start_orcallator.se to keep se in the foreground.  When it
   exits, make a copy of /etc/mnttab or email it somewhere to look at.
   Please email me a copy of it and the se -d output.

    # Now start the logging.
    echo "Starting logging"
    $SE $LE_PATCH -DWATCH_OS $WATCH_WEB $libdir/orcallator.se
    mailx -s "Bad /etc/mnttab" YOUR at EMAIL_ADDRESS.COM < /etc/mnttab

   It would be interesting to see the problem with the file.

2) Another is to edit include/mnt_class.se and change

     mnt$() {
       char buf[BUFSIZ];
       string p;

   to

     mnt$() {
       char buf[BUFSIZ<<4];
       string p;

   in the hope that the line does have a \n but that the lines are too
   long for the buffer.

3) Change the last line from

    strcpy(strchr(buf, '\n'), "");

   to

    if (strchr(buf, '\n') != nil) {
      strcpy(strchr(buf, '\n'), "");
    }

   This isn't optimal since it will search for '\n' twice, but it
   should work.
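
For comparison, the same guard can be written with a single search by
keeping the strchr result in a pointer.  This is a sketch in plain C,
not SE; whether the same form works unchanged under se is untested, and
strip_newline is a hypothetical name:

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: remove the first '\n' from buf, if there is one,
 * searching the buffer only once.
 *
 * Returns 1 if a new-line was removed, 0 if buf had none.
 */
int strip_newline(char *buf)
{
    char *nl;

    assert(buf != NULL);

    nl = strchr(buf, '\n');   /* NULL when buf has no new-line */
    if (nl == NULL)
        return 0;             /* leave buf untouched, nothing to write */

    *nl = '\0';               /* same effect as strcpy(nl, "") */
    return 1;
}
```

The return value lets the caller tell a complete line from one that was
cut short, which is the case suspected in /etc/mnttab here.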

Regards,
Blair

Patrick Aland wrote:
> 
> Ok, I'm running
> se - Version 3.1 (pre-fcs) (10:39 AM 03/31/99) for sparcv9 SunOS 5.7
> I'm using the orcallator.se that came with the .26 tar
> 
> After running se in debug mode twice it appears to be dying during one of the filesystem checks,
> Run 1:
> if (last<120> == <0>)
> if (fgets(buf<tophat:/export/home1/kharman>, sizeof(buf<tophat:/export/home1/kharman>), input<4296373264>) == <(nil)>)
> strcpy(strchr(buf<tophat:/export/home1/abaker\t/home/abaker\tnfs\t>, <10>), <>)
> 
> Run 2:
> if (last<120> == <0>)
> if (fgets(buf<tophat:/export/home2/jpim>, sizeof(buf<tophat:/export/home2/jpim>), input<4296373264>) == <(nil)>)
> strcpy(strchr(buf<tophat:/export/home1/mdemurga\t/home/mdemurga\tnfs\tdev=313c5>, <10>), <>)
> 
> Those are the last 3 lines of output from the two runs before it segfaulted.
> 
> Running it through truss we get:
> statvfs("/var/log", 0x10019E300)                = 0
> statvfs("/var/news", 0x10019E300)               = 0
> statvfs("/var/mail", 0x10019E300)               = 0
> statvfs("/export", 0x10019E300)                 = 0
> read(9, 0x1007595C4, 8192)                      = 0
>     Incurred fault #6, FLTBOUNDS  %pc = 0xFFFFFFFF7EB40070
>       siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
>     Received signal #11, SIGSEGV [default]
>       siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
>         *** process killed ***
> 
> As I said in my original post, the timings were completely different for all three.
> Run 1 went almost an hour, Run 2 about 3 minutes, Run 3 about 25.
> 
> As far as PCP goes I don't believe it has the ability to monitor Solaris simply because it uses the /proc filesystem. It does, however, have a plugin architecture to write new namespaces (some have been written to monitor Cisco equipment, Oracle DBs, etc.) so it might be possible to extend it. I have found that I can't get all the stats from it that se can get and unfortunately a kernel patch is required to get stats on a partition level basis.
> 
> --Patrick
> 
> On Wed, Jan 24, 2001 at 02:55:41PM -0800, Blair Zajac wrote:
> > First, which version of SE and orcallator.se are you using?  To track
> > this down, I'd do two things.  Run it with the -d SE flag and then
> > under truss (probably separately since there will be a lot of output):
> >
> > se -d -DWATCH_OS orcallator.se 5
> > truss se -DWATCH_OS orcallator.se 5
> >
> > Let's see what is causing the failure.
> >
> > I haven't heard of PCP but it looks interesting.  Could it be
> > packaged so that Orca would have an OS independent data collector?
> > This may be similar to libgtop which I was hoping would provide
> > an OS independent data collector but there's been no work on.
> >
> > Regards,
> > Blair
> >
> 
> --
> ------------------------------------------------------------
>  Patrick Aland                          paland at stetson.edu
>  Network Administrator                  Voice: 904.822.7217
>  Stetson University                     Fax: 904.822.7367
> ------------------------------------------------------------


