[Orca-users] Some Best Practices Perhaps?

Jon A. Tankersley jon.tankersley at eds.com
Fri Dec 20 12:56:00 PST 2002


After seeing the latest set of issues... Maybe some 'best practices' would be in order.

For our location, running hundreds of machines, including non-Sun and non-UNIX boxes (more on that later), we usually DON'T run
orca in daemon mode (always running).  We don't care about the 'real time' view and it isn't buying us that much, and we can't have the data NFS mounted
to a central location, etc., so we do hourly/n-daily/once-a-day data transfers instead.
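For what it's worth, the transfer-then-process cycle fits entirely in cron.  A sketch (the script names, paths, and schedule below are made up; adjust for your site):

```
# on the central graphing host (hypothetical crontab entries)
# pull the percol files over once an hour, however you move data
0 * * * *   /usr/local/bin/pull-percol-files.sh
# then run orca once (-o: process and exit) after the day's last pull
15 0 * * *  /usr/local/bin/orca -o /usr/local/etc/orcallator.cfg
```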

I've built orca/rrdtool on SGI IRIX 6.5 and Sun Solaris 2.6/2.7/2.8, and have found that memory/file-handle leaks occur from time to time.
There is NO problem with running orca on demand (orca -o); it just takes a bit longer because of startup, etc.
On the few instances where we do run orca as a daemon, we restart it once a week.
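If you do run the daemon, the weekly bounce can be cron's job too.  A hypothetical entry (the restart script is an assumption; use whatever start/stop mechanism your site has):

```
# restart the orca daemon early Sunday morning to shed any leaks
30 4 * * 0  /usr/local/etc/init.d/orca restart
```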

When setting up a new system, or when diagnosing a problem with a system and SEToolkit/Orcallator, run it in the foreground; error messages get lost
otherwise.  This will catch most of the array-bound errors caused by a large number of disks, etc., and will also help detect missing software
requirements.  SEToolkit requires the C Preprocessor by default; there is a way around this with appropriate compilation of the final se script.
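The foreground run is just about not losing stderr.  A sketch of the idea (the wrapper is hypothetical, and echo stands in for the real se invocation, e.g. /opt/RICHPse/bin/se orcallator.se):

```shell
# run a collector in the foreground so errors land on the terminal,
# and tee them into a log for later inspection
LOG=$(mktemp)
run_fg() {
    "$@" 2>&1 | tee -a "$LOG"
}
# echo is a harmless stand-in for the actual se interpreter here
run_fg echo "array bounds error would show up here"
```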

Before Blair added the split of the file into percol-YYYY-mm-dd-NNN files, we were slicing the files into pieces at each timestamp line and letting orca
piece things back together.  This helped eliminate some of the gaps that cropped up, especially when disks/network cards/etc. were added to or removed
from the systems.
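If you ever need to do that slicing by hand, it is a one-liner in awk.  A sketch, assuming each block starts with a heading line beginning with "timestamp" (the sample data here is made up):

```shell
# slice an orcallator-style file into one piece per heading block,
# so orca can reassemble them in order
cd "$(mktemp -d)"
cat > percol-sample <<'EOF'
timestamp locltime usr%
12909309 00:00:00 20
timestamp locltime usr% sys%
12909609 00:05:00 19 11
EOF
# every line goes into slice-N, where N bumps at each heading line
awk '/^timestamp/ { n++ } { print > ("slice-" n) }' percol-sample
```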

Because the data is 'in the rrd' after processing, we also archive off the finished rrd's.  This eliminates the bunzip/gunzip processing of the
already-processed percol/orcallator files, and it seems to work just fine.  We do try NOT to process after missing data until we can verify whether the
data exists.  If it doesn't, we make the run; if it does, we fetch that data and let Orca catch things up.
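Our archiving step boils down to compress-and-move once orca has eaten the files.  A sketch, with temp directories standing in for the real spool and archive areas:

```shell
SPOOL=$(mktemp -d)     # where fresh percol files land (stand-in path)
ARCHIVE=$(mktemp -d)   # long-term storage (stand-in path)
touch "$SPOOL/percol-2002-12-20-000"
# compress each processed file and move it out of orca's way
for f in "$SPOOL"/percol-*; do
    gzip "$f" && mv "$f.gz" "$ARCHIVE/"
done
```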

To fill in gaps in the data... you can't.  You must scrub the rrd's and start over.  When this happens, we tend to process the entire set of files in
pieces, depending on the number of systems and files involved.  For a single system we might do a month at a time; for a large number of systems we'll
loop and process each day.
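The day-at-a-time loop is nothing fancy.  A dry-run sketch (the config path and file layout are assumptions; replace the echo with the real orca invocation):

```shell
# walk the archive one day at a time, so a bad file only costs one run
PLAN=$(mktemp)
for day in 2002-12-18 2002-12-19 2002-12-20; do
    # stage just that day's percol files, then run orca once over them
    echo "orca -o /usr/local/etc/orcallator.cfg   # percol-$day-*" >> "$PLAN"
done
cat "$PLAN"
```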

We've even taken other UNIX systems, created our own orcallator-formatted files, and started graphing them with Orca.  NT was a tougher nut to crack, but
we now graph Perfmon data too.

Understand the format of the input file that Orca/rrd are expecting.


So, here is a strawman for a Best Practices FAQ....  Comments are welcome.

1) Determine if you need real-time graphs, or if hourly/daily runs of orca (orca -o) would be sufficient.

2) Restart Orca periodically.  Rule of thumb, weekly.

3) Verify SEToolkit/Orcallator on a new install in case adjustments must be made to internal arrays.  This will also let you verify the
settings/switches for your installation.  Not every machine is a web server, etc.

4) Once percol/orcallator files are processed (or after a week), archive those files off somewhere else.  This is especially true if you do NOT run orca
continuously.

5) If you are trying to catch things up or need to reprocess a lot of input files for orca, consider doing it in stages.

6) Orca/RRD are handy enough for other data.  Just figure out how to format it like a percol/orcallator file.  Don't try to change the names to match
SEToolkit (though for UNIX it isn't that hard).  Key piece of Perl code:
    use Time::Local;    # converts broken-down local time to seconds since the epoch
    # timelocal() wants a 0-11 month and (classically) the year minus 1900,
    # so adjust the usual 1-12 month and four-digit year on the way in:
    $epochsec = timelocal($sec, $min, $hour, $day, $mon - 1, $year - 1900);

7) The format of the file isn't hard.  In the easiest of terms:

headingline label1 label2 label3 ...
timestamp0 data1 data2 data3 ...
timestamp1 data1 data2 data3 ...

e.g.
timestamp locltime sys% usr% wio% idle% ...
12909309 00:00:00 10 20 5 65
12909609 00:05:00 11 19 4 66

or

timestamp ft_per_fortnight cc_per_quadrangle angels_dancing_on_pinhead
13003930  1000 3000 10000000000000000

etc.
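Producing such a file from scratch is equally simple.  A sketch (the column names are invented, and date(1) stands in for whatever clock your data source has):

```shell
# write a minimal orcallator-style file: one heading line, then one
# data line per sample; the first column is seconds since the epoch
OUT=$(mktemp)
echo "timestamp locltime sys% usr% wio% idle%" > "$OUT"
printf '%s %s 10 20 5 65\n' "$(date +%s)" "$(date +%H:%M:%S)" >> "$OUT"
cat "$OUT"
```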





