[Orca-users] Re: Orcallator CPU Questions

Sean O'Neill sean at seanoneill.info
Thu Aug 15 13:10:41 PDT 2002


Pre-Flames-Extinguisher - If any of the other ORCA users out there see 
anything wrong in my reasoning below, and I'm sure there may be (I'm no 
expert), be nice when correcting me please.  So flames off :)

At 03:50 PM 8/14/2002 -1000, Gary M.Blumenstein wrote:
>We have a 16 processor Sparc machine used mainly for running an image
>processing application.  Processing times for each image takes anywhere
>between 2-6 minutes depending on image size, complexity, Etc.  Right now I
>have Orca and Orcallator.se set up to generate graphs using the default 5
>minute sampling interval and the results show max CPU usage rarely exceeds
>25% user time.  Very little time is spent in system and only occasional
>blips in wait.  The vast majority (80-90%) of the CPU time remains idle
>and that has a few people around here a little perplexed.

ORCA shows the average CPU utilization of ALL 16 processors at once.  So on 
a 16-CPU system where 8 CPUs are pegged at 100% and the other 8 are idle, 
ORCA will only show 50% utilization - you can play games with the math any 
way you want.
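That averaging is just arithmetic - a quick sketch (the per-CPU numbers here are made up to match the 8-pegged/8-idle scenario above):

```python
# Hypothetical per-CPU busy percentages: 8 CPUs pegged, 8 idle.
per_cpu_busy = [100.0] * 8 + [0.0] * 8

# An ORCA-style system-wide figure is the mean across all 16 CPUs.
system_busy = sum(per_cpu_busy) / len(per_cpu_busy)
print(system_busy)  # 50.0
```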

>The author of the image processing code doesn't believe our Orcallator
>numbers accurately show how the CPUs are being used by his application.
>He says our sampling interval is too long and that we're "missing" periods
>where images are being processed and completed before the next Orcallator
>interval occurs.  For example, where the image takes 2 minutes to complete
>but Orcallator reports every 5 minutes.

You're not missing much.  It's a 5-minute AVERAGE.  Large short spikes 
within the probe period are flattened out.  You can change the data 
collection interval in orcallator.se to anything you want.  I have mine set 
to 60 seconds.  Each data element in the hourly and daily graphs will still 
be a "5 minute average" though - this is an RRD limitation, not a limitation 
of orcallator.se.  RRD takes these 1-minute interval data elements and 
"computes" them into the RRD files, whose lowest data interval is 5 minutes.

The lower you make this interval, though, the more CPU power ORCA requires 
to "chew" on the data, as RRD has to take this additional data and "process" 
it into the RRD data files.

That said ... I haven't actually verified this myself.  Folks on the 
Cricket mailing list have had this discussion before, and that's where my 
comments come from.

>He explained - and he's correct about this - when you watch mpstat every 5
>seconds while an image is being processed, you see instances where all 16
>processors are 100 percent busy executing a mix of system, user, i/o wait,
>and system calls.  However, there are other times while the same image is
>being processed, where the CPUs go from busy, to kinda' busy, to
>not-so-busy, then back to fully busy again. Once the image is complete,
>the CPUs return back to idle.

He just wants to "see" the box used more heavily.  Sure, at 5-second 
samplings the box is PEGGED at times.  But you need to determine your 
requirements.  What probe interval is important to you - 5 seconds, 30 
seconds, 5 minutes?  At 5-second intervals, you can't see the forest 
for the trees.

>Based on mpstat, the programmer thinks we're running our Sparc E6500
>system at full-bore during image processing and we would see that if we
>decreased Orcallator's sample interval.  In the past he has made the case

Probably not - again, those very annoying "averages" come into play.


>My theory is that we're looking at two related but separate things.  The
>near real-time output from mpstat does indeed show instances where all 16
>CPUs sustain high peak loads.  However, the results from Orcallator show
>the actual workload for the past 5 minutes.  We're not "missing" data as
>the developer is suggesting but rather Orcallator's histogram is based on
>the total capacity of the machine and that includes all available CPU
>cycles for the entire sampling period.  OTOH, mpstat shows the
>instantaneous load and, like the Heisenberg uncertainty principle, the CPUs
>may be in a different state while you're watching its output.  If for

Yep ... In relation to CPU, think of ORCA as vmstat.  vmstat is going to 
give you numbers similar to ORCA's when taking a 5-minute sampling.  Hell, 
if you run "mpstat 300" you're not going to see the spikes anymore either - 
well, if you do, they aren't going to be anywhere near as big as with 
"mpstat 5".
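mpstat's figures are themselves averages over its reporting interval, so the same flattening applies - a toy illustration with a synthetic per-second load trace (made-up numbers, not real mpstat output):

```python
# Synthetic per-second CPU-busy trace: a 2-minute image job pegs the
# box, then it sits idle for the rest of a 5-minute window.
trace = [100.0] * 120 + [0.0] * 180  # 300 seconds total

def report(data, width):
    """What a tool reporting every `width` seconds would print:
    the average busy% over each reporting interval."""
    return [sum(data[i:i + width]) / width for i in range(0, len(data), width)]

print(max(report(trace, 5)))    # 100.0 -> "mpstat 5" sees the peak
print(max(report(trace, 300)))  # 40.0  -> "mpstat 300" prints one flat average
```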

IMHO ... the "probe interval" in a way determines what you are trying 
to see:  the trees (5 seconds), the valley (30-60 seconds), or some part of 
the forest (60+ seconds).

But ORCA is telling you SOMETHING ... your box has lots of CPU headroom 
MOST of the time.  Something else is limiting the application's 
ability to FULLY utilize the CPUs at all times.  This could be lots of 
things, including disk I/O (though your initial comments don't make it 
sound like that's it), serialized processing, network chatter, etc.

Sounds like your programmer needs to review how his image processing 
application does its job and see if he can find more ways of cutting the 
work up into pieces and running them in parallel.


--
........................................................
......... ..- -. .. -..- .-. ..- .-.. . ... ............
.-- .. -. -... .-.. --- .-- ... -.. .-. --- --- .-.. ...

Sean O'Neill


