[Orca-users] Orcallator CPU Questions

Gary M.Blumenstein garyb at mhpcc.edu
Wed Aug 14 18:50:33 PDT 2002


Dear Orca Users,

I'm trying to better understand how Orcallator and SEtoolkit determine
CPU utilization on a multiprocessor Sparc system.  First, here's a little
background...

We have a 16-processor Sparc machine used mainly for running an image
processing application.  Processing time for each image is anywhere
between 2 and 6 minutes depending on image size, complexity, etc.  Right
now I have Orca and Orcallator.se set up to generate graphs using the
default 5-minute sampling interval, and the results show that max CPU usage
rarely exceeds 25% user time.  Very little time is spent in system and
there are only occasional blips in wait.  The vast majority (80-90%) of the
CPU time remains idle, and that has a few people around here a little
perplexed.

The author of the image processing code doesn't believe our Orcallator
numbers accurately show how the CPUs are being used by his application.
He says our sampling interval is too long and that we're "missing" periods
where images are being processed and completed before the next Orcallator
interval occurs, for example, where an image takes 2 minutes to complete
but Orcallator reports every 5 minutes.

He explained - and he's correct about this - that when you watch mpstat
every 5 seconds while an image is being processed, you see instances where
all 16 processors are 100 percent busy executing a mix of system, user,
i/o wait, and system calls.  However, there are other times while the same
image is being processed when the CPUs go from busy, to kinda' busy, to
not-so-busy, then back to fully busy again.  Once the image is complete,
the CPUs return to idle.

Based on mpstat, the programmer thinks we're running our Sparc E6500
system at full-bore during image processing and we would see that if we
decreased Orcallator's sample interval.  In the past he has made the case
to management that his application is very CPU intensive and thus requires
massive amounts of hardware to run.  He was a little perturbed when I
showed the Orcallator stats during a presentation in front of the whole
program management group because this seemed to contradict a lot of the
justification that was used to purchase the big iron.  I'm not trying to
demonstrate under-utilization or trivialize the application.  I'm just
trying to find a tool that accurately reports the system's true
utilization.

My theory is that we're looking at two related but separate things.  The
near real-time output from mpstat does indeed show instances where all 16
CPUs sustain high peak loads.  However, the results from Orcallator show
the actual workload for the past 5 minutes.  We're not "missing" data as
the developer is suggesting; rather, Orcallator's histogram is based on
the total capacity of the machine, and that includes all available CPU
cycles for the entire sampling period.  OTOH, mpstat shows the
instantaneous load and, like the Heisenberg uncertainty principle, the CPUs
may be in a different state while you're watching its output.  If, for
example, Orcallator reports peaks of 20% in user time, theoretically we
still have 80% capacity left based on the total number of CPU cycles that
were available when this was reported.  Furthermore, if I were to decrease
the sampling interval, I'm guessing we would still see the same 20% peaks,
just more of them.  Is this correct?  If not, could somebody please
explain how CPU utilization works so I can better understand this?
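
To make the averaging question concrete, here is a minimal Python sketch.
It is only an illustration under stated assumptions: the trace is a made-up
per-second busy fraction for the whole machine (one image keeps all CPUs
fully busy for 2 minutes, then the box idles for 8), and
interval_averages() is a hypothetical helper, not anything taken from
Orcallator, SEtoolkit, or mpstat.

```python
def interval_averages(busy, interval):
    """Average a per-second busy-fraction trace over fixed-length intervals."""
    averages = []
    for start in range(0, len(busy), interval):
        window = busy[start:start + interval]
        averages.append(sum(window) / len(window))
    return averages


# Hypothetical trace (an assumption, not measured data): 10 minutes at
# 1-second resolution, fully busy for the first 2 minutes, idle afterwards.
trace = [1.0] * 120 + [0.0] * 480

five_min = interval_averages(trace, 300)  # like the default Orca interval
one_min = interval_averages(trace, 60)
five_sec = interval_averages(trace, 5)    # like watching mpstat 5

overall = sum(trace) / len(trace)

print("5-minute averages:     ", five_min)
print("peak 5-minute average: ", max(five_min))
print("peak 1-minute average: ", max(one_min))
print("peak 5-second average: ", max(five_sec))
print("overall average:       ", overall)
```

With this made-up trace, the overall average comes out the same whichever
interval is used, while the highest single-interval value grows as the
interval shrinks.  Whether real Orcallator data behaves this way depends on
how it derives its numbers, which is exactly what I'm asking about.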

Thanks very much!

-Gary

--
_______________________________________________________________________________
Gary M. Blumenstein, Computer Systems Engineer   E-mail: garyb at mhpcc.edu
Maui High Performance Computing Center           Voice : 1.808.879.5077  x225  
550 Lipoa Parkway                                FAX   :  1.808.879.5018 
Kihei, Maui, HI  96753                           URL   :  http://www.mhpcc.edu/
_______________________________________________________________________________


