[Orca-users] Re: CPU Graphs Problem Found

Blair Zajac blair at orcaware.com
Sat Aug 17 13:14:12 PDT 2002


Rich,

I think there's a miscalculation in vmglobal_total() that causes Liston's
systems to report pvm.idle_time values greater than 100.  The measured data
and plots of that data are attached.

Possible cause and solution discussed below.

Best,
Blair

Liston Bias wrote:
> 
> On Sat, 17 Aug 2002, Blair Zajac wrote:
> 
> > That's really odd.
> >
> > What is your measurement interval and the interval specified in the
> > orcallator.cfg file?
> >
> > I would first check to see if the raw orcallator data looks ok.  Find
> > one idle (purple) point in time where the plot looks ok and one period
> > where it's white and find the orcallator data.  Pull the columns
> > of data labeled usr%, sys%, wio% and idle% out.  You can use
> > orcallator_column.pl to do this for you.
> >
> > See if the data exists, what values there are and what they sum to.
> 
> I think I found the problem with CPU graphs... The attached data and
> hourly graphs seem to highlight the fact.  When the idle% is recorded as a
> number > 100 then the graph seems to just white out everything.  It
> doesn't make sense that the idle% could be less than 100%.

You mean more than 100%?

> Should I make a correction to orcallator.se to ensure it only records
> numbers up to 100, or should I just tell orcallator to treat numbers
> greater than 100 as 100?
> 
> In the attached graphs/data, I see blanks for the periods when idle is
> 100.50.  I checked a handful of "good" graphs and idle is never
> greater than 100.
> 
> - Liston


The data is being generated with these lines in orcallator.se using the
SE modules, so I think the problem is in SE.

  // In SE 3.0 and below user_time and system_time are int and in SE
  // 3.1 and above they are double, so cast everything to double using
  // + 0.0.
  pvm = vmglobal_total();
  put_output(" usr%",    sprintf("%5.1f", pvm.user_time + 0.0));
  put_output(" sys%",    sprintf("%5.1f", pvm.system_time + 0.0));
  put_output(" wio%",    sprintf("%5.1f", pvm.wait_time + 0.0));
  put_output("idle%",    sprintf("%5.1f", pvm.idle_time + 0.0));
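The + 0.0 idiom in the comment above is ordinary arithmetic promotion: adding
a double constant to an int produces a double, so the argument always matches
the %5.1f conversion whichever SE version is in use.  A minimal C analogue,
assuming SE's sprintf follows C's conversion rules; format_pct_int and
format_pct_double are hypothetical helper names, not part of orcallator.se:

```c
#include <stdio.h>

/* Hypothetical helpers mirroring the put_output() lines above.
 * SE 3.0 and below: the field is an int, and "+ 0.0" promotes it to
 * double.  In C, passing a bare int to a %f conversion is undefined
 * behavior, which is the mismatch the idiom avoids. */
static int format_pct_int(char *buf, size_t len, int value)
{
    return snprintf(buf, len, "%5.1f", value + 0.0);
}

/* SE 3.1 and above: the field is already a double, so "+ 0.0" is a
 * harmless no-op and the same source line works for both versions. */
static int format_pct_double(char *buf, size_t len, double value)
{
    return snprintf(buf, len, "%5.1f", value + 0.0);
}
```

Both helpers produce a 5-character field such as " 99.0" or "100.5".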

In live_rules.se, the vmglobal_total() function calculates user_time,
system_time, wait_time and idle_time.  Portions of the code are shown
below:

  int total;
  ...
  if (GLOBAL_pvm_ncpus > 1) {
    /* average over cpu count */
    pvm.user_time        /= GLOBAL_pvm_ncpus;
    pvm.system_time      /= GLOBAL_pvm_ncpus;
    pvm.wait_time        /= GLOBAL_pvm_ncpus;
    pvm.idle_time        /= GLOBAL_pvm_ncpus;
#if MINOR_VERSION < 70
    /* reduce wait time - only one CPU can ever be waiting - others are idle */
    /* system counts all idle CPUs as waiting if any I/O is outstanding */
    pvm.wait_time        /= GLOBAL_pvm_ncpus;
#endif
  }
  total = pvm.user_time + pvm.system_time + pvm.wait_time + pvm.idle_time;
  if (total < 100) {
    pvm.idle_time += (100 - total);
  }
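The averaging step can be sketched in C.  This is a minimal sketch, assuming
(from the "average over cpu count" comment) that on entry each field holds the
sum of the per-CPU percentages, so each can range up to ncpus * 100.  The
struct cpu_pct, the function average_over_cpus and the old_wait_accounting
flag (standing in for the MINOR_VERSION < 70 branch) are hypothetical names:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the four p_vmstat percentage fields. */
typedef struct {
    double user_time, system_time, wait_time, idle_time;
} cpu_pct;

/* Mirrors the averaging in vmglobal_total(): divide each summed field
 * by the CPU count.  old_wait_accounting models the MINOR_VERSION < 70
 * branch, which divides wait_time by the CPU count a second time. */
static cpu_pct average_over_cpus(cpu_pct sum, int ncpus,
                                 bool old_wait_accounting)
{
    if (ncpus > 1) {
        sum.user_time   /= ncpus;
        sum.system_time /= ncpus;
        sum.wait_time   /= ncpus;
        sum.idle_time   /= ncpus;
        if (old_wait_accounting)
            sum.wait_time /= ncpus;  /* only one CPU can really be waiting */
    }
    return sum;
}
```

With two CPUs and old_wait_accounting set, per-CPU sums of 20, 40, 80 and 60
average to 10, 20, 20 and 30, a total of 80; the padding step in the excerpt
then adds the missing 20 to idle_time.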

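Because total is an int while the pvm.*_time fields are doubles, the
double-to-int conversion truncates, so total is always less than or equal to
the real sum of the *_time fields, which may or may not be 100.  When the real
sum is already 100, or just under it, total can still come out as 99 (a sum
of 99.9, or a nominal 100 that floating-point rounding leaves at 99.999...,
truncates to 99).  The code then adds a full 1.0 to idle_time instead of the
small amount, or nothing, that was actually needed, and that extra addition
can push idle% above 100.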
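The truncation can be reproduced outside SE.  A minimal C sketch, assuming
SE's assignment of a double to an int truncates toward zero the way C's does;
pad_idle_int_total is a hypothetical name for the original padding step:

```c
/* Hypothetical C rendering of the original padding step in
 * vmglobal_total().  total is an int, so converting the double sum
 * truncates toward zero: a sum of 99.9 becomes 99. */
static double pad_idle_int_total(double user, double sys, double wait,
                                 double idle)
{
    int total = (int)(user + sys + wait + idle);

    if (total < 100)
        idle += (100 - total);  /* for a sum of 99.9 this adds 1.0, not 0.1 */
    return idle;
}
```

For example, pad_idle_int_total(0.0, 0.4, 0.1, 99.4) returns about 100.4,
consistent with the 0.00 / 0.40 / 0.10 / 100.40 rows in the attached data,
while an exact sum of 100.0 is left alone.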

Try this patch to live_rules.se and let me know how it works.

--- live_rules.se.0     Tue Jul 16 18:02:54 2002
+++ live_rules.se       Sat Aug 17 13:11:10 2002
@@ -121,7 +121,7 @@
 /* function to total up global per_cpu data into vmstat form */
 p_vmstat vmglobal_total() {
   int i;
-  int total;
+  double total;
   p_vmstat pvm;

   pvm = GLOBAL_pvm[0];
@@ -151,8 +151,15 @@
 #endif
   }
   total = pvm.user_time + pvm.system_time + pvm.wait_time + pvm.idle_time;
-  if (total < 100) {
-    pvm.idle_time += (100 - total);
+  if (total < 100.0) {
+    pvm.idle_time += (100.0 - total);
+  }
+
+  /* Make sure that total is never greater than 100%.  Better less
+   * than 100% than greater than 100%.  */
+  total = pvm.user_time + pvm.system_time + pvm.wait_time + pvm.idle_time;
+  if (total > 100.0) {
+    pvm.idle_time -= (total - 100.0);
   }
   return pvm;
 }
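The patched logic translates directly.  A minimal C sketch of the two steps in
the patch, under the same truncation assumption as above;
pad_idle_double_total is a hypothetical name:

```c
/* Hypothetical C rendering of the patched logic: total stays a double,
 * idle is padded up to 100 when the sum falls short, and any excess
 * over 100 is then trimmed from idle. */
static double pad_idle_double_total(double user, double sys, double wait,
                                    double idle)
{
    double total = user + sys + wait + idle;

    if (total < 100.0)
        idle += (100.0 - total);

    /* Recompute and trim, as the patch does: better under 100 than over. */
    total = user + sys + wait + idle;
    if (total > 100.0)
        idle -= (total - 100.0);
    return idle;
}
```

Both a sum of 99.9 and a sum of 101.0 now come back with idle close to 99.5
and a total of about 100.  Like the patch, this sketch does not guard against
idle going negative if user, system and wait alone sum to more than 100.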

Best,
Blair
-------------- next part --------------
[Attached plot dalapp16.png not preserved in the archive; data for dalapp16 follows]
                                Machine  locltime      usr%      sys%      wio%     idle%
/tmp/dalapp16/orcallator-2002-08-17-000  12:40:02      0.00      0.40      0.10    100.50
/tmp/dalapp16/orcallator-2002-08-17-000  12:45:02      0.00      0.40      0.10     99.60
/tmp/dalapp16/orcallator-2002-08-17-000  12:50:02      0.00      0.40      0.20     99.40
/tmp/dalapp16/orcallator-2002-08-17-000  12:55:02      0.00      0.40      0.10     99.50
/tmp/dalapp16/orcallator-2002-08-17-000  13:00:02      0.00      0.40      0.10    100.50
/tmp/dalapp16/orcallator-2002-08-17-000  13:05:02      0.10      0.50      0.20    100.30
/tmp/dalapp16/orcallator-2002-08-17-000  13:10:02      0.00      0.40      0.10    100.50
/tmp/dalapp16/orcallator-2002-08-17-000  13:15:02      0.00      0.40      0.10     99.50
/tmp/dalapp16/orcallator-2002-08-17-000  13:20:02      0.00      0.40      0.10    100.50
/tmp/dalapp16/orcallator-2002-08-17-000  13:25:02      0.00      0.40      0.10    100.50
/tmp/dalapp16/orcallator-2002-08-17-000  13:30:02      0.00      0.40      0.00     99.60
/tmp/dalapp16/orcallator-2002-08-17-000  13:35:02      0.00      0.40      0.10    100.50
/tmp/dalapp16/orcallator-2002-08-17-000  13:40:02      0.00      0.40      0.10    100.50
/tmp/dalapp16/orcallator-2002-08-17-000  13:45:02      0.00      0.40      0.10     99.50
/tmp/dalapp16/orcallator-2002-08-17-000  13:50:02      0.00      0.40      0.10     99.40
/tmp/dalapp16/orcallator-2002-08-17-000  13:55:02      0.00      0.40      0.10    100.50
-------------- next part --------------
[Attached plot dfwusa00.png not preserved in the archive; data for dfwusa00 follows]
                                Machine  locltime      usr%      sys%      wio%     idle%
/tmp/dfwusa00/orcallator-2002-08-17-001  12:30:01      0.00      0.40      0.10    100.50
/tmp/dfwusa00/orcallator-2002-08-17-001  12:35:01      0.00      0.40      0.10    100.50
/tmp/dfwusa00/orcallator-2002-08-17-001  12:40:01      0.00      0.40      0.10    100.40
/tmp/dfwusa00/orcallator-2002-08-17-001  12:45:01      0.00      0.40      0.10    100.50
/tmp/dfwusa00/orcallator-2002-08-17-001  12:50:01      0.10      0.50      0.20    100.30
/tmp/dfwusa00/orcallator-2002-08-17-001  12:55:01      0.00      0.40      0.10    100.40
/tmp/dfwusa00/orcallator-2002-08-17-001  13:00:01      0.00      0.40      0.20    100.40
/tmp/dfwusa00/orcallator-2002-08-17-001  13:05:01      0.30      0.70      0.20     99.80
/tmp/dfwusa00/orcallator-2002-08-17-001  13:10:01      0.00      0.40      0.20    100.30
/tmp/dfwusa00/orcallator-2002-08-17-001  13:15:01      0.00      0.40      0.10    100.50
/tmp/dfwusa00/orcallator-2002-08-17-001  13:20:01      0.10      0.50      0.10     99.40
/tmp/dfwusa00/orcallator-2002-08-17-001  13:25:01      0.00      0.40      0.10    100.50
/tmp/dfwusa00/orcallator-2002-08-17-001  13:30:01      0.00      0.40      0.10    100.50
/tmp/dfwusa00/orcallator-2002-08-17-001  13:35:01      0.00      0.40      0.10    100.50
/tmp/dfwusa00/orcallator-2002-08-17-001  13:40:01      0.00      0.40      0.20    100.40
/tmp/dfwusa00/orcallator-2002-08-17-001  13:45:02      0.00      0.40      0.10     99.40
/tmp/dfwusa00/orcallator-2002-08-17-001  13:50:02      0.10      0.40      0.20    100.30
/tmp/dfwusa00/orcallator-2002-08-17-001  13:55:02      0.00      0.40      0.10     99.40
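The row sums in data like this can be checked mechanically.  A minimal C
sketch, assuming each line has the six whitespace-separated columns shown
above (file, locltime, usr%, sys%, wio%, idle%); row_sum is a hypothetical
helper and is not part of orcallator_column.pl:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical checker for rows in the layout above:
 *   <file> <locltime> <usr%> <sys%> <wio%> <idle%>
 * Returns true and stores the column sum and idle% when the line parses.
 * The header line fails to parse because "usr%" is not a number. */
static bool row_sum(const char *line, double *sum, double *idle)
{
    double usr, sys, wio, idl;

    if (sscanf(line, "%*s %*s %lf %lf %lf %lf", &usr, &sys, &wio, &idl) != 4)
        return false;
    *sum = usr + sys + wio + idl;
    *idle = idl;
    return true;
}
```

Applied to the first dalapp16 row, it reports a sum of about 101.0 with idle%
at 100.5; any row whose sum exceeds 100 is a candidate for the blank periods
in the graphs.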

