This is a FAQ (Frequently Asked Questions) for Orca and the tools that gather data for it. Please email submissions to the FAQ to orca-users@orcaware.com. # $HeadURL: http://svn.orcaware.com:8000/repos/trunk/orca/FAQ $ # $LastChangedDate: 2003-10-07 10:49:43 -0700 (Tue, 07 Oct 2003) $ # $LastChangedBy: blair $ # $LastChangedRevision: 262 $ General ------- 1.1) What is the m, k, u, or some other character following the numbers in the Y axis scale? 1.2) Why is my Y axis scale have such large numbers? 1.3) Why are there random characters at the end of my HTML and GIF or PNG images names, i.e. o_host3_disk_runp_c0t6d0...disk_runp_c-4QyP2ziXlrwXj8eG_n_A.html? 1.4) What should I use, NFS or rsync, to get my data from my clients to the Orca server? Should I push my data to the server from the clients or have my server pull my data? 1.5) How should I set up ssh access securely without entering a password everytime a process needs to contact a remote system? Warning Messages ---------------- 2.1) Number of columns in line '1,2,3.....' of ../orcallator/.../percol-2000-09-26 does not match column description. 2.2) Warning: file `../orcallator/.../temp-percol-2001-02-22' was current and now is not. 2.3) Warning: cannot create Orca::HTMLFile object: cannot open `/home/orca_html/o_host1-monthly.html.htm' for writing: Too many open files. 2.4) Warning: file `.../orcallator/host1/orcallator-2001-11-06-000' did exist and is now gone. Solaris/Orcallator.se --------------------- 3.1) What do I do about the error "/opt/RICHPse/bin/se: Unsupported platform: sparcv9 SunOS 5.8"? 3.2) Why does orcallator.se die, in particular when I log out of the system that I started it on? 3.3) Why is the page scan rate is always zero even though sar says it is not? 3.4) Why are all the NFS server statistics zero? 3.5) Why are my Ethernet bits/second measurements all zero? 3.6) Why do I get the error "Fatal: member: txunderruns vanished!: Near line 201"? 3.7) Why do I get the error "Fatal: member: txunderruns0 vanished!: Near line 255"? 3.8) Why do I get the error "Fatal: member: framming vanished!: Near line 160"? 3.9) Why do I get the error "Fatal: member: drop vanished!: Near line 285"? 3.10) Why do I get the error "Fatal: member: XXXXX vanished!: Near line YYY"? 3.11) What do I do when I get the message "Fatal: subscript: 2 out of range for: GLOBAL_net[2]: Near line 178"? 3.12) Why don't I get plots for my Veritas filesystems? 3.13) Why don't I get plots for my RSM filesystems? 3.14) Why don't I get any Interface Bits Per Second data for my qe board? 3.15) Orcallator.se core dumps. 3.16) Why should I keep my compressed percol-* or orcallator-* files? 3.17) Why do my Orca plots no longer contain any data after I change anything related to orcallator, such as the subsystems to measure, or when something changes on the system, such as mount points, ethernet devices, etc? General ------- 1.1) What is the m, k, u, or some other character following the numbers in the Y axis scale? This is SI magnitude symbol to scale the number by. Here is a table of symbols and scaling factors. a 10e-18 Ato f 10e-15 Femto p 10e-12 Pico n 10e-9 Nano u 10e-6 Micro m 10e-3 Milli k 10e3 Kilo M 10e6 Mega G 10e9 Giga T 10e12 Terra P 10e15 Peta E 10e18 Exa So if you see "250 m" in the Y axis, this means 0.25. The page http://www.physlink.com/reference_dprefixes.cfm is also a good reference for this information. 1.2) Why is my Y axis scale have such large numbers? See question 1) above. Most likely there is a scaling letter after your number that means the true value is several orders of magnitude smaller than you think it is. 1.3) Why are there random characters at the end of my HTML and GIF or PNG images names, i.e. o_host3_disk_runp_c0t6d0...disk_runp_c-4QyP2ziXlrwXj8eG_n_A.html? The way Orca generates HTML and image filenames uses all of the input data sources. With plots that contain a large amount of different data, the filename can exceed the maximum filename length for the operating system and Orca will not be able to create the file. The solution is to limit the filename length to less than 255 characters. This could be performed by simply trimming the filename down to less than 255 characters, but then the filename may not be unique and two distinct HTML files and/or images may end up being written to the same file. The solution used by Orca is to calculate the MD5 hash of the full length filename, trim the filename down and insert the MD5 into the short filename, which will guarantee uniqueness. 1.4) What should I use, NFS or rsync, to get my data from my clients to the Orca server? Should I push my data to the server from the clients or have my server pull my data? [Answer written by Sean O'Neill .] Yeah, NFS is a total pain for more reasons than just security. rsync is the way to go. By default, it uses ssh as it's transport application vs. rsh. Don't use rsh for the obvious reasons. But you need to really think about what this means in regards to security. First, your security group is probably going to be nervous about anything that allows for unattended password-less access between servers. But you also need to figure out if you want to PUSH or PULL your Orca data. If you have a Orca server that ssh's into the remote systems and rsync's down the data (e.g. PULL), this one machine would have ssh access to LOTS of other systems and would probably make any security group very nervous about that machine. If you have the remote systems rsync their data to the Orca server (e.g. PUSH), then you have lots of other machines with ssh access to ONE system. This generally makes security a /little/ less nervous. Some folks on the list have multiple Orca servers because of the system resources required by Orca. Its a CPU/memory hog at times. Also, pushing data into a box is generally an asynchronous activity (from the Orca server's point of view) so it will take in as many as the box will support. If the Orca server is PULLING data, you need some script to keep track of what systems to pull data from, have logic to make it less serial to get the data down faster, etc etc - e.g. its more of a headache IMHO. 1.5) How should I set up ssh access securely without entering a password everytime a process needs to contact a remote system? To get ssh working, use key authentication. One easy way to use key authentication is to use the keychain tool at http://www.gentoo.org/proj/en/keychain.xml The first keychain article introduces the concepts behind RSA/DSA key authentication and shows you how to set up primitive (with passphrase) RSA/DSA authentication: http://www-106.ibm.com/developerworks/library/l-keyc.html The second article shows you how to use keychain to set up secure, password-less ssh access in an extremely convenient way. keychain also provides a clean, secure way for cron jobs to take advantage of RSA/DSA keys without having to use insecure unencrypted private keys. http://www-106.ibm.com/developerworks/linux/library/l-keyc2/ A third keychain article shows you how to use ssh-agent's authentication forwarding mechanism. http://www-106.ibm.com/developerworks/linux/library/l-keyc3/ Even with these methods, when a system reboots, a person will need to manually log into the system, su into the account, run keychain and enter the passphrase to unlock the RSA/DSA keys. Warning Messages ---------------- 2.1) Number of columns in line '1,2,3.....' of ../orcallator/...../percol-2000-09-26 does not match column description. When Orca sees a line in an input data file that does not have the same number of columns as defined at the top of the file when column_description is `first_line' or does not match the column_description, then Orca will complain and ignore the line. Additionally, Orca will not record the data from this line in the RRD files and no data will be plotted. If this happens when using orcallator.se, then it will happen for orcallator.se versions 1.28b6 or older when hardware, mount points, network interfaces, etc. are added or removed from the system and orcallator.se outputs a different number of columns. To work around this problem, upgrade to version 1.32 or later of orcallator.se and orcallator.cfg at http://www.orcaware.com/orca/pub/ which now create new output data file any time the number of columns or a name of a column changes so that Orca will not complain and all of your data will be plotted. 2.2) Warning: file `../orcallator/.../temp-percol-2001-02-22' was current and now is not. First, Orca considers a file to be current if the file's last modified time is within `late_interval' seconds of the current time. In other words, Orca checks if a process is modifying the file to keep it current. The `late_interval' value is determined by the configuration file or set to the `interval' value if the configuration file does not set `late_interval'. Orca stat()s the file when it first looks for files using the `find_files' and determines that the file is current. Any time after that Orca reads the file, it stat()s the file again and determines if it is current. If there was a previous stat() and the file was current followed by another stat() and the file is not current, then the message is printed. The appearance of this message means that the process that has been updating the file has stopped updating it and this may be worth looking into. This message is also seen when the data gathering program, e.g. orcallator.se, opens a new log file at the end of a day and the old log file is no longer updated. Orca tries to manage this situation when a file is no longer updated at the end of a day. If the actual measurement interval is not consistent with Orca's configuration file "interval" 300 seconds, then Orca's "interval" should be modified to match the actual measurement interval. If you need to do this, then delete the RRD files because they will keep the old "interval" and the input data will need to be reloaded into new RRD files. Increasing the "late_interval" may also remove this error. 2.3) Warning: cannot create Orca::HTMLFile object: cannot open `/home/orca_html/o_host1-monthly.html.htm' for writing: Too many open files. This obviously happens with Orca runs out of open file descriptors. Orca opens many file descriptors to do its work and it doesn't like to close them unless it needs to. The first thing to check is the maximum number of file descriptors each process can have. On some systems, the login shell scripts lower the maximum number of open file descriptors a process may have. To check this in a Csh shell variant (csh, tcsh), then type limit descriptors or for Bourne shell variant (sh, bash), then type ulimit -n On all operating systems Orca should be able to use 256 file descriptors. On some, such as Linux, Orca can open 1024 files at once. If the number you are getting is less than 256, then raise this limit. Some operating systems let you raise the limit. such as Solaris, while others do not, such as Linux. To try to raise the limit, do limit descriptors 1024 or ulimit -n 1024 If these commands do not work, ask your system administrator how to do this. There is a bug in Orca's older than 0.27b2 where Orca would not close a pipe file descriptor that is uncompressing a compressed percol-* file to Orca. If your percol-* files are compressed, then try either upgrading to 0.27b2 or later or apply the patch http://www.orcaware.com/orca/pub/patches/orca-0.26-defunct-processes-patch.txt to Orca 0.26. This should have Orca reduce its file descriptor count. 2.4) Warning: file `.../orcallator/host1/orcallator-2001-11-06-000' did exist and is now gone. Orca prints this message when it found an input data file to read and when it goes to read it, which may be a while later, the file no longer exists. When Orca is being used with orcallator.se, this message may occur when orcallator.se compresses the previous day's percol-* or orcallator-* file that Orca found. Solaris/Orcallator.se --------------------- 3.1) What do I do about the error "/opt/RICHPse/bin/se: Unsupported platform: sparcv9 SunOS 5.8"? There are two solutions. SE 3.2 is now available and you can upgrade to this version. It is available at: http://www.setoolkit.com/ If you are using an SE version less than 3.2, then do this: cd /opt/RICHPse/bin ln se.sparc.5.7 se.sparc.5.8 ln se.sparcv9.5.7 se.sparcv9.5.8 3.2) Why does my background orcallator.se process die, in particular when I log out of the system that I started it on? This sounds like orcallator.se is started under the Bourne shell, which kills background processes unless the process is started with nohup: nohup se orcallator.se & 3.3) Why is the page scan rate is always zero even though sar says it is not? It has been observed that on Solaris 2.5.1 the page scan rate is always zero. This occurs with orcallator.se version 1.23 or older. The problem is that on older versions of SE the p_vmstat.scan variable is an integer and orcallator.se was assuming a double. The fix is to upgrade to orcallator.se 1.24 or newer. 3.4) Why are all the NFS server statistics zero? On Solaris 2.6 there is a bug in the SE toolkit that prevents orcallator.se from getting the NFS server statistics. To fix this, edit your RICHPse/include/kstat.se file and change #ifdef to #if near the top of the file where it says #ifdef MINOR_VERSION >= 70 # define rfs_counter_t uint64_t #else to #if MINOR_VERSION >= 70 # define rfs_counter_t uint64_t #else 3.5) Why are my Ethernet bits/second measurements all zero? On 2.5.1 or older Solaris operating system releases, the kernel does not measure the bits/second going through a particular device. I believe some later kernel and device driver patches may fix this, but Solaris 2.6 and greater definitely does measure this. 3.6) Why do I get the error "Fatal: member: txunderruns vanished!: Near line 201"? This problem occurs with SE 2.5.0.2 and FDDI 5.0 interfaces. There are three possible fixes: 1) Upgrade to SE 3.0. 2) Visit the SE 2.5.0.2 download page and get the FDDI patch. 3) Add the latest patch to FDDI which should reinstate the metric that went missing. 3.7) Why do I get the error "Fatal: member: txunderruns0 vanished!: Near line 255"? This problem occurs with Solaris 2.5 and hme interfaces. There are three possible fixes: 1) Upgrade to a later Solaris release. 2) Get the hme patch for Solaris 2.5. 3) As a temporary work-around, change the member "defer" to "missing1" in the ks_hme_network structure in /opt/RICHPse/include/kstat.se like this: #ifdef MINOR_VERSION >= 51 ulong defer; #else ulong missing1; #endif The next build of SE 3.0 will figure this out automatically. 3.8) Why do I get the error "Fatal: member: framming vanished!: Near line 160"? This problem occurs with Solaris 2.5.1 and le interfaces. The le patch for Solaris 2.5.1 corrects the spelling from framming to framing. SE 3.0 tries to detect this patch, but if you don't have the patch directory in /var/sadm/patch it can't tell that the patch is installed. There are three fixes. 1) Upgrade to Solaris 2.6. 2) Reinstall the le patch for Solaris 2.5.1. 3) Create the directory /var/sadm/patch/103903-03 by hand. 4) As a temporary work-around, run scripts using se -DLE_PATCH script.se to force the update or edit start_orcallator.se and enable the appropriate SE_PATCHES variable. 3.9) Why do I get the error "Fatal: member: drop vanished!: Near line 285"? This has been observed on Solaris 2.6 using SE version 3.0. Upgrade to the latest version of SE. 3.10) Why do I get the error "Fatal: member: XXXXX vanished!: Near line YYY"? This happens when Sun changes some of the kernel names for particular variables that the SE package is expecting to find. For example, one error was that SE has some conditional compile flags that tell it to look for the new name. Try adding these to the command line options for SE or editing start_orcallator.sh to enable some of the defines. 1) -DLE_PATCH=1 2) -DHME_PATCH=1 3) -DHME_PATCH_IFSPEED=1 4) -DMULTICAST_PATCH=1 5) -DROBUST_LISTENQ=1 If a particular define does not fix the problem, then don't run SE with it. 3.11) What do I do when I get the message "Fatal: subscript: 2 out of range for: GLOBAL_net[2]: Near line 178"? Try adding the following string "-DMAX_IF=XYZ" where XYZ is larger than the total number of interfaces you have on the system to se's command line. For example, if you had 9 separate interfaces on the system, then you could edit start_orcallator like this: --- start_orcallator.sh.in.FCS Sat Oct 28 15:04:33 2000 +++ start_orcallator.sh.in Thu May 24 08:55:39 2001 @@ -31,6 +31,7 @@ #SE_PATCHES="$SE_PATCHES -DLE_PATCH" #SE_PATCHES="$SE_PATCHES -DHME_PATCH" #SE_PATCHES="$SE_PATCHES -DHME_PATCH_IFSPEED" +SE_PATCHES="$SE_PATCHES -DMAX_IF=10" # Check if the SE executable was found upon configure. if test -z "$SE"; then 3.12) Why don't I get plots for my Veritas filesystems? Version 1.13 or later of orcallator.se should be able to generate filesystem statistics for Veritas filesystems. The first thing to try is to upgrade to the latest SE and orcallator.se release. 3.13) Why don't I get plots for my RSM filesystems? I don't think orcallator.se records data from RSM disks. However, to make sure, can you look through the first line of the output orcallator files and see if the RSM filesystems are listed there? Look for the disk_runp_ string. If you see the filesystems there, then the Orca configuration file needs to be updated to plot this data. In this case, email me the names of the filesystems as listed in the output file and the Orca configuration file can be updated to plot the data. Otherwise, orcallator.se or the underlying SE header files will need to be modified to find RSM filesystems. Since I don't have access to a system with RSM on it, this is something that somebody else will need to take on. Hint, hint... 3.14) Why don't I get any Interface Bits Per Second data for my qe board? Most likely you are either running Solaris 7 or older or Solaris 8 with SE version 3.1 or older. You must run Solaris 8 to get qe bits per second data. Apply the following patches to include/kstat.se and include/netif.se: *** kstat.se.orig Fri Feb 9 01:44:14 2001 -- kstat.se Fri Feb 9 14:31:46 2001 *************** *** 646,651 **** --- 646,661 ---- ulong_t no_tbufs; ulong_t no_rbufs; ulong_t rx_late_collisions; + #if MINOR_VERSION >= 70 + ulong_t rbytes; + ulong_t obytes; + ulong_t multircv; + ulong_t multixmt; + ulong_t brdcstrcv; + ulong_t brdcstxmt; + ulong_t norcvbuf; + ulong_t noxmtbuf; + #endif }; *** netif.se.orig Fri Feb 9 01:45:06 2001 --- netif.se Fri Feb 9 14:31:10 2001 *************** *** 229,236 **** --- 229,241 ---- nocanput = if_qe.nocanput; defer = if_qe.excess_defer; nocarrier = if_qe.nocarrier; + #if MINOR_VERSION >= 70 + ooctets = if_qe.obytes + if_qe.multixmt + if_qe.brdcstxmt; + ioctets = if_qe.rbytes + if_qe.multircv + if_qe.brdcstrcv; + #else ooctets = 0; ioctets = 0; + #endif break; case NETIF_BF: kstat$bf.number$ = number$ - (if_max[NETIF_QE] + 1); 3.15) Orcallator.se core dumps. Some of the SE include files can cause core dumps if there are oddities on the system. Apply the following patches: --- diskinfo.se.orig Fri Jan 12 11:20:09 2001 +++ diskinfo.se Tue Feb 27 19:37:28 2001 @@ -197,7 +197,12 @@ points_at[n] = '\0'; // chop off the :a at the end - strcpy(strrchr(points_at, ':'), ""); + while (--n >= 0) { + if (points_at[n] == ':') { + points_at[n] = '\0'; + break; + } + } // hack off ../../devices from the start sscanf(points_at, "../../devices%s", &points_at); --- mnt_class.se.orig Fri Jan 12 11:20:09 2001 +++ mnt_class.se Tue Feb 27 19:53:34 2001 @@ -96,7 +96,12 @@ number$ = -1; return; } - strcpy(strchr(buf, '\n'), ""); + for (i=0; i