TNPM/dataload




When dataload goes bad and tries to ping the world

The following runs as root when a discovery is kicked off

qPing *.*.*.* -t 2.0 -r 1 -s 32 -d 10 -d2 1000 -m 0 -b 0 -o /appl/proviso/dataload


Dump everything that the Dataload knows about / is working on

dialogTest2 Model Dump

Special characters the dataload can't handle

The discovery may time out at 7,200 seconds without completing. This is typical when the SNMP DL and Discovery server encounter an interface (sub-element) whose ifDescr OID value includes special characters.

Check the ifDescr OIDs on the device that failed to be discovered and make sure that no interface description includes any of the symbols listed below, or combinations of them such as two sequential exclamation marks:

"!!" or " " (space); ! ; @ ; # ; $ ; % ; ^ ; & ; * ; < ; > ; ? ; , ; + ; | ; [ ; ] ; } ; {

The parameters in Proviso 4.4.3x that can be used to define the interval to restart watchd

Technote (FAQ)

Question

Proviso 4.4.1x used parameters to define the interval for restarting watchd: GLOBAL.WATCHMGR.WATCHDOG.PULSEINTERVAL, for how often to sense the pulse, and GLOBAL.WATCHMGR.WATCHDOG.MAXPULSEINTERVAL, for how long to wait before trying to restart the collector. Which parameters in Proviso 4.4.3x can be used to define the interval to restart the watchd daemon?

Answer

The analogs of the Proviso 4.4.1x parameters GLOBAL.WATCHMGR.WATCHDOG.PULSEINTERVAL and GLOBAL.WATCHMGR.WATCHDOG.MAXPULSEINTERVAL in Proviso 4.4.3x are the properties named PERIOD and MAXFLATLINE.

PERIOD - Frequency with which to send the heartbeat signal. The PERIOD value specifies, in seconds, how often the collector communicates with the watchdog process and sends the heartbeat signal. The default is 30 seconds. In the Topology Editor it is located under Module: Data Channel, Component: Global Data Channel Properties, Location: Advanced Properties tab.

MAXFLATLINE - Maximum time to wait between two heartbeat signals. The MAXFLATLINE value specifies the maximum amount of time, in seconds, the watchdog process will wait between two heartbeat signals. If that time is exceeded, the watchdog process attempts to recycle the collector process. The default is 45 seconds. In the Topology Editor it is located under Module: Data Channel, Component: Global Data Channel Properties, Location: Advanced Properties tab.


Proviso 4.4.3x: Changing the DATAMANAGER.FC_QUOTA size for a single SNMP DL

Technote (FAQ)

Question

My SNMP DL stops creating .BOF files after the accumulated file size in the /output directory reaches 1 GB. How can I increase the value of SNMP.DATAMANAGER.FC_QUOTA using the TE?

Answer

In order to increase the value of SNMPx.x.DATAMANAGER.FC_QUOTA, which allows the SNMP DL to keep creating, writing, and storing .BOF files beyond the default size of 1 GB in the /output directory (taking SNMP.1.1 as the example), do the following:

1) As the root user, start the Topology Editor and load the existing topology from the database.

2) In the Logical View tab find Data Channels, then click Custom Data Channel Component.

3) In the menu bar in the upper right corner of the TE, find the + sign and click it. This starts a dialog window with the fields Name, Value, and Alias.

4) In the fields, type Name: SNMP.1.1.DATAMANAGER.FC_QUOTA, Value: 2000000000, Alias: SNMP.1.1.DATAMANAGER.FC_QUOTA. Note: use the maximum value, in bytes, that you want the SNMP DL to reach but not exceed.

5) Click Finish, save topology.xml, and run the Deployer for installation.

6) Bounce all DC components using the command: dccmd bounce all

7) Restart pvmd for the SNMP.1.1 DL from <DL_HOME>/bin (for example /opt/proviso/dataload/bin): pvmdmgr stop, then pvmdmgr start.

To check that the SNMP Data Load was updated with the new quota after startup:

$ cd <DL_HOME>/bin
$ ./statGet -l stats -o Data Manager


The output should include the line:

Data Manager:<none>:Filesystem quota (KB) immediate:1953125

which is 2,000,000,000 / 1024 = 1953125.

The SNMP DL will now be able to accumulate up to 2 GB of .BOF files in the /output directory before it stops producing them.
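To see how close a collector actually is to its quota, compare the on-disk size of the output directory against the quota (KB) reported by statGet. A one-line sketch; the output directory path is an assumption, adjust to your DL:

du -sk /appl/proviso/dataload/SNMP.1.1/output    # on-disk .BOF usage in KB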


Check if a Dataload patch (IF) has been applied / loaded

Apply Patch

Restart Dataload

./dialogTest2 --version

Check the Component Build; i.e. Maui.133 = IF0028 applied

Change timeout value

There is no configuration parameter for "timeout" in the Formula Editor GUI; the value cascades from the configured timeout values in the snmpConf table for a specific device.


For example, the following resmgr output shows the configured timeout for a specific device:

resmgr -export snmpConf -noHead -colNames "dbIndex name state collector scf.ipaddress scf.timeout scf.retries" -filter "scf.ipaddress(<ip address>)"

where the column "scf.timeout" is the configured timeout for a particular device
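For instance, with a device IP filled in (reusing one of the lab addresses that appears in the snmpGet examples later on this page):

resmgr -export snmpConf -noHead -colNames "dbIndex name state collector scf.ipaddress scf.timeout scf.retries" -filter "scf.ipaddress(172.17.136.73)"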


Functioning of ifSpeed, Utilization and Throughput

Release notes

Abstract

How to determine interface utilization and throughput?

Content

We set the speed of the interface sub-element to the value returned by the ifSpeed OID during inventory. We can also evaluate the speed and classify interfaces as either high or low speed.

Then we use this speed property, or even the current ifSpeed OID value, in our formulas. Utilization uses speed in its calculation.

For utilization we take Total Passed / Total Possible * 100 to make it a percentage. In other words, we take the volume of data passed since the last poll (the delta of ifInOctets) * 8 (to convert it to bits), divided by the delta of time since the last poll (which converts it to bits per second, bps), divided by ifSpeed, * 100 for a percentage.

(((ifInOctets * 8) / 900) / ifSpeed) * 100 = Interface Utilization %.

For Throughput we measure the volume of data passed over time since last poll. So like in the example above we will use ifInOctets.

((ifInOctets * 8) / 900) = bits per second (Throughput)

For throughput we measure the volume of data passed over the time since the last poll, and we then break that down to bits per second, so we can say that over the last poll period you averaged 7 Mbps (megabits per second) through this 10 Mbps interface. That would also break down to 70% utilized, 6,300,000,000 bits, or a total volume of roughly 787 MBytes passed over the 15 minutes.

(These are not exact numbers, just an example.)

Since we're using sysUpTime.0 (the number of hundredths of a second the router has been up), we actually calculate the following:

(Poll Time.1 - Poll Time.0) / 100 = seconds elapsed

Poll Value.1 - Poll Value.0 = Bytes passed


So we take the bytes passed and divide by the seconds elapsed to get bytes per second. We convert bytes to bits (multiply by 8).

Now we have the bits per second.

Take the bits per second and divide by property.ifSpeed and you have the utilization ratio. We then multiply by 100 to get the percent utilization.
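Putting the whole calculation together as a shell sketch, with made-up sample values (two polls 900 seconds apart on a 10 Mbps interface):

#!/bin/bash
# Hypothetical sample values: two polls of sysUpTime.0 (hundredths of a
# second) and ifInOctets (bytes), plus the interface's ifSpeed (bits/s).
t0=103246500 ; t1=103336500      # (t1-t0)/100 = 900 seconds elapsed
o0=1200000000 ; o1=1987500000    # o1-o0 = 787,500,000 bytes passed
ifSpeed=10000000                 # 10 Mbps interface

secs=$(( (t1 - t0) / 100 ))      # 900 seconds
bps=$(( (o1 - o0) * 8 / secs ))  # 7,000,000 bits per second (throughput)
util=$(( bps * 100 / ifSpeed ))  # 70 (% utilization)
echo "throughput=$bps bps, utilization=$util%"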



Debug a collection

Setting debug level to 6 for a particular task ID:

1) Start with: statGet > statGet.out

2) Then grep that output for the target device (probably the sub-element too) and the formula.

For example, if we are looking for element 9.42.27.118 and the formula "Inbound Errors":

[pvuser@tiger2:/tmp] grep "9.42.27.118" statGet.out | grep "Inbound Errors"
+ [15] ID 1016,{CAL none (peri=900)(next=2012/03/29 18:30:00)}(P2) ASLEEP: [Service FormLite] ... Elmt=9.42.27.118;Metrics={Inbound Errors,Unknown Protocols};[Sub Elmts]={qarouter1.tivlab.raleigh.ibm.com_If<1>}
[pvuser@tiger2:/tmp]
The task ID: 1016

3) Then enable debug on just that task:

/opt/dataload/contribs/dialogTest2 debug 6.{Task Number}

for example: /opt/dataload/contribs/dialogTest2 debug 6.1016
Run the task at this debug level until we see the peak value, then change it back to normal.

4) Unset the debug by running the following:

$PVMHOME/contribs/dialogTest2 debug d.{task id}

for example: /opt/dataload/contribs/dialogTest2 debug d.1016
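The steps above can be wrapped into a small helper. A sketch; the script name and paths are illustrative, and it assumes dataLoad.env has been sourced so statGet is on the PATH:

#!/bin/bash
# set_task_debug.sh (hypothetical) - look up the task ID for an element +
# formula and set debug level 6 on just that task.
ELMT=$1        # e.g. 9.42.27.118
FORMULA=$2     # e.g. "Inbound Errors"
statGet > /tmp/statGet.out
TASK=`grep "$ELMT" /tmp/statGet.out | grep "$FORMULA" | sed 's/.*ID \([0-9]*\),.*/\1/' | head -1`
echo "Task ID: $TASK"
/opt/dataload/contribs/dialogTest2 debug 6.$TASK
# when finished: /opt/dataload/contribs/dialogTest2 debug d.$TASK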

Admin tools

Some tools we've developed and used over the years to manage many (>20) dataloads :)


Cluster Shell

Reminder: lives in /appl/proviso/data/tools

Run Cluster SSH to open a session to all dataloads.

Will need cssh (cssh.pl).



Relies on a .csshrc file that lives in $HOME.

Need to modify the ssh command to remove '-o ConnectTimeout=10', which isn't supported, and add the path to the xterm command.


#!/bin/bash
# Open shells on all dataloads
# ENHANCED VERSION....
# 201308 - stolen from the new 'run_command_on_dataloaders.sh'
# usage: open_shell_to_all_dataloads.sh


printf "Gathering information, please wait\n"

. /appl/proviso/datachannel/dataChannel.env
dcoutput=`dccmd debug CMGR "self dbCfgPrint" | egrep -i "$FTE.*SOURCE.*SNMP" | cut -d "@" -f2 | sort -n -t "." -k 3`


# Takes a command to run and runs it on each dataloader...
dl_list=""
for entry in $dcoutput
do
    # will look like : STGPROVISODL2//appl/proviso/dataload/SNMP.1.2/output
    host=`echo $entry | cut -d"/" -f1`
    dl_list="$dl_list $host"
done
printf "Dataload List is : $dl_list\n"
printf "Executing cssh\n"
/appl/proviso/data/tools/cssh.pl $dl_list

.csshrc

auto_quit=yes
command=
comms=ssh
console_position=
extra_cluster_file=
history_height=10
history_width=40
key_addhost=Control-Shift-plus
key_clientname=Alt-n
key_history=Alt-h
key_paste=Control-v
key_quit=Control-q
key_retilehosts=Alt-r
max_addhost_menu_cluster_items=6
max_host_menu_items=30
menu_host_autotearoff=0
menu_send_autotearoff=0
method=ssh
mouse_paste=Button-2
rsh_args=
screen_reserve_bottom=60
screen_reserve_left=0
screen_reserve_right=0
screen_reserve_top=0
send_menu_xml_file=/appl/proviso/.csshrc_send_menu
show_history=0
ssh=/bin/ssh
ssh_args= -x
telnet_args=
terminal=/usr/openwin/bin//xterm
terminal_allow_send_events=-xrm '*.VT100.allowSendEvents:true'
terminal_args=
terminal_bg_style=dark
terminal_colorize=1
terminal_decoration_height=10
terminal_decoration_width=8
terminal_font=fixed
terminal_reserve_bottom=0
terminal_reserve_left=5
terminal_reserve_right=0
terminal_reserve_top=5
terminal_size=80x24
terminal_title_opt=-T
title=CSSH
unmap_on_redraw=no
use_hotkeys=yes
window_tiling=yes
window_tiling_direction=right


Run Command on dataloads

Reminder: lives in /appl/proviso/scripts


#!/bin/bash
# Run command on all SNMP dataloads
# ENHANCED VERSION....
# 201307 - Neil and Tomas
# usage: run_command_on_dl.sh "command && command2 && command3"

# REMINDER: x && y, if x gracefully exits run y
# REMINDER: x ; y, no matter how x goes, run y

command=$1

printf "Gathering information, please wait\n"

. /appl/proviso/datachannel/dataChannel.env
dcoutput=`dccmd debug CMGR "self dbCfgPrint" | egrep -i "$FTE.*SOURCE.*SNMP" | cut -d "@" -f2 | sort -n -t "." -k 3`

printf "\nRunning command: $command\n"

# Takes a command to run and runs it on each dataloader...

for entry in $dcoutput
do
    # will look like : STGPROVISODL2//appl/proviso/dataload/SNMP.1.2/output
    host=`echo $entry | cut -d"/" -f1`
    dlnum=`echo $entry | sed "s/.*SNMP/SNMP/" | sed "s/\/.*//"`
    printf "\n############# $dlnum\t$host     #############\n "
    dl_path=`echo $entry | cut -d"/" -f2- | sed "s/\/SNMP.*//"`
    dl_cmd=". $dl_path/dataLoad.env && $command"
    ssh -q $host "$dl_cmd"
done

The above script makes life easy; restarting all DLs becomes as simple as

./run_command_on_dataloaders.sh "pvmdmgr stop; sleep 5; pvmdmgr start"
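The same wrapper works for any per-DL check, since it sources each DL's dataLoad.env before running the command. For example, a quick look at every collector's scheduler health:

./run_command_on_dataloaders.sh "statGet -l stats -o Scheduler | grep 'Threads Availability'"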


Dataload Watchdog interval

Each dataload has its own watchdog; the following settings can be set to change how often it tries to restart the DL (i.e. when it is too busy).

Parameters used for defining the interval are:

GLOBAL.WATCHMGR.WATCHDOG.PULSEINTERVAL: how often to sense the pulse.
GLOBAL.WATCHMGR.WATCHDOG.MAXPULSEINTERVAL: how long to wait before trying to restart the collector.

Simulate a Virtual Device (only for discovery formula testing)

We can use `simTool` to 'walk' the MIB structure of an SNMP agent on a device, redirect the output to a file, and then load it onto a Collector for further analysis, such as troubleshooting Discovery formulae.

1) Create the file by running `simTool` against the device and redirect the output 
$ ../dataload/contribs/simTool walk {IP} 1 -c {Read Community} -V V2c -b > simtool_file.txt



the above command would 'walk' the entire MIB on the device


For a full list of command options, please check the 'help' page by typing 
$ ../dataload/contribs/simTool -?
2) Load the file.

Place the simulation file under $PVMHOME/tmp.

The full path will be (in our example): /opt/dataload/tmp/simtool_file.txt

The naming of the file is not important. The IP address, SNMP port, and read community are set by the format of the key the simulation will be attached to. Changing the SNMP port or the community is required by some use cases around the Discovery server, but serves no purpose for formula testing, so these are left at the default values (port=161, community=public). The key is then just the IP address we want to simulate (which can differ from the original IP address the simulation was taken from).
We'll use IP = 10.0.0.1 in this example.

$ ../dataload/contribs/simTool load 10.0.0.1 /opt/dataload/tmp/simtool_file.txt -S localhost -P 3002
file '/opt/dataload/tmp/simtool_file.txt' sucessfully loaded into key '10.0.0.1'.

Verify that the simulation is present with the 'show' command

$ ../dataload/contribs/simTool show -S localhost -P 3002

10.0.0.1 /opt/dataload/tmp/simtool_file.txt

Note that if the SNMP collector is at debug level 6, you'll start seeing messages like this in SNMP.log:

1228334100 2008.12.03-19.55.00 SNMP.1.1-10782:1917 4 ELECTED_SNMPJOB About to start Big SNMPJob on agent 10.0.0.1:161 with: SIMULATED, R/W='public/private', t/r=2.00/2


The word SIMULATED indicates that NO SNMP traffic will take place for that task (even for IPs that are/were valid on the network), and that all data will be taken from the simulations currently loaded.

In this example, if a polled IP (9.34.239.114, say) does not match a key from the 'simTool show' output, then an SNMP_TIMEOUT will be assumed and, very likely, no data will be produced by the formulas.

3) Test the 'virtual' device by running snmpGet:
$ snmpGet 10.0.0.1 sysName.0

Caveats

Virtual devices created by `simTool` are useful for Discovery formulae testing and development, but are not recommended for running collections against, as simTool only supports 'static' OIDs and won't work with SNMP timetick-based ones, such as deltas of sysUpTime.0.
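The whole capture-and-replay cycle, as a sketch (paths and the 10.0.0.1 key follow the example above; the real device IP and community are placeholders):

#!/bin/bash
# Capture a real device's MIB and replay it as virtual device 10.0.0.1.
REAL_IP={IP} ; COMMUNITY={Read Community}
../dataload/contribs/simTool walk $REAL_IP 1 -c "$COMMUNITY" -V V2c -b > /opt/dataload/tmp/simtool_file.txt
../dataload/contribs/simTool load 10.0.0.1 /opt/dataload/tmp/simtool_file.txt -S localhost -P 3002
../dataload/contribs/simTool show -S localhost -P 3002   # confirm the key is loaded
snmpGet 10.0.0.1 sysName.0                               # the virtual device should answer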

SNMP COLLECTOR FAILS TO LOAD MIB FILES FOR OIDS CONTAINING NUMBERS IN THEIR NAME IV45805 2013-12-23

Errors like 'GYMDL30116W Empty or missing MIB file' were generated by all dataloads within the environment. The source MIB had already been recompiled, and recompiling triggered a confReload pvmMibs automatically, but the error still occurs.

1.3.1.0-TIV-TNPM-IF0068


Conf reload not reloading dataloads IV45083

Problem Description: Conf Reload not reloading the changes
TNPM Version: Tivoli Netcool Performance Manager 1.3.1
Operating System: Sun Solaris

The data loader is not able to poll the updated metric following changes made to an existing one, in spite of running confReload as below; collection still happens as per the old formula data.

confReload PVMFormulas
confReload PVMRequests
confReload PVMMibs

If updated formula changes do not take effect, a pvmdmgr stop and pvmdmgr start fixes it, and from then on collection happens as per the changes. However, restarting the pvmdmgr process on a live environment for every metric change is not advisable, because the metric may be polled from multiple loaders and the restart would have to be done on all of them, which involves a lot of risk. Instead, Conf Reload should do the refresh. Hence, please look into this.

1.3.1.0-TIV-TNPM-IF0068 1.3.2.0-TIV-TNPM-IF0053

1.3.1.0-TIV-TNPM-IF0068 also fixes IV51038 - 2013-12-23

The IP Address caching mechanism is not thread safe. There is a race condition when refreshing the cache in which IP Address objects are being deleted before all references are relinquished.

The problem is with "IPAddress::getIPAddress"






Untested from here -->

snoop command

snoop -o /tmp/snoop4.cap -rvV 172.20.7.136 port 162

The above command captures in verbose mode all packets sent to/from 172.20.7.136 via port 162.
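To replay the capture later, snoop can read the file back with -i; for example:

snoop -i /tmp/snoop4.cap -v | more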

statget Commands

statGet -l stats -o Targets
statGet -l stats -o Scheduler
statGet -l stats -o Targets -i Total
statGet -l stats -o Data Manager
statGet -l stats -o Targets -c "SNMP Availability (%) last hour"


TNPM snmpGets

date -u; snmpGet 172.17.136.73 ifHCOutOctets.* -m rfc2233-IF-MIB.oid; date -u
date -u; snmpGet 172.17.136.73 sysName.0; date -u

NET SNMP snmpgets

date -u; snmpget -v2c 172.17.136.73 ifHCOutOctets.13 -c HuhaFuDAt3AgU4A2; date -u
date -u; snmpbulkget -v2c 172.17.136.73 ifHCOutOctets.* -c HuhaFuDAt3AgU4A2; date -u
date -u; snmpbulkwalk -v2c 172.17.136.73 ifHCOutOctets.* -c HuhaFuDAt3AgU4A2; date -u

View Scheduler Load

source /opt/dataload/dataLoad.env; statGet -l stats -o Scheduler

Number of devices the DL needs to monitor

$ ../dataload/bin/statGet -l stats -o Targets -c "SNMP Availability (%) last hour" | grep -v ": -" |wc -l


Average SNMP Availability for all the devices over the last hour:

$ ../dataload/bin/statGet -l stats -o Targets -c "SNMP Availability (%) last hour" | grep "_Total"
Targets:_Total:SNMP Availability (%) last hour:0.66079295


Important parameters for the DL Scheduler

$ ../dataload/bin/statGet -l stats -o Scheduler
Scheduler::Collections Priority Mode (Nb) immediate:0
Scheduler::Execute external requests (Nb) immediate:1
Scheduler::Execute internal requests (Nb) immediate:1
Scheduler::Items Processed Average (Nb) last 24 hours: -
Scheduler::Items Processed Average (Nb) last hour: -
Scheduler::Items Processing Rate (Nb/s) last 24 hours: -
Scheduler::Items Processing Rate (Nb/s) last hour: -
Scheduler::Items Scheduled (Nb) immediate:53325
Scheduler::Max Items Scheduled (Nb) immediate:96000
Scheduler::Overflow Risk Ratio (%) last hour:2585.09977947
Scheduler::Queue Max Size (Nb) last 24 hours:53245
Scheduler::Queue Max Size (Nb) last hour:53244
Scheduler::Queue Size (Nb) immediate:53052
Scheduler::Threads (Nb) immediate:92
Scheduler::Threads Availability (%) last 24 hours:5.27955552
Scheduler::Threads Availability (%) last hour:3.82929909
Scheduler::internal errors (Nb) cumul:0
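The two figures worth watching together are Items Scheduled against Max Items Scheduled. A sketch that turns them into a fill percentage (assumes dataLoad.env has been sourced so statGet is on the PATH):

statGet -l stats -o Scheduler | awk -F: '
  /^Scheduler::Items Scheduled/     { cur = $NF }
  /^Scheduler::Max Items Scheduled/ { max = $NF }
  END { if (max > 0) printf "scheduler queue fill: %.1f%%\n", cur * 100 / max }'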


DLPERFSUMMARY Messages

The SNMP Data Load component produces a "DLPERFSUMMARY" log message at the top of every hour that details the prior hour's performance metrics. A typical example:
 1301994068 2011.04.05-09.01.08 SNMP.1.1-11888:0 I DL31050 DLPERFSUMMARY 2011.04.05-09.00.00 items: 1080, threadAvail: 100, overflowRisk: 1, expMeas: 12597, prodMeas: 7223, respTime: 156, snmpSuccess: 95, outPduDisc: 0, inPduDisc: 0, outPduTot: 2958, inPduTot: 2524, upTime: 1038061, quotaAvail: 2883463, memUsed: 110157824, subElmts: 251, metrics: 1483, requests: 405, dbAvail: 100


What each field means can be found in the following table:

Field / Name / Description

1) Timestamp (UNIX format): 32-bit integer, number of seconds since the UNIX epoch.
2) Timestamp (string): String representation of field 1.
3) Process Name, PID: Indicates channel and collector number, and UNIX process id.
4) Message Severity:
F - fatal, typically an unrecoverable error
E - error, unexpected condition affecting operation
W - warning, error condition, may affect operation
I - informational messages
1 - debug, some detail
2 - debug, more detail
3 - debug, most detail
>3 - trace level, extremely detailed
5) Message Id: A numeric message identifier.
6) Message Tag: A string-based message identifier.
7) Timestamp (data hour): The data hour just completed, in local time.
8+9) items: Number of periodic tasks the SNMP Data Load is performing.
10+11) threadAvail: A percentage indicator reflecting the ability of the SNMP Data Load to meet the schedule of tasks. 100% means all tasks are completed on schedule.
12+13) overflowRisk: A percentage indicator, averaged across all tasks, reflecting the likelihood that the SNMP Data Load may not meet scheduled task deadlines. 0% means no risk.
14+15) expMeas: Number of measures (metrics) the SNMP Data Load expects it should produce for the data hour.
16+17) prodMeas: Number of measures (metrics) the SNMP Data Load actually produced for the data hour.
18+19) respTime: Average response time for all polled devices.
20+21) snmpSuccess: Percentage reflecting the success of SNMP requests.
22+23) outPduDisc: Number of PDUs sent for discovery.
24+25) inPduDisc: Number of PDUs received for discovery.
26+27) outPduTot: Total number of PDUs sent, collection and discovery.
28+29) inPduTot: Total number of PDUs received, collection and discovery.
30+31) upTime: Amount of time (seconds) the SNMP Data Load process has been running.
32+33) quotaAvail: Amount of disk space (bytes) available within the configured quota.
34+35) memUsed: Amount of physical memory (bytes) used by the SNMP Data Load.
36+37) subElmts: Number of sub-elements loaded from the database.
38+39) metrics: Number of metrics (formulas) loaded from the database.
40+41) requests: Number of requests loaded from the database.
42+43) dbAvail: Percentage reflecting the availability of the database.
A six-hour sample would look like the following:
1167663669 2007.01.01-15.01.09 SNMP.5.223-20402:0 I DL31050 DL_PERF_SUMMARY 2007.01.01-09.00.00 items: 36266, threadAvail: 100, overflowRisk: 11, expMeas: 1856318, prodMeas: 1845220, respTime: 505, snmpSuccess: 99, outPduDisc: 0, inPduDisc: 0, outPduTot: 202463, inPduTot: 200218, upTime: 1010860, quotaAvail: 976562, memUsed: 1405362176, subElmts: 41744, metrics: 1942, requests: 821, dbAvail: 100
1167667201 2007.01.01-16.00.01 SNMP.5.223-20402:0 I DL31050 DL_PERF_SUMMARY 2007.01.01-10.00.00 items: 36266, threadAvail: 100, overflowRisk: 16, expMeas: 1856316, prodMeas: 1836217, respTime: 392, snmpSuccess: 99, outPduDisc: 1275, inPduDisc: 1275, outPduTot: 202922, inPduTot: 200857, upTime: 1014390, quotaAvail: 976561, memUsed: 1403617280, subElmts: 41744, metrics: 1942, requests: 821, dbAvail: 100
1167670863 2007.01.01-17.01.03 SNMP.5.223-20402:0 I DL31050 DL_PERF_SUMMARY 2007.01.01-11.00.00 items: 36266, threadAvail: 100, overflowRisk: 9, expMeas: 1856302, prodMeas: 1837648, respTime: 514, snmpSuccess: 99, outPduDisc: 0, inPduDisc: 0, outPduTot: 201699, inPduTot: 199505, upTime: 1018054, quotaAvail: 976562, memUsed: 1405378560, subElmts: 41744, metrics: 1942, requests: 821, dbAvail: 100
1167674469 2007.01.01-18.01.09 SNMP.5.223-20402:0 I DL31050 DL_PERF_SUMMARY 2007.01.01-12.00.00 items: 36266, threadAvail: 100, overflowRisk: 9, expMeas: 1856312, prodMeas: 1841711, respTime: 489, snmpSuccess: 100, outPduDisc: 41015, inPduDisc: 40905, outPduTot: 242734, inPduTot: 240676, upTime: 1021660, quotaAvail: 976561, memUsed: 1405394944, subElmts: 41744, metrics: 1942, requests: 821, dbAvail: 100
1167678066 2007.01.01-19.01.06 SNMP.5.223-20402:0 I DL31050 DL_PERF_SUMMARY 2007.01.01-13.00.00 items: 36266, threadAvail: 100, overflowRisk: 14, expMeas: 1855805, prodMeas: 1832211, respTime: 440, snmpSuccess: 99, outPduDisc: 0, inPduDisc: 0, outPduTot: 200775, inPduTot: 198783, upTime: 1025257, quotaAvail: 976365, memUsed: 1405222912, subElmts: 41744, metrics: 1942, requests: 821, dbAvail: 100
1167681664 2007.01.01-20.01.04 SNMP.5.223-20402:0 I DL31050 DL_PERF_SUMMARY 2007.01.01-14.00.00 items: 36266, threadAvail: 100, overflowRisk: 11, expMeas: 1855798, prodMeas: 1836721, respTime: 528, snmpSuccess: 99, outPduDisc: 0, inPduDisc: 0, outPduTot: 201658, inPduTot: 199664, upTime: 1028855, quotaAvail: 976561, memUsed: 1405329408, subElmts: 41744, metrics: 1942, requests: 821, dbAvail: 100


So we've got:

1. At 16:00 (UTC) the inPduDisc and outPduDisc are non-zero; this indicates a network discovery took place during this hour. 1275 PDUs is a relatively small number for a discovery, so the device(s) discovered were either small or not numerous. Note: on the PMG platform, this discovery was for a Juniper ERX device with approx 8000 resources.

2. At 18:01 (UTC) the inPduDisc and outPduDisc are again non-zero, indicating another network discovery took place during this hour. This hour the discovery PDU count is much higher, over 40,000, indicating a much larger Inventory Profile. Note: on PMG this discovery was for a profile with 42 devices, each with 500 interfaces.

3. Also at 18:01 (UTC) we observe that outPduDisc and inPduDisc are not equal; this could indicate either some packet drop or device(s) not responding to SNMP requests. Note: on PMG this reflects the Inventory Profile negotiating the SNMP community and SNMP version, trying V2c and V3.

4. Uptime was increasing across the sampled interval. This indicates that the SNMP Data Load did not fail and require a restart.

5. The ratio of Produced Measures to Expected Measures was nearly 100%.

6. Thread Availability remained constant at 100% across the sampled interval. This indicates that the SNMP Data Load was able to meet the scheduling demand for collection.

7. Overflow Risk was in the 9% - 16% range. This is acceptable and does not indicate any great risk that the SNMP Data Load will fail to complete its task load.

8. Memory Usage was fairly constant. Additional memory is sometimes required by the SNMP Data Load during a network discovery; the SNMP Data Load dynamically allocates and frees memory for discovered objects. The constant values here indicate there are no memory management issues (i.e., no memory leak).

9. Quota Available was fairly constant. Though not a direct indicator of any success/failure or application performance metric of the SNMP Data Load, it indicates that the rest of the Netcool/Proviso sub-channel (FTE, CME) is probably running properly, retrieving and processing the SNMP Data Load output files.
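Since the message is one line per hour, the prodMeas/expMeas ratio is easy to trend straight from the log. A sketch; the log path is an assumption, and the field layout follows the table above:

grep DLPERFSUMMARY /appl/proviso/dataload/SNMP.1.1/log/SNMP.log | awk '{
  for (i = 1; i <= NF; i++) {
    if ($i == "expMeas:")  em = $(i+1) + 0   # "+ 0" drops the trailing comma
    if ($i == "prodMeas:") pm = $(i+1) + 0
  }
  if (em > 0) printf "%s  produced %.1f%% of expected measures\n", $7, pm * 100 / em
}'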


DATAMANAGER.FC_QUOTA

$ ./statGet -l stats -o Data Manager

The output should include the line:

Data Manager::Filesystem quota (KB) immediate:1953125

Which is 2,000,000,000 / 1024 = 1953125.

You can also check that the value propagated properly and is listed in the database, using the following SQL command:

select * from reg$dcconfig where str_path like '%FC_QUOTA%' order by str_path;

Or dump the running configuration from the collector:

/contribs/dialogTest2 Conf Dump All | grep FC_QUO

Run "statGet -o Data Manager -l stats |grep Filesystem|egrep '(quota)'" and multiply result by 1024