TNPM/dataload
Simulate a device for Discovery Formula Testing
When the dataload goes bad and tries to ping the world.
The following runs as root when a discovery is kicked off:
qPing <IP address> -t 2.0 -r 1 -s 32 -d 10 -d2 1000 -m 0 -b 0 -o /appl/proviso/dataload
Dump everything that the Dataload knows about / is working on:
dialogTest2 "Model Dump"
Contents
- 1 Special characters the dataload can't handle
- 2 The parameters in Proviso 4.4.3x that can be used to define the interval to restart watchd
- 3 Proviso 4.4.3x: Changing DATAMANAGER.FC_QUOTA size for a single SNMP DL
- 4 Check if a Dataload patch (IF) has been applied / loaded
- 5 Change timeout value
- 6 Functioning of ifSpeed, Utilization and Throughput
- 7 Debug a collection
- 8 Admin tools
- 9 Dataload Watchdog interval
- 10 Simulate a Virtual Device (only for discovery formula testing)
- 11 SNMP COLLECTOR FAILS TO LOAD MIB FILES FOR OIDS CONTAINING NUMBERS IN THEIR NAME IV45805 2013-12-23
- 12 Conf reload not reloading dataloads IV45083
Special characters the dataload can't handle
The discovery may time out at 7,200 seconds without completing. This is typical when the SNMP DL and Discovery server observe an interface (sub-element) whose ifDescr OID value includes special characters.
Check the ifDescr OIDs on the device that failed to be discovered and make sure that no interface description contains any of the symbols listed below, or combinations of them such as two sequential exclamation marks ("!!"):
"!!" or " " : ! ; @ ; # ; $ ; % ; ^ ; & ; * ; < ; > ; ? ; , ; + ; | ; [ ; ] ; } ; { .
The parameters in Proviso 4.4.3x that can be used to define the interval to restart watchd
Technote (FAQ)
Question
Proviso 4.4.1x used to have parameters to define the interval for restarting watchd: GLOBAL.WATCHMGR.WATCHDOG.PULSEINTERVAL (how often to sense the pulse) and GLOBAL.WATCHMGR.WATCHDOG.MAXPULSEINTERVAL (how long to wait before trying to restart the collector). Which parameters in Proviso 4.4.3x can be used to define the interval to restart the watchd daemon?
Answer
The analogs of the Proviso 4.4.1x parameters GLOBAL.WATCHMGR.WATCHDOG.PULSEINTERVAL and GLOBAL.WATCHMGR.WATCHDOG.MAXPULSEINTERVAL in Proviso 4.4.3x are the properties named PERIOD and MAXFLATLINE.
PERIOD - Frequency with which to send the heartbeat signal. The PERIOD value specifies, in seconds, how often the collector will communicate with the watchdog process and send the heartbeat signal. The default is 30 seconds. In the Topology Editor it is located under Module: Data Channel Component, Global Data Channel Properties, Location: Advanced Properties tab.
MAXFLATLINE - Maximum time to wait between two heartbeat signals. The MAXFLATLINE value specifies the maximum amount of time, in seconds, the watchdog process will wait between two heartbeat signals. If that time is exceeded, the watchdog process attempts to recycle the collector process. The default is 45 seconds. In the Topology Editor it is located under Module: Data Channel, Global Data Channel Properties, Location: Advanced Properties tab.
Proviso 4.4.3x: Changing DATAMANAGER.FC_QUOTA size for a single SNMP DL
Technote (FAQ)
Question
My SNMP DL stops creating .BOF files after the accumulated file size in the /output directory reaches 1 GB. How can I increase the value of SNMP.DATAMANAGER.FC_QUOTA using the Topology Editor (TE)?
Answer
To increase the value of SNMPx.x.DATAMANAGER.FC_QUOTA, which allows the SNMP DL to keep creating, writing and storing .BOF files beyond the default size of 1 GB in the /output directory (for example for SNMP.1.1), do the following:
1. As the root user, start the Topology Editor and load the existing topology from the database.
2. In the Logical View tab, find Data Channels and click Custom Data Channel Component.
3. In the menu bar in the upper right corner of TE, find and click the + sign. A dialog window opens with the fields Name, Value and Alias.
4. In the Name field type: SNMP.1.1.DATAMANAGER.FC_QUOTA; Value: 2000000000 (note: use the maximum size in bytes you want the SNMP DL to reach but not exceed); Alias: SNMP.1.1.DATAMANAGER.FC_QUOTA.
5. Click Finish, save topology.xml and run the Deployer for installation.
6. Bounce all DC components using the command: dccmd bounce all
7. Restart pvmd for the SNMP.1.1 DL from <DL_HOME>/bin (for example /opt/proviso/dataload/bin): pvmdmgr stop; pvmdmgr start
To check that the SNMP Data Load was updated with the new quota after startup:
$ cd <DL_HOME>/bin
$ ./statGet -l stats -o Data Manager
The output should include the line:
Data Manager:<none>:Filesystem quota (KB) immediate:1953125
which is 2,000,000,000 / 1024 = 1953125.
Now the SNMP DL will be able to accumulate up to 2 GB of .BOF files in the /output directory before it stops producing them.
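The KB figure that statGet reports is just the configured byte quota divided by 1024 (integer division), so the conversion is easy to sanity-check before bouncing anything:

```shell
# Convert an FC_QUOTA value in bytes to the KB figure statGet reports.
quota_bytes=2000000000
quota_kb=$(( quota_bytes / 1024 ))
echo "Filesystem quota (KB) immediate:${quota_kb}"   # prints 1953125
```

The same arithmetic in reverse (multiply the reported KB by 1024) recovers the byte value set in the topology.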
Check if a Dataload patch (IF) has been applied / loaded
Apply Patch
Restart Dataload
./dialogTest2 --version
Check the Component Build, i.e. Maui.133 = IF0028 applied
Change timeout value
There is no configuration parameter for "timeout" in the Formula Editor GUI; the value cascades from the configured timeout values in the snmpConf table for a specific device.
For example, the following resmgr output will state the configured timeout for a specific device:
resmgr -export snmpConf -noHead -colNames "dbIndex name state collector scf.ipaddress scf.timeout scf.retries" -filter "scf.ipaddress(<ip address>)"
where the column "scf.timeout" is the configured timeout for a particular device
Functioning of ifSpeed, Utilization and Throughput
Abstract: How to determine interface Utilization and Throughput?
Content: We set the speed of the interface sub-element to the value returned from the ifSpeed OID during inventory. We can also evaluate the speed and classify interfaces as either high- or low-speed.
Then we use this speed property, or even the current ifSpeed OID value, in our formulas. Utilization uses speed in its calculation.
For utilization we take (Total Passed / Total Possible) * 100 to make it a percentage. In other words, we take the volume of data passed since the last poll (the delta of ifInOctets), multiply by 8 (to convert it to bits), divide by the delta of time since the last poll (which converts it to bits per second, bps), divide by ifSpeed, and multiply by 100 for a percentage.
(((ifInOctets * 8) / 900) / ifSpeed) * 100 = Interface Utilization %
For Throughput we measure the volume of data passed over the time since the last poll. As in the example above we will use ifInOctets.
((ifInOctets * 8) / 900) = bits per second (Throughput)
For throughput we measure the volume of data passed over the time since the last poll, and we then break that down to bits per second. So we can say that over the last poll period you averaged 7 Mbps (megabits per second) through this 10 Mbps interface. That would also break down to 70% utilized, with 6.3 Gbits, a total volume of roughly 787 MBytes, passed over the 15 minutes.
(These are not exact numbers, just an example.)
- Since we're using sysUpTime.0 (the number of hundredths of a second the router has been up) we actually calculate the following:
- (Poll Time.1 - Poll Time.0) / 100 = seconds elapsed
Poll Value.1 - Poll Value.0 = bytes passed
So we take the bytes passed and divide by the seconds elapsed to get bytes per second, then convert bytes to bits (multiply by 8).
Now we have the bits per second.
Take the bits per second and divide by property.ifSpeed and you have the utilization ratio. We then multiply by 100 to get the percent utilization.
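The chain above can be checked with a throwaway awk calculation. The poll deltas below are made up to match the 10 Mbps / 70% example:

```shell
# Utilization % = (((delta_octets * 8) / delta_seconds) / ifSpeed) * 100
# Hypothetical sample: 787,500,000 octets over a 900 s poll on a 10 Mbps interface.
awk 'BEGIN {
    delta_octets  = 787500000      # bytes passed since last poll (ifInOctets delta)
    delta_seconds = 900            # poll period in seconds
    ifspeed       = 10000000       # bps, as returned by the ifSpeed OID
    bps  = (delta_octets * 8) / delta_seconds   # volume -> bits per second
    util = (bps / ifspeed) * 100                # ratio -> percent
    printf "throughput: %d bps, utilization: %.0f%%\n", bps, util
}'
```

This prints a throughput of 7,000,000 bps and 70% utilization, matching the worked example.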
Debug a collection
Setting debug level to 6 for a particular task ID:
1) Start with: statGet > statget.out
2) Grep through that output for the target device (probably the sub-element too) and the formula. For example, if we are looking for an element 9.42.27.118 and the formula "Inbound Errors":
[pvuser@tiger2:/tmp] grep "9.42.27.118" statget.out | grep "Inbound Errors"
+ [15] ID 1016,{CAL none (peri=900)(next=2012/03/29 18:30:00)}(P2) ASLEEP: [Service FormLite] ... Elmt=9.42.27.118;Metrics={Inbound Errors,Unknown Protocols};[Sub Elmts]={qarouter1.tivlab.raleigh.ibm.com_If<1>}
[pvuser@tiger2:/tmp]
The task ID is 1016.
3) Enable debug on just that task: /opt/dataload/contribs/dialogTest2 debug 6.{Task Number}
for example: /opt/dataload/contribs/dialogTest2 debug 6.1016
Run the task at this debug level until we see the peak value, then change it back to normal.
4) Unset the debug by running the following: $PVMHOME/contribs/dialogTest2 debug d.{task id}
for example: /opt/dataload/contribs/dialogTest2 debug d.1016
Admin tools
Some tools we've developed / used over the years to manage multiple (>20) dataloads :)
Cluster Shell
Reminder: lives in /appl/proviso/data/tools
Run Cluster SSH to open a session to all dataloads.
You will need cssh (cssh.pl).
It relies on a .csshrc file that lives in $HOME.
You need to modify the ssh command to remove '-o ConnectTimeout=10', which isn't supported, and add the path to the xterm command.
#!/bin/bash
# Open shells on all dataloads
# ENHANCED VERSION....
# 201308 - stolen from the new 'run_command_on_dataloaders.sh'
# usage: open_shell_to_all_dataloads.sh
printf "Gathering information, please wait\n"
. /appl/proviso/datachannel/dataChannel.env
dcoutput=`dccmd debug CMGR "self dbCfgPrint" | egrep -i "$FTE.*SOURCE.*SNMP" | cut -d "@" -f2 | sort -n -t "." -k 3`
# Takes a command to run and runs it on each dataloader...
dl_list=""
for entry in $dcoutput
do
    # will look like : STGPROVISODL2//appl/proviso/dataload/SNMP.1.2/output
    host=`echo $entry | cut -d"/" -f1`
    dl_list="$dl_list $host"
done
printf "Dataload List is : $dl_list\n"
printf "Executing cssh\n"
/appl/proviso/data/tools/cssh.pl $dl_list
.csshrc
auto_quit=yes
command=
comms=ssh
console_position=
extra_cluster_file=
history_height=10
history_width=40
key_addhost=Control-Shift-plus
key_clientname=Alt-n
key_history=Alt-h
key_paste=Control-v
key_quit=Control-q
key_retilehosts=Alt-r
max_addhost_menu_cluster_items=6
max_host_menu_items=30
menu_host_autotearoff=0
menu_send_autotearoff=0
method=ssh
mouse_paste=Button-2
rsh_args=
screen_reserve_bottom=60
screen_reserve_left=0
screen_reserve_right=0
screen_reserve_top=0
send_menu_xml_file=/appl/proviso/.csshrc_send_menu
show_history=0
ssh=/bin/ssh
ssh_args= -x
telnet_args=
terminal=/usr/openwin/bin//xterm
terminal_allow_send_events=-xrm '*.VT100.allowSendEvents:true'
terminal_args=
terminal_bg_style=dark
terminal_colorize=1
terminal_decoration_height=10
terminal_decoration_width=8
terminal_font=fixed
terminal_reserve_bottom=0
terminal_reserve_left=5
terminal_reserve_right=0
terminal_reserve_top=5
terminal_size=80x24
terminal_title_opt=-T
title=CSSH
unmap_on_redraw=no
use_hotkeys=yes
window_tiling=yes
window_tiling_direction=right
Run Command on dataloads
Reminder: lives in /appl/proviso/scripts
#!/bin/bash
# Run command on all SNMP dataloads
# ENHANCED VERSION....
# 201307 - Neil and Tomas
# usage: run_command_on_dl.sh "command && command2 && command3"
# REMINDER: x && y, if x gracefully exits run y
# REMINDER: x ; y, no matter how x goes, run y
command=$1
printf "Gathering information, please wait\n"
. /appl/proviso/datachannel/dataChannel.env
dcoutput=`dccmd debug CMGR "self dbCfgPrint" | egrep -i "$FTE.*SOURCE.*SNMP" | cut -d "@" -f2 | sort -n -t "." -k 3`
printf "\nRunning command: $command\n"
# Takes a command to run and runs it on each dataloader...
for entry in $dcoutput
do
    # will look like : STGPROVISODL2//appl/proviso/dataload/SNMP.1.2/output
    host=`echo $entry | cut -d"/" -f1`
    dlnum=`echo $entry | sed "s/.*SNMP/SNMP/" | sed "s/\/.*//"`
    printf "\n############# $dlnum\t$host #############\n "
    dl_path=`echo $entry | cut -d"/" -f2- | sed "s/\/SNMP.*//"`
    dl_cmd=". $dl_path/dataLoad.env && $command"
    ssh -q $host "$dl_cmd"
done
The above script makes life easy; restarting all DLs becomes as simple as:
./run_command_on_dataloaders.sh "pvmdmgr stop; sleep 5; pvmdmgr start"
Dataload Watchdog interval
Each dataload has its own watchdog. The following settings can be changed to control how often it tries to restart the DL (e.g. when it is too busy).
Parameters used for defining the interval are: GLOBAL.WATCHMGR.WATCHDOG.PULSEINTERVAL: how often to sense the heartbeat. GLOBAL.WATCHMGR.WATCHDOG.MAXPULSEINTERVAL: how long to wait before trying to restart the collector.
Simulate a Virtual Device (only for discovery formula testing)
We can use `simTool` to 'walk' the MIB structure of an SNMP agent on a device, redirect it to a file and then load it on to Collector for further analysis, such as troubleshooting Discovery formulae.
- 1) Create the file by running `simTool` against the device and redirect the output
- $ ../dataload/contribs/simTool walk {IP} 1 -c {Read Community} -V V2c -b > simtool_file.txt
the above command would 'walk' the entire MIB on the device
- For a full list of command options, please check the 'help' page by typing
- $ ../dataload/contribs/simTool -?
- 2) load the file
- Place the simulation file under $PVMHOME/tmp
The full path in our example will be: /opt/dataload/tmp/simtool_file.txt
- The naming of the file is not important. The IP address, SNMP port and read community are set by the format of the key that the simulation will be attached to. Changing the SNMP port or the community is required by some use cases around the Discovery server, but serves no purpose for formula testing, so these are left at the default values (port=161, community=public). The key will then be only the IP address we want to simulate (which can be different from the original IP address the simulation was taken from).
- We'll use IP = 10.0.0.1 in that example
$ ../dataload/contribs/simTool load 10.0.0.1 /opt/dataload/tmp/simtool_file.txt -S localhost -P 3002
file '/opt/dataload/tmp/simtool_file.txt' successfully loaded into key '10.0.0.1'.
Verify that the simulation is present with the 'show' command
$ ../dataload/contribs/simTool show -S localhost -P 3002
10.0.0.1 /opt/dataload/tmp/simtool_file.txt
- Note that if the SNMP collector is at debug level 6, you'll start seeing messages like the following in the SNMP.log:
- 1228334100 2008.12.03-19.55.00 SNMP.1.1-10782:1917 4 ELECTED_SNMPJOB About to start Big SNMPJob on agent 10.0.0.1:161 with: SIMULATED, R/W='public/private', t/r=2.00/2
The word SIMULATED indicates that NO SNMP traffic will take place for that task (even on IPs that are/were valid for the network), and that all data will be taken from the simulations currently loaded.
If an IP (for example 9.34.239.114) does not match a key from the 'simTool show' output, then an SNMP_TIMEOUT will be assumed and, very likely, no data will be produced by the formulas.
- 3) Test the 'virtual' device by running `snmpGet sysName.0`
- $ snmpGet 10.0.0.1 sysName.0
- Caveats
- Virtual devices created by `simTool` are useful for Discovery formulae testing and development, but are not recommended for running collections against, as simTool only supports 'static' OIDs and won't work with SNMP timetick-based ones, such as deltas that rely on 'sysUpTime.0'.
SNMP COLLECTOR FAILS TO LOAD MIB FILES FOR OIDS CONTAINING NUMBERS IN THEIR NAME IV45805 2013-12-23
Errors like 'GYMDL30116W Empty or missing MIB file' are generated by all dataloads within the environment. The source MIB was already recompiled; recompiling automatically triggered a confReload pvmMibs. Nevertheless the error still occurs.
Fixed in: 1.3.1.0-TIV-TNPM-IF0068
Conf reload not reloading dataloads IV45083
Problem Description: Conf reload not reloading the changes. TNPM Version: Tivoli Netcool Performance Manager 1.3.1. Operating System: Sun Solaris.
The data loader is not able to poll the updated metric following changes to an existing one, in spite of running confReload as below; collection still happens as per the old formula data.
confReload PVMFormulas
confReload PVMRequests
confReload PVMMibs
If updated formula changes do not take effect, a pvmdmgr stop and pvmdmgr start makes collection happen as per the changes from then on. However, restarting the pvmdmgr process on a live environment for every metric change is not advisable, because the metric may be polled from multiple loaders and the restart would have to be done on all of them, which involves a lot of risk. Instead, confReload should do the refresh.
1.3.1.0-TIV-TNPM-IF0068 1.3.2.0-TIV-TNPM-IF0053
1.3.1.0-TIV-TNPM-IF0068 also fixes IV51038 (2013-12-23)
The IP Address caching mechanism is not thread safe. There is a race condition when refreshing the cache in which IP Address objects are being deleted before all references are relinquished.
The problem is with "IPAddress::getIPAddress"
Untested from here on:
snoop command
snoop -o /tmp/snoop4.cap -rvV 172.20.7.136 port 162
The above command captures in verbose mode all packets sent to/from 172.20.7.136 via port 162.
statget Commands
statGet -l stats -o Targets
statGet -l stats -o Scheduler
statGet -l stats -o Targets -i Total
statGet -l stats -o Data Manager
statGet -l stats -o Targets -c "SNMP Availability (%) last hour"
TNPM snmpGets
date -u; snmpGet 172.17.136.73 ifHCOutOctets.* -m rfc2233-IF-MIB.oid; date -u
date -u; snmpGet 172.17.136.73 sysName.0; date -u
NET-SNMP snmpgets
date -u; snmpget -v2c 172.17.136.73 ifHCOutOctets.13 -c HuhaFuDAt3AgU4A2; date -u
date -u; snmpbulkget -v2c 172.17.136.73 ifHCOutOctets.* -c HuhaFuDAt3AgU4A2; date -u
date -u; snmpbulkwalk -v2c 172.17.136.73 ifHCOutOctets.* -c HuhaFuDAt3AgU4A2; date -u
View Scheduler Load
source /opt/dataload/dataLoad.env; statGet -l stats -o Scheduler
Number of devices the DL needs to monitor
$ ../dataload/bin/statGet -l stats -o Targets -c "SNMP Availability (%) last hour" | grep -v ": -" |wc -l
Average SNMP Availability for all the devices over the last hour:
$ ../dataload/bin/statGet -l stats -o Targets -c "SNMP Availability (%) last hour" | grep "_Total"
Targets:_Total:SNMP Availability (%) last hour:0.66079295
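The per-device lines can also be averaged directly, without relying on the _Total row. A sketch against made-up sample lines (real input would come from the statGet Targets command above):

```shell
# Average the per-device "SNMP Availability (%) last hour" values from
# statGet output. The sample lines below are hypothetical; real input:
#   statGet -l stats -o Targets -c "SNMP Availability (%) last hour"
cat > /tmp/targets_sample.txt <<'EOF'
Targets:10.0.0.1:SNMP Availability (%) last hour:1.00000000
Targets:10.0.0.2:SNMP Availability (%) last hour:0.50000000
Targets:10.0.0.3:SNMP Availability (%) last hour: -
EOF
# Skip devices reporting "-" (no data), as in the device-count one-liner above,
# then average the last colon-separated field
grep -v ": -" /tmp/targets_sample.txt \
  | awk -F: '{ sum += $NF; n++ } END { if (n) printf "%d devices, avg %.2f\n", n, sum/n }'
```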
Important parameters for the DL Scheduler
$ ../dataload/bin/statGet -l stats -o Scheduler
Scheduler::Collections Priority Mode (Nb) immediate:0
Scheduler::Execute external requests (Nb) immediate:1
Scheduler::Execute internal requests (Nb) immediate:1
Scheduler::Items Processed Average (Nb) last 24 hours: -
Scheduler::Items Processed Average (Nb) last hour: -
Scheduler::Items Processing Rate (Nb/s) last 24 hours: -
Scheduler::Items Processing Rate (Nb/s) last hour: -
Scheduler::Items Scheduled (Nb) immediate:53325
Scheduler::Max Items Scheduled (Nb) immediate:96000
Scheduler::Overflow Risk Ratio (%) last hour:2585.09977947
Scheduler::Queue Max Size (Nb) last 24 hours:53245
Scheduler::Queue Max Size (Nb) last hour:53244
Scheduler::Queue Size (Nb) immediate:53052
Scheduler::Threads (Nb) immediate:92
Scheduler::Threads Availability (%) last 24 hours:5.27955552
Scheduler::Threads Availability (%) last hour:3.82929909
Scheduler::internal errors (Nb) cumul:0
DLPERFSUMMARY Messages
- The SNMP Data Load component produces a "DLPERFSUMMARY" log message at the top of every hour that details the prior hour's performance metrics. A typical example:
1301994068 2011.04.05-09.01.08 SNMP.1.1-11888:0 I DL31050 DLPERFSUMMARY 2011.04.05-09.00.00 items: 1080, threadAvail: 100, overflowRisk: 1, expMeas: 12597, prodMeas: 7223, respTime: 156, snmpSuccess: 95, outPduDisc: 0, inPduDisc: 0, outPduTot: 2958, inPduTot: 2524, upTime: 1038061, quotaAvail: 2883463, memUsed: 110157824, subElmts: 251, metrics: 1483, requests: 405, dbAvail: 100
What each field means can be found in the following table:

Field Name - Description
1) Timestamp (UNIX format) - 32-bit integer, number of seconds since the UNIX epoch.
2) Timestamp (string) - String representation of field 1.
3) Process Name, PID - Indicates channel and collector number, and UNIX process id.
4) Message Severity - F: fatal, typically an unrecoverable error; E: error, unexpected condition affecting operation; W: warning, error condition, may affect operation; I: informational messages; 1: debug, some detail; 2: debug, more detail; 3: debug, most detail; >3: trace level, extremely detailed.
5) Message Id - A numeric message identifier.
6) Message Tag - A string-based message identifier.
7) Timestamp (data hour) - The data hour just completed, in local time.
8+9) items - Number of periodic tasks the SNMP Data Load is performing.
10+11) threadAvail - A percentage indicator reflecting the ability of the SNMP Data Load to meet the schedule of tasks. 100% means all tasks are completed on schedule.
12+13) overflowRisk - A percentage indicator, averaged across all tasks, reflecting the likelihood that the SNMP Data Load may not meet scheduled task deadlines. 0% means no risk.
14+15) expMeas - Number of measures (metrics) the SNMP Data Load expects it should produce for the data hour.
16+17) prodMeas - Number of measures (metrics) the SNMP Data Load actually produced for the data hour.
18+19) respTime - Average response time for all polled devices.
20+21) snmpSuccess - Percentage reflecting the success of SNMP requests.
22+23) outPduDisc - Number of PDUs sent for discovery.
24+25) inPduDisc - Number of PDUs received for discovery.
26+27) outPduTot - Total number of PDUs sent, collection and discovery.
28+29) inPduTot - Total number of PDUs received, collection and discovery.
30+31) upTime - Amount of time (seconds) the SNMP Data Load process has been running.
32+33) quotaAvail - Amount of disk space (bytes) available within the configured quota.
34+35) memUsed - Amount of physical memory (bytes) used by the SNMP Data Load.
36+37) subElmts - Number of sub-elements loaded from the database.
38+39) metrics - Number of metrics (formulas) loaded from the database.
40+41) requests - Number of requests loaded from the database.
42+43) dbAvail - Percentage reflecting the availability of the database.
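Because the fields after the data-hour timestamp come in name/value pairs, a DLPERFSUMMARY line is easy to pull apart with awk. A sketch using the one-hour sample line above to compute the produced-vs-expected measure ratio:

```shell
# Extract selected fields from a DLPERFSUMMARY log line and compute the
# prodMeas/expMeas ratio. The line is the one-hour sample from above.
line='1301994068 2011.04.05-09.01.08 SNMP.1.1-11888:0 I DL31050 DLPERFSUMMARY 2011.04.05-09.00.00 items: 1080, threadAvail: 100, overflowRisk: 1, expMeas: 12597, prodMeas: 7223, respTime: 156, snmpSuccess: 95, outPduDisc: 0, inPduDisc: 0, outPduTot: 2958, inPduTot: 2524, upTime: 1038061, quotaAvail: 2883463, memUsed: 110157824, subElmts: 251, metrics: 1483, requests: 405, dbAvail: 100'
echo "$line" | awk '{
    for (i = 1; i <= NF; i++) {
        # the value follows its field name; "+ 0" strips the trailing comma
        if ($i == "expMeas:")  exp_m  = $(i+1) + 0
        if ($i == "prodMeas:") prod_m = $(i+1) + 0
    }
    printf "expMeas=%d prodMeas=%d ratio=%.1f%%\n", exp_m, prod_m, 100 * prod_m / exp_m
}'
```

A ratio well below 100% for that hour (here about 57%) is exactly the kind of thing worth grepping hourly DLPERFSUMMARY lines for.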
- A six hour sample would look like the following:
1167663669 2007.01.01-15.01.09 SNMP.5.223-20402:0 I DL31050 DLPERFSUMMARY 2007.01.01-09.00.00 items: 36266, threadAvail: 100, overflowRisk: 11, expMeas: 1856318, prodMeas: 1845220, respTime: 505, snmpSuccess: 99, outPduDisc: 0, inPduDisc: 0, outPduTot: 202463, inPduTot: 200218, upTime: 1010860, quotaAvail: 976562, memUsed: 1405362176, subElmts: 41744, metrics: 1942, requests: 821, dbAvail: 100
1167667201 2007.01.01-16.00.01 SNMP.5.223-20402:0 I DL31050 DLPERFSUMMARY 2007.01.01-10.00.00 items: 36266, threadAvail: 100, overflowRisk: 16, expMeas: 1856316, prodMeas: 1836217, respTime: 392, snmpSuccess: 99, outPduDisc: 1275, inPduDisc: 1275, outPduTot: 202922, inPduTot: 200857, upTime: 1014390, quotaAvail: 976561, memUsed: 1403617280, subElmts: 41744, metrics: 1942, requests: 821, dbAvail: 100
1167670863 2007.01.01-17.01.03 SNMP.5.223-20402:0 I DL31050 DLPERFSUMMARY 2007.01.01-11.00.00 items: 36266, threadAvail: 100, overflowRisk: 9, expMeas: 1856302, prodMeas: 1837648, respTime: 514, snmpSuccess: 99, outPduDisc: 0, inPduDisc: 0, outPduTot: 201699, inPduTot: 199505, upTime: 1018054, quotaAvail: 976562, memUsed: 1405378560, subElmts: 41744, metrics: 1942, requests: 821, dbAvail: 100
1167674469 2007.01.01-18.01.09 SNMP.5.223-20402:0 I DL31050 DLPERFSUMMARY 2007.01.01-12.00.00 items: 36266, threadAvail: 100, overflowRisk: 9, expMeas: 1856312, prodMeas: 1841711, respTime: 489, snmpSuccess: 100, outPduDisc: 41015, inPduDisc: 40905, outPduTot: 242734, inPduTot: 240676, upTime: 1021660, quotaAvail: 976561, memUsed: 1405394944, subElmts: 41744, metrics: 1942, requests: 821, dbAvail: 100
1167678066 2007.01.01-19.01.06 SNMP.5.223-20402:0 I DL31050 DLPERFSUMMARY 2007.01.01-13.00.00 items: 36266, threadAvail: 100, overflowRisk: 14, expMeas: 1855805, prodMeas: 1832211, respTime: 440, snmpSuccess: 99, outPduDisc: 0, inPduDisc: 0, outPduTot: 200775, inPduTot: 198783, upTime: 1025257, quotaAvail: 976365, memUsed: 1405222912, subElmts: 41744, metrics: 1942, requests: 821, dbAvail: 100
1167681664 2007.01.01-20.01.04 SNMP.5.223-20402:0 I DL31050 DLPERFSUMMARY 2007.01.01-14.00.00 items: 36266, threadAvail: 100, overflowRisk: 11, expMeas: 1855798, prodMeas: 1836721, respTime: 528, snmpSuccess: 99, outPduDisc: 0, inPduDisc: 0, outPduTot: 201658, inPduTot: 199664, upTime: 1028855, quotaAvail: 976561, memUsed: 1405329408, subElmts: 41744, metrics: 1942, requests: 821, dbAvail: 100
- So we've got:
- 1. At 16:00 (UTC) the inPduDisc and outPduDisc are non-zero; this indicates a network discovery took place during this hour. 1275 PDUs is a relatively small number for a discovery, so the device(s) discovered were either small or not numerous. Note: on the PMG platform, this discovery was for a Juniper ERX device with approx. 8000 resources.
- 2. At 18:01 (UTC) the inPduDisc and outPduDisc are again non-zero, indicating another network discovery took place during this hour. This hour the discovery PDU count is much higher, over 40,000, indicating a much larger Inventory Profile. Note: on PMG this discovery was for a profile with 42 devices, each with 500 interfaces.
- 3. Also at 18:01 (UTC) we observe that outPduDisc and inPduDisc are not equal; this could indicate either some packet drop or device(s) not responding to SNMP requests. Note: on PMG this reflects the Inventory Profile negotiating the SNMP community and SNMP version, trying V2c and V3.
- 4. Uptime was increasing across the sampled interval. This indicates that the SNMP Data Load did not fail and require a restart.
- 5. The ratio of Produced Measures to Expected Measures was nearly 100%.
- 6. Thread Availability remained constant at 100% across the sampled interval. This indicates that the SNMP Data Load was able to meet the scheduling demand for collection.
- 7. Overflow Risk was in the 9% - 16% range. This is acceptable and does not indicate any great risk that the SNMP Data Load will fail to complete its task load.
- 8. Memory Usage was fairly constant. Additional memory is sometimes required by the SNMP Data Load during a network discovery; it dynamically allocates and frees memory for discovered objects. The constant values here indicate there are no memory management issues (i.e., no memory leak).
- 9. Quota Available was fairly constant. Though not a direct indicator of any success/failure or application performance metric of the SNMP Data Load, it indicates that the rest of the Netcool/Proviso sub-channel (FTE, CME) is probably running properly, retrieving and processing the SNMP Data Load output files.
DATAMANAGER.FC_QUOTA
$ ./statGet -l stats -o Data Manager
The output should include the line:
Data Manager::Filesystem quota (KB) immediate:1953125
which is 2,000,000,000 / 1024 = 1953125.
You can also check that the value propagated properly and is listed in the database using the following SQL:
select * from reg$dcconfig where str_path like '%FC_QUOTA%' order by str_path;
/contribs/dialogTest2 "Conf Dump" All | grep FC_QUO
Run "statGet -o Data Manager -l stats | grep Filesystem | egrep '(quota)'" and multiply the result by 1024.