TNPM/KPI

From neil.tappsville.com

IMPLIED support in Proviso SNMP

The IMPLIED keyword is an optimization used in the INDEX clause of an OBJECT-TYPE definition for a MIB table entry. It marks the last index as one whose value is not preceded by a subidentifier giving its length, so the effect of IMPLIED is to remove the subidentifier that indicates the length of the index string.

Example of the IMPLIED keyword in a MIB table:

#############################################

mplsLspInfoEntry OBJECT-TYPE
    SYNTAX     MplsLspInfoEntry
    MAX-ACCESS not-accessible
    STATUS     current
    DESCRIPTION
         "Entry containing information about a particular
         Label Switched Path."
    INDEX { IMPLIED mplsLspInfoName }
    ::= { mplsLspInfoList 1 }

############################################

Let's say a table has two indexes that are both variable-length strings, and a row is indexed by:

mibTable.xColumn ("abc,defg")

Removing the length subidentifiers would cause a problem here, because there would be no way to tell where one variable-length string ends and the next one begins; this is why IMPLIED is only allowed on the last object in an INDEX clause. Without the IMPLIED keyword, the OID carries a length subidentifier before each index: 3 for 'abc' (3 characters) and 4 for 'defg' (4 characters), as shown below.

mibTable.1.xColumn. 3.ASCIICodeOf(`a').ASCIICodeOf(`b').ASCIICodeOf(`c'). 4.ASCIICodeOf(`d').ASCIICodeOf(`e').ASCIICodeOf(`f').ASCIICodeOf(`g')

Example OID value:

mibTable.1.xColumn.3.97.98.99.
                              4.100.101.102.103
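To make the encoding rule concrete, here is a minimal Python sketch (not Proviso code) that builds the instance subidentifiers for string indexes with and without IMPLIED, following the SMIv2 rule that a non-IMPLIED variable-length string index is preceded by its length. The function names are hypothetical and the sketch assumes ASCII index values.

```python
def encode_string_index(value: str, implied: bool = False) -> list[int]:
    """Encode one string index as OID subidentifiers.

    Without IMPLIED the length comes first; with IMPLIED it is omitted.
    Assumes ASCII characters (one subidentifier per character).
    """
    codes = [ord(ch) for ch in value]
    return codes if implied else [len(codes)] + codes


def encode_instance(indexes: list[str], implied_last: bool = False) -> str:
    """Join several string indexes into a dotted instance suffix.

    Only the last index may be IMPLIED; earlier ones always keep their length.
    """
    parts: list[int] = []
    for pos, value in enumerate(indexes):
        is_last = pos == len(indexes) - 1
        parts.extend(encode_string_index(value, implied=implied_last and is_last))
    return ".".join(str(p) for p in parts)


print(encode_instance(["abc", "defg"]))                     # 3.97.98.99.4.100.101.102.103
print(encode_instance(["abc", "defg"], implied_last=True))  # 3.97.98.99.100.101.102.103
```

Because only the final index drops its length, a decoder can still split off the earlier indexes and treat whatever remains as the IMPLIED string.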

Problem description: Proviso does not support walks that return IMPLIED index OIDs, which carry no length subidentifier. Converting the variable-length indexes into readable strings fails because the Proviso collector cannot interpret them without the length that marks where each string starts and ends.

Solution: A new formula type, "ImpliedString", and a new formula function, "asImpliedString()", are introduced to handle the IMPLIED keyword properly in Proviso.

An interim fix patch is required to support IMPLIED. It fixes the string conversion issue and implements a formula type that handles IMPLIED in Proviso, so that the OID/instance values returned by the collector are consistent between the discovery and collection formula types.

Proviso 4.4.3.3
Patch name: Netcool/Proviso 4.4.3.3-TIV-PROV-IF0096
Link: http://www-01.ibm.com/support/docview.wss?uid=swg24032228

TNPM 1.3.1
Patch name: Tivoli Netcool Performance Manager 1.3.1.0-TIV-TNPM-IF0035
Link: http://www-01.ibm.com/support/docview.wss?uid=swg24032054

Usage

ImpliedString: a new formula language type, introduced specifically to handle IMPLIED OID values that are not prefixed with a length.

asImpliedString(): an SNMP formula function that translates IMPLIED indexes into readable string form.

Usage in discovery formula

Discovery Formula (without conversion function):
################################################################
Dim I1 as ImpliedString;
V1=OIDVAL(sysName.0,once);
V2=OIDVAL(sysLocation.0,once);
V03=OIDVAL(mplsLspInfoOctets.%I1);
V3=OIDVAL(IndexAsValue(I1,%V03));
%V03 index "InstanceValue<%V3>||NULL||INULL||NULL"
################################################################

Output (note that the index value is in decimal-encoded form):

10.127.97.33 = InstanceValue<"80.50.77.80.95.102.114.111.109.95.84.49.54.48.48.95.116.111.95.77.51.50.48">||I1Value<80.50.77.80.95.102.114.111.109.95.84.49.54.48.48.95.116.111.95.77.51.50.48>||NULL||NULL:19
10.127.97.33 = InstanceValue<"80.50.77.80.95.102.114.111.109.95.84.49.54.48.48.95.116.111.95.77.51.50.48.98">||I1Value<80.50.77.80.95.102.114.111.109.95.84.49.54.48.48.95.116.111.95.77.51.50.48.98>||NULL||NULL:8
10.127.97.33 = InstanceValue<"116.111.95.109.120.57.54.48">||NULL||NULL||NULL:60


Discovery Formula (with conversion function):

################################################################
Dim I1 as ImpliedString;
V1=OIDVAL(sysName.0,once);
V2=OIDVAL(sysLocation.0,once);
V03=OIDVAL(mplsLspInfoOctets.%I1);
V3=asImpliedString(IndexAsValue(I1,%V03));
%V03 index "InstanceValue<%V3>||I1Value<%I1>||NULL||NULL"
################################################################

Output (note that the index values are now readable):

10.127.97.33 = InstanceValue<"P2MP_from_T1600_to_M320">||I1Value<80.50.77.80.95.102.114.111.109.95.84.49.54.48.48.95.116.111.95.77.51.50.48>||NULL||NULL:19
10.127.97.33 = InstanceValue<"P2MP_from_T1600_to_M320b">||I1Value<80.50.77.80.95.102.114.111.109.95.84.49.54.48.48.95.116.111.95.77.51.50.48.98>||NULL||NULL:8
10.127.97.33 = InstanceValue<"to_mx960">||I1Value<116.111.95.109.120.57.54.48>||NULL||NULL:60

The asImpliedString() function prepends a length prefix to the IMPLIED index during translation, so that the collector can interpret the decimal-encoded form and translate it into a human-readable string.

Example of an OID value with the length prefix prepended (8 is the length of the index): 8.116.111.95.109.120.57.54.48

Example of the string translated by asImpliedString(): "to_mx960"
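The translation step described above can be sketched in Python as follows. It illustrates the idea (prepend the length, then decode a length-prefixed octet string) and is not IBM's implementation; add_length_prefix(), decode_length_prefixed() and as_implied_string() are hypothetical names, and ASCII values are assumed.

```python
def add_length_prefix(implied_oid: str) -> str:
    """Prepend the subidentifier count, as asImpliedString() is described as doing."""
    subids = implied_oid.split(".")
    return ".".join([str(len(subids))] + subids)


def decode_length_prefixed(oid: str) -> str:
    """Decode 'N.c1.c2...cN' into a string, checking that N matches."""
    first, *rest = (int(x) for x in oid.split("."))
    if first != len(rest):
        raise ValueError(f"length prefix {first} != {len(rest)} characters")
    return "".join(chr(c) for c in rest)


def as_implied_string(implied_oid: str) -> str:
    """Hypothetical stand-in for asImpliedString(): IMPLIED index -> readable text."""
    return decode_length_prefixed(add_length_prefix(implied_oid))


prefixed = add_length_prefix("116.111.95.109.120.57.54.48")
print(prefixed)                          # 8.116.111.95.109.120.57.54.48
print(decode_length_prefixed(prefixed))  # to_mx960
```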

The collector, however, does not prepend the length prefix when it processes the OID instance values of variables declared as type ImpliedString. It still reads the IMPLIED indexes (without length prefix) as the instance, consistent with the new ImpliedString formula type, so that discovery and collection instances have the same form and can be mapped to each other when the BOF data is processed.

Example of the OID value returned: 116.111.95.109.120.57.54.48

Usage in collection formula

Collection Formula (with ImpliedString type):

#########################################################################
Dim I1 as ImpliedString;
delta(mplsLspInfoPathChanges.%I1)*distrib(delta(sysUpTime.0),"default:1");
#########################################################################

Output (note that the returned indexes are not prefixed with the length):

HOST = 10.127.97.33 INSTANCE = 80.50.77.80.95.102.114.111.109.95.84.49.54.48.48.95.116.111.95.77.51.50.48 VALUE = 0
HOST = 10.127.97.33 INSTANCE = 80.50.77.80.95.102.114.111.109.95.84.49.54.48.48.95.116.111.95.77.51.50.48.95.98.108.117.101 VALUE = 0
HOST = 10.127.97.33 INSTANCE = 116.111.95.109.120.57.54.48 VALUE = 0

Concerns

Don't mix the formula types "DisplayString" and "ImpliedString" when writing discovery and collection formulas to support IMPLIED. The collector encodes instance values according to the formula type declared in the formula expression. Declaring DisplayString causes a length prefix to be added, which does not match the IMPLIED index and can cause a no-data issue in the BOF, because the instance values defined in the discovery and collection formulas are inconsistent. With ImpliedString, instance values are decimal-encoded exactly as they are provided (no length prefix is added).

Examples:

Formula (using DisplayString):

###########################################
Dim I1 as DisplayString Default * Name IDX;
Def UseQuotedStrings no;
V1 = OIDVAL(pvStrImpAFNValue.%I1);
V2 = IndexAsValue(I1,%V1);
%V2 index "index<%V2>value<%V1>";
###########################################
Output (a subset with Trace Level 3 enabled):
(Instance = <paris>, observe the '5.' prefix to the decimal-encoded string and no results)
...
Information:Executing: OIDVAL( pvStrImpAFNValue.%I1 ) ...
Debug:OIDVAL( ): list of all SNMP values returned ...
Debug:1.3.6.1.4.1.999999.5.8.2.1.2.5.112.97.114.105.115: UNK:
...
Information:Formula has generated 0 line(s) in 0 sec [Init 0 /Snmp 0 /Eval 0 /Store 0]

Formula (using ImpliedString)

###########################################
Dim I1 as ImpliedString Default * Name IDX;
Def UseQuotedStrings no;
V1 = OIDVAL(pvStrImpAFNValue.%I1);
V2 = asImpliedString(IndexAsValue(I1,%V1));
%V2 index "index<%V2>value<%V1>";
###########################################
Output (a subset with Trace Level 3 enabled):
(Instance = <paris>, observe no prefix to the decimal-encoded string, and one result line is computed)
...
Information:Executing: OIDVAL( pvStrImpAFNValue.%I1 ) ...
Debug:OIDVAL( ): list of all SNMP values returned ...
Debug:1.3.6.1.4.1.999999.5.8.2.1.2.112.97.114.105.115: INT: 2
...
Information:Formula has generated 1 line(s) in 0 sec [Init 1 /Snmp 0 /Eval 0 /Store 0]
10.127.77.197:index<paris>value<2>:paris
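The mismatch in the two traces above can be reproduced with a short Python sketch that encodes the same instance both ways: the DisplayString-style encoding carries the '5.' length prefix and the ImpliedString-style encoding does not, so keys built one way never match keys built the other way. decimal_encode() is a hypothetical helper, and the column OID is simply copied from the traces.

```python
def decimal_encode(value: str, implied: bool) -> str:
    """Decimal-encode a string instance; prepend its length unless implied."""
    codes = [str(ord(ch)) for ch in value]
    if not implied:
        codes.insert(0, str(len(value)))
    return ".".join(codes)


# Column OID of pvStrImpAFNValue as it appears in the two traces above.
COLUMN_OID = "1.3.6.1.4.1.999999.5.8.2.1.2"

display_style = f"{COLUMN_OID}.{decimal_encode('paris', implied=False)}"
implied_style = f"{COLUMN_OID}.{decimal_encode('paris', implied=True)}"

# DisplayString-style OID: the form that returned UNK in the first trace.
print(display_style)  # 1.3.6.1.4.1.999999.5.8.2.1.2.5.112.97.114.105.115
# ImpliedString-style OID: the form that returned INT: 2 in the second trace.
print(implied_style)  # 1.3.6.1.4.1.999999.5.8.2.1.2.112.97.114.105.115

# Keys built one way never equal keys built the other way,
# so discovery and collection must declare the same type.
print(display_style == implied_style)  # False
```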
 


Device availability (11119)

How is the metric "Device Availability (percent)":

~AP~Specific~SNMP~Devices~Availability~Availability (percent)

computed?

Answer: The metric "Device Availability (percent)", exported here with:

resmgr -export fgp -colNames "dbIndex name npath frm.dbIndex frm.name frm.data" -filter "frm.dbIndex(11119)"

10521|_|Availability|_|~AP~Specific~SNMP~Devices~Availability|_|11119|_|Availability (percent)|_|DEF UseLib RFC1213Interface;
deviceAvailability(Percent)|_|

uses the "RFC1213Interface" library, which calculates the percentage from the system uptime and the number of missed polls, using the following formula:

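uptime / (uptime + unknown time) * number of polls * 100

where the number of polls is the elapsed time divided by the poll interval (900 seconds in the example below).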

These values are taken from entries in "YYYY.MM.DDSNMP.log" such as:

2011.10.09-14.15.14 SNMP.1.1-15088:3001 2 RFC1213LIB Device 'UPC-VKADO-X032' (Last) upTime(s) 259087.79 @(epoch) 1318146314 (Curr) upTime(s) 396.57 @(epoch) 1318169714 =>Reboot Detected !! (up) 397 + (unknown) 23003 ~= (elapsed) 23400

In this case:

397 / (397 + 23003) * (23400 / 900) * 100 ≈ 44%
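The arithmetic above can be checked with a small Python sketch. It pulls the (Last)/(Curr) values out of the log line, splits the gap into up and unknown time, and applies the formula quoted above. It is an illustration of that formula, not the RFC1213Interface library code: the regular expression is modelled on the single example line only, and the 900-second poll interval is inferred from the 23400/900 term.

```python
import re

# Pattern for the (Last)/(Curr) fields of an RFC1213LIB log line, modelled
# on the single example line above; other log variants may differ.
UPTIME_RE = re.compile(
    r"\(Last\) upTime\(s\) ([\d.]+) @\(epoch\) (\d+) "
    r"\(Curr\) upTime\(s\) ([\d.]+) @\(epoch\) (\d+)"
)


def parse_uptimes(line: str) -> tuple[float, int, float, int]:
    match = UPTIME_RE.search(line)
    if match is None:
        raise ValueError("no (Last)/(Curr) uptime fields in line")
    last_up, last_epoch, curr_up, curr_epoch = match.groups()
    return float(last_up), int(last_epoch), float(curr_up), int(curr_epoch)


def split_gap(last_up, last_epoch, curr_up, curr_epoch):
    """Return (up, unknown, elapsed) seconds for the gap between two polls.

    A smaller current uptime means a reboot, so only the current uptime
    counts as 'up'; otherwise the uptime delta does. Rounded like the log.
    """
    elapsed = curr_epoch - last_epoch
    up = round(curr_up if curr_up < last_up else curr_up - last_up)
    return up, elapsed - up, elapsed


def availability_percent(up, unknown, elapsed, poll_interval=900):
    """uptime / (uptime + unknown) * nb polls * 100, nb polls = elapsed / interval.

    Multiplying before dividing keeps whole-number results exact.
    """
    nb_polls = elapsed / poll_interval
    return up * nb_polls * 100 / (up + unknown)


LOG_LINE = (
    "2011.10.09-14.15.14 SNMP.1.1-15088:3001 2 RFC1213LIB Device 'UPC-VKADO-X032' "
    "(Last) upTime(s) 259087.79 @(epoch) 1318146314 "
    "(Curr) upTime(s) 396.57 @(epoch) 1318169714 =>Reboot Detected !! "
    "(up) 397 + (unknown) 23003 ~= (elapsed) 23400"
)

up, unknown, elapsed = split_gap(*parse_uptimes(LOG_LINE))
print(up, unknown, elapsed)                               # 397 23003 23400
print(round(availability_percent(up, unknown, elapsed)))  # 44
```

Since up + unknown equals the elapsed time, the formula reduces to 100 * uptime / poll interval; an uptime of exactly two poll intervals (1800 s) therefore gives 200.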

When SNMP polls from the collector get no response, because of either network problems or the device being down, entries like the following should appear in "YYYY.MM.DDSNMP.log":

1318147256 2011.10.09-08.00.56 SNMP.1.1-15088:3001 2 RFC1213LIB Device 'xxx-xxxxx-xxx' unreachable.

There can be two scenarios:

If the uptime delta is positive, i.e. the same as the sum of the poll intervals, the device was up even though it was not reachable.

The sequence of "0" values can be seen in the database by running SQL*Plus (as 'pv_admin'), e.g.:

select to_char(PVM_TIME.SECOND2DATE(dte_date),'YYYY-MM-DD HH24:MI:SS'), mf.*
  from PV_METRIC.MTRC00_NRAW_000_FULL mf
 where idx_resource = 200004060
   and idx_metric = 11119
   and dte_date > 1309478400
   and dte_date < 1312442114;

2011-09-18 00:30:14      11119    200004060 1316305814        100
2011-09-18 00:45:14      11119    200004060 1316306714        100
2011-09-18 01:00:56      11119    200004060 1316307656          0
2011-09-18 01:15:56      11119    200004060 1316308556          0
2011-09-18 01:30:56      11119    200004060 1316309456          0
2011-09-18 01:45:56      11119    200004060 1316310356          0
2011-09-18 02:00:56      11119    200004060 1316311256          0
2011-09-18 02:15:56      11119    200004060 1316312156          0
2011-09-18 02:30:56      11119    200004060 1316313056          0
2011-09-18 02:45:56      11119    200004060 1316313956          0
2011-09-18 03:00:56      11119    200004060 1316314856          0
2011-09-18 03:15:56      11119    200004060 1316315756          0
2011-09-18 03:30:56      11119    200004060 1316316656          0
2011-09-18 03:45:56      11119    200004060 1316317556          0
2011-09-18 04:00:14      11119    200004060 1316318414       1300

In the sequence above, there were 12 'unknown' SNMP poll results before the collector finally obtained a response on the 13th poll. It then checked sysUpTime and confirmed that the uptime delta was the same as the sum of the poll intervals. The device was therefore actually up and running when those polls were missed, and each of them should have been "100" rather than "0". So when the collector confirms on the 13th poll that the device was up, it multiplies "100" by "13", resulting in:

T1 T2 T3 T4 T5 T6 T7 T8 T9 T10 T11 T12 T13

100 0 0 0 0 0 0 0 0 0 0 0 1300

(100 + 1300) / 13 ≈ 107.7% availability

When the '(Curr) upTime(s)' is shorter than the '(Last) upTime(s)', the resulting delta is negative, which means the device was rebooted. In that case the availability is calculated as follows:

If the current system uptime equals the sum of two poll intervals, the availability value returned is 200.


T1 T2 T3 T4 T5 T6 T7 T8

100 0 0 0 0 0 0 200

(100 + 200)/8 = 37.5% Availability
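The averages in the two examples above are plain arithmetic means of the per-poll values. A minimal Python sketch reproducing them, with the sample sequences copied exactly as listed; catch_up_value() is a hypothetical helper encoding the "100 times the number of polls covered" rule described above, not a TNPM function.

```python
def catch_up_value(missed_polls: int) -> int:
    """Value stored on the first successful poll after a confirmed-up gap.

    Per the description above: 100 for each missed poll plus the current one.
    """
    return 100 * (missed_polls + 1)


def mean_availability(samples: list[float]) -> float:
    """Arithmetic mean of per-poll availability values, in percent."""
    return sum(samples) / len(samples)


# First example, T1..T13 as listed: 100, eleven zeros, then 1300.
no_reboot = [100] + [0] * 11 + [1300]
# Reboot example, T1..T8 as listed: 100, six zeros, then 200.
reboot = [100] + [0] * 6 + [200]

print(catch_up_value(12))                      # 1300
print(round(mean_availability(no_reboot), 1))  # 107.7
print(mean_availability(reboot))               # 37.5
```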




How does TNPM Wireline or Proviso calculate percentile? Technote (FAQ)

Question: I need more details on the percentile calculation method in TNPM Wireline.

Answer:

1. The standard way of calculating the N-th percentile is: sort the data set by value from highest to lowest and discard the highest (100-N)% of the sorted samples; the next highest sample is the N-th percentile value for the data set.

2. To compute percentiles, TNPM Wireline and Proviso use an approximation algorithm, chosen for performance reasons. As with any approximation, it has a few limitations.

3. The following is a short explanation of the TNPM Wireline and Proviso implementation of percentiles. TNPM and Proviso support percentile computation for real-time threshold detection and report display. This statistic is available for the desired metrics for all report time periods (daily, weekly, monthly, quarterly, yearly and 14 months). Because percentile computation is very resource intensive, the implementation relies on approximations:
- Daily percentile relies on the approximation algorithm.
- Weekly percentile relies on the approximation algorithm.
- Monthly percentile relies on the approximation algorithm.
- Quarterly percentile is the average of the 3 monthly percentiles.
- Yearly percentile is the average of the 12 monthly percentiles.
- 14-month percentile is the average of the 14 monthly percentiles.

Basically, this method lets TNPM Wireline and Proviso compute an approximation without processing the entire data set after it has been received (e.g. at the end of the month), by processing the data on the fly as it flows through the DataChannel (e.g. during the course of the month). This has two important benefits:
- Huge resource savings (CPU, memory).
- The percentile value is always available, even for unfinished periods.
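For reference, the exact N-th percentile from point 1 and the roll-up from point 3 (quarterly, yearly and 14-month values as an average of monthly percentiles) can be sketched in Python as below. The streaming approximation that TNPM/Proviso actually uses for daily, weekly and monthly values is not described here, so this sketch computes the monthly values exactly; it only illustrates the sort-and-discard definition and the averaging step. exact_percentile() and rolled_up_percentile() are hypothetical names, and the sample data is made up.

```python
def exact_percentile(samples: list[float], n: int) -> float:
    """N-th percentile by the sort-and-discard rule (0 < n <= 100).

    Sort highest to lowest, drop the highest (100 - n)% of samples,
    and return the next highest sample.
    """
    if not samples or not 0 < n <= 100:
        raise ValueError("need samples and 0 < n <= 100")
    ordered = sorted(samples, reverse=True)
    discard = (100 - n) * len(ordered) // 100  # whole samples dropped
    return ordered[discard]


def rolled_up_percentile(monthly_samples: list[list[float]], n: int) -> float:
    """Average of per-month percentiles, as described for quarterly/yearly values."""
    monthly = [exact_percentile(month, n) for month in monthly_samples]
    return sum(monthly) / len(monthly)


# Three made-up "months" of samples with very different levels.
months = [list(range(1, 101)), list(range(101, 201)), list(range(201, 301))]
pooled = [x for month in months for x in month]

print(exact_percentile(list(range(1, 101)), 95))  # 95
print(rolled_up_percentile(months, 95))           # 195.0
print(exact_percentile(pooled, 95))               # 285
```

The gap between 195.0 and 285 in this contrived example shows why an average of monthly percentiles is an approximation rather than the true percentile of the whole quarter.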