Using G2 custom monitors, you can build and manage independent monitoring solutions without relying on the platform. Remote Script Executor executes shell, PowerShell, and Python scripts on Linux and Windows workstations.

The agent provides two adapter frameworks for developing custom monitors:

  • PowerShell

    With this adapter framework, building separate scripts for each metric is not recommended as it is less efficient. If the number of metrics is high, it can consume more system resources, such as processes and CPU, on the end device.

  • Remote Script Executor (RSE)

    With this adapter framework, you can build a single script to handle one or more metrics, which is recommended.

    In the following sections, we will focus on the Remote Script Executor (RSE), as it is our recommended approach for custom monitor development.


Key features of the RSE framework

  • Support for user-defined external arguments, passed to the script through custom attributes.
  • Monitor-level script execution: a single script execution can collect multiple metrics.
  • Support for additional script types through the Custom script option, apart from the default options.
  • Access to the credentials attached to the device from within the script, through dynamic macro support.

How to develop an RSE script

You can develop agent-based RSE scripts for both Windows and Linux environments. The following scripting languages are supported:

Language | Windows | Linux
Python | Yes | Yes
PowerShell | Yes | No
Custom | Yes | Yes
Bash/shell | No | Yes

Custom script type

Apart from the default script types (Python, Shell, PowerShell), the Custom script type lets you develop scripts in other scripting languages. When the Custom option is selected, the script execution path is mandatory: add the configuration parameter custom.script.execution.path on the monitor creation page and provide the execution path that applies to the operating system of the remote device.

In general, script development has three major blocks:

  1. How to get input arguments (optional)

    Some monitoring scripts need additional inputs from you to evaluate the final metric data.

    Example: Service monitoring or process monitoring.

    A server can have hundreds of processes and services. Monitoring all of them is not recommended because it creates alert noise and puts unnecessary load on the end device. Instead, monitor only the services or processes you are interested in. For this purpose, the script expects the service names or process names as input from you.

    You can get user inputs into the script using the following macro:

    Name | Value
    Macro | ${custom.script.arguments}
    Return Type | String

    $customArgs = ${custom.script.arguments}
    #write-host "input custom script args : "$customArgs

    How to get multiple input arguments

    See multiple custom script arguments in macro for more information.

    Some scripts do not require additional inputs.

    Example: CPU utilization or system uptime.

    Assume you want to monitor CPU utilization on Windows or Linux servers. The script does not need any additional inputs from you; it can evaluate CPU utilization by querying WMI classes, running CLI commands, or reading system files.

    If your script uses the ${custom.script.arguments} macro, add the custom.script.arguments configuration parameter on the monitor creation page as shown below. Metric, monitor, and template creation are explained in the sections below.

    Create monitor

    When you assign the template, you are asked to provide these user inputs as shown below. The input values replace the ${custom.script.arguments} macro in the script:

    Apply Templates
  2. Actual logic to evaluate metric data

    Develop a block of code that collects all the data required for metric evaluation.

    Example: To get the state of Windows services, you can query WMI classes:

       Get-WmiObject win32_service -namespace root\cimv2 -ErrorAction Stop | where-object{$_.Name -eq $serviceName} | select Name, DisplayName, State
       

    In other cases, you can get the data by executing CLI commands, reading kernel-level files, connecting to a database, and so on.
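The input-handling block can also be sketched in Python, which RSE supports on both Windows and Linux. This is a minimal illustration rather than the agent's own logic: parse_service_names is a hypothetical helper, and a literal string stands in for the value that the ${custom.script.arguments} macro would supply at runtime.

```python
def parse_service_names(raw_args):
    """Split a comma-separated argument string into clean, non-empty names."""
    names = [name.strip() for name in raw_args.split(",")]
    return [name for name in names if name]


# In a real monitor the string would come from the ${custom.script.arguments}
# macro; a literal is used here so the sketch runs on its own.
print(parse_service_names("opsramp-agent, opsramp-shield,"))
```

Dropping empty entries up front means a trailing comma or stray spaces in the user input does not produce a lookup for an empty service name.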

To get credentials into an RSE script:

In some cases, for example when you monitor databases from an RSE script, the script needs to connect to the database and run queries. You can get the database credentials using the following macros:

Macro syntax: ${credentials.CredentialSetName.credentialField}

Example: Assume you defined a credential set named PostgresDB_Credentials and assigned it to the database resource. To use this credential set in the script, define the macros as shown below:

user=${credentials.PostgresDB_Credentials.username}

pass=${credentials.PostgresDB_Credentials.password}

port=${credentials.PostgresDB_Credentials.port}

Valid credentialFields to use are:

  • username
  • password
  • domain
  • port
  • timeout

You can also get some device attributes using the following macros:

Macro | Description
${resource.hostname} | Host name of the device
${resource.ipaddress} | IP address of the device
${resource.mac} | MAC address of the device
${resource.make} | Make of the device
${resource.model} | Model of the device
${resource.os} | Operating system of the device
${resource.serialno} | Serial number of the device
${resource.uniqueid} | Unique ID of the device
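When developing a script locally, it can help to simulate macro replacement before uploading the script to a monitor. The sketch below is only an assumption about how such a test harness could look: expand_macros is a hypothetical helper, the sample values are invented, and it does not reproduce the agent's actual substitution logic.

```python
import re

# Matches ${some.macro.name} and captures the name between the braces.
MACRO_PATTERN = re.compile(r"\$\{([^}]+)\}")


def expand_macros(script_text, values):
    """Replace each ${name} with values[name]; unknown macros are left as-is."""
    def substitute(match):
        return values.get(match.group(1), match.group(0))
    return MACRO_PATTERN.sub(substitute, script_text)


sample = 'host = "${resource.hostname}"; port = ${credentials.PostgresDB_Credentials.port}'
print(expand_macros(sample, {
    "resource.hostname": "db01",                        # invented test value
    "credentials.PostgresDB_Credentials.port": "5432",  # invented test value
}))
```

Leaving unknown macros untouched makes a missing test value easy to spot in the expanded script.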

NOTE: Handle exceptions carefully in your script. For any script error, the RSE framework generates an alert on the rse.invalid.json.error or rse.script.error metric.

The following are sample InvalidJsonException alerts:

Alert sample
  3. Prepare the final RSE-supported JSON output

Once you have the metric data, prepare the final JSON output in one of the supported formats, based on your requirement.

See the Standard JSON output format section for the valid JSON output formats.

Sample PowerShell script for Windows service monitoring

# Metric name
$metricName = "system.windows.service.status"
 
##Parsing user input
#
#Syntax : service name1,service name2
#Sample input: opsramp-agent,opsramp-shield
$customArgs = ${custom.script.arguments}
 
## Additional checks to validate user input and avoid further script execution
If (($([Int]$customArgs.Length) -eq 0) -or ($([String]$customArgs) -eq "Custom Script Arguments") -or ($($([String]$customArgs).Trim()) -eq ""))
{
    write-host "Exception - G2_Windows_Service_Monitor : Provided Custom.Script.Arguments are invalid or empty, So script will exit automatically"
    Exit 0
}
Else
{
    If($([String]$customArgs).contains(","))
    {
        $serviceNames = $([String]$customArgs) -split ","
    }
    Else
    {
        $serviceNames = @($([String]$customArgs))
    }
}
 
## Exit, if there are no services provided
If($([Int]$serviceNames.length) -eq 0)
{
    write-host "Exception - G2_Windows_Service_Monitor : We don't have any service name(s) from input parameters, This script will exit automatically"
    Exit 1
}
 
Function Normalize()
{
param([String]$str)
    $str = $str.Trim()
    # Escape backslashes first, so that the backslashes added for quotes are not escaped again
    $str = $str.Replace("\", "\\")
    $str = $str.Replace('"', '\"')
    Return $str
}
 
 
Try
{
    $serviceNameStr = ""
    ForEach($serviceName in $serviceNames)
    {
        #write-host "Service name :"$serviceName
        $metricValue = 1
        If($([String]$serviceName) -ne "")
        {
            $serviceColItems = Get-WmiObject win32_service -namespace root\cimv2 -ErrorAction Stop | where-object{$_.Name -eq $serviceName} | select Name, DisplayName, State
            #$serviceColItems
            ForEach($serviceColItem in $serviceColItems)
            {
                $componentName = Normalize "$([String]$serviceColItem.DisplayName)"
                $state = $([String]$serviceColItem.State).Tolower()
                If($state.contains("stopped"))
                {
                    $metricValue = 0
                }
                 
                $serviceNameStr += """$componentName""" + ":" + "$metricValue" + ","
            }
        }
    }
}
catch
{
    If(($([String]$_.Exception.Message).contains("Invalid class")) -or ($([String]$_.Exception.Message).contains("Invalid namespace")))
    {
        write-host "Exception - G2_Windows_Service_Monitor :"$_.Exception.Message
        Exit 2
    }
}
 
 
If($([Int]$serviceNameStr.Length) -gt 0)
{
    # Remove the trailing comma left by the loop
    $serviceNameStr = $serviceNameStr.TrimEnd(',')
    $JSONPayload =  """$metricName"":{ ""components"": { $serviceNameStr }}"
 
    ##### Printing the output in JSON format to console #####
    Write-Host  "{"$JSONPayload"}"
}
Else{Exit 2}

Output:

{ "system.windows.service.status":{ "components": { "opsramp-shield":1,"opsramp-agent":1 }}}

Sample bash script to monitor CPU, Memory and DISK:

#!/bin/bash
 
# 1-minute load average, taken from the summary line of top
CPU=$(top -bn1 | grep load | awk '{printf "%.2f", $(NF-2)}')
 
# Used memory as a percentage of total memory
MEMORY=$(free -m | awk 'NR==2{printf "%.2f", $3*100/$2 }')
 
# Used space on the root file system, without the percent sign
DISK=$(df -h | awk '$NF=="/"{ print $5}' | sed 's/%//g')
 
# You can use macros in your script as shown below.
# Refer to the documentation for all supported macros.
 
# Pre-defined macros.
# var1 = ${resource.ipaddress}
# var2 = ${resource.serialno}
 
# Accessing custom attributes attached to the device.
# var3 = ${customattributes.customAttributekey}
 
# Accessing credentials attached to the device.
# var4 = ${credentials.CredentialName.credentialField}
 
printf "{\"disk.utilization\" : %s , \"memory.utilization\" : %s , \"cpu.usage\" : %s}" "$DISK" "$MEMORY" "$CPU"

Output:

{"disk.utilization" : 23 , "memory.utilization" : 24.28 , "cpu.usage" : 0.64}

Constraints:

  • Windows workstation monitoring using a Linux Gateway is not supported.

  • If the end device language is other than English, ensure that the resulting JSON output from the script contains valid English characters, converting any invalid characters as necessary.

    Example:

    For the Monitor ‘Agent G2 - Microsoft AD Global Catalog Search Time’, the resulting output is:

        {"microsoft_AD_global_catalog_searchTime": {"Components":

        {"121.0.0.1": "13,64"}
        }}
        

    In this example, the value is ‘13,64’ instead of ‘13.64’ because the language of the end device is Finnish, which uses a comma as the decimal separator. To format the number independently of the device culture, the following change was made in the script:

    $formattedTime = "{0:N2}" -f $bindTimeMilliseconds.ToString("N2",[System.Globalization.CultureInfo]::InvariantCulture)

    After this change, we obtained the valid JSON output:

        {"microsoft_AD_global_catalog_searchTime": {"Components":

        {"121.0.0.1": "13.64"}
        }}
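The same concern applies to scripts in other languages whenever a numeric string comes from a localized command or tool. As a minimal Python sketch, assuming the input uses a decimal comma and contains no thousands separators, a hypothetical helper such as to_json_number can normalize the value before it is written to the JSON output:

```python
import json


def to_json_number(text):
    """Convert a numeric string that may use a decimal comma into a float."""
    # Assumption: no thousands separators are present in the input.
    return float(text.strip().replace(",", "."))


value = to_json_number("13,64")
print(json.dumps({"microsoft_AD_global_catalog_searchTime": value}))
```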
        

Best practices:

  1. Maintain indentation in the script so that it is easy to read and understand.

  2. In a PowerShell script, do not use the following statement. It suppresses runtime exceptions and lets the script continue, so exceptions in the script logic never appear in the agent log.

$ErrorActionPreference = "SilentlyContinue"

  3. Define Try-Catch blocks wherever required to handle exceptions.

  4. If you need to keep data from the previous poll for use in the current poll, create a text file in the following recommended path and maintain all required data there:

C:\Program Files (x86)\OpsRamp\Agent\log\prevdata

  5. Add comments for each block of code: at least a single line that explains the high-level logic.

  6. In PowerShell, use the -Compress parameter with the ConvertTo-Json cmdlet. It omits indentation and white space, which helps avoid issues with buffer limitations on the agent side.

Example: $hash | ConvertTo-Json -Compress

  7. Use meaningful variable names so that it is immediately clear what each variable holds. Do not use single-character variable names (such as $l or $p).

$userName – Recommended

$u – Not Recommended

  8. Escape special characters in metric component names to avoid invalid JSON exceptions.

Example: Escape " with \"

In PowerShell, refer to the following link and escape those characters while preparing the JSON:

https://docs.microsoft.com/en-us/sql/relational-databases/json/how-for-json-escapes-special-characters-and-control-characters-sql-server?view=sql-server-ver15
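For Python scripts, the compact-output and escaping recommendations can both be covered by the standard json module instead of manual string building. The following is a sketch with an invented component name; the metric name is taken from the PowerShell sample earlier in this document.

```python
import json

# Invented component name containing a double quote and a backslash.
components = {'Backup "C:\\Data"': 1, "opsramp-agent": 0}
payload = {"system.windows.service.status": {"components": components}}

# separators=(",", ":") removes the optional white space, similar in spirit
# to ConvertTo-Json -Compress; json.dumps escapes quotes and backslashes.
compact = json.dumps(payload, separators=(",", ":"))
print(compact)
```

Because the serializer performs the escaping, the output stays valid JSON even when a component name contains characters that would break a hand-built string.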

Step 1: Create Metrics

Define metrics metadata that correlates with your script output and matches the monitor you define in the next step.

  1. Select a client from the All Clients list.
  2. Navigate to Setup > Monitoring > Metrics.
  3. From METRICS, click + Add.
    Metrics listing page
  4. From Create Metric, enter the values for the fields described in the following tables and click Save.

Metric specification

Field | Description
Metric Scope | (required) Select either 'Service Provider Metric' or 'Partner or Client Metric'. Depending on your access level and role, this dropdown menu might look slightly different. If you choose Partner or Client Metric, you are prompted to choose a partner or client from a dynamically populated contextual dropdown.
  • Partner Metric: The metric has partner scope.
  • Client Specific Metric: The metric has client scope.
Adapter Type | (required) Application. Select from the supported agent or gateway adapters.
Application Type | Select "Remote Script Executor" from this dropdown.
Name | (required) Unique metric name. The recommended naming convention is: ___. For example, apache_tomcat_webapps_count.
Tag Name | Filled automatically with the metric name.
Display Name | (required) Name to display, such as System Drive Free Space.
Description | A detailed description of this metric.
Data Point type | (required) Type of data point specification:
  • Counter Delta - Calculates the delta of the metric value.
    Agent & Gateway RSE logic: If the result is less than zero, it returns zero.

    Counter Delta = (Current poll value - Prev poll value)
  • Counter Rate - Calculates the rate of the metric value.
    Agent & Gateway RSE logic: If the result is less than zero, it returns zero.

    Counter Rate = (Current poll value - Prev poll value) / (Current poll time - Prev poll time)
  • Derive Delta - Not related to RSE. Not supported in either the agent or the gateway.
  • Derive Rate - Not related to RSE. Not supported in either the agent or the gateway.
  • Gauge - Returns the metric value directly, as returned by the script.
  • Rate - Calculates the rate of the metric value.
    Agent & Gateway RSE logic: If the result is less than zero, it returns a negative value.

    Rate = (Current poll value - Prev poll value) / (Current poll time - Prev poll time)
  • Delta - Calculates the delta of the metric value.
    Agent & Gateway RSE logic: If the result is less than zero, it returns a negative value.

    Delta = (Current poll value - Prev poll value)
  • None - Same as Gauge.
Units | Metric unit specification associated with the data point type. Click the drop-down menu to select from the available units. For the service status monitoring example, no unit is required, so select None.
Unit Multiplication Factor | Factor to multiply the data point value by. A status monitor does not need this factor, so keep the default value of 1.0.
Datapoint value conversion | Datapoint value conversion specification:
  • Value: Choose this option when no conversion is required on the metric value. This is the default value of the "Datapoint value conversion" dropdown.
  • Enumerated Map: Map the datapoint to a state-description pair. This is mostly used for health-status metrics, where the script returns integers and each integer represents a health status.
    Example:
    1 - Running
    0 - Stopped
    • State Descriptions
      Click the plus icon to add state-description pairs for each datapoint value:
      • State: State represented by the value.
      • Description: Description associated with the state.
    • Use formatted value in: Render the value in an Alert or Graph.
Metric Processing | Metric processing specification:
  • Graph: Choose this option when you need only a graph for the metric, without an alert.
  • Notification: Choose this option when you need only an alert for the metric, without a graph.
    When you select this option, a dynamic UI appears for configuring alert thresholds, as shown in the screenshot below:
    Remote Script Executor Create Metric
    Configuring Alert Thresholds:
    When setting up notifications, the following options are available for configuring alert thresholds:
    Case 1: Enumerated Map
    • If "Datapoint Value Conversion" is set to "Enumerated Map", specify the State column values for the Warning and Critical thresholds.
      • Warning Threshold: State [Specify State column Value]
      • Critical Threshold: State [Specify State column Value]

    Setting Alert thresholds for Enumerated Map conversion from strings to integers:
    Remote Script Executor Create Metric
    Setting Alert thresholds for Enumerated Map conversion from integers to strings:
    Remote Script Executor Create Metric
    Case 2: Value
    • If "Datapoint Value Conversion" is set to "Value" (the default setting), specify the metric values for the Warning and Critical thresholds.
      • Warning Threshold: Value [Specify Warning-level Threshold Value for the metric]
      • Critical Threshold: Value [Specify Critical-level Threshold Value for the metric]
    Additional Fields:
    • Subject: The metric subject is populated by default but can be customized.
    • Description: The metric description is populated by default but can be customized.
  • Graph and Notification: Choose this option when you need both an alert and a graph for the metric.
    When you select this option, a dynamic UI appears for configuring alert thresholds, as shown in the screenshot below:
    Remote Script Executor Create Metric
    Configuring Alert Thresholds:
    The alert threshold configuration options are the same as for the Notification option.
  • None: Select this option when neither an alert nor a graph is needed for the metric.

See the Metric token reference for a list of the tokens that can be used in the Subject and Description fields.
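The data point type formulas in the Metric specification table can be written out as a short Python sketch. The function names are illustrative, and the poll times are assumed to be in seconds because the table does not state a unit; this is not the agent's or gateway's actual code.

```python
def delta(curr, prev):
    """Delta: plain difference; may be negative."""
    return curr - prev


def counter_delta(curr, prev):
    """Counter Delta: difference, floored at zero."""
    return max(curr - prev, 0)


def rate(curr, prev, curr_time, prev_time):
    """Rate: difference per unit of time; may be negative."""
    return (curr - prev) / (curr_time - prev_time)


def counter_rate(curr, prev, curr_time, prev_time):
    """Counter Rate: rate, floored at zero."""
    return max(rate(curr, prev, curr_time, prev_time), 0)


# A counter that restarted between polls (160 -> 100) over 60 seconds.
print(delta(100, 160), counter_delta(100, 160))
print(rate(100, 160, 60, 0), counter_rate(100, 160, 60, 0))
```

The counter variants differ from the plain ones only in the floor at zero, which is what hides a counter reset from the graph.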

Step 2: Create a Monitor

A custom Remote Script Executor monitor is a collection of Remote Script Executor metrics. You can create a template based on the Remote Script Executor monitor.

Macros are used to pass dynamic arguments to scripts. Use the static and dynamic macros listed in the following tables to make native and custom attributes available to the monitor.

Create the metrics as described in Step 1: Create Metrics before you create a monitor.

  1. Select a client from the All Clients list.
  2. Navigate to Setup > Monitoring > Monitors.
  3. Click + Add.
  4. On the CREATE A MONITOR screen, enter the information in the fields as listed in the following table and click Save. The newly created monitor is added to the list of monitors.

Monitor specification

Field | Description
Monitor Scope | (required) Select either 'Service Provider Monitor' or 'Partner or Client Monitor'. Depending on your access level and role, this dropdown menu might look slightly different. If you choose to create the script at the partner or client level, you are prompted to choose a partner or client from a dynamically populated contextual dropdown.
  • Partner Monitor: The monitor has partner scope.
  • Client Specific Monitor: The monitor has client scope.
Adapter Type | (required) Application. Select from the supported agent or gateway adapters.
Application Type | Select "Remote Script Executor" from this dropdown.
Name | (required) Unique monitor name.
Description | (required) Description of the monitor.
Script | (required) Copy and paste the script that you developed in the previous section.
Metrics | (required) Click +Add to add metrics. Search for the previously defined metrics and click Add Metrics.
(NOTE: Make sure you select the metrics that the script above actually produces.)
Configuration Parameters | Click +Add to add configuration parameters. Search for the parameter and click Add to add it to the monitor:
  • Using the available configuration parameters, you can specify the scripting language type and the platform on which to execute the script.
  • If you select the Custom value in the configuration parameters, add the parameter `custom.script.execution.path`.
By default, the following four configuration parameters are shown:
  • collector.application.type – Keep the default value, RSE.
  • connection.timeout.ms – Keep the default value, 15000 ms. (You can increase it if required, but it must stay within the monitor frequency/poll time.)
  • remote.server.operatingsystem – Select the operating system of the target (end) device.
  • remote.server.scripttype – Choose the script type that matches your script. The following scripting languages are supported:

    Script type | Supported OS
    Python | Windows & Linux
    PowerShell | Windows
    Custom | Script types other than the defaults (Python, Shell, PowerShell). Used to develop scripts in other scripting languages. When the Custom option is selected, the script execution path is mandatory: add the configuration parameter custom.script.execution.path and provide the execution path that applies to the operating system of the remote device.
    Bash | Linux & Unix

You can also add the following four configuration parameters, based on your requirement:

  • execution.timeout.ms – If the script takes longer than the default value (20000 ms) to execute, increase this value as required. (This option is available from Agent v10.)
  • custom.script.execution.path – If the script type is Custom (other than Python, Shell, PowerShell), provide the complete path of the script executable.
  • custom.script.arguments – Add this parameter if the script requires additional input arguments.
    Limitation: You can add only one custom script argument per monitor, and you must provide the complete input in a single text box.
  • application.component.name – Not supported in either the agent or the gateway.

Static macros

Use static macros to override the resource values. See the Static macro reference for a list of static macros.

Dynamic macros

Macro | Description
${customattributes.serviceName} | Gets custom attributes of the device. To use an argument in a script, apply the custom attribute on the device.

For example, assume the device has a custom attribute with `Key: serviceName` and `Value: oracledb`. At runtime, the value oracledb replaces the macro `${customattributes.serviceName}` in the script.
${credentials.CredentialName.credentialField} | Gets the credentials added to the device. Using credential macros in the script avoids storing the actual username and password in plain text. When the script runs, the macros are replaced with the actual credentials.

For example, if you define a credential set named JMXCred and add it to a device, you can use the macro ${credentials.JMXCred.username} in your script, and the macro is replaced with the actual username at runtime.
${credential.type.all} | Gets all credential sets assigned to the device into the script.
${credential.type.name} | Gets the credentials of a specific type into the script.

For example, if the device has SSH, WMI, and Database credentials and you want only the database credentials inside the script, use ${credential.type.Database}. Similarly, you can get any type of credentials into the script by replacing name with the credential type.
Example: ${credential.type.SSH}, ${credential.type.SNMP}, ${credential.type.VMWARE}, and so on.

The following credential types are supported:

Credential Type | Macro | Sample Output
VMWARE | ${credential.type.VMWARE} |
[{
  "uuid": "xB35mBgN354UGM3zQFZBQSaJ",
  "type": "VMWARE",
  "name": "sample-vmware",
  "timeoutMs": 10000,
  "transportType": "HTTPS",
  "appName": null,
  "domain": null,
  "username": "testuser",
  "password": "xxxxxx",
  "port": 443
}]
SSH | ${credential.type.SSH} |
[{
  "uuid": "NwKGzg6qqSF29Sdy5hRJrAyH",
  "type": "SSH",
  "name": "sample-ssh",
  "timeoutMs": 10000,
  "transportType": "HTTP",
  "appName": null,
  "userName": "testuser",
  "password": "xxxxxx",
  "port": 22,
  "privkey": null,
  "passPhrasePasswd": null,
  "privKeyFileName": null,
  "sshAuthType": "PASSWORD"
}]

The following are the remaining credential types:
SNMP, XEN, WINDOWS, JMX, HTTP, Database, CIM, NETAPP, NETAPPCLUSTER, HYPERFLEX, PURESTORAGE, FTP, CISCOUCS, EMCCLARIION, EMCVNX, EMCVNXE, EMCVMAX, IBM, HPEVA, REMOTE_CLI, TELNET, XTREMIO, VIPTELA, EMCVPLEX, EMCRPA, NUTANIX, HITACHIVSP, AZURESTACK, APPLICATION, and CITRIX_CVDA.
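Because the ${credential.type.*} macros expand to a JSON array, a Python script can parse the value and pick the credential set it needs. The sketch below uses a shortened copy of the SSH sample output as a literal; find_credential and get_username are hypothetical helpers, and how the macro text reaches the script is left out.

```python
import json

# Shortened copy of the SSH sample output shown in this section.
raw = ('[{"type": "SSH", "name": "sample-ssh", "userName": "testuser", '
       '"password": "xxxxxx", "port": 22}]')


def find_credential(raw_json, name):
    """Return the credential set with the given name, or None."""
    for cred in json.loads(raw_json):
        if cred.get("name") == name:
            return cred
    return None


def get_username(cred):
    # The samples use "username" for VMWARE and "userName" for SSH,
    # so both spellings are checked.
    return cred.get("username") or cred.get("userName")


ssh = find_credential(raw, "sample-ssh")
print(get_username(ssh), ssh["port"])
```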

Script execution path for configuration parameters in Linux and Windows

The following runtime executables should be available in your Path variable for the corresponding operating system.

Script Type | Linux | Windows
Bash | bash | Not Applicable
Shell | sh | Not Applicable
PowerShell | Not Applicable | powershell.exe
Python | python | python.exe
Perl | perl | Not Applicable

If your runtime is not set as an environment variable, provide the runtime absolute path in custom.script.execution.path. For example, if python is not set, set custom.script.execution.path to /usr/lib/python.
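To check in advance whether a runtime is reachable through the Path variable on a device, a short Python check with shutil.which can help. This is a local diagnostic sketch, not something the agent runs; resolve_runtime is a hypothetical helper, and the fallback path is the example value from the paragraph above.

```python
import shutil


def resolve_runtime(name, fallback_path=None):
    """Return the runtime's location from Path, else the fallback (may be None)."""
    return shutil.which(name) or fallback_path


# If "python" is not found on Path, the absolute path would have to be
# supplied through custom.script.execution.path instead.
print(resolve_runtime("python", "/usr/lib/python"))
```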

Step 3: Create a Template

A template is an instance of a monitor and is applied to devices.

  1. Select a client from the All Clients list.
  2. Navigate to Setup > Monitoring > Templates.
  3. From Templates, click + Add.
  4. From the MONITOR TEMPLATE screen, provide details for the following parameters and click Save.
Field | Description
Select Template Scope | (required) Select either 'Service Provider Template' or 'Partner or Client Templates'. Depending on your access level and role, this dropdown menu might look slightly different. If you choose to create the template at the partner or client level, you are prompted to choose a partner or client from a dynamically populated contextual dropdown.
Collector Type | Select Agent or Gateway based on your requirement.
Monitor Type | For Gateway, select the Monitors radio button; for Agent, select G2 Monitors.
Applicable for | Select Device.
Template Name | Name of the template.
Description | A detailed description of this template.
Generation | Generation that the template belongs to.
Tags | User-defined tags for filtering.
Prerequisites | Essential things to consider when using this template.
Status | Active or end-of-life template.
Notes | Information that you want to add to the template.
Template Family Name | Category that applies to the application, such as Windows Server, Storage Server, or Network Server.
Deployment Type | One of the following methods used to apply the template to the resource:
  • Custom
  • Optional
  • Standard

After adding the template, add component thresholds and component filters by editing metric values. For more information, see Add Filter, Component Filters, and Define Threshold.

Step 4: Assign a Template

  1. Click Infrastructure > Resources.
  2. From the Resources tab, select the required resource from the list of resources. Or, use the search option to find the resource.
  3. Click the resource name to view details.
  4. From the left pane, click Monitors.
  5. From the Templates tab, click +Assign Templates.
  6. From Apply Templates, select the templates.
    The selected Templates section displays the chosen templates.
  7. Click Assign. The template gets assigned to the selected device.

After assigning the template to a resource for monitoring, click Get Latest Metric Values to view the latest metric information.

Step 5: View Graphs

The Agent monitors the system using the assigned templates and displays the results in a graphical format.

  1. From the left pane, click Infrastructure.
  2. From the Resources tab, select the required resource from the list of resources. Or, use the search option to find the resource.
  3. Click the resource name to view details.
  4. From the left pane, click Metrics. The Metrics page displays graphs generated by all monitoring templates assigned to a device.
  5. Search with the template name to filter the graphs.

Scripts Folder and Permissions

By default, all the Remote Script Executor (RSE) custom monitor scripts are downloaded to a predefined location in your system. The system user (Windows) and Root/Non-Root User (Linux) hold all the required permissions (Read, Write, and Execute) in the downloaded folders.

The following table provides the default folder locations and users for Windows and Linux:

Operating System | Default Folder Location | Default User
Windows | 32 bit – %programfiles%\OpsRamp\Agent\plugins\rse; 64 bit – %programfiles(x86)%\OpsRamp\Agent\plugins\rse | System User
Linux | /opt/opsramp/agent/plugins/rse | Root/Non-Root User
  • Default permission set for all executable files in this folder is 744.
  • Default permission set for other files in this folder is 644.
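The values 744 and 644 are octal permission modes. As a small illustration, Python's standard stat module can render them in the familiar symbolic form; describe_mode is a hypothetical helper name.

```python
import stat


def describe_mode(mode):
    """Render an octal permission mode as an ls-style string for a regular file."""
    return stat.filemode(stat.S_IFREG | mode)


print(describe_mode(0o744))  # -rwxr--r--
print(describe_mode(0o644))  # -rw-r--r--
```

In both cases only the owner can write; 744 additionally lets the owner execute the file, while group and others can only read.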

Script File Integrity Check

Before executing a monitor script, OpsRamp verifies the integrity of the custom script file by checking that its checksum has not changed. If the G2-based custom monitor script on the resource has been changed, OpsRamp re-downloads the original monitor script from the source for security reasons and continues monitoring.

During this process, a critical “File checksum mismatch” alert is generated on the resource, and users are notified about the script modification.
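The idea behind a checksum-based integrity check can be illustrated with a short Python sketch. The hash algorithm OpsRamp uses is not stated in this document, so SHA-256 is an assumption here, and file_sha256 and is_unmodified are hypothetical helpers rather than agent functions.

```python
import hashlib
import os
import tempfile


def file_sha256(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unmodified(path, expected_hex):
    """True when the file's current checksum equals the recorded one."""
    return file_sha256(path) == expected_hex


# Demo with a temporary stand-in for a downloaded monitor script.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"echo ok\n")
    script_path = tmp.name

recorded = file_sha256(script_path)
print(is_unmodified(script_path, recorded))   # file not touched yet

with open(script_path, "ab") as handle:
    handle.write(b"# local edit\n")
print(is_unmodified(script_path, recorded))   # file changed after recording

os.remove(script_path)
```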

Remote Script Executor example

Script

#!/bin/bash

CPU=$(top -bn1 | grep load | awk '{printf "%.2f\t\t\n", $(NF-2)}')
MEMORY=$(free -m | awk 'NR==2{printf "%.2f\t\t", $3*100/$2 }')
DISK=`df -h | awk '$NF=="/"{ print $5}' | sed 's/%//g'`

printf "{\"disk.utilization\" : %s , \"memory.utilization\" : %s , \"cpu.usage\" : %s}" "$DISK" "$MEMORY" "$CPU"

Output :

{"disk.utilization" : 23 , "memory.utilization" : 24.28 , "cpu.usage" : 0.64 }

Standard JSON output format

We recommend adjusting the script output to match one of the following JSON formats.

Including the scriptExceptions payload in the final JSON output is optional; include it only if the script encounters exceptions. If no exceptions occur, omit the scriptExceptions payload from the final JSON output. The scriptExceptions payload can be used with any of the following formats, and examples are provided below for a few of them. For more details on how to use scriptExceptions in RSE scripts, refer to the documentation.

Format 1:

Description:

This format is used when the metric does not have any dynamic components or instances. In such cases, construct the JSON by using the metric name as the key and mapping that key to the appropriate value.

Format for the JSON output when there are no exceptions:

{
  "Metric1": 98,
  "Metric2": 70,
  "Metric3": 80
}

Or

Format for the JSON output when exceptions occur:

{

  "Metric1": 98,
  "Metric2": 70,
  "Metric3": 80,
  "scriptExceptions": {
    "subject": "No monitoring data / Unable to fetch monitoring data / Incomplete script execution",
    "description": "Failed to collect data for following metrics. \n metricName: <Metric Name 1>, FailureReason: Failure Reason 1 \n metricName: <Metric Name 2>, FailureReason: Failure Reason 2",
    "raiseAlert": true,
    "logRequired": true,
    "alertState": "warning"
  }
}

Example:

{
  "system_windows_uptime_inMinutes": 120,
  "system_windows_overallCPU_utilization": 50
}
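For Format 1, the rule that scriptExceptions appears only when something failed can be sketched in Python as follows. build_output is a hypothetical helper; the field names and the description layout follow the format shown above, while the subject text and the sample failure reason are placeholders.

```python
import json


def build_output(metrics, failures):
    """metrics: {metric name: value}; failures: {metric name: failure reason}."""
    output = dict(metrics)
    if failures:  # add the payload only when at least one metric failed
        details = " \n ".join(
            "metricName: {}, FailureReason: {}".format(name, reason)
            for name, reason in failures.items()
        )
        output["scriptExceptions"] = {
            "subject": "Unable to fetch monitoring data",
            "description": "Failed to collect data for following metrics. \n " + details,
            "raiseAlert": True,
            "logRequired": True,
            "alertState": "warning",
        }
    return json.dumps(output)


print(build_output({"Metric1": 98, "Metric2": 70}, {}))
print(build_output({"Metric1": 98}, {"Metric2": "command timed out"}))
```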

Format 2:

Description: This format is used when some metrics return strings and others return numbers. In such cases, prepare the JSON output by mapping each metric name to its string or numeric value.

{
"Metric1": 98,
"Metric2": "STATE",
"Metric3": 80
}

Example:

{
  "system_windows_memory_usage": 35.75,
  "system_windows_winrm_status": "running"
}

Note: We recommend representing string outputs numerically by using the Enumerated Map option of RSE.

Format 3:

Description: This format can be used when the metric contains multiple components. In the example below, data is fetched for multiple disk components.

Format for the JSON output when there are no exceptions:

{
  "MetricName1": {
    "components": {
      "component1": 70,
      "component2": 98
    }
  },
  "MetricName2": {
    "components": {
      "component1": 77,
      "component2": 98
    }
  }
}

Or

Format for the JSON output when exceptions occur:

{
  "MetricName1": {
    "components": {
      "component1": 70,
      "component2": 98
    }
  },
  "MetricName2": {
    "components": {
      "component1": 77,
      "component2": 98
    }
  },
  "scriptExceptions": {
    "subject": "No monitoring data / Unable to fetch monitoring data / Incomplete script execution",
    "description": "Failed to collect data for following metrics. \n metricName: <Metric Name 1>, FailureReason: Failure Reason 1 \n metricName: <Metric Name 2>, FailureReason: Failure Reason 2",
    "raiseAlert": true,
    "logRequired": true,
    "alertState": "warning"
  }
}

Example:

{
  "System_Windows_PhysicalDisk_WriteBytes_PerSec": {
    "components": {
      "physicaldisk_1": 0,
      "physicaldisk_2": 0
    }
  },
  "System_Windows_PhysicalDisk_AvgDisk_SecPerWrite": {
    "components": {
      "physicaldisk_1": 0,
      "physicaldisk_2": 0
    }
  }
}

Explanation:

In this example, System_Windows_PhysicalDisk_WriteBytes_PerSec is a metric name that has two components, physicaldisk_1 and physicaldisk_2, with their values mapped inside the components key.

Similarly, System_Windows_PhysicalDisk_AvgDisk_SecPerWrite is another metric name organized in the same way.
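The following Python sketch shows one way to assemble the Format 3 structure from individual readings. The function name group_by_component and the (metric, component, value) tuple layout are assumptions made for illustration; only the resulting JSON shape comes from the format above.

```python
import json


def group_by_component(readings):
    """Turn (metric, component, value) tuples into the Format 3 structure."""
    output = {}
    for metric, component, value in readings:
        # Create the metric entry on first sight, then add the component value.
        output.setdefault(metric, {"components": {}})["components"][component] = value
    return output


if __name__ == "__main__":
    sample = [
        ("System_Windows_PhysicalDisk_WriteBytes_PerSec", "physicaldisk_1", 0),
        ("System_Windows_PhysicalDisk_WriteBytes_PerSec", "physicaldisk_2", 0),
    ]
    print(json.dumps(group_by_component(sample), indent=2))
```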

Format 4:

Description: This format is used when the user needs metric-level alertTokens for all metrics of the script. Alert tokens add more information about a metric to the alert subject, the alert description, or both.

Format for the JSON output when there are no exceptions:

{
  "MetricName1": {
    "components": {
      "component1": "STATE",
      "component2": 98
    },
    "alertTokens": {
      "token1": "value",
      "token2": "value2"
    }
  },
  "MetricName2": {
    "components": {
      "component1": 77,
      "component2": 98
    },
    "alertTokens": {
      "token1": "value",
      "token2": "value2"
    }
  }
}

Or

Format for the JSON output when exceptions occur:

{
  "MetricName1": {
    "components": {
      "component1": "STATE",
      "component2": 98
    },
    "alertTokens": {
      "token1": "value",
      "token2": "value2"
    }
  },
  "MetricName2": {
    "components": {
      "component1": 77,
      "component2": 98
    },
    "alertTokens": {
      "token1": "value",
      "token2": "value2"
    }
  },
  "scriptExceptions": {
    "subject": "No monitoring data / Unable to fetch monitoring data / Incomplete script execution",
    "description": "Failed to collect data for following metrics. \n metricName: <Metric Name 1>, FailureReason: Failure Reason 1 \n metricName: <Metric Name 2>, FailureReason: Failure Reason 2",
    "raiseAlert": true,
    "logRequired": true,
    "alertState": "warning"
  }
}

Example:

{
  "system_linux_memory_utilization": {
    "components": {
      "real_memory_utilization": 50
    },
    "alertTokens": {
      "memory.usage": "Total memory: 16 GB, Used memory: 8 GB"
    }
  }
}

Explanation:

In this example, the system_linux_memory_utilization metric includes a component real_memory_utilization with a value of 50. The alert tokens offer a summary at the metric level, indicating “Total memory: 16 GB, Used memory: 8 GB”. This information is included in the alert description, the alert subject, or both for the system_linux_memory_utilization metric.
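A short Python sketch of how a script might produce the Format 4 example above. The function name memory_metric and its two arguments are hypothetical; how the totals are actually read from the device is left out.

```python
import json


def memory_metric(total_gb, used_gb):
    """Build a Format 4 entry: one component plus a metric-level alert token."""
    utilization = round(used_gb / total_gb * 100, 2)
    return {
        "system_linux_memory_utilization": {
            "components": {"real_memory_utilization": utilization},
            "alertTokens": {
                "memory.usage": f"Total memory: {total_gb} GB, Used memory: {used_gb} GB"
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(memory_metric(16, 8)))
```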

Format 5:

Description: This format is used when the user needs metric-level alert tokens for only a few of the metrics of the script.

Format:

{
    "MetricName1": 254,
    "MetricName2": {
        "components": {
            "comp1": 90,
            "comp2": 60
        }
    },
    "MetricName3": {
        "components": {
            "component1": "STATE",
            "component2": 98
        },
        "alertTokens": {
            "token1": "value",
            "token2": "value2"
        }
    }
}

Example:

{
  "system_linux_services_status": {
    "components": {
      "iptables": "inactive",
      "irqbalance": "active",
      "kdump": "active",
      "ypxfrd": "inactive"
    },
    "alertTokens": {
      "services.active.counts": "2 services are active",
      "services.inactive.counts": "2 services are inactive"
    }
  },
  "system_linux_totalServices_count": "4"
}

Explanation:

The system_linux_services_status metric includes two alert tokens at the metric-level.

  • Token 1, denoted as services.active.counts, provides the count of active services.
  • Token 2, identified as services.inactive.counts, offers the count of inactive services.

These tokens provide valuable insights into the status of the services monitored under this metric.

On the other hand, the metric system_linux_totalServices_count does not have any associated alert tokens, so alerts for this metric will not include any additional descriptive information.
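The sketch below derives the two alert tokens of the system_linux_services_status example from the component values themselves, so the counts cannot drift from the data. The function name services_output is hypothetical, and the total is emitted as a number rather than the string used in the example; that is a choice of this sketch, not a platform requirement.

```python
import json


def services_output(states):
    """Build a Format 5 payload from a mapping of service name -> state.

    Any state other than "active" is counted as inactive.
    """
    active = sum(1 for state in states.values() if state == "active")
    inactive = len(states) - active
    return {
        "system_linux_services_status": {
            "components": dict(states),
            "alertTokens": {
                "services.active.counts": f"{active} services are active",
                "services.inactive.counts": f"{inactive} services are inactive",
            },
        },
        # This metric carries no alert tokens, as in the example above.
        "system_linux_totalServices_count": len(states),
    }


if __name__ == "__main__":
    print(json.dumps(services_output({"kdump": "active", "ypxfrd": "inactive"})))
```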

Format 6:

Description: This format is used when the user needs component-level alert tokens. The alert token value for each metric component is specified separately, as shown below:

Format for the JSON output when there are no exceptions:

{
  "MetricName1": 254,
  "MetricName2": {
    "components": {
      "comp1": 90,
      "comp2": 60
    }
  },
  "MetricName3": {
    "components": {
      "component1": "STATE",
      "component2": 98
    },
    "alertTokens": {
      "token1": "value",
      "token2": "value2"
    }
  },
  "MetricName4": {
    "components": {
      "component1": 10,
      "component2": 20
    },
    "alertTokens": {
      "token1": {
        "component1": "token 1 value of component 1",
        "component2": "token 1 value of component 2"
      },
      "token2": {
        "component1": "token 2 value of component 1",
        "component2": "token 2 value of component 2"
      }
    }
  }
}

Or

Format for the JSON output when exceptions occur:

{
  "MetricName1": 254,
  "MetricName2": {
    "components": {
      "comp1": 90,
      "comp2": 60
    }
  },
  "MetricName3": {
    "components": {
      "component1": "STATE",
      "component2": 98
    },
    "alertTokens": {
      "token1": "value",
      "token2": "value2"
    }
  },
  "MetricName4": {
    "components": {
      "component1": 10,
      "component2": 20
    },
    "alertTokens": {
      "token1": {
        "component1": "token 1 value of component 1",
        "component2": "token 1 value of component 2"
      },
      "token2": {
        "component1": "token 2 value of component 1",
        "component2": "token 2 value of component 2"
      }
    }
  },
  "scriptExceptions": {
    "subject": "No monitoring data / Unable to fetch monitoring data / Incomplete script execution",
    "description": "Failed to collect data for following metrics. \n metricName: <Metric Name 1>, FailureReason: Failure Reason 1 \n metricName: <Metric Name 2>, FailureReason: Failure Reason 2",
    "raiseAlert": true,
    "logRequired": true,
    "alertState": "warning"
  }
}

Example:

{
  "system_linux_interfaces_count": {
    "components": {
      "system_linux_interfaces_count": 8
    },
    "alertTokens": {
      "interfaces_names": "interfaces names are cni0, ens160, flannel.1, veth05f2cc15,  veth10f0079d,  veth42a5dd4f,  vetha7efbaa5,  vethdc3c2d78"
    }
  },
  "system_linux_network_interface_trafficIn": {
    "components": {
      "cni0": 14329842632,
      "ens160": 13491465976
    },
    "alertTokens": {
      "mac.address": {
        "cni0": "42-a5-a4-fd-86-eb",
        "ens160": "00-0c-29-b7-be-c3"
      }
    }
  },
  "system_linux_network_interface_trafficOut": {
    "components": {
      "cni0": "9148429744",
      "ens160": "12488937472"
    }
  },
  "scriptExceptions": {
    "subject": "An exception has occurred. Unable to fetch the monitoring data",
    "alertState": "critical",
    "description": "Failed to collect data for following metrics. \n metricName: system_linux_network_interface_errorsIn, FailureReason: rx_errors attribute not available for interface cni0, metricName: system_linux_network_interface_errorsOut, FailureReason: tx_errors attribute not available for interface cni0",
    "raiseAlert": true,
    "logRequired": true
  }
}

Explanation:

In this example, the JSON structure demonstrates the use of scriptExceptions along with component-level and metric-level alert tokens.

The system_linux_interfaces_count metric includes a component that counts the number of interfaces and a metric-level alert token listing the names of these interfaces.

For system_linux_network_interface_trafficIn, the components show the inbound traffic for specific interfaces, while the component-level alert token provides the MAC addresses of these interfaces, offering more details on the interface.

system_linux_network_interface_trafficOut presents outbound traffic data for the interfaces but does not include additional alert tokens.
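To illustrate the component-level token layout of Format 6, here is a small Python sketch that builds one metric entry from per-component values and per-component token values. The function name with_component_tokens is hypothetical, and the check that every component has a token value is a design choice of this sketch, not a documented platform rule.

```python
import json


def with_component_tokens(metric_name, values, tokens):
    """Build a Format 6 entry.

    values: component name -> metric value
    tokens: token name -> {component name -> token text}
    """
    for token_name, per_component in tokens.items():
        missing = set(values) - set(per_component)
        if missing:
            # Sketch-level safeguard: fail early instead of emitting partial tokens.
            raise ValueError(f"{token_name} has no value for: {sorted(missing)}")
    return {
        metric_name: {
            "components": dict(values),
            "alertTokens": {name: dict(per) for name, per in tokens.items()},
        }
    }


if __name__ == "__main__":
    entry = with_component_tokens(
        "system_linux_network_interface_trafficIn",
        {"cni0": 14329842632, "ens160": 13491465976},
        {"mac.address": {"cni0": "42-a5-a4-fd-86-eb", "ens160": "00-0c-29-b7-be-c3"}},
    )
    print(json.dumps(entry, indent=2))
```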

Multiple Custom Script Arguments in Macro

Refer to the document Multiple Custom Script Arguments for more details.

Generate Alert Tokens in RSE

Refer to the document Generate Alert Tokens in RSE for more details.

Exception Handling In RSE

Refer to the document Exception Handling In RSE for more details.

Enumerated Mapping In RSE

Refer to the document Enumerated Mapping In RSE for more details.

Agent G2 RSE Troubleshooting Steps

Windows & Linux

Use case 1

Unable to fetch latest metrics data

When you apply an Agent-based RSE template on a Windows device and encounter the message “Failed to get latest metrics/Agent is offline” while fetching the latest metrics data, follow the steps below:

Step 1:

Navigate to the Overview section of the Device (Infrastructure > Resources > Search using IP or Device Name) and on that device ensure that the Agent is installed and online (indicated by blue color as shown below).

Overview screen

Step 2:

Identify whether a template is global or customer written.

To determine if a template is global or custom-written, refer to FAQ #3.

Step 3:

Alert types:

Review any alerts associated with the template on the Overview page of the device, or navigate to the Command Center > Alerts page and filter by the specific server name or IP address. Common alert subjects to look out for are as follows:

i. There are two cases:

Case I: InvalidJsonException: Validate your Script Output. (Metric Name is JSON_PARSE_ERROR)

Case II: ScriptExecutionFailureException: Failed to execute the Script. Validate your Script. (Metric Name is script_error)

When you encounter such errors within global templates, follow the steps below:

  • Ensure that input parameters are provided to the template in the Input Parameters section according to the Template Usage Guidelines or the Template description, if any. See here and search with template or metric name.
  • Ensure that the server meets the pre-requisites mentioned in the Template’s pre-requisites section, if any.
  • If the input parameters are provided as per the Template Usage Guidelines and the pre-requisites are met, then raise a case by manually executing the script and retrieve both the script output and debug level agent logs as mentioned in step #4.

When you encounter such errors in customer-written scripts, follow the steps below:

  • Advise customers to ensure that the final script output adheres to one of the JSON output formats mentioned here.
  • In script failure exception cases, look for the error message given in the alert description and rectify the code accordingly.

ii. No Credentials are found against the Device of Type: Database. (Metric Name is rse.no.credentials.error)

To resolve this issue, assign credentials of type Database in the Credentials section as demonstrated in below screenshot.

Credentials screen

For errors related to different types of credentials, assign the appropriate credential type accordingly on the device.

iii. Handling other alert subjects returned from scriptExceptions (Metric Name is rse_metric_collection_failures)

Alerts with subjects like “An exception has occurred. Unable to fetch the monitoring data” are typically encountered in global templates and are raised by the exception handling mechanisms within the scripts.

If you encounter alerts with descriptions such as “Empty/invalid input parameters” or “Unable to load PostgreSQL environment”, follow these steps:

  • Check if input parameters are provided to the template in the input parameters section according to the Template Usage Guidelines or Template description.
    See here and search with template or metric name.
  • Confirm that the server meets any pre-requisites mentioned in the template’s Pre-Requisites section, if any.

Step 4:

Gather debug-level logs and run the script manually to obtain its output, as described below.

Retrieving debug logs:

  1. By default, the agent log level is set to warn. To enable debug-level logs, perform the following steps on the end device:

    1. Navigate to the agent configuration directory. The default path is C:\Program Files (x86)\OpsRamp\Agent\conf on Windows and /opt/opsramp/agent/conf on Linux.
    2. Open the configuration.properties file.
    3. Under the Log section, locate the log_level parameter.
    4. Change the value of log_level from warn to debug.
    5. Save the changes to the configuration.properties file.

  2. Navigate to the agent log directory. The default path is C:\Program Files (x86)\OpsRamp\Agent\log on Windows and /opt/opsramp/agent/log on Linux.

  3. Share that log folder in ZIP format with the respective team for further analysis.

(OR)

If the user lacks device access, use the commands available on the device page in the UI:

i. Set the log level to debug by selecting the command “Enable Agent Log Debug Mode”. See the screenshot provided below.

Commands

ii. Retrieve the last 500 lines from the logs by selecting the command “Show Recent Agent Log”, as depicted below:

Commands

Executing Scripts Manually:

  1. Go to the agent RSE plugin directory. The default path is C:\Program Files (x86)\OpsRamp\Agent\plugins\rse\ on Windows and /opt/opsramp/agent/plugins/rse on Linux.
  2. Locate the desired RSE script file within that directory:

    i. To locate a PowerShell script on Windows, use the command:
       Get-ChildItem -Path "C:\Program Files (x86)\OpsRamp\Agent\plugins\rse" -Filter *.ps1 -Recurse | Select-String -Pattern "metric_name" | Select-Object -Unique Path
    ii. To locate a shell script on Linux, use the command:
       grep -rl "metric_name" /opt/opsramp/agent/plugins/rse

    In both cases, replace metric_name with the specific metric name you are searching for within the template, and update the paths if your RSE folders differ from the defaults used above.
  3. Execute the script manually and capture the output for further analysis, as shown in the screenshots below:

    • Executing a PowerShell script

      Commands

    • Executing a shell script

      Commands

  4. After executing the script:

    • Check the JSON validity using an online tool such as https://jsonformatter.org/. If it is valid, make sure that it aligns with one of the RSE supported formats mentioned in the documentation here.

    • If the output is blank or the script execution throws an error:

      i. If it is a global template, raise a case with your findings, and attach the logs and the manual script output.
      ii. If it is a customer-written template, verify the commands used in the script on the device and rectify the script accordingly.
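As an offline alternative to an online JSON checker, the Python sketch below validates captured script output and applies a rough shape check against the formats described earlier (a metric maps either to a number or string, or to an object with a components map). It is only an approximation written for this guide: the function names are hypothetical and the platform's own validation may be stricter or more lenient.

```python
import json

RESERVED = "scriptExceptions"


def is_scalar(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float, str)) and not isinstance(value, bool)


def check_rse_output(raw):
    """Return a list of problems found in raw script output (empty if none)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]
    problems = []
    for name, value in data.items():
        if name == RESERVED or is_scalar(value):
            continue
        components = value.get("components") if isinstance(value, dict) else None
        if not isinstance(components, dict) or not components:
            problems.append(f"{name}: expected a value or a non-empty 'components' object")
        elif not all(is_scalar(v) for v in components.values()):
            problems.append(f"{name}: component values must be numbers or strings")
    return problems


if __name__ == "__main__":
    import sys

    # Usage: pipe the manually captured script output into this checker.
    for problem in check_rse_output(sys.stdin.read()):
        print(problem)
```

A blank output is reported as invalid JSON, which matches the "output is blank" branch in the step above.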

Use case 2

Graph data is not populating for specific or all metrics

Things to check:

  1. Validate whether the metric is retrieving data from end device by checking latest snapshot data, if not, refer to Use case #1.

  2. Also check if the graph is enabled or not at metric level. If it is enabled, check whether data received from latest snapshot data is a string, as shown below:

Snapshot

If it is a string, then check if Enum Mapping is defined for that string at metric level, as shown below:

Enum mapping
  1. If Enum mapping is not defined for that particular string and it is a global template, raise a case with your findings, attaching screenshots of the latest snapshot data and the Enum Mapping defined at the metric.
  2. If Enum mapping is not defined for that particular string and it is a customer-written template, suggest that the customer edit the metric and add the new state in the State Descriptions field, as shown in the above screenshot.
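The check described above can also be done offline against a captured script output. The Python sketch below lists string values that have no entry in a locally maintained copy of the enum mapping. The function name unmapped_states and the enum_maps dictionary are hypothetical; the real mapping lives in the metric definition in the UI and is not read by this code.

```python
def unmapped_states(snapshot, enum_maps):
    """Return {metric: [string values without a mapping]}.

    snapshot: parsed script output (any of the JSON formats)
    enum_maps: metric name -> {state string: number}, copied by hand
               from the metric's State Descriptions (assumption)
    """
    missing = {}
    for metric, value in snapshot.items():
        if metric == "scriptExceptions":
            continue
        if isinstance(value, dict):
            values = value.get("components", {}).values()
        else:
            values = [value]
        known = enum_maps.get(metric, {})
        # Note: numeric strings such as "4" are also reported, since they are strings.
        states = sorted({v for v in values if isinstance(v, str) and v not in known})
        if states:
            missing[metric] = states
    return missing


if __name__ == "__main__":
    print(unmapped_states({"system_windows_winrm_status": "running"}, {}))
```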

Use case 3

User is observing gaps in metric graphs.

This issue might be due to the following reasons:

  • Agent going offline at that time
  • Device itself being offline
  • No data is available for the metric on the device at that time.

Check the debug level logs to cross verify if Agent/Device was offline at that time.

If you do not find any logs related to those, then raise a case with your findings, while attaching logs, to analyze command/script behavior at those specific times when graph is not populating.

Use case 4

Alerts are not getting generated on the resource for a particular metric.

Check latest snapshot data to see if any data is being retrieved from the device for that metric and also verify the thresholds defined for the metric.

If the latest snapshot data is also not being received for that metric, then execute the command or script manually on the device, to see if any data exists for that metric.

Then raise a case with your findings, if the issue is still unclear.

Use case 5

Alerts generated do not align with the defined alert thresholds.

Refer to the alert threshold hierarchy outlined below.

Alert threshold precedence order (highest first): Device-component-level thresholds > Device-level thresholds > Template-level thresholds.

  1. Template-Level Thresholds: These are the thresholds set in the template.
  2. Device Level Thresholds: These are the thresholds set in the device page for a template. These thresholds override the template-level thresholds.
  3. Device Component Level Thresholds: These are the thresholds set in the Monitors tab for a template for each component of the metric. These thresholds have the highest precedence, overriding both device and template-level thresholds.

In summary, device component-level thresholds override device-level thresholds, which in turn override template-level thresholds.
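The override order can be expressed as a small lookup. This Python sketch is only a model of the rule stated above, useful when reasoning about which threshold an alert was evaluated against; the function name effective_threshold and the use of None for "not set" are assumptions.

```python
def effective_threshold(template, device=None, component=None):
    """Return (level, value) for the most specific threshold that is set.

    Order follows the text above: component level first, then device
    level, then template level. None means "not set at this level".
    """
    levels = (("component", component), ("device", device), ("template", template))
    for level, value in levels:
        if value is not None:
            return level, value
    return None, None  # no threshold defined anywhere


if __name__ == "__main__":
    # Template says 80, the device page overrides it with 90.
    print(effective_threshold(80, device=90))
```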

This hierarchy ensures that monitoring configurations can be fine-tuned at various levels of the system, allowing for granular control over alerting parameters. This approach enables more precise and effective management of alerts, tailored to the specific needs of each level.

After checking the above things, raise a case with your findings, if the issue still persists.

Use case 6

User has made changes to metrics and monitor but still cannot see the latest changes reflected in template data.

After making changes to a metric or monitor, follow the steps below to see the latest changes reflected in template data:

Metric-level changes: This applies only to customer-created metrics, because users do not have an Edit option for global metrics.

  • Un-assign the template from the devices it is applied to.
  • Remove the monitor from the template and add any other monitor.
  • Navigate to the monitor and to the metric section. Click on the Actions option of the metric for which the changes were made. You will see a delete option, as shown below:
  • Click on delete and save the monitor. Reopen the monitor, add the updated metric, and save the monitor.
  • Finally, assign this monitor back to the template.
    Metrics

Monitor-level changes: This applies when the user creates their own monitor or creates a copy of the global monitor.

  • If changes are made to the monitor’s script or any configuration parameter, reassign the monitor to the template to see the changes reflected.

After checking the above, raise a case with your findings if the issue still persists.

Use case 7

User has removed any metric(s) from the monitor.

If a customer clones an existing monitor and decides not to monitor certain metrics listed under that monitor, advise them not to remove these metrics directly from the monitor. This can lead to errors due to missing metric mappings at the monitor level. Instead, recommend removing unwanted metrics during the template creation process.

Here is a sample alert screenshot for reference:

Sample Alert

Gateway-based Custom Monitors development

The gateway has the following two adapter frameworks for developing custom monitors:

  • Remote Shell - With this adapter framework, you need to build a separate script for each metric. NOT RECOMMENDED, as it is less efficient: if the metric count is high, it consumes more system resources (such as processes and CPU) on the end device.
  • Remote Script Executor (RSE) - With this adapter framework, you can build a single script for one or more metrics. RECOMMENDED.

How to develop a RSE script:

Follow the same step-by-step script development process explained in the agent generation 2 section above. The only prerequisite is that SSH credentials must be assigned to the target device in the OpsRamp platform, so that the gateway can establish an SSH connection to the target device, execute the RSE script on it, and return the JSON output to the gateway.

See here for Gateway RSE Supported scripting languages:

NOTE: You will see Bash / PowerShell / Python at monitor creation page, but for other languages, you have to select Custom script type.

How to invoke a script on a remote machine?

You can also invoke a script that resides on the remote machine. To do so, specify the absolute file path in the script block on the monitor creation page.

Note that this has been tested only for shell scripts.

Create a Metric

Follow the step-by-step metric creation process explained in the agent generation 2 section above.

Create a Monitor

Follow the step-by-step monitor creation process explained in the agent generation 2 section above.

Create Template

Follow the step-by-step template creation process explained in the agent generation 2 section above. The only change is to select Gateway as the collector type instead of Agent.

Assign Template

Follow the step-by-step template assignment process explained in the agent generation 2 section above. The only change is to select Gateway as the collector type instead of Agent.

Gateway G2 Troubleshooting Steps

Remote Script Executor Troubleshooting steps

Pre-Requisites:

  • Gateway version 7.0.0 or later is required.
  • SSH credentials should be assigned on the end device itself, not on the gateway.

Checking Monitoring Config:

To check the updated monitoring configuration pushed to the gateway, use the following command. It must be executed at the gcli prompt only.

Syntax: flag add mon.conf.json on <Log enabled for number of minutes>

Example command: flag add mon.conf.json on 30

After you enable the above flag, the monitoring configuration file is created on the gateway at the path /var/log/app/tmp/.

File format: monconf-<timeStamp>.json

Example filename: monconf-1711003657507.json
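When several configuration files accumulate, it helps to pick the most recent one. The Python sketch below parses the timestamp out of the file name. It assumes, without confirmation from the product documentation, that the numeric part is a Unix epoch time in milliseconds; the function names are hypothetical.

```python
import re
from datetime import datetime, timezone

# Matches the documented pattern monconf-<timeStamp>.json
PATTERN = re.compile(r"^monconf-(\d+)\.json$")


def monconf_time(filename):
    """Return the file's timestamp as a UTC datetime, or None if no match.

    Assumption: <timeStamp> is Unix epoch time in milliseconds.
    """
    match = PATTERN.match(filename)
    if match is None:
        return None
    return datetime.fromtimestamp(int(match.group(1)) / 1000, tz=timezone.utc)


def latest_monconf(filenames):
    """Return the name with the newest timestamp, or None if none match."""
    dated = [(monconf_time(name), name) for name in filenames]
    dated = [(stamp, name) for stamp, name in dated if stamp is not None]
    return max(dated)[1] if dated else None


if __name__ == "__main__":
    print(monconf_time("monconf-1711003657507.json"))
```

On the gateway, the file names could be obtained with os.listdir("/var/log/app/tmp/") and passed to latest_monconf.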

Windows & Linux RSE

Use Case 1

Unable to fetch latest metrics data

When you apply Gateway-based RSE template on a Windows device and encounter the message “Failed to get latest metrics/Gateway is offline”, while fetching latest metrics data, follow the below steps:

Step 1:

Navigate to the Overview page of the Device (Infrastructure > Resources > Search using IP or Device Name ) and on that device ensure that the device is managed by Gateway and it is online (indicated by blue color as shown below).

Gateway

Step 2:

Identify whether a template is global or customer written.

To determine if a template is global or custom-written, refer to FAQ #3.

Step 3:

Alert types:

Review any alerts associated with the template on the Overview page of the device, or navigate to the Command Center > Alerts page and filter by the specific server name or IP address. Common alert subjects to look out for are as follows:

i. There are two cases, as mentioned below:

Case I: InvalidJsonException: Validate your Script Output. (Metric Name is rse.invalid.json.error)

Case II: ScriptExecutionFailureException: Failed to execute the Script. Validate your Script. (Metric Name is rse.script.error or rse.script.timeout.error)

When you encounter such errors within global templates, follow the steps below:

  • Ensure that input parameters are provided to the template in the Input Parameters section according to the Template Usage Guidelines or the Template description, if any. See here and search with template or metric name.
  • If custom script Type is selected, make sure that the script execution path is valid.
  • Ensure that the server meets the pre-requisites mentioned in the Template’s pre-requisites section, if any.
  • If the input parameters are provided as per the Template Usage Guidelines and the pre-requisites are met, raise a case after manually executing the script and retrieving both the script output and the debug-level logs, as mentioned in step #4.

When you encounter such errors in customer-written scripts, follow the steps below:

  • Advise customers to ensure that the final script output adheres to one of the JSON output formats mentioned in the link.
  • In script failure exception cases, look for the error message given in the alert description and rectify the code accordingly.
  • If custom script Type is selected, make sure that the script execution path is valid.

ii. No Credentials found against the Device of Type: SSH (Metric Name is rse.no.credentials.error)

This alert indicates that credentials are not attached to the device. To resolve this issue:

  1. Attach credentials of type SSH if the remote operating system is Linux or Unix.
  2. Attach credentials of type Windows if the remote operating system is Windows.

iii. Macro-related errors (Metric Name is rse.unresolved.macro.error)

This alert indicates that the macros used in the script are not resolved. When you observe such alerts, check for the following causes:

  1. The credentials used in the macros are not available on the device.
  2. The customAttributes used in the macros are not available on the device.

iv. Device connection errors ( Metric Name is rse.device.connection.errors )

To resolve this issue, check the following when you observe such alerts:

  1. The device should be reachable from the Gateway.
  2. The port should be accepting connections.
  3. The credentials attached to this device should be valid.
  4. If the credentials are key-based, verify your private key.
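The first two checks (reachability and an open port) can be tested from any machine with Python using only the standard library. This is a generic TCP check, not an OpsRamp tool: the function name is hypothetical, and port 22 in the usage line is the conventional SSH port, assumed here; use the port configured in the credential set.

```python
import socket


def port_accepts_connections(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python check_port.py <device-ip> [port]
    target = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    target_port = int(sys.argv[2]) if len(sys.argv) > 2 else 22
    print(port_accepts_connections(target, target_port))
```

A True result only shows that the port is open from where the check runs; it does not validate the credentials or the private key.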

v. Authentication errors (Metric Name is rse.authentication.error)

To resolve this issue, check the following when you observe such alerts:

If OSType is Windows:

  1. Credentials attached to this device should be valid.
  2. The WinRM service should be enabled on the Gateway or remote device.
  3. The WinRM service should be allowed through the firewall on the Gateway or remote device.

If OSType is not Windows, one of the following may be the reason for this alert:

  1. Credentials attached to this device are not valid.
  2. If the credentials are key-based, verify your private key.

vi. Handling other alert subjects returned from scriptExceptions (Metric Name is rse_metric_collection_failures)

Alerts with subjects like “An exception has occurred. Unable to fetch the monitoring data” are typically encountered in global templates and are raised by the exception handling mechanisms within the scripts.

If you encounter alerts with descriptions such as “Empty/invalid input parameters”, “Unable to load PostgreSQL environment”, etc., follow these steps:

  • Check if input parameters are provided to the template in the input parameters section according to the Template Usage Guidelines or Template description.

    For reference, see the sample guidelines provided here

  • Confirm that the server meets any pre-requisites mentioned in the template’s Pre-Requisites section, if any.

Step 4:

Gather debug-level logs as described below:

For Windows:

  1. Enable debug mode for logs:

    • Access the Gateway Command Line Interface (gcli) by running: telnet localhost 11445

    • Use the following commands to enable the required flags:

      Syntax: flag add rse.log on <Log enabled for number of minutes>

      flag add rse.log on 30
      flag add rse.script.log on 30

  2. Retrieve logs:

    Check the latest logs in C:\Program Files\OpsRamp\Gateway\log\vprobe.log

  3. Share the log folder (C:\Program Files\OpsRamp\Gateway\log) in ZIP format with the respective team for further analysis.

For Linux:

Enable Debug Mode for Logs:

  • Enter gcli mode by running command: gcli

  • Execute the following commands to enable flags:

    Syntax: flag add rse.log on <Log enabled for number of minutes>

    flag add rse.log on 30
    flag add rse.script.log on 30

Retrieve Logs:

  • Exit gcli and run the following command to observe logs:

    sudo tail -100f /var/log/app/vprobe.log

After following the above steps for the different errors, if the issue still persists, raise a case with your findings and share the log folder (/var/log/app) in ZIP format with the respective team for further analysis.

Use case 2

Graph data is not populating for specific or all metrics.

See Graph data Use case

Use case 3

User is observing gaps in metric graphs.

See Gaps in graphs

Use case 4

Alerts are not getting generated on the resource for a particular metric.

See Alerts not generated

Use case 5

Alerts generated do not align with the defined alert thresholds.

See Alerts not aligned with alert thresholds

Use case 6

User has made changes to metrics and monitor, but still cannot see the latest changes reflected in template data.

See Metric or Monitor Changes

RSE Limitations and Challenges

Challenges across all script types:

  • Third-party utilities cannot be used without additional installation on the end device.

Challenges specific to script type for Gateway Collector Type:

  • Bash – No support for arrays. The RSE framework treats array syntax as one of its predefined macros.

G2 RSE FAQs (Agent & Gateway)

  1. Where should I check for OpsRamp supported metrics?

    You should check the Recommended Templates page within the public documentation.

    You can search for the operating system name (Windows / Linux / AIX), application name (Active Directory / Exchange / IIS, etc.), or database name (MSSQL / Oracle, etc.) to determine if monitoring support already exists.

    Refer: https://docs.opsramp.com/support/reference/recommended-templates/

    Or search for specific metric name in the below page:

    https://docs.opsramp.com/support/reference/agent-templates/g2-agent-template-details/

    If the required monitoring support is not found on these pages and the request is generic (applicable beyond the customer’s specific needs), submit a case to the support team as a Request for Enhancement (RFE).

    However, if the monitoring requirement is specific to the customer’s needs, develop your own script by following the Remote Script Executor (RSE) public documentation at the links below:

    Agent RSE: https://docs.opsramp.com/solutions/monitors/custom-monitors/setting-up-agent-based-g2-custom-monitors/

    Agentless RSE: https://docs.opsramp.com/solutions/monitors/agentless-monitors/remote-script-executor

  2. How do you identify the suitable JSON format (out of the 6 formats described in the public documentation) for your requirements when you write your own script?

    Execute the commands manually on the device, then adjust the output to match one of the supported JSON formats described in the documentation.

    Agent-based RSE: https://docs.opsramp.com/solutions/monitors/custom-monitors/setting-up-agent-based-g2-custom-monitors/#standard-json-output-format

    Agentless RSE: https://docs.opsramp.com/solutions/monitors/agentless-monitors/remote-script-executor/#standard-json-output-formats-remote-script-executor-script

    The JSON formats are the same regardless of collector type (Agent or Gateway).
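
    As a rough illustration of the "run manually, then reshape" workflow, the Python sketch below parses hypothetical command output and prints it as a flat metric-to-value JSON object. The metric names, the sample output, and the flat `{"metric": value}` shape are assumptions made for illustration; confirm the exact structure against the standard JSON output formats linked above.

```python
import json


def parse_key_value_output(raw_output):
    """Turn lines like 'name=value' into a {metric: number} dictionary."""
    metrics = {}
    for line in raw_output.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or unrecognized lines
        name, _, value = line.partition("=")
        try:
            metrics[name.strip()] = float(value.strip())
        except ValueError:
            continue  # non-numeric values are left out of this sketch
    return metrics


# Hypothetical output captured by running a command manually on the device.
SAMPLE_OUTPUT = """
custom.cpu.usage=42.5
custom.memory.usage=63
"""

if __name__ == "__main__":
    # Print a single JSON document on stdout.
    print(json.dumps(parse_key_value_output(SAMPLE_OUTPUT)))
```

    Keeping the parsing in a small function makes it easy to check the reshaped output against the documented format before the script is attached to a monitor.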

  3. How do you capture additional logs and generate alerts for script failure reasons in customer-written scripts?

    Use the scriptExceptions option of RSE.

    See: https://docs.opsramp.com/solutions/monitors/agentless-monitors/remote-script-executor/#exception-handling-in-rse
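
    A minimal Python sketch of this idea follows: the script catches a failure and prints a scriptExceptions object instead of metric values. Only the name scriptExceptions comes from this page. The keys inside it (subject, description, raiseAlert, logRequired), the metric name, and the stand-in command are assumptions, so take the exact payload from the exception-handling section linked above.

```python
import json
import subprocess
import sys


def build_exception_payload(subject, description):
    """Wrap a failure reason in a scriptExceptions object (key names assumed)."""
    return {
        "scriptExceptions": {
            "subject": subject,
            "description": description,
            "raiseAlert": True,   # assumed flag: ask the platform to raise an alert
            "logRequired": True,  # assumed flag: ask the platform to log the failure
        }
    }


def collect_or_report(command):
    """Run a command and return metrics, or a scriptExceptions payload on failure."""
    try:
        completed = subprocess.run(
            command, capture_output=True, text=True, check=True, timeout=30
        )
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError) as exc:
        return build_exception_payload("Custom script failed", str(exc))
    # Hypothetical metric: number of lines the command printed.
    return {"custom.output.line.count": len(completed.stdout.splitlines())}


if __name__ == "__main__":
    # A stand-in command; replace it with the command your monitor needs.
    result = collect_or_report([sys.executable, "-c", "print('a'); print('b')"])
    print(json.dumps(result))
```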

  4. Where can you find guidance for developing your own G2 monitoring in RSE?

    Refer to the Remote Script Executor (RSE) guide available in the public documentation.

    This guide provides comprehensive instructions on how to develop custom metrics.

    For Agentless RSE Monitoring: https://docs.opsramp.com/solutions/monitors/agentless-monitors/remote-script-executor/

    For Agent RSE Monitoring: https://docs.opsramp.com/solutions/monitors/custom-monitors/setting-up-agent-based-g2-custom-monitors/

  5. The latest metric snapshot data is not available from the template.

    For Agent templates: Refer to Use case 1 in the Agent G2 RSE Troubleshooting steps.

    For Gateway templates: Refer to Use case 1 in the Gateway G2 RSE Troubleshooting steps.

  6. How do you plot a graph for string values such as health or status metrics?

    To plot a graph for state or status metrics that are returned as strings, use the Enumerated Map option in RSE. Define it at the metric level by selecting “Enumerated Map” as the Datapoint Value Conversion.

    See: https://docs.opsramp.com/solutions/monitors/agentless-monitors/remote-script-executor/#enumerated-mapping-in-rse
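
    Conceptually, an enumerated map translates each state string into a number that can be plotted. The Python sketch below only illustrates that idea; in RSE the mapping is configured on the metric rather than coded in the script, and the state names and numeric codes shown here are hypothetical.

```python
# Hypothetical state-to-number mapping, for illustration only.
STATE_MAP = {"ok": 0, "warning": 1, "critical": 2}
UNKNOWN_STATE = -1  # assumed fallback for states missing from the map


def to_datapoint(state):
    """Return the plottable number for a state string, ignoring case and spaces."""
    return STATE_MAP.get(state.strip().lower(), UNKNOWN_STATE)


if __name__ == "__main__":
    for sample in ("OK", "warning", "degraded"):
        print(sample, "->", to_datapoint(sample))
```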

  7. A macro error related to script arguments is observed in alerts.

    Check whether the macros are used correctly in the script at the monitor level.

    See: https://docs.opsramp.com/solutions/monitors/agentless-monitors/remote-script-executor/#macros-in-remote-script-executor

    For Gateway templates, enable the following flag in GCLI. The logs then show the script after macro replacement, so you can verify whether the macro values were substituted correctly.

    flag add rse.script.log on 1440

    For Agent-based templates: Refer to Alert Types in the Agent G2 RSE Troubleshooting steps.

    For Gateway-based templates: Refer to Alert Types in the Gateway G2 RSE Troubleshooting steps.

  8. A macro error related to credentials is observed in error logs.

    Check whether the credential-related macros are used correctly in the script at the monitor level.

    See: https://docs.opsramp.com/solutions/monitors/agentless-monitors/remote-script-executor/#macros-in-remote-script-executor

    For Agent-based templates: Refer to Alert Types in the Agent G2 RSE Troubleshooting steps.

    For Gateway-based templates: Refer to Alert Types in the Gateway G2 RSE Troubleshooting steps.

  9. You want to exclude some components of a metric from monitoring. How can you achieve this?

    Use the Component Filters option of RSE.

    With component filters, you can monitor only specific components or exclude unwanted components from monitoring.

    For details on using RSE Component Filters, see: https://docs.opsramp.com/solutions/monitoring/template/component-filters/

  10. For Agentless monitoring, should SSH credentials and template be assigned on the target device or on the Gateway device?

    In Agentless (Gateway) monitoring, it is essential to assign the target device’s SSH login credentials and templates directly on the target device, not on the Gateway device.

  11. What should you check if the alert description shows command execution errors on the device, such as a permissions error?

    The Agent or Gateway must have sufficient permissions to execute the commands used in the script on the device.

  12. How can you find the queries or commands used in an RSE template?

    Create a copy of the global monitor and review the queries or commands it uses. Delete the copied monitor when it is no longer needed.

    If the command or query is still not clear, raise a JIRA with the respective team or template owner.

    For query-based DB templates, the query can be seen on the metrics page for the respective metric.

  13. You want to customize the alert subject and description of a global metric. How can you achieve this?

    Create a copy of the global monitor: https://docs.opsramp.com/solutions/monitoring/monitor/copy-monitor/. Then follow the steps below:

    To modify the alert subject of a metric, use the ‘Edit’ option in the Actions tab of the monitor on the Setup page for the relevant metric, as shown in the screenshot below:

    Agent based monitor - metrics

    After clicking “Edit,” you can enter a custom message in the Subject field for the alert subject and in the Description field for the alert description.

    Agent based monitor - subject and description

    After making these changes, add the copied monitor to the template for the changes to take effect.

    Note: In the same edit option, you can also update the units of the metric, if needed.

  14. What is the maximum number of characters that you can pass to a template as input parameters?

    You can pass up to 5000 characters as input parameters for a template.

  15. Can a user assign version 1 and a later version (2, 3, 4, etc.) of a template on the same device?

    No. In nearly all cases, the metrics present in version 1 (v1) are also included in version 2 (v2) and later versions. A later version typically adds metrics, improves existing metrics or approaches, and fixes bugs. Therefore, we recommend always using the latest version of the template so that users benefit from these improvements and new features.

  16. What can be returned as Metric / Metric’s Component values?

    Metric / Metric’s Component values should be limited to either a string representing a state/status or an integer, which can be used for configuring alerts relevant to monitoring.

    Other values, such as IP addresses, time formats, or strings that represent names (other than state/status), should be included in alertTokens as additional information related to the metric/metric’s component. This extra information will only be displayed when an alert is triggered for the corresponding metric/metric’s component.

    For detailed guidance on using alert tokens in RSE, see the following link:

    https://docs.opsramp.com/solutions/monitors/agentless-monitors/remote-script-executor/#generate-alert-tokens-in-rse
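
    The split described above can be sketched in Python as follows: alertable values (a state string or an integer) go under the metric's components, and contextual details such as IP addresses or timestamps go under alertTokens. The name alertTokens comes from this page. The "components" key, the nesting, the metric name, and the token names are assumptions for illustration, so verify the exact layout in the alert tokens section linked above.

```python
import json


def build_metric_entry(components, alert_tokens):
    """Keep alertable values under components and context under alertTokens."""
    for name, value in components.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, str)):
            raise TypeError(f"{name}: expected an integer or a state string")
    return {
        "components": dict(components),
        "alertTokens": {key: str(val) for key, val in alert_tokens.items()},
    }


if __name__ == "__main__":
    entry = build_metric_entry(
        components={"eth0": "up", "eth1": "down"},
        alert_tokens={"eth0.ip": "192.0.2.10", "last.checked": "2024-01-01T00:00:00Z"},
    )
    # Hypothetical metric name wrapping the entry.
    print(json.dumps({"custom.interface.status": entry}))
```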




Next Steps