8. Monitoring

Fig. 8.1 Monitoring Overview

RED exposes monitoring data through four dedicated interfaces.

  • REST API
    Provides a dedicated set of APIs for monitoring. Including RED and Exasol Database status data.

  • SQL API
    Allows access to all control and status views within the database.

  • syslog
    Expose the Exasol syslog information. Refer to the Exasol homepage for more information on monitoring.

  • CLI
    Redinfo is a commandline tool on the RED Batch enviroment providing status information.

RED provides all control and status information in views and tables in the database. Even the status of the RED-PSS can be monitored in this way using SQL-queries only. Therefore, it is not strictly necessary to monitor individual processes on OS-level. Detailed documentation of individual columns in the referenced tables and views can be found directly in the database itself as column comments.

8.1. RED REST Monitoring Endpoints

The APIs are accessable via swagger (Open API) and can be found under //http[s]:<red-server>:<rest-port>/docs.

8.1.1. Authentication

Authentication is done via Access Token provided in each request. The User must be provisioned in the RED Framework first see REST API.

8.1.2. Monitoring Endpointlist

Endpoint

Description

/mon/adm_jmon_current_load

Returns current number of active processes in the default queue (job and watcher processes with short runtime), the loader queue (long running processes that execute the load) and the action queue. As well the configured queue size and the queue utilizations.

/mon/adm_jmon_job_status

Job Name, Job ID, Execution Time, Next execution Time of the latest job executions that were started by job monitor (jmon) process within all schemas.

/mon/all_adm_job_status

Job execution status information. Error and alert levels depend on execution return codes and statistical calculations of run time deviations. See comment on column ALERT for more information. Alert level. 0 = no alert, 1 = warning, 2 = error. 1 means either execution has been delayed between 30 and 60 seconds after next scheduled start time or run time took 3-4 standard deviations longer than average run time. 2 means either an error occurred or execution has been delayed for more than 60 seconds after next scheduled start or run time took more than 4 standard deviations longer than average run time.

/mon/all_adm_watcher_exec_status

Error log of watcher processes within all schemas including alert evaluation for execution retries.

/mon/all_adm_watcher_status

Alert evaluation of watcher processes within all schemas (REDADM, CFIADM, DBIADM, ZDWADM) for the last processing.

/mon/all_db_size_last_day

Returns raw size, memory size, object count, storage use every 30 minutes.

/mon/all_monitor_last_day

Returns cpu utilization, hdd read ratios, net traffic ratios, swap ratio, … information every 30 seconds.

/mon/all_usage_last_day

Returns number of login users and queries every second

/mon/app_status

Applications in alert state. Returns Schema, Table name, min and max bdate, effected number of days and alert code and alert reason.

/mon/check_python_vm

Python VM Availability and Latency Check

/mon/db_backup_list

Returns a list of useable backups.

/mon/jmon_status

Log data of the latest job executions that were started by job monitor (jmon) process within all schemas.

/mon/object_status

Objects in alert state. Returns Schema, Table name, min and max bdate, effected number of days and alert code and alert reason.

/mon/exa_dba_transaction_conflicts_last_day

Returns for each minute the number of transaction conflicts and max transactionconflict duration in seconds

/mon/exa_loadavg

Returns information on the current CPU load in each of the EXASolution nodes.

/mon/exa_statistics_object_sizes

Returns the sizes of all statistical system tables aggregated by the type of the statistical info.

/mon/exa_volume_usage

Returns details of the database usage of the storage volumes.

8.2. SQL API/Monitoring Views

Database objects whose name begins with “STA_” contain condensed overview information about the status and configuration of the system. Some objects hold ALERT and ALERT_REASON columns that indicate whether there is an immediate need for action. The information is prepared for usage in external monitoring tools or direct user queries. The alert levels and their meaning are:

  • 0…OK

  • 1…Warning

  • 2…Error

Furthermore the ALERT_REASON column might show a textual description, if a warning or an error occured.

Views starting with “STA_” are always granted to public, while all other views and tables are only granted to the application manager.

8.2.1. Object Status (REDADM)

see Object Status

8.2.2. TSA-Loader Status (REDADM)

The related views in REDADM show status information of all TSA-connectors in the system, i.e. DBI, CFI and custom connectors.

  • STA_TSA_QUEUED_LOADER: Shows the status of all TSA-loaders which are currently queued for execution (status: NOT SCHEDULED).

  • STA_TSA_ACTIVE_LOADER: Shows the status of all TSA-loaders which are currently in status SCHEDULED, RUNNING or FAILED. The column RUNTIME shows the actual runtime in seconds while RUNTIME_AVG shows the average runtime of this loader and RUNTIME_STDDEV shows the corresponding standard deviation of the sample.

    The ALERT column switches to 1 if:

    • the loader has been more than 30s in status SCHEDULED

    • the RUNTIME > RUNTIME_AVG + 3 x RUNTIME_STDDEV (3-sigma rule)

    The ALERT column switches to 2 if:

    • the loader has been more than 60s in status SCHEDULED

    • the RUNTIME > RUNTIME_AVG + 4 x RUNTIME_STDDEV (4-sigma rule)

  • STA_TSA_LOADER_COUNT: Gives an overview (total count) of all not scheduled, scheduled, running and failed TSA loaders in the framework.

8.2.3. PSA-Loader Status (REDADM)

The related views in REDADM show status information of currently active PSA-loader processes.

  • STA_PSA_QUEUED_LOADER: Shows the status of all PSA-loaders which are currently queued for execution (status: NOT SCHEDULED).

  • STA_PSA_ACTIVE_LOADER: Shows the status of all PSA-loaders which are currently in status SCHEDULED, RUNNING or FAILED. The column RUNTIME shows the actual runtime in seconds while RUNTIME_AVG shows the average runtime of this loader and RUNTIME_STDDEV shows the corresponding standard deviation of the sample.

    The ALERT column switches to 1 if:

    • the loader has been more than 30s in status SCHEDULED

    • the RUNTIME > RUNTIME_AVG + 3 x RUNTIME_STDDEV (3-sigma rule)

    The ALERT column switches to 2 if:

    • the loader has been more than 60s in status SCHEDULED

    • the RUNTIME > RUNTIME_AVG + 4 x RUNTIME_STDDEV (4-sigma rule)

  • STA_PSA_LOADER_COUNT: Gives an overview (total count) of all not scheduled, scheduled, running and failed PSA loaders in the framework.

  • STA_PSA_CONTROL_DETAIL: Shows the full history of each and every PSA loader process.

8.2.4. Application Status (REDADM)

  • STA_APP_OBJECT_STATUS: Shows the object status of all components of an application. The OBJECT_STATUS takes its values from STA_OBJECT_STATUS and LAST_CHANGE is taken from LAST_PSA_FINISH. Alert values are based on STA_OBJECT_STATUS as well. Note that this view ignores all loads which are not planned by a corresponding load cycle since they not relevant for an application status.

  • STA_APP_STATUS: Applications

  • RED_APP_ACT_STATUS: Shows the status application actions for each BDate (information about the last run only).

  • RED_APP_ACT_CONTROL_DETAIL: Shows detailed information of each action run (full history).

8.2.5. Connector Logs

For detailed log information regarding the data import from source to TSA check out the views CFIADM.CFI_IMPLOG_DETAILS and DBIADM.DBI_CDC_LOG_DETAILS, respectively. An aggregated status can be found in DBIADM.DBI_CDC_STATUS and CFIADM.CFI_IMP_STATUS, respectively.

8.3. REDINFO

redinfo is a commandline tool which gives a (constantly refreshing) overview of the RED framework including watchers, jobs and active loaders as well as of the general status of the DB. It may be used for interactive monitoring.