Instance

The instance dashboard provides information about your running system, including the DataForge server itself, the various microservices, and push servers.

The instance dashboard can be viewed under Instance > Instance. To navigate through the different categories, use the navigation bar at the top of the page:

instanceNavigation
Instance: Navigation bar

Server actions

The server actions can be accessed by clicking the corresponding button next to the dashboard navbar. This will open a modal with the following options:

serverActions
Instance: Server actions

  • Reload configs: Reload your DataForge configuration without having to restart the server.
  • Refresh caches: Refresh all your DataForge caches.
  • Stop server: Stops your DataForge server. If DataForge is deployed on a docker and restart: is set to always, the server will restart immediately after.

DataForge server

The DataForge server section provides:

  • Overview of gRPC, Zabbix gRPC and NSQ-RPC stats requests and stats
  • Successful authentication ratio
  • Reporting summary
  • Recent job queue size
  • Job queue size
  • Caches

Request overviews

There are four request overviews: gRPC, NSQ-RPC, Zabbix-RPC and database. They are always structured in the same way, they provide an overview of the latency of the requests and a success rate:

gRPCOverview
Instance: gRPC overview

The latency is further divided into four subsections:

  • 99th percentile: The 99th percentile indicates which of a series of values was higher, and therefore worse, than 99% of the other values.
  • First percentile: The first percentile indicates which of a series of values was smaller, and therefore better, than 99% of the other values.
  • Median: Indicates the latency median.
  • Average: Indicates the latency average.

Successful authentication overview

Describes the ratio of how many of the various login processes were successful:

successfulAuthRatio
Instance: Successful authentication ratio

Thread guard activations

The error stack of a crashed thread. This information will only be generated if the threadguardenabled is set to true.

threadguard1
Instance: Thread guard activations

threadguard2
Instance: Thread guard activations error stack

Recent job queue size

A graph showing how many jobs were queued at what time. The tabs can be used to select the jobs for which the graph is to be rendered:

recentJobQueueSize
Instance: Recent job queue size

Job queue

The job queue is a list of jobs that are still being executed. Each entry has an indicator of its place in the queue. If you click on an entry, you will be forwarded to the respective job details. The tabs can be used to change the type of jobs to be listed:

jobQueue
Instance: Job queue

Caches

Shows the number of reads/writes per cache:

cache
Instance: Cache

If you hover over the bar of a cache type, a more detailed subdivision is displayed:

cacheHover
Instance: Cache detailed information

RPC stats

Each RPC stats card has the same structure and exists for the types: gRPC, Zabbix-RPC and NSQ-RPC. In general, it is used to display more precise statistics per RPC method. You can switch between “average, median, 1st percentile and 99th percentile”, by clicking the corresponding tabs above the graph.

gRPCStats
Instance: gRPC stats

Information about the latency of the different methods can be read from a graph showing the different methods. You can hover over a bar to see details about the method’s latency and error rate:

statsLatency
Instance: RPC latency

More details per method are displayed in the list below the graph:

statsPerMethod
Instance: RPC stats per method

  • Method: The name of the method.
  • Total calls: The total amount the method was called.
  • Total errors: The total amount of errors while calling the method.
  • Error rate: The relative amount of total errors compared to the total amount of calls.
  • Average latency: The average latency of the method call.
  • Median latency: The median latency of the method call.
  • 1st percentile latency: The 1st percentile latency of the method call.
  • 99th percentile latency: The 99th percentile latency of the method call.

Microservices

Every microservice page has the same structure. The microservices for which there are details are as follows: Collector, Renderers, Deliverers, Preprocessors, AI trainer, AI tester, and push servers:

microservice
Instance: Microservice dataforge_renderer

General

Three information points can be found at the top of the card:

  • Status: Indicates if the service is online or offline.
  • Last heartbeat: The date and time of the last registration of an action by the service.
  • Name: The name of the service.
microserviceGeneral
Instance: Microservice dataforge_renderer general

Overview

microserviceOverview
Instance: Microservice dataforge_renderer overview

Provides information about the latency and success rate of the services’ calls. The latency is divided into four subcategories:

  • 99th percentile: The 99th percentile indicates which of a series of values was higher, and therefore worse, than 99% of the other values.
  • First percentile: The first percentile indicates which of a series of values was smaller, and therefore better, than 99% of the other values.
  • Median: Indicates the latency median.
  • Average: Indicates the latency average.

By hovering over a latency bar, you can see specific values about that category:

microserviceOverviewLatency
Instance: Microservice dataforge_renderer latency details

Details

microserviceDetails
Instance: Microservice dataforge_renderer details

Information about the latency of the different methods can be read from a graph showing the different methods. You can hover over a bar to see details about the method’s latency and error rate:

microserviceDetailsHover
Instance: Microservice dataforge_renderer details metrics

More details per method are displayed in the list below the graph:

microserviceDetailsStats
Instance: Microservice dataforge_renderer stats per method

  • Method: The name of the method.
  • Total calls: The total amount the method was called.
  • Total errors: The total amount of errors while calling the method.
  • Error rate: The relative amount of total errors compared to the total amount of calls.
  • Average latency: The average latency of the method call.
  • Median latency: The median latency of the method call.
  • 1st percentile latency: The 1st percentile latency of the method call.
  • 99th percentile latency: The 99th percentile latency of the method call.