Version: v2026.0.0

Monitoring and observability

ARender exposes metrics, health endpoints, and structured logs across all rendition services.

Actuator endpoints

All rendition services expose Spring Boot Actuator endpoints. The Document Service Broker exposes three endpoints by default:

management.endpoints.web.exposure.include=prometheus,metrics,health

The Document Converter, Document Renderer, and Document Text Handler expose an additional shutdown endpoint used by the broker's health check loop to restart failed services in standalone mode:

management.endpoints.web.exposure.include=prometheus,metrics,health,shutdown

These defaults and all metrics export settings (Prometheus, Elasticsearch, Datadog, CloudWatch) are documented in the Rendition properties — Shared metrics settings.

Available endpoints

| Endpoint | Path | Purpose |
| --- | --- | --- |
| Health | /actuator/health | Returns UP/DOWN status for the service |
| Metrics | /actuator/metrics | Lists all available metric names |
| Metrics (specific) | /actuator/metrics/{name} | Returns values for a specific metric |
| Prometheus | /actuator/prometheus | Prometheus scrape endpoint (disabled by default, see below) |
| Shutdown | /actuator/shutdown | Triggers graceful shutdown (converter, renderer, text handler only) |

Enabling Prometheus

The Prometheus endpoint is disabled by default. To enable it, set:

management.endpoint.prometheus.access=unrestricted

Once enabled, Prometheus can scrape each service at /actuator/prometheus.
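To sanity-check what the scrape endpoint serves, the Prometheus text exposition format is simple enough to parse by hand. A minimal sketch, where the sample payload and its series names are fabricated for illustration and are not actual ARender output:

```python
def parse_prometheus(text):
    """Parse simple sample lines of the Prometheus text format into a
    {series: value} dict, skipping # HELP / # TYPE comment lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # metadata comments and blank lines carry no samples
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples

# Fabricated example payload, shaped like a Micrometer scrape.
payload = """\
# HELP http_server_requests_seconds_count Total requests
# TYPE http_server_requests_seconds_count counter
http_server_requests_seconds_count{application="arender",uri="/actuator/health"} 42.0
http_server_requests_seconds_count{application="arender",uri="/actuator/metrics"} 7.0
"""

metrics = parse_prometheus(payload)
print(metrics['http_server_requests_seconds_count{application="arender",uri="/actuator/health"}'])  # 42.0
```

Real scrapes also carry optional timestamps and escaped label values, which this sketch ignores.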


Metrics configuration

ARender uses Micrometer for metrics collection. Each service tags its metrics with a host identifier and application name.

Per-service host tags

| Service | Host tag |
| --- | --- |
| Document Service Broker | arender-broker |
| Document Converter | arender-taskconversion |
| Document Renderer | arender-jni |
| Document Text Handler | arender-pdfbox |

All services share the same application tag:

management.metrics.tags.application=arender

HTTP request distribution

Every service configures percentile histograms and SLA buckets for HTTP request metrics:

management.metrics.distribution.percentiles-histogram.http.server.requests=true
management.metrics.distribution.sla.http.server.requests=100ms, 400ms, 500ms, 2000ms
management.metrics.distribution.percentiles.http.server.requests=0.5, 0.9, 0.95, 0.99
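With percentile histograms enabled, a backend estimates arbitrary quantiles from the cumulative bucket counts by linear interpolation; Prometheus' histogram_quantile() works this way. A self-contained sketch over the SLA boundaries configured above, with invented bucket counts:

```python
def estimate_quantile(q, buckets):
    """Estimate quantile q (0..1) from cumulative (upper_bound_seconds, count)
    buckets via linear interpolation, as histogram_quantile() does."""
    total = buckets[-1][1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if count == prev_count:
                return bound  # empty bucket: no interpolation possible
            # interpolate the rank's position within this bucket
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return buckets[-1][0]

# Cumulative counts at the SLA boundaries (0.1s, 0.4s, 0.5s, 2s) plus +Inf.
# The counts are made up for illustration.
buckets = [(0.1, 800), (0.4, 950), (0.5, 980), (2.0, 999), (float("inf"), 1000)]
print(round(estimate_quantile(0.95, buckets), 3))  # 0.4
```

This is why bucket boundaries should bracket the latencies you care about: the estimate is only as precise as the nearest boundaries.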

Disabled default meters

By default, ARender disables several standard Micrometer meters to reduce noise. These are disabled across all services:

management.metrics.enable.tomcat=false
management.metrics.enable.http=false
management.metrics.enable.logback=false
management.metrics.enable.jvm=false
management.metrics.enable.process=false
management.metrics.enable.system=false
management.metrics.enable.application=false
management.metrics.enable.executor=false
management.metrics.enable.disk=false

To re-enable any of these, set the corresponding property to true. For example, to monitor JVM memory:

management.metrics.enable.jvm=true

ARender endpoint metrics

The broker and all services support fine-grained per-endpoint metrics. Each endpoint metric is disabled by default and can be enabled individually:

# Enable metrics for document loading
arender.endpoint.metrics.export.load.document.enabled=true

# Enable metrics for image rendering
arender.endpoint.metrics.export.image.enabled=true

# Enable metrics for text search
arender.endpoint.metrics.export.search.enabled=true

# Enable metrics for document conversion
arender.endpoint.metrics.export.convert.enabled=true

# Enable metrics for document comparison
arender.endpoint.metrics.export.compare.enabled=true

The full list of toggleable endpoints:

| Property | Operation |
| --- | --- |
| arender.endpoint.metrics.export.has.document.enabled | Document existence check |
| arender.endpoint.metrics.export.bookmarks.enabled | Bookmark extraction |
| arender.endpoint.metrics.export.document.layout.enabled | Document layout retrieval |
| arender.endpoint.metrics.export.load.document.content.enabled | Document content loading |
| arender.endpoint.metrics.export.get.file.chunk.enabled | File chunk retrieval |
| arender.endpoint.metrics.export.text.position.enabled | Text position extraction |
| arender.endpoint.metrics.export.document.annotation.enabled | Annotation retrieval |
| arender.endpoint.metrics.export.transformation.enabled | Transformation orders |
| arender.endpoint.metrics.export.document.metadata.enabled | Metadata retrieval |
| arender.endpoint.metrics.export.image.enabled | Page image rendering |
| arender.endpoint.metrics.export.page.contents.enabled | Page content retrieval |
| arender.endpoint.metrics.export.search.enabled | Text search |
| arender.endpoint.metrics.export.advanced.search.enabled | Advanced text search |
| arender.endpoint.metrics.export.load.document.enabled | Document loading |
| arender.endpoint.metrics.export.evict.enabled | Cache eviction |
| arender.endpoint.metrics.export.annotation.enabled | Annotation operations |
| arender.endpoint.metrics.export.compare.enabled | Document comparison |
| arender.endpoint.metrics.export.named.destination | Named destination extraction |
| arender.endpoint.metrics.export.weather.enabled | System weather/status |
| arender.endpoint.metrics.export.readiness.enabled | Readiness check |
| arender.endpoint.metrics.export.signature.enabled | Signature verification |
| arender.endpoint.metrics.export.printable.pdf.enabled | Printable PDF generation |
| arender.endpoint.metrics.export.convert.enabled | Format conversion |
| arender.endpoint.metrics.export.health.record.enabled | Health record retrieval |
| arender.endpoint.metrics.export.document.xfa.check.enabled | XFA form detection |

Metric tags

Control which tags are included in exported metrics:

# Tags to include (comma-separated)
arender.endpoint.metrics.export.whitelist.tags=host,mimeType

# Include correlation ID as a tag (adds cardinality, use with caution)
arender.endpoint.metrics.export.correlation.id.tag.enabled=false

# Tags to exclude from system meters
arender.system.metrics.export.blacklist.tags=

Metric collection mode

Choose between timing requests (with duration percentiles) or counting them:

# TIMER records duration; COUNTER records invocation count only
arender.metric.meter.tool=COUNTER

External monitoring systems

ARender supports exporting metrics to four external systems. Each is disabled by default.

Prometheus

Enable the Prometheus scrape endpoint as described above, then configure your Prometheus server to scrape each service:

prometheus.yml
scrape_configs:
  - job_name: 'arender-broker'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['broker-host:8761']
  - job_name: 'arender-converter'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['converter-host:19999']
  - job_name: 'arender-renderer'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['renderer-host:9091']
  - job_name: 'arender-text-handler'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['text-handler-host:8899']
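Once the services are scraped, an alerting rule can be layered on top. A sketch of a Prometheus rule file, assuming Micrometer's default Prometheus naming for http.server.requests and the application tag configured earlier; the group name, alert name, and 5% threshold are examples, not ARender defaults:

```yaml
groups:
  - name: arender
    rules:
      - alert: ARenderHighErrorRate
        expr: |
          sum(rate(http_server_requests_seconds_count{application="arender",status=~"5.."}[5m]))
            / sum(rate(http_server_requests_seconds_count{application="arender"}[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
```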

Elasticsearch

management.elastic.metrics.export.enabled=true
management.elastic.metrics.export.step=5m
management.elastic.metrics.export.index=arender-micrometer-metrics
management.elastic.metrics.export.host=http://localhost:9200

Datadog

management.datadog.metrics.export.enabled=true
management.datadog.metrics.export.api-key=YOUR_KEY
management.datadog.metrics.export.step=5m
management.datadog.metrics.export.uri=https://app.datadoghq.com/

CloudWatch

management.cloudwatch.metrics.export.enabled=true
management.cloudwatch.metrics.export.namespace=brokerNameSpace
management.cloudwatch.metrics.export.step=5m
management.cloudwatch.metrics.export.batchSize=20
management.cloudwatch.metrics.export.region=eu-west-1

Each service uses its own namespace value. The defaults are brokerNameSpace, converterNameSpace, rendererNameSpace, and pdfboxNameSpace.


Health checks and probes

Kubernetes liveness and readiness probes

The Helm chart configures HTTP probes for all services:

| Service | Liveness path | Readiness path | Liveness delay | Readiness delay | Period |
| --- | --- | --- | --- | --- | --- |
| Document Service Broker | /swagger-ui/index.html | /health/readiness | 30s | 60s | 15s |
| Document Converter | /actuator/health | /health/readiness | 30s | 60s | 15s |
| Document Renderer | /actuator/health | /health/readiness | 30s | 60s | 15s |
| Document Text Handler | /actuator/health | /health/readiness | 30s | 60s | 15s |

These values are configurable per service in values.yaml:

values.yaml
rendition:
  broker:
    deployment:
      livenessProbe:
        initialDelaySeconds: 30
        periodSeconds: 15
        timeoutSeconds: 3
      readinessProbe:
        initialDelaySeconds: 60
        periodSeconds: 15
        timeoutSeconds: 1

Custom liveness and readiness paths can be set:

values.yaml
rendition:
  broker:
    deployment:
      livenessProbe:
        path: "/actuator/health"
      readinessProbe:
        path: "/health/readiness"

Internal health monitoring

The broker runs an internal health check for each registered microservice. This is separate from Kubernetes probes: the broker pings each service's health endpoint and tracks each one's status internally.

If a service is detected as DOWN and health.check.restart.enabled=true (standalone mode), the broker sends a POST to /actuator/shutdown and restarts the process.


Log configuration

ARender uses Logback for logging. There are two logging configurations: one for Docker images (jib-packaged) and one for local Spring Boot development.

Docker/Kubernetes log pattern

In the Helm chart, the logging ConfigMap generates a Logback configuration that routes log levels to stdout/stderr and optionally to rolling files:

values.yaml
rendition:
  logging:
    default:
      consoleOnly: false
      logLevels:
        business: info # com.arondor.arender classes
        technical: warn # Spring, Tomcat, Jetty classes
      display:
        date: true
        podName: true

Set consoleOnly: true to disable file appenders entirely and output everything to the console, which is the recommended approach for container environments where a log aggregator collects stdout/stderr.

Log files

When file logging is enabled, each service writes to /arender/logs/:

| Service | Log file | Additional log files |
| --- | --- | --- |
| Document Service Broker | arender-broker.log | arender-perf.log, arender-health.log |
| Document Converter | arender-converter.log | |
| Document Renderer | arender-renderer.log | |
| Document Text Handler | arender-handler.log | |

All log files use a FixedWindowRollingPolicy with a max size of 2 MB per file (Helm) or 50 MB (Docker image default), compressed with ZIP, up to 50 archived files.
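In Logback terms, the rolling behaviour described above corresponds to a FixedWindowRollingPolicy paired with a SizeBasedTriggeringPolicy. A sketch of what such an appender might look like; the appender name, file path, and pattern are assumptions, while the 2 MB size, ZIP compression, and 50-file window come from this page:

```xml
<appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
  <file>/arender/logs/arender-broker.log</file>
  <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
    <!-- the .zip suffix makes Logback compress rolled files -->
    <fileNamePattern>/arender/logs/arender-broker.%i.log.zip</fileNamePattern>
    <minIndex>1</minIndex>
    <maxIndex>50</maxIndex>
  </rollingPolicy>
  <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
    <maxFileSize>2MB</maxFileSize>
  </triggeringPolicy>
  <encoder>
    <pattern>%date %level [%thread] %logger{10} %msg%n</pattern>
  </encoder>
</appender>
```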

Performance log

The broker writes a dedicated performance log via the LoggerInterceptor class. This log tracks per-request timing:

logger name: com.arondor.viewer.common.logger.LoggerInterceptor

Health log

The broker's health check activity is logged separately:

logger name: com.arondor.arender.micro.services.rendition.jobs.MicroServiceHealthCheckJob

Persistent log storage (Kubernetes)

To persist logs across pod restarts, enable log persistence in the Helm chart:

values.yaml
rendition:
  logging:
    persistance:
      enabled: true
      storage:
        size: 1Gi
        accessModes: "ReadWriteMany"

Changing log levels at runtime

To change a service's log level without restarting, mount a custom logback.xml via the Helm chart:

values.yaml
rendition:
  broker:
    logging:
      useDefault: false
      custom: |
        <?xml version="1.0" encoding="UTF-8"?>
        <configuration>
          <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
            <encoder>
              <pattern>%date %level [%thread] %logger{10} [%file:%line] %msg%n</pattern>
            </encoder>
          </appender>
          <logger name="com.arondor.arender" level="DEBUG" />
          <root level="info">
            <appender-ref ref="STDOUT" />
          </root>
        </configuration>

Key metrics to monitor

These metrics are the most useful for tracking ARender health and performance in production.

System-level

| What to monitor | How |
| --- | --- |
| Service availability | Kubernetes liveness/readiness probes, or broker health check status |
| Pod restarts | kube_pod_container_status_restarts_total (from kube-state-metrics) |
| CPU and memory usage | Standard container metrics from cAdvisor or node-exporter |
| Shared volume usage | Disk usage on the /arender/tmp PVC |

Application-level

| What to monitor | Metric / indicator |
| --- | --- |
| HTTP request latency (p50, p95, p99) | http.server.requests percentile histogram |
| HTTP request error rate | http.server.requests filtered by status 5xx |
| Conversion duration | Enable arender.endpoint.metrics.export.convert.enabled=true |
| Rendering duration | Enable arender.endpoint.metrics.export.image.enabled=true |
| Search duration | Enable arender.endpoint.metrics.export.search.enabled=true |
| Document load count | Enable arender.endpoint.metrics.export.load.document.enabled=true |
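The latency row can be read straight from Prometheus once the percentile histogram is exported. A sketch of the query, assuming Micrometer's default metric naming and the application tag configured earlier; the uri grouping label is a common Micrometer default, not something this page documents:

```promql
# p95 request latency per URI over the last 5 minutes
histogram_quantile(0.95,
  sum(rate(http_server_requests_seconds_bucket{application="arender"}[5m])) by (le, uri))
```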

Common operational issues

Microservice not discovered by broker

Symptoms: Viewer shows an error when loading a document. Broker logs: "Found 0 instance of document-converter".

Causes:

  • The microservice container is not running or has not started yet.
  • In Docker Compose: the DSB_KUBEPROVIDER_KUBE.HOSTS_* environment variables do not match the service hostname.
  • In Kubernetes: the service DNS name in the broker ConfigMap does not resolve. Check that the target service's Kubernetes Service object exists in the correct namespace.

Diagnosis:

# Check broker logs for discovery messages
kubectl logs deployment/arender-rendition-broker | grep "Found 0 instance"

# Verify DNS resolution from the broker pod
kubectl exec deployment/arender-rendition-broker -- nslookup arender-rendition-converter

Shared volume file-not-found errors

Symptoms: Conversion succeeds but rendering fails. Broker logs: "FileNotFoundException" referencing /arender/tmp/....

Causes:

  • The shared PVC is not mounted in all pods.
  • Different PVCs are used across services (check claimName consistency).
  • The storage class does not actually support ReadWriteMany.

Diagnosis:

# Verify all pods mount the same PVC
kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.volumes[*].persistentVolumeClaim.claimName}{"\n"}{end}'

High conversion latency

Symptoms: Office or email documents take a long time to render the first page.

Causes:

  • LibreOffice is slow to start its first conversion. Subsequent conversions are faster.
  • The converter pod is CPU-constrained. Check resource limits.
  • Large documents or spreadsheets with many cells exceed practical limits. The excel.maximum.cell.count property (default 1,000,000) caps spreadsheet size.

Diagnosis:

  • Enable converter endpoint metrics: arender.endpoint.metrics.export.convert.enabled=true
  • Check converter logs for timeout messages.
  • Review pod CPU usage during conversion.

PDFOwl process pool exhaustion

Symptoms: Rendering requests time out. Renderer logs: watchdog timeout messages.

Causes:

  • All PDFOwl processes are busy with long-running renders.
  • A document with very high-resolution or many layers exhausts the memory limit.

Remediation:

  • Increase the memory limit: pdfowl.memlimit.mb=2048
  • Increase the watchdog timeout: pdfowl.client.watchdog=20000
  • Scale renderer replicas.