Directors: Troubleshooting

This guide provides solutions to common Director deployment and operational issues. Issues are organized by category with step-by-step resolution procedures.

Installation Issues

Director Service Fails to Start

Symptoms:

  • Service does not start after installation
  • Error messages during service startup
  • Director not appearing in management console

Resolution Steps:

  1. Check System Requirements

    • Verify supported operating system version
    • Ensure minimum hardware requirements are met
    • Confirm administrator/root privileges for installation
  2. Permission Validation

    • Ensure installation user has sufficient privileges
    • Check file and directory permissions in installation path
    • Verify service account permissions if using dedicated account
  3. Port Availability

    • Confirm required ports are not in use by other services
    • Check local firewall settings
    • Identify port conflicts using network tools
  4. Review Service Logs

    • Check installation script output for error messages
    • Review system event logs for service startup failures
    • Examine Director logs for specific error details

Diagnostic Commands:

# Check service status
Get-Service vmetric-director

# Check port usage
Get-NetTCPConnection -LocalPort 443 -ErrorAction SilentlyContinue
netstat -an | Select-String ":443"

# View service events
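# Filter first, then limit, so matches older than the newest 20 events are not missed
# (Get-EventLog requires Windows PowerShell 5.1; on PowerShell 7 use Get-WinEvent)
Get-EventLog -LogName System -Source "Service Control Manager" |
    Where-Object { $_.Message -like "*vmetric*" } |
    Select-Object -First 20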
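
The "Port Availability" step asks you to identify port conflicts. As a minimal sketch, the helper below attaches the owning process name to connection objects such as those returned by Get-NetTCPConnection. `Resolve-PortOwner` is a hypothetical name used for illustration and is not part of the Director tooling.

```powershell
# Sketch: attach the owning process name to TCP connection objects.
# Resolve-PortOwner is a hypothetical helper, not a VirtualMetric command.
function Resolve-PortOwner {
    [CmdletBinding()]
    param(
        # Any object with LocalPort and OwningProcess properties,
        # such as the output of Get-NetTCPConnection.
        [Parameter(Mandatory, ValueFromPipeline)]
        [psobject]$Connection
    )
    process {
        # The process may have exited since the connection was listed.
        $proc = Get-Process -Id $Connection.OwningProcess -ErrorAction SilentlyContinue
        [pscustomobject]@{
            LocalPort   = $Connection.LocalPort
            ProcessId   = $Connection.OwningProcess
            ProcessName = if ($proc) { $proc.ProcessName } else { '<exited>' }
        }
    }
}

# Example (Windows): show which process is listening on port 443.
# Get-NetTCPConnection -LocalPort 443 -State Listen | Resolve-PortOwner
```

If the listed process is not the Director, either stop that service or move one of the two to a different port.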

VMMQ Connection Not Initialized

Symptoms:

  • Log message: "vmmq connection is not initialized"
  • Director starts but cannot process data
  • Internal communication failures

Resolution Steps:

  1. Verify NATS JetStream Configuration

    • Check that NATS server is running and accessible
    • Verify JetStream is enabled in the configuration
    • Confirm storage paths exist and have correct permissions
  2. Check Storage Directory

    • Verify <vm_root>/Director/storage/nats/ directory exists
    • Ensure adequate disk space for JetStream storage
    • Check directory permissions
  3. Review Configuration File

    • Validate YAML syntax in vmetric.yml
    • Confirm VMMQ settings are correctly specified
    • Check for configuration file corruption
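
PowerShell has no built-in YAML parser, so a full syntax check of vmetric.yml needs an external tool. One common cause of YAML errors can still be caught with a short script: YAML does not allow tab characters for indentation. The sketch below reports such lines. `Find-YamlTabIndent` is a hypothetical helper name, and the check covers only this one class of error.

```powershell
# Sketch: report lines whose indentation contains a tab character.
# Find-YamlTabIndent is a hypothetical helper; it is not a full YAML validator.
function Find-YamlTabIndent {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyString()]
        [AllowEmptyCollection()]
        [string[]]$Line
    )
    for ($i = 0; $i -lt $Line.Count; $i++) {
        # Match a tab that follows zero or more leading spaces.
        if ($Line[$i] -match '^ *\t') {
            [pscustomobject]@{ LineNumber = $i + 1; Text = $Line[$i] }
        }
    }
}

# Example: run from the directory that contains the configuration file.
# Find-YamlTabIndent -Line (Get-Content 'vmetric.yml')
```

No output means no tab-indented lines were found. It does not mean the file is otherwise valid.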

Connection Issues

Director Cannot Connect to Cloud Platform

Symptoms:

  • Director status shows "Not Connected" in the web dashboard
  • Connection verification fails during setup
  • Timeout errors in Director logs

Resolution Steps:

  1. Identify Your Regional Endpoint

    Your Director connects to the same regional cloud platform where you created it. To identify your region, check the URL in your browser when logged into the VirtualMetric dashboard:

    Region       Endpoint
    Europe       https://app.eu-west.cloud.virtualmetric.com
    US           https://app.us-east.cloud.virtualmetric.com
    Australia    https://app.aus-east.cloud.virtualmetric.com

    Info: If you need to switch regions, log out and select a different region.

  2. Validate Network Connectivity

    • Ensure port 443 is open for outbound HTTPS connections from the Director host
    • Verify DNS resolution is working properly
    • Test connectivity to your regional endpoint using the diagnostic commands below
  3. Check Firewall Configuration

    • Allow outbound connections to *.cloud.virtualmetric.com domains
    • Ensure no SSL/TLS inspection is blocking certificate validation
    • Verify proxy configurations if applicable
  4. Validate API Key

    • Confirm the installation script was copied and executed correctly
    • Check for extra spaces or line breaks if you manually edited the script
    • Regenerate the API key from the Director's Connection tab in the dashboard if corruption is suspected
  5. Review System Time

    • Ensure system clock is synchronized
    • Use NTP to maintain accurate time synchronization

    Warning: TLS certificate validation requires accurate system time. A clock skew of more than a few minutes can cause silent connection failures.

Diagnostic Commands:

Use your regional endpoint (identified in step 1) in the following commands:

# Set your regional endpoint
$region = "eu-west" # Change to: us-east or aus-east

# Test DNS resolution
Resolve-DnsName "app.$region.cloud.virtualmetric.com"

# Check outbound connectivity
Test-NetConnection -ComputerName "app.$region.cloud.virtualmetric.com" -Port 443

# Verify TLS connectivity
Invoke-WebRequest -Uri "https://app.$region.cloud.virtualmetric.com" -Method Head -UseBasicParsing
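
Because clock skew can break TLS validation, it helps to quantify it. A rough estimate, assuming the endpoint returns a standard HTTP Date header, is to compare that header with the local clock. `Get-ClockSkewSeconds` is a hypothetical helper. If the request itself fails, compare against your NTP source instead.

```powershell
# Sketch: estimate clock skew from an HTTP Date header (RFC 1123 format).
# Get-ClockSkewSeconds is a hypothetical helper, not a VirtualMetric command.
function Get-ClockSkewSeconds {
    param(
        [Parameter(Mandatory)]
        [string]$DateHeader,
        # Defaults to the current time; can be overridden for testing.
        [DateTimeOffset]$LocalTime = [DateTimeOffset]::UtcNow
    )
    $remote = [DateTimeOffset]::ParseExact(
        $DateHeader, 'r', [System.Globalization.CultureInfo]::InvariantCulture)
    # Positive result: the local clock is ahead of the server.
    [math]::Round(($LocalTime.UtcDateTime - $remote.UtcDateTime).TotalSeconds)
}

# Example (assumes $region is set as in the commands above):
# $resp = Invoke-WebRequest -Uri "https://app.$region.cloud.virtualmetric.com" -Method Head -UseBasicParsing
# Get-ClockSkewSeconds -DateHeader ([string]($resp.Headers['Date'] | Select-Object -First 1))
```

Network latency adds a second or two of noise, so treat the result as an indication rather than a precise measurement.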

Connection Drops Intermittently

Symptoms:

  • Director alternates between Connected and Not Connected states
  • Periodic timeout errors in logs
  • Data processing interruptions

Resolution Steps:

  1. Network Stability Check

    • Monitor network latency and packet loss to your regional endpoint
    • Verify network equipment stability
    • Check for bandwidth limitations or throttling
  2. Resource Monitoring

    • Ensure adequate CPU and memory resources
    • Monitor disk I/O performance
    • Check for resource contention with other services
  3. Proxy Configuration

    • Verify proxy server stability and configuration
    • Check proxy authentication credentials
    • Consider bypassing proxy for VirtualMetric endpoints
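
Intermittent drops are easier to reason about with numbers. The sketch below summarizes a series of connectivity samples (true for success, false for failure) into a failure rate and a count of connected-to-disconnected transitions. `Measure-ConnectionSample` is a hypothetical helper, and the sampling interval in the usage comment is an arbitrary choice.

```powershell
# Sketch: summarize repeated connectivity checks.
# Measure-ConnectionSample is a hypothetical helper, not a VirtualMetric command.
function Measure-ConnectionSample {
    param(
        # Ordered results of connectivity checks: $true = reachable.
        [Parameter(Mandatory)]
        [bool[]]$Sample
    )
    $failed = @($Sample | Where-Object { -not $_ }).Count
    $drops = 0
    for ($i = 1; $i -lt $Sample.Count; $i++) {
        # A drop is a transition from reachable to unreachable.
        if ($Sample[$i - 1] -and -not $Sample[$i]) { $drops++ }
    }
    [pscustomobject]@{
        Total          = $Sample.Count
        Failed         = $failed
        FailurePercent = if ($Sample.Count) { [math]::Round(100 * $failed / $Sample.Count, 1) } else { 0 }
        Drops          = $drops
    }
}

# Example (Windows): sample the regional endpoint every 5 seconds for 5 minutes.
# $samples = 1..60 | ForEach-Object {
#     (Test-NetConnection -ComputerName "app.eu-west.cloud.virtualmetric.com" -Port 443 -WarningAction SilentlyContinue).TcpTestSucceeded
#     Start-Sleep -Seconds 5
# }
# Measure-ConnectionSample -Sample $samples
```

Many short drops tend to point at the network path or proxy, while a few long outages are more consistent with resource exhaustion or a service restart.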

Data Processing Issues

Director Not Receiving Data

Symptoms:

  • No data appearing in target destinations
  • Zero throughput metrics in monitoring
  • Source systems showing successful transmission

Resolution Steps:

  1. Source Configuration Verification

    • Confirm correct Director IP address in source system configuration
    • Verify port numbers match between source and Director
    • Check protocol settings (TCP/UDP for syslog, HTTP/HTTPS for APIs)
  2. Network Connectivity Testing

    • Test connection from source system to Director
    • Verify routing and firewall rules allow traffic
    • Use packet capture tools to confirm data transmission
  3. Director Input Configuration

    • Review device configuration in Director settings
    • Verify enabled protocols and listening ports
    • Check for configuration syntax errors
  4. Log Analysis

    • Examine Director logs for input processing errors
    • Check for data parsing or validation failures
    • Review error messages for specific issues

Diagnostic Commands:

# Test syslog connectivity (UDP)
$udpClient = New-Object System.Net.Sockets.UdpClient
$bytes = [Text.Encoding]::ASCII.GetBytes("<14>Test message from PowerShell")
$udpClient.Send($bytes, $bytes.Length, "<director_ip>", 514)
$udpClient.Close()

# Check listening ports
Get-NetTCPConnection -LocalPort 514 -State Listen -ErrorAction SilentlyContinue
Get-NetUDPEndpoint -LocalPort 514 -ErrorAction SilentlyContinue
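
The `<14>` prefix in the test message above is the syslog priority value, computed as facility × 8 + severity. Facility 1 (user-level) with severity 6 (informational) gives 14. The sketch below builds test messages for other combinations. Both function names are hypothetical helpers.

```powershell
# Sketch: compute the syslog PRI value and build a test message.
# Both functions are hypothetical helpers, not VirtualMetric commands.
function Get-SyslogPriority {
    param(
        [ValidateRange(0, 23)][int]$Facility,
        [ValidateRange(0, 7)][int]$Severity
    )
    # PRI = facility * 8 + severity
    $Facility * 8 + $Severity
}

function New-SyslogTestMessage {
    param(
        [int]$Facility = 1,   # 1 = user-level
        [int]$Severity = 6,   # 6 = informational
        [string]$Text = 'Test message from PowerShell'
    )
    '<{0}>{1}' -f (Get-SyslogPriority -Facility $Facility -Severity $Severity), $Text
}

# Example: an error-level message (severity 3) on facility local0 (16).
# New-SyslogTestMessage -Facility 16 -Severity 3 -Text 'Simulated error'
```

Sending messages at several severities can show whether a missing event is filtered by priority rather than lost in transit.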

Router Queue Errors

Symptoms:

  • Log message: "There is no available log file for [logtype] in the router queue"
  • Data processing appears stalled
  • Empty storage/logs directory

Resolution Steps:

  1. Verify Storage Configuration

    • Check that <vm_root>/Director/storage/ directory exists
    • Ensure subdirectories (data, logs, queue, sender) are present
    • Verify write permissions on storage directories
  2. Check Disk Space

    • Ensure adequate free disk space for queue operations
    • Monitor disk usage during high-volume periods
    • Configure log rotation if disk space is limited

    Warning: If disk space is exhausted, queued data may be lost. Monitor storage usage proactively in high-volume environments.

  3. Review Route Configuration

    • Verify routes are properly configured and enabled
    • Check that devices and targets are correctly linked
    • Validate pipeline configuration syntax
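
The storage checks in step 1 can be scripted. The sketch below looks for the four subdirectories named above (data, logs, queue, sender) and tests write access by creating and deleting a small probe file. `Test-DirectorStorageLayout` is a hypothetical helper. It runs as the current user, which may differ from the Director service account.

```powershell
# Sketch: check that the expected storage subdirectories exist and are writable.
# Test-DirectorStorageLayout is a hypothetical helper, not a VirtualMetric command.
function Test-DirectorStorageLayout {
    param(
        [Parameter(Mandatory)]
        [string]$StorageRoot
    )
    foreach ($name in 'data', 'logs', 'queue', 'sender') {
        $path = Join-Path $StorageRoot $name
        $exists = Test-Path -LiteralPath $path -PathType Container
        $writable = $false
        if ($exists) {
            # Write and remove a uniquely named probe file.
            $probe = Join-Path $path ('write-test-{0}.tmp' -f [guid]::NewGuid())
            try {
                Set-Content -LiteralPath $probe -Value 'probe' -ErrorAction Stop
                Remove-Item -LiteralPath $probe -Force -ErrorAction Stop
                $writable = $true
            } catch {
                $writable = $false
            }
        }
        [pscustomobject]@{ Name = $name; Path = $path; Exists = $exists; Writable = $writable }
    }
}

# Example: replace the placeholder with your installation root.
# Test-DirectorStorageLayout -StorageRoot '<vm_root>\Director\storage'
```

Any row with Exists or Writable set to False identifies a directory to create or whose permissions need correcting.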

Data Processing Errors

Symptoms:

  • Partial data processing with error messages
  • Data transformation failures
  • Inconsistent output formatting

Resolution Steps:

  1. Pipeline Configuration Review

    • Validate YAML syntax in processing pipelines
    • Check field mappings and transformation rules
    • Verify regular expressions and parsing patterns
  2. Data Format Validation

    • Examine sample input data for format consistency
    • Check for unexpected characters or encoding issues
    • Verify timestamp formats match expected patterns
  3. Resource Monitoring

    • Monitor CPU and memory usage during processing
    • Check for disk space availability
    • Ensure adequate processing capacity for data volume
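
Timestamp mismatches are a frequent cause of parsing failures. As a sketch of the "Data Format Validation" step, the helper below returns the sample values that do not match an expected .NET date format string. `Find-InvalidTimestamp` is a hypothetical helper, and the format shown is only an example. Use the format your pipeline actually expects.

```powershell
# Sketch: list sample timestamps that do not match an expected format.
# Find-InvalidTimestamp is a hypothetical helper, not a VirtualMetric command.
function Find-InvalidTimestamp {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyString()]
        [string[]]$Value,
        # A .NET custom date and time format string.
        [Parameter(Mandatory)]
        [string]$Format
    )
    $culture = [System.Globalization.CultureInfo]::InvariantCulture
    $styles = [System.Globalization.DateTimeStyles]::None
    foreach ($v in $Value) {
        $parsed = [datetime]::MinValue
        if (-not [datetime]::TryParseExact($v, $Format, $culture, $styles, [ref]$parsed)) {
            $v   # emit values that failed to parse
        }
    }
}

# Example:
# Find-InvalidTimestamp -Value '2024-01-15T10:30:00', '15/01/2024 10:30' -Format "yyyy-MM-dd'T'HH:mm:ss"
```

Running this over timestamps extracted from sample input shows quickly whether a source emits a different format than the pipeline assumes.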

Performance Issues

High Resource Usage

Symptoms:

  • Excessive CPU or memory consumption
  • System performance degradation
  • Out of memory errors

Resolution Steps:

  1. Configuration Tuning

    • Adjust buffer sizes and queue lengths
    • Optimize processing batch sizes
    • Configure appropriate timeout values
  2. Pipeline Optimization

    • Identify resource-intensive transformation operations
    • Optimize data parsing and enrichment logic
    • Consider sampling for high-volume data streams
  3. System Monitoring

    • Implement comprehensive resource monitoring
    • Set up alerts for resource threshold breaches
    • Plan capacity upgrades based on usage patterns
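
Before tuning, it helps to record how CPU and memory use change over time. The sketch below takes one sample for a process ID. `Get-ProcessResourceSample` is a hypothetical helper, and the process name in the usage comment is an assumption based on the service name used earlier in this guide.

```powershell
# Sketch: take one CPU/memory sample for a process.
# Get-ProcessResourceSample is a hypothetical helper, not a VirtualMetric command.
function Get-ProcessResourceSample {
    param(
        [Parameter(Mandatory)]
        [int]$Id
    )
    $p = Get-Process -Id $Id -ErrorAction Stop
    [pscustomobject]@{
        Timestamp    = Get-Date
        ProcessName  = $p.ProcessName
        # Cumulative processor time in seconds; may be empty without sufficient rights.
        CpuSeconds   = $p.CPU
        WorkingSetMB = [math]::Round($p.WorkingSet64 / 1MB, 1)
    }
}

# Example: one sample every 5 seconds for a minute.
# The process name 'vmetric-director' is an assumption; adjust it to your installation.
# $id = (Get-Process -Name 'vmetric-director').Id
# 1..12 | ForEach-Object { Get-ProcessResourceSample -Id $id; Start-Sleep -Seconds 5 }
```

A working set that climbs steadily across samples suggests growing buffers or queues, while rapid growth in CpuSeconds points toward expensive pipeline operations.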

Slow Data Processing

Symptoms:

  • High latency between data ingestion and output
  • Growing backlog of unprocessed data
  • Timeout errors in processing pipeline

Resolution Steps:

  1. Resource Optimization

    • Increase CPU and memory allocation to Director
    • Monitor resource utilization patterns
    • Consider vertical scaling for improved performance
  2. Pipeline Efficiency

    • Review processing pipeline for optimization opportunities
    • Simplify complex transformation rules where possible
    • Optimize regular expressions and parsing logic
  3. Output Destination Performance

    • Check target system capacity and response times
    • Verify network connectivity to destination systems
    • Consider batch processing for improved throughput
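
One rough way to see whether a backlog is growing is to track the size of a directory over time. The sketch below reports the file count and total size for a path. `Get-DirectoryUsage` is a hypothetical helper, and it is an assumption that unprocessed data accumulates on disk under the storage queue directory.

```powershell
# Sketch: report file count and total size for a directory tree.
# Get-DirectoryUsage is a hypothetical helper, not a VirtualMetric command.
function Get-DirectoryUsage {
    param(
        [Parameter(Mandatory)]
        [string]$Path
    )
    $files = @(Get-ChildItem -LiteralPath $Path -File -Recurse -ErrorAction Stop)
    $bytes = 0
    foreach ($f in $files) { $bytes += $f.Length }
    [pscustomobject]@{
        Timestamp = Get-Date
        FileCount = $files.Count
        SizeMB    = [math]::Round($bytes / 1MB, 2)
    }
}

# Example: assumes the backlog is visible under the queue directory.
# Get-DirectoryUsage -Path '<vm_root>\Director\storage\queue'
```

Comparing a few samples taken minutes apart shows whether the Director is keeping up with its input or falling further behind.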

Log Analysis

Director Log Locations

Windows:

<vm_root>\Director\storage\logs\

Linux:

<vm_root>/Director/storage/logs/

Common Log Messages

Message                             Meaning                            Action
vmmq connection is not initialized  NATS JetStream not available       Check NATS configuration and storage
Can not update device state         Device status update failed        Verify VMMQ connectivity
Health check failed                 Device collector unhealthy         Service will auto-restart collector
Can not fetch messages              Message queue read error           Check NATS server health
Failed to start VMMQ server         Internal messaging startup failed  Review configuration and permissions

Note: The Director service includes automatic recovery for transient failures. Messages like "Health check failed" are often followed by automatic collector restart without manual intervention.

Log Analysis Commands

# Monitor real-time logs
Get-Content "<vm_root>\Director\storage\logs\director.log" -Wait -Tail 50

# Search for errors
Select-String -Path "<vm_root>\Director\storage\logs\*.log" -Pattern "error|failed" -AllMatches
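
To see which of the messages listed under Common Log Messages occur most often, the matches can be counted per message. The sketch below is one way to do that. `Get-LogMessageSummary` is a hypothetical helper, and it counts matching lines using a case-insensitive literal match.

```powershell
# Sketch: count log lines containing each documented message.
# Get-LogMessageSummary is a hypothetical helper, not a VirtualMetric command.
function Get-LogMessageSummary {
    param(
        # One or more log file paths; wildcards are allowed.
        [Parameter(Mandatory)]
        [string[]]$Path
    )
    $known = @(
        'vmmq connection is not initialized',
        'Can not update device state',
        'Health check failed',
        'Can not fetch messages',
        'Failed to start VMMQ server'
    )
    foreach ($message in $known) {
        # -SimpleMatch treats the message as literal text, not a regex.
        $hits = @(Select-String -Path $Path -Pattern $message -SimpleMatch)
        [pscustomobject]@{ Message = $message; Occurrences = $hits.Count }
    }
}

# Example:
# Get-LogMessageSummary -Path '<vm_root>\Director\storage\logs\*.log' |
#     Sort-Object Occurrences -Descending
```

A high count for one message narrows the investigation to the matching row of the table.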

Support and Escalation

When standard troubleshooting procedures don't resolve issues:

  1. Gather Diagnostic Information

    • Collect relevant log files and error messages
    • Document system configuration and environment details
    • Note specific symptoms and reproduction steps
  2. Contact Support

    • Include the Director version (output of vmetric-director -version)
    • Provide timeline of when issues first occurred

    Tip: Submit support tickets at support.virtualmetric.com. Include log excerpts and configuration snippets to expedite resolution.

  3. Emergency Escalation

    • For critical production issues, use emergency contact procedures
    • Follow your organization's incident management processes
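
Gathering diagnostic information can be partly automated. The sketch below copies the .log files from a log directory, adds a small environment summary, and compresses both into one archive. `New-DiagnosticBundle` is a hypothetical helper and not an official support tool. Review the contents for sensitive data before attaching the archive to a ticket.

```powershell
# Sketch: bundle log files and basic environment details into a zip archive.
# New-DiagnosticBundle is a hypothetical helper, not a VirtualMetric command.
function New-DiagnosticBundle {
    param(
        [Parameter(Mandatory)][string]$LogDirectory,
        [Parameter(Mandatory)][string]$DestinationZip
    )
    $staging = Join-Path ([System.IO.Path]::GetTempPath()) ('director-diag-{0}' -f [guid]::NewGuid())
    New-Item -ItemType Directory -Path $staging | Out-Null
    try {
        # Copy the logs so the originals are not touched.
        Copy-Item -Path (Join-Path $LogDirectory '*.log') -Destination $staging
        [pscustomobject]@{
            CollectedAt  = (Get-Date).ToString('o')
            ComputerName = [System.Environment]::MachineName
            OSVersion    = [System.Environment]::OSVersion.VersionString
            PSVersion    = $PSVersionTable.PSVersion.ToString()
        } | ConvertTo-Json | Set-Content -LiteralPath (Join-Path $staging 'environment.json')
        Compress-Archive -Path (Join-Path $staging '*') -DestinationPath $DestinationZip -Force
    } finally {
        Remove-Item -LiteralPath $staging -Recurse -Force
    }
    Get-Item -LiteralPath $DestinationZip
}

# Example:
# New-DiagnosticBundle -LogDirectory '<vm_root>\Director\storage\logs' -DestinationZip "$env:TEMP\director-diag.zip"
```

The Director version is not captured by this sketch, so add the output of vmetric-director -version to the ticket separately.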