Directors: Troubleshooting
This guide provides solutions to common Director deployment and operational issues. Issues are organized by category with step-by-step resolution procedures.
Installation Issues
Director Service Fails to Start
Symptoms:
- Service does not start after installation
- Error messages during service startup
- Director not appearing in management console
Resolution Steps:
1. Check System Requirements
   - Verify that the operating system version is supported
   - Ensure minimum hardware requirements are met
   - Confirm administrator/root privileges for installation
2. Permission Validation
   - Ensure the installation user has sufficient privileges
   - Check file and directory permissions in the installation path
   - Verify service account permissions if using a dedicated account
3. Port Availability
   - Confirm required ports are not in use by other services
   - Check local firewall settings
   - Identify port conflicts using network tools
4. Review Service Logs
   - Check installation script output for error messages
   - Review system event logs for service startup failures
   - Examine Director logs for specific error details
Diagnostic Commands:
PowerShell:
# Check service status
Get-Service vmetric-director
# Check port usage
Get-NetTCPConnection -LocalPort 443 -ErrorAction SilentlyContinue
netstat -an | Select-String ":443"
# View service events
Get-EventLog -LogName System -Source "Service Control Manager" -Newest 20 |
    Where-Object { $_.Message -like "*vmetric*" }
Bash:
# Check service status
systemctl status vmetric-director
# Check port usage
netstat -tulpn | grep :443
ss -tulpn | grep :443
# View service logs
journalctl -u vmetric-director -n 50
VMMQ Connection Not Initialized
Symptoms:
- Log message: "vmmq connection is not initialized"
- Director starts but cannot process data
- Internal communication failures
Resolution Steps:
1. Verify NATS JetStream Configuration
   - Check that the NATS server is running and accessible
   - Verify JetStream is enabled in the configuration
   - Confirm storage paths exist and have correct permissions
2. Check Storage Directory
   - Verify that the <vm_root>/Director/storage/nats/ directory exists
   - Ensure adequate disk space for JetStream storage
   - Check directory permissions
3. Review Configuration File
   - Validate YAML syntax in vmetric.yml
   - Confirm VMMQ settings are correctly specified
   - Check for configuration file corruption
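The existence and permission checks on the storage directory can be scripted. The following is a minimal sketch in Python using only the standard library; the function name `check_storage_dir` is an illustrative assumption and is not part of the Director tooling. It probes write access by actually creating a temporary file, which is more reliable than inspecting permission bits.

```python
import tempfile
from pathlib import Path


def check_storage_dir(path):
    """Return a list of problems found with a storage directory (empty if none)."""
    directory = Path(path)
    if not directory.is_dir():
        return [f"{directory} does not exist or is not a directory"]

    problems = []
    # Probe write access by creating and removing a temporary file.
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
    except OSError as exc:
        problems.append(f"{directory} is not writable: {exc}")
    return problems


# Example: replace <vm_root> with your installation root before running.
# for problem in check_storage_dir("<vm_root>/Director/storage/nats"):
#     print(problem)
```

Run the check as the same account that runs the Director service, since permissions are evaluated for the calling user.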
Connection Issues
Director Cannot Connect to Cloud Platform
Symptoms:
- Director status shows "Not Connected" in the web dashboard
- Connection verification fails during setup
- Timeout errors in Director logs
Resolution Steps:
1. Identify Your Regional Endpoint
   Your Director connects to the same regional cloud platform where you created it. To identify your region, check the URL in your browser when logged into the VirtualMetric dashboard:
   | Region | Endpoint |
   |---|---|
   | Europe | https://app.eu-west.cloud.virtualmetric.com |
   | US | https://app.us-east.cloud.virtualmetric.com |
   | Australia | https://app.aus-east.cloud.virtualmetric.com |
   Info: If you need to switch regions, log out and select a different region.
2. Validate Network Connectivity
   - Ensure port 443 is open for outbound HTTPS connections from the Director host
   - Verify DNS resolution is working properly
   - Test connectivity to your regional endpoint using the diagnostic commands below
3. Check Firewall Configuration
   - Allow outbound connections to *.cloud.virtualmetric.com domains
   - Ensure no SSL/TLS inspection is blocking certificate validation
   - Verify proxy configurations if applicable
4. Validate API Key
   - Confirm the installation script was copied and executed correctly
   - Check for extra spaces or line breaks if you manually edited the script
   - Regenerate the API key from the Director's Connection tab in the dashboard if corruption is suspected
5. Review System Time
   - Ensure the system clock is synchronized
   - Use NTP to maintain accurate time synchronization
   Warning: TLS certificate validation requires accurate system time. A clock skew of more than a few minutes can cause silent connection failures.
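One rough way to estimate clock skew without extra tooling is to compare the local clock with the `Date` header of an HTTPS response. The sketch below assumes Python with only the standard library; the function names are illustrative, and the result is approximate because it includes network delay and the header has one-second resolution. If the request itself fails with a certificate validity error, that is also consistent with a clock problem.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen


def skew_seconds(date_header, local_now=None):
    """Return local time minus server time, in seconds, from an HTTP Date header."""
    server_time = parsedate_to_datetime(date_header)
    if server_time.tzinfo is None:
        # HTTP dates are expressed in GMT; treat a naive result as UTC.
        server_time = server_time.replace(tzinfo=timezone.utc)
    if local_now is None:
        local_now = datetime.now(timezone.utc)
    return (local_now - server_time).total_seconds()


def fetch_date_header(url, timeout=10):
    """Issue a HEAD request and return the Date header (None if absent)."""
    with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
        return response.headers.get("Date")


# Example (performs network I/O; pick the endpoint for your region):
# header = fetch_date_header("https://app.eu-west.cloud.virtualmetric.com")
# if header:
#     print(f"approximate skew: {skew_seconds(header):+.0f} s")
```

A positive value means the local clock is ahead of the server; a value in the range of minutes suggests checking NTP synchronization.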
Diagnostic Commands:
Use your regional endpoint (identified in step 1) in the following commands:
PowerShell:
# Set your regional endpoint
$region = "eu-west" # Change to: us-east or aus-east
# Test DNS resolution
Resolve-DnsName "app.$region.cloud.virtualmetric.com"
# Check outbound connectivity
Test-NetConnection -ComputerName "app.$region.cloud.virtualmetric.com" -Port 443
# Verify TLS connectivity
Invoke-WebRequest -Uri "https://app.$region.cloud.virtualmetric.com" -Method Head -UseBasicParsing
Bash:
# Set your regional endpoint
REGION="eu-west" # Change to: us-east or aus-east
# Test DNS resolution
nslookup "app.$REGION.cloud.virtualmetric.com"
# Check outbound connectivity
nc -zv "app.$REGION.cloud.virtualmetric.com" 443
# Verify TLS connectivity
curl -I "https://app.$REGION.cloud.virtualmetric.com"
Connection Drops Intermittently
Symptoms:
- Director alternates between Connected and Not Connected states
- Periodic timeout errors in logs
- Data processing interruptions
Resolution Steps:
1. Network Stability Check
   - Monitor network latency and packet loss to your regional endpoint
   - Verify network equipment stability
   - Check for bandwidth limitations or throttling
2. Resource Monitoring
   - Ensure adequate CPU and memory resources
   - Monitor disk I/O performance
   - Check for resource contention with other services
3. Proxy Configuration
   - Verify proxy server stability and configuration
   - Check proxy authentication credentials
   - Consider bypassing the proxy for VirtualMetric endpoints
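To put numbers on an intermittent connection, repeated TCP connect attempts to the regional endpoint give a rough view of latency and failure rate over time. This is a minimal sketch in Python using only the standard library; the function names, sample count, and interval are illustrative assumptions. It measures TCP connect time only, not TLS or application latency, and it does not go through a proxy.

```python
import socket
import time


def connect_latency_ms(host, port=443, timeout=5.0):
    """Return the TCP connect time in milliseconds, or None if the attempt fails."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        return None


def summarize(samples):
    """Summarize latency samples, where None marks a failed attempt."""
    ok = [s for s in samples if s is not None]
    return {
        "attempts": len(samples),
        "failures": len(samples) - len(ok),
        "min_ms": min(ok) if ok else None,
        "max_ms": max(ok) if ok else None,
        "avg_ms": sum(ok) / len(ok) if ok else None,
    }


def sample_endpoint(host, port=443, count=10, interval=1.0):
    """Probe the endpoint `count` times, pausing `interval` seconds between probes."""
    samples = []
    for i in range(count):
        samples.append(connect_latency_ms(host, port))
        if i < count - 1:
            time.sleep(interval)
    return summarize(samples)


# Example (performs network I/O; pick the endpoint for your region):
# print(sample_endpoint("app.eu-west.cloud.virtualmetric.com"))
```

Running the probe during a period when the Director flips between Connected and Not Connected helps separate network-level drops from resource or proxy problems on the host.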
Data Processing Issues
Director Not Receiving Data
Symptoms:
- No data appearing in target destinations
- Zero throughput metrics in monitoring
- Source systems showing successful transmission
Resolution Steps:
1. Source Configuration Verification
   - Confirm the correct Director IP address in the source system configuration
   - Verify port numbers match between source and Director
   - Check protocol settings (TCP/UDP for syslog, HTTP/HTTPS for APIs)
2. Network Connectivity Testing
   - Test the connection from the source system to the Director
   - Verify routing and firewall rules allow traffic
   - Use packet capture tools to confirm data transmission
3. Director Input Configuration
   - Review device configuration in Director settings
   - Verify enabled protocols and listening ports
   - Check for configuration syntax errors
4. Log Analysis
   - Examine Director logs for input processing errors
   - Check for data parsing or validation failures
   - Review error messages for specific issues
Diagnostic Commands:
PowerShell:
# Test syslog connectivity (UDP); replace <director_ip> with the Director's IP address
$udpClient = New-Object System.Net.Sockets.UdpClient
$bytes = [Text.Encoding]::ASCII.GetBytes("<14>Test message from PowerShell")
$udpClient.Send($bytes, $bytes.Length, "<director_ip>", 514)
$udpClient.Close()
# Check listening ports (run on the Director host)
Get-NetTCPConnection -LocalPort 514 -ErrorAction SilentlyContinue
Get-NetUDPEndpoint -LocalPort 514 -ErrorAction SilentlyContinue
Bash:
# Test syslog connectivity; replace <director_ip> with the Director's IP address
logger -n <director_ip> -P 514 "Test message"
# Check listening ports (run on the Director host)
netstat -tulpn | grep ":514"
# Monitor network traffic (typically requires root)
tcpdump -i any port 514
Router Queue Errors
Symptoms:
- Log message: "There is no available log file for [logtype] in the router queue"
- Data processing appears stalled
- Empty storage/logs directory
Resolution Steps:
1. Verify Storage Configuration
   - Check that the <vm_root>/Director/storage/ directory exists
   - Ensure subdirectories (data, logs, queue, sender) are present
   - Verify write permissions on storage directories
2. Check Disk Space
   - Ensure adequate free disk space for queue operations
   - Monitor disk usage during high-volume periods
   - Configure log rotation if disk space is limited
   Warning: If disk space is exhausted, queued data may be lost. Monitor storage usage proactively in high-volume environments.
3. Review Route Configuration
   - Verify routes are properly configured and enabled
   - Check that devices and targets are correctly linked
   - Validate pipeline configuration syntax
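The storage layout and disk space checks above can be combined into a short script. This is a sketch in Python using only the standard library; the subdirectory names come from the steps above, while the function names are illustrative assumptions. What counts as adequate free space depends on your data volume, so the script reports a percentage rather than applying a fixed threshold.

```python
import shutil
from pathlib import Path

# Subdirectory names taken from the resolution steps above.
EXPECTED_SUBDIRS = ("data", "logs", "queue", "sender")


def missing_subdirs(storage_root, expected=EXPECTED_SUBDIRS):
    """Return the expected subdirectories that are absent under storage_root."""
    root = Path(storage_root)
    return [name for name in expected if not (root / name).is_dir()]


def free_space_percent(path):
    """Return the free space on the volume holding `path`, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


# Example: replace <vm_root> with your installation root before running.
# root = "<vm_root>/Director/storage"
# print("missing:", missing_subdirs(root))
# print(f"free: {free_space_percent(root):.1f}%")
```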
Data Processing Errors
Symptoms:
- Partial data processing with error messages
- Data transformation failures
- Inconsistent output formatting
Resolution Steps:
1. Pipeline Configuration Review
   - Validate YAML syntax in processing pipelines
   - Check field mappings and transformation rules
   - Verify regular expressions and parsing patterns
2. Data Format Validation
   - Examine sample input data for format consistency
   - Check for unexpected characters or encoding issues
   - Verify timestamp formats match expected patterns
3. Resource Monitoring
   - Monitor CPU and memory usage during processing
   - Check for disk space availability
   - Ensure adequate processing capacity for data volume
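When a parsing pattern is suspected, it can help to test it against a sample of real input outside the pipeline. The sketch below, in Python with only the standard library, reports which sample lines a regular expression fails to match and which raw lines are not valid in the expected encoding. The function names and the example pattern (a syslog priority prefix) are illustrative assumptions; Python's `re` syntax may differ in details from the regex engine your pipeline uses.

```python
import re


def unmatched_lines(lines, pattern):
    """Return (line_number, line) pairs for lines the pattern fails to match."""
    compiled = re.compile(pattern)
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if not compiled.search(line)
    ]


def undecodable_lines(raw_lines, encoding="utf-8"):
    """Return line numbers of byte strings that do not decode in `encoding`."""
    bad = []
    for number, raw in enumerate(raw_lines, start=1):
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            bad.append(number)
    return bad


# Example with a hypothetical pattern for a syslog priority prefix such as <14>:
# sample = ["<14>Test message", "no priority here"]
# print(unmatched_lines(sample, r"^<\d{1,3}>"))
```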
Performance Issues
High Resource Usage
Symptoms:
- Excessive CPU or memory consumption
- System performance degradation
- Out of memory errors
Resolution Steps:
1. Configuration Tuning
   - Adjust buffer sizes and queue lengths
   - Optimize processing batch sizes
   - Configure appropriate timeout values
2. Pipeline Optimization
   - Identify resource-intensive transformation operations
   - Optimize data parsing and enrichment logic
   - Consider sampling for high-volume data streams
3. System Monitoring
   - Implement comprehensive resource monitoring
   - Set up alerts for resource threshold breaches
   - Plan capacity upgrades based on usage patterns
Slow Data Processing
Symptoms:
- High latency between data ingestion and output
- Growing backlog of unprocessed data
- Timeout errors in processing pipeline
Resolution Steps:
1. Resource Optimization
   - Increase CPU and memory allocation to the Director
   - Monitor resource utilization patterns
   - Consider vertical scaling for improved performance
2. Pipeline Efficiency
   - Review the processing pipeline for optimization opportunities
   - Simplify complex transformation rules where possible
   - Optimize regular expressions and parsing logic
3. Output Destination Performance
   - Check target system capacity and response times
   - Verify network connectivity to destination systems
   - Consider batch processing for improved throughput
Log Analysis
Director Log Locations
Windows:
<vm_root>\Director\storage\logs\
Linux:
<vm_root>/Director/storage/logs/
Common Log Messages
| Message | Meaning | Action |
|---|---|---|
| vmmq connection is not initialized | NATS JetStream not available | Check NATS configuration and storage |
| Can not update device state | Device status update failed | Verify VMMQ connectivity |
| Health check failed | Device collector unhealthy | Service will auto-restart collector |
| Can not fetch messages | Message queue read error | Check NATS server health |
| Failed to start VMMQ server | Internal messaging startup failed | Review configuration and permissions |
The Director service includes automatic recovery for transient failures. Messages like "Health check failed" are often followed by automatic collector restart without manual intervention.
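The message table above lends itself to a quick triage script that counts known messages in a log and prints the suggested action for each. This is a minimal sketch in Python using only the standard library; the message and action strings are copied from the table, while the function name and the case-insensitive substring matching are assumptions, since the exact log line format is not specified here.

```python
from collections import Counter

# Message-to-action pairs copied from the table above.
KNOWN_MESSAGES = {
    "vmmq connection is not initialized": "Check NATS configuration and storage",
    "Can not update device state": "Verify VMMQ connectivity",
    "Health check failed": "Service will auto-restart collector",
    "Can not fetch messages": "Check NATS server health",
    "Failed to start VMMQ server": "Review configuration and permissions",
}


def triage(lines):
    """Return (message, count, action) tuples, most frequent message first."""
    counts = Counter()
    for line in lines:
        lowered = line.lower()
        for message in KNOWN_MESSAGES:
            if message.lower() in lowered:
                counts[message] += 1
    return [(m, c, KNOWN_MESSAGES[m]) for m, c in counts.most_common()]


# Example: replace <vm_root> with your installation root before running.
# with open("<vm_root>/Director/storage/logs/director.log", errors="replace") as log:
#     for message, count, action in triage(log):
#         print(f"{count:5d}  {message}  ->  {action}")
```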
Log Analysis Commands
PowerShell:
# Monitor real-time logs
Get-Content "<vm_root>\Director\storage\logs\director.log" -Wait -Tail 50
# Search for errors
Select-String -Path "<vm_root>\Director\storage\logs\*.log" -Pattern "error|failed" -AllMatches
Bash:
# Monitor real-time logs
tail -f <vm_root>/Director/storage/logs/director.log
# Search for errors
grep -i "error\|failed" <vm_root>/Director/storage/logs/*.log
Support and Escalation
When standard troubleshooting procedures don't resolve issues:
1. Gather Diagnostic Information
   - Collect relevant log files and error messages
   - Document system configuration and environment details
   - Note specific symptoms and reproduction steps
2. Contact Support
   - Include the Director version (vmetric-director -version)
   - Provide a timeline of when issues first occurred
   Tip: Submit support tickets at support.virtualmetric.com. Include log excerpts and configuration snippets to expedite resolution.
3. Emergency Escalation
   - For critical production issues, use emergency contact procedures
   - Follow your organization's incident management processes
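Collecting logs and environment details for a support ticket can be automated. The sketch below, in Python with only the standard library, packs the `*.log` files from the Director log directory together with a short environment summary into a single archive. The function name, archive layout, and summary fields are illustrative assumptions and not a format required by VirtualMetric support.

```python
import platform
import tarfile
import tempfile
from pathlib import Path


def build_bundle(log_dir, output_path):
    """Pack *.log files and a short environment summary into a .tar.gz archive."""
    log_dir = Path(log_dir)
    summary = "\n".join([
        f"system: {platform.system()} {platform.release()}",
        f"machine: {platform.machine()}",
        f"python: {platform.python_version()}",
    ])
    with tempfile.TemporaryDirectory() as tmp:
        summary_file = Path(tmp) / "environment.txt"
        summary_file.write_text(summary + "\n", encoding="utf-8")
        with tarfile.open(output_path, "w:gz") as archive:
            archive.add(summary_file, arcname="environment.txt")
            for log_file in sorted(log_dir.glob("*.log")):
                archive.add(log_file, arcname=f"logs/{log_file.name}")
    return Path(output_path)


# Example: replace <vm_root> with your installation root before running.
# build_bundle("<vm_root>/Director/storage/logs", "director-diagnostics.tar.gz")
```

The Director version is not captured by this sketch; record the output of the version command listed above separately and attach it to the ticket.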