Version: 1.2.0

Deployment: On Azure VM

This guide covers deploying DataStream on Azure Virtual Machines, offering a balance of performance, control, and cloud scalability. This deployment model is ideal for production environments that require customized configurations while leveraging Azure's infrastructure capabilities.

Considerations

Azure VMs provide several advantages for DataStream deployments:

  • Customizable resources: Scale CPU, memory, and storage based on workload
  • Network performance: Premium network options for high-throughput scenarios
  • OS flexibility: Support for various Linux and Windows operating systems
  • Integration: Native connectivity with other Azure services
  • Managed infrastructure: Azure handles the physical hardware maintenance

VM Size Recommendations

Select an Azure VM size based on your expected workload:

Workload          VM Size          Specifications     Notes
Small             Standard_D2s_v3  2 vCPU, 8 GB RAM   Development or light production
Medium            Standard_D4s_v3  4 vCPU, 16 GB RAM  Standard production workloads
Large             Standard_D8s_v3  8 vCPU, 32 GB RAM  High-volume data collection
High-Performance  Standard_E8s_v3  8 vCPU, 64 GB RAM  Memory-intensive processing

For disk-intensive operations, consider Premium SSD storage options.
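
When VM creation is scripted, the size recommendations above can be kept as a small lookup. The following is a minimal sketch; the function name `Get-RecommendedVmSize` is hypothetical and not part of DataStream or the Az modules, and the values are copied from the recommendations above.

```powershell
# Hypothetical helper: maps a workload tier to the VM size recommended above.
function Get-RecommendedVmSize {
    param(
        [Parameter(Mandatory)]
        [ValidateSet('Small', 'Medium', 'Large', 'High-Performance')]
        [string]$Workload
    )
    # Sizes copied from the recommendations in this guide.
    $sizes = @{
        'Small'            = 'Standard_D2s_v3'
        'Medium'           = 'Standard_D4s_v3'
        'Large'            = 'Standard_D8s_v3'
        'High-Performance' = 'Standard_E8s_v3'
    }
    return $sizes[$Workload]
}

Get-RecommendedVmSize -Workload 'Medium'
```

The returned string can be passed wherever a VM size name is expected, for example in a deployment script parameter.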

Deployment Steps

1. Create the Azure VM

  1. Navigate to the Azure Portal and create a new Virtual Machine

    • Select your preferred OS (Ubuntu 20.04 LTS recommended for Linux, Windows Server 2019 for Windows)
    • Choose the appropriate VM size based on your workload
    • Configure networking to allow required ports (e.g., 514 for syslog)
    • Set authentication method (SSH key recommended for Linux, password for Windows)
  2. Configure disk settings:

    • OS disk: Premium SSD recommended for production
    • Data disk: Add a separate managed disk for DataStream data
    • Host caching: Consider enabling for read-intensive workloads
  3. Configure networking:

    • Create or select a Virtual Network
    • Configure Network Security Group (NSG) rules:
      • Allow SSH/RDP for administration
      • Allow specific ports for your collectors (514/UDP, 1514/TCP, etc.)
    • Consider using a private endpoint for secure connectivity
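
The collector NSG rules listed above can also be defined in a script instead of the portal. This is a sketch under stated assumptions: the helper `New-CollectorRuleParams`, the rule names, the priorities, and the `VirtualNetwork` source tag are illustrative choices, and the final step only runs if the Az.Network module is installed.

```powershell
# Hypothetical helper: builds the parameter set for one inbound "allow" rule.
function New-CollectorRuleParams {
    param(
        [Parameter(Mandatory)][string]$Name,
        [Parameter(Mandatory)][ValidateSet('Tcp', 'Udp')][string]$Protocol,
        [Parameter(Mandatory)][int]$Port,
        [Parameter(Mandatory)][int]$Priority,
        # Assumption: log sources live inside the virtual network.
        [string]$SourceAddressPrefix = 'VirtualNetwork'
    )
    return @{
        Name                     = $Name
        Access                   = 'Allow'
        Direction                = 'Inbound'
        Protocol                 = $Protocol
        Priority                 = $Priority
        SourceAddressPrefix      = $SourceAddressPrefix
        SourcePortRange          = '*'
        DestinationAddressPrefix = '*'
        DestinationPortRange     = "$Port"
    }
}

# One rule per collector port mentioned in this guide.
$ruleParams = @(
    New-CollectorRuleParams -Name 'allow-syslog-udp' -Protocol Udp -Port 514 -Priority 200
    New-CollectorRuleParams -Name 'allow-syslog-tcp' -Protocol Tcp -Port 1514 -Priority 210
)

# Build the rule objects only when the Az.Network module is available.
if (Get-Command New-AzNetworkSecurityRuleConfig -ErrorAction SilentlyContinue) {
    $rules = foreach ($p in $ruleParams) { New-AzNetworkSecurityRuleConfig @p }
    # $rules can then be passed to New-AzNetworkSecurityGroup -SecurityRules.
}
```

Keeping the rule definitions as plain hashtables makes them easy to review before anything is created in Azure.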

2. Prepare the VM

  1. Connect to the VM using RDP (Windows) or SSH (Linux)

  2. Update the OS -

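    Install Windows updates. The Install-WindowsUpdate cmdlet is provided by the PSWindowsUpdate module, which must be installed first:

    Install-Module -Name PSWindowsUpdate -Force
    Install-WindowsUpdate -AcceptAll -AutoReboot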
  3. Format and mount the data disk -

    Initialize and format the disk using Disk Management, or automate the process with the following commands.

    • Get the disk number of the raw disk:

      $disk = Get-Disk | Where-Object PartitionStyle -eq 'RAW'
    • Initialize the disk:

      Initialize-Disk -Number $disk.Number -PartitionStyle GPT
    • Create a new partition using the maximum size:

      $partition = New-Partition -DiskNumber $disk.Number -UseMaximumSize -AssignDriveLetter
    • Format the volume:

      Format-Volume -Partition $partition -FileSystem NTFS -NewFileSystemLabel "DataStream" -Confirm:$false
  4. Install dependencies -

    • Install .NET Runtime

      Invoke-WebRequest -URI "https://dotnet.microsoft.com/download/dotnet/scripts/v1/dotnet-install.ps1" -OutFile "dotnet-install.ps1"
      ./dotnet-install.ps1 -Runtime dotnet -Version 6.0.0 -InstallDir "C:\Program Files\dotnet"
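    • Add dotnet to the PATH if it is not already there

      # Append to the machine-level PATH only, so user-level entries are not copied into it
      $machinePath = [Environment]::GetEnvironmentVariable("Path", [System.EnvironmentVariableTarget]::Machine)
      [Environment]::SetEnvironmentVariable("Path", "$machinePath;C:\Program Files\dotnet", [System.EnvironmentVariableTarget]::Machine)
      $env:Path += ";C:\Program Files\dotnet"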
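
In the disk preparation step above, the `Get-Disk` filter returns every RAW disk, so `$disk.Number` is only a single value when exactly one uninitialized disk is attached. A small guard can make that assumption explicit before anything is initialized. This is a minimal sketch; `Select-SingleRawDisk` is a hypothetical helper name.

```powershell
# Hypothetical helper: returns the single RAW disk, or throws if the choice is ambiguous.
function Select-SingleRawDisk {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyCollection()]
        [object[]]$Disks
    )
    $raw = @($Disks | Where-Object { $_.PartitionStyle -eq 'RAW' })
    if ($raw.Count -ne 1) {
        throw "Expected exactly one RAW disk, found $($raw.Count)."
    }
    return $raw[0]
}

# On the VM:
#   $disk = Select-SingleRawDisk -Disks (Get-Disk)
```

Failing early is safer than passing several disk numbers to `Initialize-Disk` by accident.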

3. Install DataStream

  1. Download the DataStream installer -

    Invoke-WebRequest -Uri "https://download.datastream.example.com/latest/datastream-installer.exe" -OutFile "datastream-installer.exe"
  2. Run the installer -

    Assuming E: is the data disk drive letter...

    .\datastream-installer.exe --data-dir "E:\DataStream"
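
The drive letter does not have to be assumed. Because the data volume was formatted with the label "DataStream", the path can be derived from the volume itself. The sketch below uses a hypothetical helper, `Get-DataStreamDataDir`; the `--data-dir` flag is the one shown in the installer command above.

```powershell
# Hypothetical helper: builds the installer's data directory from a volume object.
function Get-DataStreamDataDir {
    param(
        [Parameter(Mandatory)]$Volume,
        [string]$FolderName = 'DataStream'
    )
    if (-not $Volume.DriveLetter) {
        throw 'The volume has no drive letter assigned.'
    }
    return "$($Volume.DriveLetter):\$FolderName"
}

# On the VM (Windows only):
#   $dataDir = Get-DataStreamDataDir -Volume (Get-Volume -FileSystemLabel 'DataStream')
#   .\datastream-installer.exe --data-dir $dataDir
```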

4. Configure DataStream

  1. Create basic configuration -

    • Create configuration directory if it doesn't exist:

      New-Item -ItemType Directory -Force -Path "E:\DataStream\config"
    • Create or edit the configuration file:

      notepad E:\DataStream\config\config.yaml
  2. Add sample devices configuration -

    devices:
      - id: 1
        name: azure_syslog
        type: syslog
        properties:
          port: 514
  3. Configure Azure-specific targets -

    targets:
      - name: azure_storage
        type: azure_blob
        connection_string: "${AZURE_STORAGE_CONNECTION_STRING}"
        container: "datastream-logs"
        path_format: "{date}/{hour}/{source}"
  4. Set up environment variables (if needed) -

    Set system environment variable:

    [Environment]::SetEnvironmentVariable("AZURE_STORAGE_CONNECTION_STRING", "DefaultEndpointsProtocol=https;AccountName=youraccount;AccountKey=yourkey;EndpointSuffix=core.windows.net", [System.EnvironmentVariableTarget]::Machine)
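
A malformed connection string is an easy mistake to make here, so it can help to check the value before starting the service. The following sketch splits the string into its `Key=Value` parts; `ConvertFrom-StorageConnectionString` is a hypothetical helper, and the check covers only the format, not whether the credentials are valid.

```powershell
# Hypothetical helper: splits "Key=Value;Key=Value" into a hashtable.
function ConvertFrom-StorageConnectionString {
    param([Parameter(Mandatory)][string]$ConnectionString)
    $result = @{}
    foreach ($pair in ($ConnectionString -split ';' | Where-Object { $_.Trim() })) {
        # Split on the first '=' only: Base64 account keys often end in '='.
        $name, $value = $pair -split '=', 2
        $result[$name.Trim()] = $value
    }
    return $result
}

# Usage: run in a new session, because a machine-level variable
# is not visible to processes that were already running when it was set.
if ($env:AZURE_STORAGE_CONNECTION_STRING) {
    $settings = ConvertFrom-StorageConnectionString -ConnectionString $env:AZURE_STORAGE_CONNECTION_STRING
    foreach ($key in 'AccountName', 'AccountKey') {
        if (-not $settings[$key]) { Write-Warning "Connection string is missing $key" }
    }
}
```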

5. Start and Test DataStream

  1. Start the service -

    • Set service to automatic start:

      Set-Service -Name DataStream -StartupType Automatic
    • Start the service:

      Start-Service DataStream
  2. Check status -

    Get-Service DataStream
  3. Test the deployment -

    Use PowerShell to send a UDP message to port 514:

    $Message = [Text.Encoding]::ASCII.GetBytes("Test message for Azure VM deployment")
    $UdpClient = New-Object System.Net.Sockets.UdpClient
    $UdpClient.Connect("127.0.0.1", 514)
    $UdpClient.Send($Message, $Message.Length)
    $UdpClient.Close()

    Then check the logs:

    Get-Content -Path "E:\DataStream\logs\service.log" -Tail 10
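
The test above sends plain text. Real syslog senders usually prefix each line with a priority value and a timestamp, so a more representative test message can be built as sketched below. This assumes the BSD syslog layout (RFC 3164); whether DataStream requires that layout is not stated in this guide, and the function names, hostname, and tag are hypothetical.

```powershell
# Hypothetical helper: formats a BSD-syslog (RFC 3164) style line.
function New-SyslogMessage {
    param(
        [ValidateRange(0, 23)][int]$Facility = 1,   # 1 = user-level messages
        [ValidateRange(0, 7)][int]$Severity = 6,    # 6 = informational
        [string]$Hostname = 'azure-vm',
        [string]$Tag = 'datastream-test',
        [Parameter(Mandatory)][string]$Message,
        [datetime]$Timestamp = (Get-Date)
    )
    # The priority value encodes facility and severity in one number.
    $pri = $Facility * 8 + $Severity
    # RFC 3164 timestamps use an English month abbreviation and a space-padded day.
    $culture = [Globalization.CultureInfo]::InvariantCulture
    $stamp = '{0} {1,2} {2}' -f $Timestamp.ToString('MMM', $culture), $Timestamp.Day, $Timestamp.ToString('HH:mm:ss', $culture)
    return "<$pri>$stamp $Hostname ${Tag}: $Message"
}

# Hypothetical helper: sends one line over UDP and always releases the socket.
function Send-SyslogTestMessage {
    param(
        [Parameter(Mandatory)][string]$Line,
        [string]$Server = '127.0.0.1',
        [int]$Port = 514
    )
    $bytes = [Text.Encoding]::ASCII.GetBytes($Line)
    $udp = New-Object System.Net.Sockets.UdpClient
    try { [void]$udp.Send($bytes, $bytes.Length, $Server, $Port) }
    finally { $udp.Close() }
}

# Usage:
#   Send-SyslogTestMessage -Line (New-SyslogMessage -Message 'Test message for Azure VM deployment')
```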

Azure VM Scaling Options

For scaling DataStream on Azure VMs, consider:

  1. Vertical scaling: Resize the VM to a larger size for increased capacity. This can be done with minimal downtime but requires a service restart.

  2. Horizontal scaling: Deploy multiple DataStream instances behind a load balancer, such as Azure Load Balancer, to distribute incoming traffic. This requires additional configuration to coordinate the instances.

  3. Azure VM Scale Sets: Use scale sets for automated scaling based on metrics. Configure auto-scaling rules based on CPU or memory usage.
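
To make the idea of a CPU-based auto-scaling rule concrete, the sketch below shows the kind of threshold logic such a rule expresses. It is an illustration only: the function name, the thresholds (70% and 30%), and the instance limits are hypothetical values, and in practice the rule is configured on the scale set rather than run as a script.

```powershell
# Hypothetical illustration of a threshold-based scaling rule.
function Get-ScaleDecision {
    param(
        [Parameter(Mandatory)][double]$AverageCpuPercent,
        [Parameter(Mandatory)][int]$CurrentInstances,
        [int]$MinInstances = 2,
        [int]$MaxInstances = 6,
        [double]$ScaleOutAbove = 70,
        [double]$ScaleInBelow = 30
    )
    # Add one instance under sustained load, up to the maximum.
    if ($AverageCpuPercent -gt $ScaleOutAbove -and $CurrentInstances -lt $MaxInstances) {
        return $CurrentInstances + 1
    }
    # Remove one instance when idle, down to the minimum.
    if ($AverageCpuPercent -lt $ScaleInBelow -and $CurrentInstances -gt $MinInstances) {
        return $CurrentInstances - 1
    }
    # Otherwise keep the current count.
    return $CurrentInstances
}

Get-ScaleDecision -AverageCpuPercent 85 -CurrentInstances 2
```

Leaving a gap between the scale-out and scale-in thresholds helps avoid repeatedly adding and removing instances when load hovers near a single value.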

Monitoring and Management

Enhance your Azure VM deployment with:

  1. Azure Monitor: Enable VM insights for resource utilization metrics

  2. Azure Log Analytics: Forward DataStream logs for centralized analysis

    targets:
      - name: log_analytics
        type: azure_monitor
        workspace_id: "${WORKSPACE_ID}"
        workspace_key: "${WORKSPACE_KEY}"
  3. Microsoft Defender for Cloud (formerly Azure Security Center): Enable for security recommendations and monitoring

  4. Backup: Configure Azure Backup to protect configuration and data
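
The Log Analytics target above refers to the `${WORKSPACE_ID}` and `${WORKSPACE_KEY}` placeholders, which presumably must exist as environment variables before the service starts. The sketch below lists the placeholders in a configuration text that have no value in the current environment. `Get-MissingConfigVariable` is a hypothetical helper, and it assumes placeholders always use the `${NAME}` form shown in this guide.

```powershell
# Hypothetical helper: lists ${NAME} placeholders that have no environment value.
function Get-MissingConfigVariable {
    param([Parameter(Mandatory)][string]$ConfigText)
    # Collect the distinct placeholder names used in the configuration.
    $names = @(
        [regex]::Matches($ConfigText, '\$\{([A-Za-z_][A-Za-z0-9_]*)\}') |
            ForEach-Object { $_.Groups[1].Value } |
            Sort-Object -Unique
    )
    # Emit only the names that are empty or undefined for this process.
    $names | Where-Object { -not [Environment]::GetEnvironmentVariable($_) }
}

# Usage, in a new session so machine-level variables are visible:
#   Get-MissingConfigVariable -ConfigText (Get-Content -Raw 'E:\DataStream\config\config.yaml')
```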

Optimizing Costs

For Azure VM deployments, control costs with:

  1. Right-sizing: Select the appropriate VM size for your workload
  2. Reserved Instances: Purchase reservations for long-term deployments
  3. Auto-shutdown: For non-production environments, schedule VM shutdowns
  4. Disk optimization: Use the appropriate storage tier for your performance needs
  5. Spot instances: For fault-tolerant workloads, consider spot VMs at lower cost
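
The effect of an auto-shutdown schedule on compute cost is simple arithmetic: hourly rate times running hours. The sketch below makes that explicit. `Get-EstimatedMonthlyComputeCost` is a hypothetical helper and the rate in the example is a placeholder, not a real Azure price; the estimate also covers compute only, since managed disks are billed whether or not the VM is running.

```powershell
# Hypothetical helper: estimates monthly compute cost from running hours.
function Get-EstimatedMonthlyComputeCost {
    param(
        [Parameter(Mandatory)][decimal]$HourlyRate,
        [decimal]$HoursPerDay = 24,
        [int]$DaysPerMonth = 30
    )
    return $HourlyRate * $HoursPerDay * $DaysPerMonth
}

# Placeholder rate of 0.50 per hour; look up the real rate for your VM size and region.
$alwaysOn  = Get-EstimatedMonthlyComputeCost -HourlyRate 0.50
$scheduled = Get-EstimatedMonthlyComputeCost -HourlyRate 0.50 -HoursPerDay 10 -DaysPerMonth 22
"Always on: $alwaysOn, scheduled: $scheduled"
```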

For detailed cost estimates, use the Azure Pricing Calculator.