# Deployment: On Azure VM
This guide covers deploying DataStream on Azure Virtual Machines, offering a balance of performance, control, and cloud scalability. This deployment model is ideal for production environments that require customized configurations while leveraging Azure's infrastructure capabilities.
## Considerations
Azure VMs provide several advantages for DataStream deployments:
- Customizable resources: Scale CPU, memory, and storage based on workload
- Network performance: Premium network options for high-throughput scenarios
- OS flexibility: Support for various Linux and Windows operating systems
- Integration: Native connectivity with other Azure services
- Managed infrastructure: Azure handles the physical hardware maintenance
## VM Size Recommendations

Select an Azure VM size based on your expected workload:

| Workload | VM Size | Specifications | Notes |
|---|---|---|---|
| Small | Standard_D2s_v3 | 2 vCPU, 8 GB RAM | Development or light production |
| Medium | Standard_D4s_v3 | 4 vCPU, 16 GB RAM | Standard production workloads |
| Large | Standard_D8s_v3 | 8 vCPU, 32 GB RAM | High-volume data collection |
| High-Performance | Standard_E8s_v3 | 8 vCPU, 64 GB RAM | Memory-intensive processing |
For disk-intensive operations, consider Premium SSD storage options.
## Deployment Steps

### 1. Create the Azure VM
- Create the virtual machine:
  - Navigate to the Azure Portal and create a new Virtual Machine
  - Select your preferred OS (Ubuntu 20.04 LTS recommended for Linux, Windows Server 2019 for Windows)
  - Choose the appropriate VM size based on your workload
  - Configure networking to allow required ports (e.g., 514 for syslog)
  - Set an authentication method (SSH key recommended for Linux, password for Windows)

- Configure disk settings:
  - OS disk: Premium SSD recommended for production
  - Data disk: Add a separate managed disk for DataStream data
  - Host caching: Consider enabling for read-intensive workloads

- Configure networking:
  - Create or select a Virtual Network
  - Configure Network Security Group (NSG) rules:
    - Allow SSH/RDP for administration
    - Allow specific ports for your collectors (514/UDP, 1514/TCP, etc.)
  - Consider using a private endpoint for secure connectivity
### 2. Prepare the VM

- Connect to the VM using RDP (Windows) or SSH (Linux)

- Update the OS:

  **Windows:**

  ```powershell
  # Install Windows updates (requires the PSWindowsUpdate module)
  Install-WindowsUpdate -AcceptAll -AutoReboot
  ```

  **Linux:**

  ```bash
  # For Ubuntu
  sudo apt update && sudo apt upgrade -y
  ```
- Format and mount the data disk:

  **Windows:**

  Initialize and format the disk using Disk Management, or use the following commands to automate the process.

  ```powershell
  # Get the disk number of the raw disk
  $disk = Get-Disk | Where-Object PartitionStyle -eq 'RAW'

  # Initialize the disk
  Initialize-Disk -Number $disk.Number -PartitionStyle GPT

  # Create a new partition using the maximum size
  $partition = New-Partition -DiskNumber $disk.Number -UseMaximumSize -AssignDriveLetter

  # Format the volume
  Format-Volume -Partition $partition -FileSystem NTFS -NewFileSystemLabel "DataStream" -Confirm:$false
  ```

  **Linux:**

  ```bash
  # Identify the disk
  sudo lsblk

  # Format the disk
  sudo mkfs.ext4 /dev/sdc

  # Mount the disk
  sudo mkdir -p /datadisk
  sudo mount /dev/sdc /datadisk

  # Make the mount persistent
  echo '/dev/sdc /datadisk ext4 defaults,nofail 0 0' | sudo tee -a /etc/fstab
  ```
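The `/etc/fstab` entry appended above consists of six whitespace-separated fields. A quick sketch of how they break down (illustrative parsing only, not part of the deployment):

```python
# Split the fstab entry from the step above into its six standard fields.
entry = "/dev/sdc /datadisk ext4 defaults,nofail 0 0"
names = ["device", "mount_point", "fs_type", "options", "dump", "fsck_pass"]
parsed = dict(zip(names, entry.split()))

# "nofail" is what keeps the VM bootable if the data disk is detached.
print(parsed["options"])  # defaults,nofail
```

The `nofail` option matters on Azure: without it, a missing or renamed data disk can leave the VM stuck at boot.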
- Install dependencies:

  **Windows:**

  ```powershell
  # Install the .NET Runtime
  Invoke-WebRequest -Uri "https://dotnet.microsoft.com/download/dotnet/scripts/v1/dotnet-install.ps1" -OutFile "dotnet-install.ps1"
  ./dotnet-install.ps1 -Runtime dotnet -Version 6.0.0 -InstallDir "C:\Program Files\dotnet"

  # Add dotnet to PATH if not already done
  $env:Path += ";C:\Program Files\dotnet"
  [Environment]::SetEnvironmentVariable("Path", $env:Path, [System.EnvironmentVariableTarget]::Machine)
  ```

  **Linux:**

  ```bash
  # Install the .NET Runtime
  sudo apt install -y apt-transport-https
  wget https://packages.microsoft.com/config/ubuntu/20.04/packages-microsoft-prod.deb -O packages-microsoft-prod.deb
  sudo dpkg -i packages-microsoft-prod.deb
  sudo apt update
  sudo apt install -y dotnet-runtime-6.0
  ```
### 3. Install DataStream

- Download the DataStream installer:

  **Windows:**

  ```powershell
  Invoke-WebRequest -Uri "https://download.datastream.example.com/latest/datastream-installer.exe" -OutFile "datastream-installer.exe"
  ```

  **Linux:**

  ```bash
  wget https://download.datastream.example.com/latest/datastream-installer.sh
  chmod +x datastream-installer.sh
  ```

- Run the installer:

  **Windows:**

  Assuming `E:` is the data disk drive letter:

  ```powershell
  .\datastream-installer.exe --data-dir "E:\DataStream"
  ```

  **Linux:**

  ```bash
  sudo ./datastream-installer.sh --data-dir /datadisk/datastream
  ```
### 4. Configure DataStream

- Create the basic configuration:

  **Windows:**

  ```powershell
  # Create the configuration directory if it doesn't exist
  New-Item -ItemType Directory -Force -Path "E:\DataStream\config"

  # Create or edit the configuration file
  notepad E:\DataStream\config\config.yaml
  ```

  **Linux:**

  ```bash
  sudo mkdir -p /datadisk/datastream/Director/config
  sudo nano /datadisk/datastream/Director/config/config.yaml
  ```

- Add a sample device configuration:

  ```yaml
  devices:
    - id: 1
      name: azure_syslog
      type: syslog
      properties:
        port: 514
  ```

- Configure Azure-specific targets:

  ```yaml
  targets:
    - name: azure_storage
      type: azure_blob
      connection_string: "${AZURE_STORAGE_CONNECTION_STRING}"
      container: "datastream-logs"
      path_format: "{date}/{hour}/{source}"
  ```

- Set up environment variables (if needed):

  **Windows:**

  ```powershell
  # Set a system environment variable
  [Environment]::SetEnvironmentVariable("AZURE_STORAGE_CONNECTION_STRING", "DefaultEndpointsProtocol=https;AccountName=youraccount;AccountKey=yourkey;EndpointSuffix=core.windows.net", [System.EnvironmentVariableTarget]::Machine)
  ```

  **Linux:**

  ```bash
  echo 'AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=https;AccountName=youraccount;AccountKey=yourkey;EndpointSuffix=core.windows.net"' | sudo tee -a /etc/environment
  ```
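Two details in the target configuration are easy to get wrong: the `path_format` template and the storage connection string. The sketch below illustrates how such a template might expand (simple placeholder substitution is an assumption here; DataStream's actual substitution rules may differ) and how an Azure storage connection string decomposes into semicolon-delimited `key=value` pairs:

```python
# Illustrative only: assumes {date}/{hour}/{source} expand by simple substitution.
path_format = "{date}/{hour}/{source}"
blob_path = path_format.format(date="2024-06-01", hour="14", source="azure_syslog")
print(blob_path)  # 2024-06-01/14/azure_syslog

# Azure storage connection strings are semicolon-delimited key=value pairs.
conn = ("DefaultEndpointsProtocol=https;AccountName=youraccount;"
        "AccountKey=yourkey;EndpointSuffix=core.windows.net")
parts = dict(p.split("=", 1) for p in conn.split(";") if p)
print(parts["AccountName"])  # youraccount
```

Splitting on the first `=` only (`split("=", 1)`) matters because real account keys are base64 and may contain `=` padding.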
### 5. Start and Test DataStream

- Start the service:

  **Windows:**

  ```powershell
  # Set the service to start automatically
  Set-Service -Name DataStream -StartupType Automatic

  # Start the service
  Start-Service DataStream
  ```

  **Linux:**

  ```bash
  sudo systemctl enable datastream
  sudo systemctl start datastream
  ```

- Check the service status:

  **Windows:**

  ```powershell
  Get-Service DataStream
  ```

  **Linux:**

  ```bash
  sudo systemctl status datastream
  ```

- Test the deployment:

  **Windows:**

  ```powershell
  # Use PowerShell to send a UDP message to port 514
  $Message = [Text.Encoding]::ASCII.GetBytes("Test message for Azure VM deployment")
  $UdpClient = New-Object System.Net.Sockets.UdpClient
  $UdpClient.Connect("127.0.0.1", 514)
  $UdpClient.Send($Message, $Message.Length)

  # Check the logs
  Get-Content -Path "E:\DataStream\logs\service.log" -Tail 10
  ```

  **Linux:**

  ```bash
  # Send a test message
  logger -n localhost -P 514 "Test message for Azure VM deployment"

  # Check the logs
  sudo journalctl -u datastream -f
  ```
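If neither PowerShell nor `logger` is available, the same UDP smoke test can be sketched in Python. This version binds a throwaway local listener purely to prove the datagram round-trips; against a running collector you would send straight to port 514 instead:

```python
import socket

# Throwaway listener standing in for the collector. Real deployments listen on
# port 514, which needs elevated privileges; an ephemeral port avoids that here.
listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
listener.bind(("127.0.0.1", 0))
port = listener.getsockname()[1]

# Send the test message, mirroring the PowerShell/logger examples above.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"Test message for Azure VM deployment", ("127.0.0.1", port))

# Confirm the datagram arrived intact.
data, _ = listener.recvfrom(1024)
print(data.decode())  # Test message for Azure VM deployment
sender.close()
listener.close()
```

To target a real collector, replace `port` with `514` and drop the listener (remember that the NSG rule from step 1 must allow 514/UDP for remote senders).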
## Azure VM Scaling Options

For scaling DataStream on Azure VMs, consider:

- Vertical scaling: Resize the VM to a larger size for increased capacity. This can be done with minimal downtime but requires a service restart.
- Horizontal scaling: Deploy multiple DataStream instances behind a load balancer. This requires additional configuration for coordination; consider using Azure Load Balancer to distribute incoming traffic.
- Azure VM Scale Sets: Use for automated scaling based on metrics. Configure appropriate auto-scaling rules based on CPU or memory usage.
## Monitoring and Management

Enhance your Azure VM deployment with:

- Azure Monitor: Enable VM insights for resource utilization metrics
- Azure Log Analytics: Forward DataStream logs for centralized analysis:

  ```yaml
  targets:
    - name: log_analytics
      type: azure_monitor
      workspace_id: "${WORKSPACE_ID}"
      workspace_key: "${WORKSPACE_KEY}"
  ```

- Azure Security Center: Enable for security recommendations and monitoring
- Backup: Configure Azure Backup to protect configuration and data
## Optimizing Costs
For Azure VM deployments, control costs with:
- Right-sizing: Select the appropriate VM size for your workload
- Reserved Instances: Purchase reservations for long-term deployments
- Auto-shutdown: For non-production environments, schedule VM shutdowns
- Disk optimization: Use the appropriate storage tier for your performance needs
- Spot instances: For fault-tolerant workloads, consider spot VMs at lower cost
For detailed cost estimates, use the Azure Pricing Calculator.