# EVTX Parsing Strategy for SecureWatch SIEM ## Overview This document outlines the comprehensive strategy for parsing Windows Event Log (EVTX) files in SecureWatch SIEM, enabling ingestion of historical Windows security events for threat hunting and forensic analysis. **Updated with Enhanced EVTX Parser v2.0** featuring comprehensive MITRE ATT&CK detection and Sysmon support. ## Enhanced EVTX Parser v2.0 ### Key Enhancements - **MITRE ATT&CK Integration**: Automatic technique detection with 50+ supported techniques - **Sysmon Support**: Full coverage of Events 1-29 with enhanced field extraction - **Attack Pattern Recognition**: 50+ regex patterns for malicious behavior detection - **Risk Scoring Algorithm**: Intelligent threat prioritization (0-100 scale) - **Web Upload Interface**: Real-time file parsing via frontend component - **EVTX-ATTACK-SAMPLES Testing**: Validated against 329 attack samples ### Implementation Status ✅ **Completed**: Enhanced parser with MITRE ATT&CK detection ✅ **Completed**: Comprehensive testing against EVTX-ATTACK-SAMPLES ✅ **Completed**: Web-based upload component ✅ **Completed**: Integration with SecureWatch platform ✅ **Completed**: Risk scoring and confidence assessment ## Architecture Design ### 1. Parser Service Architecture ``` ┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐ │ File Upload │────▶│ EVTX Parser │────▶│ Log Normalizer │ │ API Endpoint │ │ Service │ │ │ └─────────────────┘ └──────────────────┘ └─────────────────┘ │ │ │ │ │ ▼ │ │ ┌─────────────────┐ │ │ │ TimescaleDB │ │ │ └─────────────────┘ │ │ ▼ ▼ ┌─────────────────┐ ┌──────────────────┐ │ File Storage │ │ Processing │ │ (Temporary) │ │ Queue │ └─────────────────┘ └──────────────────┘ ``` ### 2. Technology Stack - **Python EVTX Library**: `python-evtx` for parsing binary EVTX files - **Node.js Integration**: Child process or microservice for Python execution - **File Processing**: Multer for file uploads, Bull for job queuing - **Storage**: Temporary file storage with automatic cleanup ## Implementation Strategy ### Phase 1: EVTX Parser Module Create a Python-based EVTX parser that: 1. Reads binary EVTX files using python-evtx 2. Extracts all event records with full metadata 3. Converts Windows XML schema to JSON format 4. Preserves all fields for comprehensive analysis ### Phase 2: API Integration Extend log-ingestion service with: 1. File upload endpoint: `POST /api/ingest/evtx` 2. Support for multiple file formats: - .evtx (binary Windows Event Log) - .xml (exported Windows Event XML) - .json (pre-processed event data) 3. Async processing with job queue 4. Progress tracking and status API ### Phase 3: Enhanced Field Mapping Map Windows Event fields to SecureWatch schema: 1. Security Events (4624/4625) - Authentication 2. Process Events (4688) - Process creation 3. Service Events (7045) - Service installation 4. Task Events (4698/106) - Scheduled tasks 5. PowerShell Events (4103/4104) - Script execution 6. Audit Events (1102) - Log clearing ### Phase 4: Batch Processing Implement efficient batch processing: 1. Chunk large EVTX files (>100MB) 2. Parallel processing of event records 3. Memory-efficient streaming 4. Progress reporting via WebSocket ## Detailed Component Design ### 1. Python EVTX Parser (`evtx_parser.py`) ```python import Evtx.Evtx as evtx import json from datetime import datetime import xml.etree.ElementTree as ET class EVTXParser: def __init__(self, file_path): self.file_path = file_path def parse(self): """Parse EVTX file and yield JSON events""" with evtx.Evtx(self.file_path) as log: for record in log.records(): yield self.record_to_json(record) def record_to_json(self, record): """Convert EVTX record to JSON format""" # Extract XML and parse xml_str = record.xml() root = ET.fromstring(xml_str) # Build comprehensive event object event = { "EventID": self.extract_event_id(root), "TimeCreated": self.extract_timestamp(root), "Computer": self.extract_computer(root), "Channel": self.extract_channel(root), "Provider": self.extract_provider(root), "Level": self.extract_level(root), "Task": self.extract_task(root), "Keywords": self.extract_keywords(root), "EventData": self.extract_event_data(root), "UserData": self.extract_user_data(root), "System": self.extract_system_data(root), "RawXML": xml_str } return event ``` ### 2. Node.js Integration Service ```typescript // evtx-parser.service.ts import { spawn } from 'child_process'; import { EventEmitter } from 'events'; export class EVTXParserService extends EventEmitter { async parseEVTXFile(filePath: string): Promise { const python = spawn('python3', ['evtx_parser.py', filePath]); python.stdout.on('data', (data) => { const events = data.toString().split('\n') .filter(line => line.trim()) .map(line => JSON.parse(line)); for (const event of events) { this.emit('event', event); } }); python.stderr.on('data', (data) => { this.emit('error', new Error(data.toString())); }); python.on('close', (code) => { if (code === 0) { this.emit('complete'); } else { this.emit('error', new Error(`Parser exited with code ${code}`)); } }); } } ``` ### 3. API Endpoint Implementation ```typescript // evtx-upload.route.ts import multer from 'multer'; import { Router } from 'express'; import { EVTXParserService } from './evtx-parser.service'; import { LogNormalizer } from '../processors/log-normalizer'; const router = Router(); const upload = multer({ dest: '/tmp/evtx-uploads/', limits: { fileSize: 1024 * 1024 * 500 } // 500MB limit }); router.post('/api/ingest/evtx', upload.single('file'), async (req, res) => { const { file } = req; const parser = new EVTXParserService(); const normalizer = new LogNormalizer(); let processedCount = 0; const batchSize = 100; let batch = []; parser.on('event', async (event) => { const normalized = normalizer.normalizeWindowsEvent(event); batch.push(normalized); if (batch.length >= batchSize) { await ingestBatch(batch); processedCount += batch.length; batch = []; // Send progress update res.write(JSON.stringify({ status: 'processing', processed: processedCount }) + '\n'); } }); parser.on('complete', async () => { if (batch.length > 0) { await ingestBatch(batch); processedCount += batch.length; } res.end(JSON.stringify({ status: 'complete', total: processedCount })); // Cleanup uploaded file fs.unlink(file.path, () => {}); }); parser.on('error', (error) => { res.status(500).json({ error: error.message }); }); await parser.parseEVTXFile(file.path); }); ``` ## Field Mapping Strategy ### Windows Event ID Mappings #### 1. Authentication Events (4624/4625) ```typescript { // Windows Fields → SecureWatch Fields "LogonType": "auth.logon_type", "TargetUserName": "user.name", "TargetDomainName": "user.domain", "TargetUserSid": "user.id", "IpAddress": "source.ip", "IpPort": "source.port", "WorkstationName": "source.hostname", "LogonProcessName": "process.name", "AuthenticationPackageName": "auth.package", "Status": "event.outcome", "SubStatus": "auth.sub_status" } ``` #### 2. Process Creation (4688) ```typescript { "NewProcessName": "process.executable", "CommandLine": "process.command_line", "ProcessId": "process.pid", "ParentProcessName": "process.parent.executable", "SubjectUserName": "user.name", "SubjectDomainName": "user.domain", "TokenElevationType": "process.elevation_type" } ``` #### 3. PowerShell Events (4103/4104) ```typescript { "ScriptBlockText": "powershell.script_block", "Path": "file.path", "ScriptBlockId": "powershell.script_id", "MessageNumber": "powershell.message_number", "MessageTotal": "powershell.message_total" } ``` ## Performance Considerations ### 1. Memory Management - Stream processing for large files - Chunk size: 1000 events per batch - Memory limit: 512MB per parser instance - Automatic garbage collection ### 2. Processing Speed - Target: 10,000 events/second - Parallel processing: 4 worker threads - Database batch inserts: 1000 records - Index optimization for common queries ### 3. Storage Optimization - Compress raw XML data - Index frequently queried fields - Partition by event timestamp - Retention policies by event type ## Security Considerations ### 1. File Validation - Verify EVTX file signatures - Scan for malicious content - Enforce file size limits - Validate XML structure ### 2. Access Control - Role-based upload permissions - Audit trail for uploads - Data classification tags - Encryption at rest ### 3. Data Privacy - PII detection and masking - Compliance field mapping - Retention policy enforcement - Export restrictions ## Testing Strategy ### 1. Unit Tests - Parser accuracy validation - Field mapping verification - Error handling scenarios - Performance benchmarks ### 2. Integration Tests - End-to-end upload flow - Database persistence - API response validation - Progress tracking ### 3. Load Tests - Large file processing (>1GB) - Concurrent uploads - Memory usage monitoring - Database performance ## Deployment Plan ### Phase 1: Development (Week 1-2) 1. Implement Python EVTX parser 2. Create Node.js integration service 3. Add file upload API endpoint 4. Basic field mapping ### Phase 2: Enhancement (Week 3-4) 1. Advanced field mappings 2. Batch processing optimization 3. Progress tracking API 4. Error handling ### Phase 3: Testing (Week 5) 1. Unit test coverage 2. Integration testing 3. Performance optimization 4. Security review ### Phase 4: Deployment (Week 6) 1. Docker containerization 2. Production deployment 3. Monitoring setup 4. Documentation ## Success Metrics 1. **Performance** - Parse 1GB EVTX in <2 minutes - Support 10+ concurrent uploads - 99.9% parsing accuracy 2. **Coverage** - Support all Windows versions (7/8/10/11/Server) - Parse 100+ event ID types - Extract 50+ unique fields 3. **Reliability** - Zero data loss - Automatic retry on failure - Comprehensive error logging ## Future Enhancements 1. **Format Support** - EVT (legacy format) - PCAP with Windows events - Sysmon enhanced events 2. **Processing Features** - Real-time streaming - Deduplication - Correlation rules 3. **Integration** - Direct Windows collector - WMI/WinRM support - Active Directory enrichment