Overview
Onboarding a new log source into your SIEM is not as simple as pointing a syslog feed at the collector and calling it done. Without proper parsing, normalization, and validation, those logs are noise that fills your storage without improving detection. This checklist walks through the complete process of bringing a new log source online, from requirements gathering through production deployment and first detection rule creation.
Onboarding Steps
- Identify the log source and its security value to your detection program
- Determine the log format (syslog, JSON, CEF, CSV, custom format)
- Define the transport method (syslog, API, agent, file upload)
- Configure log forwarding from the source to the SIEM collector
- Develop and test parsing rules to extract structured fields
- Map parsed fields to your SIEM's normalized schema (CIM, ECS, OCSF)
- Validate event completeness by comparing source counts to SIEM ingestion counts
- Create initial dashboards and search queries for the new data
- Develop at least one detection use case leveraging the new log source
- Document the onboarding in your log source inventory
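To make the "at least one detection use case" step concrete, here is a minimal Python sketch of a common first rule for a new authentication source: flag a source IP that produces several failed logins within a short window. It assumes events have already been parsed and normalized into the hypothetical `AuthEvent` shape, with `outcome` reduced to "success" or "failure". The threshold of 5 failures in 5 minutes is an illustrative default, not a recommendation, and in production you would express the same logic in your SIEM's own rule or query language.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


@dataclass
class AuthEvent:
    timestamp: datetime
    user: str
    source_ip: str
    outcome: str  # assumed already normalized to "success" or "failure"


def detect_bruteforce(
    events: Iterable[AuthEvent],
    threshold: int = 5,
    window: timedelta = timedelta(minutes=5),
) -> List[Tuple[str, datetime, int]]:
    """Flag source IPs with `threshold` or more failed logins inside a sliding `window`."""
    recent = defaultdict(deque)  # source_ip -> timestamps of failures still inside the window
    alerts = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.outcome != "failure":
            continue
        times = recent[ev.source_ip]
        times.append(ev.timestamp)
        # Drop failures that have aged out of the window (the newest one always stays).
        while ev.timestamp - times[0] > window:
            times.popleft()
        if len(times) >= threshold:
            # (source IP, first failure in the burst, number of failures)
            alerts.append((ev.source_ip, times[0], len(times)))
            times.clear()  # reset so one burst raises one alert
    return alerts
```

Clearing the window after an alert means one burst produces one alert rather than one per additional failure; whether that suppression suits your triage process is a design choice.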
Log Source Priority Matrix
| Priority | Source type | Examples | Security value |
|---|---|---|---|
| Critical | Authentication and identity | Active Directory, SSO, MFA platforms | Detect credential attacks, unauthorized access, privilege abuse |
| Critical | Endpoint telemetry | EDR, antivirus, OS event logs | Detect malware, lateral movement, persistence |
| High | Network security | Firewall, IDS/IPS, web proxy, DNS | Detect network attacks, C2, exfiltration |
| High | Cloud infrastructure | AWS CloudTrail, Azure Activity Log, GCP Audit | Detect cloud misconfigurations, unauthorized API activity |
| Medium | Email security | Email gateway, O365/Google Workspace | Detect phishing, BEC, mail rule abuse |
| Medium | Application logs | Web applications, databases, SaaS platforms | Detect application-layer attacks, data access anomalies |
| Low | Physical security | Badge access, CCTV integrations | Correlate physical and logical access events |
Parsing and Normalization
Raw logs need to be broken into structured fields before they become useful for search and detection. Identify the key fields in each log type: timestamp, source IP, destination IP, user, action, result, and any event-specific data. Map these fields to your SIEM schema so that a "user" field from Active Directory and a "user" field from your web proxy can be correlated in the same query. Test parsing with a representative sample of events, including edge cases such as multi-line logs, escaped characters, and unusual field values. Confirm that timestamps are parsed correctly and converted to a single reference time zone, typically UTC.
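As a sketch of the parsing step, the code below handles a hypothetical line layout of an ISO-8601 timestamp, a host name, and `key=value` pairs (values may be double-quoted with backslash escapes). The layout, `parse_line`, and the `default_tz` parameter are assumptions for illustration. It returns `None` instead of raising on malformed input so the caller can count parse failures, and it converts every timestamp to UTC. Note that `datetime.fromisoformat` accepts offsets such as `+02:00` from Python 3.7 onward, but a trailing `Z` only from Python 3.11.

```python
import re
from datetime import datetime, timezone, tzinfo
from typing import Dict, Optional

# Hypothetical raw layout: <ISO-8601 timestamp> <host> key=value key="quoted value" ...
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<host>\S+)(?:\s+(?P<kv>.*))?$")
# A value is either a double-quoted string (backslash escapes allowed) or a bare token.
KV_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(line: str, default_tz: tzinfo = timezone.utc) -> Optional[Dict[str, object]]:
    """Split one raw event into named fields, or return None so callers can count failures."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    try:
        ts = datetime.fromisoformat(m.group("ts"))
    except ValueError:
        return None
    if ts.tzinfo is None:
        # No offset in the log: assume the zone this source is known to log in.
        ts = ts.replace(tzinfo=default_tz)
    fields: Dict[str, object] = {
        "timestamp": ts.astimezone(timezone.utc),  # one reference zone for every source
        "host": m.group("host"),
    }
    for key, raw in KV_RE.findall(m.group("kv") or ""):
        if len(raw) >= 2 and raw[0] == raw[-1] == '"':
            # Drop the surrounding quotes and undo escaping (\" -> ", \\ -> \).
            raw = re.sub(r"\\(.)", r"\1", raw[1:-1])
        fields[key] = raw
    return fields
```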
Validation and Quality Assurance
After onboarding, validate that logs are arriving consistently. Compare the event count at the source with what your SIEM is ingesting to catch gaps. Check for parsing failures where events arrive but fields are not extracted correctly. Verify that time synchronization is accurate across sources. Monitor ingestion latency to ensure real-time alerts are not delayed by slow log delivery. Set up health monitoring alerts that fire when a log source stops sending data, ingestion drops below a threshold, or parsing error rates spike.
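A minimal sketch of the health checks described above, assuming you can already pull per-window counts from the source and from the SIEM. `SourceWindowStats`, `health_alerts`, and every threshold default are hypothetical; tune them per source, since a chatty firewall and a quiet badge system need very different silence and volume limits.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class SourceWindowStats:
    """Counts for one log source over one comparison window, e.g. the last hour."""
    name: str
    source_count: int           # events the source reports having sent
    siem_count: int             # events the SIEM ingested for the same window
    baseline_count: int         # typical SIEM count for this window (e.g. trailing average)
    parse_failures: int         # ingested events missing required fields
    last_ingested_at: datetime  # ingestion time of the newest event (timezone-aware)
    p95_latency: timedelta      # event timestamp to ingestion time, 95th percentile


def health_alerts(
    s: SourceWindowStats,
    now: Optional[datetime] = None,
    max_missing: float = 0.01,
    max_parse_error: float = 0.02,
    max_silence: timedelta = timedelta(minutes=30),
    min_volume_ratio: float = 0.5,
    max_latency: timedelta = timedelta(minutes=5),
) -> List[str]:
    """Return human-readable alerts; an empty list means the source looks healthy."""
    if now is None:
        now = datetime.now(timezone.utc)
    alerts: List[str] = []
    # Completeness: source-side count vs. what the SIEM actually stored.
    if s.source_count:
        missing = (s.source_count - s.siem_count) / s.source_count
        if missing > max_missing:
            alerts.append(f"{s.name}: {missing:.1%} of events missing in SIEM")
    # Parsing quality: events arrived but fields were not extracted.
    if s.siem_count and s.parse_failures / s.siem_count > max_parse_error:
        alerts.append(f"{s.name}: parse error rate {s.parse_failures / s.siem_count:.1%}")
    # Silence: the source has stopped sending entirely.
    silence = now - s.last_ingested_at
    if silence > max_silence:
        alerts.append(f"{s.name}: no events for {silence}")
    # Volume drop: still sending, but far below its usual rate.
    if s.baseline_count and s.siem_count / s.baseline_count < min_volume_ratio:
        alerts.append(f"{s.name}: volume at {s.siem_count / s.baseline_count:.0%} of baseline")
    # Latency: slow delivery delays real-time detections.
    if s.p95_latency > max_latency:
        alerts.append(f"{s.name}: p95 ingestion latency {s.p95_latency}")
    return alerts
```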
Documentation and Maintenance
- Maintain a log source inventory documenting every onboarded source: name, format, transport, ingestion rate, and owner
- Include the contact person at the source system team for troubleshooting
- Document any custom parsing rules and their purpose
- Schedule regular reviews to identify stale log sources that are no longer sending data
- Update parsing rules when source systems are upgraded and log formats change
- Track storage costs per log source to manage SIEM licensing and capacity
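The inventory can live in a spreadsheet or CMDB; as a sketch, the same record and two of the maintenance checks (stale sources and per-source volume share) look like this in Python. The `LogSource` fields mirror the bullets above, while `stale_sources`, `volume_share`, and the one-day staleness limit are assumptions. Volume share is only a rough proxy for cost, since actual pricing depends on your SIEM's licensing model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass
class LogSource:
    name: str
    log_format: str      # e.g. "syslog", "JSON", "CEF"
    transport: str       # e.g. "syslog", "API", "agent"
    daily_gb: float      # average daily ingestion volume
    owner: str           # team accountable for the source system
    contact: str         # person to reach when the feed breaks
    last_seen: datetime  # newest event the SIEM received (timezone-aware)
    custom_parser: Optional[str] = None  # name and purpose of any custom parsing rule


def stale_sources(
    inventory: List[LogSource],
    now: Optional[datetime] = None,
    max_age: timedelta = timedelta(days=1),
) -> List[str]:
    """Names of sources that have not delivered data within max_age."""
    if now is None:
        now = datetime.now(timezone.utc)
    return [s.name for s in inventory if now - s.last_seen > max_age]


def volume_share(inventory: List[LogSource]) -> Dict[str, float]:
    """Each source's fraction of total daily volume, a rough proxy for its storage cost."""
    total = sum(s.daily_gb for s in inventory)
    return {s.name: (s.daily_gb / total if total else 0.0) for s in inventory}
```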
Frequently Asked Questions
What log sources should we onboard first?
Start with high-fidelity sources that support your most critical detection use cases: authentication logs, endpoint telemetry, and network security devices. These sources typically cover the widest range of attack techniques relative to the effort required to onboard them.
How do we handle logs from applications with custom formats?
Write custom parsers using your SIEM's regular expression or parsing capabilities. Start with a small sample of representative events, build the parser, and validate against a larger dataset before production deployment.
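A sketch of the "small sample first, then larger dataset" loop for a hypothetical pipe-delimited application format. `CUSTOM_RE`, `parse_custom`, and `evaluate_parser` are illustrative names; the point is to measure what share of real lines the parser fully handles and to keep the failures for inspection, rather than eyeballing a few matches.

```python
import re
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical custom format, e.g. "2024-03-05 14:22:10 | EXPORT | user=alice | rows=120"
# (the rows field is optional).
CUSTOM_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
    r" \| (?P<action>[A-Z_]+)"
    r" \| user=(?P<user>[^|\s]+)"
    r"(?: \| rows=(?P<rows>\d+))?$"
)


def parse_custom(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Return named fields for a matching line, or None if the line does not fit the format."""
    m = CUSTOM_RE.match(line.rstrip("\r\n"))
    return m.groupdict() if m else None


def evaluate_parser(
    parse: Callable[[str], Optional[dict]],
    lines: Sequence[str],
    required: Tuple[str, ...] = ("timestamp", "action", "user"),
) -> Tuple[float, List[str]]:
    """Return (share of lines fully parsed, lines that failed) for a sample."""
    failures: List[str] = []
    for line in lines:
        rec = parse(line)
        # A line counts as parsed only if every required field came out non-empty.
        if not rec or not all(rec.get(f) for f in required):
            failures.append(line)
    rate = 1 - len(failures) / len(lines) if lines else 0.0
    return rate, failures
```

Run `evaluate_parser` on the small sample while developing, then again on a larger export; the acceptable failure rate before production is a judgment call for your team. Continuation lines from multi-line events, such as stack traces, tend to show up quickly in the failure list.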
What is log normalization and why does it matter?
Normalization maps different log formats into a common schema so you can correlate events across sources using the same field names. Without it, you cannot write detection rules that connect activity across your Active Directory, VPN, and cloud platforms.
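A minimal sketch of normalization between two sources, assuming hypothetical raw field names on each side and mapping them onto ECS-style names (`user.name`, `source.ip`, `event.dataset`). Renaming fields is only half the job: values also need a common form, so the sketch reduces `DOMAIN\user` and `user@domain` identities to a bare lowercase username before correlating.

```python
from typing import Dict, Iterable, Set

# Hypothetical raw field names for two sources, mapped onto ECS-style field names.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "active_directory": {"TargetUserName": "user.name", "IpAddress": "source.ip"},
    "vpn": {"username": "user.name", "remote_ip": "source.ip"},
}


def normalize_user(value: str) -> str:
    """Reduce 'CORP\\Alice' or 'alice@corp.example' to 'alice' so identities join across sources."""
    value = value.rsplit("\\", 1)[-1]  # strip a DOMAIN\ prefix
    value = value.split("@", 1)[0]     # strip a @domain suffix
    return value.lower()


def normalize(source_type: str, raw: Dict[str, str]) -> Dict[str, str]:
    """Rename source-specific fields to shared names and normalize the user value."""
    out = {"event.dataset": source_type}
    for raw_name, common_name in FIELD_MAPS[source_type].items():
        if raw_name in raw:
            out[common_name] = raw[raw_name]
    if "user.name" in out:
        out["user.name"] = normalize_user(out["user.name"])
    return out


def users_in_both(a: Iterable[Dict[str, str]], b: Iterable[Dict[str, str]]) -> Set[str]:
    """Users seen in both normalized streams: a simple cross-source correlation."""
    users_a = {e["user.name"] for e in a if "user.name" in e}
    users_b = {e["user.name"] for e in b if "user.name" in e}
    return users_a & users_b
```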
How much log data should we retain?
Retention depends on regulatory requirements and operational needs. Hot storage for real-time search typically covers 30 to 90 days. Warm storage for investigation covers 6 to 12 months. Cold archival for compliance may extend to 3 to 7 years depending on your industry.
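As a worked example, the sketch below assumes a policy at the upper end of each range above (90 days hot, 12 months warm, 7 years in total) and data that moves from tier to tier as it ages. `retention_tier` places an event by age, and `tier_storage_gb` estimates the steady-state raw volume each tier holds for a given daily ingest, ignoring compression, replication, and growth. Both functions and their defaults are hypothetical.

```python
from datetime import timedelta
from typing import Dict

# Assumed policy: upper end of each range (leap days ignored).
HOT_DAYS, WARM_DAYS, COLD_DAYS = 90, 365, 7 * 365


def retention_tier(age: timedelta, hot_days: int = HOT_DAYS,
                   warm_days: int = WARM_DAYS, cold_days: int = COLD_DAYS) -> str:
    """Which tier an event of the given age belongs in, or 'expired' past the cold limit."""
    if age <= timedelta(days=hot_days):
        return "hot"
    if age <= timedelta(days=warm_days):
        return "warm"
    if age <= timedelta(days=cold_days):
        return "cold"
    return "expired"


def tier_storage_gb(daily_gb: float, hot_days: int = HOT_DAYS,
                    warm_days: int = WARM_DAYS, cold_days: int = COLD_DAYS) -> Dict[str, float]:
    """Raw volume each tier holds at steady state; each tier keeps only its own age band."""
    return {
        "hot": daily_gb * hot_days,
        "warm": daily_gb * (warm_days - hot_days),
        "cold": daily_gb * (cold_days - warm_days),
    }
```

For 10 GB/day, this policy implies about 900 GB hot, 2,750 GB warm, and 21,900 GB cold before compression.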
What if onboarding a log source exceeds our SIEM license?
Evaluate the security value of each source against its volume. High-volume, low-value sources like debug-level application logs can be filtered or summarized before ingestion. Consider tiered storage architectures or data lake integrations for cost-effective retention.
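A sketch of pre-ingestion filtering with summarization, assuming events arrive as dicts with hypothetical `app` and `level` keys. Debug and trace events are replaced by a count per application and level, so you keep evidence that the activity occurred without paying to index every line. Which levels count as low value is an assumption to revisit for each source.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Assumption: these levels carry little detection value for this source.
LOW_VALUE_LEVELS = {"DEBUG", "TRACE"}


def filter_and_summarize(
    events: Iterable[Dict[str, str]],
) -> Tuple[List[Dict[str, str]], List[Dict[str, object]]]:
    """Forward security-relevant events unchanged; replace low-value ones with per-app counts."""
    kept: List[Dict[str, str]] = []
    dropped: Counter = Counter()
    for ev in events:
        level = ev.get("level", "").upper()
        if level in LOW_VALUE_LEVELS:
            dropped[(ev.get("app", "unknown"), level)] += 1
        else:
            kept.append(ev)
    # One summary record per (app, level) instead of every individual line.
    summaries = [{"app": app, "level": level, "count": n} for (app, level), n in dropped.items()]
    return kept, summaries
```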
