What log sources should we onboard first?

Start with high-fidelity sources that support your most critical detection use cases: authentication logs, endpoint telemetry, and network security devices. These sources typically cover the broadest range of attack techniques for the least onboarding effort.

How do we handle logs from applications with custom formats?

Write custom parsers using your SIEM's regular expression or parsing capabilities. Start with a small sample of representative events, build the parser, and validate it against a larger dataset before production deployment.
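As a rough illustration of the sample-then-validate approach, the sketch below parses a made-up application log format with a regular expression and reports how many lines fail to parse. The log layout, the field names, and the helper names `parse_line` and `parse_failure_rate` are assumptions for this example, not part of any particular SIEM.

```python
import re

# Hypothetical custom log format, assumed for illustration only:
#   2024-05-01T12:30:45Z level=WARN user=alice action=login result=failure src=10.0.0.5
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+"
    r"level=(?P<level>\w+)\s+"
    r"user=(?P<user>\S+)\s+"
    r"action=(?P<action>\S+)\s+"
    r"result=(?P<result>\S+)\s+"
    r"src=(?P<src_ip>\S+)$"
)


def parse_line(line):
    """Return a dict of extracted fields, or None if the line does not match."""
    match = LINE_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def parse_failure_rate(lines):
    """Fraction of lines the parser could not handle, for validation runs."""
    lines = list(lines)
    if not lines:
        return 0.0
    failures = sum(1 for line in lines if parse_line(line) is None)
    return failures / len(lines)


sample = [
    "2024-05-01T12:30:45Z level=WARN user=alice action=login result=failure src=10.0.0.5",
    "this line does not follow the format",
]
print(parse_line(sample[0]))
print(parse_failure_rate(sample))
```

Running the same failure-rate check against a larger dataset before deployment gives a simple number to watch: a rate well above zero suggests the sample missed some event variants.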

What is log normalization and why does it matter?

Normalization maps different log formats into a common schema so you can correlate events across sources using the same field names. Without it, any detection rule that connects activity across your Active Directory, VPN, and cloud platforms has to account for each source's own field names, which makes such rules hard to write and harder to maintain.
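A minimal sketch of the idea, assuming two sources that name the same concepts differently. The mapping tables and the target field names (`user`, `src_ip`, `result`) are illustrative assumptions; a real deployment would map to the schema your SIEM uses.

```python
# Hypothetical per-source mappings from native field names to common names.
FIELD_MAPS = {
    "active_directory": {
        "TargetUserName": "user",
        "IpAddress": "src_ip",
        "EventID": "event_code",
    },
    "vpn": {
        "username": "user",
        "client_ip": "src_ip",
        "status": "result",
    },
}


def normalize(source, event):
    """Rename a source's native fields to the common schema.

    Fields without a mapping are kept under their original name.
    """
    mapping = FIELD_MAPS[source]
    normalized = {"source": source}
    for key, value in event.items():
        normalized[mapping.get(key, key)] = value
    return normalized


ad_event = normalize(
    "active_directory",
    {"TargetUserName": "alice", "IpAddress": "10.0.0.5", "EventID": 4625},
)
vpn_event = normalize(
    "vpn",
    {"username": "alice", "client_ip": "203.0.113.7", "status": "success"},
)

# After normalization, one comparison works for both sources.
print(ad_event["user"] == vpn_event["user"])
```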

How much log data should we retain?

Retention depends on regulatory requirements and operational needs. Hot storage for real-time search typically covers 30 to 90 days. Warm storage for investigation covers 6 to 12 months. Cold archival for compliance may extend to 3 to 7 years depending on your industry.
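The tiers can be expressed as a simple age-based rule. In the sketch below the thresholds (90 days hot, 365 days warm, 7 years cold) are assumptions taken from the upper end of the ranges above, and the function name `storage_tier` is hypothetical; your regulatory requirements decide the real values.

```python
from datetime import date


def storage_tier(event_date, today, hot_days=90, warm_days=365, cold_days=7 * 365):
    """Classify an event into a storage tier by its age in days.

    The default thresholds are illustrative assumptions, not recommendations.
    """
    age = (today - event_date).days
    if age <= hot_days:
        return "hot"
    if age <= warm_days:
        return "warm"
    if age <= cold_days:
        return "cold"
    return "expired"


today = date(2024, 2, 1)
for event_date in (date(2024, 1, 1), date(2023, 6, 1), date(2020, 1, 1), date(2010, 1, 1)):
    print(event_date, storage_tier(event_date, today))
```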

What if onboarding a log source exceeds our SIEM license?

Evaluate the security value of each source against its volume. High-volume, low-value sources like debug-level application logs can be filtered or summarized before ingestion. Consider tiered storage architectures or data lake integrations for cost-effective retention.
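One way to picture filtering and summarizing before ingestion is the sketch below. It drops debug-level events and replaces them with a single count per application, so the SIEM still sees that the events existed. The event shape (`app`, `level`, `msg`), the `DROP_LEVELS` set, and the function name are assumptions for illustration.

```python
from collections import Counter

# Levels assumed to be low value for detection; adjust to your own sources.
DROP_LEVELS = {"DEBUG", "TRACE"}


def filter_and_summarize(events):
    """Split events into those forwarded as-is and per-app summaries of the rest."""
    kept = []
    dropped = Counter()
    for event in events:
        level = str(event.get("level", "")).upper()
        if level in DROP_LEVELS:
            dropped[(event.get("app", "unknown"), level)] += 1
        else:
            kept.append(event)
    summaries = [
        {"type": "summary", "app": app, "level": level, "dropped_count": count}
        for (app, level), count in sorted(dropped.items())
    ]
    return kept, summaries


events = [
    {"app": "billing", "level": "debug", "msg": "cache hit"},
    {"app": "billing", "level": "debug", "msg": "cache miss"},
    {"app": "billing", "level": "ERROR", "msg": "payment declined"},
]
kept, summaries = filter_and_summarize(events)
print(len(kept), summaries)
```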

Log Source Onboarding Checklist | Hunto AI

Overview

Onboarding a new log source into your SIEM is not as simple as pointing a syslog feed at the collector and calling it done. Without proper parsing, normalization, and validation, those logs are noise that fills your storage without improving detection. This checklist walks through the complete process of bringing a new log source online, from requirements gathering through production deployment and first detection rule creation.

Onboarding Steps

Identify the log source and its security value to your detection program
Determine the log format (syslog, JSON, CEF, CSV, custom format)
Define the transport method (syslog, API, agent, file upload)
Configure log forwarding from the source to the SIEM collector
Develop and test parsing rules to extract structured fields
Map parsed fields to your SIEM's normalized schema (for example CIM, ECS, or OCSF)
Validate event completeness by comparing source counts to SIEM ingestion counts
Create initial dashboards and search queries for the new data
Develop at least one detection use case leveraging the new log source
Document the onboarding in your log source inventory

Log Source Priority Matrix

Priority	Source type	Examples	Security value
Critical	Authentication and identity	Active Directory, SSO, MFA platforms	Detect credential attacks, unauthorized access, privilege abuse
Critical	Endpoint telemetry	EDR, antivirus, OS event logs	Detect malware, lateral movement, persistence
High	Network security	Firewall, IDS/IPS, web proxy, DNS	Detect network attacks, C2, exfiltration
High	Cloud infrastructure	AWS CloudTrail, Azure Activity Log, GCP Audit	Detect cloud misconfigurations, unauthorized API activity
Medium	Email security	Email gateway, O365/Google Workspace	Detect phishing, BEC, mail rule abuse
Medium	Application logs	Web applications, databases, SaaS platforms	Detect application-layer attacks, data access anomalies
Low	Physical security	Badge access, CCTV integrations	Correlate physical and logical access events

Parsing and Normalization

Raw logs need to be broken into structured fields before they become useful for search and detection. Identify the key fields in each log type: timestamp, source IP, destination IP, user, action, result, and any event-specific data. Map these fields to your SIEM schema so that a "user" field from Active Directory and a "user" field from your web proxy can be correlated in the same query. Test parsing with a representative sample of events including edge cases like multi-line logs, escaped characters, and unusual field values. Validate that timestamps are correctly parsed and time-zone adjusted.
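Timestamp handling is a common source of parsing errors, so here is a small sketch of converting source timestamps to UTC. It assumes the source either includes a UTC offset or is known to log in a fixed local offset; the function name `to_utc` and the `assumed_offset_hours` parameter are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_utc(raw, fmt, assumed_offset_hours=0):
    """Parse a timestamp string and return a timezone-aware UTC datetime.

    If the string carries no offset, assumed_offset_hours is applied as the
    source's fixed local offset (an assumption you must confirm per source).
    """
    parsed = datetime.strptime(raw, fmt)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone(timedelta(hours=assumed_offset_hours)))
    return parsed.astimezone(timezone.utc)


# A source that logs with an explicit offset.
with_offset = to_utc("2024-05-01 14:30:45 +0200", "%Y-%m-%d %H:%M:%S %z")

# A source that logs local time without an offset, assumed here to be UTC-4.
without_offset = to_utc("2024-05-01 08:30:45", "%Y-%m-%d %H:%M:%S", assumed_offset_hours=-4)

print(with_offset.isoformat(), without_offset.isoformat())
```

Both example events describe the same instant once converted, which is the property a cross-source query depends on. A fixed offset does not account for daylight saving time changes, so sources that log local time without an offset need extra care.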

Validation and Quality Assurance

After onboarding, validate that logs are arriving consistently. Compare the event count at the source with what your SIEM is ingesting to catch gaps. Check for parsing failures where events arrive but fields are not extracted correctly. Verify that time synchronization is accurate across sources. Monitor ingestion latency to ensure real-time alerts are not delayed by slow log delivery. Set up health monitoring alerts that fire when a log source stops sending data, ingestion drops below a threshold, or parsing error rates spike.
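The checks above can be sketched as a small health function. The statistics dictionary, the alert names, and the default thresholds (15 minutes of silence, 1% missing events, 5% parse errors) are assumptions for illustration; in practice these numbers come from your SIEM's own monitoring data and your tolerance for gaps.

```python
from datetime import datetime, timedelta, timezone


def completeness_gap(source_count, siem_count):
    """Fraction of source events missing from the SIEM (0.0 means none missing)."""
    if source_count <= 0:
        return 0.0
    return max(source_count - siem_count, 0) / source_count


def health_alerts(stats, now, max_silence=timedelta(minutes=15),
                  max_gap=0.01, max_parse_error_rate=0.05):
    """Return the names of the health checks that a log source is failing."""
    alerts = []
    if now - stats["last_event"] > max_silence:
        alerts.append("source_silent")
    if completeness_gap(stats["source_count"], stats["siem_count"]) > max_gap:
        alerts.append("ingestion_gap")
    ingested = stats["siem_count"]
    if ingested and stats["parse_errors"] / ingested > max_parse_error_rate:
        alerts.append("parse_errors_high")
    return alerts


now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
firewall_stats = {
    "last_event": now - timedelta(minutes=30),  # last event seen by the SIEM
    "source_count": 1000,                       # events reported by the source
    "siem_count": 950,                          # events ingested by the SIEM
    "parse_errors": 10,                         # ingested events that failed parsing
}
print(health_alerts(firewall_stats, now))
```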

Documentation and Maintenance

Maintain a log source inventory documenting every onboarded source: name, format, transport, ingestion rate, and owner
Include the contact person at the source system team for troubleshooting
Document any custom parsing rules and their purpose
Schedule regular reviews to identify stale log sources that are no longer sending data
Update parsing rules when source systems are upgraded and log formats change
Track storage costs per log source to manage SIEM licensing and capacity
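A log source inventory does not need special tooling to start with. The sketch below models one inventory record and two of the reviews listed above: finding stale sources and comparing each source's share of daily volume. The `LogSource` fields, the seven-day staleness window, and the sample entries are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class LogSource:
    name: str
    log_format: str
    transport: str
    daily_gb: float       # average ingestion volume per day
    owner: str
    contact: str          # person on the source system team
    last_event: datetime  # when the SIEM last received an event


def stale_sources(inventory, now, max_age=timedelta(days=7)):
    """Names of sources that have sent nothing within max_age."""
    return [s.name for s in inventory if now - s.last_event > max_age]


def volume_share(inventory):
    """Each source's fraction of total daily volume, as a proxy for cost."""
    total = sum(s.daily_gb for s in inventory)
    if total == 0:
        return {s.name: 0.0 for s in inventory}
    return {s.name: s.daily_gb / total for s in inventory}


now = datetime(2024, 5, 1, tzinfo=timezone.utc)
inventory = [
    LogSource("corp-firewall", "CEF", "syslog", 30.0, "Network team",
              "netops@example.com", now - timedelta(hours=1)),
    LogSource("legacy-hr-app", "custom", "agent", 10.0, "HR IT",
              "hr-it@example.com", now - timedelta(days=30)),
]
print(stale_sources(inventory, now))
print(volume_share(inventory))
```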
