Smart Log Tiering: Balancing Costs and Accessibility for Observability Data

The growing volume of observability data, especially logs, presents a significant cost challenge because hot storage is expensive, yet much of this data is rarely accessed after an initial period. This discussion explores intelligent tiering for logs: automatically moving data to cheaper storage tiers based on access patterns or age, so that costs drop while the data stays reachable when it is needed.

A Real-World Three-Tier Logging Strategy

One commenter shared a successful tiered approach implemented at their startup:

Hot Tier (Last 7 Days): All logs are stored in Elasticsearch. This tier is optimized for speed, supporting real-time debugging and immediate operational needs, despite its higher cost.
Warm Tier (7-90 Days): After 7 days, logs are automatically archived by a log shipper (e.g., Fluentd) to Amazon S3 Standard. These logs can still be queried in place using tools like Amazon Athena. Querying is slower than in Elasticsearch, but it is adequate for occasional, deeper investigations, and the storage is substantially cheaper.
Cold Tier (After 90 Days): Logs older than 90 days are transitioned to S3 Glacier Deep Archive using S3 lifecycle policies. This tier is extremely cost-effective for long-term storage, primarily for compliance or "break glass in case of emergency" scenarios, with the understanding that retrieval is slow (on the order of hours rather than seconds).
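
The three tiers above can be sketched as a small age-to-tier mapping plus the lifecycle rule that would drive the cold transition. This is a minimal illustration, not the commenter's actual configuration: the function names, the rule ID, the "logs/" prefix, and the handling of the exact 7-day and 90-day boundaries are all assumptions.

```python
import json

HOT_DAYS = 7    # Elasticsearch window described above
WARM_DAYS = 90  # upper bound of the S3 Standard window described above


def tier_for_age(age_days: float) -> str:
    """Return the tier a log of the given age would live in.

    Boundary handling (exactly 7 or 90 days) is an assumption; the
    discussion does not specify it.
    """
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days <= HOT_DAYS:
        return "hot"
    if age_days <= WARM_DAYS:
        return "warm"
    return "cold"


def deep_archive_rule(prefix: str, after_days: int = WARM_DAYS) -> dict:
    """Build one S3 lifecycle rule that moves objects to Deep Archive.

    The prefix and rule ID are hypothetical placeholders.
    """
    return {
        "ID": f"logs-to-deep-archive-after-{after_days}d",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": after_days, "StorageClass": "DEEP_ARCHIVE"}],
    }


if __name__ == "__main__":
    for age in (1, 30, 400):
        print(age, tier_for_age(age))
    print(json.dumps({"Rules": [deep_archive_rule("logs/")]}, indent=2))
```

The dictionary printed at the end has the shape that boto3's put_bucket_lifecycle_configuration accepts as its LifecycleConfiguration argument. S3 counts transition days from object creation, so if the shipper only writes objects once logs are already 7 days old, after_days would need to be reduced accordingly.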

Data-Driven Decisions: The Importance of Query Patterns

A crucial lesson learned was the importance of measuring actual log query patterns rather than assuming them. After discovering that over 95% of their queries were for logs less than 3 days old, the team could confidently implement an aggressive tiering strategy. This data-driven approach maintained visibility into recent events while moving older, less frequently accessed data to more economical storage.

While the method for obtaining these query patterns wasn't detailed, its impact on shaping the tiering policy was emphasized.
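
Since the commenter's method is unknown, the following is only one hypothetical way to arrive at such a figure. It assumes you can export, for each query, the time it ran and the oldest log timestamp it requested (for example from a query audit log); the record shape and the function name share_within are illustrative.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple

# (when the query ran, oldest log timestamp the query asked for)
QueryRecord = Tuple[datetime, datetime]


def share_within(records: Iterable[QueryRecord], max_age: timedelta) -> float:
    """Fraction of queries whose oldest requested log is at most max_age old."""
    records = list(records)
    if not records:
        raise ValueError("no query records")
    recent = sum(1 for ran_at, oldest in records if ran_at - oldest <= max_age)
    return recent / len(records)


if __name__ == "__main__":
    now = datetime(2024, 1, 10, 12, 0)
    sample = [
        (now, now - timedelta(hours=2)),
        (now, now - timedelta(days=1)),
        (now, now - timedelta(days=2, hours=5)),
        (now, now - timedelta(days=20)),
    ]
    print(share_within(sample, timedelta(days=3)))  # 0.75
```

Running the same calculation for several cutoffs (1, 3, 7, 30 days) gives a rough curve from which a hot-tier retention window can be chosen.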

Other Considerations and Tools

The discussion also touched on other points:

Object Storage Benefits: Logging systems that utilize object storage (like S3 or Tigris) for persistence can inherently benefit from the storage tiering options offered by these platforms. However, it was noted that Amazon S3's Intelligent-Tiering primarily covers warm to archive tiers, not typically the high-performance "hot" tier, which often requires a separate solution like Elasticsearch.
Existing Tools: The capabilities of existing commercial tools like Splunk were briefly debated. While one user suggested Splunk has long offered such tiering, another countered that it might be limited to static retention policies rather than dynamic, access-based tiering.
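
The distinction drawn in that debate, static retention versus access-based tiering, can be made concrete with a small sketch. The LogChunk type, its fields, and the 7-day thresholds are hypothetical and do not describe how Splunk or S3 Intelligent-Tiering is implemented.

```python
from dataclasses import dataclass


@dataclass
class LogChunk:
    age_days: int   # days since the chunk was written
    idle_days: int  # days since the chunk was last queried


def demote_static(chunk: LogChunk, max_age_days: int = 7) -> bool:
    """Static retention: demote purely on age, regardless of use."""
    return chunk.age_days > max_age_days


def demote_access_based(chunk: LogChunk, max_idle_days: int = 7) -> bool:
    """Access-based tiering: demote only once the chunk has gone unqueried."""
    return chunk.idle_days > max_idle_days


if __name__ == "__main__":
    # A month-old chunk that was queried yesterday, e.g. during a long investigation.
    busy_old_chunk = LogChunk(age_days=30, idle_days=1)
    print(demote_static(busy_old_chunk))        # True
    print(demote_access_based(busy_old_chunk))  # False
```

The two policies only disagree on data that is old but still being read, which is why the measured query pattern matters: if almost no queries touch old logs, a static age rule captures most of the benefit.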

Ultimately, implementing an intelligent tiering strategy for logs, informed by actual usage data, can lead to significant cost reductions in observability infrastructure.
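
To get a feel for the size of the saving, a back-of-the-envelope steady-state model can help. The sketch below is an assumption-laden illustration: the prices are made-up relative units (hot = 1.0), not real vendor pricing, and it ignores request, retrieval, query-scan, and compute costs. Substitute your own figures.

```python
from typing import List, Tuple

# (days of logs held in the tier, price per GB per month in the tier)
Tier = Tuple[int, float]


def monthly_storage_cost(daily_gb: float, tiers: List[Tier]) -> float:
    """Steady-state monthly storage cost.

    At steady state each tier holds daily_gb * days_in_tier gigabytes.
    """
    return sum(daily_gb * days * price for days, price in tiers)


if __name__ == "__main__":
    daily_gb = 100.0  # hypothetical ingest volume
    # Placeholder relative prices, NOT real pricing.
    tiered = [(7, 1.0), (83, 0.2), (275, 0.01)]  # hot, warm, cold over 365 days
    all_hot = [(365, 1.0)]
    print("tiered :", monthly_storage_cost(daily_gb, tiered))
    print("all hot:", monthly_storage_cost(daily_gb, all_hot))
```

With these placeholder numbers the hot tier still dominates the tiered bill even though it holds only 7 of 365 days, which suggests that the hot retention window is the most sensitive setting to tune.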