Be generous but selective with what you log and how you log it
The right log message can make the difference between tracking down a problem in 2 minutes and spending days investigating mysterious behaviour. Unfortunately, knowing what to log and how to log it can often feel like an art. It's easy to forget to add enough log messages; that's pretty common. But it's also easy to log too much, which:
- makes the logs hard to interpret,
- slows your software down due to the overhead of logging, and
- increases costs when you have to pay for log processing and storage.
Most logging infrastructure supports different logging levels to help classify and filter log messages. For example, you’ll often see:
- Error ⛔️
- Warning ⚠️
- Information ℹ️
- Debug 🧠
The levels may have different names and there are sometimes more levels, but the above are pretty common.
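To make the levels concrete, here is a small sketch using Python's standard `logging` module. The choice of Python and the logger name `video_recorder` are assumptions for illustration. Python calls the levels ERROR, WARNING, INFO and DEBUG, and it also has a CRITICAL level above ERROR. Setting a logger's level to INFO keeps Information and above and filters out Debug messages:

```python
import logging

# Python's standard levels, from most to least severe (higher number = more severe).
logger = logging.getLogger("video_recorder")  # hypothetical component name
logger.setLevel(logging.INFO)  # keep Information and above, drop Debug

for level in (logging.CRITICAL, logging.ERROR, logging.WARNING, logging.INFO, logging.DEBUG):
    # isEnabledFor() reports whether a message at this level would be processed.
    print(f"{logging.getLevelName(level):8} ({level}) enabled: {logger.isEnabledFor(level)}")
```

With the level set to INFO, every line reports `True` except DEBUG. Lowering the level to `logging.DEBUG` turns the Debug messages back on.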
Here are some rules of thumb on when to log and what level to use:
- Permanent Failures: Always log whenever and wherever a permanent failure is encountered. A permanent failure is when an error condition is detected and the component gives up and fails the operation. Use the Error level to report permanent failures.
- Transient Failures: Always log whenever a transient failure is encountered. A transient failure is one that is typically recoverable through retries. If it happens several times in a row and the operation is terminated, it becomes a permanent failure and should be logged as such. Logging transient failures helps you understand scenarios such as how many of your network requests actually failed on the first try, which can help you make your system more resilient. Use the Warning level to report transient failures.
- Significant Events: Log the significant events for your component. Significant events are frequently associated with the life cycle of the component's key resources. For example, a component that records video would treat the creation, editing, and deletion of videos as significant events. Significant events usually occur at a relatively low frequency given the scale of the component. Use the Information level to report significant events.
- Less Significant Events: Less significant events, such as simple state transitions, the creation of transient objects, and temporary connection setup, tend to occur at a higher frequency and are usually not desirable in the default logs. Producing these logs can add substantial overhead, flood the logs with mundane details, and increase log processing and storage costs. However, logging these events can be invaluable when diagnosing specific problems. Use the Debug level to report less significant events, and configure your component to discard Debug messages by default; these messages must then be enabled explicitly when you are trying to track down a problem.
- Execution Anomalies: Execution anomalies (crashes, memory/data corruption, and assorted unexpected conditions) should always be logged when they are detected. If your logging infrastructure provides a level higher than Error (some have Critical), report execution anomalies using that level. Otherwise, use the Error level.
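The rules of thumb above can be sketched with Python's standard `logging` module. This is a minimal illustration under stated assumptions: `TransientError`, `call_with_retries`, `create_video` and the logger name are hypothetical. A duplicate video ID stands in for an execution anomaly, and it is logged at CRITICAL because Python provides a level above ERROR.

```python
import logging

logger = logging.getLogger("video_service")  # hypothetical component name


class TransientError(Exception):
    """Hypothetical error type for failures a retry may fix (e.g. a network timeout)."""


def call_with_retries(operation, max_attempts=3):
    """Run `operation`, retrying transient failures and logging each outcome at the suggested level."""
    for attempt in range(1, max_attempts + 1):
        logger.debug("Attempt %d of %d", attempt, max_attempts)  # less significant event
        try:
            return operation()
        except TransientError as exc:
            if attempt == max_attempts:
                # The component gives up, so this is now a permanent failure.
                logger.error("Operation failed after %d attempts: %s", attempt, exc)
                raise
            # Still recoverable through a retry: a transient failure.
            logger.warning("Attempt %d failed, retrying: %s", attempt, exc)


def create_video(videos, video_id):
    """Creating a video is a significant event in this hypothetical component."""
    if video_id in videos:
        # Assumed to be impossible by design, so treat it as an execution anomaly.
        logger.critical("Video %s already exists; state may be corrupted", video_id)
        raise RuntimeError(f"duplicate video id {video_id}")
    videos[video_id] = {}
    logger.info("Created video %s", video_id)  # significant event


if __name__ == "__main__":
    # Debug messages are discarded by default; switch to logging.DEBUG while diagnosing a problem.
    logging.basicConfig(level=logging.INFO)
    create_video({}, "intro-clip")
```

Keeping the retry loop in one helper puts the transient/permanent distinction in a single place. Each failed attempt is a Warning, and only the final give-up is an Error, so the count of Error messages reflects operations that actually failed.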
Follow 🚀 me on:
Hashnode - Rajat Jain
Twitter - @rajat_codes
LinkedIn - Rajat Jain