Most applications produce some kind of log. Often it is the primary or only resource available to debug a problem. Yet few applications have any unified approach to figure out what should be logged, how it should be logged, and why it should be logged. Most of the time, while there is a shared logging API available, developers use it haphazardly and to record a wide variety of information. There are a few documents on the web such as this one that try to indicate some logging best practices. Over the next few posts I will try to explain some of my own best logging practices and contribute to this body of knowledge.
At the most basic level, you need to ask yourself, if my log files were all I had available, how confident would I be in my ability to fix bugs? If you have little confidence, then you are probably doing it wrong. Before discussing how to create logs that build confidence, let's talk about some common mistakes:
- One Log File Syndrome - There is a misguided desire to put all that is logged into a single file. This is a very bad idea, for reasons that will soon become clear. Having a logging standard that assumes more than one log file offers a built-in way to segregate data, and quick access to the data that you need is critical in a distance debugging situation.
- Logorrhea - Developers like to put things in the log file, and they hate to take them out. This often leads to a situation where the log becomes an indiscriminant mess of random garbage. Worse, it kills system performance, often mysteriously (until you put it in a profiler and see 100M calls to log()).
- Alarmist Logging - Have you ever opened a log file to see messages like "FATAL Error: File Already Open"? The terminiology used within the log can send your distance resource into fits if they see a bunch of "fatal" errors, even if the error was in some sense "fatal" in that it caused the operation to fail. The way that logs are phrased, and the overall tone of the messages can help guide a remote reader to the actual issues in the log. Alarmist logging will make them think that their system is milliseconds away from disaster.
- Logging what ain't broke - Logs are for capturing information that will probably be useful for debugging, should the need arise. Sometimes it's not clear what is useful, and so having examples of success to compare against are useful. But I've seen too many logs that are 99.9% lines that say "Commit Successful". If you have to spend more time ignoring success data than you do reading and analyzing failure data, you need to revisit your logging.
Tomorrow: Logging at a Distance, Part 2
