Source: http://www.javacodegeeks.com/2011/01/10-tips-proper-application-logging.html
10 Tips for Proper Application Logging
Our latest
JCP partner,
Tomasz Nurkiewicz, has submitted a number of posts describing the basic principles of proper application
logging.
I found them quite interesting, thus I decided to aggregate them in a
more compact format and present them to you. So, here are his
suggestions for clean and helpful logs:
(NOTE: The original posts have been slightly edited to improve readability)
1) Use the appropriate tools for the job
Many programmers seem to forget how important is logging an application’s behavior and its current activity. When somebody puts:
1 | log.info( "Happy and carefree logging" ); |
happily
somewhere in the code, he probably doesn’t realize the importance of
application logs during maintenance, tuning and failure identification.
Underestimating the value of good logs is a terrible mistake.
In my opinion, SLF4J is the best logging API available, mostly because of a great pattern substitution support:
1 | log.debug( "Found {} records matching filter: '{}'" , records, filter); |
In Log4j you would have to use:
1 | log.debug( "Found " + records + " records matching filter: '" + filter + "'" ); |
This
is not only longer and less readable, but also inefficient because of
extensive use of string concatenation. SLF4J adds a nice {} substitution
feature. Also, because string concatenation is avoided and toString()
is not called if the logging statement is filtered, there is no need for
isDebugEnabled() anymore. BTW, have you noticed
single quotes around filter string parameter?
SLF4J is just a façade. As an implementation I would recommend the
Logback framework,
already advertised, instead of the well established
Log4J. It has many interesting features and,
in contrary to Log4J, is actively developed.
The last tool to recommend is
Perf4J. To quote their motto:
Perf4J is to System.currentTimeMillis() as log4j is to System.out.println()
I’ve
added Perf4J to one existing application under heavy load and seen it
in action in few other. Both administrators and business users were
impressed by the nice graphs produced by this simple utility. Also we
were able to discover performance flaws in no time. Perf4J itself
deserves its own article, but for now just check their
Developer Guide.
Additionally, note that
Ceki Gülcü (founder of the
Log4J,
SLF4J and
Logback projects) suggested a simple approach to get rid of
commons-logging dependency (see his
comment).
2) Don’t forget, logging levels are there for you
Every
time you make a logging statement, you think hard which logging level
is appropriate for this type of event, don’t you? Somehow 90% of
programmers never pay attention to logging levels, simply logging
everything on the same level, typically INFO or DEBUG. Why? Logging
frameworks have two major benefits over System.out., i.e. categories and
levels. Both allow you to selectively filter logging statements
permanently or only for diagnostics time. If you really can’t see the
difference, print this table and look at it every time you start typing
“log.” in your IDE:
ERROR – something terribly wrong had
happened, that must be investigated immediately. No system can tolerate
items logged on this level. Example: NPE, database unavailable, mission
critical use case cannot be continued.
WARN – the process
might be continued, but take extra caution. Actually I always wanted to
have two levels here: one for obvious problems where work-around exists
(for example: “Current data unavailable, using cached values”) and
second (name it: ATTENTION) for potential problems and suggestions.
Example: “Application running in development mode” or “Administration
console is not secured with a password”. The application can tolerate
warning messages, but they should always be justified and examined.
INFO
– Important business process has finished. In ideal world,
administrator or advanced user should be able to understand INFO
messages and quickly find out what the application is doing. For example
if an application is all about booking airplane tickets, there should
be only one INFO statement per each ticket saying “[Who] booked ticket
from [Where] to [Where]“. Other definition of INFO message: each action
that changes the state of the application significantly (database
update, external system request).
DEBUG – Developers stuff. I will discuss later what sort of information deserves to be logged.
TRACE
– Very detailed information, intended only for development. You might
keep trace messages for a short period of time after deployment on
production environment, but treat these log statements as temporary,
that should or might be turned-off eventually. The distinction between
DEBUG and TRACE is the most difficult, but if you put logging statement
and remove it after the feature has been developed and tested, it should
probably be on TRACE level.
The list above is just a suggestion,
you can create your own set of instructions to follow, but it is
important to have some. My experience is that always everything is
logged without filtering (at least from the application code), but
having the ability to quickly filter logs and extract the information
with proper detail level, might be a life-saver.
The last thing worth mentioning is the infamous is*Enabled() condition. Some put it before every logging statement:
1 | if (log.isDebugEnabled()) |
2 | log.debug( "Place for your commercial" ); |
Personally, I find this idiom being just clutter that should be avoided. The performance improvement (especially when using
SLF4J pattern substitution discussed previously) seems irrelevant and smells like a
premature optimization.
Also, can you spot the duplication? There are very rare cases when
having explicit condition is justified – when we can prove that
constructing logging message is expensive. In other situations, just do
your job of logging and let logging framework do its job (filtering).
3) Do you know what you are logging?
Every
time you issue a logging statement, take a moment and have a look at
what exactly will land in your log file. Read your logs afterwards and
spot malformed sentences. First of all, avoid NPEs like this:
1 | log.debug( "Processing request with id: {}" , request.getId()); |
Are you absolutely sure that request is not null here?
Another
pitfall is logging collections. If you fetched collection of domain
objects from the database using Hibernate and carelessly log them like
here:
1 | log.debug( "Returning users: {}" , users); |
SLF4J will call toString() only when the statement is actually printed, which is quite nice. But if it does… Out of memory error,
N+1 select problem,
thread starvation (logging is synchronous!), lazy initialization
exception, logs storage filled completely – each of these might occur.
It
is a much better idea to log, for example, only ids of domain objects
(or even only size of the collection). But making a collection of ids
when having a collection of objects having getId() method is
unbelievably difficult and cumbersome in Java. Groovy has a great
spread operator (users*.id), in Java we can emulate it using the
Commons Beanutils library:
1 | log.debug( "Returning user ids: {}" , collect(users, "id" )); |
Where collect() method can be implemented as follows:
1 | public static Collection collect(Collection collection, String propertyName) { |
2 | return CollectionUtils.collect(collection, new BeanToPropertyValueTransformer(propertyName)); |
The
last thing to mention is the improper implementation or usage of
toString(). First, create toString() for each class that appears
anywhere in logging statements, preferably using
ToStringBuilder (but not its
reflective
counterpart). Secondly, watch out for arrays and non-typical
collections. Arrays and some strange collections might not have
toString() implemented calling toString() of each item. Use
Arrays #deepToString JDK utility method. And read your logs often to spot incorrectly formatted messages.
4) Avoid side effects
Logging
statements should have no or little impact on the application’s
behavior. Recently a friend of mine gave an example of a system that
threw Hibernates’ LazyInitializationException only when running on some
particular environment. As you’ve probably guessed from the context,
some logging statement caused lazy initialized collection to be loaded
when session was attached. On this environment the logging levels were
increased and collection was no longer initialized. Think how long would
it take you to find a bug without knowing this context?
Another
side effect is slowing the application down. Quick answer: if you log
too much or improperly use toString() and/or string concatenation,
logging has a performance side effect. How big? Well, I have seen server
restarting every 15 minutes because of a thread starvation caused by
excessive logging. Now this is a side effect! From my experience, few
hundreds of MiB is probably the upper limit of how much you can log onto
disk per hour.
Of course if logging statement itself fails and
causes business process to terminate due to exception, this is also a
huge side effect. I have seen such a construct to avoid this:
2 | log.trace( "Id=" + request.getUser().getId() + " accesses " + manager.getPage().getUrl().toString()) |
3 | } catch (NullPointerException e) {} |
This is a real code, but please make the world a bit better place and don’t do it, ever.
5) Be concise and descriptive
Each logging statement should contain both data and description. Consider the following examples:
1 | log.debug( "Message processed" ); |
2 | log.debug(message.getJMSMessageID()); |
4 | log.debug( "Message with id '{}' processed" , message.getJMSMessageID()); |
Which
log would you like to see while diagnosing failure in an unknown
application? Believe me, all the examples above are almost equally
common. Another anti-pattern:
1 | if (message instanceof TextMessage) |
4 | log.warn( "Unknown message type" ); |
Was
it so hard to include thee actual message type, message id, etc. in the
warning string? I know something went wrong, but what? What was the
context?
A third anti-pattern is the “magic-log”. Real life
example: most programmers in the team knew that 3 ampersands followed by
exclamation mark, followed by hash, followed by pseudorandom
alphanumeric string log means “Message with XYZ id received”. Nobody
bothered to change the log, simply someone hit the keyboard and chose
some unique “&&&!#” string, so that it can be easily found
by himself.
As a consequence, the whole logs file looks like a
random sequence of characters. Somebody might even consider that file to
be a valid
Perl program. Instead, a
log file should be readable, clean and descriptive. Don’t use magic
numbers, log values, numbers, ids and include their context. Show the
data being processed and show its meaning. Show what the program is
actually doing. Good logs can serve as a great documentation of the
application code itself.
Did I mention not to log passwords and any personal information? Don’t!
6) Tune your pattern
Logging
pattern is a wonderful tool, that transparently adds a meaningful
context to every logging statement you make. But you must consider very
carefully which information to include in your pattern. For example,
logging date when your logs roll every hour is pointless as the date is
already included in the log file name. On the contrary, without logging
the thread name you would be unable to track any process using logs when
two threads work concurrently – the logs will overlap. This might be
fine in single-threaded applications – that are almost dead nowadays.
From
my experience, the ideal logging pattern should include (of course
except the logged message itself): current time (without date,
milliseconds precision), logging level, name of the thread, simple
logger name (not fully qualified) and the message. In
Logback it is something like:
1 | < appender name = "STDOUT" class = "ch.qos.logback.core.ConsoleAppender" > |
3 | < pattern >%d{HH:mm:ss.SSS} %-5level [%thread][%logger{0}] %m%n</ pattern > |
You
should never include file name, class name and line number, although
it’s very tempting. I have even seen empty log statements issued from
the code:
because
the programmer assumed that the line number will be a part of the
logging pattern and he knew that “If empty logging message appears in
67th line of the file (in authenticate() method), it means that the user
is authenticated”. Besides, logging class name, method name and/or line
number has a serious performance impact.
A somewhat more advanced feature of a logging frameworks is the concept of Mapped Diagnostic Context.
MDC
is simply a map managed on a thread-local basis. You can put any
key-value pair in this map and since then every logging statement issued
from this thread is going to have this value attached as part of the
pattern.
7) Log method arguments and return values
When
you find a bug during development, you typically run a debugger trying
to track down the potential cause. Now imagine for a while that you
can’t use a debugger. For example, because the bug manifested itself on a
customer environment few days ago and everything you have is logs.
Would you be able to find anything in them?
If you follow the
simple rule of logging each method input and output (arguments and
return values), you don’t even need a debugger any more. Of course, you
must be reasonable but every method that: accesses external system
(including database), blocks, waits, etc. should be considered. Simply
follow this pattern:
1 | public String printDocument(Document doc, Mode mode) { |
2 | log.debug( "Entering printDocument(doc={}, mode={})" , doc, mode); |
4 | log.debug( "Leaving printDocument(): {}" , id); |
Because
you are logging both the beginning and the end of method invocation,
you can manually discover inefficient code and even detect possible
causes of deadlocks and starvation – simply by looking after “entering”
without corresponding “leaving”. If your methods have meaningful names,
reading logs would be a pleasure. Also, analyzing what went wrong is
much simpler, since on each step you know exactly what has been
processed. You can even use a simple AOP aspect to log a wide range of
methods in your code. This reduces code duplication, but be careful,
since it may lead to enormous amount of huge logs.
You should
consider DEBUG or TRACE levels as best suited for these types of logs.
And if you discover some method are called too often and logging might
harm performance, simply decrease logging level for that class or remove
the log completely (maybe leaving just one for the whole method
invocation?) But it is always better to have too much rather than too
few logging statements. Treat logging statements with the same respect
as unit tests – your code should be covered with logging routines as it
is with unit tests. No part of the system should stay with no logs at
all. Remember, sometimes observing logs rolling by is the only way to
tell whether your application is working properly or hangs forever.
8) Watch out for external systems
This
is the special case of the previous tip: if you communicate with an
external system, consider logging every piece of data that comes out
from your application and gets in. Period. Integration is a tough job
and diagnosing problems between two applications (think two different
vendors, environments, technology stacks and teams) is particularly
hard. Recently, for example, we’ve discovered that logging full messages
contents,
including SOAP and HTTP headers in
Apache CXF web services is extremely useful during integration and system testing.
This
is a big overhead and if performance is an issue, you can always
disable logging. But what is the point of having a fast, but broken
application, that no one can fix? Be extra careful when integrating with
external systems and prepare to pay that cost. If you are lucky and all
your integration is handled by an ESB, then the bus is probably the
best place to log every incoming request and response. See for example
Mules’
log-component.
Sometimes
the amount of information exchanged with external systems makes it
unacceptable to log everything. On the other hand during testing and for
a short period of time on production (for example when something wrong
is happening), we would like to have everything saved in logs and are
ready to pay performance cost. This can be achieved by carefully using
logging levels. Just take a look at the following idiom:
1 | Collection<Integer> requestIds = |
2 | if (log.isDebugEnabled()) |
3 | log.debug( "Processing ids: {}" , requestIds); |
5 | log.info( "Processing ids size: {}" , requestIds.size()); |
If
this particular logger is configured to log DEBUG messages, it will
print the whole requestIds collection contents. But if it is configured
to print INFO messages, only the size of collection will be outputted.
If you are wondering why I forgot about isInfoEnabled() condition, go
back to tip #2. One thing worth mentioning is that requestIds collection
should not be null in this case. Although it will be logged correctly
as null if DEBUG is enabled, but big fat NullPointerException will be
thrown if logger is configured to INFO. Remember my lesson about side
effects in tip #4?
9) Log exceptions properly
First
of all, avoid logging exceptions, let your framework or container
(whatever it is) do it for you. There is one, ekhem, exception to this
rule: if you throw exceptions from some remote service (RMI, EJB remote
session bean, etc.), that is capable of serializing exceptions, make
sure all of them are available to the client (are part of the API).
Otherwise the client will receive NoClassDefFoundError:
SomeFancyException instead of the “true” error.
Logging exceptions
is one of the most important roles of logging at all, but many
programmers tend to treat logging as a way to handle the exception. They
sometimes return default value (typically null, 0 or empty string) and
pretend that nothing has happened. Other times they first log the
exception and then wrap it and throw it back:
1 | log.error( "IO exception" , e); |
2 | throw new MyCustomException(e); |
This
construct will almost always print the same stack trace two times,
because something will eventually catch MyCustomException and log its
cause. Log, or wrap and throw back (which is preferable), never both,
otherwise your logs will be confusing.
But if we really do WANT to
log the exception? For some reason (because we don’t read APIs and
documentation?), about half of the logging statements I see are wrong.
Quick quiz, which of the following log statements will log the NPE
properly?
04 | } catch (Exception e) { |
08 | log.error(e.toString()); |
09 | log.error(e.getMessage()); |
13 | log.error( "{}" , e.getMessage()); |
14 | log.error( "Error reading configuration file: " + e); |
15 | log.error( "Error reading configuration file: " + e.getMessage()); |
16 | log.error( "Error reading configuration file" , e); |
Surprisingly,
only G and preferably L are correct! A and B don’t even compile in
SLF4J, others discard stack traces and/or print improper messages. For
example, E will not print anything as NPE typically doesn’t provide any
exception message and the stack trace won’t be printed as well.
Remember, the first argument is always the text message, write something
about the nature of the problem. Don’t include exception message, as it
will be printed automatically after the log statement, preceding stack
trace. But in order to do so, you must pass the exception itself as the
second argument.
10) Logs easy to read, easy to parse
There
are two groups of receivers particularly interested in your application
logs: human beings (you might disagree, but programmers belong to this
group as well) and computers (typically shell scripts written by system
administrators). Logs should be suitable for both of these groups. If
someone looking from behind your back at your application logs sees
(source Wikipedia):
then you probably have not followed my tips. Logs should be readable and easy to understand just like the code should.
On
the other hand, if your application produces half GB of logs each hour,
no man and no graphical text editor will ever manage to read them
entirely. This is where old-school
grep,
sed and
awk
come in handy. If it is possible, try to write logging messages in such
a way, that they could be understood both by humans and computers, e.g.
avoid formatting of numbers, use patterns that can be easily recognized
by regular expressions, etc. If it is not possible, print the data in
two formats:
1 | log.debug( "Request TTL set to: {} ({})" , new Date(ttl), ttl); |
4 | final String duration = DurationFormatUtils.formatDurationWords(durationMillis, true , true ); |
5 | log.info( "Importing took: {}ms ({})" , durationMillis, duration); |
Computers
will appreciate “ms after 1970 epoch” time format, while people would
be delighted seeing “1 day 10 hours 17 minutes 36 seconds” text. BTW
take a look at
DurationFormatUtils, nice tool.