Handling Java Garbage

How do you feel about the Java garbage collector? Pick a side. You can side with the vast majority of Java programmers who believe that the language’s garbage collection feature will save you from any memory concerns ever. Or, you can side with the naysayers who argue that the Java garbage collector can’t be trusted, can’t save you from memory leaks, and ensures that your program will never perform well. You can choose between ignorant bliss and hostile pessimism. Where do you choose to stand? Who is right? I would like to show you that the truth is somewhere between the extremes. (It usually is, by the way.) What you need to build robust Java programs is a general understanding of how the garbage collector works and what things you can do to make the garbage collector do what you want, when you want.

A Simple Example

The idea behind having a garbage collector is that application programmers do not have to worry about the length of time an object is alive. Look at the following example:

public void myMethod() {
    Object foo = new Object();
    // work with foo
}

If you call this in a program, an object referenced by foo gets created when the first line is executed. You then do some work with foo, and finally the method returns. At no point did you have to worry about cleaning up foo. If you were to code this way in languages such as C or C++ (allocating from the heap and never freeing), you would have a memory leak that would eventually cause your program to fail. In Java, you could call this method over and over forever. When it needs to, the garbage collector would just take care of freeing up the resources you have left lying around.

Java Takes Care of Memory. So Why Do I Care?

The fact that Java can take care of memory for you is great, but as I am fond of saying, nothing is ever free. The automatic memory allocation for objects and subsequent de-allocation of those objects by Java’s garbage collection mechanism can add significant overhead. Constructors do have to run for every new object created. Also, if finalize() methods exist, the garbage collector must run them before the object’s memory can be freed. And, while you are not keeping track of all the objects created, the Java Virtual Machine (JVM) is.

Furthermore, the memory footprint for your application can be significantly larger than the amount of memory you actually need because the garbage collector runs asynchronously. That means that, while you may no longer be using an object, it isn’t immediately gone. The object is simply eligible for collection. When the object is actually collected is up to the JVM.

In order to write simple applications, you don’t need to care about object allocation and garbage collection. In order to write industrial-strength applications that are robust and have predictable performance characteristics, you are going to have to understand some basics of object life-cycle management and garbage collection.

Objects go through various stages during their life. However, not all objects go through all stages. Here are the possible stages objects may go through:

• Created: The memory for the object is allocated off the heap (the storage area that memory is allocated and de-allocated from), and the object’s constructor is run. All objects are created and moved into the in-use state as long as they are assigned to some variable after creation.

• In use: An object is in use as long as there is at least one strong reference to the object, which users can still access in their program. Basically, if the program can still use the object, the object is still in an in-use state.

• Invisible: An object is considered invisible when it has gone out of program scope but is still strongly reachable from a JVM perspective. Not all objects go through this stage during their life. The best way to show this is by example:

public void myMethod() {
    try {
        Object foo = new Object();
        // work with foo
    } catch (Exception e) { /* Ignore error */ }

    // Do more work?
}

The object referenced by foo is no longer reachable by the application code after the try block is exited, but optimized JVMs will generally not check in the middle of the method for references to the object. Therefore, the object is likely to be considered strongly reachable until the end of myMethod().

• Unreachable: An object is unreachable when the program no longer has access to any strong reference that can get to the object. Note that this does not mean that there are no strong references at all. Many objects can circularly reference each other, but if the program can’t get to any of the objects, then the objects are no longer strongly reachable.

• Collected: Once the garbage collector has recognized an unreachable object, it will move the object to the collected state. Objects that have finalize() methods will be marked for finalization processing. Just having a finalize() method will slow down the garbage collection process as the object goes through this extra step. This extra step doesn’t mean that using finalize() methods is a bad idea. It does mean that you should make sure you need finalization processing done and are not using finalization processing as merely a simple convenience.

• Finalized: An object’s finalize() method runs at the collected stage. If the object is still unreachable after the finalize() method is run, the object moves to this state, which is really just a holding ground for objects that have not had their memory returned to the heap. It is up to the JVM to determine when object resources will be returned to the heap. You might have gathered from this description that it is possible for a finalize() method to make an object reachable again. It is, and this is called object resurrection. To resurrect an object, the finalizer simply has to make the object strongly reachable again by assigning a reference to the object from somewhere in the program that is still reachable. A resurrected object is just like any other object in the in-use state, except that its finalize() method will never be run again. (finalize() methods are run once per object.) Resurrecting objects is almost always a sign of poor design, and it should generally be avoided.

• De-allocated: This is really the state of an object after it no longer exists. At this point, the object is gone, and the memory is available to be reused for other objects.

Garbage Collection Tips

Cleaning Up After Your Code

As a Java programmer, you should be conscious of places where your code creates expensive object references that are hung onto for periods of time longer than really needed. Your code can ensure that the object references become unreachable in a timely manner by setting these object references to null when the application no longer needs access to these objects. Refer to the code in Figure 1 as an example.

By setting the expensive local references to null once you are finished using them, you give the garbage collector the best chance of recognizing the references as unneeded. And you, therefore, have a better chance of getting the resources (always memory, but sometimes network or database connections or file handles) reused quickly. But, keep in mind the earlier definitions of invisible vs. unreachable. There is no guarantee that the garbage collector will recognize references as unneeded just because you set them to null (they are technically in an invisible state). In situations like this in your code, you really should have two separate methods doing the work. One method uses the expensive resources up front, and the other carries out the long-running task, which is not dependent on the expensive resources. Having each method accomplish a single task in this way is good object-oriented design (OOD).
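The two-method split described above might look something like the following sketch. The class and method names (SplitWorkExample, buildSummary, longRunningTask) are made up for illustration, and the array simply stands in for whatever expensive resource your code holds. The point is that once the first method returns, nothing on the stack refers to the array any longer, so it is unreachable rather than merely invisible.

```java
import java.util.Arrays;

// Hypothetical example: names and sizes are illustrative only.
class SplitWorkExample {

    // Step 1: uses the expensive array but returns only a small result.
    // After this method returns, no live stack frame references the array.
    static long buildSummary(int size) {
        long[] indexes = new long[size];   // the "expensive" resource
        Arrays.fill(indexes, 2L);
        long sum = 0;
        for (long value : indexes) {
            sum += value;
        }
        return sum;
    }

    // Step 2: the long-running task depends only on the small summary value,
    // not on the expensive array.
    static long longRunningTask(long summary, int iterations) {
        long result = summary;
        for (int i = 0; i < iterations; i++) {
            result += i;
        }
        return result;
    }

    public static void main(String[] args) {
        long summary = buildSummary(100000);
        System.out.println(longRunningTask(summary, 1000));
    }
}
```

With this shape, there is no need to null out anything by hand: the expensive reference disappears along with the first method's stack frame.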

Finalizer Chaining

Unlike constructors, finalize() methods are not chained for object hierarchies. Therefore, the final act of a finalize() method should be to call the superclass finalize() method. This should be done in a finally block to ensure that the call is made regardless of exceptions raised in the initial method.

This call is especially important when you wrap the functionality of someone else’s code. For example, if you build your own layer on top of Java Database Connectivity (JDBC) and put in your own finalize() method, you should ensure that you call the superclass finalize() method. JDBC drivers typically use finalize() methods to ensure that database resources can be reused in cases where the user forgets to call close, but the garbage collector recognizes that the object is reusable. If you wrap the JDBC objects and add finalizers that do not call the superclass finalize() methods, you effectively eliminate this feature.
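Here is a minimal sketch of finalizer chaining. BaseResource and WrappedResource are hypothetical classes: the first stands in for someone else's code that cleans up in its own finalizer, and the second is your wrapper. The boolean fields exist only so that the effect can be observed. Note that finalize() has been deprecated in later Java releases, so newer compilers will warn about this code.

```java
// Hypothetical base class standing in for third-party code (for example,
// a driver object) that releases its resources in a finalizer.
class BaseResource {
    boolean baseCleaned = false;

    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() throws Throwable {
        try {
            baseCleaned = true;      // release whatever the base class holds
        } finally {
            super.finalize();        // keep the chain intact up to Object
        }
    }
}

// Hypothetical wrapper that adds its own cleanup on top of the base class.
class WrappedResource extends BaseResource {
    private final boolean failDuringCleanup;
    boolean wrapperCleaned = false;

    WrappedResource(boolean failDuringCleanup) {
        this.failDuringCleanup = failDuringCleanup;
    }

    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() throws Throwable {
        try {
            wrapperCleaned = true;   // the wrapper's own cleanup
            if (failDuringCleanup) {
                throw new IllegalStateException("wrapper cleanup failed");
            }
        } finally {
            // Runs even if the cleanup above throws, so the superclass
            // still gets its chance to release resources.
            super.finalize();
        }
    }
}
```

Because the superclass call sits in the finally block, the base class cleanup happens even when the wrapper's own cleanup throws an exception.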

Invoking the Garbage Collector

There are cases in which it makes sense to call the garbage collector to run directly. This should be done rarely, and the user should have a good idea of the intended goal in doing so. Typically, it is better to allow the JVM to determine when to call the garbage collector for you.

One situation in which it makes sense to call the garbage collector directly is in a JDBC driver. When the driver runs out of database resources to hand out, one of two things has happened: The user is trying to use more resources than can be used at one time, or the user has leaked resources along the way. A resource leak is a situation in which there is a reference that makes an object technically in use, but that the application has no intention of ever using again. Resource leaks typically cause little harm initially, but eventually the small leaks end up consuming all of some resource on a system and the system fails. The garbage collector might not run on its own because the system still has plenty of memory for object allocations. (It is the database handles that you have run out of.) By telling the garbage collector to run, the driver can help handle leak conditions in the user program. The methods System.gc() and System.runFinalization() allow the user to try to force the garbage collector to run.
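The idea can be sketched roughly as follows. This is not how any particular JDBC driver is implemented. HandleTracker and Handle are hypothetical names, the handle limit is arbitrary, and tracking the handles with weak references from the java.lang.ref package is my assumption for the sketch. Only System.gc() is called here, and it is only a request: the JVM may not collect anything, and cleared references may not be queued by the time the tracker checks again.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashSet;
import java.util.Set;

// Hypothetical tracker for a limited resource such as database handles.
class HandleTracker {
    static final class Handle { }   // stands in for a statement or connection

    private final int capacity;
    private final ReferenceQueue<Handle> collected = new ReferenceQueue<>();
    private final Set<Reference<Handle>> outstanding = new HashSet<>();

    HandleTracker(int capacity) {
        this.capacity = capacity;
    }

    synchronized Handle allocate() {
        reclaimCollected();
        if (outstanding.size() >= capacity) {
            // Out of handles, although memory may be plentiful. Ask the
            // collector to run in case the user leaked some handles.
            System.gc();
            reclaimCollected();
        }
        if (outstanding.size() >= capacity) {
            throw new IllegalStateException("no handles available");
        }
        Handle handle = new Handle();
        // A weak reference lets the tracker know about the handle without
        // keeping it alive.
        outstanding.add(new WeakReference<>(handle, collected));
        return handle;
    }

    // Drop bookkeeping for handles the collector has already reclaimed.
    private void reclaimCollected() {
        Reference<? extends Handle> ref;
        while ((ref = collected.poll()) != null) {
            outstanding.remove(ref);
        }
    }

    synchronized int outstandingCount() {
        return outstanding.size();
    }
}
```

If the user really is holding every handle, the hint changes nothing and the allocation fails, which is the correct outcome in that case.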

Object Pooling

If you pool objects, you don’t have to create and garbage-collect them continuously. Instead, you are getting objects from and returning them to a pool. Pooling is a great way to eliminate the costs of short-lived or very expensive objects. I recommend you check out my COMMON presentation handouts available online at www.as400.ibm.com/developer/jdbc/index.html for more information on building database object pools. This technique is easily extendable to other types of Java objects.
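A very small general-purpose pool might look like the sketch below. SimplePool and its methods are hypothetical names rather than anything from the handouts mentioned above, and the sketch assumes a Java version with lambdas and java.util.function.Supplier. A real pool would also need to reset or validate objects before handing them out again.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical minimal object pool; not tied to any particular library.
class SimplePool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final int maxIdle;
    private int created = 0;

    SimplePool(Supplier<T> factory, int maxIdle) {
        this.factory = factory;
        this.maxIdle = maxIdle;
    }

    // Reuse an idle object if one exists; otherwise create a new one.
    synchronized T acquire() {
        T obj = idle.pollFirst();
        if (obj == null) {
            obj = factory.get();
            created++;
        }
        return obj;
    }

    // Return an object to the pool. If the pool is already full, the object
    // is simply dropped and left to the garbage collector.
    synchronized void release(T obj) {
        if (obj != null && idle.size() < maxIdle) {
            idle.addFirst(obj);
        }
    }

    // Number of objects the pool has had to construct so far.
    synchronized int createdCount() {
        return created;
    }
}
```

A caller would typically write something like `StringBuilder sb = pool.acquire();`, use the object, clear its state, and then hand it back with `pool.release(sb);`.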

New with Java2

With the release of the Java2 platform, users have more flexibility than ever to work with the garbage collector. A new series of classes called Refs was added (the java.lang.ref package). Three types of references were added alongside the traditional strong reference, and, while a detailed discussion of how to use them is beyond the scope of this article, here are brief descriptions of each of them:

• Soft references: These are references to resources that can be collected if memory becomes tight. The typical scenario here is the large file in memory. For performance, you would like to leave the file there. But if you are getting tight on memory, this object can be garbage-collected and reconstructed when you need it later. When soft references are collected is largely outside the programmer’s control.

• Weak references: These are references that are needed but should not be considered when deciding whether or not an object should be collected. OS/400’s Native JDBC driver uses this feature to keep track of user resources allocated. If you allocate 100 statement objects and leak them all, the Native JDBC driver has to know how to find those resources to clean them up. But, you don’t want the fact that the driver holds a reference to the object to stop the garbage collector from collecting the object.

• Phantom references: These are references to objects that are ready for garbage collection. These references are used to allow more flexible pre-mortem cleanup actions than the normal finalization process allows.
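To give a feel for these classes, here is a small sketch. SoftCache is a hypothetical helper that mirrors the "large file in memory" scenario: it keeps a value behind a soft reference and rebuilds it through a loader if the reference has been cleared. The simulateCollection() method is purely an assumption for demonstration, since you cannot otherwise control when the collector clears a soft reference. The demo class shows how get() behaves for weak and phantom references while the object is still strongly reachable.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;
import java.util.function.Supplier;

// Hypothetical cache: holds one value softly and reloads it on demand.
class SoftCache<T> {
    private final Supplier<T> loader;
    private SoftReference<T> ref = new SoftReference<>(null);
    private int loads = 0;

    SoftCache(Supplier<T> loader) {
        this.loader = loader;
    }

    synchronized T get() {
        T value = ref.get();          // null if never loaded or since cleared
        if (value == null) {
            value = loader.get();     // reconstruct the expensive value
            ref = new SoftReference<>(value);
            loads++;
        }
        return value;
    }

    // Stands in for the collector clearing the reference under memory pressure.
    synchronized void simulateCollection() {
        ref.clear();
    }

    synchronized int loadCount() {
        return loads;
    }
}

class ReferenceKindsDemo {
    public static void main(String[] args) {
        Object target = new Object();
        WeakReference<Object> weak = new WeakReference<>(target);
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(target, queue);

        // The weak reference still returns the object while a strong
        // reference (target) exists.
        System.out.println(weak.get() == target);
        // A phantom reference never hands the object back; get() returns null.
        System.out.println(phantom.get() == null);
    }
}
```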

Finalize

Java’s garbage collection feature probably does as much to enhance programmer productivity as any other single feature of the language. This feature saves you from many of the issues of memory allocation and de-allocation that cause some of the hardest to debug problems in languages like C and C++. But, like most powerful features of any language, the programmer’s skill with the feature will determine how well the feature works. If you want your applications to scale well and behave in an orderly manner, knowing a bit about how to work with the garbage collector can be a great benefit.

public void myMethod() {
    File myFile;
    long[] indexes = new long[100000];

    // use the local variables of myFile and indexes

    // Do your own cleanup
    myFile = null;
    indexes = null;

    // Some other long long running piece of work.
}

Figure 1: By setting a reference to null, you make the object it referred to a candidate for garbage collection.
