Garbage collection: JVM and other platforms
JVM and other platforms
In computer science, garbage collection (gc) can mean few different things depending on context and definition. In this post, it means “freeing allocated memory which is no longer needed by at least somewhat automated means.” Since this text is mostly about Java and JVM (Java Virtual Machine), “object” and “memory area” are used interchangeably.
The Old Way
The original way of handling memory was not to allocate or free it at all – or rather allocate it during compile time and free it when the program exits. This way is still used in use today, usually in smaller realtime and critical systems. While this design can’t leak memory it is very limiting and building larger programs with it means that you either need lots of memory or you need to share same memory areas between different functions which can’t be run at the same time which can get complex rather quickly.
The other traditional way of allocating and freeing memory is by doing it manually (as opposed to automatic garbage collection). Every object allocation needs to be followed by freeing it once, only once, and only after its last usage. While this can be a very efficient method, it is also error prone, especially if the code is doing optional transformations to objects and/or has complex control flow (eg., using exceptions). To make it somewhat harder to make errors, most recent C++ standards (C++14) have five different types of pointers for managing this (old “dumb” ones and four “smart” ones). If you use the smart pointers correctly they will take care freeing the object once it can no longer be used.
If you forget to free an object, the program leaks memory (ie., it has allocated memory it can no longer use) and if this happens regularly, the program grows over time and will eventually run out of memory and crash.
However, if you continue using an object you have already freed and allocated for a different object, or (in most implementations) free some memory area twice (ie., two different allocations can get the same area), you end up with a memory area which is claimed by two different objects. When two objects share memory, it means that a change in one will affect other as well, which can lead to interesting debug sessions. Fortunately, there are some pretty good tools for detecting these kinds of problems (Valgrind is probably the best of those).
The easy way out
The easiest way to free unused objects automatically is to allocate them on stack – once the function returns, all stack-allocated objects are automatically freed.
Of course, this approach has many limitations, the most obvious being that you can’t have objects whose lifetime is longer than the function itself. The second limitation is maximum stack size, which depends on operating system and/or threading implementation; while there are ways to extend the default limit on some platforms, on platforms with lots of threads and a 32-bit environment, you might start running out of memory quickly. Finally, while most threading implementations will allocate stack on demand (up to the limit), they usually can’t free unused stack memory back to the system before the thread dies.
Maybe the simplest way of performing automatic freeing of objects is reference counting. It means when an object is created and a reference to is given out, the reference count is set as one. Every time the reference is copied somewhere, the counter is incremented by one and every time a reference is deleted or goes out of scope, it is decremented by one. When the counter reaches zero, the object is deleted. This is actually the same thing that the C++ smart pointer std::shared_ptr (and partly std::weak_ptr) will do. It is also used internally in many “lighter” programming languages like Perl 5 and CPython.
Automatic garbage collection removes the burden of freeing objects from the programmer and can significantly simplify many programs. On the other hand, reference counting can incur quite a large overhead on the program, especially if there are many short-lived objects. This is especially true for multithreaded programs which need to ensure that the counters are used atomically.
Simple reference counting alone can not free objects which (directly or indirectly) refer to themselves, since the reference count will never go to zero, but many implementations provide a way to overcome this in some way. For example, in C++, std::weak_ptr is a reference which doesn’t increment the reference counter, so if you know you will be making cyclical references, you can still make sure the group of objects will get freed when all outside references to them are gone. There are also some special tricks in Perl and Python which will also try to get rid of self referencing objects, ie. Python will do a tracing garbage collection after certain amount of allocations have happened or when the program explicitly requests it.