|
Performance Improvement techniques in Collections
This topic illustrates the performance improvement techniques in Collections with the following sections:
Overview of Collections
Collection is a group of objects. java.util package provides important types of collections. There are two fundamental types of
collections they are Collection and Map. Collection types hold a
group of objects, Eg. Lists and Sets where as Map types hold
group of objects as key, value pairs Eg. HashMap and Hashtable.
This section examples are tested on Windows 2000, millennium, 320mb RAM, JDK 1.3
and JDK1.2.2
Note: This section assumes that reader has some basic knowledge of Java Collections.
Optimization techniques in Lists
List types represent an ordered collection of objects.
ArrayList, Vector, Stack and
LinkedList are the List implementation classes.
All List types support basic operations - adding objects, removing objects, accessing objects and iterating
through the list. So which one to choose since all the list implementations support
these basic operations? Performance is different for each class based on
specific operations. So your choice is driven by the performance and the requirement options.
Your requirement could be
1. Thread safe collection
2. Size of collection (large or small collection)
3. Type of operation ( adding, removing, accessing or iterating )
If you want your collection to be thread safe then Vector or Stack must be used
because both have synchronized methods. While ArrayList and
LinkedList are not thread safe. Stack is meant for specific LIFO (last in - first
out) operations, this can be filtered down based on this specific requirement.
If you don't want your collection to be thread safe then you have can choose between
ArrayList or LinkedList. General concept from performance
point of view is that ArrayList gives better performance when accessing
and iterating objects whereas LinkedList gives better performance when adding and
removing objects. Although true in most cases, sometimes there is an exception. The
following are the benchmarks shown for each operation separately. Adding
objects: The table below shows time taken when I run ListAddTest.java
which adds elements at different positions and is run on different JDK versions to measure
performance bench marks. Time shown here is in milli seconds.
| Size of collection |
Type of operation |
JDK version |
ArrayList with out initialization |
ArrayList with initialization |
Vector with out initialization |
Vector with initialization |
LinkedList |
| 50000 objects |
Adding objects at end |
1.2 |
195 |
15 |
30 |
25 |
390 |
| 1.3 |
70 |
10 |
60 |
20 |
50 |
| Adding objects at middle |
1.2 |
2488 |
2513 |
2578 |
2508 |
162413 |
| 1.3 |
2543 |
2553 |
2619 |
2574 |
96553 |
| Adding objects at first |
1.2 |
5538 |
6053 |
5472 |
8377 |
45 |
| 1.3 |
6499 |
5848 |
5568 |
6078 |
30 |
| 10000 objects |
Adding objects at end |
1.2 |
215 |
35 |
70 |
60 |
325 |
| Adding objects at middle |
1.2 |
11191 |
11416 |
11001 |
10875 |
662172 |
| Adding objects at first |
1.2 |
115871 |
119962 |
118831 |
120293 |
85 |
conclusion is
| Type of operation |
ArrayList
with out initialization |
ArrayList
with initialization |
Vector with out initialization |
Vector with initialization |
LinkedList |
|
Adding objects at end |
fast (but slower than initialization) |
fast |
fast (but sligtly slower than initialization and slower than
ArrayList because of synchronization) |
fast(but sligtly slower than ArrayList because of
synchronization) |
fast ( but slightly slower than ArrayList and Vector) |
|
Adding objects at middle |
slow ( slower than when adding objects at last) |
slow ( slower than when adding objects at last) |
slow ( slower than when adding objects at last) |
slow ( slower than when adding objects at last) |
worse( worse than every operation) |
|
Adding objects at first |
slow ( slower than when adding objects at last and middle) |
slow ( slower than when adding objects at last and middle) |
slow ( slower than when adding objects at last and middle) |
slow ( slower than when adding objects at last and middle) |
slow ( slower than when adding objects at last and middle) |
The reason is
1. JDK 1.3 gives best performance because of HotSpot Virtual Machine.
2. The initial size for ArrayList and Vector is 10. ArrayList increases its
capacity by half approximately whenever its capacity reaches maximum (10) but
Vector increases its capacity by double whenever its capacity reaches maximum.
That is the reason why ArrayList takes more time than Vector if it is not
initialized with proper size though ArrayList is not synchronized. As soon as it reaches
its maximum capacity when adding objects, it creates one
more bigger array ( with 15 capacity for ArrayList approximately and 20 capacity
for Vector) and copies the previous and new objects into new array. Obviously it
is expensive to create new array and copy objects. So best approach is to
initialize the ArrayList and Vector with proper size using constructors or using
ensureCapacity(int capacity) which gives good performance. If you
initialize with proper size then the ArrayList gives better performance than
Vector.
ArrayList with initialization gives better performance than others because its
methods are non-synchronized. Synchronized methods are bit expensive because JVM
has to lock the objects whenever it finds synchronized methods.
Vector takes slightly more time than ArrayList
when you use JDK1.3 Hotspot JVM ,if you are not sure that
whether your collection needs to be thread safe or not then it is better to use Vector
to have higher safety.
You can convert an ArrayList as thread safe collection using
Collections.synchronizedList(ArrayList object) but it is more expensive than using a
Vector.
3. ArrayList and Vector maintain internal Object array ( Object[]) to store
objects. So whenever you add an object, they add it to the end of the array which is
fine as long as it doesn't reach its maximum capacity. If you want to
add an object at any other position, it creates a new object array and recopies
all the objects which is expensive. That is the reason why adding objects at
middle and beginning of collection takes a long time than when it is adding at the end.
4. LinkedList gives good performance when adding elements at the end and beginning
but it is worse when adding objects at middle because it needs to scan the node
whenever it needs to add an object. LinkedList cannot be initialized.
The constructors for ArrayList and Vector to initialize with proper size are
ArrayList( int initialcapacity)
Vector( int initialcapacity)
Vector( int initialcapacity, int capacityIncrement)
You can give incremental capacity in Vector to change the default increment in
capacity.
Here is the ListAddTest.java source code.
|
package
com.performance.util;
//This class gives the
List classes benchmarks for adding objects at end,middle and
first
import java.util.List;
import
java.util.ArrayList;
import
java.util.LinkedList;
import java.util.Vector;
public class ListAddTest
{
private
static final int NUM = 50000;
private
static String[] objs = null;
public void addLast(List
list) {
long
startTime = System.currentTimeMillis();
for(int i=0;i<NUM;i++){
list.add(objs[i]);
}
long
endTime = System.currentTimeMillis();
System.out.println("Time taken for adding Objects at End: "
+ (endTime
- startTime) );
}
public void addFirst(List
list) {
long
startTime = System.currentTimeMillis();
for(int i=0;i<NUM;i++){
list.add(0,objs[i]);
}
long endTime
= System.currentTimeMillis();
System.out.println("Time taken for adding Objects at First : "
+ (endTime
- startTime) );
}
public void
addMiddle(List list) {
long
startTime = System.currentTimeMillis();
for(int i=0;i<NUM;i++){
list.add(i/2,objs[i]);
}
long endTime
= System.currentTimeMillis();
System.out.println("Time taken for adding Objects at Middle : "
+ (endTime
- startTime) );
}
public void doTest(List
list) {
addLast(list);
clear(list);
addMiddle(list);
clear(list);
addFirst(list);
clear(list);
}
public void clear(List
col){
if(!col.isEmpty())
col.clear();
}
public static void
main(String[] args){
objs = new
String[NUM];
for(int i=0;i<NUM;i++){
objs[i] = "Object"+i;
}
ListAddTest
col = new ListAddTest();
ArrayList
collection1 = new ArrayList();
col.doTest(collection1);
ArrayList
collection1A = new ArrayList(NUM);
col.doTest(collection1A);
Vector
collection2 = new Vector();
col.doTest(collection2);
Vector
collection2A = new Vector(NUM);
col.doTest(collection2A);
LinkedList
collection4 = new LinkedList();
col.doTest(collection4);
}
} |
| |
Removing objects:
The table below shows time taken when I run
ListRemoveTest.java on removing elements from different positions to measure
performance bench marks. Time shown here is in milliseconds.
| Size of collection |
JDK version |
Type of operation |
ArrayList |
Vector |
LinkedList |
| 20000 objects |
1.3 |
Removing objects from end |
0 |
50 |
12 |
| Removing objects from middle |
370 |
302 |
2240 |
| Removing objects from first |
615 |
617 |
0 |
| 50000 objects |
Removing objects from end |
60 |
50 |
15 |
| Removing objects from middle |
1810 |
1810 |
46850 |
| Removing objects from first |
6155 |
5862 |
0 |
| 80000 objects |
Removing objects from end |
15 |
52 |
0 |
| Removing objects from middle |
5427 |
5972 |
139952 |
| Removing objects from first |
30827 |
36370 |
27 |
The conclusion for removing objects is
- All classes take approximately same time when removing objects from end
- ArrayList and Vector give similar performance with slight difference
because of JDK1.3 Hotspot JVM.
- LinkedList gives worst performance when removing objects from middle
(similar to adding objects at middle).
- LinkedList gives better performance when removing objects from the
beginning.
- Only LinkedList gives better performance when removing objects from the beginning.
Here is the ListRemoveTest.java source code
|
package
com.performance.util;
// This class shows the
benchmarks of List types for removing objects at end, middle and first
import java.util.List;
import
java.util.ArrayList;
import
java.util.LinkedList;
import java.util.Vector;
import java.util.Arrays;
public class
ListRemoveTest {
private
static final int NUM = 20000;
private
static Object[] objs = null;
public void
removeLast(List list) {
long
startTime = System.currentTimeMillis();
for(int i=NUM;i>0;i--){
list.remove(i-1);
}
long endTime
= System.currentTimeMillis();
System.out.println("Time taken for removing Objects at End: "
+ (endTime
- startTime) );
}
public void
removeFirst(List list) {
long
startTime = System.currentTimeMillis();
for(int i=0;i<NUM;i++){
list.remove(0);
}
long endTime
= System.currentTimeMillis();
System.out.println("Time taken for removing Objects at First : "
+ (endTime
- startTime) );
}
public void
removeMiddle(List list) {
long
startTime = System.currentTimeMillis();
for(int i=0;i<NUM;i++){
list.remove((NUM-i)/2);
}
long endTime
= System.currentTimeMillis();
System.out.println("Time taken for removing Objects at Middle : "
+ (endTime
- startTime) );
}
public void doTest(List
collection) {
collection.addAll(getList()); // fill the List
removeLast(collection);
clear(collection);
collection.addAll(getList()); // fill the List
removeMiddle(collection);
clear(collection);
collection.addAll(getList()); // fill the List
removeFirst(collection);
clear(collection);
}
public void clear(List
col){
if(!col.isEmpty())
col.clear();
}
public List getList(){
objs = new
Object[NUM];
for(int i=0;i<NUM;i++){
objs[i] = new Object();
}
return
Arrays.asList(objs);
}
public static void
main(String[] args){
ListRemoveTest col = new ListRemoveTest();
ArrayList
collection1 = new ArrayList();
col.doTest(collection1);
Vector
collection2 = new Vector();
col.doTest(collection2);
LinkedList
collection4 = new LinkedList();
col.doTest(collection4);
}
} |
Accessing objects:
The table below shows time taken when I run
ListAccessTest.java for accessing elemnts at different positions to measure
performance bench marks. Time shown here is in milliseconds.
| Size of collection |
JDK version |
Type of operation |
ArrayList |
Vector |
LinkedList |
| 25000 objects |
1.3 |
Accessing objects from end |
0 |
0 |
4701 |
| Accessing objects from middle |
0 |
0 |
21002 |
| Accessing objects from first |
0 |
0 |
15 |
| 50000 objects |
Accessing objects from end |
0 |
45 |
45100 |
| Accessing objects from middle |
0 |
0 |
112100 |
| Accessing objects from first |
0 |
20 |
15 |
| 100000 objects |
Accessing objects from end |
0 |
50 |
211850 |
| Accessing objects from middle |
0 |
30 |
457090 |
| Accessing objects from first |
0 |
0 |
15 |
The conclusion is
- ArrayList and Vector give best performance because
they access objects using index. Vector takes slightly more time but it is negligible.
- LinkedList gives worst performance when accessing objects at end and
middle because it has to scan nodes to access objects.
Here is the ListAccessTest.java source code
|
package
com.performance.util;
// This class shows the
benchmarks of List types for accessing objects at end, middle and first
import java.util.List;
import
java.util.ArrayList;
import
java.util.LinkedList;
import java.util.Vector;
import java.util.Arrays;
public class
ListAccessTest {
private
static final int NUM = 25000;
private
static Object[] objs = null;
public void
getFromLast(List list) {
long
startTime = System.currentTimeMillis();
for(int i=NUM;i>0;i--){
list.get(i-1);
}
long
endTime = System.currentTimeMillis();
System.out.println("Time taken for getting Objects at Last: "
+ (endTime
- startTime) );
}
public void
getFromFirst(List list) {
long
startTime = System.currentTimeMillis();
for(int i=0;i<NUM;i++){
list.get(0);
}
long endTime
= System.currentTimeMillis();
System.out.println("Time taken for getting Objects at First : "
+ (endTime
- startTime) );
}
public void
getFromMiddle(List list) {
long
startTime = System.currentTimeMillis();
for(int i=0;i<NUM;i++){
list.get(NUM/2);
}
long endTime
= System.currentTimeMillis();
System.out.println("Time taken for getting Objects at Middle : "
+ (endTime
- startTime) );
}
public void doTest(List
list) {
list.addAll(getList());
getFromLast(list);
getFromMiddle(list);
getFromFirst(list);
}
public void clear(List
col){
if(!col.isEmpty())
col.clear();
}
public static List
getList(){
objs = new
Object[NUM];
for(int i=0;i<NUM;i++){
objs[i] = new Object();
}
return
Arrays.asList(objs);
}
public static void
main(String[] args){
ListAccessTest col = new ListAccessTest();
ArrayList
collection1 = new ArrayList();
col.doTest(collection1);
Vector
collection2 = new Vector();
col.doTest(collection2);
LinkedList
collection4 = new LinkedList();
col.doTest(collection4);
}
} |
Iterating collection:
Iterating collection using all three types of classes ,ArrayList ,Vector and
LinkedList gives similar performance because they need not do any extra work
they simply iterate one by one. So I did not give any benchmark results.
You can use any Iterator . But using ListIterator gives
more flexibility than Iterator and Enumeration. You can traverse both sides
Optimization techniques in Sets
Set is a collection of unique objects, it doesn't allow duplicate objects and
modification of existing objects. Set types also allow basic operations
like adding objects, removing objects, accessing objects, iterating objects but
do not allow modification. There are two implementations of the Set
interface they are HashSet and TreeSet.
HashSet gives better performance than TreeSet because , TreeSet is an ordered
collection of objects and the objects are sorted while they are inserted in the
TreeSet where as in case of HashSet objects are added in an adhoc manner. It is
expensive to do all basic operations in TreeSet because it has to compare and
sort every object. We can get better performance by using a HashSet and
converting it to a TreeSet later on.
HashSet and TreeSet are backed by HashMap and TreeMap respectively. Whenever
we use a HashSet we can specify an initial capacity and load factor using
constructors. The default
size for a HashSet is 11 and it's load factor is 0.75. Load factor determines at
which capacity HashSet has to be resized. It's internal structure will become
double in size when it reaches it's maximum capacity based on load factor. HashSet
scales well when it is initialized with proper size and default load
factor. When you know the the number of objects to be added, it is better to
initialize with that capacity and put load factor as 1.0f. The objects in
HashSet are stored and retrieved through hash code which provides constant look
up time. I did not give any
bench marks for these two Sets because We don't have many options here to
compare and evaluate. We have two Sets to choose. Use TreeSet if you want sorted
collection otherwise use HashSet.
The constructors for the HashSet to initialize with proper size are:
HashSet(int initialcapacity)
HashSet(int initialcapacity, float loadfactor)
Optimization techniques in
Maps
Map is a collection of key and value object associations. You can do all basic operations as in Lists and Sets such as adding , removing ,and accessing key-value pairs.
There are four types of Map implementations
they are HashMap, Hashtable, WeakHashMap and TreeMap.
HashMap, Hashtable and WeakHashMap have similar implementations. TreeMap is
meant for sorted collection, this can be filtered down based on the requirement.
Then we have other three types to choose from. The choice again depends upon your requirement. Your requirement
could be
1. Thread safe
2. Type of operation ( basic operations )
HashMap and WeakHashMap are not synchronized whereas Hashtable is
synchronized.
WeakHashMap is a special purpose map which uses an internal hashtable. When
there are no more references to key object except weak reference maintained by
WeakHashMap, the garbage collector reclaims the key object and mapping between
key object and value object is also reclaimed, if the value object does not have
any other references then the value object is also reclaimed by the garbage
collector.
If you want your Map type collection to be thread safe, then you need to use Hashtable otherwise use HashMap. HashMap gives better performance than Hashtable
because of it's non-synchronized methods. The reason I did not give any bench
marks for Map types is that it is pretty straight forward to choose Map type
depending on requirement .
You can improve performance by using proper initial size and
load factor in the constructor for all types of Map types except TreeMap.
The constructors are
HashMap(int initialcapacity)
HashMap(int initialcapacity, float loadfactor)
Hashtable(int initialcapacity)
Hashtable(int initialcapacity, float loadfactor)
WeakHashMap(int initialcapacity)
WeakHashMap(int initialcapacity, float loadfactor)
When the number of objects exceed loadfactor capacity, then the capacity of the class increases to (2*capacity + 1). The default load factor is 0.75.
All these classes work in accordance with hash values which are
used to identify the value objects. The key objects are converted to integer
called hash code by using hashing algorithms which is used as an index for value
objects. Hash code must be same for two equal objects, i.e. when tested with the equals() method,it must return true. The hash code is determined by using hashCode() method.
The hash code of a collection is determined by
cumulating all the hash codes of the associated objects.
Key Points
Lists:
-
Use ArrayList with proper initialization if you don't
want thread safe for the collection whenever you add/remove/access
objects at end and middle of collection.
- Use Vector with proper initialization if you want
thread safe for the collection whenever you add/remove/access objects at
end and middle of collection.
- Use LinkedList if you don't want thread safe for
the collection whenever you add/remove/access objects at beginning of
collection.
- Use synchronized LinkedList if you want thread safe
for the collection whenever you add/remove/access objects at beginning of
collection.
- Use ListIterator than Iterator and Enumeration for
List types
Sets:
- Use HashSet for maintaining unique objects if you
don't want thread safe for the collection for all basic(add/remove/access)
operations otherwise use synchronized HashSet for thread safe.
- Use TreeSet for ordered and sorted set of unique
objects for non-thread safe collection otherwise use synchronized TreeSet
for thread safe
Maps:
- Use HashMap for non-thread safe map collection
otherwise use Hashtable for thread safe collection.
- Use TreeMap for non-thread safe ordered map
collection otherwise use synchronized TreeMap for thread safe.
|
|