Technical Musings: September 2013

In this blog post, I want to talk about the various Java reference types. Java has four types of references:

Strong Reference
Soft Reference
Weak Reference
Phantom Reference

The strongest of the four is the Strong Reference and the weakest is the Phantom Reference. I am not going to talk about the Strong Reference, as that is your normal reference, but will concentrate on the other three, as they are more interesting.

There are a lot of links available that describe the three references, but I could not find a post with code samples explaining their behaviour. Hopefully this post will supplement the other posts in your understanding of references. One good link that you should read is https://weblogs.java.net/blog/2006/05/04/understanding-weak-references. It explains all the reference types in detail, so I will not try to explain them again but will supplement it with code samples.

Soft Reference

A Soft Reference is weaker than a Strong Reference. The JVM is allowed to reclaim an object that is only softly reachable, and in practice it tends to do so only when it is running low on memory. Here is a line from the javadoc that is of interest to us: http://docs.oracle.com/javase/7/docs/api/java/lang/ref/SoftReference.html

All soft references to softly-reachable objects are guaranteed to have been cleared before the virtual machine throws an OutOfMemoryError

OK, let us test this. I will use a class named Person, as shown below

public class Person {
    private final String name;
    private final int age;

    public Person(final String name, final int age) {
        this.name = name;
        this.age = age;
    }

    public String getName() {
        return name;
    }

    public int getAge() {
        return age;
    }

    @Override
    public String toString() {
        return "Person [name=" + name + ", age=" + age + "]";
    }
}

In our test case we will create many instances of Person, each held only through a Soft Reference, so that the JVM has to reclaim some of them as it approaches an out-of-memory error

        
        Map <String,SoftReference<Person>> map = new HashMap<String,SoftReference<Person>>();         
        Person person = new Person("tabiul", 35);
        map.put("tabiul",new SoftReference<Person>(person));
        for(int i = 0 ; i < 3000; i++){
            SoftReference<Person> ref = new SoftReference<Person>(new Person("tabiul" + i, 35));
            map.put("tabiul" + i, ref);
            
        }
        System.out.println(map.size());
        System.out.println(map.get("tabiul").get());

Running the above code with the JVM options -Xms2m -Xmx2m, we get the expected output

   3001
   Person [name=tabiul, age=35]

But if we now change the loop from 3000 to, let's say, 4000, the output is as below

   
   4001
   null

As you can see, as the JVM approached its memory limit it cleared some of the Soft References. So the question now is: when should I use a Soft Reference? A good answer comes from the javadoc for SoftReference

Direct instances of this class may be used to implement simple caches; this class or derived subclasses may also be used in larger data structures to implement more sophisticated caches. As long as the referent of a soft reference is strongly reachable, that is, is actually in use, the soft reference will not be cleared. Thus a sophisticated cache can, for example, prevent its most recently used entries from being discarded by keeping strong referents to those entries, leaving the remaining entries to be discarded at the discretion of the garbage collector.
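To make the "simple cache" idea from the javadoc a little more concrete, here is a minimal sketch. The class name SoftCache and its methods are my own invention for illustration and are not part of the JDK; the sketch only assumes that a cleared SoftReference returns null from get().

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not a JDK class): a tiny cache whose values are held softly.
class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<K, SoftReference<V>>();

    void put(K key, V value) {
        map.put(key, new SoftReference<V>(value));
    }

    // Returns the cached value, or null if it was never cached
    // or the garbage collector has already cleared it.
    V get(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            map.remove(key); // drop the stale entry so the map does not keep empty references
        }
        return value;
    }
}
```

A caller has to treat a null result as a cache miss and rebuild the value. As long as the caller keeps its own strong reference to a value, that entry cannot be cleared.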

Weak Reference

A Weak Reference is weaker than a Soft Reference. The JVM will reclaim an object once it is only weakly reachable. Below is the relevant excerpt from the javadoc http://docs.oracle.com/javase/7/docs/api/java/lang/ref/WeakReference.html

Suppose that the garbage collector determines at a certain point in time that an object is weakly reachable. At that time it will atomically clear all weak references to that object and all weak references to any other weakly-reachable objects from which that object is reachable through a chain of strong and soft references. At the same time it will declare all of the formerly weakly-reachable objects to be finalizable. At the same time or at some later time it will enqueue those newly-cleared weak references that are registered with reference queues.

Let us try to simulate this behaviour as shown below

   Person person = new Person("tabiul", 35);
   WeakReference<Person> ref = new WeakReference<Person>(person);
   Map<String,WeakReference<Person>> map = new HashMap<String,WeakReference<Person>>();
   map.put("tabiul", ref);
   person = null;
   System.out.println(map.get("tabiul").get());

You might be surprised to see that it displays this

  Person [name=tabiul, age=35]

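But this makes sense: even though the person object is eligible for GC, it has not been garbage collected yet. We can see the expected behaviour if we add the System.gc() call below (line 6). Keep in mind that System.gc() is only a request to the JVM, so the result can vary between JVMs and settings.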

   Person person = new Person("tabiul", 35);
   WeakReference<Person> ref = new WeakReference<Person>(person);
   Map<String,WeakReference<Person>> map = new HashMap<String,WeakReference<Person>>();
   map.put("tabiul", ref);
   person = null;
   System.gc();
   System.out.println(map.get("tabiul").get());

Now we will see what we really want to see

  null

At this point, you might wonder what the difference is between using a WeakReference and using a normal Strong Reference. Let us find out by changing our code as below

   Person person = new Person("tabiul", 35);
   Map<String,Person> map = new HashMap<String, Person>();
   map.put("tabiul", person);
   person = null;
   System.gc();
   System.out.println(map.get("tabiul"));

Below is the output

  Person [name=tabiul, age=35]

So you can see that with a WeakReference the GC is free to reclaim the object, whereas an object that is still held by a normal Strong Reference (here, through the map) is not reclaimed. So when should we use a WeakReference? Let me quote from the javadoc itself

Weak references are most often used to implement canonicalizing mappings

Now, you might wonder what the heck a canonicalizing mapping is. A good link for this is http://c2.com/cgi/wiki?CanonicalizedMapping
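In short, a canonicalizing mapping hands out one shared instance for every group of equal values. Here is a rough sketch of how that could look with weak references. The class Canonicalizer and its method canonical are names I made up for illustration; the sketch assumes the values have sensible equals() and hashCode() implementations.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical helper (not a JDK class): returns one shared instance per equal value.
class Canonicalizer<T> {
    // WeakHashMap holds its keys weakly. The value is wrapped in a WeakReference too,
    // because a strong value would keep its own key reachable forever.
    private final Map<T, WeakReference<T>> pool = new WeakHashMap<T, WeakReference<T>>();

    synchronized T canonical(T candidate) {
        WeakReference<T> ref = pool.get(candidate);
        T existing = (ref == null) ? null : ref.get();
        if (existing != null) {
            return existing; // an equal instance is already registered
        }
        pool.put(candidate, new WeakReference<T>(candidate));
        return candidate; // the candidate becomes the canonical instance
    }
}
```

Because both key and value are weak, a canonical instance that nobody uses any more can still be garbage collected, which is the reason weak references fit this use case.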

Phantom Reference

The Phantom Reference is the weakest of all the references, and there is not much you can do with it: its get() method always returns null, and by the time you are notified about it, the object is ready to be reclaimed. So you might wonder what the point of a Phantom Reference is. Let us turn to the javadoc on this http://docs.oracle.com/javase/7/docs/api/java/lang/ref/PhantomReference.html

Phantom references are most often used for scheduling pre-mortem cleanup actions in a more flexible way than is possible with the Java finalization mechanism.

As such, we should use a phantom reference if we wish to know when a certain object is ready to be garbage collected, so that we can do any necessary cleanup.
This situation is very hard to simulate, as there is no guarantee on when the GC will decide to collect, so I do not have a runnable test for this. What I can show here is how we can use it. One possible use case is that we have a large image loaded and we do not want to load a new image until the old one is ready to be garbage collected. Let us see how we can do that

       // imports needed: java.lang.ref.*, java.util.HashMap, java.util.Map, javax.swing.ImageIcon
       ImageIcon img = new ImageIcon("icon1.jpg");
       Map<String,PhantomReference<ImageIcon>> map = new HashMap<String,PhantomReference<ImageIcon>>();
       ReferenceQueue<ImageIcon> q = new ReferenceQueue<ImageIcon>();
       map.put("icon", new PhantomReference<ImageIcon>(img, q));
       Thread t = new Thread(new LoadNewImage(q, map));  // create a new thread that loads new image when the old is gced
       t.start();

       class LoadNewImage implements Runnable {
           private final ReferenceQueue<ImageIcon> q;
           private final Map<String,PhantomReference<ImageIcon>> map;

           public LoadNewImage(ReferenceQueue<ImageIcon> q, Map<String,PhantomReference<ImageIcon>> map){
              this.q = q;
              this.map = map;
           }

          public void run() {
               try {
                     // remove() blocks until the GC enqueues the phantom reference of the old image
                     Reference<? extends ImageIcon> old = q.remove();
                     old.clear();
                     // load new image; real code would also keep a strong reference to it somewhere
                     ImageIcon newImage = new ImageIcon("icon2.jpg");
                     map.put("icon", new PhantomReference<ImageIcon>(newImage, q));
               } catch (InterruptedException e) {
                  Thread.currentThread().interrupt(); // restore the interrupt flag and stop waiting
               }
          }
      }
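One thing that can be shown reliably is that a phantom reference never gives the object back. The small sketch below demonstrates that, and also shows one way to poll the queue. PhantomDemo and its method names are made up for this post, and whether the reference is actually enqueued depends on the garbage collector, so the second method may well return false.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

// Hypothetical demo class, only meant to illustrate the API.
class PhantomDemo {
    // PhantomReference.get() returns null even while the referent is still strongly reachable.
    static boolean getReturnsNull() {
        Object referent = new Object();
        ReferenceQueue<Object> q = new ReferenceQueue<Object>();
        PhantomReference<Object> ref = new PhantomReference<Object>(referent, q);
        return ref.get() == null;
    }

    // Asks for a GC a few times and reports whether the phantom reference was enqueued.
    // There is no guarantee that it will be, so treat the result as best effort.
    static boolean enqueuedAfterGc(int attempts) {
        ReferenceQueue<Object> q = new ReferenceQueue<Object>();
        PhantomReference<Object> ref = new PhantomReference<Object>(new byte[1024], q);
        try {
            for (int i = 0; i < attempts; i++) {
                System.gc();                       // only a hint to the JVM
                Reference<?> found = q.remove(100); // wait up to 100 ms for an enqueued reference
                if (found == ref) {
                    found.clear();
                    return true;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false;
    }
}
```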

That's about it. Any comments and feedback are welcome.

This is a summary of the section "5.5" from the wonderful book Java Concurrency in Practice. Do read that book for the full details.

Latches

Latches are like gates. The gate stays closed until certain conditions are met, and then it opens. Latches are ideal for cases like the ones below

Some initialization needs to be completed before other threads can proceed
A service must not start until all the other services on which it depends have started

CountDownLatch is one implementation of a latch. Let's see some code on how we can use it. Let's say we want to do some initialization and want our worker threads to start processing only after the initialization is completed

  
import java.util.concurrent.CountDownLatch;

void foo() {
    CountDownLatch latch = new CountDownLatch(1);
    Thread initThread = new Thread(new Init(latch));
    initThread.start();  // start the initialization thread

    int numberOfWorkers = 10;

    for (int i = 0; i < numberOfWorkers; i++) {
        Thread worker = new Thread(new Worker(latch));
        worker.start();  // start the worker thread
    }
}

class Init implements Runnable {
    private final CountDownLatch latch;

    Init(CountDownLatch latch) {
        this.latch = latch;
    }

    public void run() {
        doInitialization();
        latch.countDown();  // open the gate
    }
}

class Worker implements Runnable {
    private final CountDownLatch latch;

    Worker(CountDownLatch latch) {
        this.latch = latch;
    }

    public void run() {
        try {
            latch.await(); // wait for the gate to open
            doProcessing();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag and give up
        }
    }
}
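The second use case from the list above, a service that waits for all of its dependencies, works the same way, only with a count greater than one. Below is a small self-contained sketch. ServiceStartup and the log messages are invented for illustration, and the "dependencies" are just threads that write a line to a list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical example class: the main service starts only after every dependency has started.
class ServiceStartup {
    static List<String> start(int dependencies) {
        final List<String> log = Collections.synchronizedList(new ArrayList<String>());
        final CountDownLatch ready = new CountDownLatch(dependencies);

        for (int i = 0; i < dependencies; i++) {
            final int id = i;
            new Thread(new Runnable() {
                public void run() {
                    log.add("dependency " + id + " started");
                    ready.countDown(); // one dependency less to wait for
                }
            }).start();
        }

        try {
            ready.await(); // blocks until the count reaches zero
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for dependencies", e);
        }
        log.add("main service started");
        return log;
    }
}
```

The order of the dependency lines can differ from run to run, but "main service started" is always the last entry, because await() returns only after every countDown() call has happened.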

Future Task

FutureTask is one implementation of the Future interface. The idea of a Future (aka Promise) is simple. You submit a job whose result you do not need immediately but at some point in the future (get it now? :) ). Once you have submitted your job, you can go and do other important stuff, and when the time comes to use the result of the job, you ask the Future object that you created. At that point there are two possibilities:

The job is completed and you get the result immediately.
The job is yet to be completed or has not started yet. In this case, you have two options

If the job is in the middle of processing, you can wait for it (blocking) or cancel the job
If the job has not started yet, you can start it and wait for the result (blocking) or cancel the job

Time for some code. Let's say we have a job that requires intensive computation. We do not need the result immediately but will need it later as part of a larger formula.

void process() throws InterruptedException, ExecutionException {
     FutureTask<Integer> f = new FutureTask<Integer>(new ComplicatedProcessing(1, 2));
     Thread t = new Thread(f);
      t.start();
      doOtherProcessing();
      if (f.isDone()) { // job is done
         System.out.println(f.get());
      }
      else {

         try {
            System.out.println(f.get(1, TimeUnit.MINUTES)); // we wait for 1 minute and see if it get done or not
         }
         catch (TimeoutException e) {
            f.cancel(true); // we cancel the job, even if it is in between
                            // processing
         }

      }

   }

   class ComplicatedProcessing implements Callable<Integer> {
      private int a;
      private int b;

      public ComplicatedProcessing(int val1, int val2) {
         this.a = val1;
         this.b = val2;
      }

      public Integer call() {

         return a * b;
      }
   }
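The "not started yet" case from the list above is easy to try out as well, because FutureTask is itself a Runnable. The sketch below is only an illustration: FutureTaskStates and its methods are names I made up, and the "job" is the same trivial multiplication as before.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

// Hypothetical example class showing what can be done with a task that has not started yet.
class FutureTaskStates {
    private static FutureTask<Integer> newTask() {
        return new FutureTask<Integer>(new Callable<Integer>() {
            public Integer call() {
                return 1 * 2;
            }
        });
    }

    // Option 1: start the job ourselves on the calling thread and read the result.
    static int runInline() {
        FutureTask<Integer> task = newTask();
        task.run(); // executes call() on the current thread
        return getQuietly(task);
    }

    // Option 2: cancel the job before it starts; get() then throws CancellationException.
    static boolean cancelBeforeStart() {
        FutureTask<Integer> task = newTask();
        boolean cancelled = task.cancel(false); // true, because the task had not completed
        task.run();                             // has no effect on a cancelled task
        try {
            getQuietly(task);
            return false;
        } catch (CancellationException expected) {
            return cancelled && task.isCancelled();
        }
    }

    private static int getQuietly(FutureTask<Integer> task) {
        try {
            return task.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```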

Semaphores

A semaphore allows us to hand out permits (permissions) to access a certain resource. You can think of semaphores as licensing. Assume I have some resource and at any given time I can only allow four people to access it. In this case I will give out four licenses. Once a thread has a license it can access the resource, otherwise it needs to wait. Once a thread is done with the resource, it returns the license to the semaphore.

Semaphores are useful for implementing resource pools such as a pool of database connections. Let's say we want to allow at most four connections to the database at any given time. We can easily do that by using a semaphore. Let's see how in some simple code, using the Java class Semaphore

class DatabasePool {
   private final Semaphore sem;
   public DatabasePool(int maximumConnection){
      this.sem = new Semaphore(maximumConnection);
      //do other necessary stuff to setup database connection and create the pool
   } 

   public Connection getConnection() throws InterruptedException {
        sem.acquire(); // try to get the license. If maximum is reached then it will block. There are other options
                       // tryAcquire does not block. It is possible to set the timeOut option in tryAcquire 
        return connectionFromPool();
   }

   //once the connection is no longer needed then it can be returned to the pool
   public void done(Connection connection){
      returnConnectionToPool(connection); // put the connection back first
      sem.release(); // then release the license, so a waiting thread always finds a connection
   }
}
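The comment in getConnection mentions tryAcquire, so here is a small sketch of how the blocking and non-blocking variants behave when the licenses run out. PermitDemo is a made-up class for illustration, and the 50 ms timeout is an arbitrary value.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical example class: two licenses, four attempts to get one.
class PermitDemo {
    static boolean[] run() {
        Semaphore sem = new Semaphore(2);
        try {
            boolean first = sem.tryAcquire();   // succeeds, one license left
            boolean second = sem.tryAcquire();  // succeeds, no license left
            // No license is available, so this gives up after waiting 50 ms.
            boolean third = sem.tryAcquire(50, TimeUnit.MILLISECONDS);
            sem.release();                      // hand one license back
            boolean fourth = sem.tryAcquire();  // succeeds again
            return new boolean[] { first, second, third, fourth };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

The returned array is true, true, false, true: the third attempt fails because both licenses are taken, and the fourth succeeds after one license has been released.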

Barriers

Barriers are similar to latches in that they block a group of threads until some event occurs. A good analogy from the book is "Everyone meet at McDonald's at 6:00, once you get there, stay there until everyone shows up, and then we'll figure out what we're doing next.". A latch, in contrast, is like "Everyone go to McDonald's, and you can go in and have your meal as soon as McDonald's is open". As you can see, with a latch, once the gate is open each thread does its own processing in its own time, without waiting for the other threads. With a barrier, each thread waits until all the threads have arrived, and only then does the barrier open.

So what scenario is a barrier good for? It is good for cases where there is some dependency among threads. Assume I have a task that I break down into smaller tasks run by different threads. Since different parts of the task might take different amounts of time, some threads will be slower and others faster. A barrier allows us to make each thread wait for the other threads before executing the next step.

Let's see some code using a barrier. Java's CyclicBarrier implements this concept. Let's say I want to add all the numbers in a 2D array. Each worker thread will sum one row. Once all the workers have reached the barrier, the barrier action adds up the row totals.

  
private List<Integer> list = Collections.synchronizedList(new ArrayList<Integer>());

   public void process() {
      int[][] numbers =
         new int[][] { { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 }, { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 } };
      CyclicBarrier barrier = new CyclicBarrier(numbers.length, new Runnable() {
         //executed once all the threads are completed
         @Override
         public void run() {
            int total = 0;
            for (int n : list) {
               total += n;
            }
            System.out.println("The total is :" + total);
         }
      });

      int i = 0;
      for (i = 0; i < numbers.length; i++) {
         Thread t = new Thread(new Add(barrier, numbers[i]));
         t.start();
      }
   }

   private class Add implements Runnable {
      private int[] array;
      private final CyclicBarrier barrier;

      public Add(CyclicBarrier barrier, int[] array) {
         this.barrier = barrier;
         this.array = array;
      }

      public void run() {
         try {
            int sum = 0;
            for (int i = 0; i < array.length; i++) {
               sum += array[i];
            }
            list.add(sum);
            this.barrier.await();  // wait for the other thread
         }
         catch (InterruptedException e) {
            e.printStackTrace();
         }
         catch (BrokenBarrierException e) {
            e.printStackTrace();
         }
      }
   }
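The "wait for each other before the next step" part can be seen more clearly when the work has more than one step. A CyclicBarrier resets itself after it opens, so the same barrier can be used for every step. The sketch below assumes two steps with no real work in them; TwoPhaseWork is a name I invented for the example.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class: every worker must finish step 1 before anyone begins step 2.
class TwoPhaseWork {
    // Returns how many times the barrier action ran, i.e. how many steps were completed.
    static int run(int workers) {
        final AtomicInteger stepsCompleted = new AtomicInteger();
        final CyclicBarrier barrier = new CyclicBarrier(workers, new Runnable() {
            public void run() {
                stepsCompleted.incrementAndGet(); // runs once each time all workers have arrived
            }
        });

        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    try {
                        // ... step 1 work would go here ...
                        barrier.await(); // wait until every worker has finished step 1
                        // ... step 2 work would go here ...
                        barrier.await(); // the same barrier is reused for step 2
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } catch (BrokenBarrierException e) {
                        throw new IllegalStateException(e);
                    }
                }
            });
            threads[i].start();
        }

        try {
            for (Thread t : threads) {
                t.join(); // wait for all workers to finish both steps
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return stepsCompleted.get();
    }
}
```

With any number of workers the method returns 2, one barrier opening per step.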

That's about it. I hope that you have learned something from this blog post.

Thursday, 26 September 2013

Java References