Managing Selfish Threads in ColdFusion

As per usual, this came out of my work with Transfer, but it applies to any ColdFusion application.

So in the context of ColdFusion, what exactly are we referring to when we say Thread?  Generally the first thing we think of is <cfthread>, which executes some code on its own Thread.  But we should also remember that the original page that was executing is its own Thread as well.  If we run a scheduled task, that is also its own Thread.

Wikipedia defines a Thread very well:

"A thread in computer science is short for a thread of execution. Threads are a way for a program to split itself into two or more simultaneously (or pseudo-simultaneously) running tasks…"

So when looking at Threads, we can consider each of the following to be its own Thread, because it is:

  • Any ColdFusion page execution
  • Any remote CFC execution
  • Any Scheduled Task execution
  • Any CFThread execution

So what defines a Selfish Thread?

A Selfish Thread is a thread that takes up almost all of the CPU's processing time, leaving little or no room for any other Thread to use the CPU.

Code like this would be a good example of a Thread being selfish:


<cfscript>
    for(counter = 1; counter <= 10000; counter++)
    {
       writeOutput(counter & "<br/>");
    }
</cfscript>


It's a very tight loop, with no waiting, no pausing, and no 'room'
for any other processing to happen while this loop runs.

Now it should be noted that a Selfish Thread is not necessarily a
bad thing.  In many instances, we want this loop to complete without
waiting for any other Thread to interrupt it.  But in cases where Thread
execution can go on for a long time, this can be highly disruptive to
an application, as little else can be done during that time.

The common CF solution I often see for this is the scheduled task that
runs at 3:00am, so that it doesn't bother any of the users.  This can
work perfectly well for many applications, but what if your application
is in use 24 hours a day?  Or what if the task has to run every hour?
What do you do then?

Before we get into this too much, I want to make note of something: managing Threads is a bit of black magic, and a bit of trial and error.  Since Threads are managed differently per OS, and there are differences per JVM, some of these techniques will work and some will not, so make sure you test everything thoroughly so you know that it is effective for your OS and JVM configuration.

The other thing to note is that any Thread that is running is an actual instance of the Java class java.lang.Thread.  If at any point in time we want access to the actual Thread object that the given process is running on, we can run:


currentThread = createObject("java", "java.lang.Thread").currentThread();


This will return a reference to the currently executing Thread object, which will be very handy as we move along.
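As a minimal sketch of what that reference gives us, we can print a couple of details about the Thread running the current request.  getName() and getPriority() are standard java.lang.Thread methods; the actual name you see will depend on your server and how the request was started.

```cfml
<cfscript>
    // Grab the java.lang.Thread that is running this request.
    currentThread = createObject("java", "java.lang.Thread").currentThread();

    // The name varies by server and by how the request was started.
    writeOutput("Thread name: " & currentThread.getName() & "<br/>");

    // The priority is an integer from 1 (lowest) to 10 (highest).
    writeOutput("Thread priority: " & currentThread.getPriority() & "<br/>");
</cfscript>
```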

The first thing we should look at is <cfthread>.  CFThread has a 'priority' attribute that can be set to 'HIGH', 'NORMAL' or 'LOW', which should control
the level of priority that a Thread has.  For example, a HIGH priority
thread should have processing precedence over a LOW priority thread.

For example:

<cfthread action="run" name="foo1" priority="LOW">
<!--- do some processing --->
</cfthread>

In reality, I've not seen this actually do much (in my tests), and it does not seem to actually affect a Thread's Thread.getPriority(),
which we will talk about later.  That being said, there may be some
other mechanism under the hood, and it's not going to hurt anything if
you choose to use it.
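If you want to check this on your own server, a rough sketch along these lines should do it.  The thread name priorityCheck and the variable javaPriority are made up for this example; the idea is simply to record the Java priority from inside the thread and read it back once the thread has finished.

```cfml
<!--- Run a LOW priority thread and record what Java reports as its priority --->
<cfthread action="run" name="priorityCheck" priority="LOW">
    <cfset thread.javaPriority = createObject("java", "java.lang.Thread").currentThread().getPriority() />
</cfthread>

<!--- Wait for the thread to finish before reading its data --->
<cfthread action="join" name="priorityCheck" />

<cfoutput>Java priority inside the thread: #cfthread.priorityCheck.javaPriority#</cfoutput>
```

Running it again with priority="HIGH" and comparing the two numbers will show whether the attribute changes the underlying Java priority on your particular setup.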

From here, we can look at setting a Thread's priority, which can be
applied to any CF based Thread (i.e. pages, scheduled tasks, cfthread
etc).  A Thread's priority goes from 1 to 10, where 1 is the lowest
priority, and 10 is the highest.  5 is usually considered 'Normal'.

In theory, a lower priority Thread should give way to a higher priority
Thread whenever the higher priority Thread requires CPU processing
time.  As stated earlier, depending on JVM and OS, this may, or may not happen.
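Rather than remembering the numbers, the range is also available as static constants on java.lang.Thread.  A small sketch (the variable name threadClass is just for this example):

```cfml
<cfscript>
    // Reference to the java.lang.Thread class itself (no new Thread is created).
    threadClass = createObject("java", "java.lang.Thread");

    // Static constants defined on java.lang.Thread.
    writeOutput("Lowest: " & threadClass.MIN_PRIORITY & "<br/>");   // 1
    writeOutput("Normal: " & threadClass.NORM_PRIORITY & "<br/>");  // 5
    writeOutput("Highest: " & threadClass.MAX_PRIORITY & "<br/>");  // 10
</cfscript>
```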

To set the Thread's priority, all you need to do is grab the current thread, like we did above, and call:


currentThread.setPriority(2); //set it to a lower priority.
//do some processing...


Since ColdFusion tends to pool Threads (i.e. stores them for reuse), we
should reset the Thread's priority after we are done with it, so that
it doesn't stay that way when it gets used to execute another piece of
code. e.g.


priority = currentThread.getPriority();

currentThread.setPriority(2); //set it to a lower priority.

//do some expensive, long running processing...

currentThread.setPriority(priority); //reset it


This way, when the Thread gets re-used, the priority is not set to
something that is inappropriate for the processing it is doing.
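One thing to watch for is that if the expensive processing throws an error, the reset line never runs and the pooled Thread keeps its lowered priority.  A hedged sketch of one way around that, assuming a ColdFusion version whose cfscript supports finally and rethrow (ColdFusion 9 or later), and using doSomethingExpensive() as a stand-in for your own code:

```cfml
<cfscript>
    currentThread = createObject("java", "java.lang.Thread").currentThread();

    // Remember the original priority so it can be restored later.
    priority = currentThread.getPriority();

    try
    {
        currentThread.setPriority(2); // set it to a lower priority

        doSomethingExpensive(); // hypothetical long running function
    }
    catch (any e)
    {
        rethrow; // let the error bubble up as usual
    }
    finally
    {
        // Runs whether or not an error was thrown.
        currentThread.setPriority(priority);
    }
</cfscript>
```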

There are also mechanisms in Java that allow you to tell the JVM when it is a good time for the current thread to yield to other threads that need to do some processing.

This simply hints to the JVM that 'hey! now would be a really good time for me to pause for a second, if you wanted to do something else'.  The JVM can totally ignore this if it chooses, and depending on OS and JVM, it may well do.

To do this, we call the static method yield(), on java.lang.Thread, like so:

createObject("java", "java.lang.Thread").yield();

So we can now take our very selfish loop above, and do something similar to:

<cfscript>
    for(counter = 1; counter <= 10000; counter++)
    {
       writeOutput(counter & "<br/>");
       createObject("java", "java.lang.Thread").yield(); //here is a good place to pause
    }
</cfscript>

This is actually a very poor use of yield(), simply because we would
never want the server to pause while it is displaying some data, but it
shows how yield() works reasonably well.

The interesting thing is that yield() automatically works out which
thread it is being called from and acts on that one, unlike the
setPriority() method we saw above, which has to be called on a specific Thread object.
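Because yield() is static, there is also no need to call createObject() on every pass through the loop.  A minimal sketch, assuming a hypothetical doSomethingExpensive() function and an arbitrary choice of yielding once every 100 iterations:

```cfml
<cfscript>
    // Look up the java.lang.Thread class once, outside the loop.
    threadClass = createObject("java", "java.lang.Thread");

    for (counter = 1; counter <= 10000; counter++)
    {
        doSomethingExpensive(); // hypothetical unit of work

        // Hint to the JVM every 100 iterations that now is a good time to pause.
        if (counter mod 100 == 0)
        {
            threadClass.yield();
        }
    }
</cfscript>
```

How often to yield is a judgment call, and since the JVM is free to ignore the hint, it is worth measuring on your own OS and JVM before settling on a number.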

Quite probably the least useful, but the most consistent way of managing selfish threads, is by putting the thread to sleep, which will allow other Threads access to the CPU while that thread is asleep.

This is the least useful, as no matter what, the Thread
will pause.  Nothing else may be happening on the server, but the
Thread will pause anyway, which can mean wasted cycles for whatever it
is you are doing.

That being said, this will always work, no matter what JVM or OS you are on, so there is a trade-off.

There are three ways we can make the current thread sleep in ColdFusion.

In cfscript:

<cfscript>
    sleep(1000);
</cfscript>

Via a tag:

<cfthread action="sleep" duration="1000" />

And via Java:

currentThread = createObject("java", "java.lang.Thread").currentThread();
currentThread.sleep(1000);

In all three cases, the current thread will pause for 1000 milliseconds.
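In a long running loop, sleeping on every iteration is usually overkill.  One option is to work in batches and sleep briefly between them.  This is only a sketch: doSomethingExpensive() is a hypothetical function, and the batch size and pause length are made-up numbers you would need to tune.

```cfml
<cfscript>
    batchSize = 500;   // assumption: how many items to process between pauses
    pauseMillis = 50;  // assumption: how long to sleep after each batch

    for (counter = 1; counter <= 10000; counter++)
    {
        doSomethingExpensive(); // hypothetical unit of work

        // After every full batch, give other Threads a guaranteed window.
        if (counter mod batchSize == 0)
        {
            sleep(pauseMillis);
        }
    }
</cfscript>
```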

In a real world example, there is no reason we can't combine these
techniques.  If we had some processing we wanted to happen
asynchronously, but we knew it was going to take a while to complete,
we could do something like the following:

<cfthread action="run" name="foo1" priority="LOW">
    <cfscript>
    currentThread = createObject("java", "java.lang.Thread").currentThread();

    priority = currentThread.getPriority();

    currentThread.setPriority(3); //set it to a lower priority.

    for(counter = 1; counter <= 10000; counter++)
    {
       doSomethingExpensive();
       createObject("java", "java.lang.Thread").yield(); //could pause here
    }

    currentThread.setPriority(priority); //reset it
    </cfscript>
</cfthread>

This gives us multiple ways in which to tell Java to make sure that other Threads are able to access the CPU.

Next time you are looking at a long running, expensive process, you now have multiple options for how you want to manage it.


Comments

  • Danilo Celic | October 20, 2008

    I’ll preface this by stating I haven’t looked into what methods the Java Thread object has available to it, or what is available with the object that is returned by the currentThread method.

    Is there a specific reason that you’re creating a new thread object to perform the yield() rather than referencing the variable currentThread, or creating a thread variable once and then calling yield() on it rather than creating one for each iteration of your loop?

  • Mark | October 21, 2008

    @Danilo –
    yield() is a static method, so I was just highlighting that.

    I could quite happily have called it on currentThread, or used my own Thread variable, but I think that would have made it harder to understand.

    Remember this is example code, meant to explain a concept.

  • Danilo Celic | October 21, 2008

    @Mark, If I need this in practice, I’ll be sure to avoid creating an object 10,000 times since I can get away with it. Thanks for the response.