Lec 11 - Parallelization and Asynchronous
Threads
A thread is a single path of execution within a program. Normally, when we run a Java program, it's single-threaded, which means it does one thing at a time. But sometimes, we want a program to do multiple things at once, like downloading a file while keeping the UI responsive. That’s where threads come in.
Create a Thread
This can be done using the Thread class with a Runnable. For example,
// Thread 1 prints _
new Thread(() -> {
    for (int i = 0; i < 50; i++) {
        System.out.print("_");
    }
}).start();
// Thread 2 prints *
new Thread(() -> {
    for (int i = 0; i < 50; i++) {
        System.out.print("*");
    }
}).start();
Runnable
Runnable is a functional interface (has only one method: run()). It is used to define a task to be executed by a thread. It can be implemented using a lambda expression (as shown above). For example,
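A minimal sketch of two ways to write a Runnable, assuming a hypothetical class name `RunnableDemo`. `Thread.join()` is used here only so that the caller waits for each task before reading the log.

```java
class RunnableDemo {
    // Runs two tasks on separate threads, one after the other, and returns what they wrote.
    static String runTasks() {
        StringBuffer log = new StringBuffer(); // thread-safe buffer

        // Style 1: an anonymous class that implements run()
        Runnable first = new Runnable() {
            @Override
            public void run() {
                log.append("anonymous ");
            }
        };
        // Style 2: a lambda, possible because Runnable has a single abstract method
        Runnable second = () -> log.append("lambda");

        for (Runnable task : new Runnable[] {first, second}) {
            Thread t = new Thread(task);
            t.start();
            try {
                t.join(); // wait for this thread to finish before starting the next
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(runTasks()); // prints "anonymous lambda"
    }
}
```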
Name of Thread
Every thread has a name. We can use getName() to get the name of a thread, which can help us debug or understand how threads are being used. For example,
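A small sketch, assuming a hypothetical thread name `worker-1`. The exact default names (such as `main` or `Thread-0`) depend on the JVM and on how many threads were created earlier, so they are not asserted here.

```java
class ThreadNameDemo {
    // Returns the name given to a thread at construction time.
    static String workerName() {
        Thread worker = new Thread(() -> { }, "worker-1"); // explicit, hypothetical name
        return worker.getName();
    }

    public static void main(String[] args) {
        // Name of the thread that is running main()
        System.out.println(Thread.currentThread().getName());
        // Name of a thread created without an explicit name
        new Thread(() -> System.out.println(Thread.currentThread().getName())).start();
        // Name of a thread created with an explicit name
        System.out.println(workerName());
    }
}
```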
To get a closer look at getName(), we can use a parallel stream example:
Sequential Stream means single-threaded.
Parallel stream means multi-threaded.
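The difference can be observed by recording which threads touch the elements of a stream. This is a sketch under assumptions: the class and method names are hypothetical, and the number of threads a parallel stream actually uses depends on the machine.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

class StreamThreadNames {
    // Collects the names of all threads that processed at least one element.
    static Set<String> threadsUsed(boolean parallel) {
        Set<String> names = ConcurrentHashMap.newKeySet(); // thread-safe set
        IntStream numbers = IntStream.range(0, 1000);
        if (parallel) {
            numbers = numbers.parallel();
        }
        numbers.forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        System.out.println("sequential: " + threadsUsed(false)); // a single thread name
        System.out.println("parallel:   " + threadsUsed(true));  // possibly several names
    }
}
```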
Pause a Thread
This can be done using Thread.sleep(milliseconds), a static method that pauses the current thread for the specified number of milliseconds. A few things to note when we pause a thread:
The thread is put to sleep, meaning it temporarily stops execution.
During that time, the thread doesn’t do anything — it’s like it’s napping.
After the time is up, the thread becomes eligible to run again, but doesn't necessarily start immediately. The OS scheduler decides when exactly it gets CPU time again.
While a thread is sleeping, other threads can keep running.
For example,
Once started, the findPrime thread does some heavy computation in the background. Meanwhile, the main thread runs a while loop:
The main thread checks if findPrime is still running using .isAlive().
If yes, the main thread sleeps for 1 second, then prints ".".
Repeats until the findPrime thread finishes.
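The loop described above might look like the following sketch. It is a reconstruction under assumptions: `nthPrime` is a hypothetical stand-in for the heavy computation, and the polling interval is a parameter so the demo can run with 1 second as in the text or with something shorter.

```java
class SleepDemo {
    // Hypothetical heavy computation: the n-th prime, found by trial division.
    static int nthPrime(int n) {
        int count = 0;
        int candidate = 1;
        while (count < n) {
            candidate++;
            boolean isPrime = true;
            for (int d = 2; (long) d * d <= candidate; d++) {
                if (candidate % d == 0) {
                    isPrime = false;
                    break;
                }
            }
            if (isPrime) {
                count++;
            }
        }
        return candidate;
    }

    // Starts the computation on its own thread and prints "." while waiting.
    static int waitWithDots(int n, long pollMillis) {
        int[] result = new int[1];
        Thread findPrime = new Thread(() -> result[0] = nthPrime(n));
        findPrime.start();
        while (findPrime.isAlive()) {
            try {
                Thread.sleep(pollMillis); // only the main thread pauses here
                System.out.print(".");
            } catch (InterruptedException e) {
                System.out.print("interrupted");
            }
        }
        System.out.println();
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(waitWithDots(200_000, 1000));
    }
}
```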
Asynchronous Programming
We have seen a lot about Thread above. However, Thread has the following limitations:
no return value: there is no method in Thread that returns a value.
hard to specify the execution order: there is no method that specifies which thread starts after another thread completes.
overhead: the creation and deletion of Thread in Java takes up "some" resources.
So, to overcome all these limitations, luckily, we have CompletableFuture in Java. In CS2030S, the use of CompletableFuture is to save us from the trouble of dealing with Thread.
CompletableFuture<T>
Basically, CompletableFuture<T> is a monad that encapsulates a value that is either there or not there yet (just another wrapper, like the Maybe and Lazy we have learned before). Such an abstraction is also known as a promise in other languages: it encapsulates the promise to produce a value.
CompletableFuture Rule of Thumb
CF(f).then(g) means: start g only after f has been completed.
Examples: thenRun, thenCombine, thenApply, etc.
static CF.async(g) means: start g on a new thread.
Examples: supplyAsync, runAsync, etc.
CF(f).then...async(g) means: start g only after f has been completed, but use a new thread.
Examples: thenRunAsync, thenApplyAsync, etc.
Difference between run and supply: run executes a void function while supply executes a function with a return value.
Create a CompletableFuture
In the lecture, the following four methods are introduced to create a CompletableFuture.
Use the completedFuture method
Creates a CompletableFuture that is already completed with the given value.
Example
This is useful when you need to return a CompletableFuture but already have the result.
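A minimal sketch of completedFuture, assuming a hypothetical method `cachedAnswer` that already knows its result:

```java
import java.util.concurrent.CompletableFuture;

class CompletedFutureDemo {
    // Hypothetical lookup whose answer is already known, so nothing runs asynchronously.
    static CompletableFuture<Integer> cachedAnswer() {
        return CompletableFuture.completedFuture(42);
    }

    public static void main(String[] args) {
        CompletableFuture<Integer> cf = cachedAnswer();
        System.out.println(cf.isDone()); // true: the future is complete from the start
        System.out.println(cf.join());   // 42
    }
}
```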
Use the runAsync method that takes in a Runnable lambda expression
Creates a CompletableFuture that runs the given Runnable asynchronously and completes when the Runnable finishes. Since Runnable doesn't return any value, the CompletableFuture completes with null.
Example
Use this when you need to perform an operation asynchronously but don't need a result
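A short sketch of runAsync, with a hypothetical "save" action that only appends to a log. The call to join() is there just to wait for the action before reading the log.

```java
import java.util.concurrent.CompletableFuture;

class RunAsyncDemo {
    static String saveInBackground() {
        StringBuffer log = new StringBuffer(); // thread-safe buffer
        CompletableFuture<Void> cf = CompletableFuture.runAsync(() -> log.append("saved"));
        Void nothing = cf.join(); // a Runnable has no result, so this is null
        return log + ", result = " + nothing;
    }

    public static void main(String[] args) {
        System.out.println(saveInBackground()); // saved, result = null
    }
}
```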
Use the supplyAsync method that takes in a Supplier<T> lambda expression
Creates a CompletableFuture that computes a result asynchronously using the given Supplier and completes with that result.
Example
Use this when you need to calculate something in the background and use the result later.
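A short sketch of supplyAsync, assuming a hypothetical `squareAsync` computation:

```java
import java.util.concurrent.CompletableFuture;

class SupplyAsyncDemo {
    // The Supplier runs on another thread; the future completes with its return value.
    static CompletableFuture<Integer> squareAsync(int x) {
        return CompletableFuture.supplyAsync(() -> x * x);
    }

    public static void main(String[] args) {
        CompletableFuture<Integer> cf = squareAsync(7);
        // ... other work could happen here while the square is computed ...
        System.out.println(cf.join()); // 49
    }
}
```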
Rely on other CompletableFuture instances
The examples given in CS2030S are anyOf and allOf. They create a CompletableFuture that completes when any (anyOf) or all (allOf) of the given futures complete.
Example
The example of calling anyOf() and allOf() is shown as follows
Use allOf when you need to wait for multiple independent operations to finish.
Use anyOf when implementing timeouts or getting the fastest result from multiple sources.
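The two calls could be sketched as follows. The task bodies and the `pause` helper are hypothetical. Note that allOf returns a CompletableFuture<Void> and anyOf returns a CompletableFuture<Object>, so the individual results are read from the original futures or cast.

```java
import java.util.concurrent.CompletableFuture;

class AllOfAnyOfDemo {
    // Hypothetical helper: sleep without a checked exception.
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // allOf: completes (with no value) once every given future has completed.
    static int sumWhenAllDone() {
        CompletableFuture<Integer> a = CompletableFuture.supplyAsync(() -> 1);
        CompletableFuture<Integer> b = CompletableFuture.supplyAsync(() -> 2);
        CompletableFuture<Void> all = CompletableFuture.allOf(a, b);
        all.join();
        return a.join() + b.join(); // both are already complete here
    }

    // anyOf: completes with the result of a future that has completed.
    static Object firstResult() {
        CompletableFuture<String> slow = CompletableFuture.supplyAsync(() -> {
            pause(300);
            return "slow";
        });
        CompletableFuture<String> fast = CompletableFuture.completedFuture("fast");
        return CompletableFuture.anyOf(slow, fast).join();
    }

    public static void main(String[] args) {
        System.out.println(sumWhenAllDone()); // 3
        System.out.println(firstResult());    // fast
    }
}
```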
After creating a CompletableFuture, we often chain them together.
Chain CompletableFuture
The usefulness of CompletableFuture comes from the ability to chain them up and specify a sequence of computations to be run. We have the following methods:
thenApply, which is analogous to map
Transforms the result of a CompletableFuture using the provided function.
Example
This is similar to map in FP. The thenApply method takes the result of the original future (10) and applies a function to transform it into a new value ("Result: 10"). The transformation usually runs in the thread that completed the original future; if that future has already completed when thenApply is called, it runs in the calling thread.
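A sketch of the thenApply example described above, reconstructed from the values in the text (10 and "Result: 10"); the class and method names are assumptions.

```java
import java.util.concurrent.CompletableFuture;

class ThenApplyDemo {
    static CompletableFuture<String> describe() {
        CompletableFuture<Integer> original = CompletableFuture.supplyAsync(() -> 10);
        // Like map: Integer -> String, still wrapped in a CompletableFuture
        return original.thenApply(x -> "Result: " + x);
    }

    public static void main(String[] args) {
        System.out.println(describe().join()); // Result: 10
    }
}
```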
thenCompose, which is analogous to flatMap
Chains two CompletableFuture operations where the second operation depends on the result of the first.
Example
This is similar to flatMap in FP. Unlike thenApply, which returns a simple value wrapped in a CompletableFuture, thenCompose expects a function that returns another CompletableFuture. This is useful when you have operations that depend on previous results and also need to be performed asynchronously.
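A sketch of thenCompose, assuming two hypothetical asynchronous steps where the second needs the result of the first:

```java
import java.util.concurrent.CompletableFuture;

class ThenComposeDemo {
    // Hypothetical first step: look up an id asynchronously.
    static CompletableFuture<Integer> fetchUserId() {
        return CompletableFuture.supplyAsync(() -> 7);
    }

    // Hypothetical second step: needs the id and is itself asynchronous.
    static CompletableFuture<String> fetchUserName(int id) {
        return CompletableFuture.supplyAsync(() -> "user-" + id);
    }

    static CompletableFuture<String> fetchName() {
        // Like flatMap: the function returns a CompletableFuture, and the result is not nested.
        return fetchUserId().thenCompose(id -> fetchUserName(id));
    }

    public static void main(String[] args) {
        System.out.println(fetchName().join()); // user-7
    }
}
```

With thenApply instead of thenCompose, the same function would produce a CompletableFuture<CompletableFuture<String>>.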
thenCombine, which is analogous to combine
Combines the results of two independent CompletableFuture operations.
Example
This waits for both futures to complete and then applies a function to their results. The function receives the results of both futures as parameters. In this example, it calculates the sum of the two results (10 + 20) and creates a string with the result ("Sum: 30").
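A sketch of the thenCombine example described above, reconstructed from the values in the text (10, 20 and "Sum: 30"); the names are assumptions.

```java
import java.util.concurrent.CompletableFuture;

class ThenCombineDemo {
    static CompletableFuture<String> sum() {
        CompletableFuture<Integer> first = CompletableFuture.supplyAsync(() -> 10);
        CompletableFuture<Integer> second = CompletableFuture.supplyAsync(() -> 20);
        // The function runs only after both futures have completed.
        return first.thenCombine(second, (a, b) -> "Sum: " + (a + b));
    }

    public static void main(String[] args) {
        System.out.println(sum().join()); // Sum: 30
    }
}
```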
thenRun
Executes an action after the CompletableFuture completes, without using its result.
Example
The thenRun method executes a Runnable after the current CompletableFuture completes, ignoring its result. This is useful for performing side effects like logging or notifications after an operation completes. In this example, it prints "Computation finished!" after the original future completes.
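A sketch of the thenRun example described above. The messages are recorded in a list as well as printed, an assumption made so that the order can be inspected.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

class ThenRunDemo {
    static List<String> runThenLog() {
        List<String> log = new CopyOnWriteArrayList<>(); // thread-safe list
        CompletableFuture<Void> cf = CompletableFuture
            .supplyAsync(() -> {
                log.add("computed");
                return 10;
            })
            // The Runnable ignores the result (10) and runs after completion.
            .thenRun(() -> log.add("Computation finished!"));
        cf.join();
        return log;
    }

    public static void main(String[] args) {
        runThenLog().forEach(System.out::println);
    }
}
```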
runAfterBoth
Executes an action after both the current CompletableFuture and another specified CompletableFuture complete.
Example
This method waits for both the original future and another future to complete before running a specified action. The action doesn't use the results of either future. In this example, "Both tasks finished!" is printed only after both future1 and future2 have completed.
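A sketch of the runAfterBoth example described above, using the names future1 and future2 from the text; the task bodies are assumptions.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

class RunAfterBothDemo {
    static List<String> runBoth() {
        List<String> log = new CopyOnWriteArrayList<>(); // thread-safe list
        CompletableFuture<Void> future1 = CompletableFuture.runAsync(() -> log.add("task 1"));
        CompletableFuture<Void> future2 = CompletableFuture.runAsync(() -> log.add("task 2"));
        // The action uses neither result; it only waits for both futures.
        future1.runAfterBoth(future2, () -> log.add("Both tasks finished!")).join();
        return log;
    }

    public static void main(String[] args) {
        runBoth().forEach(System.out::println);
    }
}
```

The first two messages may appear in either order, but "Both tasks finished!" is always last.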
runAfterEither
Executes an action after either the current CompletableFuture or another specified CompletableFuture completes.
Example
This method runs an action as soon as either the original future or another specified future completes. In this example, since future2 completes faster (1 second vs 2 seconds), "At least one task is done!" will print after "Fast task completed" and before "Slow task completed".
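A sketch of the runAfterEither example described above. The delays are parameters (an assumption) so that the 1 second and 2 second values from the text can be used in main. The ordering relies on the gap between the two delays.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

class RunAfterEitherDemo {
    // Hypothetical helper: sleep without a checked exception.
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static List<String> race(long fastMillis, long slowMillis) {
        List<String> log = new CopyOnWriteArrayList<>(); // thread-safe list
        CompletableFuture<Void> future1 = CompletableFuture.runAsync(() -> {
            pause(slowMillis);
            log.add("Slow task completed");
        });
        CompletableFuture<Void> future2 = CompletableFuture.runAsync(() -> {
            pause(fastMillis);
            log.add("Fast task completed");
        });
        CompletableFuture<Void> either =
            future1.runAfterEither(future2, () -> log.add("At least one task is done!"));
        either.join();
        future1.join(); // only so the slow task's message is recorded before returning
        return log;
    }

    public static void main(String[] args) {
        race(1000, 2000).forEach(System.out::println);
    }
}
```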
After chaining our CompletableFuture instances, what we really want is the result. Everything up to this point only sets up the tasks to run asynchronously.
Get the result
There are two methods to get the result,
get()
It may throw a couple of checked exceptions, which we must catch and handle.
Example
We create a CompletableFuture that completes with the string "Hello"
We call get() to retrieve the result, which will block until future1 completes
Since get() can throw checked exceptions, we must handle them:
InterruptedException: thrown if the current thread is interrupted while waiting
ExecutionException: thrown if the computation completed exceptionally (wraps the actual exception)
After we get the result, we print it out
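The steps above can be sketched as follows; the class and method names and the fallback strings in the catch blocks are assumptions.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

class GetDemo {
    static String fetchWithGet() {
        CompletableFuture<String> future1 = CompletableFuture.supplyAsync(() -> "Hello");
        try {
            return future1.get(); // blocks until future1 completes
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            return "interrupted";
        } catch (ExecutionException e) {
            return "failed: " + e.getCause(); // the cause is the actual exception
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchWithGet()); // Hello
    }
}
```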
join()
It won't throw any checked exception.
Example
We create a CompletableFuture that completes with the string "World"
We call join() to retrieve the result, which will also block until the future completes
Unlike get(), join() doesn't throw checked exceptions, so we don't need a try-catch block
If the computation fails, join() will throw an unchecked CompletionException instead
After we get the result, we print it out
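The same steps with join(), as a sketch with assumed class and method names:

```java
import java.util.concurrent.CompletableFuture;

class JoinDemo {
    static String fetchWithJoin() {
        CompletableFuture<String> future = CompletableFuture.supplyAsync(() -> "World");
        return future.join(); // blocks, but needs no try-catch
    }

    public static void main(String[] args) {
        System.out.println(fetchWithJoin()); // World
    }
}
```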
For the sake of cleaner code, we may prefer .join() over .get().
get()/join() is a synchronous call, i.e., it blocks until the CompletableFuture completes. So, to maximize concurrency, we should only call them as the final step in our code.
Example
Given two numbers i and j, we want to find the difference between the i-th prime number and the j-th prime number. We can do the following:
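One possible way to write this, as a sketch: `nthPrime` is a hypothetical helper (trial division), and the two primes are computed concurrently before a single join() at the very end.

```java
import java.util.concurrent.CompletableFuture;

class PrimeDifference {
    // Hypothetical helper: the n-th prime (1-indexed), found by trial division.
    static int nthPrime(int n) {
        int count = 0;
        int candidate = 1;
        while (count < n) {
            candidate++;
            boolean isPrime = true;
            for (int d = 2; (long) d * d <= candidate; d++) {
                if (candidate % d == 0) {
                    isPrime = false;
                    break;
                }
            }
            if (isPrime) {
                count++;
            }
        }
        return candidate;
    }

    static int difference(int i, int j) {
        CompletableFuture<Integer> ith = CompletableFuture.supplyAsync(() -> nthPrime(i));
        CompletableFuture<Integer> jth = CompletableFuture.supplyAsync(() -> nthPrime(j));
        // Set up the whole chain first; block only once, as the final step.
        return ith.thenCombine(jth, (x, y) -> x - y).join();
    }

    public static void main(String[] args) {
        System.out.println(difference(10, 5)); // 29 - 11 = 18
    }
}
```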
Handle Exceptions
During the chaining operations for our CompletableFuture<T>, we may get some exceptions. For example,
A NullPointerException is thrown during the chain, and if we execute the code, we will get a CompletionException. Instead of using a try-catch block to deal with such an exception, we can use the inline .handle(), which takes in a BiFunction (playing the role of a Combiner), to deal with it. For example,
Inside the function passed to .handle(),
the first parameter is the value,
the second is the exception,
and the third type parameter is the type of the return value.
Only one of the two parameters will be non-null:
If everything went fine: e == null, so we return t.
If an exception happened: t == null, e != null, so we return 0 as a fallback/default.
So instead of the program crashing or needing to catch the exception manually, we catch it inside the chain and return a safe value (0 in this case).
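Both situations can be sketched as follows. This is a reconstruction under assumptions: the chain starts from a null value so that `x + 1` fails, and the try-catch in `withoutHandle` exists only to show which exception join() throws.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

class HandleDemo {
    // A chain that fails: unboxing null in x + 1 throws a NullPointerException.
    static CompletableFuture<Integer> failingChain() {
        return CompletableFuture.<Integer>supplyAsync(() -> null)
            .thenApply(x -> x + 1);
    }

    // Without handle(): join() throws a CompletionException wrapping the real cause.
    static String withoutHandle() {
        try {
            return "value: " + failingChain().join();
        } catch (CompletionException e) {
            return e.getCause().getClass().getSimpleName();
        }
    }

    // With handle(): t is the value, e is the exception; fall back to 0 on failure.
    static int withHandle() {
        return failingChain()
            .handle((t, e) -> (e == null) ? t : 0)
            .join();
    }

    public static void main(String[] args) {
        System.out.println(withoutHandle()); // NullPointerException
        System.out.println(withHandle());    // 0
    }
}
```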
Fork and Join
Recall that creating and destroying threads is not cheap, and as much as possible we should reuse existing threads to perform different tasks. This goal can be achieved by using a thread pool.
Thread Pool
A thread pool is a system that:
Has some threads already created and waiting.
Holds tasks (units of work) in a queue. (Usually, a "task" is a Runnable or Supplier.)
Lets threads pick tasks from the queue and execute them.
Reuses threads instead of creating a new one every time.
This avoids the overhead of constantly creating and destroying threads, which is expensive.
To understand the concept better, let's use a small example,
Create a simple Thread Pool
In this simple thread pool, we have the following:
queue: A shared task queue where all tasks are added.
Runnable: Each task is a Runnable, which means it can be run by a thread.
new Thread(...): This creates a single worker thread (the thread pool only has 1 thread in this simple example).
while (true) {...}: This thread runs forever, constantly checking the queue.
queue.dequeue() and r.run(): If there’s a task, it takes it out of the queue and executes it.
More vividly, you can think of it as:
One person (thread) standing by.
There's a basket (queue) where you keep dropping small notes (tasks).
The person picks up a note when available, and does what it says.
Add tasks to the Thread Pool
This loop is creating 100 tasks, where each task is:
These are lambdas that implement Runnable (because they have a run() method internally). They just print a number. (See more from Functional Interface if you are unfamiliar with the lambda expression for Runnable)
So what happens is:
You enqueue 100 print tasks.
The single worker thread goes through them one by one, running each.
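The single-worker pool and the 100 print tasks described above might be sketched like this. It is not the original listing: instead of a queue with dequeue(), it swaps in `java.util.concurrent.LinkedBlockingQueue`, whose take() blocks until a task is available, and it adds a CountDownLatch (an assumption) so the caller knows when all tasks have run.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

class SimpleThreadPool {
    static List<Integer> runTasks(int n) {
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>(); // shared task queue
        List<Integer> seen = new CopyOnWriteArrayList<>();           // order of execution
        CountDownLatch done = new CountDownLatch(n);

        // The single worker thread: keeps taking tasks from the queue and running them.
        Thread worker = new Thread(() -> {
            while (true) {
                try {
                    Runnable r = queue.take(); // blocks while the queue is empty
                    r.run();
                } catch (InterruptedException e) {
                    return; // stop the worker when interrupted
                }
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive for this thread
        worker.start();

        // Enqueue n tasks; each one prints (and records) its own number.
        for (int i = 0; i < n; i++) {
            int count = i;
            queue.add(() -> {
                System.out.println(count);
                seen.add(count);
                done.countDown();
            });
        }

        try {
            done.await(); // wait until the worker has run every task
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        worker.interrupt();
        return seen;
    }

    public static void main(String[] args) {
        runTasks(100);
    }
}
```

With one worker and a first-in-first-out queue, the numbers are printed in the order they were enqueued.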
To summarize, the following is the general working mechanism of Thread Pool in Java
You create a thread pool — this includes:
One or more worker threads (which will run in the background),
A task queue where jobs (tasks) are temporarily stored.
You submit tasks to the thread pool — these tasks get queued first.
The threads in the pool pick up tasks from the queue and run them one by one (or in parallel, depending on how many threads there are).
Real Fork and Join
Imagine you have an array and you want to get the sum of its elements. (I know you have lots of ways to do that 😂.) You could just use one loop (sequentially) to do that, like
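A minimal sketch of the sequential version, with assumed class and method names:

```java
class SequentialSum {
    // One thread, one loop: visits every element in order.
    static int sum(int[] array) {
        int total = 0;
        for (int value : array) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {1, 2, 3, 4, 5})); // 15
    }
}
```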
But in Fork/Join, you want to speed this up by using multiple threads. So you split the task into parts and let different threads work on the parts at the same time, then combine (join) their results. In simple words,
Fork: Break the task into smaller tasks and run them in parallel (with multiple threads).
Join: Wait for the results from the smaller tasks and combine them to get the final answer.
In Java, we can create a task that we can fork and join as an instance of the abstract class RecursiveTask<T>. It has an abstract method compute(), which we, as the client, have to define to specify what computation we want to compute. For example,
You define a class like Summer that extends RecursiveTask<Integer>. This means:
You’re writing a task that eventually returns an Integer.
You’ll define the compute() method, which decides whether to:
Split the work into smaller subtasks (fork()).
Or solve it directly if it’s small enough (base case).
To run it, we can use
The line task.compute() above is just like another method invocation.
In Line 1, it creates a pool with 4 threads.
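A sketch of what the Summer task and its invocation might look like. The fields, the threshold value, and the helper `ForkJoinDemo.parallelSum` are assumptions; only the class name Summer, RecursiveTask<Integer>, compute(), the pool with 4 threads and pool.invoke(task) come from the text.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class Summer extends RecursiveTask<Integer> {
    private static final int FORK_THRESHOLD = 2; // assumed cut-off for the base case
    private final int low;   // inclusive
    private final int high;  // exclusive
    private final int[] array;

    Summer(int low, int high, int[] array) {
        this.low = low;
        this.high = high;
        this.array = array;
    }

    @Override
    protected Integer compute() {
        // Base case: small enough to sum directly.
        if (high - low < FORK_THRESHOLD) {
            int sum = 0;
            for (int i = low; i < high; i++) {
                sum += array[i];
            }
            return sum;
        }
        // Otherwise split the range into two halves.
        int middle = (low + high) / 2;
        Summer left = new Summer(low, middle, array);
        Summer right = new Summer(middle, high, array);
        left.fork();                          // schedule the left half
        return right.compute() + left.join(); // do the right half here, then wait for left
    }
}

class ForkJoinDemo {
    static int parallelSum(int[] array) {
        ForkJoinPool pool = new ForkJoinPool(4); // a pool with 4 threads
        try {
            Summer task = new Summer(0, array.length, array);
            return pool.invoke(task);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(new int[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10})); // 55
    }
}
```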
Behind the scenes of pool.invoke(task)
Each thread has a deque of tasks.
When a thread is idle, it checks its deque of tasks.
If the deque is not empty, it picks up a task at the head of the deque to execute (e.g., invoke its compute() method).
Otherwise, if the deque is empty, it picks up a task from the tail of the deque of another thread to run. This is a mechanism called work stealing.
When fork() is called, the caller adds itself to the head of the deque of the executing thread. This is done so that the most recently forked task gets executed next, similar to how normal recursive calls work.
When join() is called, several cases might happen.
If the subtask to be joined hasn't been executed, this subtask will be popped out first, and then its compute() method is called and the subtask is executed.
If the subtask to be joined has been completed (some other thread has stolen this and completed it), then the result is read, and join() returns.
If the subtask to be joined has been stolen and is being executed by another thread, then the current thread either finds some other tasks to work on from its local deque, or steals another task from another deque.
For more examples, please find it in:
Order of fork() and join()
TL;DR: your fork(), compute(), join() order should form a palindrome, and there should be no crossing. This is because the most recently forked task is likely to be executed next, so we should join() the most recently forked task first.
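As an illustration of the palindrome rule (not necessarily the examples from the lecture), the sketch below forks left then right, and joins right then left. The class `OrderedSummer` and its helper `sum` are hypothetical; the crossing order is shown only in a comment.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class OrderedSummer extends RecursiveTask<Integer> {
    private final int low;   // inclusive
    private final int high;  // exclusive
    private final int[] array;

    OrderedSummer(int low, int high, int[] array) {
        this.low = low;
        this.high = high;
        this.array = array;
    }

    @Override
    protected Integer compute() {
        if (high - low < 2) { // base case: zero or one element
            return (high > low) ? array[low] : 0;
        }
        int middle = (low + high) / 2;
        OrderedSummer left = new OrderedSummer(low, middle, array);
        OrderedSummer right = new OrderedSummer(middle, high, array);

        // Palindrome: fork left, fork right, join right, join left.
        left.fork();
        right.fork();
        return right.join() + left.join();

        // Crossing order to avoid: left.fork(); right.fork(); left.join() + right.join();
        // It gives the same sum, but joins the older task while the newer one
        // still sits at the head of the deque.
    }

    static int sum(int[] array) {
        ForkJoinPool pool = new ForkJoinPool(2);
        try {
            return pool.invoke(new OrderedSummer(0, array.length, array));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {1, 2, 3, 4, 5, 6, 7, 8})); // 36
    }
}
```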
Example 1
Example 2