Omni/ST: StackThreads/MP implementation for Nested Irregular Parallelism


Compiling and Using Omni/ST

What are StackThreads and Omni/ST?

Omni/ST provides experimental support for nested parallelism in Omni. It is implemented as an external library called "StackThreads/MP" and a runtime library that calls StackThreads/MP. With Omni/ST enabled, you can compile your programs both with and without this nested parallelism support. For programs with deeply nested parallelism (e.g., parallel recursion), Omni/ST generally outperforms the default Omni, especially when the number of processors is large.

The default Omni supports a limited form of nested parallelism. See
for details.
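
As an illustration of the kind of parallel recursion mentioned above, here is a minimal sketch in C with OpenMP directives. The function name tree_sum and the serial cutoff of 4 elements are made up for this example and are not part of Omni or StackThreads/MP. Each recursive call opens a parallel region nested inside its caller's region, which is the pattern Omni/ST is aimed at.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: sum a[lo..hi) by splitting the range in half
 * and summing the two halves in nested parallel sections. */
long tree_sum(const int *a, size_t lo, size_t hi)
{
    assert(lo <= hi);

    if (hi - lo <= 4) {             /* small range: sum serially */
        long s = 0;
        size_t i;
        for (i = lo; i < hi; i++)
            s += a[i];
        return s;
    }

    {
        size_t mid = lo + (hi - lo) / 2;
        long left = 0, right = 0;

        /* Each level of recursion opens a new parallel region
         * nested inside the caller's region. */
        #pragma omp parallel sections shared(left, right)
        {
            #pragma omp section
            left = tree_sum(a, lo, mid);

            #pragma omp section
            right = tree_sum(a, mid, hi);
        }
        return left + right;
    }
}
```

The function returns the same result whether or not the directives are honored, so the same source can also be built by a compiler without OpenMP support.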

Supported platforms

Currently, Omni/ST is available on SPARC, MIPS, and i386 machines. Omni/ST assumes that the StackThreads compiler "stgcc" is already available on the platform. For the platforms supported by StackThreads, check


Installing Omni/ST takes two steps. First, install the StackThreads/MP library separately. Second, enable Omni/ST when you build Omni.

  1. Installing the StackThreads/MP library

    Download the StackThreads/MP library from
    and install it. See the documentation included with the software for details.

    Before you proceed, make sure the `stgcc' command is in your path.

  2. Installing Omni with Omni/ST enabled

    Add the option `--enable-stackThreads' when you run ./configure for Omni.

             % ./configure --enable-stackThreads other_options ...

Compiling OpenMP programs with Omni/ST

When Omni/ST is enabled, you can compile your programs both with and without Omni/ST. To compile a program with Omni/ST, add the option `-omniconfig=st' to the compiler command line.

     % omcc -omniconfig=st your_program.c

This links your program with StackThreads/MP and the runtime library that calls StackThreads/MP.

Without the "-omniconfig=st" option, the default Omni runtime library is linked, and the executable is identical to the one produced when Omni/ST is disabled (i.e., when Omni is configured without --enable-stackThreads).

In this way, a single source can be compiled in two ways.

How Omni/ST basically works

Omni/ST creates a fixed number of underlying threads (LWPs), and OpenMP-level threads are dynamically mapped onto them. For example, when you set OMPC_NUM_PROCS=10 and your program creates 100,000 threads, Omni/ST creates only 10 LWPs (using the underlying thread package, such as Pthreads) and `dynamically' maps the 100,000 threads onto those 10 LWPs. Therefore, when you observe your OpenMP program with the `top' command, the number of threads it reports is (close to) 10, no matter how many logical threads are created.

To maximize CPU utilization, OpenMP-level threads migrate between LWPs when an LWP runs out of threads. This way, Omni/ST tries to keep the LWPs filled with work (i.e., threads) as much as possible.
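
The effect of this mapping and migration can be illustrated with a toy model in C. This is only a sketch under simplifying assumptions and is not how StackThreads/MP is actually implemented: every logical thread runs for exactly one step, an idle LWP takes half of the busiest LWP's runnable threads, and all names (NUM_LWPS, pool_t, simulate) are hypothetical. NUM_LWPS plays the role of OMPC_NUM_PROCS.

```c
#include <assert.h>

#define NUM_LWPS 4  /* hypothetical; plays the role of OMPC_NUM_PROCS */

/* Toy model: ready[i] is the number of runnable logical threads
 * currently assigned to LWP i. Each logical thread runs for one step. */
typedef struct {
    int ready[NUM_LWPS];
} pool_t;

static int total_ready(const pool_t *p)
{
    int i, sum = 0;
    for (i = 0; i < NUM_LWPS; i++)
        sum += p->ready[i];
    return sum;
}

/* Index of the LWP holding the most runnable threads
 * (the lowest index wins ties). */
static int busiest(const pool_t *p)
{
    int i, best = 0;
    for (i = 1; i < NUM_LWPS; i++)
        if (p->ready[i] > p->ready[best])
            best = i;
    return best;
}

/* Run the pool until no runnable threads remain and return the number
 * of steps taken. When migrate is nonzero, every idle LWP first takes
 * half of the busiest LWP's runnable threads. */
int simulate(pool_t p, int migrate)
{
    int i, steps = 0;

    for (i = 0; i < NUM_LWPS; i++)
        assert(p.ready[i] >= 0);

    while (total_ready(&p) > 0) {
        if (migrate) {
            for (i = 0; i < NUM_LWPS; i++) {
                if (p.ready[i] == 0) {
                    int victim = busiest(&p);
                    int take = p.ready[victim] / 2;
                    p.ready[victim] -= take;
                    p.ready[i] += take;
                }
            }
        }
        for (i = 0; i < NUM_LWPS; i++)
            if (p.ready[i] > 0)
                p.ready[i]--;       /* each busy LWP runs one thread */
        steps++;
    }
    return steps;
}
```

With 100 logical threads all starting on LWP 0 and NUM_LWPS set to 4, this model takes 100 steps without migration and 25 steps with it, while the number of LWPs stays fixed at 4 in both cases.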

Tips for using Omni/ST

Omni/ST is primarily useful for programs with nested parallelism, such as those using parallel recursion. For such programs, Omni/ST generally exhibits better speedup than the default Omni. For programs that make no use of nested parallelism, the performance penalty is generally less than 10%.

We are trying to make Omni/ST as `transparent' as possible, in the sense that programs written with the default Omni execution model in mind simply run as fast as or faster than they do with the default Omni. There are, however, some circumstances where Omni/ST-specific tips are necessary to use it effectively. Below, we describe these circumstances and suggest a programming style for each.