File openmp.cpp

Defines

OMP_FUNC(str)
OMP_FUNC_ARGS(formatStr, ...)
namespace wasm

SYSCALL NUMBERING

Have a look in the sysroot at include/bits/syscall.h to determine the system call numbering.

Enums

enum sched_type

Values:

enumerator sch_lower

lower bound for unordered values

enumerator sch_static_chunked
enumerator sch_static

static unspecialized

Functions

static std::shared_ptr<faabric::transport::PointToPointGroup> getExecutingPointToPointGroup()
WAVM_DEFINE_INTRINSIC_FUNCTION (env, "omp_get_thread_num", I32, omp_get_thread_num)
Returns

the thread number, within its team, of the thread executing the function.

WAVM_DEFINE_INTRINSIC_FUNCTION (env, "omp_get_num_threads", I32, omp_get_num_threads)
Returns

the number of threads currently in the team executing the parallel region from which it is called.

WAVM_DEFINE_INTRINSIC_FUNCTION (env, "omp_get_max_threads", I32, omp_get_max_threads)

This function returns the max number of threads that can be used in a new team if no num_threads value is provided.

WAVM_DEFINE_INTRINSIC_FUNCTION (env, "omp_get_level", I32, omp_get_level)
WAVM_DEFINE_INTRINSIC_FUNCTION (env, "omp_get_max_active_levels", I32, omp_get_max_active_levels)
WAVM_DEFINE_INTRINSIC_FUNCTION (env, "omp_set_max_active_levels", void, omp_set_max_active_levels, I32 maxLevels)
WAVM_DEFINE_INTRINSIC_FUNCTION (env, "__kmpc_push_num_threads", void, __kmpc_push_num_threads, I32 loc, I32 globalTid, I32 numThreads)
WAVM_DEFINE_INTRINSIC_FUNCTION (env, "omp_set_num_threads", void, omp_set_num_threads, I32 numThreads)
WAVM_DEFINE_INTRINSIC_FUNCTION (env, "__kmpc_global_thread_num", I32, __kmpc_global_thread_num, I32 loc)
WAVM_DEFINE_INTRINSIC_FUNCTION (env, "omp_get_wtime", F64, omp_get_wtime)
WAVM_DEFINE_INTRINSIC_FUNCTION (env, "__kmpc_barrier", void, __kmpc_barrier, I32 loc, I32 globalTid)

Synchronization point: no thread in a parallel region executes beyond the omp barrier until all other threads in the team have completed all explicit tasks in the region. The same concepts are used for reductions and split barriers.

Parameters
  • loc

  • globalTid
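The barrier semantics described above can be sketched with a mutex and a condition variable. This is an illustration only: TeamBarrier and its members are hypothetical names, and nothing here is taken from the actual Faasm implementation, which may coordinate threads across hosts.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for a team-wide barrier (not Faasm's actual class).
class TeamBarrier
{
  public:
    explicit TeamBarrier(int size)
      : teamSize(size)
    {
        assert(size > 0);
    }

    // Blocks until teamSize threads have called wait() for the current
    // generation. Returns the generation number that was completed.
    int wait()
    {
        std::unique_lock<std::mutex> lock(mx);
        int myGeneration = generation;
        waiting++;

        if (waiting == teamSize) {
            // Last thread to arrive: reset the count and release the others
            waiting = 0;
            generation++;
            cv.notify_all();
        } else {
            // Sleep until the last thread bumps the generation
            cv.wait(lock, [&] { return generation != myGeneration; });
        }

        return myGeneration;
    }

  private:
    int teamSize;
    int waiting = 0;
    int generation = 0;
    std::mutex mx;
    std::condition_variable cv;
};
```

Tracking a generation counter rather than only the arrival count lets the same object be reused for consecutive barriers without a late waker mistaking the next barrier for the current one.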

WAVM_DEFINE_INTRINSIC_FUNCTION (env, "__kmpc_critical", void, __kmpc_critical, I32 loc, I32 globalTid, I32 crit)

Enter code protected by a critical construct. This function blocks until the thread can enter the critical section.

Parameters
  • loc – source location information.

  • globalTid – global thread number.

  • crit – identity of the critical section. This could be a pointer to a lock associated with the critical section, or some other suitably unique value. The lock is not used because Faasm needs to control the locking mechanism for the team.
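A minimal sketch of the idea in the crit description: the value supplied by the compiler is accepted but ignored, and a lock owned by the team is used instead. Team, enterCritical and exitCritical are hypothetical names chosen for illustration; the real locking mechanism is not shown in this document.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical team state holding the lock that guards critical sections.
struct Team
{
    std::mutex criticalMx;
    bool inCritical = false; // only touched while criticalMx is held
};

// Blocks until the calling thread holds the team's critical-section lock.
void enterCritical(Team& team, int32_t crit)
{
    (void)crit; // compiler-provided lock identity is deliberately unused
    team.criticalMx.lock();
    team.inCritical = true;
}

// Releases the lock taken in enterCritical. Does not block.
void exitCritical(Team& team, int32_t crit)
{
    (void)crit;
    assert(team.inCritical);
    team.inCritical = false;
    team.criticalMx.unlock();
}
```

With a single team-level lock, all critical sections in the team are serialised against each other regardless of their name. That is coarser than one lock per crit value, but keeps the locking under the runtime's control.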

WAVM_DEFINE_INTRINSIC_FUNCTION (env, "__kmpc_end_critical", void, __kmpc_end_critical, I32 loc, I32 globalTid, I32 crit)

Exits code protected by a critical construct, releasing the held lock.

Parameters
  • loc – source location information.

  • globalTid – global thread number.

  • crit – compiler lock. See __kmpc_critical for more information.

WAVM_DEFINE_INTRINSIC_FUNCTION (env, "__kmpc_flush", void, __kmpc_flush, I32 loc)

The omp flush directive identifies a point at which the compiler ensures that all threads in a parallel region have the same view of specified objects in memory. Like clang, we use a fence here, but these semantics might not be suited to distributed work. Implementations of OpenMP over distributed shared memory (DSM) synchronise the page at this point.

Parameters

loc – Source location info
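The fence-based approach can be illustrated on a single host with std::atomic_thread_fence. The Mailbox type and both functions are hypothetical and exist only for this sketch; it says nothing about how memory would be kept consistent between hosts.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical single-use mailbox: plain data guarded by a relaxed flag.
struct Mailbox
{
    int32_t payload = 0;              // non-atomic data
    std::atomic<bool> ready{ false }; // accessed with relaxed ordering only
};

// Writer side: store the data, "flush", then raise the flag.
void publish(Mailbox& box, int32_t value)
{
    assert(!box.ready.load(std::memory_order_relaxed)); // single use only
    box.payload = value;
    std::atomic_thread_fence(std::memory_order_seq_cst);
    box.ready.store(true, std::memory_order_relaxed);
}

// Reader side: check the flag, "flush", then read the data.
// Returns false if nothing has been published yet.
bool tryConsume(Mailbox& box, int32_t& out)
{
    if (!box.ready.load(std::memory_order_relaxed)) {
        return false;
    }
    std::atomic_thread_fence(std::memory_order_seq_cst);
    out = box.payload;
    return true;
}
```

The two fences take the place of the flush on each side: a reader that observes the flag after its fence is guaranteed to see the payload written before the writer's fence.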

WAVM_DEFINE_INTRINSIC_FUNCTION (env, "__kmpc_master", I32, __kmpc_master, I32 loc, I32 globalTid)

Note: we only ensure the master section is run once, but do not handle assigning to the master section.

WAVM_DEFINE_INTRINSIC_FUNCTION (env, "__kmpc_end_master", void, __kmpc_end_master, I32 loc, I32 globalTid)

Only called by the thread executing the master region.

WAVM_DEFINE_INTRINSIC_FUNCTION (env, "__kmpc_single", I32, __kmpc_single, I32 loc, I32 globalTid)

Test whether to execute a single construct. There are no implicit barriers in the two “single” calls; the compiler should instead introduce an explicit barrier if one is required.

Parameters
  • loc

  • globalTid

Returns

1 if this thread should execute the single construct, zero otherwise.
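The calling pattern around a single construct can be sketched as follows. Choosing thread 0 as the executing thread is an assumption made for this example and is not confirmed by this document; singleSketch, endSingleSketch and runSingleRegion are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Assumption for this sketch: the thread numbered 0 in the team runs the
// single region. Returns 1 if the caller should execute it, 0 otherwise.
int32_t singleSketch(int32_t threadNum)
{
    assert(threadNum >= 0);
    return threadNum == 0 ? 1 : 0;
}

// Nothing to release in this sketch; only the executing thread calls it.
void endSingleSketch() {}

// Mirrors the shape of compiler-generated code for a single construct.
// Returns true if this thread executed the body.
bool runSingleRegion(int32_t threadNum, const std::function<void()>& body)
{
    if (singleSketch(threadNum) == 1) {
        body();
        endSingleSketch();
        return true;
    }
    // No barrier here: the compiler emits one separately unless the
    // construct is marked nowait.
    return false;
}
```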

WAVM_DEFINE_INTRINSIC_FUNCTION (env, "__kmpc_end_single", void, __kmpc_end_single, I32 loc, I32 globalTid)

See comment on __kmpc_single

WAVM_DEFINE_INTRINSIC_FUNCTION (env, "__kmpc_fork_call", void, __kmpc_fork_call, I32 locPtr, I32 nSharedVars, I32 microtaskPtr, I32 sharedVarPtrs)

The LLVM version of this function is implemented in the openmp source at: https://github.com/llvm/llvm-project/blob/main/openmp/runtime/src/kmp_csupport.cpp

It calls into __kmp_fork_call to do most of the work, which is here: https://github.com/llvm/llvm-project/blob/main/openmp/runtime/src/kmp_runtime.cpp

The structs passed in are defined in this file: https://github.com/llvm/llvm-project/blob/main/openmp/runtime/src/kmp.h

Arguments:

  • locPtr = pointer to the source location info (type ident_t)

  • nSharedVars = number of non-global shared variables

  • microtaskPtr = function pointer for the microtask itself (microtask_t)

  • sharedVarPtrs = pointer to an array of pointers to the non-global shared variables

NOTE: the non-global shared variables include:

  • those listed in a shared() clause

  • those listed in a reduction() clause
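To show how the arguments above reach the microtask, here is a simplified, sequential sketch. It makes several assumptions: the Microtask signature takes the shared variables as one array rather than as individual arguments, the team size is passed in explicitly, and the threads run one after another instead of in parallel. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Simplified microtask signature (assumption): thread ids by pointer, then
// the array of pointers to the non-global shared variables.
using Microtask = void (*)(int32_t* globalTid,
                           int32_t* boundTid,
                           void** sharedVars);

// Invokes the microtask once per thread in the team. A real runtime would
// dispatch these invocations to separate threads.
void forkCallSketch(int32_t nThreads,
                    int32_t nSharedVars,
                    Microtask microtask,
                    void** sharedVarPtrs)
{
    assert(nThreads > 0);
    assert(nSharedVars == 0 || sharedVarPtrs != nullptr);

    for (int32_t tid = 0; tid < nThreads; tid++) {
        int32_t globalTid = tid;
        int32_t boundTid = tid;
        microtask(&globalTid, &boundTid, sharedVarPtrs);
    }
}

// Example microtask: adds (thread id + 1) to the first shared variable.
void exampleMicrotask(int32_t* globalTid, int32_t*, void** sharedVars)
{
    auto* total = static_cast<int32_t*>(sharedVars[0]);
    *total += *globalTid + 1;
}
```

Every invocation receives the same sharedVarPtrs array, so all threads in the team operate on the same underlying variables; only the thread ids differ.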

template<typename T>
void for_static_init(I32 schedule, I32 *lastIter, T *lower, T *upper, T *stride, T incr, T chunk)
WAVM_DEFINE_INTRINSIC_FUNCTION (env, "__kmpc_for_static_init_4", void, __kmpc_for_static_init_4, I32 loc, I32 gtid, I32 schedule, I32 lastIterPtr, I32 lowerPtr, I32 upperPtr, I32 stridePtr, I32 incr, I32 chunk)

The functions compute the upper and lower bounds and strides to be used for the set of iterations to be executed by the current thread.

The guts of the implementation in openmp can be found in __kmp_for_static_init in runtime/src/kmp_sched.cpp

See sched_type for supported scheduling.

Parameters
  • loc – Source code location

  • gtid – Global thread id of this thread

  • schedule – Scheduling type for the parallel loop

  • lastIterPtr – Pointer to the “last iteration” flag (boolean)

  • lowerPtr – Pointer to the lower bound

  • upperPtr – Pointer to the upper bound of loop chunk

  • stridePtr – Pointer to the stride for parallel loop

  • incr – Loop increment

  • chunk – The chunk size for the parallel loop
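The bounds computation can be sketched as below, following the balanced static and static chunked cases as implemented in LLVM's __kmp_for_static_init. The sketch assumes a positive increment and a non-empty loop, and takes the thread number and team size as explicit arguments; the Schedule enum and the function name are hypothetical and do not reproduce sched_type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical schedule selector for this sketch.
enum class Schedule
{
    StaticChunked,
    Static
};

// On entry *lower and *upper hold the bounds of the whole loop (inclusive).
// On exit they hold the bounds of the calling thread's first block.
template<typename T>
void staticInitSketch(Schedule schedule,
                      int32_t threadNum,
                      int32_t numThreads,
                      int32_t* lastIter,
                      T* lower,
                      T* upper,
                      T* stride,
                      T incr,
                      T chunk)
{
    assert(incr > 0);         // ascending loops only in this sketch
    assert(numThreads > 0);
    assert(*upper >= *lower); // at least one iteration

    T tripCount = (*upper - *lower) / incr + 1;
    T tid = static_cast<T>(threadNum);
    T nth = static_cast<T>(numThreads);

    if (schedule == Schedule::Static) {
        if (tripCount < nth) {
            // Fewer iterations than threads: one each, the rest get none
            if (tid < tripCount) {
                *lower = *lower + tid * incr;
                *upper = *lower;
            } else {
                *lower = *upper + incr; // empty range
            }
            *lastIter = (tid == tripCount - 1);
        } else {
            // Balanced split: the first `extras` threads get one more
            T small = tripCount / nth;
            T extras = tripCount % nth;
            *lower += incr * (tid * small + std::min(tid, extras));
            *upper = *lower + small * incr - (tid < extras ? 0 : incr);
            *lastIter = (tid == nth - 1);
        }
        *stride = tripCount;
    } else {
        // Chunks are dealt out round-robin; stride jumps to the next one.
        // The caller is expected to clamp *upper to the loop's upper bound.
        chunk = std::max<T>(chunk, 1);
        T span = chunk * incr;
        *stride = span * nth;
        *lower = *lower + span * tid;
        *upper = *lower + span - incr;
        *lastIter = (tid == ((tripCount - 1) / chunk) % nth);
    }
}
```

For a loop over 0..9 with four threads and the unchunked schedule, this gives the blocks [0,2], [3,5], [6,7] and [8,9]. With a chunk size of 2, thread 1 starts at [2,3] and advances by a stride of 8.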

WAVM_DEFINE_INTRINSIC_FUNCTION (env, "__kmpc_for_static_init_8", void, __kmpc_for_static_init_8, I32 loc, I32 gtid, I32 schedule, I32 lastIterPtr, I32 lowerPtr, I32 upperPtr, I32 stridePtr, I64 incr, I64 chunk)
WAVM_DEFINE_INTRINSIC_FUNCTION (env, "__kmpc_for_static_fini", void, __kmpc_for_static_fini, I32 loc, I32 gtid)
void startReduceCritical(faabric::Message *msg, std::shared_ptr<threads::Level> level, int32_t numReduceVars, int32_t reduceVarPtrs, int32_t reduceVarsSize)

Called to start a reduction.

void endReduceCritical(faabric::Message *msg, bool barrier)

Called to finish off a reduction.

WAVM_DEFINE_INTRINSIC_FUNCTION (env, "__kmpc_reduce", I32, __kmpc_reduce, I32 loc, I32 gtid, I32 numReduceVars, I32 reduceVarsSize, I32 reduceVarPtrs, I32 reduceFunc, I32 lockPtr)

Each thread calls this function to start the critical section required to perform the reduction operation. The thread then calls __kmpc_end_reduce (or its nowait equivalent) when it has finished.

It seems that in our case, always returning 1 for both __kmpc_reduce and __kmpc_reduce_nowait gets the right result.

In the OpenMP source we can see a more varied set of return values, but these are for cases we don’t yet support (notably teams): https://github.com/llvm/llvm-project/blob/main/openmp/runtime/src/kmp_csupport.cpp

Note that the reduce vars passed into this function are the LOCAL copies on the thread’s own stack used to hold intermediate results. There is apparently no way to get a reference to the final destination of the reduction result in this function; that is only known in __kmpc_fork_call.
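The flow described above, in which a return value of 1 tells the calling thread to merge its local copy inside a critical section, can be sketched like this. ReduceState and the three functions are hypothetical, a plain mutex stands in for whatever locking the runtime uses, and the merge into the shared result represents code the compiler generates in the guest.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical per-team state for a reduction.
struct ReduceState
{
    std::mutex mx;
};

// Starts the reduction's critical section. Always returns 1, meaning the
// caller should combine its local value itself and then end the reduction.
int32_t startReduceSketch(ReduceState& state)
{
    state.mx.lock();
    return 1;
}

// Ends the critical section opened by startReduceSketch.
void endReduceSketch(ReduceState& state)
{
    state.mx.unlock();
}

// Shape of the code a compiler emits around a sum reduction: the runtime
// never sees sharedResult, only the thread's local partial value.
void reduceSum(ReduceState& state, int64_t localPartial, int64_t& sharedResult)
{
    switch (startReduceSketch(state)) {
        case 1:
            sharedResult += localPartial;
            endReduceSketch(state);
            break;
        default:
            // Other return values are not produced by this sketch
            break;
    }
}
```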

WAVM_DEFINE_INTRINSIC_FUNCTION (env, "__kmpc_reduce_nowait", I32, __kmpc_reduce_nowait, I32 loc, I32 gtid, I32 numReduceVars, I32 reduceVarsSize, I32 reduceVarPtrs, I32 reduceFunc, I32 lockPtr)

See __kmpc_reduce

WAVM_DEFINE_INTRINSIC_FUNCTION (env, "__kmpc_end_reduce", void, __kmpc_end_reduce, I32 loc, I32 gtid, I32 lck)

Finalises a blocking reduce, called by all threads.

WAVM_DEFINE_INTRINSIC_FUNCTION (env, "__kmpc_end_reduce_nowait", void, __kmpc_end_reduce_nowait, I32 loc, I32 gtid, I32 lck)

Finalises a non-blocking reduce, called by all threads.

WAVM_DEFINE_INTRINSIC_FUNCTION (env, "omp_get_num_devices", int, omp_get_num_devices)

Get the number of devices (different CPU sockets or machines) available to the user.

WAVM_DEFINE_INTRINSIC_FUNCTION (env, "omp_set_default_device", void, omp_set_default_device, int defaultDeviceNumber)

Switches between local and remote threads.

WAVM_DEFINE_INTRINSIC_FUNCTION (env, "__atomic_load", void, __atomic_load, I32 a, I32 b, I32 c, I32 d)
WAVM_DEFINE_INTRINSIC_FUNCTION (env, "__atomic_compare_exchange", I32, ___atomic_compare_exchange, I32 a, I32 b, I32 c, I32 d, I32 e, I32 f)
void ompLink()