High Performance OPC UA Server SDK  1.4.0.256
Memory

The SDK provides two kinds of memory.

Private memory, which is used solely by the owning process, and shared memory, which is used for IPC (inter-process communication).

Private Memory

The functions mem_alloc(), mem_free() and mem_realloc() can be used to allocate and release private memory. These functions are mapped to the platform layer functions ua_p_malloc(), ua_p_free() and ua_p_realloc(). By default these platform layer functions are implemented using the C library functions malloc, free and realloc, but you can change this in the platform layer.
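
A minimal usage sketch is shown below. It assumes that these functions have the usual malloc-like signatures; check your platform layer headers for the exact prototypes.

void example(void)
{
    char *buf;
    char *bigger;

    // allocate 128 bytes of private memory
    buf = mem_alloc(128);
    if (buf == NULL)
        return; // allocation can fail

    // grow the buffer; the block may be moved to a new address
    bigger = mem_realloc(buf, 256);
    if (bigger != NULL)
        buf = bigger;

    // always release private memory with mem_free()
    mem_free(buf);
}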

IPC Memory

IPC memory can reside in a shared memory area when using a multi-process configuration, or in private memory when running the whole OPC UA application as a single process. The IPC memory is managed by different kinds of memory pools which are provided by the SDK.

IPC Memory Layout

At startup the memory layout manager (src/memory/memory_layout.c) allocates one big memory block from the system and initializes a set of memory pools inside this block. The size of these pools and of the whole memory block depends on the configuration in appconfig.c. All further memory allocations work on these memory pools.

The SDK provides two kinds of memory pools: a heap-like memory pool (memory/mempool.c), which manages variable size memory blocks like standard malloc/free, and object pools, which manage fixed size memory blocks. The latter are used for many internal structures like buffers, IPC messages, and so on.

IPC Heap

When allocating data which is passed over IPC boundaries, this memory must be allocated using ipc_malloc(). Memory allocated by ipc_malloc(), ipc_calloc() or ipc_realloc() must be freed using the corresponding function ipc_free().
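
The following sketch illustrates this pattern. It assumes that ipc_malloc() and ipc_free() take the same arguments as malloc() and free(); struct my_payload is a hypothetical example type.

// hypothetical payload passed between processes
struct my_payload {
    int id;
    char text[32];
};

void send_example(void)
{
    struct my_payload *msg;

    msg = ipc_malloc(sizeof(*msg));
    if (msg == NULL)
        return; // the IPC pool can run out of memory

    // ... fill msg and pass it over the IPC boundary ...

    ipc_free(msg); // must be freed with ipc_free(), not mem_free()
}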

The IPC heap is based on the mempool implementation described in the next section.

Memory Pool

The used mempool algorithm is based on the memory allocator from Doug Lea (http://g.oswego.edu/dl/html/malloc.html).

The memory pool is first initialized with one big memory chunk as shown in figure "Initial State". When the application allocates memory, this chunk is split into two chunks: one chunk which is returned to the application, and the remaining big chunk. Conversely, when memory chunks are freed, they get coalesced with bordering unused chunks. This minimizes the number of unusable small chunks.

This is best explained by a simple example. After startup the mempool is in the "Initial State".

mempool_fragmentation_1.png
Initial State

Then the application calls mempool_alloc four times.

mempool_fragmentation_2.png
Allocated 4 blocks

After this, the worst case scenario is to free only every second block, which leads to fragmented memory. The first freed blocks cannot be coalesced, because their bordering blocks are still in use. The last block gets merged with the "Big Chunk".

mempool_fragmentation_3.png
Fragmented memory

After this the application frees one more block. As you can see, it gets coalesced with the bordering free blocks, which results in one bigger block.

mempool_fragmentation_4.png
Coalesced Memory

After freeing the last block the "Initial State" is restored.

mempool_fragmentation_1.png
Initial State Restored

This strategy is very effective in avoiding memory fragmentation, but depending on the usage pattern it cannot be avoided completely.

Binning

Free memory chunks are maintained in bins, grouped by size. The individual chunks of one bin are linked using a doubly linked list. When allocating memory, the smallest fitting bin is searched for a free chunk first. If a bin is empty, the next bigger bin is checked. Only if no chunk can be found is the "Big Chunk" of the last bin used. This allows recycling free chunks that could not be coalesced yet. The following picture shows an example of such bins.

mempool_binning.png
Binning

Example: Let's assume bin_16 and bin_32 are empty, but bin_48 contains a free chunk. Now the application allocates memory and the allocator computes a chunk size of 16 bytes. It searches bin_16 and bin_32 and finds the first free chunk in bin_48. It removes the chunk from this bin and splits it into a 16 byte and a 32 byte chunk. The 32 byte chunk is added to bin_32 and the 16 byte chunk is returned to the application.
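
The following sketch illustrates this search strategy. It is not the SDK implementation: the 16 byte bin granularity matches the example above, but all names (struct chunk, bins, find_chunk, ...) are assumptions for illustration only.

#include <stddef.h>

#define NUM_BINS 16

// bins[0] holds 16 byte chunks, bins[1] 32 byte chunks, and so on;
// the last bin contains the "Big Chunk"
static struct chunk {
    size_t size;
    struct chunk *prev, *next; // doubly linked freelist of one bin
} *bins[NUM_BINS];

// size is assumed to be already rounded up to a multiple of 16
static struct chunk *find_chunk(size_t size)
{
    size_t i;

    // start at the smallest bin that can satisfy the request
    for (i = size / 16 - 1; i < NUM_BINS; i++) {
        struct chunk *c = bins[i];
        if (c != NULL) {
            // unlink the chunk from the doubly linked bin list
            bins[i] = c->next;
            if (c->next != NULL)
                c->next->prev = NULL;
            // a chunk from a larger bin would now be split and the
            // remainder re-binned, as in the bin_48 example above
            return c;
        }
    }
    return NULL; // out of memory
}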

Internal Fragmentation

Every memory chunk has some overhead due to the header and footer information necessary for managing memory chunks. Besides this, there is often some unused memory inside a chunk. Let's assume the application wants to allocate 4 bytes, and let's further assume the overhead of a chunk is 8 bytes. The minimum chunk size is 16 bytes; minus the 8 bytes of overhead and minus the 4 bytes of user data, 4 bytes are left unused. This is called internal fragmentation and is shown in the figure below. In this example only 50% of the chunk's payload area is actually used due to internal fragmentation. If we also include the chunk overhead in the calculation, we can see that only 25% of the chunk's memory is used by the application.

mempool_fragmentation_internal.png
Internal Fragmentation - Smallest Chunk

Managing lots of small memory chunks this way is not very efficient, but bigger memory chunks fare better. Another example: the application allocates 54 bytes of memory. A 64 byte chunk fits this requirement, leaving only 8 bytes of overhead and just 2 bytes of internal fragmentation; the payload area thus makes up 87.5% of the chunk. This example is illustrated in the picture below.

mempool_fragmentation_internal2.png
Internal Fragmentation - Bigger Chunk

Memory Chunk Layout

The memory chunk overhead depends on the CPU architecture and some compilation flags. The size and ptr fields depend on the address bus width: they are 4 bytes on 32 bit systems and 8 bytes on 64 bit systems. The size fields at the beginning and end of the chunk are used for coalescing and always exist. Because the chunk size is a multiple of 16 (or higher), the lower bits of the size field are not used. The memory allocator makes use of this and stores the free bit in the first size field. The ptr fields are used to link free chunks in a list. When the chunk is allocated, these fields are used for user data.

mempool_chunk_1.png
Basic Memory Chunk Layout
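
The layout can be pictured with a struct like the following. This is only an illustration of the fields described above; the actual header layout in memory/mempool.c may differ.

#include <stddef.h>

// illustration only, not the real mempool.c layout
struct chunk_sketch {
    size_t size;               // chunk size; the lowest bit is the "free" flag
    struct chunk_sketch *prev; // freelist links of the bin; when the chunk is
    struct chunk_sketch *next; //   allocated, user data overlaps these fields
    // ... user data starts here for allocated chunks ...
    // a trailing size field at the very end of the chunk lets the allocator
    // find the header of the preceding chunk for coalescing
};

#define CHUNK_FREE_BIT   ((size_t)1)
#define CHUNK_IS_FREE(c) (((c)->size & CHUNK_FREE_BIT) != 0)
#define CHUNK_SIZE(c)    ((c)->size & ~CHUNK_FREE_BIT)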

The allocator implements some optional debugging functionality for analyzing problems. This is pure developer functionality and is not recommended for production. Enabling the compiler flag MEMORY_ENABLE_DEBUG adds additional memory boundaries to the chunk, filled with the patterns 0xBBBB... (before the user data) and 0xAAAA... (after the user data). When freeing a chunk, these boundaries are validated by the allocator to find memory corruptions. In addition, newly allocated memory is initialized with 0xCD and freed memory is set to 0xDD (deleted).

mempool_chunk_2.png
Memory Boundaries
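
A hypothetical validation helper could look like this. The pattern values are the ones documented above; the function itself is not part of the SDK API.

#include <stddef.h>

// returns 1 if the guard area still contains the expected fill pattern
static int boundary_intact(const unsigned char *guard, size_t len,
                           unsigned char pattern)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (guard[i] != pattern)
            return 0; // guard overwritten: memory corruption detected
    }
    return 1;
}

// on free, the allocator would check the 0xBB guard before and the 0xAA
// guard after the user data, then fill the user data with 0xDD ("deleted")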

In addition (or instead) you can enable MEMORY_ENABLE_TRACE, which logs all memory allocations with file and line number into a separate memory trace buffer. This can be used to track down memory leaks. This feature adds another field to the memory chunk header and requires additional memory for the trace buffer.

mempool_chunk_3.png
Memory Trace

Additional Memory Statistics

By enabling MEMORY_ENABLE_TRACK_USED_MEM the mempool tracks how much memory is currently allocated. This costs only one additional field in the mempool header structure. This field gets checked in mempool_cleanup to detect memory leaks. This feature can safely be enabled, as it has minimal memory and CPU requirements. In the trace output you will either see "No memory leaks." or "n Bytes leaked!". In the latter case you can enable MEMORY_ENABLE_TRACE to track down the origin of the leak.

Another feature is MEMORY_ENABLE_STAT. Enabling this makes the mempool also track the maximum amount of memory that was used. This can be useful for figuring out optimal mempool configurations.

Object Pools

The object pool manages an area of preallocated memory. It is used to allocate a number of fixed size objects rather than dynamically sized blocks as in conventional malloc implementations. The advantage is that managing a set of fixed size objects is very simple and fast compared to malloc, and even more important: it does not suffer from memory fragmentation.

This preallocated area can either be a static area which is already reserved at compile time, or a malloc'ed area which is allocated at startup. In both cases, we know from the beginning how many objects can be allocated and can thus avoid out-of-memory scenarios at runtime.

When using statically preallocated memory, the linker can already report out-of-memory errors at build time. When using the second approach, you will get an out-of-memory error during the initialization phase. The latter option has the advantage that the pool sizes can be changed without recompiling. So it combines the best of both worlds: it is as flexible as using pure malloc, but as fast and reliable as using statically preallocated arrays.

Implementation

The object pool is a contiguous memory area with a small header at the beginning followed by an array of fixed size objects. The header contains the object size, the number of objects maintained and the index of the first free object, which is the anchor of the freelist.

Each object contains a next pointer (index) that points to the next free object. This next field is only used if the object is free and part of the freelist as shown in the following figure.

object_pool_1.png
Object Pool Memory Layout (Initial State)

When an object is allocated, it is unlinked from the list and the whole object, including the next field, can be used for user data as shown below.

object_pool_2.png
Object Pool Memory Layout (One Object Allocated)
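
A minimal sketch of this freelist logic is shown below. The header layout and all names are assumptions that merely mirror the description above, not the actual SDK implementation.

#include <stddef.h>
#include <stdint.h>

#define INDEX_NONE UINT32_MAX // freelist end marker (assumption)

// hypothetical pool header, mirroring the description above
struct pool_sketch {
    uint32_t object_size;    // size of one object in bytes
    uint32_t num_objects;    // number of objects in the pool
    uint32_t first_free;     // index of the first free object (freelist anchor)
    unsigned char objects[]; // array of fixed size objects
};

static void *pool_alloc_sketch(struct pool_sketch *p)
{
    unsigned char *obj;

    if (p->first_free == INDEX_NONE)
        return NULL; // pool exhausted

    obj = &p->objects[(size_t)p->first_free * p->object_size];
    p->first_free = *(uint32_t *)obj; // pop: read the stored next index
    return obj;                       // the whole object is now user data
}

static void pool_free_sketch(struct pool_sketch *p, void *obj)
{
    uint32_t idx = (uint32_t)(((unsigned char *)obj - p->objects) / p->object_size);

    *(uint32_t *)obj = p->first_free; // push: store the old anchor in the object
    p->first_free = idx;
}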

Usage

The following listing shows an example of how to use the object pool.

struct foo {
    double foo;
    float bla;
    int more;
    int data;
};

extern struct appconfig g_appconfig;
static mem_objectpool *g_foopool;

int startup(void)
{
    size_t size;
    void *mem;

    // allocate memory for the object pool
    size = mem_objectpool_size(g_appconfig.num_foo, sizeof(struct foo));
    mem = mem_alloc(size);
    if (mem == NULL) return UA_EBADNOMEM;

    // initialize object pool inside the allocated memory
    g_foopool = mem_objectpool_init(g_appconfig.num_foo, sizeof(struct foo), mem);

    return 0;
}

void useit(void)
{
    // allocate one object from the pool
    struct foo *obj = mem_objectpool_alloc(g_foopool);
    // ...
    // return the object to the pool
    mem_objectpool_free(g_foopool, obj);
}

void cleanup(void)
{
    // release the pool memory
    mem_free(g_foopool);
}

How the memory for the object pool is provided is up to the developer. The SDK normally follows the pattern above and allocates this memory dynamically at startup. This allows changing pool sizes in the configuration without recompiling.

Alternatively, one could use preallocated static arrays. This way mallocs can be avoided, but at the cost of flexibility.

struct foo {
    double foo;
    float bla;
    int more;
    int data;
};

#define NUM_FOOS 100
#define POOL_SIZE MEM_OBJECTPOOL_SIZE(NUM_FOOS, sizeof(struct foo))

// uint32_t elements force a four byte alignment of the pool memory
static uint32_t g_memory[POOL_SIZE / sizeof(uint32_t) + 1];
static mem_objectpool *g_foopool;

int startup(void)
{
    // initialize object pool in the static memory area
    g_foopool = mem_objectpool_init(NUM_FOOS, sizeof(struct foo), g_memory);
    return 0;
}

void useit(void)
{
    // allocate one object from the pool
    struct foo *obj = mem_objectpool_alloc(g_foopool);
    // ...
    // return the object to the pool
    mem_objectpool_free(g_foopool, obj);
}

Note that we've chosen uint32_t to force a 4 byte alignment. Statically allocated byte arrays may not be aligned correctly for the structures stored in the pool elements, which could lead to alignment problems at runtime. The "+1" is a safety margin for the case that POOL_SIZE is not a multiple of sizeof(uint32_t).
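
If your toolchain supports C11, an alternative sketch is to let the compiler enforce a suitable alignment directly instead of choosing the element type for it. This assumes a C11 build environment, which the SDK does not require, and reuses the POOL_SIZE define from the listing above.

#include <stdalign.h>
#include <stddef.h>

// request the strictest fundamental alignment for the pool memory
static alignas(max_align_t) unsigned char g_memory[POOL_SIZE];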