Memory Management

The SDK provides the following facilities that your applications can use to manage memory for optimal performance.

Shared Memory Architecture

The Multiservices PIC has a configurable shared memory pool that consists of a large piece of TLB-mapped memory. The amount of memory is configured in megabytes per PIC, through the CLI. Accessing this data is very fast because no TLB misses are possible. Carved out of the shared memory are the forwarding database (FDB) and the policy database (PDB). The FDB provides access to the route information; the PDB is used to hold policy data for plugin applications only.

In addition to holding the FDB and PDB, shared memory can be used by applications to create shared memory arenas of any size. The size must be less than or equal to the configured amount, minus the space used by the FDB and PDB. To use all of the available shared memory when creating an arena, specify a size of zero (a special indicator). Most applications need only a single arena.

To share the memory with another process, that process needs to know the arena name given by the application that created the arena. Any number of processes can attach to an arena; attaching maps the shared memory into each process's virtual memory address space. For the process that creates the arena, attachment is implicit.

Pointer values to memory within an arena are valid across all processes attached to that arena. Shared memory application contexts can facilitate sharing pointer values or other data between processes. These contexts are created with a name and are retrievable by name from any process that is attached to the arena in which the application context was created.
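
The following sketch illustrates the attach-and-lookup pattern. Only msp_shm_allocator_init(), msp_shm_params_t, and the handle and name fields are taken from the allocation example later in this section; msp_shm_attach(), msp_shm_app_ctx_set(), and msp_shm_app_ctx_get() are placeholder names used purely for illustration and may not match the library, so check the shared memory headers in your backing sandbox for the actual attach and application context APIs.

// In the creating process: set up the arena (attachment is implicit)
msp_shm_params_t shmp;
void *my_table = NULL;          // pointer previously allocated from this arena (illustrative)

bzero(&shmp, sizeof(shmp));
strlcpy(shmp.shm_name, "my arena", SHM_NAME_LEN); // name other processes use to attach

if(msp_shm_allocator_init(&shmp) != MSP_OK) {
    // fail
}

// Publish the pointer under a well-known application context name
// (msp_shm_app_ctx_set is a placeholder API name)
msp_shm_app_ctx_set(shmp.shm, "flow-table-ptr", my_table);

// In a second process: attach by arena name, then retrieve the pointer by
// its application context name (placeholder API names)
msp_shm_handle_t shm_handle = msp_shm_attach("my arena");
void *my_table_view = msp_shm_app_ctx_get(shm_handle, "flow-table-ptr");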

Object Cache Architecture

Applications can use the shared memory directly across processes and threads, with the guarantee that there will be no TLB misses. In such a multiprocessing environment, however, thread-safe allocation must perform locking internally. When two threads allocate (or free) shared memory simultaneously, there is lock contention, and the allocations must be serialized. Object cache memory management, built on top of the shared memory arena, alleviates this contention and allows allocations to proceed in parallel to improve performance. This is of particular interest for an application's data threads that process packets.

The object cache APIs allocate and free fixed-size chunks. For each structure, an object cache (oc) type can be created at initialization time with the size of that structure. The type can then be used to allocate chunks of that size from the shared memory arena through the object cache allocation library.

Object cache memory management provides allocation and deallocation APIs. These use the shared memory pool's memory, but upon deallocation the framework can cache the memory in a way that is optimized for the multiprocessing environment.

Object cache type creation can be done from any CPU, but allocation and deallocation are available to user and data CPUs only. These CPUs each run a single real-time thread.

The object cache management framework frees memory to local CPU caches instead of freeing back to the shared memory arena. It also checks this cache first when an allocation is requested, so that allocating and freeing memory is often faster and achievable in parallel without contention. Of course, this might not be the case when the local cache is empty. In that case, the framework falls back to a global cache (shared by all CPUs), and if that is empty, the shared memory arena is used. The object caches (types), represented by oc in the following diagram, are layered as shown:

[Figure: The Object Cache (objcache-alloc-g016939.gif)]

Each CPU has a bucket in which it holds objects for a given object cache. An object cache allocation first attempts to serve the request from the local bucket for that CPU, and then from the global depot for the object cache. If neither can satisfy the request, memory is taken from the shared memory arena passed during object cache type creation.

When memory is low in the arena, it is periodically reclaimed from the local caches to the global depot, and from there back to the arena.

Performance can be improved by using the object cache because access to the local object cache bucket is contention free. Access to the global depot requires a lock, but the lock is per object cache type, so contention is usually minimal. An access to the arena, however, requires a global lock shared by all CPUs.

The functions for object cache allocation are declared in sandbox/src/junos/lib/libmp-sdk/h/jnx/msp_objcache.h in your backing sandbox and are documented in the SDK Library Reference documentation.

Example of Object Cache Allocation

The following example shows how to initialize the shared memory arena, create an object cache type, and allocate from it.

msp_shm_handle_t shm_handle;    // handle for shared memory allocator
msp_oc_handle_t  table_handle;  // handle for OC table allocator
msp_shm_params_t shmp;          // SHM setup parameters
msp_objcache_params_t ocp;      // OC type setup parameters
hashtable_t *    flows_table;   // pointer to the hashtable of flows
int obj_cache_id = 0;           // non-plugins use 0; otherwise use the service set ID

shm_handle = NULL;
table_handle = NULL;
flows_table = NULL;

bzero(&shmp, sizeof(shmp));
bzero(&ocp, sizeof(ocp));

// allocate & initialize the shared memory arena

strlcpy(shmp.shm_name, "my arena", SHM_NAME_LEN);
// shmp.size is 0, so we use all available shared memory

if(msp_shm_allocator_init(&shmp) != MSP_OK) {
    // fail
}

shm_handle = shmp.shm; // get SHM handle

// create object cache allocator type for the flow look up table
ocp.oc_shm = shm_handle;
ocp.oc_size  = sizeof(hashtable_t);
strlcpy(ocp.oc_name, "my flow table", OC_NAME_LEN);

if(msp_objcache_create(&ocp) != MSP_OK) {
    // fail
}

table_handle = ocp.oc; // get OC handle


// allocate flows_table in OC:

flows_table = msp_objcache_alloc(table_handle,
                  msp_get_current_cpu(), obj_cache_id);
if(flows_table == NULL) {
    // fail
}
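
When the table is no longer needed, it should be returned through the object cache rather than directly to the arena. The sketch below assumes that msp_objcache_free() mirrors the allocation call, taking the object cache handle, the object pointer, the current CPU, and the object cache ID; verify the exact signature in msp_objcache.h.

// free flows_table back to the object cache; the framework may keep the
// memory in the local CPU cache for fast reuse rather than returning it
// to the arena (signature assumed to mirror msp_objcache_alloc())
msp_objcache_free(table_handle, flows_table,
                  msp_get_current_cpu(), obj_cache_id);
flows_table = NULL;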

Object Cache Configuration

To tune SDK application scaling, use the object-cache-size setting in the configuration, specifying a value that is a multiple of 128. The amount of shared memory specified (in megabytes) is set aside for the applications. The FDB and PDB sizes are also configured and are limited to the configured size. For the Multiservices-100 PIC, the valid range is from 128 through 512 MB; for the Multiservices-400 PIC or Multiservices DPC, the range is from 128 through 1280 MB. However, if you also set wired process memory, the maximum object cache value is 128 on the Multiservices-100 PIC and 768 on the Multiservices-400 PIC.
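
For example, the following configuration reserves 512 MB for the object cache. The placement of the statement under extension-provider mirrors the wired-process-mem-size example later in this section and is given here as an assumption; confirm the exact hierarchy for your release.

adaptive-services {
    service-package {
        extension-provider {
            object-cache-size 512;
        }
    }
}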

Wired Memory

To improve performance for memory or cache-intensive applications, you can "wire down" text, data, and heap for a particular process. The stack is not wired.

Wired process memory is memory used to wire down a process's memory segments to avoid TLB misses. It includes heap memory but not shared memory. The default size of wired process memory is 512 MB, which is also the maximum available. The SDK designates wired memory with the "big TLB" (BTLB) acronym.

If wired process memory is exhausted, the process uses unwired memory. When you specify a size for a process using the call setrlimit(RLIMIT_DATA, &limit), the system uses the wired memory you configured, plus any unwired memory necessary to reach the limit.
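
For example, a control process might raise its data segment limit as shown in the following sketch. The 768 MB value is purely illustrative: with wired-process-mem-size set to 512 (as in the example that follows), the first 512 MB would be wired and the remainder served from unwired memory.

#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

// request a 768 MB data segment limit; memory beyond the configured
// wired process memory is served from ordinary, unwired pages
struct rlimit limit;
limit.rlim_cur = 768 * 1024 * 1024;
limit.rlim_max = 768 * 1024 * 1024;

if(setrlimit(RLIMIT_DATA, &limit) != 0) {
    // fail
}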

Both wired memory and shared memory arenas (discussed earlier) use statically wired TLB entries in the Multiservices PIC to avoid TLB misses, which normally slow down memory accesses. A shared memory arena and, specifically, object cache, is intended for use in applications with data (real-time) threads. Wired process memory is mainly intended to improve performance in critical control applications (that is, servers) that already use the malloc and free functions.

Setting Up Wired Memory

To reserve wired process memory, you first configure the wired-process-mem-size statement. For example:

adaptive-services {
    service-package {
        extension-provider {
            wired-process-mem-size 512;
        }
    }
}

Only one process per PIC can use the wired memory. To identify that process, edit its build Makefile, adding the BTLB_BINARY tag.

Following is a sample makefile with the new tag:

PROG = my-example 

SRCS = my-example.c 

BTLB_BINARY = YES

DPLIBS += ${MY_DP_LIBS}

.include <version.mk>
.include <bsd.prog.mk>

Running a Process to Use Wired Memory

To run a wired binary through the SDK packaging framework and have the init process execute it, you first create a script that invokes the binary. The script sets the BTLB_FLAG=BTLB_EXEC flag. For example:

#!/bin/sh
 
BTLB_FLAG=BTLB_EXEC /opt/sdk/sbin/MyBTLBProg -N 

Then, add a reference to the script in the package configuration (.conf) file. For example:

/* 
* Base configuration for init (/etc/init.conf)
*/
        
process "MyBTLBProg" {
    action once;
    command "/opt/sdk/sbin/MyBTLBProg.sh";
    failure-command "/etc/reboot.sh";
}

Caveats for Using Wired Memory

The following considerations apply when you use wired memory:

