5. MAD RPC Processing

The python-rdma package includes a simplified system for processing MAD based RPCs defined in the InfiniBand Architecture. Most of the tedious processing is taken care of automatically and the user sees only the actual RPC payload they are interested in.

For example, this displays the SMPPortInfo for the local port:

import sys;
import rdma;
import rdma.path;
import rdma.IBA as IBA;

end_port = rdma.get_end_port();
path = rdma.path.IBDRPath(end_port);
with rdma.get_umad(end_port) as umad:
    pinf = umad.SubnGet(IBA.SMPPortInfo,path);
    pinf.printer(sys.stdout);

5.1. rdma.madtransactor MAD RPC Mixin

MADTransactor provides the base set of methods for doing MAD RPC. Derived classes use this as a mixin to provide the basic API.

The user visible API is the set of IBA defined RPC names, e.g. SubnGet(), each of which performs the RPC of that name.

Two modes of operation are possible, synchronous and asynchronous. In synchronous mode the RPC API will return the decoded reply payload. In asynchronous mode the RPC API will return the request message details. The mode in use is determined by the derived class.

The API is quite simplified:

# Return the SMPPortInfo for port 1
pinf = mad.SubnGet(IBA.SMPPortInfo,path,1);
print pinf.masterSMLID;

Under the covers the MADTransactor produces a rdma.IBA.SMPFormat or rdma.IBA.SMPFormatDirected that contains a zeroed rdma.IBA.SMPPortInfo as its payload. The attributeID is set to IBA.SMPPortInfo.MAD_ATTRIBUTE_ID and the argument is tested to ensure that SubnGet is a legal RPC for it.

When a valid reply is received the generic MAD header is processed and errors are converted into exceptions. The payload is unpacked and a new rdma.IBA.SMPPortInfo is returned.
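The request and reply handling described above can be sketched in plain Python. This is a toy model under stated assumptions, not the python-rdma implementation: ToyPortInfo, ToyMADError, LEGAL_METHODS, toy_rpc and the transport callable are hypothetical names invented for illustration. Only the MAD_ATTRIBUTE_ID attribute and the rule that a payload may be a class or an instance come from the text.

```python
import inspect


class ToyMADError(Exception):
    """Hypothetical stand-in for the exception raised on a failed RPC."""


class ToyPortInfo(object):
    """Hypothetical payload structure; real payloads live in rdma.IBA."""
    MAD_ATTRIBUTE_ID = 0x15                             # illustrative value
    LEGAL_METHODS = frozenset(["SubnGet", "SubnSet"])   # assumed bookkeeping

    def __init__(self, masterSMLID=0):
        self.masterSMLID = masterSMLID


def toy_rpc(method, payload, transport):
    """Build a request, hand it to transport, and decode the reply."""
    is_class = inspect.isclass(payload)
    cls = payload if is_class else payload.__class__
    # Check that the RPC is legal for this attribute.
    if method not in cls.LEGAL_METHODS:
        raise ToyMADError("%s is not legal for %s" % (method, cls.__name__))
    # A class means "send a zeroed payload"; an instance is sent as given.
    request = cls() if is_class else payload
    status, fields = transport(method, cls.MAD_ATTRIBUTE_ID, vars(request))
    # A non-zero status in the reply header becomes an exception.
    if status != 0:
        raise ToyMADError("reply status 0x%x" % status)
    # Unpack the reply into a new payload instance.
    return cls(**fields)


def fake_transport(method, attribute_id, fields):
    # Pretend the remote end answered with a populated payload.
    return 0, {"masterSMLID": 1}


pinf = toy_rpc("SubnGet", ToyPortInfo, fake_transport)
```

The caller only ever sees the decoded payload or an exception, which is the point of the simplified API.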

All RPC functions have a similar signature:

RPC(payload, path, attributeModifier=0)
Parameters:
  • payload (rdma.binstruct.BinStruct derived class adhering to the MAD protocol) – The RPC type to execute. If it is a class then the request payload is zero, otherwise the content of the instance is sent as the request.
  • path (rdma.path.IBPath) – A reversible path specifying the target node.
  • attributeModifier (int) – The value of the generic MAD rdma.IBA.MADHeader.attributeModifier field
Returns:

If payload is a class then an instance of that class, otherwise a new instance of payload.__class__.

Raises:

Support is also provided for processing incoming MADs as a server. The basic template is:

try:
    fmt,req = umad.parse_request(buf,path);
    raise rdma.MADError(req=fmt,req_buf=buf,path=path,
        reply_status=IBA.MAD_STATUS_UNSUP_METHOD_ATTR_COMBO,
        msg="Unsupported attribute ID %s"%(
            fmt.describe()));
except rdma.MADError as err:
    err.dump_detailed(sys.stderr,"E:",level=1);
    umad.send_error_exc(err);

fmt will be an instance of the appropriate class format structure, and req will be an instance of the appropriate payload structure. Continued parsing of the request should happen within the try block and errors raised as rdma.MADError with reply_status set appropriately. Once a reply is prepared use rdma.madtransactor.MADTransactor.send_reply().

class rdma.madtransactor.MADTransactor

This class is a mixin for everything that implements a MAD RPC transaction interface. Derived classes must provide the _execute() method which sends the MAD and gets the reply.

By design instances of this interface cannot be multi-threaded. For multi-threaded applications each thread must have a separate MADTransactor instance. Simple MAD request/reply transactors return the payload; other attributes of the last processed reply are available via instance attributes.

Paths used with this object can have a MKey (for SMPs) and SMKey (for SA GMPs) attribute.

PerformanceGet(payload, path, attributeModifier=0)
PerformanceSet(payload, path, attributeModifier=0)
SubnAdmGet(payload, path=None, attributeModifier=0)
SubnAdmGetTable(payload, path=None, attributeModifier=0)
SubnAdmSet(payload, path=None, attributeModifier=0)
SubnGet(payload, path, attributeModifier=0)
SubnSet(payload, path, attributeModifier=0)
VendGet(payload, path, attributeModifier=0)
VendSet(payload, path, attributeModifier=0)
do_async(op)

This runs a simple async work coroutine against a synchronous instance. In this case the coroutine yields its own next result.

end_port

The end_port this is associated with.

static get_request_match_key(buf)

Return a tuple for matching a request MAD buf. The tuple is ((oui << 8) | mgmtClass,(baseVersion << 8) | classVersion,attributeID), where oui is 0 if this is not a vendor OUI MAD.
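As a rough illustration of the tuple layout, the sketch below computes the same three values from a raw buffer. It assumes the common IBA MAD header layout (baseVersion at byte 0, mgmtClass at byte 1, classVersion at byte 2, attributeID as a big-endian 16 bit value at byte 16). The function name toy_request_match_key is hypothetical, and the OUI is passed in by the caller instead of being extracted from a vendor MAD.

```python
import struct


def toy_request_match_key(buf, oui=0):
    """Return a match tuple for a raw MAD; oui is supplied by the caller."""
    base_version, mgmt_class, class_version = struct.unpack_from(">BBB", buf, 0)
    (attribute_id,) = struct.unpack_from(">H", buf, 16)
    return ((oui << 8) | mgmt_class,
            (base_version << 8) | class_version,
            attribute_id)


# Build a minimal 24 byte common header for the example.
header = bytearray(24)
header[0] = 1        # baseVersion
header[1] = 0x81     # mgmtClass (directed route SMP)
header[2] = 1        # classVersion
struct.pack_into(">H", header, 16, 0x15)   # attributeID

key = toy_request_match_key(header)
```

Packing each pair of fields into one integer keeps the key small and hashable, so it can be used directly as a dictionary key when dispatching requests.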

is_async

True if this is an async MADTransactor interface.

parse_request(rbuf, path)

Parse a request packet into a format and data.

Raises rdma.MADError:
 If the packet could not be parsed.

reply_fmt

The MADFormat for the last reply packet processed.

reply_path

The path for the last reply packet processed.

send_error_exc(exc)

Call send_error_reply() with the arguments derived from the rdma.MADError exception passed in.

send_error_reply(buf, path, status, class_code=0)

Generate an error reply for a MAD. buf is the full original packet. This entire packet is returned with an appropriate error code set. status and class_code should be set to the appropriate result code.

send_reply(ofmt, payload, path, attributeModifier=0, status=0, class_code=0)

Generate a reply packet. ofmt should be the request format.

trace_func

A function to call for tracing.

rdma.madtransactor.dumper_tracer(mt, kind, fmt=None, path=None, ret=None)

Logs full decoded packet dumps of what is happening to sys.stdout. Assign to rdma.madtransactor.MADTransactor.trace_func.

rdma.madtransactor.simple_tracer(mt, kind, fmt=None, path=None, ret=None)

Simply logs summaries of what is happening to sys.stdout. Assign to rdma.madtransactor.MADTransactor.trace_func.

5.2. rdma.umad Userspace MAD Interface

The userspace MAD interface is normally instantiated by rdma.get_umad(), which will select the appropriate implementation for the platform.

class rdma.umad.LazyIBPath(end_port, **kwargs)

Bases: rdma.path.LazyIBPath

Similar to rdma.path.IBPath but the unpack of the UMAD AH is deferred until necessary since most of the time we do not care.

end_port is the rdma.devices.EndPort this path is associated with. kwargs is applied to set attributes of the instance during initialization.

class rdma.umad.UMAD(parent)

Bases: rdma.tools.SysFSDevice, rdma.madtransactor.MADTransactor

Handle to a UMAD kernel interface. This class supports the context manager protocol.

parent is the owning rdma.devices.EndPort.

recvfrom(wakeat)

Receive a MAD packet. If the value of rdma.tools.clock_monotonic() exceeds wakeat then None is returned.

Returns: tuple(buf,path)

register_client(mgmt_class, class_version, oui=0)

Manually register a MAD agent. This is done automatically when sending MADs; this API is mainly intended for special cases.

register_server(mgmt_class, class_version, oui=0, method_mask=0)

Register to receive MADs that match the given pattern. method_mask is a bitmask of the method ID to match, oui is only used for rdma.IBA.VendOUIFormat MADs.
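A small sketch of how such a bitmask might be built. It assumes that bit N of method_mask selects method ID N, which is the natural reading of the description but should be checked against the implementation. toy_method_mask and the two constants are hypothetical names; the method values 0x01 (Get) and 0x02 (Set) are the IBA method codes.

```python
def toy_method_mask(*method_ids):
    """Return an integer with one bit set per method ID (assumed encoding)."""
    mask = 0
    for method in method_ids:
        mask |= 1 << method
    return mask


METHOD_GET = 0x01   # IBA method code for Get
METHOD_SET = 0x02   # IBA method code for Set

mask = toy_method_mask(METHOD_GET, METHOD_SET)
```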

register_server_fmt(fmt)

Same as register_server() except the arguments are deduced from fmt which should be derived from rdma.binstruct.BinFormat.

sendto(buf, path, agent_id=None)

Send a MAD packet. buf is the raw MAD to send, starting with the first byte of rdma.IBA.MADHeader. path is the destination.
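The wakeat argument of recvfrom() is an absolute deadline on the monotonic clock, not a relative timeout. The sketch below shows that pattern with a loopback stand-in so it can run without hardware. ToyTransport and send_and_wait are hypothetical, and time.monotonic() (Python 3) stands in for rdma.tools.clock_monotonic().

```python
import collections
import time


class ToyTransport(object):
    """Hypothetical loopback stand-in for a UMAD-style handle."""

    def __init__(self):
        self._inbox = collections.deque()

    def sendto(self, buf, path):
        # Loop the packet straight back so there is something to receive.
        self._inbox.append((buf, path))

    def recvfrom(self, wakeat):
        # wakeat is an absolute monotonic deadline, not a relative timeout.
        while True:
            if self._inbox:
                return self._inbox.popleft()
            if time.monotonic() > wakeat:
                return None
            time.sleep(0.001)


def send_and_wait(transport, buf, path, timeout):
    """Send one packet and wait up to timeout seconds for a packet."""
    transport.sendto(buf, path)
    return transport.recvfrom(time.monotonic() + timeout)


transport = ToyTransport()
reply = send_and_wait(transport, b"\x01" * 24, "toy-path", timeout=0.05)
nothing = transport.recvfrom(time.monotonic() + 0.01)   # inbox is now empty
```

Computing the deadline once and passing it down means repeated receive calls inside one logical operation share the same overall time budget.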

5.3. rdma.vmad Verbs MAD Interface

The verbs MAD interface can be used to send GMP MADs (e.g. to QPN 1), which is useful for SA communication. This class creates a UD QP using verbs and uses that to send all GMPs. This means the source QPN of the GMP will not be 1, which is a configuration supported by IBA.

class rdma.vmad.VMAD(parent, path, depth=16)

Bases: rdma.madtransactor.MADTransactor

Provide a UMAD style interface that runs on ibverbs. This can be used with GMP (eg QPN=1) traffic.

path is used to set the PKey and QKey for all MADs sent through this interface.

close()

Free the resources held by the object.

end_port

rdma.devices.EndPort this is associated with.

recvfrom(wakeat)

Receive a MAD packet. If the value of rdma.tools.clock_monotonic() exceeds wakeat then None is returned.

Returns: tuple(buf,path)

sendto(buf, path)

Send a MAD packet. buf is the raw MAD to send, starting with the first byte of rdma.IBA.MADHeader. path is the destination.

5.4. rdma.sched Parallel MAD Scheduler

MADSchedule is a parallel MAD scheduling system built using Python coroutines as the scheduling element. It provides for very simplified programming of parallel MAD operations.

A simple use of the class to fetch rdma.IBA.SMPNodeInfo for a list of paths:

def get_nodeinfo(sched,node):
    node.ninf = yield sched.SubnGet(IBA.SMPNodeInfo,node.path);

nodes = [..];
sched = rdma.sched.MADSchedule(umad);
sched.run(mqueue=(get_nodeinfo(sched,I) for I in nodes));

The scheduler will pull coroutines from the mqueue argument and run them to obtain MADs to send, bounding the total outstanding MAD count and returning replies as the result of yield.

Simplified, MADSchedule manages a set of generators and coroutines and schedules when each is running. Generators yield coroutines and coroutines yield MADs to execute. Generators are started by calling mqueue(). Typically this would be done using a generator expression as an argument, but this is not required.

Coroutines are the functions that actually process the MADs. They are started either by being yielded from a generator or via the queue() call. The typical format of a coroutine is:

def get_nodeinfo(sched,node):
    node.ninf = yield sched.SubnGet(IBA.SMPNodeInfo,node.path);

MADSchedule implements the asynchronous interface for MADTransactor, so the RPC functions return the MAD to send. The coroutine yields these MADs back to the scheduler which issues them on the network and waits for a reply. When a reply (or exception) is returned for the MAD the yield statement will return that exactly as though the synchronous interface to MADTransactor was being used.

While a coroutine is yielded other coroutines can execute until rdma.sched.MADSchedule.max_outstanding MADs are issued, at which point the scheduler waits for MADs on the network to complete. As coroutines exit, queued generators are called to produce more coroutines until there is no more work to do.
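The scheduling idea can be sketched with plain Python generators. This is a toy model, not the MADSchedule implementation: toy_run, execute and fetch are hypothetical names, requests are completed in order by a local callable instead of the network, and exceptions are not modelled.

```python
import collections


def toy_run(coroutines, execute, max_outstanding=4):
    """Run generator-based coroutines that yield requests and receive replies.

    execute is a hypothetical callable standing in for the network round trip.
    """
    pending = iter(coroutines)           # generator producing coroutines
    outstanding = collections.deque()    # (coroutine, request) awaiting a reply
    exhausted = False
    while True:
        # Start new coroutines while the outstanding limit allows it.
        while not exhausted and len(outstanding) < max_outstanding:
            try:
                co = next(pending)
            except StopIteration:
                exhausted = True
                break
            try:
                outstanding.append((co, next(co)))
            except StopIteration:
                pass                     # finished without issuing a request
        if not outstanding:
            break
        # Complete the oldest request and resume its coroutine with the reply.
        co, request = outstanding.popleft()
        try:
            outstanding.append((co, co.send(execute(request))))
        except StopIteration:
            pass                         # coroutine is done


results = {}


def fetch(name):
    # The yield hands the request out and evaluates to the reply.
    results[name] = yield ("get", name)


toy_run((fetch(n) for n in ["a", "b", "c"]),
        execute=lambda request: request[1].upper(),
        max_outstanding=2)
```

New coroutines are only pulled from the generator when there is room for another outstanding request, which is what bounds the amount of work in flight.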

A coroutine may also yield another coroutine. In this instance the scheduler treats it as a function call and runs the returned coroutine to completion before returning from yield. If the coroutine produces an exception then it will pass through the yield statement as well. The called coroutine can return a result to the parent by setting the result attribute before returning.
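The function-call behaviour of a yielded coroutine can be modelled with a stack of generators. The sketch below is a hypothetical stand-in (ToySched, child, failing and parent are invented names) that only covers nested calls, the result attribute, and exception propagation; it issues no MADs.

```python
import types


class ToySched(object):
    """Hypothetical scheduler that only models nested coroutine calls."""

    def __init__(self):
        self.result = None

    def run(self, coroutine):
        stack = [coroutine]    # innermost coroutine is last
        value = None           # value the next send() delivers
        error = None           # exception to re-raise in the parent
        while stack:
            top = stack[-1]
            try:
                if error is not None:
                    pending, error = error, None
                    yielded = top.throw(pending)
                else:
                    yielded = top.send(value)
            except StopIteration:
                # Coroutine finished: hand result (or True) to the parent.
                stack.pop()
                value = True if self.result is None else self.result
                self.result = None
                continue
            except Exception as exc:
                # Coroutine failed: the parent sees it at its yield.
                stack.pop()
                if not stack:
                    raise
                error = exc
                continue
            if isinstance(yielded, types.GeneratorType):
                stack.append(yielded)   # treat as a function call
            value = None                # fresh generators must be sent None
        return value


sched = ToySched()
seen = []


def child(x):
    yield None                 # returns immediately in this toy
    sched.result = x * 2       # pass a value back to the caller


def failing():
    raise ValueError("boom")
    yield None                 # unreachable; makes this a generator


def parent():
    seen.append((yield child(21)))
    try:
        yield failing()
    except ValueError as exc:
        seen.append(str(exc))


sched.run(parent())
```

Because the parent is resumed with either a value or a thrown exception, the yield reads like an ordinary call even though control passes through the scheduler.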

This example shows how to perform directed route discovery of a network using parallel MAD scheduling:

def get_port_info(sched,path,port,follow):
    pinf = yield sched.SubnGet(IBA.SMPPortInfo,path,port);
    if follow and pinf.portState != IBA.PORT_STATE_DOWN:
        npath = rdma.path.IBDRPath(end_port,drPath=path.drPath + chr(port));
        yield get_node_info(sched,npath);

def get_node_info(sched,path):
    ninf = yield sched.SubnGet(IBA.SMPNodeInfo,path);
    if ninf.nodeGUID in guids:
        return;
    guids[ninf.nodeGUID] = ninf;

    if ninf.nodeType == IBA.NODE_SWITCH:
        sched.mqueue(get_port_info(sched,path,I,True)
                     for I in range(1,ninf.numPorts+1));
        pinf = yield sched.SubnGet(IBA.SMPPortInfo,path,0);
    else:
        yield get_port_info(sched,path,ninf.localPortNum,
                            len(path.drPath) == 1);

guids = {};
end_port = rdma.get_end_port();
with rdma.get_umad(end_port) as umad:
    sched = rdma.sched.MADSchedule(umad);
    local_path = rdma.path.IBDRPath(end_port);
    sched.run(get_node_info(sched,local_path));

5.4.1. What can be Yielded

A generator can yield:
  • A coroutine. The coroutine is scheduled to run as though rdma.sched.MADSchedule.queue() had been called. The generator does not wait for the coroutine to finish.
  • The result of queue(), or the result of yielding a coroutine. Yield will return once the thing queued is finished.
  • The result of mqueue() - yield will return once the generator is exhausted and all the coroutines it spawned are finished.
A coroutine can yield:
  • The result of an RPC call function (a tuple describing the MAD to send). The yield result will be the MAD reply.
  • A coroutine. Once the coroutine exits (raises StopIteration), the yield result will be the value of result, or True if result is None. Exceptions raised by the coroutine will propagate through the yield as though the yield was a function call.
  • A generator. This is identical to yielding a coroutine - the generator runs sequentially through its work and blocks at each yield.
  • The result of queue() - yield will return once the thing queued is finished.
  • The result of mqueue() - yield will return once the generator is exhausted and all the coroutines it spawned are finished.
  • None - yield immediately returns. This is useful for calling something that might be a coroutine or a normal function that returns None.

class rdma.sched.Context(op, gengen, parent=None)

Bases: object

class rdma.sched.MADSchedule(umad)

Bases: rdma.madtransactor.MADTransactor

This class provides a MADTransactor interface suitable for use by Python coroutines. The implementation gets MAD parallelism by running multiple coroutines at once. Coroutines are implemented as generators.

umad is a rdma.umad.UMAD instance which will be used to issue the MADs.

class Work

Bases: tuple

Work(buf, fmt, path, newer, completer)

buf

completer

fmt

newer

path

MADSchedule.is_async
MADSchedule.max_outstanding

Maximum number of outstanding MADs at any time.

MADSchedule.mqueue(works)

works is a generator returning coroutines. All coroutines can run in parallel.

Returns: An opaque context reference.

MADSchedule.queue(work)

work is a single coroutine or a tuple of coroutines.

Returns: An opaque context reference.

MADSchedule.result

Set to return a result from a coroutine.

MADSchedule.run(queue=None, mqueue=None)

Schedule MADs. Exits once all the work has been completed. The queue and mqueue arguments are passed straight to the queue() and mqueue() methods.

5.5. rdma.satransactor Automatic SubnGet to SubnAdmGet Conversion

IBA provides two ways to get information about objects managed by an SMA: the first is a SubnGet SMP RPC to the end port, the second is a SubnAdmGet GMP RPC to the SA. These should return the same information and are generally interchangeable.

This class provides an easy way for tools to access the information either using SubnGet or using SubnAdmGet without really affecting the source code. The SubnGet is transparently recoded into a SubnAdmGet with the proper query components set from the path and attribute ID and proper unwrapping of the SA reply.

The class can wrap both synchronous and asynchronous MADTransactor instances. When wrapping a synchronous instance the class can also automatically resolve a DR path to a LID for use with the SA.

I highly recommend that all tools with the capability to perform SubnGet provide an option to use this class to rely on the SA. IBA defines operation modes that would deny all SubnGet operations without a valid MKey.

Example:

end_port = rdma.get_end_port();
path = rdma.path.IBDRPath(end_port);
with rdma.satransactor.SATransactor(rdma.get_gmp_mad(end_port)) as umad:
    pinf = umad.SubnGet(IBA.SMPPortInfo,path);
    pinf.printer(sys.stdout);

class rdma.satransactor.SATransactor(parent)

Bases: rdma.madtransactor.MADTransactor

This class wraps another MADTransactor and transparently changes SMP queries into corresponding SA queries. It is useful for writing applications that need to support both methods.

There are some limitations due to how the SA interface is defined. SMPPortInfo requires the port number, which is often 0. This requires extra work unless the node type is known since the SA does not support the same port 0 semantics. Generally using ninf.localPortNum as the attributeModifier works around this.

When using the async interface it is not possible to use an IBDRPath since that requires multiple MADs to resolve the DR path to a LID through the SA.

The class will collect and cache information in the path to try and work around some of these issues.

It is also a context manager that wraps the parent's close().

parent is the MADTransactor we are wrapping.
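The wrapping pattern can be sketched in a few lines. This is a toy under stated assumptions, not the SATransactor code: ToySA, ToySAWrapper and ToyPortInfo are hypothetical, the "SA record" is a plain dictionary, and the real class builds proper SA query components from the path and attribute ID.

```python
class ToyPortInfo(object):
    """Hypothetical payload class used only for this example."""


class ToySA(object):
    """Hypothetical parent transactor that only answers SA-style queries."""

    def __init__(self):
        self.closed = False

    def SubnAdmGet(self, payload, path=None, attributeModifier=0):
        # Pretend the SA returned a record wrapping the requested attribute.
        return {"path": path, "port": attributeModifier,
                "attribute": payload()}

    def close(self):
        self.closed = True


class ToySAWrapper(object):
    """Present SubnGet() to the caller while querying the parent's SA API."""

    def __init__(self, parent):
        self._parent = parent

    def SubnGet(self, payload, path, attributeModifier=0):
        record = self._parent.SubnAdmGet(payload, path, attributeModifier)
        return record["attribute"]      # unwrap the SA reply

    def close(self):
        return self._parent.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()                    # closing the wrapper closes the parent
        return False


parent = ToySA()
with ToySAWrapper(parent) as mad:
    pinf = mad.SubnGet(ToyPortInfo, path="toy-path", attributeModifier=1)
```

Calling code keeps using SubnGet() either way, so switching between direct SMP access and the SA becomes a matter of which object is constructed.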

SubnGet(payload, path, attributeModifier=0)
close()
get_path_lid(path)

Resolve path to a LID. This only does something if path is a directed route.

is_async
prepare_path_lid(path)

Coroutine to resolve path to a LID. This only does something if path is a directed route. This must be performed when using directed route paths with asynchronous MAD transactors.

result