Communicating Sequential Elements
The Communicating Sequential Elements pattern is a domain parallelism pattern in which each component performs the same operations on a different piece of ordered data [Fos94, CT92]. Operations in each component depend on partial results from neighbouring components. Usually, the pattern is presented as a logical structure, conceived from the ordering of the data.
Examples
A Non-programming Example
Consider the case of honey bees. The workers act as constructors, harvesters, soldiers, and nursemaids. Each individual is capable of eventually performing any of these activities in cooperation with the others during its short lifetime. Minute-to-minute control of the hive's behaviour is dispersed. Changes, like the many shifts in the foraging patterns of honey bees, come from signals among the workers, whose success or failure at various tasks can raise hive-permeating hormone levels that then bring about changes in the whole hive.
A Programming Example
Consider the case of data mining problems, in which government and business organisations acquire and analyse data about customers, products, taxes, etc., devoting a lot of computational effort to automatically extracting useful information or "knowledge" from these data. A particular type of data mining is mining for associations [CSG97, AS96], in which it is important to discover relations or associations among the information in order to generate rules for inferring behaviour. For example, given a database in which records correspond to customer purchase transactions, the goal in mining for associations is to determine which sets of items occur together in more than a given threshold fraction of the transactions.
A proposed solution uses an algorithm that makes several passes over the database. The first pass simply counts item occurrences to determine the frequent itemsets of size 1. A subsequent pass, say pass k, consists of two parts or phases: first, the frequent itemsets from pass k-1 are used to generate a set of candidates, say Ck, using a certain generation procedure; then the database is scanned and the support of each candidate in Ck is counted. Fast counting is required to efficiently determine which candidates in Ck are contained in a given transaction. Data distribution is used for this: each processing element counts mutually exclusive local candidates. With a large number of elements, a large number of candidates can be counted in a pass. However, as part of the procedure, every element must broadcast its local data to all other elements at the end of every pass. Finally, after k passes, the results are available for analysis [CSG97, AS96]. A sketch of one such counting pass follows.
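The following minimal Go sketch is illustrative only, not taken from [CSG97, AS96]. It assumes single-item candidates (as in the first pass); the names countPass and contains and the sample transactions are invented for the example. Since the sketch runs in shared memory, the end-of-pass broadcast is stood in for by a merge of the local counts under a mutex.

    package main

    import (
        "fmt"
        "sync"
    )

    // countPass gives each of n elements a mutually exclusive slice of the
    // candidate set, counts support locally over the whole database, and
    // merges the local counts at the end of the pass.
    func countPass(candidates []string, transactions [][]string, n int) map[string]int {
        counts := make(map[string]int, len(candidates))
        var mu sync.Mutex
        var wg sync.WaitGroup
        for e := 0; e < n; e++ {
            wg.Add(1)
            go func(e int) {
                defer wg.Done()
                local := make(map[string]int)
                // Element e owns candidates e, e+n, e+2n, ... (mutually exclusive).
                for i := e; i < len(candidates); i += n {
                    for _, t := range transactions {
                        if contains(t, candidates[i]) {
                            local[candidates[i]]++
                        }
                    }
                }
                mu.Lock() // stands in for the end-of-pass broadcast
                for k, v := range local {
                    counts[k] += v
                }
                mu.Unlock()
            }(e)
        }
        wg.Wait()
        return counts
    }

    // contains reports whether transaction t includes the given item.
    func contains(t []string, item string) bool {
        for _, x := range t {
            if x == item {
                return true
            }
        }
        return false
    }

    func main() {
        tx := [][]string{{"bread", "milk"}, {"bread", "beer"}, {"milk", "beer"}, {"bread", "milk", "beer"}}
        fmt.Println(countPass([]string{"bread", "milk", "beer"}, tx, 2))
    }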
Problem
A computation is required that can be performed as a set of quasi-independent operations on ordered data. Results cannot be constrained to a one-way flow among processing stages: each component executes a relatively autonomous computation, and communications between components follow fixed and predictable paths. Consider, for example, a dynamics problem simulation: the data represents a model of a real system, where any change or modification in one region influences the areas above and below it and, perhaps to a different extent, those on either side. Over time, the effects propagate to other areas, extending in all directions; even the source area may experience reverberations or other changes from neighbouring regions. If this simulation were executed serially, it would require that computations be performed across all the data to obtain some intermediate state before a new iteration could begin. Generally, performance as execution time is the feature of interest.
Forces
Considering the problem description, together with granularity and load balance as other elements of parallel design [Fos94, CT92], the following forces should be considered:
- Preserve the precise order of data distributed among processing elements.
- Computations are performed autonomously, on local pieces of data.
- Every element performs the same operations, in number and complexity.
- Partial results are usually communicated among neighbouring processing elements.
- Improvement in performance is achieved when execution time decreases.
Solution
Parallelism is introduced as multiple participating concurrent components, each one applying the same set of operations on a data subset. Components communicate partial results by exchanging data, usually through communication channels. No data objects are directly shared among components; each one may access only its own private data subset. A component communicates by sending data objects from its local space to another. This communication may have different variants: synchronous or asynchronous; exchange of a single data object or a stream of data objects; and one-to-one, one-to-many, many-to-one, or many-to-many communication. Often the data of the problem can be conceived in terms of an ordered logical structure. The solution is presented as a network that may reflect this logical structure in a transparent and natural form [CG88, Shaw95, Pan96]. A minimal sketch of this exchange is given below.
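The following Go fragment is illustrative; Go channels are used simply because they directly model communication channels, and the function element is an invented example. Two elements operate on private data subsets and exchange partial results only through channels; the channels are buffered with capacity one so that both elements can send before they receive.

    package main

    import "fmt"

    // element sums its private data subset, sends the partial result to its
    // neighbour, and combines it with the partial result it receives.
    func element(local []int, out chan<- int, in <-chan int) int {
        sum := 0
        for _, v := range local {
            sum += v // operate on the private data subset only
        }
        out <- sum        // send the partial result over the channel
        return sum + <-in // receive the neighbour's partial result
    }

    func main() {
        a2b, b2a := make(chan int, 1), make(chan int, 1)
        done := make(chan int)
        go func() { done <- element([]int{1, 2, 3}, a2b, b2a) }()
        go func() { done <- element([]int{4, 5, 6}, b2a, a2b) }()
        fmt.Println(<-done, <-done) // both elements compute the same total
    }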
Structure
In this pattern, the same operation is applied almost simultaneously to different pieces of data. However, operations in each element depend on the partial results of operations in other components. The structure of the solution involves an ordered logical structure, conceived from the data structure of the problem. Therefore, the solution is presented as a network of elements that, in general, follows the shape imposed by this structure. Identical components exist and process simultaneously throughout the execution time (Figure 7).
Figure 7. Communicating Sequential Elements.
Participants
- Sequential element. The responsibilities of a processing element are to perform a set of operations on its local data, and to provide a general interface for sending and receiving messages.
- Communication channels. The responsibilities of a communication channel are to represent a medium to send and receive data between concurrent elements, and to synchronise communication activity between them.
Figure 8. Scenario of Communicating Sequential Elements.
Dynamics
A typical scenario to describe the basic run-time behaviour of this pattern is presented, in which all the Sequential Elements are active at the same time. Every Sequential Element performs the same operations, as a piece of a processing network. In the simplest case, each one communicates only with its previous and next neighbours (Figure 8). The processing and communicating scenario is as follows:
- A computation is started when all components (Element N-1, Element N, Element N+1, etc.) simultaneously perform operation Op.1.
- To continue the computation, all components communicate their partial results through the available communication channels (here, Channel A and Channel B). Then all components synchronise and receive the partial results from their previous and next neighbours.
- Once synchronisation and communications are finished, each component continues with the next operation (in this case Op.2). The process repeats until each component has finished its computations. A sketch of this lockstep scheme is given after this list.
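In the following illustrative Go fragment (names such as toNext and toPrev are invented, and the averaging step merely stands in for Op.2), every element applies Op.1 to its local value, exchanges partial results with its previous and next neighbours over the channels, and then applies Op.2, repeating the cycle for a fixed number of steps.

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        const n = 4 // number of sequential elements
        // One channel per adjacent pair and direction, buffered so that
        // both sides can send before either receives.
        toNext := make([]chan float64, n-1) // element i -> element i+1
        toPrev := make([]chan float64, n-1) // element i+1 -> element i
        for i := range toNext {
            toNext[i] = make(chan float64, 1)
            toPrev[i] = make(chan float64, 1)
        }
        var wg sync.WaitGroup
        for i := 0; i < n; i++ {
            wg.Add(1)
            go func(i int) {
                defer wg.Done()
                x := float64(i) // stand-in for the local data piece
                for step := 0; step < 3; step++ {
                    x = x * 2 // Op.1: autonomous local computation
                    left, right := x, x
                    if i > 0 { // exchange with the previous neighbour
                        toPrev[i-1] <- x
                        left = <-toNext[i-1]
                    }
                    if i < n-1 { // exchange with the next neighbour
                        toNext[i] <- x
                        right = <-toPrev[i]
                    }
                    x = (left + x + right) / 3 // Op.2: uses the partial results
                }
                fmt.Printf("element %d: %.3f\n", i, x)
            }(i)
        }
        wg.Wait()
    }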
Implementation
The implementation process is based on the four stages mentioned above in the Context in general and Implementation in general sections.
- Partitioning. The ordered logical structure of the data is a natural candidate for initial decomposition into a network of data sub-structures or pieces. Depending on the precision required, data pieces may have different sizes and shapes; however, in order to maintain load balance, they normally have the same size and shape. To expose the maximum concurrency, the basic sequential element is defined to process a unique sequence of operations on only one data piece. Hence, the total number of sequential elements equals the number of data pieces, and each computation presents the same complexity per time step. Since each sequential element performs the same operations on a different data piece, all elements share the same processing nature and structure. Each can be represented by a single processing element (for instance, a process, task, function, or object) or by a subsystem of processing elements, which may be designed using design patterns [GHJV95, POSA96, PLoP94, PLoP95]. Some design patterns that can particularly be considered for implementing concurrent components are the Active Object pattern [LS95] and the "Ubiquitous Agent" pattern [JP96]. Partitioning is illustrated, together with agglomeration and mapping, in the sketch after this list.
- Communication. The communication is also associated with the network of data pieces. Each sequential element is expected to exchange partial results with its neighbours from time to time through channels. Channels should perform the data exchange and coordinate the execution of operations appropriately. Efficient communication depends on the amount and format of the data to be exchanged and on the synchronisation scheme used. Both synchronous and asynchronous schemes can be found in several parallel systems, but a synchronous scheme is commonly preferred. An important issue to consider here is how communication channels are defined. In general, this decision is linked to the programming language used. Some languages precisely define a channel type on which values can be sent and received; any sequential element can write to the channel and read from it, and no further implementation is necessary. Other languages do not define a channel type, so one must be designed and implemented in a way that allows data exchange between elements. As the use of channels depends on the language, decisions about their implementation are deferred to later design stages. From an architectural point of view, channels are defined regardless of whether they are implicit in the language or need to be explicitly created. Design patterns that can help with the implementation of channel structures are the Composite Messages pattern [SC95] and the Service Configurator pattern [JS96].
- Agglomeration. The sequential elements and channels defined are evaluated with respect to the expected performance. Often, the number of processing and communicating elements is larger than required, and some degree of agglomeration can be considered. Causes of agglomeration include redundant communications and the amount of communication in a given dimension or direction. As each sequential element performs the same operations, changes in granularity involve only the number and size of the data pieces processed per component. If the components maintain the same granularity, the structure is normally load balanced. The conjecture-test approach is then used only to modify the granularity to achieve better performance.
- Mapping. In the most optimistic case, each sequential element is assigned to a processor. Usually, however, the number of processors is several times smaller than the number of processing elements. Taking this ratio as the number of elements per processor, it is possible to assign them in a more realistic form. The important feature for maximising processor utilisation and minimising communication costs is balance: in these structures, computational efficiency is decreased by load imbalances, so if the design is to be used extensively, it is worthwhile to improve its load balance. Approaches for doing so include cyclic mapping and dynamic mapping (cyclic mapping appears in the sketch after this list). As a "rule of thumb", systems based on the Communicating Sequential Elements pattern will perform best on a SIMD (single-instruction, multiple-data) computer if array operations are available. However, if the computations are relatively independent, respectable performance can be achieved using a shared-memory system [Pan96].
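The following Go fragment is an illustrative sketch, under assumed constants, of three of the stages above: the data is partitioned into equal-size pieces, the pieces are agglomerated into one block of several pieces per element, and the elements are mapped cyclically onto a fixed number of workers that stand in for processors. The constants pieces, perElement, and p are arbitrary choices for the example, and the summation is a trivial stand-in for the element's real operations.

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        data := make([]int, 24) // the ordered data of the problem
        for i := range data {
            data[i] = i
        }
        const pieces, perElement, p = 12, 3, 2 // granularity and processor count
        elements := pieces / perElement        // 4 elements after agglomeration
        size := len(data) / pieces             // size of one data piece

        var wg sync.WaitGroup
        for w := 0; w < p; w++ { // one worker stands in for each processor
            wg.Add(1)
            go func(w int) {
                defer wg.Done()
                for e := w; e < elements; e += p { // cyclic mapping of elements
                    lo := e * perElement * size // the element's agglomerated block
                    hi := lo + perElement*size
                    sum := 0
                    for _, v := range data[lo:hi] {
                        sum += v
                    }
                    fmt.Printf("worker %d, element %d: sum=%d\n", w, e, sum)
                }
            }(w)
        }
        wg.Wait()
    }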
Consequences
Benefits
- Data order and integrity are guaranteed, since each sequential element accesses only its own local data subset and no data is directly shared among components [SG96, ST96].
- As all sequential elements share the same functional structure, their behaviour can be modified without great effort [SG96, ST96].
- It is relatively easy to structure the solution in a transparent and natural form as a network of elements, reflecting the logical structure of data in the problem [CG88, Shaw95, Pan96].
- As all components perform the same computation, granularity is independent of functionality, depending only on the size and number of the pieces into which the data is divided. It is easy to change if better resolution or precision is required.
- This pattern can be used on most hardware systems, with the synchronisation between elements as the only restriction [Pan96].
Liabilities
- The performance of systems based on communicating elements is heavily impacted by the communication strategy (global or local) used. Usually, there are not enough processors available to support all the elements, so each processor operates on a subset of the data. Because of this, dependencies between data, expressed as communications, can slow down program execution [Fos94, Pan96].
- In the use of this pattern, load balance is still a difficult problem. Often, data is not easily divided into subsets of the same size, and the computational intensity varies across processors. Maintaining synchronisation means that fast processors must wait until the slow ones catch up before the computation can proceed to the next set of operations. An inadequate load balance impacts performance strongly. The decision to use this pattern should be based on how uniform, in almost every aspect, the system can be [Pan96].
- Whether the application is synchronous or asynchronous determines the efficiency that can be obtained. If the application is synchronous, a significant amount of effort is required to obtain even a minimal increase in performance. If the application is asynchronous, it is more difficult to parallelise, and the effort will probably not be worthwhile unless communications between processors are very infrequent [Pan96].
Known uses
- The one-dimensional wave equation, used to numerically model the motion of vibrating systems, is an example of the Communicating Sequential Elements pattern. The vibrating system is divided into sections, and each processing element is responsible for computing the position of one section at any moment. Each computation depends only on partial results of the computations at neighbouring sections. Thus, each computation can be done independently, except when data is required from the previous or next section [NHST94]. A worked sketch of this example is given after this list.
- Simulation of dynamic systems, such as an atmosphere model, is another use of Communicating Sequential Elements. The model is usually divided into a rectangular grid of blocks. The simulation proceeds in a series of time steps, where each processing element computes and updates the temporal state of a block using the data of the previous state and updates of the state of neighbouring blocks. Integrating over the time steps and the blocks makes it possible to determine the state of the dynamic system at some future time, based on an initial state [Fos94].
- Image processing problems, such as the component labelling problem. An image is given as a matrix of pixels, and each pixel must be labelled according to a certain property - for instance, connectivity. The image is divided into sub-images and mapped onto a network of processing elements. Each processing element tests for connectivity and labels all the non-edge pixels of its sub-image. Edge pixels between sub-images are labelled in cooperation by the two respective processing elements [Fos94].
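As a worked sketch of the wave-equation known use above, the following Go program (illustrative, not taken from [NHST94]; the stability constant, section sizes, and initial pulse are arbitrary) divides a vibrating string into sections, one per sequential element, and exchanges boundary displacements with both neighbours before every leapfrog finite-difference time step.

    package main

    import (
        "fmt"
        "sync"
    )

    const c = 0.25 // (wave speed * dt / dx)^2, chosen arbitrarily for stability

    func main() {
        const elems, secLen, steps = 4, 8, 10
        // Buffered neighbour channels, one per adjacent pair and direction.
        toNext := make([]chan float64, elems-1)
        toPrev := make([]chan float64, elems-1)
        for i := range toNext {
            toNext[i] = make(chan float64, 1)
            toPrev[i] = make(chan float64, 1)
        }
        var wg sync.WaitGroup
        for e := 0; e < elems; e++ {
            wg.Add(1)
            go func(e int) {
                defer wg.Done()
                u := make([]float64, secLen)   // current displacement of the section
                old := make([]float64, secLen) // displacement at the previous step
                if e == elems/2 {
                    u[0] = 1 // an arbitrary initial pulse on the string
                }
                copy(old, u) // zero initial velocity
                for s := 0; s < steps; s++ {
                    left, right := 0.0, 0.0 // fixed ends of the string
                    if e > 0 { // exchange boundary values with the previous section
                        toPrev[e-1] <- u[0]
                        left = <-toNext[e-1]
                    }
                    if e < elems-1 { // and with the next section
                        toNext[e] <- u[secLen-1]
                        right = <-toPrev[e]
                    }
                    next := make([]float64, secLen)
                    for i := 0; i < secLen; i++ {
                        l, r := left, right
                        if i > 0 {
                            l = u[i-1]
                        }
                        if i < secLen-1 {
                            r = u[i+1]
                        }
                        // Leapfrog update of the 1D wave equation.
                        next[i] = 2*u[i] - old[i] + c*(l-2*u[i]+r)
                    }
                    old, u = u, next
                }
                fmt.Printf("element %d: u[0] = %+.4f\n", e, u[0])
            }(e)
        }
        wg.Wait()
    }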
Related patterns
The Communicating Sequential Elements pattern is based on the original concept of Communicating Sequential Processes (CSP) [Hoare84]. Patterns that can be considered related to this processing approach are the Ubiquitous Agent Design Pattern [JP96] and the Visibility and Communication between Agents pattern [ABM96].
Contact Information
Jorge Luis Ortega Arjona.
E-mail: jortega-arjona@acm.org