Architectural Patterns for Parallel Programming

Manager-Workers

The Manager-Workers pattern is a variant of the Master-Slave pattern [POSA96] for parallel systems, considering an activity parallelism approach where the same operations are performed on ordered data. The variation is based on the fact that components of this pattern are proactive rather than reactive [CT92]. Each processing component simultaneously performs the same operations, independent of the processing activity of other components. An important feature is to preserve the order of data.

Examples
A Non-programming Example

Suppose a simple scheme for a repair centre, involving a manager and a group of technicians. The manager is responsible for receiving articles, and assigning an article to be repaired by a technician. All technicians have similar skills for repairing articles, and each one is responsible to repair one article at a time, independent of the other technicians. When a technician finishes repairing his assignment, he notifies the manager; the manager then assigns him a new article to be repaired, and so on. In general, repairing articles represents an irregular problem: some articles may present a simple fix and take a little amount of time, while others may require a more complex repair. Also, the effectiveness of this scheme relies on the fact that the number of articles that arrive to the centre can be substantially larger than the number of technicians available.

A Programming Example

Consider a real-time ultrasonic imaging system [GSVOM97], designed to acquire, process and display a tomographic image. Data is acquired based on the reflection of an ultrasonic signal that excites an array of 56 ceramic sensors. Data is amplified and digitalised to form a black and white image of 56´ 256 pixels, each one represented by a byte. An interpolation program is required to process the image, enlarging it to make it clearer to the observer. The image is displayed on a standard resolution monitor (640´ 480 pixels) in real-time, this is, at least 25 frames per second. In accordance with these requirements, an interpolation by a factor 3 between columns was chosen, enlarging the information of each image three times. A calculation shows the volume of data to be processes per second: each frame is represented as 168´ 256´ 1 bytes, and using 25 frames per second, makes a total of 1.075200 Mbytes per second. Using a manager-worker system for the cubic interpolation, the image is received a stream of pixels by the manager, which assigns to each worker a couple of pixels. Each worker uses each couple of pixels as input data, and calculates the cubic interpolation between them, producing other four interpolated pixels. As the number of workers is less than the total number of pixels, each worker requests for more work to the manager as soon as it finishes its process, and so on.

Problem
A computation is required where independent computations are performed, perhaps repeatedly, on all elements of some ordered data. Each computation can be performed completely, independent of the activity of other elements. Data is distributed among components without a specific order. However, an important feature is to preserve the order of data. Consider, for example, an imaging problem, where the image can be divided into smaller sub-images, and the computation on each sub-image does not require the exchange of messages between components. This can be carried out until completion, and then the partial results gathered. The overall affect is to apply the computation to the whole image. If this computation is carried out serially, it should be executed as a sequence of serial jobs, applying the same computation to each sub-image one after another. Generally, performance as execution time is the feature of interest.

Forces

Using the previous problem description and other elements of parallel design, such as granularity and load balance [Fos94, CT92], the following forces are found:

Preserve the order of data. However, the specific order of data distribution and operation among processing elements is not important

The same computation can be performed independently and simultaneously on different pieces of data.

Data pieces may exhibit different sizes.

Changes in the number of processing elements should be reflected by the execution time.

Improvement in performance is achieved when execution time decreases.

Solution

Parallelism is presented by having multiple data sets processed at the same time. The most flexible representation of this is the Manager-Workers pattern approach. This structure is composed of a manager component and a group of identical worker components. Each worker is capable of performing the same computation. It repeatedly seeks a task to perform, performs it and repeats; when no tasks remain, the program is finished. The execution model is the same, independent of the number of workers (at least one). If tasks are distributed at run time, the structure is naturally load balanced: while a worker is busy with a heavy task, another may perform several shorter tasks. To preserve data integrity, the manager program takes care of what part of the data has been operated on, and what remains to be computed by the workers [POSA96, CG88, Shaw95, Pan96, CT92].

Structure

The Manager-Workers pattern is represented as a manager, preserving the order of data and controlling a group of processing elements or workers. Conceptually, workers have access to different pieces of data at the same time. Usually, only one manager and several identical worker components simultaneously exist and process during the execution time (Figure 9).

Figure 9. Manager-Workers pattern.

Participants

Manager. The responsibilities of a manager are to create a number of workers, to partition work among them, to start up their execution, and to compute the overall result from the sub-results from the workers.

Worker. The responsibility of a worker is to seek for a task, to implement the computation in the form of a set of operations required by the manager, and to perform the computation.

Dynamics

A typical scenario to describe the run-time behaviour of the Manager-Worker pattern is presented, where all participants are simultaneously active. Every worker performs the same operation on its available piece of data. As soon as it finishes processing, returns a result to the manager, requiring for more data. Communications are restricted between the manager and each worker. No communication between workers is allowed (figure 10). In this scenario, the steps to perform a set of computations is as follows:

Figure 10. Scenario for the Manager-Workers pattern.

All participants are created, and wait until a computation is required to the manager. When data is available to the manager, this divides it, sending data pieces to each waiting worker.

Each worker receives the data and starts processing an operation Op on it. This operation is independent to the operations on other workers. When the worker finishes processing, it returns a result to the manager, requiring for more data. If there is still data to be operated, the process repeats.

The manager is usually replying to requests of data from the workers or receiving their partial results. Once all data pieces have been processed, the manager assembles a total result with the partial results and the program finishes. The non-serviced requests of data from the workers are ignored.

Implementation

The implementation process is based on the four stages mentioned above in the Context in general and Implementation in general sections.

Partitioning. The ordered data to be operated on by the common computation is decomposed into a set of data pieces. This partitioning criteria of the ordered data is a clear opportunity for parallel execution, and it is used to define the partitioning and gathering activity of the manager component. On the other hand, the same computation to be performed on different data pieces is used to define the structure of each one of the worker components. Sometimes, the manager is also implemented to perform the computation on data pieces as well. Usually, the structure of the manager component can be reused if it is designed to deal with different data types and sizes, delimiting its behaviour to divide, deliver and gather data pieces to the worker components. It is possible to implement a manager component using a single element approach (for instance, a process, a task, a function, an object, etc.), or to define a sub-system of elements that perform manager activities. Usually, concurrency the sub-system elements can be used, defining different interfaces for different actions. Design patterns [GHJV95, POSA96, PLoP94, PLoP95] can help to define and implement such interfaces. On the other hand, worker components can be also defined either by single elements or by sub-systems of elements. Patterns that particularly can help with the design and implementation of the manager and worker components are the Active Object pattern [LS95] and the Service Configurator pattern [JS96]. In the case of the worker components, other design patterns that may provide information about their implementation are the "Ubiquitous Agent" pattern [JP96] and the Object group pattern [Maf96].

Communication. The communication structure that coordinates the execution between the manager and worker should be defined. As workers are just allowed to communicate with the manager to get more work, defining an appropriate communication structure between manager and worker components is a key task. The communication structure should allow the interactions between the manager and each worker (request a data piece and, once processed, its delivery to the manager). Important parameters to consider are the size and format of data, the interface to service a request of data, and the synchronisation criteria. In general, a synchronous coordination is commonly used in Manager-Worker pattern systems. The implementation of communication structures depends on the programming language used. In general, if the language contains basic communication and synchronisation instructions, communication structures can be implemented relatively easily, following the single element approach. However, if it is possible to reuse the design in more than one application, it would be convenient to consider a more flexible approach using configurable communication sub-systems for the exchange of different types and sizes of data. Design patterns can help to support to the implementation of these structures; especially, consider the Composite Messages pattern [SC95], the Service Configurator pattern [JS96], and the Visibility and Communication between Control Modules and Client/Server/Service patterns [AEM95, ABM96].

Agglomeration. The data division and communication structure defined previously are evaluated with respect to the performance requirements. If necessary, the size of data pieces is changed, modifying the granularity of the system. Data pieces are combined or divided into larger or smaller pieces to improve performance or to reduce communication costs. Due to inherent characteristics of this pattern, the process is automatically balanced among the worker components, but granularity is modified in order to balance the process between the manager and the workers. If the operations performed by the workers are simple enough and workers receive relatively small amount of data, workers may remain idle while the manager is busy trying to serve their requests. On the contrary, if worker operations are too complex, the manager will have to keep a buffer of pending data to be processed. It is noticeable that load-balance between manager and workers can be achieved simply by modifying the granularity of data division.

Mapping. In the best case, the hardware allows that each component is assigned to a processor with enough communication links to perform efficiently. However, generally the number of components is defined a lot bigger than the number of available processors. In this case, it is common to place a similar number of worker components on each processor. To keep the structure the most balanced possible, the manager component can be executed on a dedicated processor, or at least on a processor with a reduced number of working components. The competing forces of maximising processor utilisation and minimising communication costs are almost totally fulfilled by this pattern. Mapping can be specified statically or determined at run-time, allowing a better load-balance. As a "rule of thumb", parallel systems based on the Manager-Workers pattern will perform reasonably well on a MIMD (multiple-instruction, multiple-data) computer, and it may be difficult to adapt to a SIMD (single-instruction, multiple-data) computer [Pan96].

Consequences

Benefits

The order and integrity of data is preserved and granted due to the defined behaviour of the manager component. The manager takes care of what part of the data has been operated on, and what remains to be computed by the workers

An important characteristic of the Manager-Workers pattern is due to the independent nature of operations that each worker performs. Each worker requests for a different piece of data during execution, which makes the structure to present a natural load balance [POSA96, CT92].

As every worker component performs the same computation, granularity can be modified easily, due to it depends only on the size of the pieces in which the manager divides the data. Furthermore, it is possible to exchange worker components or add new ones without significant changes to the manager, if an abstract description of the worker is provided [POSA96].

Synchronisation is simply achieved because communications are restricted to only between manager and each worker. The manager is the component in which the synchronisation is stated.

Using the Manager-Worker pattern, the parallelising task is relatively straightforward, and it is possible to achieve a respectable performance if the application fits this pattern. If designed carefully, the Manager-Worker pattern enables the performance to be increased without significant changes [POSA96, Pan96].

Liabilities

The Manager-Workers pattern presents execution problems if the number workers is large, the operations performed by the workers are too simple, or workers receive small amounts of data. In all cases, workers may remain idle while the manager is busy trying to serve all their requests. Granularity should be modified in order to balance the amount of data among the workers.

Manager-Worker architectures are not always feasible when performance is a critical issue. If the manager activities - data partition, receive worker requests, send data, receive partial results, and computing the final result - may take a longer time compared with the processing time of the workers, it can be considered that the overall performance depends mostly on the manager performance. A poor performance of the manager impacts heavily on the performance of the whole system [POSA96, CT92].

Many different parameters must be carefully considered, such as strategies for work subdivision, manager and worker collaboration, and computation of final result. Also, it is necessary to provide error-handling strategies for failure of worker execution, failure of communication between the manager and workers, or failure to start-up parallel workers [POSA96].

Known uses

Connectivity and Bridge Algorithms are an application of the Manager-Workers pattern. The problem is to determine if a connected graph has a bridge. A bridge is an edge whose removal disconnects the graph. A simple algorithm attempts to verify if an edge is a bridge by removing it and testing the connectivity of the graph. However, the required computation is very dense if the number of edges in the graph is large. A parallel version using a Manager-Worker pattern approach is described as follows: each worker, using the algorithm proposed, is responsible for verifying if an edge is a bridge. Different workers check for different edges. The manager distributes the graph information to the workers, builds the final solution, and produces results [NHST94].

In matrix multiplication, the matrixes are distributed among the workers by the manager. Each worker calculates products and returns the result to the manager. Finally, with all the results available, the manager can build the final result matrix [POSA96, Fos94].

In image processing, the Manager-Worker pattern is used for transformations on an image that involve an operation on each piece of the image independently. For example, in computing discrete cosine transform (DCT), the manager divides the image in sub-images, and distributes them among the workers. Each separate worker obtains the DCT of its sub-images or pixel block and returns the result to the manager. The final image is then composed by the manager, using all the partial results provided by the workers [POSA96, Fos94].

Related patterns

The Manager-Workers pattern can be considered as a variant of the Master-Slave pattern [POSA96, PLoP94] for parallel systems. Other related patterns with similar approaches are the Object Group pattern [Maf96], and the Client/Server/Service pattern [PLoP94].

Contact Information

Jorge Luis Ortega Arjona.

E-mail jortega-arjona@acm.org