Benha Faculty of Engineering 5<sup>th</sup> Year (Comm. & Comp.) Examiner: Dr. Hatem ZAKARIA



# Electrical Engineering Department Final Exam: 15 January 2015 Time allowed: 3 Hours

# **Distributed Processing (E521)**

**Model Answer** 

## Question (1)

(12 Marks)

### a. Compare between (hint: explanations using block diagrams is desired):

1. A parallel processing machine and Von Neumann model.

| Aspect                         | Von Neumann Model                                                              | Parallel Processing Machine                                                                                                                                                                                                                   |
|--------------------------------|--------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Architecture                   | Instruction Stream<br>I/O Control<br>Unit<br>Instruction Stream<br>Data Stream | Instruction Data Instruction Data<br>Stream <sub>n</sub> Stream <sub>1</sub> Stream <sub>1</sub> Stream <sub>0</sub> Stream <sub>0</sub><br>Uniprocessor<br>Data<br>Out <sub>n</sub> Data<br>Out <sub>n</sub> Data<br>Interconnection Network |
| Ease of programming            | Simple Sequential fashion.                                                     | Need to parallelize the program: may be harder.                                                                                                                                                                                               |
| Need for synchronization       | No need. It is only 1 machine.                                                 | Important aspect, since processors will be sharing the memory                                                                                                                                                                                 |
| Performance<br>evaluation      | For large parallelizable problems, uniprocessor systems are expensive          | For sequential problems (by nature), parallelizing is expensive                                                                                                                                                                               |
| Run time system<br>limitations | Limited factors affect this aspect (speed of execution, memory access time).   | Additional factors are involved<br>(processors' synchronization,<br>communication time, delays on the<br>interconnection network, etc.)                                                                                                       |

### 2. Shared memory and Message passing paradigm.

| Aspect                        | Shared Memory                                                                                         | Message Passing                                                  |
|-------------------------------|-------------------------------------------------------------------------------------------------------|------------------------------------------------------------------|
| Architecture<br>Block Diagram | M M M M<br>Interconnection Network<br>P P P O O O P                                                   | P P P O O O P<br>M M M M                                         |
| Advantageous                  | <ul> <li>User-friendly programming</li> <li>Fast communication</li> </ul>                             | - Flexible<br>- Deadlocks avoided                                |
| Disadvantageous               | <ul> <li>Access control and<br/>synchronization overhead</li> <li>Possibility of deadlocks</li> </ul> | Costly in terms of<br>interconnection networks<br>infrastructure |

3. Fine-grain and Coarse-grain parallelism of distributed programs.

### Fine-grain Parallelism

- Small amounts of computational work are done between communication events → low computation to communication ratio
- Facilitates load balancing
- Implies high communication overhead and less opportunity for performance enhancement

#### **Coarse-grain Parallelism**

- Relatively large amounts of computational work are done between communication/synchronization events
- High computation to communication ratio
- Implies more opportunity for performance increase
- Harder to load balance efficiently



 A SIMD system having 4 processing elements connected in nearest neighbor fashion. Consider that each processor has only its local memory

 $T_{add} = 2ns \text{ and } T_{com} = 1ns$ No. of processors = 4 Total No. of add operations/clock cycle = |16/4| + |8/4| + |4/4| + |2/4| + |1/4| = 4 + 2 + 1 + 1 + 1 = 9Execution time = 9\*3 = 27 ns

- A MIMD system having 4 processing elements accessing a shared memory through an interconnection network.

No. of processors = 4 Total No. of add operations = 16 + 8 + 4 + 2 + 1 = 31Execution time with (pipelining) = 18 + 10 + 6 + 4 + 3 = 41 ns

### Question (2)

- a. State if the following statements are (✓) or (×) and justify your answer '*Note: Negation is not the answer*':
  - 1. Crossbar networks are blocking networks and it allows the system to continue functioning even if it contains some faulty elements. (\*)

Crossbar networks are non-blocking networks (*allowing multiple input-output connection patterns to be achieved simultaneously*) and it doesn't allow the system to continue functioning with faulty elements as it is *non-fault tolerant* 



(12 Marks)

2. Synchronous communications are non-blocking communications, while asynchronous communications are blocking communications. (\*)

Synchronous communications are blocking communications (other work must wait until the communications have completed), while asynchronous communications are non-blocking communications as they allow tasks to transfer data independently from one another (i.e. other work can be done while the communications are taking place)

3. In MINs network, it is always possible to establish a connection between any arbitrary unused pair of input/output during the presence of another connection. (**\***)

MINs are blocking networks so it isn't always possible, especially if the unused pair is going to use the same switch used before.

Or in Clos network, it is always possible to establish a connection between any arbitrary unused pair of input/output during the presence of another connection

b. Draw a block diagram showing the asynchronous inter-task communication between two different clock modules using a FIFO buffer.



c. Compute the switch and link costs of interconnecting 1024 nodes using a crossbar relative to a MIN with 4 X 4 switches, assuming that switch cost grows quadratically with the number of input/output ports.

#### <u>Answer</u>

- MIN with  $4 \times 4$  switches, the cost of each switch is proportional to  $4^2$
- There are 1024/4 (log<sub>4</sub> 1024) total switches
- Log<sub>4</sub> 1024 stages of 1024 unidirectional links per stage from the switches plus 1024 links to the MIN from the end nodes.
- $cost(crossbar)_{switches} = 1024^2$ ,  $cost(crossbar)_{links} = 8192$
- relative\_cost(4 × 4)<sub>switches</sub> =  $1024^2 / (4^2 \times 1024/4 \times \log_4 1024) = 51.2$   $\rightarrow$  2 Marks
- relative\_cost(4 × 4)<sub>links</sub> = 2048 / (1024 × (log<sub>4</sub> 1024 + 1)) = 2/7 = 0.333  $\rightarrow$  2 Marks

#### **Question (3)**

a. <u>Show the routing</u> of the message in the following Omega MIN from the processing node No. 010 to the processing node No. 100. <u>What happens if a new connection</u> is needed between node No. 110 and node No. 111 during the presence of the previous connection? <u>State if MINs are</u> <u>non-blocking</u> networks or not and <u>why</u>?



Answer: The 2<sup>nd</sup> connection is blocked by switch C

MINs are blocking networks so it isn't always possible to

establish a connection, especially if the unused pair is going to use the same switch used before.

a. Compare between the Completely Connected and Ring Networks with N processors in terms of (architecture, link cost, worst delay, degree, symmetry, and diameter).

| Aspect       | Completely Connected<br>Network | Ring Network        |
|--------------|---------------------------------|---------------------|
| Architecture |                                 |                     |
| Link Cost    | N(N-1)/2                        | Ν                   |
| Worst Delay  | 1                               | $\lceil N/2 \rceil$ |
| Degree       | N-1                             | 2                   |
| Symmetry     | Yes                             | Yes                 |
| Diameter     | 1                               | $\lceil N/2 \rceil$ |

b. Consider the case of a crossbar network consisting of 4 processors and 4 memory modules. Assume that a processor generates a memory request with a probability equal 0.5 in a given cycle. Compute the bandwidth of such system.

<u>Answer</u>:  $BW = M(1-(1-(\rho/M))^n) = 4(1-(1-(0.5/4))^4) = 1.6552$ 

c. Consider a parallel architecture built using N processors each capable of sustaining 0.5 megaflop. Consider a supercomputer capable of sustaining 20 megaflops. What is the condition in terms of the fraction f of the task T is not dividable into concurrent subtasks under which the parallel architecture can exceed the performance of the supercomputer? If f = 20%, determine: the minimum number of parallel processors N to be used? Speedup factor S(n), Maximum speedup factor, and Efficiency  $\eta$ ?

**Answer:** The condition is expressed as follows:  $0.5^*f + (1-f) * 0.5 *N > 20$  (1 mark) For  $f = 20\% \rightarrow 0.5^*0.2 + (1 - 0.2)*0.5^*N > 20 \rightarrow N = 50$  processor (1 mark) S(n) = N/[1+f(N-1)] = 50 / [1 + 0.2 (50 - 1)] = 4.63 (1 mark) Maximum speedup factor = 1/f = 1/0.2 = 5 (1 mark) Efficiency  $\eta = s(n) / N = 9.25 \%$  (1 mark)

# **Question** (4)

### (12 Marks)

- a. Compare between:
  - 1. Local Area Networks (LANs) and Wide Area Networks (WANs)

| Aspect                     | LAN                                                                                                                          | WAN                                                                                                                                                                                                                  |
|----------------------------|------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Architecture               | Computer<br>Cable                                                                                                            | Router<br>Host                                                                                                                                                                                                       |
| Scaling                    | Single building or campus (up to few kilometers)                                                                             | Span large geographical area (country or continent)                                                                                                                                                                  |
| Connections                | Connect PCs and workshops in company offices<br>and factories to share resources (e.g. Printers) and<br>exchange information | <ul> <li>Contains a collection of machines "hosts"<br/>intended for running user programs</li> <li>Hosts are connected by a communication<br/>subnet</li> <li>The subnet carry messages from host to host</li> </ul> |
| Transmission<br>Technology | Broadcasting                                                                                                                 | Point to Point                                                                                                                                                                                                       |
| Topology                   | Various possible forms (Bus, Ring,)                                                                                          | Various possible forms (Star, Complete,)                                                                                                                                                                             |
| Transmission<br>Media      | Twisted Pair                                                                                                                 | Fiber Optics                                                                                                                                                                                                         |

# 2. Circuit Switching and Packet Switching techniques

| Item                                 | Circuit Switched | Packet Switched |
|--------------------------------------|------------------|-----------------|
| Dedicated Copper Pass                | Yes              | No              |
| Bandwidth Available                  | Fixed            | Dynamic         |
| Potentially Wasted<br>bandwidth      | Yes              | No              |
| Store and Forward<br>Transmission    | No               | Yes             |
| Each Packet Follow the Same<br>Route | Yes              | No              |
| Call Setup                           | Required         | Not Required    |
| When can congestion occur?           | At setup time    | On every Packet |



3. Half duplex and full duplex transmission modes

### Half - duplex Transmission

Communication can go either way, but only one at a time. A single railroad is of this type.



### **Full - duplex Transmission**

It possible to transmit in both directions at the same time by using a different frequency band for each direction



b. Discuss in details the relationship between services and protocols of a computer network; proof your discussion with a block diagram.



*Service* defines operations that can be performed on an object but does not specify how these operations are implemented. These services are provided by a layer to the layer above it

*A protocol* relates to the implementation of the service and as such is not visible to the user of the service. It is a set of rules governing the format and meaning of the packets, or messages that are exchanged by the peer entities within a layer.

c. Television channels are 6 MHz wide. How many bits/sec can be sent if four-level digital signals are used (assume a noiseless channel).

<u>Answer</u>: Max. Data Rate =  $2 H \log_2 V$  bits/sec =  $2*6*10^6 \log_2 4$  = 24 Mbps

d. Compute the checksum *CS* and the transmitted frame T(x) for the message: 1 1 0 1 0 1 1 0 1 1 using the polynomial  $x^4 + x + 1$ 

<u>Answer</u>: Generator G(x) = 10011 Checksum CS = 1110 Transmitted Frame T(x) = 11010110111110

### **Question (5)**

### (12 Marks)

a. Discuss in details the two main strategies developed by the network designers to deal with transmission errors (i.e. error detection and correction)?

# 1<sup>st</sup> Strategy

- Based on adding enough redundant information along with each block of data sent
- This enables the receiver to deduce what the transmitted character must have been
- This strategy is called "Forward Error Correction (FEC)" where it uses <u>error correction</u> <u>codes</u>

### 2<sup>nd</sup> Strategy

- Based on adding enough redundancy to allow the receiver to deduce that an error occurred, <u>but not which error</u>, and it requests a retransmission
- This strategy is called "Automatic Repeat Request (**ARQ**)" where it uses <u>error detection</u> <u>codes</u>

b. Draw a block diagram representing the fundamental encryption and decryption process for a symmetric key cipher.



- c. Decipher the following cryptogram which was encrypted by two consecutive keys  $KE_1$  and  $KE_2$ :
  - *KE*<sub>1</sub> is a fixed period transposition cipher with d = 5 and f(i) = 35142.
  - $KE_2$  is a geometric pattern encoding using a 4 x 5 matrix and letters taken off by columns in the order 2 4 1 3 5.

The cryptogram is: TROYPGCRYATTEIEUGSAN

#### Answer:

The plaintext is: Egypt is a great Country. (each decoding step 1.5 Marks)

### (Good Luck)