Benha Faculty of Engineering $5^{\text {th }}$ Year (Comm. \& Comp.) Examiner: Dr. Hatem ZAKARIA

## Electrical Engineering Department

 Final Exam: 20 May 2013 Time allowed: 3 Hours
## Distributed Processing (E521)

## Model Answer

No. of Questions: 5

## Question (1)

(12 Marks)
a. Compare between (hint: explanations using block diagrams is desired):

1. SIMD and MIMD machines.

| Aspect | SIMD | MIMD |
| :---: | :---: | :---: |
| Architecture |  |  |
| Ease of programming | Simple Sequential fashion. | Need to parallelize the program: may be harder. |
| Need for synchronization | No need. It is only 1 machine. | Important aspect, since processors will be sharing the memory |
| Performance evaluation | For large parallelizable problems, uniprocessor systems are expensive | For sequential problems (by nature), parallelizing is expensive |
| Run time system limitations | Limited factors affect this aspect (speed of execution, memory access time). | Additional factors are involved (processors' synchronization, communication time, delays on the interconnection network, etc.). |

2. Fine-grain and Coarse-grain parallelism of distributed programs.

## Fine-grain Parallelism

- Small amounts of computational work are done between communication events $\rightarrow$ low computation to communication ratio
- Facilitates load balancing
- Implies high communication overhead and less opportunity for performance enhancement



## Coarse-grain Parallelism

- Relatively large amounts of computational work are done between communication/synchronization events
- High computation to communication ratio
- Implies more opportunity for performance increase
- Harder to load balance efficiently

b. Explain in details the two main types of the shared memory, what does it mean that the memory is cache coherent?
Uniform Memory Access (UMA)
- Shared memory is accessible by all processors in the same way a single processor accesses its memory
- Identical processors
- Equal access and access times to memory



## Non-Uniform Memory Access (NUMA)

- Each processor has part of the shared memory attached (memory banks)
- NUMA systems include additional hardware or software to move data between banks.
- Memory access time depends on the memory location
 relative to a processor
- Cache coherent means: If one processor updates a location in shared memory, all the other processors know about the update.
c. Assume that a simple addition of two elements requires 2 ns , and the communication time of the processing element with the memory takes 1 ns . You are required to compute the execution time needed to perform the total sum of 16 elements using a SIMD system having 3 processing elements connected in nearest neighbor fashion. Consider that each processor has only its local memory.


## Answer

$\mathrm{T}_{\text {add }}=2 \mathrm{~ns}$ and $\mathrm{T}_{\text {com }}=1 \mathrm{~ns}$
No. of processors $=3$
Total No. of add operations/clock cycle $=|8 / 3|+|4 / 3|+|2 / 3|+|1 / 3|$

$$
=3+2+1+1=7
$$

Execution time $=7 * 3=21 \mathrm{~ns}$

## Question (2)

(10 Marks)
a. Discuss in details why synchronous communications are blocking communications, while asynchronous communications are non-blocking communications.
Synchronous communications require some type of "handshaking" between tasks that are sharing data. Synchronous communications $\rightarrow$ blocking communications as other work must wait until the communications have completed.
Asynchronous communications allow tasks to transfer data independently from one another. Task 1 can prepare and send a message to Task 2, then immediately begin doing other work, while task 2 is receiving the data. Asynchronous communications $\rightarrow$ nonblocking communications as other work can be done while the communications are taking place. Interleaving computation with communication is the single greatest benefit for using asynchronous communications.
b. Draw a block diagram showing the asynchronous inter-task communication between two different clock modules using a FIFO buffer.

c. Compute the switch and link costs of interconnecting 256 nodes using a crossbar relative to a MIN with $2 \times 2$ switches, assuming that switch cost grows quadratically with the number of input/output ports.

## Answer

- MIN with $2 \times 2$ switches, the cost of each switch is proportional to $2^{2}$
- There are $256 / 2\left(\log _{2} 256\right)$ total switches
- $\quad \log _{2} 256$ stages of 256 unidirectional links per stage from the switches plus 256 links to the MIN from the end nodes.
- $\quad \operatorname{cost}(\operatorname{crossbar})_{\text {switches }}=256^{2}, \quad \operatorname{cost}(\operatorname{crossbar})_{\text {links }}=512$
- relative_cost $(2 \times 2)_{\text {switches }}=256^{2} /\left(2^{2} \times 256 / 2 \times \log _{2} 256\right)=16$
- relative_cost $(2 \times 2)_{\text {links }}=512 /\left(256 \times\left(\log _{2} 256+1\right)\right)=0.222$


## Question (3)

a. Compare between the Completely Connected and Linear Array Networks with N processors in terms of (architecture, link cost, worst delay, degree, symmetry, and diameter).

| Aspect | Completely Connected <br> Network | Linear Array Network |
| :---: | :---: | :---: |
| Architecture | $N(N-1) / 2$ | $N-1$ |
| Link Cost | 1 | $N$ |
| Worst Delay | $N-1$ | $N$ |
| Degree | Yes | No |
| Symmetry | 1 | N |
| Diameter |  |  |

b. Consider the case of a crossbar network consisting of 3 processors and 3 memory modules. Assume that a processor generates a memory request with a probability equals 0.3 in a given cycle. Compute the bandwidth of such system.

$$
\text { Answer: } B W=M\left(1-(1-(\rho / M))^{n}\right)=3\left(1-(1-(0.3 / 3))^{3}\right)=0.813
$$

c. Show the routing of the message in the following Omega MIN from the processing node No. 101 to the processing node No. 011. What happens if a new connection is needed between node No. 110 and node No. 111 during the presence of the previous connection? State if MINs are non-blocking networks or not and why?


- The two connetcion can be established simultaneosly.
- MINs Network has the property of being blocking.
- Contention is more likely to occur on network links where paths from different sources to different destinations share one or more links
d. Consider a parallel architecture built using $N$ processors each capable of sustaining 0.5 megaflop. Consider a supercomputer capable of sustaining 20 megaflops. What is the condition in terms of the fraction $f$ of the task $T$ is not dividable into concurrent subtasks under which the parallel architecture can exceed the performance of the supercomputer? If $f=20 \%$, determine: the minimum number of parallel processors N to be used? Speedup factor $S(n)$, Maximum speedup factor, and Efficiency $\eta$ ?

Answer: The condition is expressed as follows: $0.5^{*} f+(1-f)^{*} 0.5{ }^{*} \mathrm{~N}>20$
For $f=20 \% \rightarrow 0.5^{*} 0.2+(1-0.2)^{*} 0.5^{*} \mathrm{~N}>20 \rightarrow \mathrm{~N}=50$ processor
$S(n)=N /[1+f(N-1)]=50 /[1+0.2(50-1)]=4.63$
Maximum speedup factor $=1 / f=1 / 0.2=5 \quad$ Efficiency $\eta=s(n) / N=9.26 \%$
Question (4)
(12 Marks)
a. Compare between:

1. Local Area Networks (LANs) and Wide Area Networks (WANs)
Aspect
2. Circuit switching and packet switching techniques.

3. Half duplex and full duplex transmission modes.

## Half - duplex Transmission

Communication can go either way, but only one at a time. A single railroad is of this type.

## Full - duplex Transmission

It possible to transmit in both directions at the same time by using a different frequency band for each direction
b. Discuss in details three different objectives of computer networks.

Resource sharing: All programs, equipment and data are available to anyone on the network. No regard to the physical location of the resource "server" and the user "client".


High Reliability: Multiple CPUs, If one goes down, the other may be able to take over its work.
Cooperative Computing: The main problem can be divided into sub-problems. Each processor solves a certain part of the problem.

Communication Medium: 2 or more people who live far apart can write a report together. A change to an online document can be seen by the immediately.

Saving Money: Small PCs have a much better price/performance ratio than large ones. Mainframes are a factor of 10 faster than small PCs, but have very high cost. Designers build systems consisting of PCs, with data kept on one or more shared file server machine
c. Television channels are 6 MHz wide. How many bits/sec can be sent if four-level digital signals are used (assume a noiseless channel).

Max. Data Rate $=2 H \log _{2} V$ bits $/ \mathrm{sec}=2 * 6 * 10^{6} \log _{2} 4=24 \mathrm{Mbps}$
d. Compute the checksum $C S$ and the transmitted frame $T(x)$ for the message: 1101011011 using the polynomial $x^{4}+x+1$

Generator $\mathbf{G}(\mathbf{x})=10011 \quad$ Checksum CS $=1110$
Transmitted Frame $T(x)=11010110111110$

## Question (5)

(12 Marks)
a. Discuss the main types of attacks activated by an intruder that would potentially affect information in a computer network.

## Passive Wiretapping

An intruder simply monitors at some point without interfering with the information flow. Such unauthorized observation of information is referred to as release of message content When the message content is not available, the wire tapper can examine the quantities, lengths, and frequencies of the message transmission. These types of passive attacks are referred to as traffic analysis "i.e. learn about the character of the data being exchanged"

## Active Wiretapping

An intruder performs some type of message processing on the information stream that passes the tapping point. Message could be selectively modified; deleted, delayed, recorded, duplicated either immediately or at later time, in addition fake messages can also be created. These types of activities are referred to as unauthorized message stream modification. Discarding all messages that pass the connection point is another form of active wiretapping resulting in a denial of service to a user by a LAN, called unauthorized denial of message service
b. Draw a block diagram to show the security environment of a local area network (LAN) and indicate the positions at which an intruder can be located.

An intruder can be located at some point through which all information of interest must pass, this could be: The transmission link, Multiple-user host computer, Bridge, and Gateway
c. Draw a block diagram representing the fundamental encryption and decryption process for a symmetric key cipher.

d. Decipher the following cryptogram which was enciphered by geometrical pattern encoding using a $5 \times 5$ matrix with letters taken off in the column order 23154

LATESSSOTUEHRHAEATEEIPFRR<br>Model answer: Plaintext is "ELSIE HAS PART OF THE TREASURE"

