The parallel program was developed in a heterogeneous environment consisting of several different machines (HP 9000, VAX-750, SUN IPC). Such an environment is well suited to testing, but performance is difficult to measure in it because the computers differ in speed. Nevertheless, it allowed us to test both the application and the load balancing. The results presented here were measured on an IBM SP1 computer with 8 processors and 3 RS6000/580 computers, connected by Ethernet using the TCP/IP protocol. All the computers share their file systems through NFS.
Our test problem was simple: a bar clamped at both ends, treated as a 3-dimensional problem. In the computation the phase space was divided into 120*200*300=7200000 cubes, and each cube into 2*3=6 simplices. We used different numbers of computers, with only one slave per processor. In each case, a domain given to a slave contained at most 37*37*36=49284 cubes. As mentioned in section 4, the master program creates more domains than there are processors in order to achieve reasonable load balancing.
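The sizes quoted above can be checked with a few lines of arithmetic. The following Python sketch is only an illustration: the grid dimensions, the number of simplices per cube and the maximal domain size come from the text, whereas the assignment of the extents 37, 37, 36 to the three axes in this order, and the axis-aligned block layout, are assumptions made for the example and need not match what the master program does.

```python
import math

# Values taken from the text.
GRID = (120, 200, 300)        # cubes along each axis
SIMPLICES_PER_CUBE = 6        # 2*3 simplices per cube
# Maximal domain extent; the axis order is an assumption of this sketch.
MAX_DOMAIN = (37, 37, 36)

total_cubes = math.prod(GRID)
total_simplices = total_cubes * SIMPLICES_PER_CUBE
max_domain_cubes = math.prod(MAX_DOMAIN)

# Lower bound on the number of domains, from the cube counts alone.
min_domains = math.ceil(total_cubes / max_domain_cubes)

# Number of domains if every axis is cut into blocks of at most the
# given extent (hypothetical layout, not confirmed by the paper).
blocks_per_axis = [math.ceil(g / d) for g, d in zip(GRID, MAX_DOMAIN)]
block_domains = math.prod(blocks_per_axis)

print(total_cubes, total_simplices, max_domain_cubes)
print(min_domains, blocks_per_axis, block_domains)
```

Under either count the number of domains is far larger than the 11 processors used at most, which is consistent with the remark that the master creates more domains than processors.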
The slave processes measured their elapsed times (user, system, real). These time values were gathered by the master program and reported at the end of the run. Table 1 displays the measured times with respect to the number of processors. We always chose the worst (maximal) time values. The speed-up is computed from the max user + max system times. Although the true speed-up would be derived from the real times, we used the user + system times because we could not exclude the extra load on the computers caused by other processes during the measurement period.
Table 1: Elapsed times as a function of the number of processors
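The speed-up computation described above can be sketched as follows. This is a minimal Python illustration, not the code used for the measurements: the names `SlaveTimes`, `worst_cpu_time` and `speedup` are hypothetical, the baseline is assumed to be the run on one processor, and the numbers in the example are invented solely to exercise the functions.

```python
from dataclasses import dataclass


@dataclass
class SlaveTimes:
    """Times reported by one slave, in seconds (hypothetical record layout)."""
    user: float
    system: float
    real: float


def worst_cpu_time(times):
    """Max user time plus max system time, each maximum taken over all slaves."""
    return max(t.user for t in times) + max(t.system for t in times)


def speedup(baseline, parallel):
    """Ratio of the worst CPU time of the baseline run to that of the parallel run."""
    return worst_cpu_time(baseline) / worst_cpu_time(parallel)


# Invented example values, NOT measurements from the paper.
one_processor = [SlaveTimes(user=100.0, system=4.0, real=120.0)]
four_processors = [
    SlaveTimes(user=30.0, system=1.0, real=40.0),
    SlaveTimes(user=25.0, system=2.0, real=41.0),
    SlaveTimes(user=28.0, system=1.5, real=39.0),
    SlaveTimes(user=26.0, system=1.0, real=42.0),
]

print(speedup(one_processor, four_processors))
```

Using user + system time rather than real time removes the waiting caused by other processes on shared machines, at the cost of also hiding communication and idle time.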
Table 2 shows the measured times and the number of domains (jobs) solved by each processor when 11 processors were used: 8 processors of the SP1 (p1-p8) and 3 RS6000/580 computers (p9-p11). The effect of load balancing is apparent: processor p11 computed only 6 domains, while p2 computed 34.
Table 2: Measured times using 11 processors
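The uneven job counts are what one would expect when the master hands the next domain to whichever slave becomes idle first. The following Python sketch simulates such a scheme under simplifying assumptions: every domain is taken to cost the same amount of work, each worker has a fixed time per domain, and communication is ignored. The function name and the timing values are hypothetical and are not derived from Table 2.

```python
import heapq


def simulate_dynamic_assignment(n_domains, seconds_per_domain):
    """Return the number of domains each worker solves when an idle worker
    always receives the next unsolved domain (equal-cost domains assumed)."""
    # Heap of (time at which the worker becomes idle, worker index).
    idle_at = [(0.0, w) for w in range(len(seconds_per_domain))]
    heapq.heapify(idle_at)
    jobs = [0] * len(seconds_per_domain)
    for _ in range(n_domains):
        t, w = heapq.heappop(idle_at)      # earliest idle worker
        jobs[w] += 1
        heapq.heappush(idle_at, (t + seconds_per_domain[w], w))
    return jobs


# Invented example: two fast workers and one worker four times slower.
jobs = simulate_dynamic_assignment(18, [1.0, 1.0, 4.0])
print(jobs)
```

In this toy run the slow worker ends up with far fewer domains than the fast ones, which mirrors the qualitative pattern reported for p11 and p2.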
This paper presented an algorithm for solving a class of nonlinear equation systems, together with a parallel distributed implementation of this algorithm. The algorithm and the parallel application proved suitable for efficiently solving a wide class of technical problems on a network of computers. The network gave us more computing power and more system memory at a reasonable price/performance ratio. Our experiments convinced us that the PVM system is useful for solving a wide range of problems in applied mechanics.