MPI-SpeedUp for SpinPack (matrix in memory)
Checked for up to 1000 cores now (Jul08)!
linear in log-log plot: x=log(CPUs), y=log(t1/t)
lg2(t1/t) = b * lg2(CPUs) # t1 = extrapolated 1-CPU time
t1/t = (2^b)^lg2(CPUs) # 2^b = SpeedUp2 (speedup per CPU doubling)
t1/t2 = (2^(b))^lg2(2) = 2^b = SpeedUp2
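The fit described above can be sketched in Python as a least-squares line through the points (lg2(CPUs), lg2(t)). This is only an illustration, not the script used for the measurements: the function name fit_speedup2 and the timings are hypothetical, with the timings generated from the model itself using t1 = 100 and SpeedUp2 = 1.66.

```python
import math

def fit_speedup2(cpus, times):
    """Fit lg2(t) = lg2(t1) - b*lg2(CPUs); return (t1, SpeedUp2 = 2^b)."""
    xs = [math.log2(n) for n in cpus]
    ys = [math.log2(t) for t in times]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx                 # slope of lg2(t) vs lg2(CPUs), equals -b
    b = -slope
    t1 = 2.0 ** (my - slope * mx)     # intercept: extrapolated 1-CPU time
    return t1, 2.0 ** b

# Hypothetical timings (not measured data), generated from the model
# with t1 = 100 and SpeedUp2 = 1.66.
cpus = [2, 4, 8, 16, 32]
times = [100.0 / 1.66 ** math.log2(n) for n in cpus]
t1, speedup2 = fit_speedup2(cpus, times)
```

With real timings the points scatter around the line, so the fitted t1 and SpeedUp2 are only estimates.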
BWFactor = t1/t(1CPU) # bandwidth factor
OverallSpeedUp = SMPSpeedUp * MPISpeedUp
SMPSpeedUp = SMPSpeedUp2 ^ lg2(SMPCores)
MPISpeedUp = BWFactor * MPISpeedUp2 ^ lg2(MPINodes)
SpeedUp2  v2.33: SMP = 1.66 (up to 32 CPUs), MPI = 1.46 (up to 64 nodes)
          v2.36: SMP = 1.66 (up to 32 CPUs), MPI = 1.69 (up to 50 nodes)
BWFactor  100Mbit/s  = ca.  40% ( 40% float, 25% double, 2*2GHz)
          1Gbit/s    = ca. 100% (100% float, 70% double, 4*2GHz)
          2*10Gbit/s = ca. 100% (estimated, BW*Cores/Node)
extrapolation:
v2.33: SpeedUp = 1.66^lg2(SMPCores) * 1.46^lg2(MPINodes)
v2.36: SpeedUp = 1.66^lg2(SMPCores) * 1.69^lg2(MPINodes) ??
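A minimal Python sketch of the extrapolation formulas, using the SpeedUp2 values listed in these notes. The names SPEEDUP2 and overall_speedup are hypothetical, and bw_factor defaults to 1.0 (the ca. 100% case), which matches the extrapolation lines that leave BWFactor out. Results beyond the tested core and node counts are extrapolations only.

```python
import math

# SpeedUp2 per version: (SMP, MPI), taken from the values in these notes.
SPEEDUP2 = {"v2.33": (1.66, 1.46), "v2.36": (1.66, 1.69)}

def overall_speedup(smp_cores, mpi_nodes, version="v2.36", bw_factor=1.0):
    """OverallSpeedUp = SMPSpeedUp * MPISpeedUp for the given core/node counts."""
    smp2, mpi2 = SPEEDUP2[version]
    smp = smp2 ** math.log2(smp_cores)               # SMPSpeedUp
    mpi = bw_factor * mpi2 ** math.log2(mpi_nodes)   # MPISpeedUp
    return smp * mpi

# Example: 4 SMP cores per node on 16 MPI nodes, v2.33 values.
example = overall_speedup(4, 16, version="v2.33")    # 1.66^2 * 1.46^4
```

Passing bw_factor=0.4 would model the 100Mbit/s case from the BWFactor table.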