How to correctly apply Amdahl's argument:
Dr. Gene AMDAHL formulated the law quite some time ago, when improvements in the organisation of processing flows were in focus for mainframe computing centres, for industrial production and for generic work-flow optimisations. The law is still valid and still has its say.

The process-oriented point of view has remained a bit out of sight in the simplicity of the traditional formula, but it never should have been. The same applies to the "inventory" of ( somewhat specialized, "product"-specific ) resources. If interested in more details, one may re-read the parallelism-amdahl details from history #3 in https://stackoverflow.com/revisions/18374629/3
Both paragraphs with the re-formulated Amdahl's law, in the chapter Criticism, give the full rationale for why we cannot simply put numbers into a formula and expect a valid ( i.e. reasonable and achievable ) result.
So, let's start with an abstraction of the PROCESS:
The one identified and named above is the process of sending an SMS message. It can take ~ 10 [s]. This process is independent ( hooray, no coordination, no barriers, no semaphores, no locks, no process-to-process communications ), but indivisible in length. For brevity, we may call it an atomic-process: it can only be performed as a whole, in an "atomically"-indivisible fashion, and it has some latency ( duration ) before a process-finished state can be reached.
Each of the processes will work if and only if it has been both mapped ( allocated ) and scheduled ( activated/executed ) onto some processing resource.
So, next the RESOURCES:
As already defined in the original post, there are 4 SMS-modems. These are process-specific resources, responsible for carrying the "atomic"-process ( task ), each independently of the others. This is a simple case here, not automatically extensible ad infinitum: the carrier network will start blocking once the radio-access-network ( GSM last-mile ), the POP-processing ( BTS-node ) and/or the carrier-network ( uplinks from BTS to MSC ) start to reach their respective resources' capacity ceilings.
All complex systems simply have complex hierarchies of resources / capacities / workload allocations / performance-related ceilings, where each may start blocking and thereby affect both the latency ( duration ) of our "atomic"-process-of-interest and its "schedulability" ( a mapping of a process onto "our" identified unit of process-specific processing-resource ). For example, the GSM-modem has some amount of background logic related to the GSM-network coordination, so its behaviour is not isolated from the complexity of the local GSM-cell traffic, processed both on the local BTS-node and under the complex BTS/MSC-network processing / traffic-transport coordination conditions. We observe these only indirectly, by an increased latency or even by a denial-of-service response, if the BTS-node is not able to handle our "atomic"-process request and discards it after some GSM-standard driven timeouts, as it is unable to receive and carry the SMS. Thus the SMS has been known to be a not-guaranteed messaging service by design since ever.
So, let's keep the simple assumption that an SMS-modem will always process the request to send an SMS, so in this simplified resources-management case the only limiting factor is the number of SMS-modems, here being 4.
Overhead-strict and resources-aware re-formulation:
S = 1 / ( s + pSO + max( ( 1 - s ) / N, atomicP ) + pTO )
Where:
s       := a SERIAL only part of the End-to-End process-flow
1 - s   := a PARALLEL organizable part
pSO     := a PARALLEL task setup overhead
pTO     := a PARALLEL task termination overhead
N       := a number of resources that process atomic-process-block
atomicP := a duration of a further indivisible atomic-process-block
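The re-formulated expression can be put into a few lines of Python. This is only a minimal sketch: the function name `overhead_strict_speedup` and its argument names are made up for illustration, and all quantities except `n` are assumed to be expressed as fractions of the original pure-SERIAL End-to-End duration.

```python
def overhead_strict_speedup(s, n, p_so=0.0, p_to=0.0, atomic_p=0.0):
    """Speedup S = 1 / ( s + pSO + max( ( 1 - s ) / N, atomicP ) + pTO ).

    s        : SERIAL only fraction of the End-to-End process-flow
    n        : number of resources that process the atomic-process-block
    p_so     : PARALLEL setup overhead (fraction of the serial duration)
    p_to     : PARALLEL termination overhead (fraction of the serial duration)
    atomic_p : duration of the indivisible atomic-process-block (same unit)
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    # The parallel part can never finish sooner than one atomic block takes.
    parallel_part = max((1.0 - s) / n, atomic_p)
    return 1.0 / (s + p_so + parallel_part + p_to)


if __name__ == "__main__":
    # With all overheads at zero the classical Amdahl formula is recovered.
    print(overhead_strict_speedup(s=0.5, n=2))
    # Adding resources stops helping once atomic_p dominates ( 1 - s ) / n.
    print(overhead_strict_speedup(s=0.1, n=4, atomic_p=0.5))
    print(overhead_strict_speedup(s=0.1, n=400, atomic_p=0.5))
```

The last two calls return the same value, because the `max()` term is pinned to `atomic_p` and no longer depends on `n`.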
Result:
For a mock-up situation, where actual times are:
Ts = X [s] a duration to launch the End-to-End process-flow ( start the program ),
TatomicP = 10 [s] a latency of sending one SMS-message will never be shorter,
Nres = 5 [1] a number of SMS-messages processing resources ( i.e. GSM-modems ),
nSMS = 1 [1] a number of SMS-messages to send,
TpSO = Y [s] a duration to start a PARALLEL sub-process for GSM-modem pool handling,
TpTO = Z [s] a duration to disengage the pool of PARALLEL sub-processes and to release all of their owned resources,
thus the End-to-End process-flow duration T_E2E will be:
T_E2E = Ts + TpSO + max( TatomicP * nSMS / Nres, TatomicP ) + TpTO = X + Y + max( 10 / Nres, 10 ) + Z
the fraction s of the pure-SERIAL process flow, that will always have to remain pure-SERIAL, will be:
( Ts ) / ( T_E2E ) = X / ( X + Y + 10 + Z ) [1]
the principal ceiling ( a maximum ever achievable ) Speedup S will be:
( Ts + TatomicP * nSMS ) / ( Ts + TpSO + max( TatomicP * nSMS / Nres, TatomicP ) + TpTO )
For just one SMS-message and however large a number of SMS-processing GSM-modems, this will here be < 1, precisely because the pure-SERIAL process-flow spends no time either on setting up a pool of sub-processes or on finally terminating them and releasing the pool's allocated resources, and spends no time on arranging transfers of data there and back between the main-task and the pool of tasks organised in a true PARALLEL fashion.
So, here, you pay way more than you receive back ( from the process organisation point of view, the pool of processing-ready GSM-modems ( resources ) is rather expensive to arrange and control so as to send just one SMS-message ).
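To make the single-SMS case tangible, the mock-up can be evaluated numerically. The sketch below assumes purely hypothetical values X = 1 [s], Y = 2 [s] and Z = 1 [s] for the unknown durations; the function names `t_e2e` and `speedup` are illustrative only.

```python
def t_e2e(ts, tp_so, tp_to, t_atomic, n_sms, n_res):
    """End-to-End duration of the PARALLEL-organised process-flow [s]."""
    return ts + tp_so + max(t_atomic * n_sms / n_res, t_atomic) + tp_to


def speedup(ts, tp_so, tp_to, t_atomic, n_sms, n_res):
    """Ratio of the pure-SERIAL duration to the PARALLEL-organised one."""
    t_serial = ts + t_atomic * n_sms  # no pool to set up or to tear down
    return t_serial / t_e2e(ts, tp_so, tp_to, t_atomic, n_sms, n_res)


# Hypothetical overheads: X = 1 [s], Y = 2 [s], Z = 1 [s].
mockup = dict(ts=1.0, tp_so=2.0, tp_to=1.0, t_atomic=10.0, n_sms=1, n_res=5)

if __name__ == "__main__":
    print(f"T_E2E = {t_e2e(**mockup):.1f} [s]")  # 14.0 [s]
    print(f"S     = {speedup(**mockup):.3f}")    # 0.786
```

Under these assumed overheads the PARALLEL arrangement takes 14 [s] against 11 [s] for the pure-SERIAL flow, so the speedup stays below 1 no matter how many modems are added.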
In case the End-to-End workflow starts to process a larger cohort of SMS-messages, the performance will improve and the speedup S will grow with it. The larger the cohort to process, and the more GSM-modems available to process it, the better.
Yet, Amdahl's argument sets the principal ceiling of such achievable speedups: S will principally never be more than 4 for 4 modems, not more than 5 for 5 modems, etc.
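The ceiling can be illustrated by sweeping the cohort size. This is a sketch under stated assumptions: the overheads ( 1 [s], 2 [s], 1 [s] ) are hypothetical, `cohort_speedup` is an illustrative name, and the `max()` term is taken exactly as written in the formula above, so cohorts that do not divide evenly among the modems are idealised.

```python
def cohort_speedup(n_sms, n_res, ts=1.0, tp_so=2.0, tp_to=1.0, t_atomic=10.0):
    """Speedup S for a cohort of n_sms messages sent over n_res modems."""
    t_serial = ts + t_atomic * n_sms
    t_parallel = ts + tp_so + max(t_atomic * n_sms / n_res, t_atomic) + tp_to
    return t_serial / t_parallel


if __name__ == "__main__":
    n_res = 4
    for n_sms in (1, 4, 40, 400, 4000):
        s = cohort_speedup(n_sms, n_res)
        # S grows with the cohort size, yet never reaches n_res.
        print(f"nSMS = {n_sms:5d}   S = {s:.3f}   ceiling = {n_res}")
```

The printed speedups grow towards 4 as the cohort gets larger, but the fixed setup and termination overheads keep them strictly below the number of modems.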