The answer here boils down to "benchmark it."
The process creation overhead itself should be minimal, but the overhead of starting up the Node child on every request could be a performance killer.
The reason centers around container reuse.
When a Node Lambda function is invoked for the first time and then finishes, the container and the process inside it remain on warm standby for the next invocation. When that next invocation arrives, your process is already running, and the handler function is invoked in a matter of microseconds. On that second invocation, no time is needed to set up the container, start the process, or run through any initialization code.
This means that, in scenario 1, the time for the function to get started is minimized. The overhead is how long it takes for the caller to make the request to Lambda and for Lambda to return the response, once available. In between those two things, there is very little time.
By contrast, if you spin up a child process with each request in scenario 2, you have all of that initialization overhead with each request.
I recently had occasion to run some code in Lambda that was in a language Lambda doesn't support, called by a Lambda function written in Node.js. I do this with a child process, but with a twist: the child process is written to read from STDIN and write to STDOUT, for IPC from and to the JS code. I can then send a "request" to the child process, and an event is triggered when the child writes the response.
So, the child is started from Node only if it is not already present, and its controlling Node object is kept in a global variable. It is likely to be already present, again, due to container reuse.
In Node/Lambda, setting context.callbackWaitsForEmptyEventLoop to false allows the Lambda callback to consider the invocation finished, even if the event loop is still running, and this means I can leave that child process running across invocations.
With this mechanism in place, I achieve best-case runtimes for each Lambda invocation of under 3 milliseconds when the container is reused. For each new container, the first initiation of that child process takes in excess of 1000 ms. The 3 ms time is doubtless better than I could achieve by calling a second Lambda function from inside the first one, but the savings come from keeping the inner process alive while the container remains alive.
Since your outer function is Python, it's not clear to me just exactly what implications there are for you, or how useful this might be, but I thought it might serve to illustrate the value of the concept of keeping your child process alive between invocations.
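As a rough illustration of what that could look like on the Python side, here is a minimal sketch, not the code I actually run. It assumes the child speaks a simple one-line-per-request protocol over STDIN/STDOUT and that the container is reused between invocations. The names CHILD_CMD, get_child and handler are hypothetical, and a tiny Python worker that upper-cases its input stands in for the real child program.

```python
import subprocess
import sys

# Hypothetical stand-in for the real child program: reads one request per
# line on STDIN and writes one response per line on STDOUT.
CHILD_CMD = [
    sys.executable, "-u", "-c",
    "import sys\n"
    "for line in iter(sys.stdin.readline, ''):\n"
    "    print(line.strip().upper(), flush=True)\n",
]

# Module-level global: survives across invocations while the container is warm.
_child = None


def get_child():
    """Start the child only if it is not already running."""
    global _child
    if _child is None or _child.poll() is not None:
        _child = subprocess.Popen(
            CHILD_CMD,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered pipes
        )
    return _child


def handler(event, context):
    """Send one request line to the long-lived child and read one reply line."""
    child = get_child()
    # Assumption: the payload is a single line with no embedded newlines.
    child.stdin.write(event["payload"] + "\n")
    child.stdin.flush()
    reply = child.stdout.readline().rstrip("\n")
    return {"reply": reply, "child_pid": child.pid}
```

On a warm container, repeated calls to handler should report the same child_pid, which is the sign that the startup cost was paid only once.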
But start with what you have, and benchmark both of your scenarios, multiple times, to ensure that any longer-than-expected runtimes aren't an artifact of new container creation.
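One way to keep new-container effects out of the numbers is to time the first call separately from the rest. The following is only a sketch under assumptions: benchmark and simulated_invoke are hypothetical names, and invoke is whatever callable actually triggers your function (for example, a wrapper around a boto3 Lambda client invoke call). The simulated version just sleeps on its first call to mimic a cold start.

```python
import statistics
import time


def benchmark(invoke, runs=20):
    """Call invoke() `runs` times; return (first_ms, median_of_rest_ms)."""
    if runs < 2:
        raise ValueError("need at least 2 runs to separate first from rest")
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        invoke()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    # The first run is the one most likely to include container creation.
    return timings_ms[0], statistics.median(timings_ms[1:])


def simulated_invoke(cold_start_seconds=0.05):
    """Hypothetical stand-in: slow on the first call, fast afterwards."""
    state = {"calls": 0}

    def invoke():
        state["calls"] += 1
        if state["calls"] == 1:
            time.sleep(cold_start_seconds)  # pretend cold start

    return invoke
```

Run it against each scenario in turn and compare the medians of the later runs. A large gap between the first timing and that median points to container or child startup rather than to steady-state cost.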