Yesterday I stumbled upon a StackOverflow question that asked about implementing a Rosetta Code problem in parallel to speed it up. One easy way to do it is one, which is a modification of the python example on rosettacode.org.
When I posted the question the OP commented that he/she was looking for using non-terminal sub-processes that yield these super-d values. I thought about it, but that version does not seem very practical. If the main process is interested in the results of the computation, then temporary concurrency will be the cleaner solution (like in this example). If the main thread isn’t, e.g., if you are running an old-school threaded web-server, that hands off incoming connections to a worker thread, then solutions with non-terminal sub-processes can make sense. In the latter case you are essentially “starting a program N times in parallel and shutting them down together”. This certainly makes sense, but only if those programs don’t need to communicate. Remember to KISS.
Thank you for reading and Happy Coding!