When you start doing reinforcement learning you will sooner or later come to the point where you will generate random numbers. Initializing policy networks or Q-tables, choosing between exploration or exploitation, or selecting among equally good actions are a few examples. Numpy is very efficient at generating those random numbers, and most of the time (like 95%) this is all you need. However, there is one particular edge case were numpy is not the best solution, and that is exactly the case we encounter in RL a lot: generating single random numbers (i.e., to select an action epsilon-greedy).

Generating single random numbers in numpy is a bad idea, because every numpy call gets sent to the numpy engine and then back to python, which creates overhead that dominates runtime for single random numbers. In this case it is (much) more efficient to use the python standard library instead. However, if you can generate the random numbers in batches, numpy is significantly faster than the standard library again. Take a look at this simple profile:

So if you do your profiling your code and notice that RNG adds up to a significant portion of your runtime, consider pre-generating the random numbers in numpy and then save them to a list. This solution sacrifices a bit of readability, but allows for much faster code. Here is an example that mimics the syntax of the python standard library:

That’s it. Thanks for reading and Happy Coding!