For discrete distributions, you can use the aforementioned biopython or scikit-learn's sklearn.metrics.mutual_info_score. However, both compute the mutual information between "symbolic" data using the formula you cited (which is intended for symbolic data). In either case, you ignore the fact that the values of your data have an inherent order.
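For illustration, here is a minimal example of the scikit-learn route; mutual_info_score treats its two inputs as unordered label arrays and returns the mutual information in nats (the toy data below is only for demonstration):

```python
import numpy as np
from sklearn.metrics import mutual_info_score

# two discrete ("symbolic") variables; the integer values are treated
# as unordered labels, so any inherent ordering of the symbols is ignored
rng = np.random.default_rng(0)
x = rng.integers(0, 5, size=1000)
y = (x + rng.integers(0, 2, size=1000)) % 5   # noisy copy of x

print(mutual_info_score(x, y))                # mutual information in nats
```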
For continuous distributions, you are better off using the Kozachenko-Leonenko k-nearest neighbour estimator for entropy (Kozachenko & Leonenko, 1987) and the corresponding Kraskov et al. (2004) estimator for mutual information. Both circumvent the intermediate step of calculating the probability density function and estimate the entropy directly from the distances of the data points to their k-th nearest neighbours.
The basic idea of the Kozachenko-Leonenko estimator is to look at (some function of) the average distance between neighbouring data points. The intuition is that if that distance is large, the dispersion in your data is large and hence the entropy is large. In practice, one takes the k-th nearest neighbour distance rather than the nearest neighbour distance (with k typically a small integer in the range 5-20), which makes the estimate more robust.
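As a rough sketch of that idea (not the code from the repository linked below), the estimator can be written in a few lines with a k-d tree. This version assumes Euclidean distances, returns the entropy in nats, and will break on duplicate points because of the log of a zero distance:

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def kl_entropy(x, k=5):
    """Kozachenko-Leonenko k-nearest-neighbour entropy estimate (in nats)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, np.newaxis]              # shape (n_samples, n_dims)
    n, d = x.shape
    # distance of every point to its k-th nearest neighbour
    # (query with k + 1 because each point is its own nearest neighbour)
    r = cKDTree(x).query(x, k=k + 1)[0][:, -1]
    # log-volume of the d-dimensional unit ball (Euclidean norm)
    log_vd = 0.5 * d * np.log(np.pi) - gammaln(0.5 * d + 1)
    return digamma(n) - digamma(k) + log_vd + d * np.mean(np.log(r))

# sanity check: the entropy of a standard normal is 0.5 * log(2 * pi * e) ≈ 1.42 nats
print(kl_entropy(np.random.randn(10000), k=5))
```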
I have implementations of both on my GitHub: https://github.com/paulbrodersen/entropy_estimators
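For completeness, here is a similarly condensed sketch of algorithm 1 from Kraskov et al. (2004); the function name and the toy usage are my own, it follows the max-norm convention of the paper and returns the mutual information in nats. It is an illustration only, not the code from the repository above:

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mutual_information(x, y, k=5):
    """Kraskov-Stoegbauer-Grassberger (2004, algorithm 1) MI estimate (in nats)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = len(x)
    # distance to the k-th nearest neighbour in the joint space (max-norm)
    xy = np.hstack([x, y])
    eps = cKDTree(xy).query(xy, k=k + 1, p=np.inf)[0][:, -1]
    # for each point, count the neighbours lying strictly within eps in each
    # marginal space (subtracting 1 removes the point itself from the count)
    x_tree, y_tree = cKDTree(x), cKDTree(y)
    nx = np.array([len(x_tree.query_ball_point(xi, np.nextafter(ri, 0), p=np.inf)) - 1
                   for xi, ri in zip(x, eps)])
    ny = np.array([len(y_tree.query_ball_point(yi, np.nextafter(ri, 0), p=np.inf)) - 1
                   for yi, ri in zip(y, eps)])
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))

# toy example: correlated Gaussians with known mutual information
rho = 0.8
true_mi = -0.5 * np.log(1 - rho**2)   # ≈ 0.51 nats
x, y = np.random.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=5000).T
print(ksg_mutual_information(x, y, k=5), true_mi)
```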