(This question is probably flirting with the "no software recommendations" rule; I understand why it might be closed).
In their paper F_2 Lanczos revisited, Peterson and Monico give a version of the Lanczos algorithm for finding a subspace of the kernel of a linear map over Z/2Z. If my cursory reading of their paper is correct (whether it is or not is clearly not a question for SO), the algorithm presented requires a number of iterations that scales inversely proportional to the word size of the machine used. The authors implemented their proof of concept algorithm with a 64 bit word size.
Does there exist a publicly available implementation of that algorithm utilizing wide SIMD words for (a potentially significant) speedup?