Iterating over bits in FPGA

Question

I'm trying to figure out the best method for iterating over bits in an FPGA. I'm using a variation of the fast powering algorithm, a.k.a. exponentiation by squaring (more precisely, the double-and-add algorithm for elliptic curve arithmetic). To implement it in hardware, I know I must use an FSM that performs the iteration. My problem is how to properly "handle" moving from bit to bit. My first thought was to switch the order of the bytes, but when my k = 17 is stored in 32 bits, I must discard the first 27 bits, so that's a rather stupid idea. Another concept was to "move" a 0001000 pattern and bitwise-AND it with the number, but that also requires finding the first nonzero bit.

TL;DR: I have, for example, k = 17 (32 bits, so: 17x0 10001) and want to iterate 5 times (that is, start the iteration at the first "real" bit of the number), knowing each bit I iterate over.

Language doesn't matter - I need only the algorithm, not a solution in a specific language. However, if it is easily done in Verilog, I wouldn't mind. :P

With a little googling you can find a bunch of papers using the search term "doubling and add algorithm elliptic curve VHDL" and optionally throwing in FSM. You'd be hard pressed not to find a description of the algorithm. A data dependency graph is as close to abstract as a block diagram. — , Jun 21 '14 at 06:56
Yeah, but almost all algorithms describe a loop from "l-1 to 0", and I have no idea how to get the bit count l of the number. I guess a logarithm is a rather bad idea. :P — Kacper Banasik, Jun 21 '14 at 14:41
Yes, but this would result in 27 wasted cycles - and that's why I'm asking here. To learn something new and better! :) — Kacper Banasik, Jun 21 '14 at 21:14
Check out this question on finding the first non-zero bit: http://stackoverflow.com/questions/24166295/verilog-bit-change-location/24169188#24169188 — Guy, Jun 25 '14 at 13:51
In your TL-DR example, the second two-digit number may be meant to be a _27_. `start iteration on first "real" bit` is that to say from the most significant bit? Do you need to know its position/weight? If the number isn't zero, that bit will be one… and the first iteration should give ample time to figure out the bits to follow/the position/weight of the leading 1-bit. — greybeard, Sep 22 '16 at 20:05

score 0 · Answer 1 · edited Jun 26 '14 at 11:16

I do not code for FPGAs, but still:

1. Rewrite the algorithm to iterate over the number x from LSB to MSB.
2. In each iteration, shift x right by 1 bit.
3. Stop when x == 0.

This way the bit scan happens inside your main loop and needs no additional cycles.

The x != 0 test is cheap: just OR all the bits of x together.

C++ code example:

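DWORD x = ...;              // the scalar k; DWORD is the Windows 32-bit unsigned type
for (; x != 0; x >>= 1)     // ends as soon as no set bits remain
{
    // here is your iteration loop stuff, like:
    if ((x & 1) != 0) ...;  // the current bit is always the LSB of x
}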
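To make the loop above concrete, here is a minimal sketch of a right-to-left double-and-add built on it. It rests on a loud assumption: plain integers stand in for elliptic curve points, so "add" is `+` and "double" is adding a value to itself. The names `ScanResult` and `mul_lsb_first` are hypothetical and only serve the illustration.

```cpp
#include <cstdint>

// Product plus the number of loop iterations the scan needed.
struct ScanResult {
    std::uint64_t value;
    unsigned iterations;
};

// Right-to-left double-and-add. ASSUMPTION: integers stand in for curve
// points, so the result is simply k * p.
ScanResult mul_lsb_first(std::uint32_t k, std::uint64_t p) {
    std::uint64_t acc = 0;      // identity element
    std::uint64_t addend = p;   // holds p * 2^i during iteration i
    unsigned iterations = 0;
    for (; k != 0; k >>= 1) {          // k != 0 is an OR-reduction in hardware
        if (k & 1u) acc += addend;     // "add" when the current bit is 1
        addend += addend;              // "double" for the next bit
        ++iterations;
    }
    return {acc, iterations};
}
```

With k = 17 the loop body runs five times, once per significant bit, and the leading zeros never cost a cycle. The price is that the bits are consumed LSB first, so the algorithm has to be the right-to-left variant.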

edited Jun 26 '14 at 11:16

Simon Richter

answered Jun 22 '14 at 04:40

Spektre

@Martin Thompson what does added space after bullet number mean in markdown? (do not take this the wrong way I just want to know the difference so I use it right in the future) – Spektre Jun 23 '14 at 14:05
It turns it into a "real" numbered list (where the browser formats it as an <ol>), rather than one done "by hand". You can even put the same number on each line and the browser's numbering will carry on (see latest edit :) – Martin Thompson Jun 23 '14 at 16:13

score 0 · Answer 2 · answered Jun 25 '14 at 12:12

A dedicated combinatorial circuit to find the first nonzero bit, shift it to the first position and tell you the shift amount should be fairly light on resources.

In principle, the compiler should be able to find this solution on its own and improve on it:

if none of the top 16 bits are set, set bit 4 of the shift amount, and shift by 16.
if none of the top 8 bits are set, set bit 3 of the shift amount, and shift by 8.
...

The compiler should be able to find further optimizations on this.
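The staged test-and-shift scheme described above can be modelled in software roughly as follows. This is only a sketch: the stages for 4, 2 and 1 bits are my assumption of how the elided pattern continues, and the names `Normalized` and `normalize` are hypothetical. In hardware each `if` would correspond to one multiplexer stage.

```cpp
#include <cstdint>

struct Normalized {
    std::uint32_t value;  // input shifted left so its leading 1 sits in bit 31
    unsigned shift;       // number of leading zeros that were shifted out (0..31)
};

// ASSUMPTION: x != 0, since an all-zero input has no leading 1 to find.
Normalized normalize(std::uint32_t x) {
    unsigned shift = 0;
    if ((x & 0xFFFF0000u) == 0) { shift |= 16; x <<= 16; }  // top 16 bits clear
    if ((x & 0xFF000000u) == 0) { shift |= 8;  x <<= 8;  }  // top 8 bits clear
    if ((x & 0xF0000000u) == 0) { shift |= 4;  x <<= 4;  }  // top 4 bits clear
    if ((x & 0xC0000000u) == 0) { shift |= 2;  x <<= 2;  }  // top 2 bits clear
    if ((x & 0x80000000u) == 0) { shift |= 1;  x <<= 1;  }  // top bit clear
    return {x, shift};
}
```

For k = 17 this yields a shift of 27, so 32 - 27 = 5 iterations remain, and the normalized value can be fed MSB first into the iteration FSM.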

score 0 · Answer 3 · answered Jan 19 '15 at 17:44

Something like:

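always @*
    // binary patterns: a hex digit such as 8 or 4 would also pin the bits below the leading 1
    casex (num)
        32'b1xxx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx: k = 32;
        32'b01xx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx: k = 31;
        32'b001x_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx: k = 30;
        ...
    endcase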

Should give you the value of k.

You can have a shift register which can be parallel loaded so you can write a 1 to the kth bit, so you know when your iterations have ended.
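As a rough software model of this idea, the sketch below finds k with a priority scan, loads a single 1 into bit k-1 of a mask, and shifts the mask right once per iteration until the 1 falls off the end. The assumptions are mine: integers stand in for curve points, and the names `significant_bits`, `MsbScan` and `mul_msb_first` are hypothetical.

```cpp
#include <cstdint>

// Stand-in for the casex priority encoder: number of significant bits
// (1..32), or 0 when num == 0.
unsigned significant_bits(std::uint32_t num) {
    for (unsigned k = 32; k > 0; --k) {
        if (num & (std::uint32_t{1} << (k - 1))) return k;
    }
    return 0;
}

struct MsbScan {
    std::uint64_t value;
    unsigned iterations;
};

// Left-to-right double-and-add. ASSUMPTION: integers stand in for curve points.
MsbScan mul_msb_first(std::uint32_t num, std::uint64_t p) {
    unsigned k = significant_bits(num);
    // "Parallel load": a single 1 in bit k-1 marks the current bit.
    std::uint32_t mask = (k == 0) ? 0 : (std::uint32_t{1} << (k - 1));
    std::uint64_t acc = 0;
    unsigned iterations = 0;
    for (; mask != 0; mask >>= 1) {   // the 1 drops out after k shifts
        acc += acc;                   // double
        if (num & mask) acc += p;     // add when the selected bit is 1
        ++iterations;
    }
    return {acc, iterations};
}
```

The mask does double duty here: it selects the current bit of the number and, once it reaches zero, signals the FSM that the iteration is over.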

score 0 · Answer 4 · answered Sep 22 '16 at 18:02

If you loop from 0 to 31 and discard the 27 leading zeros... you aren't necessarily wasting cycles. It depends on whether you've wrapped this in a synchronous process or an asynchronous one.

One gives you a rather small clocked circuit with a 32-clock latency. The other gives you a giant rat's nest of ANDs and ORs which won't run at a very high frequency.

Depends on what you want. Remember though, that even if you do decide to loop over 32 clocks, you can PIPELINE it such that you start a new calculation every clock. It might take you 32 clocks to get an answer, but you CAN do them at high speed.
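A fixed-latency loop of this kind might look like the sketch below. It assumes integers in place of curve points, where doubling the zero accumulator during the leading zeros is harmless; real curve hardware would need its point-at-infinity case to behave the same way, which is an assumption rather than something the answer states. The name `mul_fixed_latency` is hypothetical.

```cpp
#include <cstdint>

// Left-to-right double-and-add that always takes 32 steps, whatever k is.
// ASSUMPTION: integers stand in for curve points.
std::uint64_t mul_fixed_latency(std::uint32_t k, std::uint64_t p) {
    std::uint64_t acc = 0;                    // identity element
    for (int bit = 31; bit >= 0; --bit) {     // one step per clock in hardware
        acc += acc;                           // double (a no-op while acc is 0)
        if ((k >> bit) & 1u) acc += p;        // add when bit `bit` of k is 1
    }
    return acc;
}
```

Because every input takes the same number of steps, each loop iteration can become one stage of a pipeline, which is what allows a new calculation to start on every clock.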
