
I am building a neural network running on an FPGA, and the last piece of the puzzle is implementing a sigmoid function in hardware. This is either:

1/(1 + e^-x)

or

(atan(x) + 1) / 2

Unfortunately, x here is a float value (a real value in SystemVerilog).

Are there any tips on how to implement either of these functions in SystemVerilog?

This is really confusing to me since both of these functions are complex, and I don't even know where to begin implementing them given the added complexity of working with floating-point values.

rohan32
    Which FPGA are you designing for? Does it have any DSP resources? – Hida Nov 21 '16 at 09:58
    Use a DSP block on your FPGA to calculate the sigmoid function. – noobuntu Nov 21 '16 at 14:57
    What are the latency and throughput requirements? That would heavily influence anything I would design for this. Also, I'm not familiar with any FPGA tool that can synthesize a real (float) into hardware (it might exist, but it's pretty recent if it does). – hops Mar 21 '17 at 16:22
    In many neural network computations, the calculation does not need to be very accurate. Implementing a full floating-point arithmetic unit to evaluate the equations above would be costly and complex. One option is to use fixed-point math or a lookup table over a specific input range. – jclin Dec 18 '19 at 18:53

6 Answers


One simple way to do this is to create a memory/array for the function. However, that option can be highly inefficient.

x should be the input address for the memory and the value at that location can be the output of the function.

Suppose the values of your function are as follows (this is just an example):

x = 0 => f(0) = 1
x = 1 => f(1) = 2
x = 2 => f(2) = 3
x = 3 => f(3) = 4

So you can create an array for this, which stores the output values.

int a[4] = '{1, 2, 3, 4};
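
To make this concrete, here is a minimal SystemVerilog sketch of the same idea for the sigmoid, assuming a 3-bit table index and an 8-bit output where 255 represents roughly 1.0. The module name, widths, and sample points are illustrative assumptions, not something from the question; the table entries are sigmoid samples at x = -4 .. 3, precomputed offline and scaled by 255.

module sigmoid_lut (
    input  logic [2:0] idx,   // table index derived from the (fixed-point) input x
    output logic [7:0] y      // sigmoid value scaled so that 255 ~ 1.0
);
    // round(255 * sigmoid(x)) for x = -4, -3, -2, -1, 0, 1, 2, 3
    localparam logic [7:0] TABLE [0:7] = '{
        8'd5, 8'd12, 8'd30, 8'd69, 8'd128, 8'd186, 8'd225, 8'd243
    };

    assign y = TABLE[idx];
endmodule

A real design would use many more entries and derive the index from a fixed-point version of x.
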
Karan Shah

I just finished this with Vivado HLS, which allows you to write circuits in C. Here is my C code.

#include <math.h>

// Element-wise exponential; renamed so it does not clash with exp() from math.h
void exp_array(float a[10], float b[10])
{
    int i;
    for (i = 0; i < 10; i++)
    {
        b[i] = expf(a[i]);
    }
}

But there is a problem: it is impossible to create an unsized array this way. Maybe there is another way that I don't know of.

ID.W

As you seem to realize, type real is not synthesizable. You need to operate on the integer mantissa and integer exponent separately and combine them when you are done, having tracked the sign. Once you take care of (e^-x), the rest should be straightforward.

Try this page for a quick explanation: https://www.geeksforgeeks.org/floating-point-representation-digital-logic/

and search for "floating point digital design" for more explanations/examples.
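
As an illustration of operating on the fields separately, here is a minimal sketch of unpacking an IEEE-754 single-precision bit pattern into sign, exponent, and mantissa in SystemVerilog. The module name and port choices are assumptions, and denormal/NaN/infinity handling is left out; it assumes the float arrives as a raw 32-bit pattern (e.g. produced in a testbench with $shortrealtobits).

module fp32_unpack (
    input  logic [31:0] fp_in,       // raw IEEE-754 single-precision bits
    output logic        sign,
    output logic [7:0]  exponent,    // biased by 127
    output logic [23:0] mantissa     // hidden leading 1 restored for normal numbers
);
    assign sign     = fp_in[31];
    assign exponent = fp_in[30:23];
    // Restore the implicit leading 1 for normal numbers; denormals keep a leading 0.
    assign mantissa = (fp_in[30:23] == 8'd0) ? {1'b0, fp_in[22:0]}
                                             : {1'b1, fp_in[22:0]};
endmodule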

CapnJJ

Do you really need a floating-point number for this? Is fixed point sufficient?

Considering (atan(x) + 1) / 2, quite likely the only useful values of x are those where the exponent is fairly small (if the exponent is large, atan(x) is essentially pi/2).

atan of a fixed-point number can be calculated in hardware fairly easily; there are CORDIC methods (see https://zipcpu.com/dsp/2017/08/30/cordic.html) and direct methods; see for example https://dspguru.com/dsp/tricks/fixed-point-atan2-with-self-normalization/
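
As a rough illustration of the vectoring-mode CORDIC idea, here is a minimal combinational sketch that computes atan(y/x) for x > 0 in Q2.14 fixed point. The module name, Q format, bit widths, iteration count, and table values are assumptions rather than anything from the linked articles; a real design would typically be pipelined and handle the full input range.

module cordic_atan (
    input  logic signed [15:0] x_in,   // Q2.14, assumed > 0
    input  logic signed [15:0] y_in,   // Q2.14
    output logic signed [15:0] angle   // atan(y_in/x_in) in Q2.14 radians
);
    localparam int N = 12;
    // atan(2^-i) for i = 0..11, in Q2.14
    localparam logic signed [15:0] ATAN_TAB [0:11] = '{
        16'sd12868, 16'sd7596, 16'sd4014, 16'sd2037,
        16'sd1023,  16'sd512,  16'sd256,  16'sd128,
        16'sd64,    16'sd32,   16'sd16,   16'sd8
    };

    always_comb begin
        logic signed [17:0] x, y, x_next, z;
        x = x_in;
        y = y_in;
        z = '0;
        // Vectoring mode: rotate the vector (x, y) toward y = 0 and
        // accumulate the applied rotation angles in z.
        for (int i = 0; i < N; i++) begin
            if (y >= 0) begin
                x_next = x + (y >>> i);
                y      = y - (x >>> i);
                z      = z + ATAN_TAB[i];
            end else begin
                x_next = x - (y >>> i);
                y      = y + (x >>> i);
                z      = z - ATAN_TAB[i];
            end
            x = x_next;
        end
        angle = z[15:0];   // |z| < 2.0, so it fits back into Q2.14
    end
endmodule

To evaluate atan(v), you would drive x_in with 1.0 (16'sd16384) and y_in with v, which restricts v to roughly +/- 2 in this Q2.14 format.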

Mac

FPGA design flows that target hardware generally do not support synthesizing floating-point numbers into the FPGA fabric. Fixed point with limited precision is more commonly used.

A limited-precision fixed-point approach:
Use Matlab to create an array of samples of your math function such that the largest value is +/- 0.99999. For 8-bit precision (actually 7 bits plus a sign bit), multiply those numbers by 128 and round to the nearest integer. Write those numbers to a text file in 2's-complement hex format. In SystemVerilog you can implement a ROM using that text file: use $readmemh() to read the numbers into a memory-style variable (one that has both a packed and an unpacked dimension). Link to a tutorial:
https://projectf.io/posts/initialize-memory-in-verilog/
Now you have a ROM with limited-precision samples of your function.

Section 21.4, "Loading memory array data from a file," in the SystemVerilog specification provides the definition of $readmemh(). Here is that doc:
https://ieeexplore.ieee.org/document/8299595
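
For reference, a minimal sketch of such a ROM is below, assuming 256 signed 8-bit samples stored as 2's-complement hex in a file called sigmoid_rom.hex (a made-up name) and a registered read port; the widths and depth are placeholders.

module sample_rom (
    input  logic              clk,
    input  logic        [7:0] addr,
    output logic signed [7:0] data
);
    // Memory-style variable: packed dimension [7:0], unpacked dimension [0:255].
    logic signed [7:0] mem [0:255];

    initial $readmemh("sigmoid_rom.hex", mem);   // hypothetical file of hex samples

    always_ff @(posedge clk)
        data <= mem[addr];
endmodule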

If you need floating point, one possibility is to use a processor soft core with a floating-point unit implemented in the FPGA fabric, and run software on that core. The core interfaces to the rest of the FPGA fabric over a bus such as AXI4-Stream. See:
https://www.xilinx.com/products/design-tools/microblaze.html to get started.
It is a very different workflow from ordinary FPGA design and uses different tools. A C or C++ compiler with math libraries (tan, exp, div, etc.) would be used along with the processor core.

Another possibility is an FPGA with a hard-core processor. Xilinx Zynq is one of them. This is a complex and powerful approach. A free book provides knowledge on how to use the Zynq:
http://www.zynqbook.com/
This workflow is even more complex than the soft-core approach because the Zynq is a more complex platform (a hard processor and FPGA integrated on one chip).

Mikef

It's pretty hard to implement non-linear functions like that in hardware, and on top of that floating-point arithmetic is even more costly. It's definitely better (and recommended) to work with fixed-point arithmetic, as mentioned in the answers before. The number of precision bits in fixed-point arithmetic will depend on your required accuracy and error tolerance.

For hardware implementations, any kind of non-linear function can be approximated as a piecewise-linear function and implemented with a ROM-based approach as described in previous answers. The number of sample points that you take from the non-linear function determines your accuracy: the more samples you store, the better the approximation you get. Often in hardware, the number of samples you can store is restricted by the amount of fast/local memory available to you. In that case, to save memory, you can add a little extra compute logic and perform linear interpolation between stored samples to calculate the needed values, as in the sketch below.
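
Here is a minimal SystemVerilog sketch of that interpolation idea, assuming an 8-bit unsigned input whose upper 4 bits pick one of 16 segments and whose lower 4 bits give the position within the segment, with 8-bit samples loaded from a hypothetical file. All names, widths, and the sample count are illustrative assumptions; it also assumes a monotonically increasing function (like the sigmoid) so the per-segment difference is never negative.

module pwl_lut (
    input  logic [7:0] x,   // upper 4 bits: segment index, lower 4 bits: fraction
    output logic [7:0] y
);
    // 17 samples so that segment 15 still has an upper endpoint.
    logic [7:0]  samples [0:16];
    logic [3:0]  seg, frac;
    logic [7:0]  lo, hi;
    logic [11:0] delta_scaled;

    initial $readmemh("pwl_samples.hex", samples);   // hypothetical sample file

    always_comb begin
        seg          = x[7:4];
        frac         = x[3:0];
        lo           = samples[seg];
        hi           = samples[seg + 1];
        delta_scaled = (hi - lo) * frac;          // at most 255 * 15, fits in 12 bits
        y            = lo + delta_scaled[11:4];   // lo + (hi - lo) * frac / 16
    end
endmodule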