I asked this a while ago on the Xilinx forum but got no help, so I am asking here.

I am working with the Xilinx FFT IP core and trying to match its output against MATLAB. I created an RTL kernel (XO) in Vivado that contains only the FFT IP core; its configuration is shown below. (I also tried other configurations, including the pipelined streaming architecture, and had the same problem.)

I am using Vitis to connect this RTL kernel to a C++ kernel that reads data from memory and streams it to the IP core.

The input data is fixed-point, scaled between -1 and 1. The samples are a sine wave.
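
For reference, a minimal sketch of how samples in [-1, 1) can be converted to and from a 32-bit fixed-point word. It assumes the fix32_31 format mentioned in the comments (1 sign bit, 31 fractional bits); the helper names `to_q31` and `from_q31` are hypothetical and not part of any Xilinx API.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Convert a value in [-1, 1) to a signed 32-bit word with 31 fractional bits.
// Values outside the representable range are saturated (so +1.0 becomes the
// largest positive code rather than wrapping around to -1.0).
std::int32_t to_q31(double x) {
    const double scaled = std::round(x * 2147483648.0);  // x * 2^31
    if (scaled >= 2147483647.0) return std::numeric_limits<std::int32_t>::max();
    if (scaled <= -2147483648.0) return std::numeric_limits<std::int32_t>::min();
    return static_cast<std::int32_t>(scaled);
}

// Convert a fix32_31 word back to a double for comparison with MATLAB.
double from_q31(std::int32_t v) {
    return static_cast<double>(v) / 2147483648.0;  // v / 2^31
}
```

Converting the core's raw output words back with `from_q31` before taking `abs` keeps both plots in the same units.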

The config packet is 24 bits; it sets an 8-point transform (N = 8) and FWD_INV = 1 (config packet: 000000000000000100000011). I am still not sure about the SCALE_SCH parameter, and I would be thankful if someone could make it clearer to me.

My problem is that I can't get the correct output. In the emulator I checked that the input data is streamed correctly and that the config packet is sent correctly as well, but the output is totally different from the MATLAB fft result for the same sine wave.

Can you help me figure out what I am doing wrong, and could anyone explain how SCALE_SCH works in my case?
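
A minimal sketch of how such a config word could be packed, for illustration only. The positions of NFFT (bits 4:0, padded to 8 bits) and FWD_INV (bit 8) match the packet shown above. The SCALE_SCH part is an assumption to check against the FFT product guide: it is taken to start at bit 9 and to hold one 2-bit right-shift amount (0 to 3) per stage or stage pair, first stage in the lowest bits, and it is only present when the core is configured for scaled arithmetic. `nfft_for_length` and `pack_config` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Smallest exponent m with 2^m >= num_samples, clamped to the core's
// minimum of 8 points (m = 3). The transform length must be a power of two.
unsigned nfft_for_length(std::uint32_t num_samples) {
    assert(num_samples <= 65536u);  // largest point size assumed for the core
    unsigned m = 3;
    while ((1u << m) < num_samples) ++m;
    return m;
}

// Pack NFFT, FWD_INV and an (assumed) scaling schedule into one config word.
// shifts[i] is the right shift (0..3 bits) applied at stage or stage pair i.
std::uint32_t pack_config(unsigned nfft, bool forward,
                          const std::vector<unsigned>& shifts) {
    assert(nfft >= 3 && nfft <= 16);
    std::uint32_t word = nfft & 0x1Fu;   // NFFT = log2(transform length)
    word |= (forward ? 1u : 0u) << 8;    // FWD_INV: 1 = forward, 0 = inverse
    unsigned bit = 9;                    // assumed start of SCALE_SCH
    for (unsigned s : shifts) {
        assert(s <= 3 && bit + 2 <= 32);
        word |= static_cast<std::uint32_t>(s) << bit;
        bit += 2;
    }
    return word;
}
```

With an empty schedule, `pack_config(3, true, {})` gives 0x000103, the same bit pattern as the packet above. If the schedule shifts right by a total of k bits, the scaled output is the unscaled FFT divided by 2^k, so the MATLAB result has to be divided by the same factor before comparing.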

  • C++ Kernel:
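#include <ap_int.h>        // ap_int, ap_uint
#include <ap_axi_sdata.h>  // ap_axis
#include <hls_stream.h>    // hls::stream

// IN_DWIDTH, OUT_DWIDTH and SAMPLE_NUM are defined elsewhere in the project.
typedef ap_int<64>  data64_t;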
typedef ap_int<128> data128_t;
typedef ap_int<32>  data32_t;

typedef ap_axis<IN_DWIDTH , 0, 0, 0> axis_in;
typedef ap_axis<OUT_DWIDTH , 0, 0, 0> axis_out;

void hls_dma(int                    num_words,
             data64_t              *din,
             data64_t              *dout,
             hls::stream<axis_in>   &strm_in,
             hls::stream<axis_out>  &strm_out)
{
#pragma HLS DATAFLOW

    data64_t temp_in;
    axis_in val_0, val_1;
    data32_t tmp_val;

    rd_loop: for (int i = 0; i < 2 * num_words; i++)
    {
        temp_in = din[i];
        val_0.data = 0x0;
        val_1.data = 0x0;

        tmp_val = temp_in.range(31, 0);
        val_0.data |= tmp_val;
        strm_in.write(val_0);

        tmp_val = temp_in.range(63, 32);
        val_1.data |= tmp_val;
        if(!(val_1.data == 0)) val_1.data = 0;
        strm_in.write(val_1);
    }

    axis_out v0, v1;
    data64_t temp_out;

    wr_loop:
    for (int i = 0; i < 2 * num_words; i++)
    {
        v0 = strm_out.read();
        v1 = strm_out.read();

        temp_out.range(31, 0) = v0.data;
        temp_out.range(63, 32) = v1.data;

        *dout++ = temp_out;
    }
}

extern "C" {
    void fft_infc(int                    fft_select,
                  int                    num_fft,
                  int                    fft_dir,
                  data64_t              *din,
                  data64_t              *dout,
                  volatile ap_uint<24>  *config,
                  hls::stream<axis_in>  &strm_in,
                  hls::stream<axis_out> &strm_out)
    {
        #pragma HLS INTERFACE axis port=config

        // "axis" is the AXI4-Stream interface mode; direction is inferred from usage.
        #pragma HLS INTERFACE axis port=strm_in
        #pragma HLS INTERFACE axis port=strm_out

        #pragma HLS INTERFACE m_axi port=din offset=slave bundle=gmem1 
        #pragma HLS INTERFACE m_axi port=dout offset=slave bundle=gmem2 

        ap_uint<24> tmp_config = 0;
        tmp_config[8] = (fft_dir == 1) ? 1 : 0;
        tmp_config.range(4, 0) = fft_select;
        *config = tmp_config;

        hls_dma(SAMPLE_NUM, din, dout, strm_in, strm_out);
    }
}

Xilinx Forum : https://support.xilinx.com/s/question/0D54U00006ap3gpSAA/fft-ip-core-the-results-of-matlab-and-fft-ip-core-didnt-match-?language=en_US

Thank you in advance.

[Images: emulator; FFT configuration 1; FFT configuration 3; FFT configuration 2]

  • The input data:

[Image: plot of the input data]

  • The Matlab output:

[Image: MATLAB fft, abs(output)]

  • My fft ip core implementation output:

[Image: FFT IP core, abs(output)]

Note: I am plotting the absolute value of the outputs.

Benjamin Buch
A.A.
  • "the output I get is totally different" It would be helpful if you could show *how* they are different. Maybe it is shifted, or scaled? Some FFTs implementations normalize differently. It's always a good idea to plot your results and compare them visually. – Cris Luengo Mar 28 '23 at 14:36
  • @CrisLuengo, sorry for missing details, I just added the plot of the input and outputs data. – A.A. Mar 28 '23 at 15:05
  • So you get twice as many values from the FFT IP Core. Maybe it is interleaving the real and imaginary parts? My guess the issue is in how you interpret the result. – Cris Luengo Mar 28 '23 at 15:11
  • Well, I interpret the result as described in the FFT IP core documentation: in my case the output is fix32_31, so the first 32 bits are the real part, the next 32 bits are the imaginary part, and so on. You can see this in the C++ kernel that manages the data streams. (I can also see the stream coming out of the core in the emulator; I compared some output samples, and the data leaving the core looks the same as the final data I have in memory.) – A.A. Mar 28 '23 at 15:23
  • I don't know anything about Xilinx, so I'll assume you do that part right. In the configuration panel, it says "Transform Length", and it's set to 16384. Your data has <400 points. The transform size should match the data size. I don't know what happens in this implementation, maybe it's reading uninitialized data after the <400 points you pass in? I presume you didn't specify the transform size in MATLAB, but if you did, then MATLAB pads with zeros if the data is shorter, or crops if the data is longer than the given transform size. – Cris Luengo Mar 28 '23 at 15:30
  • As I understood it, when `Run time Configurable Transform Length` is enabled, the transform length is set at run time by the NFFT value in the config packet I send, and in my case I specified 8 points. – A.A. Mar 28 '23 at 15:39
  • And yes, I didn't specify the transform length in MATLAB; that's my mistake, as I am new to MATLAB. I will look into specifying it there, and I will also try setting the length in the core configuration without `Run time Configurable Transform Length` to see if there is any difference. – A.A. Mar 28 '23 at 15:43
  • You configured an 8-point transform? So each 8 output values are the FFT of the corresponding 8 input value. This is not comparable to what MATLAB does. MATLAB computed a 380 (or whatever the exact number of data points is) point transform. You need to match the transform length to the number of data points in your sample. – Cris Luengo Mar 28 '23 at 16:14
  • Note that, if in MATLAB you specify N=8, you will get only 8 output values, corresponding to the FFT of the first 8 input values. It will ignore the rest of the input. You'd have to loop to get a result similar to what you get in Xilinx. – Cris Luengo Mar 28 '23 at 16:16

1 Answer


The mistake was the transform length. In the Xilinx FFT IP core I was applying an 8-point FFT, so it makes sense that the results did not match MATLAB, where the transform was computed over all the samples at once. After specifying the same transform length in both MATLAB and the Xilinx FFT IP core, I got the same results.
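
To illustrate why the two outputs could not match, here is a small self-contained sketch that compares one transform over the whole signal with a sequence of short transforms over consecutive blocks. It uses a naive O(N^2) DFT in plain C++ rather than the IP core or MATLAB, and the names `dft`, `blockwise_dft` and `sine_wave` are hypothetical helpers introduced only for this example.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Naive discrete Fourier transform of the whole input (length = x.size()).
std::vector<cplx> dft(const std::vector<cplx>& x) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<cplx> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        for (std::size_t t = 0; t < n; ++t) {
            const double angle = -2.0 * pi * static_cast<double>(k * t) /
                                 static_cast<double>(n);
            out[k] += x[t] * std::polar(1.0, angle);
        }
    }
    return out;
}

// Transform each consecutive block of `block` samples on its own and
// concatenate the results, as a core configured for a short length would.
std::vector<cplx> blockwise_dft(const std::vector<cplx>& x, std::size_t block) {
    std::vector<cplx> out;
    if (block == 0) return out;
    for (std::size_t start = 0; start + block <= x.size(); start += block) {
        const auto first = x.begin() + static_cast<std::ptrdiff_t>(start);
        const std::vector<cplx> chunk(first,
                                      first + static_cast<std::ptrdiff_t>(block));
        const std::vector<cplx> spec = dft(chunk);
        out.insert(out.end(), spec.begin(), spec.end());
    }
    return out;
}

// Real sine wave with `cycles` full periods over n samples.
std::vector<cplx> sine_wave(std::size_t n, double cycles) {
    const double pi = std::acos(-1.0);
    std::vector<cplx> x(n);
    for (std::size_t i = 0; i < n; ++i) {
        x[i] = std::sin(2.0 * pi * cycles * static_cast<double>(i) /
                        static_cast<double>(n));
    }
    return x;
}
```

For a 16-sample sine with two cycles, `dft` puts the peak at bin 2 with magnitude 8, while `blockwise_dft` with a block of 8 produces two separate 8-point spectra, each with its peak at bin 1 and magnitude 4. Both outputs have 16 values, but they are not the same spectrum, which is the mismatch described above.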

A.A.