2021-02-02

FPGA tutorial Step 5: Clock speed, timing and slack

Creating a new test project

The following shows a custom module connected to the 10 MHz output of a clock module that can also provide 20 MHz, 40 MHz and 100 MHz:

image-20210209203108466

VHDL code:


library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use ieee.numeric_std.all;

entity Blinker is
    Port ( clk : in STD_LOGIC;
           sw : in STD_LOGIC_VECTOR (15 downto 0);
           led : out STD_LOGIC_VECTOR (15 downto 0));
end Blinker;

architecture Behavioral of Blinker is

begin

    process(clk) is
    
    variable X : integer range 0 to 1023;
    
    begin
        if rising_edge(clk) then
            
            -- Count
            if X < 2 then
                X := 2;
            end if;
            
            X := X + 1;
            
            if X < 512 then
                led(2) <= '1';
            else 
                led(2) <= '0';
            end if;
        end if;
        
    end process;

end Behavioral;

Viewing the schematics

This setup contains some simple logic: a small calculation and a check whether X is below 512. After running synthesis and then clicking "Schematic" on the left-hand side, you get a complete overview of how Vivado chose to wire this code and block design:

image-20210209203624312

Zoom in and expand the design_1 box:

image-20210209203659376

We keep zooming in and expand the Blinker logic, the part we have built:

image-20210209203737345

Here it becomes visible how simple or complex your VHDL code ends up once synthesised. Even though flip-flops and logic elements are fast, they do not switch in zero time. All the small delays along a path add up to a total delay, and whatever is left of the clock period after that delay is the "slack". The more depth the design has, the longer it takes from the rising clock edge until the very last element on the far right-hand side has been updated.

Checking delay (report timing summary)

Now try clicking "Report Timing Summary" on the left hand side and just hit OK in the next window:

image-20210209203943721

image-20210209204045368

As long as the number (highlighted in yellow here) is positive, everything is fine. The positive number 97.432 ns indicates that around 97.4 nanoseconds remain before issues would occur. This is an eternity in the FPGA world, but this demo is only running at 10 MHz. The faster the clock, the less time is available in each clock cycle.

So how much time do we have available in total per clock cycle in this example? That is simple to calculate. Here we run at 10 MHz, meaning 10,000,000 clock cycles per second, which gives 1,000,000,000 (1 second in nanoseconds) / 10,000,000 (10 MHz) = 100 ns. That is 100 nanoseconds for the entire logic to settle. With 97.432 ns of slack, only around 2.5 ns (100 ns - 97.432 ns = 2.568 ns) has been used so far.
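Written as a formula, the relation used here looks as follows. The symbols $T$ (clock period), $f$ (clock frequency) and $t_{\text{used}}$ (time consumed by the logic) are notation introduced for this sketch, and it treats the slack simply as the unused part of the period, ignoring finer effects such as clock uncertainty:

```latex
\begin{align*}
T &= \frac{1}{f} = \frac{1}{10\,\mathrm{MHz}} = 100\,\mathrm{ns} \\
t_{\text{used}} &\approx T - \text{slack} = 100\,\mathrm{ns} - 97.432\,\mathrm{ns} = 2.568\,\mathrm{ns}
\end{align*}
```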

Changing clock cycle

Now, if the logic uses only about 2.5 ns (100 ns - 97.432 ns), we should be able to switch to a 100 MHz clock, giving us a total of 1,000,000,000 / 100,000,000 = 10 ns to work with on each clock cycle. This should leave 10 ns - 2.5 ns = 7.5 ns of slack. Updated block design:

image-20210209205506467
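If you want to double-check this kind of prediction before re-running synthesis, the arithmetic can be written down as a small simulation-only VHDL sketch. The entity name SlackCheck, the helper period_ns and the constant names are hypothetical and not part of the project. The only inputs are the 97.432 ns of slack reported at 10 MHz and the assumption that the logic delay stays the same when the clock changes:

```vhdl
-- Simulation-only sketch (not meant for synthesis): checks the slack prediction.
-- SlackCheck, period_ns and the constant names are hypothetical.
entity SlackCheck is
end SlackCheck;

architecture Sim of SlackCheck is

    -- Clock period in nanoseconds for a frequency given in MHz
    function period_ns(f_mhz : real) return real is
    begin
        return 1000.0 / f_mhz;
    end function;

    -- Slack reported in the timing summary at 10 MHz
    constant SLACK_10MHZ_NS : real := 97.432;

    -- Time used by the logic: period minus slack
    constant USED_NS : real := period_ns(10.0) - SLACK_10MHZ_NS;

    -- Predicted slack once the clock runs at 100 MHz
    constant PREDICTED_NS : real := period_ns(100.0) - USED_NS;

begin

    process is
    begin
        report "Used at 10 MHz: " & real'image(USED_NS) & " ns";
        report "Predicted slack at 100 MHz: " & real'image(PREDICTED_NS) & " ns";

        -- The prediction should be close to the 7.5 ns estimated in the text
        assert abs(PREDICTED_NS - 7.5) < 0.1
            report "Prediction is not close to 7.5 ns" severity failure;

        wait;  -- stop the process after one pass
    end process;

end Sim;
```

Running this in a VHDL simulator prints the two values and stops with a failure only if the prediction drifts away from 7.5 ns. It is a sanity check on the arithmetic, not a replacement for the timing report.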

Let's change the wiring to the clocking wizard, re-synthesise and review the timing. Please note that you have to click "Reload", as the views will otherwise still show the old values:

image-20210209205014412

After refreshing, let's see whether the prediction of around 7.5 ns holds:

image-20210209205613496

So far so good. We are now able to predict how the slack changes when the clock speed changes.

Exceeding the available time and how to deal with it

Let's add some more logic to the VHDL file, raising the complexity of the synthesised design until it exceeds the time available. Division typically requires a lot of logic to implement, so I will add a simple division and afterwards check the schematic and the delay.

New VHDL:


library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use ieee.numeric_std.all;

entity Blinker is
    Port ( clk : in STD_LOGIC;
           sw : in STD_LOGIC_VECTOR (15 downto 0);
           led : out STD_LOGIC_VECTOR (15 downto 0));
end Blinker;

architecture Behavioral of Blinker is

begin

    process(clk) is
    
    variable X : integer range 0 to 1023;
    
    begin
        if rising_edge(clk) then
            
            -- Count
            if X < 2 then
                X := 2;
            end if;
            
            X := X + 1;
            
            if X < 512 then
                led(2) <= '1';
            else 
                led(2) <= '0';
            end if;
            
            -- Integer division, added on purpose to create a lot of extra logic
            if X > 800 then
                X := 300 / X;
            end if;
        end if;
        
    end process;

end Behavioral;

Let's synthesise and see the timing result.

image-20210209210125673

This time the logic needs 9.765 ns more than is available, so the slack is negative. With a 10 ns period, this means that a total of 10 ns + 9.765 ns = 19.765 ns has been used. We can confirm that the design has become a lot more complex by looking at the schematic again:

image-20210209210825730

Decrease the clock speed

As we were running a 100 MHz clock, reducing the clock speed from 100 MHz to 40 MHz changes the time available from 10 ns to 1,000,000,000 / 40,000,000 = 25 ns. This should leave us with 25 ns - 19.765 ns = 5.235 ns of unused time.
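The same numbers also give a rough estimate of the highest clock frequency this logic could run at. The symbols $T$, $t_{\text{used}}$ and $f_{\max}$ are notation introduced here, and the estimate assumes the logic delay does not change with the clock setting, so treat it as a sketch and not as what the tool will report:

```latex
\begin{align*}
t_{\text{used}} &\approx T - \text{slack} = 10\,\mathrm{ns} - (-9.765\,\mathrm{ns}) = 19.765\,\mathrm{ns} \\
f_{\max} &\approx \frac{1}{t_{\text{used}}} = \frac{1}{19.765\,\mathrm{ns}} \approx 50.6\,\mathrm{MHz} \\
\text{slack at } 40\,\mathrm{MHz} &\approx 25\,\mathrm{ns} - 19.765\,\mathrm{ns} = 5.235\,\mathrm{ns}
\end{align*}
```

Under this estimate, 40 MHz sits below the limit while 100 MHz does not, which is consistent with the negative slack seen at 100 MHz.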

Let's give it a try. The block design will look like this:

image-20210209210513322

And now we can confirm that the positive slack (meaning time available) is almost exactly the anticipated value:

image-20210209210947637