FPGA tutorial Step 5: Clock speed, timing and slack
Creating a new test project
The following shows a custom module connected to a 10 MHz clock on a clocking module that is also ready to provide 20 MHz, 40 MHz and 100 MHz:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use ieee.numeric_std.all;

entity Blinker is
    Port ( clk : in STD_LOGIC;
           sw  : in STD_LOGIC_VECTOR (15 downto 0);
           led : out STD_LOGIC_VECTOR (15 downto 0));
end Blinker;

architecture Behavioral of Blinker is
begin
    process(clk) is
        variable X : integer range 0 to 1023;
    begin
        if rising_edge(clk) then
            -- Count
            if X < 2 then
                X := 2;
            end if;
            X := X + 1;
            if X < 512 then
                led(2) <= '1';
            else
                led(2) <= '0';
            end if;
        end if;
    end process;
end Behavioral;
Viewing the schematics
This setup contains some simple logic: a small calculation and a check of whether X is below 512. After running synthesis and then clicking "Schematic" on the left-hand side, you get a complete overview of how Vivado chose to wire this code and block design:
Zoom in and expand the design_1 box:
Keep zooming in and expand the Blinker logic, which is the part we have built:
Here it becomes visible how simple or complex your VHDL code ends up when synthesised. Even though flip-flops and logic elements are fast, they do not switch in zero time. All the little delays along a path add up to a total delay, and the "slack" is whatever is left of the clock period once that delay has been subtracted. The more depth the logic has, the longer it takes from the moment the clock rises until the very last element, at the far right-hand side, has been updated.
Checking delay (report timing summary)
Now try clicking "Report Timing Summary" on the left hand side and just hit OK in the next window:
As long as the number (highlighted in yellow here) is positive, everything is fine. The positive number 97.432 ns indicates that around 97.4 nanoseconds are left over before issues will occur. This is an eternity in the FPGA world, but this demo is only running at 10 MHz. The faster the clock, the less time is available during each clock cycle.
So how much time do we have available in total per clock cycle in this example? That's simple to calculate. Here we run at 10 MHz, meaning we perform 10,000,000 clock cycles per second, giving us 1,000,000,000 (1 second in nanoseconds) / 10,000,000 (10 MHz) = 100 ns. That is 100 nanoseconds for the entire logic to be executed. With 97.432 ns left over, only 100 ns - 97.432 ns = 2.568 ns, or roughly 2.5 ns, has been used so far.
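The same arithmetic can be written as a small Python sketch. The function names `period_ns` and `used_ns` are made up for this illustration and are not part of Vivado. The sketch assumes the reported slack is simply the clock period minus the time used by the slowest path.

```python
NS_PER_SECOND = 1_000_000_000


def period_ns(freq_hz: float) -> float:
    """Length of one clock cycle in nanoseconds."""
    return NS_PER_SECOND / freq_hz


def used_ns(freq_hz: float, slack_ns: float) -> float:
    """Time used by the slowest path: clock period minus reported slack."""
    return period_ns(freq_hz) - slack_ns


# Values from the timing summary above: 10 MHz clock, 97.432 ns slack.
print(period_ns(10_000_000))                   # 100.0
print(round(used_ns(10_000_000, 97.432), 3))   # 2.568
```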
Changing clock cycle
Now, if we have used only about 2.5 ns, we should be able to switch to a 100 MHz clock, giving us a total of 1,000,000,000 / 100,000,000 = 10 ns to work with on each clock cycle. This should give us a remainder of 10 ns - 2.5 ns = 7.5 ns. Updated block design:
Let's change the wiring to the clocking wizard, re-synthesise and review the timing. Please note that you have to click "Reload", as the views will otherwise still show the old values:
After refreshing, let's see if the prediction of around 7.5 ns holds:
So far so good. We are now able to predict how the slack will be affected when changing the clock speed.
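The prediction can be sketched in Python as well. The helper `predicted_slack_ns` is a hypothetical name for this illustration. It assumes the time used by the logic stays the same when only the clock changes, which is an approximation, since the tools re-run synthesis and the real report can differ slightly.

```python
def predicted_slack_ns(used_ns: float, freq_hz: float) -> float:
    """Predicted slack at a new clock, assuming the used time is unchanged."""
    period_ns = 1_000_000_000 / freq_hz
    return period_ns - used_ns


# Time used at 10 MHz: 100 ns period minus the reported 97.432 ns slack.
used = 100.0 - 97.432

# Prediction for the 100 MHz clock (10 ns period).
print(round(predicted_slack_ns(used, 100_000_000), 3))  # 7.432
```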
Exceeding the available time and how to deal with it
Let's add some more logic to the VHDL file, so that the synthesised design becomes more complex and exceeds the time available. Division typically requires a lot of logic to implement, so I will add a simple division and then check the schematic and the delay.
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use ieee.numeric_std.all;

entity Blinker is
    Port ( clk : in STD_LOGIC;
           sw  : in STD_LOGIC_VECTOR (15 downto 0);
           led : out STD_LOGIC_VECTOR (15 downto 0));
end Blinker;

architecture Behavioral of Blinker is
begin
    process(clk) is
        variable X : integer range 0 to 1023;
    begin
        if rising_edge(clk) then
            -- Count
            if X < 2 then
                X := 2;
            end if;
            X := X + 1;
            if X < 512 then
                led(2) <= '1';
            else
                led(2) <= '0';
            end if;
            if X > 800 then
                X := 300 / X;
            end if;
        end if;
    end process;
end Behavioral;
Let's synthesise and see the timing result.
This time the logic needs 9.765 ns more than is available (a negative slack), which means that a total of 10 ns + 9.765 ns = 19.765 ns has been used. We can confirm that the design has become a lot more complex by looking at the schematic again:
Decrease the clock speed
We were running a 100 MHz clock. Reducing the clock speed from 100 MHz to 40 MHz changes the time available from 10 ns to 1,000,000,000 / 40,000,000 = 25 ns. This should leave us with 25 ns - 19.765 ns = 5.235 ns of unused time.
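As a rough Python sketch, the same reasoning can be used to check which of the clocks from the block design would fit the new logic. The names are made up for this illustration, and the sketch assumes the 19.765 ns of used time does not change when the clock is swapped, which will only be approximately true after re-synthesis.

```python
NS_PER_SECOND = 1_000_000_000


def path_delay_ns(freq_hz: float, slack_ns: float) -> float:
    """Time used by the slowest path, derived from period and reported slack."""
    return NS_PER_SECOND / freq_hz - slack_ns


def slack_at_ns(freq_hz: float, delay_ns: float) -> float:
    """Predicted slack at a given clock, assuming the delay stays the same."""
    return NS_PER_SECOND / freq_hz - delay_ns


# Reported at 100 MHz: 9.765 ns too slow, i.e. a slack of -9.765 ns.
delay = path_delay_ns(100_000_000, -9.765)  # about 19.765 ns

# Clocks available in the block design, fastest first.
for mhz in (100, 40, 20, 10):
    slack = slack_at_ns(mhz * 1_000_000, delay)
    verdict = "ok" if slack >= 0 else "too fast"
    print(f"{mhz} MHz: predicted slack {slack:.3f} ns ({verdict})")

# Highest clock this delay would allow in theory, in MHz (about 50.6).
max_mhz = 1000 / delay
print(f"theoretical maximum: {max_mhz:.1f} MHz")
```

Under these assumptions 40 MHz is the fastest of the available clocks that leaves a positive slack, which matches the choice made here.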
Let's give it a try. The block design will look like this:
And now we can confirm that the positive slack (meaning time still available) is almost exactly the anticipated value: