In the last article I described how to simulate cellular automata on an FPGA. The result wasn't spectacular: only some LEDs flashing. Soon after that I wanted to see more cells, so I decided to generate VESA output.
VESA signals were designed to be used with CRTs. In this scary technology, which was widely used when I was a kid, an electron beam is magnetically deflected to scan the surface of a phosphorescent screen.
The beam travels from left to right to draw a line (by exciting the phosphorescent layer of the screen). Then it goes back to the left side of the screen and a bit down. Then it draws the next line, and so on. As a result it goes from the top to the bottom. When it reaches the bottom of the screen it goes back to the top.
There are five signals used: R, G, B, HSync and VSync.
The R, G, B signals are analog. The HSync and VSync signals are digital.
The HSync signal was used to mark the time when the electron beam goes back to the beginning of a scanline. Similarly, VSync was used to mark the time when the beam goes back to the top of the screen. This can be visualized by the following timing diagram:
Timings vary from one VESA mode to another. I decided to use the 1280x1024@60 mode, because that's the resolution of my old LCD screen.
I use a 50 MHz external clock input and generate the VESA signals:
    module top (
        input wire clk50,
        output wire vga_h_sync,
        output wire vga_v_sync,
        output reg vga_R,
        output reg vga_G,
        output reg vga_B
    );
I said before that R, G, B signals are analog... I drive them with digital output, so I can use only 8 colors.
The specification of the 1280x1024@60 mode says that the pixel frequency is 108.0 MHz. I used the clock synthesis capabilities of my FPGA to get as close as possible to this frequency. Multiplying the 50 MHz input by 13 gives 650 MHz, which is then divided by 6 to get 108.333... MHz.
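The arithmetic behind the synthesized clock can be checked with a few lines of Python. This is only a sanity check of the numbers quoted above, not a description of how the clock module is configured; the function name `synthesized_mhz` is made up for illustration.

```python
from fractions import Fraction

def synthesized_mhz(input_mhz, multiply, divide):
    """Output frequency of a multiply-then-divide clock synthesizer, in MHz."""
    return Fraction(input_mhz) * multiply / divide

target = Fraction(108)                 # pixel clock required by 1280x1024@60
actual = synthesized_mhz(50, 13, 6)    # 50 MHz * 13 / 6

# Relative deviation from the nominal pixel clock, in percent.
error_percent = float((actual - target) / target * 100)

print(float(actual))            # roughly 108.333
print(round(error_percent, 2))  # a fraction of a percent
```

The deviation is about a third of a percent, which in this project was close enough for the monitor to accept the signal.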
I also generate a clock signal of the same frequency, but phase shifted by 180 degrees (clkps below). This will be useful for driving memory.
    wire clk;
    wire clkps;
    clkdiv clkdiv (
        .CLK_IN1(clk50),
        .CLK_OUT1(clk),
        .CLK_OUT2(clkps)
    );
clkdiv above is an instance of a standard Spartan module that can be used for clock synthesis.
I use a separate module to generate the vga_h_sync and vga_v_sync outputs. It also generates the CounterX and CounterY data lines that tell which pixel is being displayed. inDisplayArea and inPrefetchArea are helper signals and I will tell more about them later.
    wire inDisplayArea;
    wire inPrefetchArea;
    wire [10:0] CounterX;
    wire [10:0] CounterY;
    sync_gen_1024x1080 syncgen (
        .clk(clk),
        .vga_h_sync(vga_h_sync),
        .vga_v_sync(vga_v_sync),
        .inDisplayArea(inDisplayArea),
        .inPrefetchArea(inPrefetchArea),
        .prefetchCounterX(CounterX),
        .counterY(CounterY)
    );
I use static memory built into the FPGA chip to store pixel data for the current line and the next line of the image.
    my_ram image_ram (
        .clka(clkps), // input clka - has to be shifted from clock that generates data and address
        .ena(filler_read || gen_read), // input ena
        .addra(filler_read ? filler_addr : gen_raddr), // input [9 : 0] addra
        .douta(rdata), // output [15 : 0] douta
        .enb(gen_write || rst_write),
        .addrb(gen_write ? gen_waddr : rst_waddr),
        .dinb(gen_write ? gen_wdata : rst_wdata)
    );
The current line is read by the filler module and sent to the RGB signals.
    wire filler_read;
    wire [9:0] filler_addr;
    wire [15:0] rdata;
    assign filler_addr[9:8] = 0;
    two_lines filler (
        .clk(clk),
        .CounterX(CounterX),
        .CounterYparity(CounterY[0]),
        .inDisplayArea(inDisplayArea),
        .inPrefetchArea(inPrefetchArea),
        .read(filler_read),
        .addr(filler_addr[7:0]),
        .data(rdata),
        .image(image)
    );
    always @(posedge clk) begin
        vga_R <= image & inDisplayArea; // one cycle of delay
        vga_G <= image & inDisplayArea; // (because we want no logic after reading signal from register to minimize output delay)
        vga_B <= image & inDisplayArea;
    end
The next line is computed from the previous line by a process simulating the cellular automaton. It runs while RGB data are not being sent (front porch, HSync and back porch).
    wire gen_read;
    wire gen_write;
    wire [9:0] gen_raddr;
    wire [9:0] gen_waddr;
    wire [15:0] gen_wdata;
    assign gen_raddr[9:8] = 0;
    assign gen_waddr[9:8] = 0;
    ca_gen gen (
        .clk(clk),
        .start(CounterY < 11'd 1023 && CounterX == 11'd 1296),
        .direction(CounterY[0]),
        .read(gen_read),
        .raddr(gen_raddr[7:0]),
        .rdata(rdata),
        .write(gen_write),
        .waddr(gen_waddr[7:0]),
        .wdata(gen_wdata)
    );
There is a separate process to initialize the first line of the image. I initialize it with a single white pixel that moves from left to right, one position per frame.
    wire rst_write;
    wire [9:0] rst_waddr;
    wire [15:0] rst_wdata;
    assign rst_waddr[9:8] = 0;
    ca_gen0 reset (
        .clk(clk),
        .start(CounterY == 11'd 1023 && CounterX == 11'd 1296),
        .write(rst_write),
        .waddr(rst_waddr[7:0]),
        .wdata(rst_wdata)
    );
The process that generates the first line of the image is wired to the memory module by the rst_write, rst_waddr and rst_wdata output lines. As input lines it gets the clock signal (clk) and the start signal, which is activated at some point after the last line and before the first line of the next frame.
    module ca_gen0(
        input wire clk,
        input wire start,
        output reg write,
        output reg [7:0] waddr,
        output reg [15:0] wdata
    );
Its internal state consists of the cycle (current internal clock cycle) and position registers.
    reg [10:0] position;
    initial position = 0;
    reg [7:0] cycle;
    initial begin
        cycle = 8'd 160;
        wdata = 16'b0;
        waddr = 8'b0;
    end
When the start signal comes, cycle is initialized to 0 and position is incremented modulo 1280 (we have 1280 pixels in a line).
    if (start) begin
        wdata <= 16'b0;
        write <= 0;
        cycle <= 0;
        position <= (position + 1'b1) % 1280;
    end
Until cycle reaches its final value of 160 (this is how many 16-bit memory writes are necessary to initialize the used memory area), the module sets write to 1, increments cycle and puts some data on the wdata bus. cycle is copied to waddr. All of this happens at the positive edge of the clock.
    always @ ( posedge clk ) begin
        if (cycle != 8'd 160) begin
            if (cycle == position[10:4])
                wdata <= (1 << 15-position[3:0]);
            else
                wdata <= 16'b0;
            write <= 1;
            waddr <= cycle;
            cycle <= cycle + 1'b1;
        end
        ...
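To see which word and bit the single white pixel lands in, here is a small Python model of the write loop above. It is a sketch of the data only, not of the timing: `first_line_words` is a hypothetical helper, and it assumes that bit 15 of each word is the leftmost pixel, as the `15-position[3:0]` shift suggests.

```python
def first_line_words(position, total_words=160):
    """Words written by the init process for a pixel position in 0..1279."""
    words = []
    for cycle in range(total_words):
        if cycle == position >> 4:
            # position[10:4] selects the word, position[3:0] selects the bit
            words.append(1 << (15 - (position & 0xF)))
        else:
            words.append(0)
    return words

words = first_line_words(17)
# Pixel 17 is the second pixel of word 1, so only one word is non-zero.
print([hex(w) for w in words[:3]])
```

Because position never exceeds 1279, the pixel always falls into words 0-79, and words 80-159 are simply cleared.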
It is important to note that the clock for the memory module is phase shifted by 180 degrees. This gives the signals some time to stabilize before the positive edge of the memory clock comes.
How much memory is needed? One line of the image has 1280 bits. We need to store the previous and the next line, because the next line is calculated from the previous one. This gives 2560 bits, which is not much. My Spartan-6 chip (XC6SLX16) has 64 blocks of 9 Kibit each (72 KiB in total).
To access memory I use BRAM_SDP_MACRO. It gives some abstraction over the Spartan memory primitives. I set the read width and the write width to 16 bits, and I enable the option to use the output register. The documentation of this option says:
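The sizing above is easy to verify. The Python snippet below just redoes the arithmetic with the numbers from the text; the constant names are made up for this sketch.

```python
LINE_PIXELS = 1280       # one bit per pixel
LINES_STORED = 2         # previous line and next line
WORD_BITS = 16           # memory width used in this project

bits_needed = LINE_PIXELS * LINES_STORED
words_needed = bits_needed // WORD_BITS

BLOCKS = 64
BLOCK_BITS = 9 * 1024    # 9 Kibit per block
total_bits = BLOCKS * BLOCK_BITS
total_kib = total_bits / 8 / 1024

print(bits_needed, words_needed)       # bits and 16-bit words for two lines
print(total_kib)                       # block RAM on the chip, in KiB
print(bits_needed <= BLOCK_BITS)       # both lines fit into a single block
```

The 160 words computed here are the same 160 memory writes that the initialization process performs.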
A value of 1 enables to the output registers to the RAM enabling quicker clock-to-out from the RAM at the expense of an added clock cycle of read latency. A value of 0 allows a read in one clock cycle but will have slower clock to out timing.
So I expect values to be present at the output after two clock cycles of RAM. Since RAM clock is phase shifted, I expect values after three main clock cycles.
The module that generates the next line of the image has a structure similar to the module that generates the first line of the image. There are two differences:
The combinatorial logic calculates the next generation of the cellular automaton from the previous generation. I've decided to do this in 16-bit blocks (the memory width). Since the state of a cell also depends on the state of its neighboring cells, I need 18 bits of input for 16 bits of output. I want the screen to wrap around at the edge, so as the value before the first value in a line I take the last value of the line.
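The 18-bits-in, 16-bits-out step can be modelled in Python as below. This is a sketch under stated assumptions: the name `comb_ca` and its argument order are borrowed for illustration, bit 15 is taken to be the leftmost cell of a word, and Wolfram rule 110 is an arbitrary default, because the text does not say which rule the hardware implements.

```python
def comb_ca(left, word, right, rule=110):
    """Next generation of 16 cells computed from 18 input bits.

    left  - the cell just left of bit 15 (0 or 1)
    word  - the 16 current cells, bit 15 being the leftmost one
    right - the cell just right of bit 0 (0 or 1)
    rule  - Wolfram rule number (assumed; 110 is only an example)
    """
    window = (left << 17) | (word << 1) | right   # the 18-bit input
    result = 0
    for i in range(16):
        # Bits i+2, i+1, i of the window are (left, center, right) of output bit i.
        neighborhood = (window >> i) & 0b111
        result |= ((rule >> neighborhood) & 1) << i
    return result

print(hex(comb_ca(0, 0x0001, 0)))
```

In hardware the loop corresponds to 16 identical three-input lookups working in parallel, which is why one clock cycle is enough for a whole word.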
So the sequence of memory reads is:
We want the last three memory reads to be available, so we store the values in a shift register (which is filled through buffer2):
    always @ ( posedge clk ) begin
        buffer0 <= buffer1[0];
        buffer1 <= buffer2;
    end
The sequence of memory writes is simple: write from the first word to the last one. But it needs to be delayed by the right number of clock cycles (the data must be available, plus one clock cycle for the combinatorial logic of the cellular automaton).
Another thing: we shouldn't write to the same memory location we will be reading from, so double buffering (of scanline pixels) is used. The first buffer uses memory words 0-79, the second uses words 80-159. This makes incrementing the memory counter slightly complicated (I'm sure this could be simplified, e.g. by aligning the memory ranges to a power of 2):
    function [7:0] increment_addr;
        input [7:0] addr;
        begin
            if (addr == 8'd 79)
                increment_addr = 8'd 0;
            else if (addr == 8'd 159)
                increment_addr = 8'd 80;
            else
                increment_addr = addr + 1'b1;
        end
    endfunction
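A Python model makes the wrapping behaviour easy to check, and also shows what the power-of-2 alignment mentioned above might look like. The second function is hypothetical: it assumes the buffers were moved to words 0-79 and 128-207, which the project does not do, and whether it would actually save logic has not been measured.

```python
def increment_addr(addr):
    """Model of the Verilog function: wrap inside words 0-79 or 80-159."""
    if addr == 79:
        return 0
    if addr == 159:
        return 80
    return (addr + 1) & 0xFF

def increment_addr_pow2(addr):
    """Hypothetical variant with the buffers at words 0-79 and 128-207.

    Bit 7 selects the buffer and is passed through unchanged,
    so only the low 7 bits need a single wrap check.
    """
    buffer_select = addr & 0x80
    offset = addr & 0x7F
    offset = 0 if offset == 79 else offset + 1
    return buffer_select | offset

# Walking 80 steps from the start of either buffer returns to the start.
addr = 80
for _ in range(80):
    addr = increment_addr(addr)
print(addr)
```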
To sum up, here is the pseudocode:
    1. buffer[2] <= mem[rbegin+79]
    2. buffer[2] <= mem[rbegin]
    3. buffer[2] <= mem[rbegin+1]
    4. left = buffer[0][0]
       right = buffer[2][15]
       mem[wptr++] <= comb_ca(left, buffer[1], right)
       buffer[2] <= mem[rptr++]
    ... (as above)
    84. mem[wptr] <= result
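The same data flow can be simulated in Python for a whole line. This sketch reads the neighboring words straight from the list instead of going through the three-stage shift register, so it models what is computed, not the cycle-by-cycle timing. Rule 110 and the helper names are assumptions made for illustration.

```python
WORDS = 80  # 1280 pixels / 16 bits per word

def comb_ca(left, word, right, rule=110):
    """16 new cells from 18 input bits (rule 110 is an assumed example)."""
    window = (left << 17) | (word << 1) | right
    return sum(((rule >> ((window >> i) & 7)) & 1) << i for i in range(16))

def next_line(mem, rbegin, wbegin, rule=110):
    """Compute one line word by word, wrapping around the screen edge."""
    for k in range(WORDS):
        prev_word = mem[rbegin + (k - 1) % WORDS]   # mem[rbegin+79] for the first word
        this_word = mem[rbegin + k]
        next_word = mem[rbegin + (k + 1) % WORDS]
        left = prev_word & 1              # corresponds to buffer[0][0]
        right = (next_word >> 15) & 1     # corresponds to buffer[2][15]
        mem[wbegin + k] = comb_ca(left, this_word, right, rule)

mem = [0] * 160
mem[0] = 0x8000                  # single pixel in the leftmost column
next_line(mem, rbegin=0, wbegin=80)
print(hex(mem[80]), hex(mem[159]))
```

With rule 110 the pixel in the leftmost column also lights up the rightmost column of the next line, which is the wrap-around at work. For the following line the roles of the two buffers swap, which is what the direction input of the hardware module selects.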
So far I've described how memory is filled with lines to be shown. But how is it sent to the screen?
From the VGA synchronization module we get the following information:
CounterX - current pixel in a line (pixel data is expected to be sent when it's 0-1279)
CounterY - least significant bit of line number (remember? lines are double buffered...)
inDisplayArea - are we in display area of a screen? - it's easier to calculate this in synchronization module
inPrefetchArea - set to true some time before inDisplayArea is true to give time for memory read
The code is ugly, but it somehow works...
    reg [15:0] display_reg;
    reg load_to_reg;
    always @* begin
        if (CounterYparity)
            addr[7:0] = CounterX[10:4] + 7'd 80;
        else
            addr[7:0] = CounterX[10:4];
        read = inPrefetchArea && (CounterX[3:0] == 4'b0001); // on 0000 inDisplayArea is still false
        load_to_reg = inPrefetchArea && (CounterX[3:0] == 4'b1111);
        image = inDisplayArea && display_reg[15-CounterX[3:0]];
    end
    always @(posedge clk) begin
        if (load_to_reg)
            display_reg <= data;
    end
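The address and bit selection in this block can be written out as two small Python functions. They are a simplified model with hypothetical names: it ignores the delay between reading a word (low bits equal to 0001) and loading it into the display register (low bits equal to 1111), and only shows how the counter bits are split.

```python
def filler_address(counter_x, counter_y_parity):
    """Word address read for a given X position (CounterX[10:4], plus 80 on odd lines)."""
    word = counter_x >> 4
    return word + 80 if counter_y_parity else word

def pixel_bit(display_reg, counter_x):
    """Pixel taken from the 16-bit display register at this X position."""
    return (display_reg >> (15 - (counter_x & 0xF))) & 1

print(filler_address(16, 0), filler_address(16, 1))
print(pixel_bit(0x8000, 0), pixel_bit(0x8000, 1))
```

The upper bits of the counter pick the word, the line parity picks the buffer, and the lower four bits walk through the word from bit 15 down to bit 0.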
Most of the code above is combinatorial – not using clock input. Clock is used only in the part for storing data from memory into register. Would it be better to have registers for output signals?
For me this is both a hard and an important question when designing for an FPGA. Maybe people who design digital electronics have some intuition about that. I don't have good intuition yet, but I see a tradeoff:
In this project the clock is set to 108 MHz because of the video mode requirements. If the combinatorial logic is simple enough to have a propagation delay of less than 9.25 ns (roughly the length of a clock cycle), then I think it's the right level of complication.
Some of the output signals go to memory, which is clocked by a signal shifted by 180 degrees. These signals have to propagate in half a clock cycle, so I think this is a potential bottleneck. I'm pretty sure it's possible to calculate the propagation delay of a signal with the Xilinx tools, but I haven't tried that.
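The time budgets mentioned here follow directly from the clock frequency. The Python sketch below computes them; it is a rough budget only, since it ignores setup times, clock skew and routing delay, which a real timing analysis would include.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz

nominal = period_ns(108.0)            # clock from the mode specification
actual = period_ns(50.0 * 13 / 6)     # synthesized clock, about 108.33 MHz
half_cycle = actual / 2               # budget for signals going to the phase-shifted RAM clock

print(round(nominal, 2), round(actual, 2), round(half_cycle, 2))
```

So logic between two registers on the main clock has a bit more than 9 ns, while anything feeding the memory inputs has less than 5 ns.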
Finally, we go back to the synchronization module. This is the heart of this project. It not only generates the VESA synchronization signals (hsync, vsync), but also:
Let's dive into the code.
We have a separate block for the current position (counterX and counterY).
    wire counterXmaxed = (counterX == 11'd 1687);
    wire counterYmaxed = (counterY == 11'd 1065);
    always @(posedge clk) begin
        if (counterXmaxed)
            counterX <= 0;
        else
            counterX <= counterX + 1'b1;
        if (counterXmaxed && counterYmaxed)
            counterY <= 0;
        else if (counterXmaxed && !counterYmaxed)
            counterY <= counterY + 1'b1;
    end
Looks pretty straightforward. But counterX is actually a private register. It is set to zero when horizontal synchronization starts. This is convenient when calculating the vga_h_sync output. But outside of the synchronization module it's more convenient to start counting the X position when the display area starts, or some fixed time before (to have time to fetch pixel values from memory). I have decided to start counting the X position 16 pixels before the visible area. The name of the variable is prefetchCounterX.
    `define FRONT_MARGIN 16
    wire [10:0] xShift = 112 + 248 - `FRONT_MARGIN;
    always @(posedge clk) begin
        // polarity of sync pulse is positive
        vga_h_sync <= counterX < 112;
        if (counterX == xShift)
            prefetchCounterX <= 0;
        else
            prefetchCounterX <= prefetchCounterX + 1'b1;
    end
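Why 112 + 248 - 16 above? counterX is zero at the start of the sync pulse, so the visible area begins 112 (sync pulse) + 248 (back porch) cycles later, and the prefetch counter has to start 16 cycles before that. The widths come from the horizontal timing of the mode: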
    Horizontal:     width   start
    visible area    1280    0
    front porch     48      1280
    sync pulse      112     1328
    back porch      248     1440
    whole line      1688
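The numbers in the table and in the code can be cross-checked with a short Python calculation. The constant names are made up for this sketch, the total of 1066 lines is taken from the counterY limit of 1065 in the code above, and the nominal 108 MHz clock is used rather than the slightly faster synthesized one.

```python
VISIBLE, FRONT_PORCH, SYNC, BACK_PORCH = 1280, 48, 112, 248
FRONT_MARGIN = 16               # prefetch lead, as in the Verilog above
TOTAL_LINES = 1066              # counterY runs from 0 to 1065
PIXEL_CLOCK_HZ = 108_000_000    # nominal pixel clock of the mode

line_length = VISIBLE + FRONT_PORCH + SYNC + BACK_PORCH
x_shift = SYNC + BACK_PORCH - FRONT_MARGIN

line_rate_khz = PIXEL_CLOCK_HZ / line_length / 1000
frame_rate_hz = PIXEL_CLOCK_HZ / (line_length * TOTAL_LINES)

print(line_length, x_shift)
print(round(line_rate_khz, 2), round(frame_rate_hz, 2))
```

The four widths add up to the 1688 cycles that counterX counts, and the resulting frame rate comes out very close to 60 Hz, as the mode name promises.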
I do no such tricks with the Y position, so the block for vga_v_sync is just a range check on counterY:
    always @(posedge vga_h_sync) begin
        vga_v_sync <= counterY >= 11'd 1056 && counterY < 11'd 1059;
    end
Finally, we have control signals that tell whether we are in the display area and in the prefetch area (the latter is used to access pixel values at the beginning of a line in advance).
    always @(posedge clk) begin
        inDisplayArea <= prefetchCounterX >= 11'd 15 && prefetchCounterX < 11'd 1295 && counterY < 11'd 1024;
        inPrefetchArea <= prefetchCounterX < 11'd 1280 && counterY < 11'd 1024;
    end
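A quick Python check confirms that these comparisons select exactly 1280 columns and 1024 rows. The functions only mirror the conditions; both hardware signals are registered, so each appears one clock cycle later, which does not change the counts or the distance between them.

```python
def in_display_area(prefetch_x, counter_y):
    """Condition assigned to inDisplayArea."""
    return 15 <= prefetch_x < 1295 and counter_y < 1024

def in_prefetch_area(prefetch_x, counter_y):
    """Condition assigned to inPrefetchArea."""
    return prefetch_x < 1280 and counter_y < 1024

visible_columns = sum(in_display_area(x, 0) for x in range(1688))
visible_rows = sum(in_display_area(100, y) for y in range(1066))

# How many cycles the prefetch window opens before the display window.
first_display = min(x for x in range(1688) if in_display_area(x, 0))
first_prefetch = min(x for x in range(1688) if in_prefetch_area(x, 0))
lead = first_display - first_prefetch

print(visible_columns, visible_rows, lead)
```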
The hardware setup is very simple. Hsync and vsync are connected to the matching VGA cable lines through 150 Ohm resistors (the exact value doesn't really matter). The R, G, B lines are connected to the matching VGA lines through 270 Ohm resistors.
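The 270 Ohm value makes sense as a voltage divider. A VGA color input is terminated with 75 Ohm and expects about 0.7 V for full brightness. Assuming the FPGA pins drive 3.3 V (an assumption, since the I/O voltage is not stated here), the divider gives roughly that level:

```python
def divider_voltage(v_out, r_series, r_termination=75.0):
    """Voltage at the monitor input for a series resistor into a terminated line."""
    return v_out * r_termination / (r_series + r_termination)

# 3.3 V output level is an assumption about the board's I/O voltage.
v_high = divider_voltage(3.3, 270.0)
print(round(v_high, 3))
```

The sync lines are digital inputs on the monitor side, which is why the exact resistor value matters much less there.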
In this project I generate the video output by calculating it on the fly, line by line. For real video output, the whole video frame is usually stored in memory. To do this I have to learn how to use the SDRAM that is included on my FPGA board.