Summary
This paper introduces a real-time video transmission system based on Gigabit Ethernet, implemented on the AL3 FPGA from ANLOGIC (Shanghai Anlu Information Technology Co., Ltd.). It first briefly introduces the characteristics of the AL3 FPGA, the KSZ9031RNX Gigabit PHY chip, and the video transmission bandwidth calculation. It then describes the system hardware architecture and the FPGA program architecture in detail, and shows how ping-pong, page-mode access to SDRAM removes the video buffer bandwidth bottleneck, achieving real-time transmission of 1080P video at 30 frames/second.
Main text
The AL3LA10 is the latest FPGA device from Shanghai Anlu Technology. The device uses Anlu's own LUT4/LUT5 hybrid logic cell structure, provides 10K equivalent logic cells and a total of 500 Kbit of block RAM, and offers a large number of high-performance LVDS differential interfaces. The chip package is compatible with devices such as the ALTERA CYCLONE4 and XILINX SPARTAN6, and the chip provides more RAM blocks and differential I/O resources than comparable foreign chips, which makes it well suited to high-volume video data acquisition, transmission, and display applications.
The KSZ9031RNX is a tri-speed (10/100/1000 Mb/s) auto-negotiating PHY chip that connects to the FPGA (MAC) through the RGMII interface. Its outstanding feature is that the delay of each RGMII pin can be set through the MDIO interface, which reduces the complexity of delay adjustment on the FPGA side. The TFP401A is a DVI video decoding chip from TI that supports resolutions up to 1080P. It receives the TMDS differential signal output by the graphics card and, after decoding, outputs parallel pixel data with line and field synchronization signals at LVTTL levels.
The maximum bandwidth of one Gigabit Ethernet port is 1Gb/s. This design uses two Gigabit Ethernet ports, so the maximum video transmission bandwidth it supports is 2Gb/s. For 1080P video at 60 frames/second, the bandwidth is 1920*1080*60*24, about 3Gb/s, which exceeds the maximum network bandwidth of the system. For 1080P video at 30 frames/second, the bandwidth is 1920*1080*30*24, about 1.5Gb/s, which is below the maximum network bandwidth, so the system supports real-time transmission of 1080P video at 30 frames/second. Since the load is shared by two Gigabit Ethernet ports, each port carries about 750Mb/s, which is about 75% of its line rate.
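The bandwidth arithmetic above can be reproduced with a short script. This is a minimal sketch that assumes 24 bits per pixel (R8G8B8), an even split across the two ports, and no allowance for blanking or packet header overhead; the function names are illustrative and not part of the original design.

```python
# Raw (uncompressed) video bandwidth versus the dual-port Gigabit Ethernet budget.
# Assumptions: 24 bits per pixel, even split across ports, no protocol overhead.

PORT_RATE_BPS = 1_000_000_000   # line rate of one Gigabit Ethernet port
NUM_PORTS = 2                   # this design uses two ports


def video_bandwidth_bps(width, height, fps, bits_per_pixel=24):
    """Return the raw video bandwidth in bits per second."""
    return width * height * fps * bits_per_pixel


def fits_link(bandwidth_bps, ports=NUM_PORTS, port_rate_bps=PORT_RATE_BPS):
    """True if the stream fits in the aggregate line rate of all ports."""
    return bandwidth_bps <= ports * port_rate_bps


def per_port_utilization(bandwidth_bps, ports=NUM_PORTS, port_rate_bps=PORT_RATE_BPS):
    """Fraction of each port's line rate used when the load is split evenly."""
    return bandwidth_bps / ports / port_rate_bps


if __name__ == "__main__":
    for fps in (60, 30):
        bw = video_bandwidth_bps(1920, 1080, fps)
        print(f"1080P{fps}: {bw / 1e9:.3f} Gb/s, fits: {fits_link(bw)}, "
              f"per-port utilization: {per_port_utilization(bw):.1%}")
```

In practice the Ethernet, IP, and UDP headers consume part of each port's line rate, so the usable margin is somewhat smaller than this raw figure suggests.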
Figure 1 shows a block diagram of the hardware structure of the system. After the DVI video is output from the graphics card and decoded by the TFP401A, it is delivered as R8G8B8 pixel data with synchronization signals, including HSYNC, VSYNC, DE, ODCK, and DATA. The FPGA captures the valid video data from the source according to the synchronization signals HSYNC, VSYNC, and DE. Once inside the FPGA, the video data is stored to and read from two SDRAMs in a ping-pong manner. The frame synchronization of the DVI video source serves as the switching signal, so that one frame is stored in one SDRAM and the next frame in the other. When data is read from the SDRAM, a control signal determines which data is sent to the first PHY, and the remaining data is sent to the second PHY. The PC software can set some FPGA parameters through the UART port and can also read back the current working status of the sending card.
Figure 1 Hardware block diagram of the dual-port Gigabit Ethernet video transmission card
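The ping-pong scheme described above can be modelled in a few lines. The following is a behavioural sketch only, not the FPGA logic: the class and function names are hypothetical, each SDRAM is represented by a single Python slot holding one frame, and the frame sync is modelled as a method call.

```python
class PingPongBuffer:
    """Two frame banks: one is written while the other is read."""

    def __init__(self):
        self.banks = [None, None]   # stand-ins for SDRAM1 and SDRAM2
        self.write_bank = 0         # bank currently being filled

    def on_vsync(self):
        """Frame boundary: swap the roles of the two banks."""
        self.write_bank ^= 1

    def write_frame(self, frame):
        """Store the incoming frame in the bank selected for writing."""
        self.banks[self.write_bank] = frame

    def read_frame(self):
        """Return the frame held in the bank that is not being written."""
        return self.banks[self.write_bank ^ 1]


def run(frames):
    """Feed frames through the buffer and collect what the read side sees."""
    buf = PingPongBuffer()
    sent = []
    for frame in frames:
        sent.append(buf.read_frame())   # read side drains the other bank
        buf.write_frame(frame)          # write side fills the current bank
        buf.on_vsync()                  # frame sync swaps the banks
    return sent


if __name__ == "__main__":
    print(run(["f0", "f1", "f2"]))
```

The read side always lags the write side by one frame, which is the cost of never reading and writing the same SDRAM during a frame.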
Figure 2 is the software architecture diagram of the system. The FPGA detects the start of each frame from the VSYNC signal and uses it to control the switching of the SDRAMs and the clearing of the internal FIFOs at every stage. From the HSYNC and DE signals, the FPGA derives the coordinates of each pixel within the frame, crops the valid video image data according to the control settings, and passes it to the next stage, the asynchronous buffer FIFO1. FIFO1 mainly handles the clock domain crossing between the DVI data input and the SDRAM read/write logic. The SDRAM read/write control module reads data from FIFO1 and writes it to SDRAM1 or SDRAM2 according to the fill level reported by FIFO1. At the same time, the module monitors the fill levels of FIFO2 and FIFO3; when a level falls below a threshold, it reads data from SDRAM1 or SDRAM2 and writes it to FIFO2 or FIFO3. The Ethernet packing module packs the data in UDP format: it adds the UDP header, reads data from FIFO2 or FIFO3 to fill the packet, then calculates the FCS and appends it to the end of the packet. The module also converts the data into double-edge (DDR) data that meets RGMII timing and sends it to the KSZ9031RNX PHY chip.
Figure 2 Software architecture diagram of dual-port Gigabit Ethernet video transmission card
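The packing steps performed by the Ethernet packing module can be illustrated in software. This is a hedged sketch rather than the design's actual packet format: the MAC addresses, IP addresses, and port numbers are hypothetical, the UDP checksum is left at zero (which IPv4 permits), and the preamble and start-of-frame delimiter are omitted. The FCS is the standard Ethernet CRC-32, which `zlib.crc32` computes, and the nibble split follows the RGMII convention of sending bits 3:0 on the rising clock edge and bits 7:4 on the falling edge.

```python
import struct
import zlib


def ipv4_checksum(header: bytes) -> int:
    """16-bit one's-complement checksum over an IPv4 header."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_frame(payload, dst_mac, src_mac, src_ip, dst_ip, src_port, dst_port):
    """Wrap payload in UDP/IPv4/Ethernet headers and append the FCS."""
    udp_len = 8 + len(payload)
    # UDP header; checksum 0 means "not used" in IPv4 (assumption of this sketch)
    udp = struct.pack("!HHHH", src_port, dst_port, udp_len, 0)
    # IPv4 header: version/IHL, TOS, total length, ID, flags, TTL, protocol 17 (UDP)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + udp_len, 0, 0, 64, 17, 0,
                     src_ip, dst_ip)
    ip = ip[:10] + struct.pack("!H", ipv4_checksum(ip)) + ip[12:]
    frame = dst_mac + src_mac + struct.pack("!H", 0x0800) + ip + udp + payload
    frame = frame.ljust(60, b"\x00")             # pad to the 60-byte minimum before FCS
    fcs = struct.pack("<I", zlib.crc32(frame))   # FCS bytes in transmission order
    return frame + fcs


def rgmii_nibbles(frame: bytes):
    """Split each byte into (rising-edge, falling-edge) nibbles for RGMII."""
    return [(b & 0x0F, b >> 4) for b in frame]


if __name__ == "__main__":
    frame = build_frame(bytes(1024),
                        dst_mac=b"\xff" * 6,                   # hypothetical
                        src_mac=bytes.fromhex("020000000001"), # hypothetical
                        src_ip=bytes([192, 168, 1, 10]),       # hypothetical
                        dst_ip=bytes([192, 168, 1, 20]),       # hypothetical
                        src_port=5000, dst_port=5001)          # hypothetical
    print(len(frame), rgmii_nibbles(frame[:2]))
```

In the FPGA the same CRC would normally be computed incrementally as bytes stream out, rather than over a buffered frame as done here.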
The UART is the interface between the PC software and the transmission card. The PC uses it to control the video capture resolution of the card. Usually, not all of the video received over DVI is sent out through Gigabit Ethernet; only a cropped region is. The PC can also control the resolution sent by Gigabit Ethernet port 1 and port 2 separately. For example, if a 1920*768 image is cropped from the DVI video, port 1 and port 2 can each send 1920*384. The number of lines sent by each port can be adjusted freely, as long as the total equals the cropped height.
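The constraint on the two ports' line counts can be expressed as a small helper. This is an illustrative sketch with hypothetical function names; the 30 frames/second rate used in the bandwidth check is an assumption, since the text does not state a frame rate for the cropped example.

```python
PORT_RATE_BPS = 1_000_000_000   # line rate of one Gigabit Ethernet port


def split_lines(crop_height, port1_lines):
    """Return (port1_lines, port2_lines) whose sum equals the cropped height."""
    if not 0 <= port1_lines <= crop_height:
        raise ValueError("port 1 line count must lie within the cropped height")
    return port1_lines, crop_height - port1_lines


def port_bandwidth_bps(width, lines, fps, bits_per_pixel=24):
    """Raw bandwidth carried by one port for its share of lines."""
    return width * lines * fps * bits_per_pixel


if __name__ == "__main__":
    p1, p2 = split_lines(768, 384)
    for name, lines in (("port 1", p1), ("port 2", p2)):
        bw = port_bandwidth_bps(1920, lines, fps=30)   # 30 fps is an assumption
        print(f"{name}: {lines} lines, {bw / 1e6:.1f} Mb/s, "
              f"fits: {bw <= PORT_RATE_BPS}")
```

An uneven split is valid as far as the line count goes, but each port's share still has to fit within its own 1Gb/s line rate.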
The clock domains of the system are shown in Figure 3. The DVI video input part uses ODCK, the pixel clock that accompanies the data output by the TFP401A. Its frequency changes with the input video resolution and is highest at 1080P, about 157MHz. The SDRAM read/write control part uses a 150MHz clock generated by a PLL; it reads and writes the SDRAM at this frequency and writes the data to the asynchronous FIFO2 and FIFO3. The Ethernet packing and sending part uses a fixed 125MHz clock, because the transmit clock of Gigabit Ethernet is fixed at 125MHz.
Figure 3 Dual-port Gigabit Ethernet video transmission card software clock domain
The SDRAM control module handles writing data to and reading data from the SDRAM. Since the module runs at 150MHz, the SDRAM also runs at 150MHz. SDRAM multiplexes row and column addresses and also needs precharge and refresh time. As a result, one page access transfers 256 valid data words but takes 285 clocks for a write and 300 clocks for a read. The effective write bandwidth of the SDRAM is therefore 256/285*150MHz, about 134.7MHz, and the effective read bandwidth is 256/300*150MHz, which is 128MHz. The image data rate of 1080P at 30 frames/second is 1920*1080*30, about 62.2MHz. Because of the ping-pong storage scheme, each frame can be stored and read out in time as long as the write bandwidth satisfies 256/285*150 > 62.2 and the read bandwidth satisfies 256/300*150 > 62.2. Both have a large margin over the data rate, yet the SDRAM clock cannot be lowered. The reason is that each line of DVI image data arrives continuously: at 1080P each line lasts 1920*1.2 pixel clocks, of which the first 1920 are valid pixels and the rest is line blanking. To ensure that the data in FIFO1 is written to SDRAM before the next line arrives, so that FIFO1 never fills up, the design must satisfy 1920/F_SDRAM < 1920*1.2/F_ODCK, that is, F_SDRAM > F_ODCK/1.2. The maximum F_ODCK is 157MHz, so the minimum F_SDRAM is about 131MHz, and this design therefore selects a 150MHz SDRAM working clock.
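The SDRAM figures above can be checked numerically. This is a minimal sketch that assumes one SDRAM word per pixel and takes the requirement to be that one line of 1920 valid pixels is written within one full line period of 1920*1.2 pixel clocks; the function names are illustrative.

```python
F_SDRAM_MHZ = 150.0    # SDRAM working clock chosen in this design
F_ODCK_MAX_MHZ = 157.0 # maximum DVI pixel clock stated for 1080P
PAGE_WORDS = 256       # valid data words per page access
WRITE_CLOCKS = 285     # clocks actually spent on one page write
READ_CLOCKS = 300      # clocks actually spent on one page read


def effective_rate_mhz(clock_mhz, data_words, total_clocks):
    """Effective data rate (million words per second) of page-mode access."""
    return clock_mhz * data_words / total_clocks


def pixel_rate_mhz(width, height, fps):
    """Average pixel rate of the video stream in million pixels per second."""
    return width * height * fps / 1e6


def min_sdram_clock_mhz(f_odck_mhz, line_total_ratio=1.2):
    """Lowest SDRAM rate that drains one line within one full line period."""
    return f_odck_mhz / line_total_ratio


if __name__ == "__main__":
    write_rate = effective_rate_mhz(F_SDRAM_MHZ, PAGE_WORDS, WRITE_CLOCKS)
    read_rate = effective_rate_mhz(F_SDRAM_MHZ, PAGE_WORDS, READ_CLOCKS)
    need = pixel_rate_mhz(1920, 1080, 30)
    print(f"write {write_rate:.1f} MHz, read {read_rate:.1f} MHz, need {need:.1f} MHz")
    print(f"minimum SDRAM clock: {min_sdram_clock_mhz(F_ODCK_MAX_MHZ):.1f} MHz")
```

Under these assumptions the effective write rate at 150MHz (about 134.7MHz) also stays above the 131MHz per-line requirement, although with a much smaller margin than the average-rate comparison suggests.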
Conclusion
This paper presents the design of a dual-port Gigabit Ethernet transmission card that supports video transmission at up to 1080P and 30 frames/second, with the resolution and image area of the transmitted image flexibly controlled by PC software. The design improves SDRAM read and write bandwidth through ping-pong, page-mode access to the SDRAM. The hardware board shown in Figure 4 was built according to the design, and the corresponding FPGA program was written and tested on it. The system performance and stability meet the requirements.
Practice has shown that the Anlu AL3 FPGA chip performs stably and is not inferior to foreign chips in logic efficiency, IP modules, or timing performance. This design involves several high-speed clocks with tight timing margins, and on some performance indicators the chip even exceeds foreign chips. In addition, the supporting TD compilation software is relatively complete in function, compiles quickly, reports timing accurately, is easy to use, and meets general development needs.
Figure 4 Dual-port Gigabit Ethernet video transmission card