<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<title>Hardware implementation of Theora decoding</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<h1 align="center">Hardware implementation of Theora decoding</h1>
<h1 align="center">Integration with LEON3</h1>
<p align="center">by student André Luiz Nazareth da Costa (andre.lnc [at] gmail.com)</p>
<p align="center">and mentor Timothy B. Terriberry (tterribe [at] vt.edu)</p>
<p>Decoding video in the Theora format requires a lot of processing power.
Developing dedicated hardware for it is therefore a viable solution, and some
modules were already implemented successfully in hardware during GSoC 2006
(Google Summer of Code). The idea is to use an FPGA with a small embedded processor
and to move only the critical modules into hardware.<br>
The goal of my project is to continue last year's work, moving one or more
modules into hardware and thereby reducing the CPU processing time.
This implementation will be done in VHDL and synthesized for the Altera Stratix
II FPGA. GSoC Project page: <a href="http://code.google.com/soc/2007/xiph/appinfo.html?csaid=4235040C184DBD68">http://code.google.com/soc/2007/xiph/appinfo.html?csaid=4235040C184DBD68</a></p>
<h4>Xiph and Theora</h4>
<p>The Xiph.Org Foundation (<a href="http://www.xiph.org/">http://www.xiph.org/</a>) is a non-profit corporation dedicated to protecting the foundations of Internet multimedia from control by private interests. Its purpose is to support and develop free, open protocols and software to serve the public, developer, and business markets.
Theora is the video codec from Xiph, based on the VP3 codec donated by On2 Technologies.</p>
<p>Google Summer of Code (<a href="http://code.google.com/soc/">http://code.google.com/soc/</a>) is a program that offers
student developers stipends to write code for various open source projects.
Google works with several open source, free software and technology-related
groups to identify and fund projects over a three-month period. Historically,
the program has brought together over 1,000 students with over 100 open source
projects, to create hundreds of thousands of lines of code. The program, which
kicked off in 2005, is now (2007) in its third year.</p>
<h4>Hardware implementation</h4>
<p>The first step (analysis of the Theora decoding process) was carried out first by
Felipe Portavales and later by Leonardo Piga. The conclusion was that the function
ReconRefFrames takes approximately 60% of the CPU time, while the functions that run before it
contain many decision structures (branches) and few processing structures (such as multiplications).
You can see this at <a href="http://svn.xiph.org/trunk/theora-fpga/doc/">http://svn.xiph.org/trunk/theora-fpga/doc/</a>. Thus,
this first part is not very interesting to implement in hardware, but ReconRefFrames is.</p>
<p>Felipe Portavales implemented the iDCT and Leonardo Piga implemented the other functions.
The VHDL simulation works, and so does the synthesis for the FPGA.</p>
<p>However, the integration was done with the NIOS processor, which is proprietary.
The non-proprietary alternative was LEON.<br>
NIOS has a good interface and good FPGA support. LEON has a different interface
and is very flexible. So I started studying this processor in more depth, and my
first goal in GSoC was to complete the integration of the Theora hardware with
LEON. This page describes how to do this integration step by step.</p>
<p>Theora: <a href="http://www.theora.org/">http://www.theora.org/</a></p>
<p>Xiph: <a href="http://www.xiph.org/">http://www.xiph.org/</a></p>
<p>Theora Hardware Wiki: <a href="http://wiki.xiph.org/index.php/TheoraHardware">http://wiki.xiph.org/index.php/TheoraHardware</a></p>
<p>Google Summer of Code page: <a href="http://code.google.com/soc/">http://code.google.com/soc/</a></p>
<p>GSoC Project page: <a href="http://code.google.com/soc/2007/xiph/appinfo.html?csaid=4235040C184DBD68">http://code.google.com/soc/2007/xiph/appinfo.html?csaid=4235040C184DBD68</a></p>
<p>Gaisler: <a href="http://www.gaisler.com">http://www.gaisler.com</a></p>
<p>Vorbis Hardware implementation on LEON2: <a href="http://oggonachip.sourceforge.net/">http://oggonachip.sourceforge.net/</a></p>
<p>MP3 Hardware implementation on LEON2: <a href="http://lampiao.lsc.ic.unicamp.br/~billo/leon2_on_mblazeboard/index.htm">http://lampiao.lsc.ic.unicamp.br/~billo/leon2_on_mblazeboard/index.htm</a></p>
<p>Leon Sparc (Yahoo group): <a href="http://tech.groups.yahoo.com/group/leon_sparc/">http://tech.groups.yahoo.com/group/leon_sparc/</a></p>
<p>theora-dev mailing list: <a href="http://lists.xiph.org/mailman/listinfo/theora-dev">http://lists.xiph.org/mailman/listinfo/theora-dev</a></p>
<h2>1 - The LEON3 processor.</h2>
<p align="center"><img src="leon3.JPG" width="746" height="393"></p>
<p align="center">figure 1</p>
<h4>About Gaisler</h4>
<p>Gaisler Research provides a complete framework for the development of processor-based
SOC designs. The framework is centered around the LEON processor core and includes
a large IP library, behavioral simulators, and related software development tools.<br>
<a href="http://www.gaisler.com">http://www.gaisler.com</a></p>
<h4>About GRLIB</h4>
<p>The GRLIB IP Library is an integrated set of reusable IP cores, designed for
system-on-chip (SOC) development. The IP cores are centered around the common
on-chip bus, and use a coherent method for simulation and synthesis.<br>
<a href="http://gaisler.com/products/grlib/grlib.pdf">http://gaisler.com/products/grlib/grlib.pdf</a></p>
<h4>Choosing an ideal configuration (with my configuration file)</h4>
<p>First you need to install GRLIB (I worked with grlib-gpl-1.0.15-b2149.tar.gz),
following the instructions in grlib.pdf.<br>
After GRLIB is installed, you can run "make xconfig" in "grlib/designs/leon3-altera-ep2s60-sdr" (I
used the Stratix II EP2S60F672C5ES).<br>
There, you can select these components:</p>
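<table>
<tr><th>Component</th><th>Vendor</th></tr>
<tr><td>LEON3 SPARC V8 Processor</td><td>Gaisler Research</td></tr>
<tr><td>AHB Debug UART</td><td>Gaisler</td></tr>
<tr><td>AHB Debug JTAG TAP</td><td>Gaisler</td></tr>
<tr><td>LEON2 Memory Controller</td><td>European</td></tr>
<tr><td>AHB/APB Bridge</td><td>Gaisler</td></tr>
<tr><td>LEON3 Debug Support Unit</td><td>Gaisler</td></tr>
<tr><td>Generic APB UART</td><td>Gaisler</td></tr>
</table>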
<p>My configuration file: <a href="config.in">config.in</a></p>
<p>Now you can run the synthesis of your design (make quartus).<br>
<em>FPGA pin problems:</em> Make sure you select the design that matches your FPGA board;
otherwise you may have problems with the pin mapping.</p>
<h4>Using the GRMON JTAG interface</h4>
<p>GRMON is a general debug monitor for the LEON processor and for SOC designs
based on the GRLIB IP library.<br>
We will use it to load and execute LEON applications.<br>
Manual: <a href="http://www.gaisler.com/doc/grmon.pdf">http://www.gaisler.com/doc/grmon.pdf</a></p>
<p><a href="ftp://gaisler.com/gaisler.com/grmon/grmon-eval-1.1.21.tar.gz">ftp://gaisler.com/gaisler.com/grmon/grmon-eval-1.1.21.tar.gz</a></p>
<p>Run GRMON with this command:</p>
<p>grmon-eval -altjtag -u</p>
<p>-altjtag : Connect to the JTAG Debug Link using Altera USB Blaster or Byte<br>
-u : Put UART 1 in loop-back mode, and print its output on the monitor console.</p>
<h2>2 - Application on LEON3</h2>
<p align="center"><img src="2_libt.png" width="688" height="419"></p>
<p align="center">figure 2</p>
<h4>libtheora-1.0alpha6, libogg-1.1.3 and sparc-elf-3.4.4-1.0.29</h4>
<p>Create a new directory (such as /theora_hardware/).<br>
Download libtheora-1.0alpha6.tar.gz and unpack it in /theora_hardware/:</p>
<p><a href="http://downloads.xiph.org/releases/theora/libtheora-1.0alpha6.tar.gz">http://downloads.xiph.org/releases/theora/libtheora-1.0alpha6.tar.gz</a><br>
tar -xzf libtheora-1.0alpha6.tar.gz</p>
<p>Download libogg-1.1.3.tar.gz and unpack it in /theora_hardware/libtheora-1.0alpha6/:</p>
<p><a href="http://downloads.xiph.org/releases/ogg/libogg-1.1.3.tar.gz">http://downloads.xiph.org/releases/ogg/libogg-1.1.3.tar.gz</a><br>
tar -xzf libogg-1.1.3.tar.gz</p>
<p>Now you will need BCC (Bare-C Cross-Compiler), a cross-compiler
for the LEON2 and LEON3 processors.<br>
Download sparc-elf-3.4.4-1.0.29.tar.bz2 and unpack it in /opt/:</p>
<p>tar -C /opt -xjf sparc-elf-3.4.4-1.0.29.tar.bz2</p>
<h4>Modified dump_video.c and input vector</h4>
<p>Since we are not running on Linux, you need to take care with the file functions.
You can comment out the fprintf calls, replace the fread calls with reads from a vector of inputs,
and turn the fwrite calls into plain printf calls, like this:</p>
<p><a href="dump_video_hardware.c">dump_video_hardware.c</a><br>
<a href="insert vector_of_input.h">insert vector_of_input.h</a></p>
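<p>One way to replace the fread calls is sketched below, assuming the encoded .ogg stream is compiled into the program as a byte array (which is what vector_of_input.h is meant to hold; its real contents and variable names are not shown here). The names mem_stream, mem_fread and mem_feof are hypothetical helpers, not part of libogg or libtheora.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a FILE*: the encoded stream lives in
   a byte array that is compiled into the program. */
struct mem_stream {
    const unsigned char *data;  /* start of the encoded stream */
    size_t size;                /* total number of bytes       */
    size_t pos;                 /* current read position       */
};

/* Like fread(): copies up to nitems items of item_size bytes into dst and
   returns how many complete items were copied (0 at end of stream). */
static size_t mem_fread(void *dst, size_t item_size, size_t nitems,
                        struct mem_stream *s)
{
    if (item_size == 0 || nitems == 0 || s->pos >= s->size)
        return 0;
    size_t avail = (s->size - s->pos) / item_size;
    size_t n = nitems < avail ? nitems : avail;
    memcpy(dst, s->data + s->pos, n * item_size);
    s->pos += n * item_size;
    return n;
}

/* Like feof(): non-zero once every byte has been consumed. */
static int mem_feof(const struct mem_stream *s)
{
    return s->pos >= s->size;
}
```

<p>A call such as fread(buffer, 1, 4096, infile) would then become mem_fread(buffer, 1, 4096, &amp;stream), with stream initialized to point at the array.</p>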
<h4>Bug in the Ogg library (unaligned address error)</h4>
<p>There was a bug in the Ogg library:</p>
<p>IU in error mode (tt = 0x07)<br>
400013a4 e8220011 st %l4, [%o0 + %l1]</p>
<p>Trap type 0x07 is a memory access to an unaligned address. Some architectures
support unaligned stores, but SPARC does not: a 32-bit word store such as the st above
must use an address that is a multiple of 4. I was lucky to find a report from a group that
put the Vorbis decoder on an FPGA, a master's thesis by two students: <a href="http://oggonachip.sourceforge.net/">http://oggonachip.sourceforge.net/</a>.</p>
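<p>The difference between a trapping access and a safe one can be illustrated with a small C sketch. It is not the libogg fix used on this page (that fix is a configure.in change): load_be32 assembles a word from single bytes, so it never issues a word load, and store_u32_unaligned goes through memcpy, which works at any alignment. The helper names are hypothetical.</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Non-zero if address p is a multiple of align (align must be a power of two).
   On SPARC, a 32-bit ld/st needs align 4, otherwise the CPU raises trap 0x07. */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Read a big-endian 32-bit value from any address, one byte at a time. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Store a 32-bit value in native byte order at any address. memcpy must work
   for any alignment, so on strict-alignment CPUs such as SPARC the compiler
   cannot turn this into a single st to an unknown address. */
static void store_u32_unaligned(unsigned char *dst, uint32_t v)
{
    memcpy(dst, &v, sizeof v);
}
```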
<p>The fix is to add a few extra lines to the configure.in file of the Ogg library
(/theora_hardware/libtheora-1.0alpha6/libogg-1.1.3/), as follows:</p>
<p>AC_CHECK_SIZEOF(short,2)<br>
AC_CHECK_SIZEOF(int,4)<br>
AC_CHECK_SIZEOF(long,4)<br>
AC_CHECK_SIZEOF(long long,8)</p>
<h4>Compiling libtheora for the LEON3 architecture</h4>
<p>You can run this script:</p>
<p># Export sparc-elf PATH<br>
export PATH=/opt/sparc-elf-3.4.4/bin:$PATH</p>
<p>cd libogg-1.1.3/</p>
<p># Set CROSS-Compiler and parameters<br>
export CC=sparc-elf-gcc<br>
export CXX=sparc-elf-gcc<br>
export CFLAGS='-mv8 -msoft-float -static'<br>
# -mv8 generate SPARC V8 mul/div instructions - needs hardware multiply and divide<br>
# -msoft-float emulate floating-point - must be used if no FPU exists in the system</p>
<p>#Configure and install OGG lib<br>
./configure --prefix=/theora_hardware/ --target=sparc-elf --host=sparc-elf --enable-static</p>
<p>#Configure and make Theora for LEON (sparc)<br>
./configure --prefix=/theora_hardware/ --target=sparc-elf --host=sparc-elf --enable-static --disable-encode</p>
<h4>How to run the test in figure 2</h4>
<p>After the last step, you will have the binary "dump_video_hardware".
In the first step (The LEON3 processor), the synthesis generated a programming file
(leon3mp.sof), which you can now use to program your FPGA. Then open the GRMON interface
and load dump_video_hardware ("load dump_video_hardware"). Now type "run
dump_video_hardware".</p>
<h2>3 - LINUX on LEON3</h2>
<p align="center"><img src="3_linux.png" width="688" height="357"></p>
<p align="center">figure 3</p>
<h4 align="left">Snapgear</h4>
<p align="left">LINUX support for LEON2 and LEON3 is provided through a special
version of the SnapGear Embedded Linux distribution. SnapGear Linux is a full
source package, containing kernel, libraries and application code for rapid
development of embedded Linux systems.</p>
<p>Download SnapGear:<br>
<a href="ftp://gaisler.com/gaisler.com/linux/snapgear/snapgear-p33a.tar.bz2">ftp://gaisler.com/gaisler.com/linux/snapgear/snapgear-p33a.tar.bz2</a></p>
<p>SnapGear manual:<br>
<a href="ftp://gaisler.com/gaisler.com/linux/snapgear/snapgear-manual-1.33.0.pdf">ftp://gaisler.com/gaisler.com/linux/snapgear/snapgear-manual-1.33.0.pdf</a></p>
<p>Download the SPARC Linux cross-compiler:<br>
<a href="ftp://gaisler.com/gaisler.com/linux/snapgear/sparc-linux-1.0.0.tar.bz2">ftp://gaisler.com/gaisler.com/linux/snapgear/sparc-linux-1.0.0.tar.bz2</a></p>
<p>Kernel version I am using: linux-2.6.21.1 (for a system with an MMU)</p>
<p>The tool-chain should be installed under /opt:</p>
<p>tar xjf /sparc-linux-1.0.0.tar.bz2</p>
<p>Add /opt/sparc-linux/bin to your PATH.</p>
<p>The SnapGear distribution can be installed anywhere:</p>
<p>tar -xjf snapgear-p33a.tar.bz2</p>
<p>General instructions on how to use SnapGear Linux are provided with the distribution.</p>
<h4 align="left">Testing</h4>
<p>After programming your FPGA with LEON3, you can start GRMON with this command:<br>
./grmon-eval -altjtag -nb -abaud 38400 -nosram</p>
<p>GRMON should be started with -nb to avoid going into break mode on a page fault
or data exception.</p>
<p><em>Problem with SRAM</em></p>
<p>I disabled the SRAM (-nosram) because I had only 2 Mbit of SRAM on my FPGA board,
so I needed to load the kernel into SDRAM. However, I was having memory-mapping
problems, so I decided to disable the SRAM.</p>
<p><em>Serial and JTAG debug link</em></p>
<p>The "-abaud 38400" option sets the application baud rate for UART 1.<br>
To have a console interface to Linux, you need to connect a serial
cable to your computer. Then you can use a program like "kermit", which
provides serial communication with the Linux console on the FPGA. Some FPGA boards
have two serial connectors, so BE SURE that you are using the right one!<br>
I am using the following kermit configuration:</p>
<p>set line /dev/ttyS0<br>
define sz !sz \%0 &gt; /dev/ttyS0 &lt; /dev/ttyS0<br>
set speed 38400<br>
set carrier-watch off<br>
set prefixing all<br>
set parity none<br>
set stop-bits 1<br>
set file type bin<br>
set file name lit<br>
set flow-control none<br>
set prompt "Sparc Linux Kermit&gt; "</p>
<p>Now load your kernel image (image.dsu) generated with SnapGear, and you will see
the Linux console running in kermit.</p>
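<p>If you prefer to talk to the board from your own program instead of kermit, the same line settings (38400 baud, 8 data bits, no parity, 1 stop bit, no software flow control, carrier ignored) can be applied with the POSIX termios API. This is a minimal sketch assuming a POSIX host; the function names are hypothetical, and hardware flow control (CRTSCTS) is left untouched because that flag is not part of POSIX.</p>

```c
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/* Put raw 38400 8N1 settings into *t, mirroring the kermit configuration:
   speed 38400, parity none, stop-bits 1, flow-control none, carrier-watch off. */
static int serial_settings_38400_8n1(struct termios *t)
{
    t->c_iflag &= ~(tcflag_t)(IGNBRK | BRKINT | PARMRK | ISTRIP |
                              INLCR | IGNCR | ICRNL | IXON | IXOFF);
    t->c_oflag &= ~(tcflag_t)OPOST;                      /* no output processing     */
    t->c_lflag &= ~(tcflag_t)(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
    t->c_cflag &= ~(tcflag_t)(CSIZE | PARENB | CSTOPB);  /* clear size, parity, 2 stop */
    t->c_cflag |= CS8 | CLOCAL | CREAD;                  /* 8 data bits, ignore carrier */
    t->c_cc[VMIN] = 1;                                   /* read() waits for one byte */
    t->c_cc[VTIME] = 0;
    if (cfsetispeed(t, B38400) != 0 || cfsetospeed(t, B38400) != 0)
        return -1;
    return 0;
}

/* Open the serial device (e.g. "/dev/ttyS0") and apply the settings.
   Returns a file descriptor, or -1 on error. */
int open_leon_console(const char *path)
{
    struct termios t;
    int fd = open(path, O_RDWR | O_NOCTTY);
    if (fd < 0)
        return -1;
    if (tcgetattr(fd, &t) != 0 ||
        serial_settings_38400_8n1(&t) != 0 ||
        tcsetattr(fd, TCSANOW, &t) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```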
<h2>4 - Libtheora running on LINUX</h2>
<p align="center"><img src="4_linux_libt.png" width="688" height="419"></p>
<p align="center">figure 4</p>
<h4>Libtheora compilation for Linux on LEON3</h4>
<p>Now you can use the original dump_video.c, because you are running Linux
and can therefore work with files.</p>
<p># Export sparc-linux PATH<br>
export PATH=/opt/sparc-linux/bin/:$PATH</p>
<p>cd libogg-1.1.3/</p>
<p># Set CROSS-Compiler and parameters<br>
export CC=sparc-linux-gcc<br>
export CXX=sparc-linux-gcc<br>
export CFLAGS='-msoft-float -fPIC -static'<br>
# -msoft-float emulate floating-point - must be used if no FPU exists in the system<br>
# -g generate debugging information - must be used for debugging with gdb<br>
# -fPIC generate position-independent machine code. It is necessary because
we are now using Linux.<br>
# -static when linking an application statically, all code used from libraries
is included in the output binary</p>
<p>#Configure and install OGG lib<br>
./configure --prefix=/homes_export/andre.lnc/theora/libtheora6_hard/ --target=sparc-linux --host=sparc-linux --enable-static</p>
<p>#Configure and make Theora for LEON (sparc)<br>
./configure --prefix=/homes_export/andre.lnc/theora/libtheora6_hard/ --target=sparc-linux --host=sparc-linux --enable-static</p>
<h4>How to run the test in figure 4</h4>
<p>After generating the binary for Linux on LEON3, copy it
to /snapgear-p33/romfs/home/ and build a Linux kernel image that includes the compiled
Theora program (dump_video). Don't forget to copy a video to /snapgear-p33/romfs/home/ as well.
Pay attention to the size of your Linux image: the SDRAM on your FPGA board needs enough
space for it.</p>
<p>Program your board with LEON3;<br>
Load the Linux image on LEON3 (using GRMON);<br>
Open your kermit interface and set the configuration;<br>
Run the Linux kernel (using GRMON);<br>
Go back to kermit and you will see a Linux console;<br>
Now go to home (cd home) and run dump_video (./dump_video video.ogg);</p>
<h2>5 - A Peripheral on LEON3</h2>
<p align="center"><img src="5_apb_hardware.png" width="600" height="373"></p>
<p align="center">figure 5</p>
<h4>AHB and APB bus</h4>
<p>AHB is a new generation of AMBA bus which is intended to address the requirements
of high-performance synthesizable designs. It is a high-performance system bus that
supports multiple bus masters and provides high-bandwidth operation.<br>
AMBA AHB implements the features required for high-performance, high clock
frequency systems.<br>
The APB is part of the AMBA hierarchy of buses and is optimized for minimal power
consumption and reduced interface complexity.
The AMBA APB appears as a local secondary bus that is encapsulated as a single AHB
slave device. APB provides a low-power extension to the system bus which
builds on AHB signals directly.<br>
The APB bridge appears as a slave module which handles the bus handshake and
control signal retiming on behalf of the local peripheral bus.</p>
<h4>AMBA Protocol</h4>
<p>You can find the details here:<br>
<a href="http://www.gaisler.com/doc/amba.pdf">http://www.gaisler.com/doc/amba.pdf</a></p>
<h4>Why the APB interface was chosen</h4>
<p>I searched theses and articles to decide where the Theora hardware would best
be placed, how to implement the communication between software and hardware over
the bus, and how to pass the data to the hardware. I found many different solutions.<br>
AHB is a high-speed bus suitable for connecting units with high data rates.
The problem is that the Theora hardware would be a master on the AHB bus and
could overload it, reducing the performance of LEON3. APB is slower
than AHB, but its protocol is simpler and it does not disturb the communication
between LEON3 and the memory controller. I also found hybrid solutions using both APB
and AHB, but I thought it was better to plug the core only into the APB bus.</p>
<h2>6 - Plugging Theora Hardware on LEON3</h2>
<p align="center"><img src="6_input_vector.png" width="707" height="457"></p>
<p align="center">figure 6</p>
<h4>How to include an APB core</h4>
<p>How to include the Theora APB core:</p>
<p>Create the directory grlib/lib/opencores/theora_hardware<br>
Add "theora_hardware" to grlib/lib/opencores/dir.txt</p>
<p>Check out revision 13432 from SVN into grlib/lib/opencores/theora_hardware/:<br>
<a href="http://svn.xiph.org/trunk/theora-fpga/">http://svn.xiph.org/trunk/theora-fpga/</a></p>
<p>You will need to rename the entity syncram to tsyncram in these modules:
syncram, expand block, loopfilter, copyrecon and databuffer. This is because syncram
is also the name of a different component in the LEON3 (GRLIB) library.</p>
<p>Now we need to create theora_hardware.vhd and theora_amba_interface.vhd:<br>
<a href="theora_hardware.vhd">theora_hardware.vhd</a> and <a href="theora_amba_interface.vhd">theora_amba_interface.vhd</a></p>
<p>Create grlib/lib/opencores/theora_hardware/vhdlsyn.txt and
list all the VHDL files in it.</p>
<p>If you prefer, you can download these files here: <a href="theora_hardware1.tar">theora_hardware1.tar</a></p>
<p>You should register the Theora Hardware APB/AMBA core (OPENCORES_THEORA_HARDWARE
under VENDOR_OPENCORES) by changing the file devices.vhd (grlib/lib/grlib/amba/):<br>
<a href="devices.vhd">devices.vhd</a></p>
<p>Finally, we need to instantiate theora_hardware in leon3mp.vhd, making sure
to use a free index of the APB slave output vector (apbo(i)):<br>
<a href="leon3mp_2.vhd">leon3mp.vhd</a></p>
<p>Before synthesis ("make quartus"), run the commands "make distclean" and "make
script" in your design directory (grlib/designs/leon3-altera-ep2s60-sdr/).</p>
<h4>Addressing protocol between the software interface and theora_amba_interface</h4>
<p>struct theora_regs_t {<br>
volatile int flag_send_data;<br>
volatile int data_transmitted;<br>
volatile int flag_read_data;<br>
volatile int data_received;<br>
};<br>
<br>
struct theora_regs_t *theora_regs = (struct theora_regs_t *)0x80000800;</p>
<p>flag_send_data (address 0x80000800): flag that tells the driver whether it can send data to the Theora hardware.<br>
data_transmitted (address 0x80000804): data transmitted to the Theora hardware.<br>
flag_read_data (address 0x80000808): flag that tells the driver whether it can receive data from the Theora hardware.<br>
data_received (address 0x8000080C): data received from the Theora hardware.</p>
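<p>The addresses in the list follow from the struct layout: each field is a 4-byte int, so the offsets are 0x0, 0x4, 0x8 and 0xC from the base. The sketch below checks this with offsetof and also shows where the base 0x80000800 would come from, assuming the usual GRLIB APB convention (APB bridge at 0x80000000, slave window selected by the 12-bit paddr generic shifted left by 8, so paddr = 8 gives 0x80000800). The helper names are hypothetical, and a 4-byte int is assumed, as on LEON3.</p>

```c
#include <stddef.h>

/* Same layout as the struct above. */
struct theora_regs_t {
    volatile int flag_send_data;    /* offset 0x0 */
    volatile int data_transmitted;  /* offset 0x4 */
    volatile int flag_read_data;    /* offset 0x8 */
    volatile int data_received;     /* offset 0xC */
};

/* Assumed GRLIB convention: a slave's APB window starts at
   bridge_base + (paddr << 8), where paddr is a 12-bit value. */
static unsigned long apb_slave_base(unsigned long bridge_base, unsigned paddr)
{
    return bridge_base + ((unsigned long)(paddr & 0xfffu) << 8);
}

/* Physical address of a register, given the window base and field offset. */
static unsigned long theora_reg_addr(unsigned long base, size_t offset)
{
    return base + (unsigned long)offset;
}
```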
<h4>How to write the software interface</h4>
<p>To send data, the software must loop until flag_send_data is '1'; then it can write the data to data_transmitted.<br>
To receive data, the software must loop until flag_read_data is '1'; then it can read the data from data_received.<br>
Below is an example of a simple program I wrote to test this communication:<br>
<a href="send_vector_of_input.c">send_vector_of_input.c</a> and <a href="input.h">input.h</a><br>
Compiling the program:</p>
<p>sparc-elf-gcc -mv8 -msoft-float -g send_vector_of_input.c -o send_vector_of_input.exe</p>
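<p>The two polling rules above can be written as a pair of C functions. This is a sketch, not the code in send_vector_of_input.c: it adds a spin limit so that a hung core produces an error instead of an endless loop, and it assumes the hardware raises each flag by itself when it is ready. On the bare-metal LEON3 the pointer would be theora_regs (0x80000800); the test uses an ordinary struct instead. The function names are hypothetical. This kind of loop belongs in bare-metal or user-level code, not inside a Linux driver.</p>

```c
struct theora_regs_t {
    volatile int flag_send_data;
    volatile int data_transmitted;
    volatile int flag_read_data;
    volatile int data_received;
};

/* Wait until the core can accept a value, then write it.
   Returns 0 on success, -1 if the flag stayed low for max_spins polls. */
int theora_send(struct theora_regs_t *regs, int value, unsigned long max_spins)
{
    while (regs->flag_send_data != 1) {
        if (max_spins-- == 0)
            return -1;
    }
    regs->data_transmitted = value;
    return 0;
}

/* Wait until the core has a result, then read it into *value.
   Returns 0 on success, -1 if the flag stayed low for max_spins polls. */
int theora_receive(struct theora_regs_t *regs, int *value, unsigned long max_spins)
{
    while (regs->flag_read_data != 1) {
        if (max_spins-- == 0)
            return -1;
    }
    *value = regs->data_received;
    return 0;
}
```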
<h4>Inputs and ReconRefFrames</h4>
<p>ReconRefFrames expects its inputs in a specific order. You can generate this input vector with a modified libtheora:<br>
<a href="libtheora-1.0alpha6-fpga.tar.gz">libtheora-1.0alpha6-fpga.tar.gz</a></p>
<h4>Theora AMBA interface</h4>
<p>The <a href="theora_amba_interface.vhd">Theora_amba_interface</a> implements the APB/AMBA peripheral that receives and transmits data between the driver and Theora_hardware, using the addressing protocol defined above and the ReconRefFrames protocol.</p>
<h2>7 - Integration Software and Hardware of Theora decoder on LEON3</h2>
<p align="center"><img src="7_linux_libt_hardware.png" width="702" height="545"></p>
<p align="center">figure 7</p>
<h4>Theora driver</h4>
<p>A driver is necessary because we are using Linux: a program running on Linux cannot write to a physical address directly, so it needs a driver.<br>
There are many tutorials on the internet about writing a character device, so I will not go into those details.
The transaction parameters between the software and the driver that I defined are these:<br>
struct _data dt;<br>
I/O control function: theora_ioctl(struct inode *inode, struct file *filp, unsigned int nFunc, unsigned long nParam)<br>
If nFunc = 0, the driver tries to read from the Theora hardware. If nFunc = 1, the driver tries to write to the Theora hardware.<br>
If a read succeeds, dt.read is set to 1; otherwise it is 0.<br>
If a write succeeds, dt.wrote is set to 1 and the data is in dt.data; otherwise dt.wrote is 0.<br>
See the Theora driver: <a href="theora.c">theora.c</a></p>
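<p>The read/write semantics above can be sketched as a single non-blocking dispatch step, the kind of logic theora_ioctl could run after copying dt from user space with copy_from_user (and before copying it back with copy_to_user). This is a hypothetical reconstruction, not the contents of theora.c: the field names read, wrote and data come from the text, but their order and types are assumptions, and so is the direction of dt.data (taken as the value to write for nFunc = 1 and the value read for nFunc = 0). It checks the hardware flag once and returns immediately, so it never busy-waits in the kernel. It is written as plain C on a register struct so it can be tested outside the kernel; a real driver would access the ioremap()ed registers with readl/writel.</p>

```c
/* Assumed layout; the real struct _data in theora.c may differ. */
struct _data {
    int read;   /* 1 if a value was read from the hardware     */
    int wrote;  /* 1 if a value was written to the hardware    */
    int data;   /* value read (nFunc 0) or value to write (nFunc 1) */
};

struct theora_regs_t {
    volatile int flag_send_data;
    volatile int data_transmitted;
    volatile int flag_read_data;
    volatile int data_received;
};

enum { THEORA_READ = 0, THEORA_WRITE = 1 };   /* nFunc values from the text */

/* One attempt, no waiting. Returns 0 if nFunc is known, -1 otherwise
   (a real ioctl handler would return an error code such as -ENOTTY). */
static int theora_dispatch(struct theora_regs_t *regs, unsigned int nFunc,
                           struct _data *dt)
{
    switch (nFunc) {
    case THEORA_READ:
        dt->read = 0;
        if (regs->flag_read_data == 1) {
            dt->data = regs->data_received;
            dt->read = 1;
        }
        return 0;
    case THEORA_WRITE:
        dt->wrote = 0;
        if (regs->flag_send_data == 1) {
            regs->data_transmitted = dt->data;
            dt->wrote = 1;
        }
        return 0;
    default:
        return -1;
    }
}
```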
<h4>How to include the driver in the Linux image</h4>
<p>Copy theora.c (the driver) to snapgear-p33/linux-2.6.21.1/drivers/char/<br><br>
Add the line "obj-$(CONFIG_THEORA) += theora.o" to snapgear-p33/linux-2.6.21.1/drivers/char/Makefile, like this: <a href="Makefile_char">Makefile</a><br><br>
Add the lines ...<br>
bool "Theora Driver"<br>
... to snapgear-p33/linux-2.6.21.1/drivers/char/Kconfig, like this: <a href="Kconfig">Kconfig</a><br><br>
Make sure you select a unique device number from snapgear-p33/linux-2.6.21.1/Documentation/devices.txt. In my case it was 121.<br>
Then add the line "DEVICES = theora,c,121,0 \" to snapgear-p33/vendors/gaisler/leon3mmu/Makefile, like this: <a href="Makefile_leon">Makefile</a>. This creates /dev/theora each time make is run.<br><br>
Now, to generate the Linux image, you just need to run "make" in the snapgear-p33 directory.<br><br>
When you boot Linux on the FPGA, you will see these lines:<br>
Loading theora ...<br>
LEON THEORA driver by Andre Costa (2007) - andre.lnc@gmail.com</p>
<p>Below I describe some errors I had with the driver and how to solve them:</p>
<p>- Unable to handle kernel paging request at virtual address 80000000:<br>
The MMU protects certain memory regions. You either bypass the MMU using
the SPARC-specific STA or LDA instructions (not recommended) or use
ioremap to inform the MMU about the new area. In my case I used ioremap.</p>
<p>- Warning: ioremap: done with statics, switching to malloc<br>
Error (running on FPGA): alloc_io_res(phys_80000800): cannot occupy<br>
The problem is that you are calling ioremap() repeatedly. You
should call it once, keep the pointer returned by ioremap, and use
it to access the hardware in the rest of the code. I was calling ioremap in ioctl(), but it should be called in theora_init().</p>
<p>- BUG: soft lockup detected on CPU#0!<br>
A soft lockup is when the kernel fails to reschedule for 10 seconds. This
implies that your driver does not yield the CPU. For example, your
read/write functions should either return immediately or sleep until
woken up by an interrupt; you may not busy-wait. I was doing the loop (waiting to receive data from theora_hardware) in the driver, but it should be done only in the modified libtheora software.</p>
<h4>How to cut libtheora and send the data to the Theora driver</h4>
<p>You need to edit dct_decode.c from libtheora. First, open the driver: pf = open("/dev/theora", O_RDWR);
The function write_theoradriver(int pf, int data) that I implemented is responsible for sending one value to the driver. All the data must be sent and received in the correct sequence. Be careful with this: if a single value is not sent or read, the whole decoding pipeline can stall. You can also receive the data back in order to compare the output.<br>
<a href="dct_decode2.c">dct_decode2.c</a><br>
<a href="codec_internal.h">codec_internal.h</a> (some small changes in this file)</p>
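<p>Because the driver should make only one attempt per call, the waiting loop lives in the modified libtheora. A possible shape for write_theoradriver and a matching read helper is sketched below. It is an assumption-based sketch, not the code in dct_decode2.c: it uses a hypothetical struct _data layout, passes nFunc as the ioctl request number as described for theora_ioctl, and calls sched_yield() between attempts so the polling process lets others run. read_theoradriver and THEORA_MAX_TRIES are hypothetical names.</p>

```c
#include <sched.h>
#include <sys/ioctl.h>

/* Assumed layout; must match the driver's struct _data. */
struct _data {
    int read;
    int wrote;
    int data;
};

#define THEORA_READ  0            /* nFunc: read from the hardware */
#define THEORA_WRITE 1            /* nFunc: write to the hardware  */
#define THEORA_MAX_TRIES 1000000UL

/* Send one value. Returns 1 when the driver reports dt.wrote == 1,
   -1 if ioctl() fails, 0 if the hardware stayed busy for every try. */
int write_theoradriver(int pf, int data)
{
    struct _data dt = {0, 0, data};
    for (unsigned long i = 0; i < THEORA_MAX_TRIES; i++) {
        if (ioctl(pf, THEORA_WRITE, &dt) < 0)
            return -1;
        if (dt.wrote == 1)
            return 1;
        sched_yield();            /* let other processes run between polls */
    }
    return 0;
}

/* Receive one value into *out, with the same return convention. */
int read_theoradriver(int pf, int *out)
{
    struct _data dt = {0, 0, 0};
    for (unsigned long i = 0; i < THEORA_MAX_TRIES; i++) {
        if (ioctl(pf, THEORA_READ, &dt) < 0)
            return -1;
        if (dt.read == 1) {
            *out = dt.data;
            return 1;
        }
        sched_yield();
    }
    return 0;
}
```

<p>In dct_decode.c each value would then go out as write_theoradriver(pf, value), and decoding should stop if the result is not 1, since a lost value stalls the pipeline.</p>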
<h2>8 - Video Controller</h2>
<p align="center"><img src="8_video_controller.png"></p>
<p align="center">figure 8</p>
<p>The controller consists of a YUV-to-RGB converter and a video signal generator that sends the signal to a D/A converter.</p>
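<p>The YUV-to-RGB stage can be modeled in C before writing it in VHDL. The sketch below uses the common integer approximation of the ITU-R BT.601 conversion for studio-range video (Y in 16..235, U and V centered on 128). The coefficients used by the actual video controller are not given on this page, so treat them as an assumption. Scaling by 256 and shifting keeps everything in integer multipliers and adders, which maps well onto FPGA hardware.</p>

```c
#include <stdint.h>

/* Clamp a rounded 8.8 fixed-point value to 0..255. Negative values are
   tested before the shift, because right-shifting a negative int is
   implementation-defined in C. */
static uint8_t clip_fixed(int v)
{
    if (v < 0)
        return 0;
    v >>= 8;
    return v > 255 ? 255 : (uint8_t)v;
}

/* BT.601 studio-range YCbCr (here Y, U, V) to 8-bit RGB. */
static void yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v,
                       uint8_t *r, uint8_t *g, uint8_t *b)
{
    int c = (int)y - 16;
    int d = (int)u - 128;
    int e = (int)v - 128;
    *r = clip_fixed(298 * c + 409 * e + 128);
    *g = clip_fixed(298 * c - 100 * d - 208 * e + 128);
    *b = clip_fixed(298 * c + 516 * d + 128);
}
```

<p>Feeding it Y = U = V = 0 gives (0, 135, 0), a medium green, which is consistent with the green band of zero-valued padding discussed in section 9.</p>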
<h4>Lancelot board</h4>
<p align="center"><a href="http://www.microtronix.com/products/?product_id=97"><img src="lancelot.jpg"></a></p>
<p>It is a video D/A converter board, and it is necessary because the Stratix II board does not have one.<br>
You should read the <a href="http://www.microtronix.com/_dat/products/files/97/lancelotusermanual.pdf">Manual</a>.</p>
<h2>9 - Integration of the Video Controller and the Theora Hardware</h2>
<p align="center"><img src="9_integration_video.png"></p>
<p align="center">figure 9</p>
<p>Leonardo Piga built a video controller and plugged it into NIOS. I then worked on plugging this video controller into my LEON-Theora integration and ran into some problems, which I describe below.</p>
<p><i>dct_decode:</i> The difference between this dct_decode.c and dct_decode2.c is that we no longer need to receive the outputs of ReconRefFrames and compare them with the software; we only need to send the predecoded data to ReconRefFrames.
In addition, we need to send the height and the width, because the video controller requests them.<br>
You can see my dct_decode here: <a href="dct_decode.c">dct_decode.c</a> and <a href="dump_video.c">dump_video.c</a> (now nothing is printed to a file; the data is transmitted to theora_amba_interface)</p>
<p><i>Hierarchy of the modules:</i> Now we have theora_hardware, which contains ReconRefFrames and the video controller. Some adaptations were necessary (theora_apb.vhd, theora_amba_interface.vhd, theora_hardware.vhd ...). You can download all these modules here: <a href="theora_apb.tar">theora_apb.tar</a></p>
<p><i>Pins of Lancelot:</i> You will need to connect all the Lancelot pins to the LEON system. My new leon3mp is <a href="leon3mp.vhd">leon3mp.vhd</a>, and my pin assignment file is <a href="leon3mp.qsf">leon3mp.qsf</a></p>
<h4>Memory problems</h4>
<p>At first, this system (LEON + Theora_Hardware + Video Controller) used a lot of internal memory, and I could not fit it into my FPGA. There were some bugs in the video controller, and a buffer that large was not necessary, but when I changed the size (to a 96x80 video), the system did not run. These bugs are now solved, but I can only decode a 96x80 video. This will be solved in the future, when the external memory is implemented.</p>
<h4>Cross-clock domain</h4>
<p>I had some difficulty plugging it into LEON because of hardware constraints. The video controller runs at 25 MHz, but the LEON system runs at 50 MHz. A simple clock divider was not enough, because during synthesis the timing analysis reported cross-clock-domain problems: the video controller (25 MHz) needs to receive data from a 50 MHz module, and this was causing clock skew problems. The solution was simple: I changed some parameters of the LEON system's PLL. The PLL (phase-locked loop) is basically a closed-loop frequency control system that generates the LEON and SDRAM clocks with adjusted phases, and I added a new clock there with the correct parameters, in /grlib/lib/techmap/clocks/clkgen_altera_mf.vhd:<br>
<a href="clkgen_altera_mf.vhd">clkgen_altera_mf.vhd</a></p>
<h4>A band of 8 green pixels below the video</h4>
<p>dump_video adds a band of 8 green pixels below the video: if I run a 96x72 video, I get a 96x80 video. Something like:<br>
Ogg logical stream 583c6ca0 is Theora 96x80 29.97 fps video<br>
Encoded frame content is 96x72 with 0x0 offset</p>
<p>Theora encodes the frame in whole 16x16 macroblocks, so both the width
and height must be multiples of 16. When the actual video content is
not a multiple of 16, it is expanded to one and a clipping rectangle is
stored in the header (that is the "Encoded frame content..." message).
dump_video does not crop the output down to the actual size of this
rectangle, but outputs the entire expanded frame. By default the encoder
stores zeros in this part of the frame, which is why it looks green.</p>
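<p>The relationship between the two sizes in the log can be computed directly: each dimension is rounded up to the next multiple of 16, so 96x72 becomes 96x80. The sketch below shows that rounding and a simple crop that copies only the visible rectangle out of a padded plane, which is the step dump_video skips. The helper names are hypothetical; it assumes the offsets are already expressed in the buffer's own row order (Theora's picture-offset convention should be checked against the Theora specification before reusing this), and for 4:2:0 chroma planes the width, height and offsets would be halved.</p>

```c
#include <string.h>

/* Round a frame dimension up to a whole number of 16x16 macroblocks. */
static unsigned theora_padded(unsigned n)
{
    return (n + 15u) & ~15u;
}

/* Copy the visible w x h rectangle at (x, y) out of a padded plane whose
   rows are stride bytes apart into a tightly packed dst buffer. */
static void crop_plane(const unsigned char *src, unsigned stride,
                       unsigned x, unsigned y, unsigned w, unsigned h,
                       unsigned char *dst)
{
    for (unsigned row = 0; row < h; row++)
        memcpy(dst + (size_t)row * w, src + (size_t)(y + row) * stride + x, w);
}
```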
<h4>ffmpeg2theora</h4>
<p><a href="http://ffmpeg.mplayerhq.hu/">Here</a> you can find ffmpeg2theora, which lets you change the resolution, the start and end points, and several other very useful settings that you will certainly need when testing videos.</p>
<h4>The current point</h4>
<p>I recorded a demonstration of this integration up to the video controller, and it is on YouTube.
<a href="http://www.youtube.com/watch?v=tZSsz1b28rA">Click here to see the video</a></p>
<center><object width="425" height="350"> <param name="movie" value="http://www.youtube.com/v/tZSsz1b28rA"> </param> <embed src="http://www.youtube.com/v/tZSsz1b28rA" type="application/x-shockwave-flash" width="425" height="350"> </embed> </object></center>
<p>This video shows the sequence:<br><br>
- Program the board (Stratix II) with LEON3 (via USB Blaster, JTAG);<br><br>
- Load the Linux image on LEON3 (using GRMON via USB Blaster, JTAG). The unknown device is the Theora hardware; the name "Theora Hardware" is not shown because I am using an evaluation version of GRMON;<br><br>
- Open the kermit interface and set the configuration (via the serial interface);<br><br>
- Run the Linux kernel (using GRMON via USB Blaster, JTAG);<br><br>
- Go back to kermit and you will see a Linux console (via the serial interface). Here you can see the "LEON Theora Driver" being recognized by Linux;<br><br>
- Now I go to the home directory and run dump_video (./dump_video ronaldinho9680.ogg);</p>
<p>My current FPGA programming file: <a href="leon3mp.sof">leon3mp.sof</a><br>
My current Linux kernel image (with the Theora driver and dump_video included and compiled): <a href="image.dsu">image.dsu</a></p>
<p>There are basically two problems:</p>
<p>On NIOS, a video ran very slowly, almost 7 times slower than real time. On my LEON system it is still slow, but only 5 times slower, so the performance is a little better than on NIOS. In the last few days I have been debugging the flow to discover what I can do to increase the speed.
The APB/AMBA bus timing is OK. I took some measurements, and with the <a href="http://www.students.ic.unicamp.br/~ra031198/theora_hardware/7_linux_libt_hardware.png">old pipeline</a> the decoding takes only about half of the available time (this includes the software decoding the first part, sending it to the hardware, and reading back the output): a 15-second video is decoded in 7 seconds. But if I plug in the video controller, it takes 75 seconds (5 times real time). I am trying to fix this problem.<br>
There is still another problem: the image is good, but there are some small purple dots in it. Leonardo told me that he is working on this problem.<br>
The video size is small because the buffer multiplexed with an external memory (SRAM) was not implemented, so we can only use the few blocks of internal FPGA memory.<br><br>
Despite these problems, I think the most important thing is that we now have complete Theora decoding on an FPGA, with no NIOS or any other proprietary module: you put an .ogg video on Linux and see it on a monitor.</p>
<p>Make a tarball of libtheora<br>
Make a tarball of the Quartus project (STILL NEEDS TESTING)<br>
Add the Quartus archive</p>
<h2>10 - Memory Controller for our Multiplexed Memory</h2>
<p align="center"><img src="10_memory_controller.png" width="423" height="168"></p>
<p align="center">figure 10</p>
<p>NOT IMPLEMENTED</p>
<h2>11 - Full Integration</h2>
<p align="center"><img src="11_integration_full.png"></p>
<p align="center">figure 11</p>
<h4>Final considerations</h4>
<p>[to complete]</p>
<h4>Timing analysis</h4>
<p>[to complete]</p>