1 <html>
2 <body>
3 <h1>This Document</h1>
4 This is a snapshot of the zpu/zpu/docs/zpu_arch.html document in CVS.
5 <p>
6 Several of the links will only work if you have checked out the zpu/zpu tree from opencores CVS. See <a href="#download">Download</a> below.
7 <h1>Index</h1>
8 <ul>
9 <li> <a href="#introduction">Introduction</a>
10 <ul>
11 <li> <a href="#license">License</a>
12 <li> <a href="#features">Features</a>
13 <li> <a href="#status">Status</a>
14 <li> <a href="#download">Download</a>
15 <li> <a href="#patch">Creating a patch</a>
16 <li> <a href="#mailinglist">Getting help - mailing list</a>
17 </ul>
18 <li> <a href="#architecture">Core Architecture</a>
19 <ul>
20 <li> <a href="#instructionset">Instruction set</a>
21 <li> <a href="#interrupts">Interrupts</a>
22 <li> <a href="#startup">Startup code (aka crt0.s)</a>
23 <li> <a href="#vectors">Jump vectors</a>
24 </ul>
25 <li> <a href="#implementations">Core Implementations</a>
26 <ul>
27 <li> <a href="#performance">Performance Summary</a>
28 <li> <a href="#zpu4_small">zpu4 small</a>
29 <li> <a href="#zpu4_medium">zpu4 medium</a>
30 <li> <a href="#alzpu_pipe">alzpu pipelined</a>
31 <li> <a href="#zealot">Zealot medium and small</a>
32 <li> <a href="#zy2000">ZY2000 SOC</a>
33 <li> <a href="#verilogwip">Un-named verilog translation</a>
34 <li> <a href="#implementing">Implementing your own ZPU</a>
35 </ul>
36 <li> <a href="#refdesign">Reference Designs</a>
37 <ul>
38 <li> <a href="#ref_min">SOC - Minimal (core+RAM)</a>
39 <li> <a href="#ref_basic">SOC - Basic (core+RAM+UART)</a>
40 <li> <a href="#ref_soc">SOC - Board (core+RAM+Wishbone+++)</a>
41 <li> <a href="#rams">Common - RAM models</a>
42 <li> <a href="#wishbone">Common - Wishbone</a>
43 <li> <a href="#uart">Common - UART</a>
44 <li> <a href="#spicontroller">Common - SPI flash controller</a>
45 </ul>
46 <li> <a href="#tools">Working with tools and core</a>
47 <ul>
48 <li> <a href="#setuplinux">Setup - Linux toolchain</a>
49 <li> <a href="#setupcygwin">Setup - Cygwin toolchain</a>
50 <li> <a href="#gcc2ram">GCC to RAM</a>
51 <li> <a href="#hdlsim">HDL simulation (ZPU4)</a>
52 <li> <a href="#gdbsim">GDB simulation (ZPU4)</a>
53 <li> <a href="#simulator">Instruction Set Simulator</a>
54 </ul>
55 <li> <a href="#misc">Miscellaneous</a>
56 <ul>
57 <li> <a href="#tuning">Speeding up the ZPU</a>
58 <li> <a href="#codesize">Optimizing for code size</a>
59 <li> <a href="#ecos">Installing eCos build tools</a>
60 <li> <a href="#memorymap">Memory map</a>
61 </ul>
62 <li> <a href="#todo">TODO</a>
63 <ul>
64 <li> <a href="#todolist">TODO list</a>
65 <li> <a href="#repository">Repository Re-org</a>
66 <li> <a href="#nextgen">Next generation ZPU</a>
67 <li> <a href="#float">Floating point support</a>
68 </ul>
69 </ul>
71 <hr> <!-- +++++++++++++++++++++++++++++++++++++++++++++++++++++ -->
73 <a name="introduction"/>
74 <h1>Introduction</h1>
<P>The world's smallest 32 bit CPU with GCC toolchain.
<P>The ZPU is a small CPU in two ways: it uses very few resources and
the architecture itself is small. The latter can be important when learning
about CPU architectures and implementing variations of the ZPU where
aspects of CPU design are examined. In academia, students can learn VHDL
and CPU architecture in general, and complete exercises in the course of a year.</P>
81 <P>
The current ZPU instruction set and architecture have not changed for
the last couple of years and can be considered quite stable. There is
a lot of discussion about various modifications to the ZPU architecture
on the zylin-zpu mailing list, but currently no actual modifications are
planned, as the improvements that have been identified are relatively
slight (&lt;30% performance/size improvement).
88 </P>
89 <P>
90 There are a handful of implementations of the ZPU. Most of these usually
91 have some strong points and there is some movement in the direction of
92 consolidating improvements into a few officially recommended ZPU
93 implementations.
94 </P>
95 <P>
For those who are interested in the Zylin ZPU, I recommend joining
the zylin-zpu mailing list and participating in the discussion
there. The zylin-zpu list is a friendly place where people with different
skills (hardware, software, tools) meet to exchange ideas about the ZPU
and microprocessor architecture in general.
101 </P>
103 <P>Sincerely,</P>
104 <P>&Oslash;yvind Harboe <BR>Zylin AS
105 </P>
107 <a name="license"/>
108 <h2>License</h2>
109 <P>The project includes HDL, GCC toolchain and eCos HAL.
<P>The ZPU has a BSD license for the HDL and GPL for the rest.
This allows users to implement any version of the ZPU they want in
commercial products, but if improvements are made to the architecture
as such, they need to be contributed back.
115 </P>
<P>As of January 1, 2008, Zylin holds the copyright for the ZPU, i.e. Zylin
is free to decide that the ZPU shall have a BSD license for the HDL and GPL
for the rest.</P>
121 <a name="features"/>
122 <h2>Features</h2>
123 <UL>
124 <LI>Small size: (See <a href="#implementations">performance summary</a>)
125 <LI>Code size 80% of ARM Thumb
<LI>GCC toolchain (GDB, newlib, libstdc++)
127 <LI>eCos embedded operating system support
128 </UL>
130 <a name="status"/>
131 <h2>Status</h2>
132 <UL>
133 <LI>HDL works
134 <LI>GCC toolchain works
135 <LI>eCos HAL works
136 </UL>
137 <P>... but there is a long <a href="#todo">TODO</a> list</P>
138 <P>Expect churn as we converge onto a shorter list of <a href="#implementations">implementations</a>.
140 <a name="download"/>
141 <h2>Download source code</h2>
The ZPU HDL source code is available as a GIT repository from <a href="http://repo.or.cz/w/zpu.git" target="_blank">http://repo.or.cz/w/zpu.git</a>.
You can download the latest source code as a snapshot without installing GIT.
145 Previously the ZPU repository was hosted as a CVS repository at www.opencores.org,
146 but that ZPU CVS repository is there only for historical reference at this point.
147 Once www.opencores.org grows a GIT hosting service, the plan is to replicate
148 the GIT repository there.
151 The GCC ZPU toolchain is available from <a href ="http://repo.or.cz/w/zpugcc.git" target ="_blank">http://repo.or.cz/w/zpugcc.git</a>. The ZPU GCC toolchain is BIG (over 100 MBytes).
152 <a name="patch"/>
153 <h2>GIT</h2>
154 For more advanced use of GIT, you will need to hit the books and read up
155 on the GIT documentation.
156 <p/>
157 That said, you can ask "silly" newbie questions about GIT on the <a href="#mailinglist">zylin-zpu mailing
158 list</a> and you should receive some friendly prodding in the right direction
159 w.r.t. finding reading material.
160 <a name="mailinglist"/>
161 <h2>Getting help - mailing list</h2>
<P>The place to get help is the <a href="http://www.zylin.com/mailinglist.html">zylin-zpu mailing list</a>.
The ZPU is an open source project, and if you demonstrate that you have
made an effort to read the documentation and search the web, you will
normally get some help from this list when you ask clear questions.
169 <hr> <!-- +++++++++++++++++++++++++++++++++++++++++++++++++++++ -->
172 <a name="architecture"/>
173 <h1>Architecture</h1>
174 The ZPU is a zero operand, or stack based CPU. The opcodes have a fixed width of 8 bits.
176 Example:
178 <div style="white-space:pre;background-color:#dddddd;">
179 <code style="white-space:pre;background-color:#dddddd;">
180 IM 5 ; push 5 onto the stack
181 LOADSP 20 ; push value at memory location SP+20
ADD ; pop 2 values off the stack and push their sum
183 </code>
184 </div>
185 As can be seen, a lot of information is packed into the 8 bits, e.g. the IM instruction pushes a 7 bit signed integer onto the stack.
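<p>As a rough illustration of how much the 8 bit IM opcode carries, the following C sketch models only the top-of-stack value and the IDIM flag, following the description of IM in the <a href="#instructionset">instruction set</a> table. The names <code>im_state</code> and <code>im_step</code> are made up for this example and do not appear in the ZPU sources; the stack pointer update done by the first IM of a sequence is deliberately left out.</p>

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model: only the top-of-stack value and the IDIM flag. */
typedef struct {
    uint32_t tos;   /* value on top of the stack */
    bool     idim;  /* set while a run of IM opcodes is in progress */
} im_state;

/* Apply one IM opcode (bit 7 set, immediate in bits 6..0). */
void im_step(im_state *s, uint8_t opcode)
{
    uint32_t imm = opcode & 0x7fu;

    if (!s->idim) {
        /* First IM of a run: sign-extend the 7 bit immediate to 32 bits. */
        s->tos = (imm & 0x40u) ? (imm | 0xffffff80u) : imm;
    } else {
        /* Following IM: shift left by 7 and fill in the low 7 bits. */
        s->tos = (s->tos << 7) | imm;
    }
    s->idim = true;
}
```

<p>In this model, feeding the opcodes 0x81 and 0x80 in sequence leaves 128 on top of the stack, i.e. two IM opcodes build a 14 bit constant.</p>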
187 The choice of opcodes is intimately tied to the GCC toolchain capabilities.
189 <div style="white-space:pre;background-color:#dddddd;">
190 <code style="white-space:pre;background-color:#dddddd;">
/* simple program showing some interesting qualities of the ZPU toolchain */
void bar(int);
int j;
void foo(int a, int b, int c)
{
    a++;
    b+=a;
    j=c;
    bar(b);
}

foo:
    loadsp 4    ; a is at memory location SP+4
    im 1
    add
    loadsp 12   ; b is now at memory location SP+12
    add
    loadsp 16   ; c is now at memory location SP+16
    im 24       ; «j» is at absolute memory location 24.
                ; Notice how the ZPU toolchain is using link-time relaxation
                ; to squeeze the address into a single IM opcode
    store
    im 22       ; the fn bar is at address 22
    call
    im 12
    return      ; 12 bytes of arguments + return from fn
218 </div>
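<p>The benefit of squeezing an address into as few IM opcodes as possible can be estimated with a small helper. This is only a sketch, assuming that each IM opcode contributes 7 bits and that the first one is sign extended, as described in the <a href="#instructionset">instruction set</a> table; <code>im_count</code> is a hypothetical name and is not part of the toolchain.</p>

```c
#include <stdint.h>

/* Number of consecutive IM opcodes needed to push a signed 32 bit value. */
int im_count(int32_t value)
{
    int n = 1;

    /* n IM opcodes carry 7*n bits, sign-extended from the top bit.
     * Five opcodes carry 35 bits, which covers every 32 bit value. */
    while (n < 5) {
        int bits = 7 * n;
        int64_t lo = -((int64_t)1 << (bits - 1));
        int64_t hi = ((int64_t)1 << (bits - 1)) - 1;

        if (value >= lo && value <= hi)
            break;
        n++;
    }
    return n;
}
```

<p>Under these assumptions the addresses 24 and 22 used in the listing above each fit into a single IM opcode, whereas an address of 64 or more would need at least two.</p>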
220 <a name="instructionset"/>
221 <h2>Instruction set</h2>
222 <p>A base set of instructions must be implemented in RTL, but the rest may be implemented as RTL or as microcode. This allows a tradeoff of core size vs code size and performance.
223 <p>The instructions that may be implemented in RTL or microcode are referred to as emulated instructions. The microcode is in crt0.s. The <a href="#implementations">implementation</a> determines which instructions run as microcode.
224 <p>All operations are 32 bit wide.
225 <p>TODO Is the table broken? Fix it.
227 <table border="1">
228 <tr><td>Name</td><td>Opcode</td><td>Description</td><td>Definition</td></tr>
229 <tr>
230 <td>
231 BREAKPOINT
232 </td>
233 <td>
234 00000000
235 </td>
236 <td>
237 The debugger sets a memory location to this value to set a breakpoint. Once a JTAG-like
238 debugger interface is added, it will be convenient to be able to distinguish
239 between a breakpoint and an illegal(possibly emulated) instruction.
240 </td>
241 <td>
242 No effect on registers
243 </td>
244 </tr>
245 <tr>
<td>
IM
</td>
249 <td>
250 1xxx xxxx
251 </td>
252 <td>
Pushes a 7 bit sign-extended integer and sets the «instruction decode interrupt mask» flag (IDIM).
254 <p>
255 If the IDIM flag is already set, this instruction shifts the value on the stack left by 7 bits and stores the 7 bit immediate value into the lower 7 bits.
256 <p>
257 Unless an instruction is listed as treating the IDIM flag specially, it should be assumed to clear the IDIM flag.
258 <p>
259 To push a 14 bit integer onto the stack, use two consecutive IM instructions.
260 <p>
261 If multiple immediate integers are to be pushed onto the stack, they must be interleaved with another instruction, typically NOP.
262 </td>
263 <td>
264 <code style="white-space:pre;">
265 pc <= pc + 1 <br>
266 idim <= 1 <br>
267 if (idim=0) then <br>
268 sp <= sp - 1; <br>
269 for i in wordSize-1 downto 7 loop <br>
270 mem(sp)(i) <= opcode(6) <br>
271 end loop <br>
272 mem(sp)(6 downto 0) <= opcode(6 downto 0) <br>
273 else <br>
274 mem(sp)(wordSize-1 downto 7) <= mem(sp)(wordSize-8 downto 0) <br>
275 mem(sp)(6 downto 0) <= opcode(6 downto 0) <br>
276 end if
277 </code>
279 </td>
280 </tr>
281 <tr>
282 <td>
283 STORESP
284 </td>
285 <td>
286 010x xxxx
287 </td>
288 <td>
289 Pop value off stack and store it in the SP+xxxxx*4 memory location, where xxxxx is a positive integer.
290 </td>
291 <td>
292 </td>
293 </tr>
294 <tr>
295 <td>
296 LOADSP
297 </td>
298 <td>
299 011x xxxx
300 </td>
301 <td>
302 Push value of memory location SP+xxxxx*4, where xxxxx is a positive integer, onto stack.
303 </td>
304 <td>
306 </td>
307 </tr>
308 <tr>
309 <td>
310 ADDSP
311 </td>
312 <td>
313 0001 xxxx
314 </td>
315 <td>
316 Add value of memory location SP+xxxx*4 to value on top of stack.
317 </td>
318 <td>
320 </td>
321 </tr>
322 <tr>
323 <td>
324 EMULATE
325 </td>
326 <td>
327 001x xxxx
328 </td>
329 <td>
Push PC to stack and set PC to 0x0+xxxxx*32. This is used to emulate opcodes. See
zpupkg.vhd for the list of emulated opcode values used. zpu_core.vhd contains
reference implementations of these instructions, rather than letting the ZPU execute the EMULATE instruction.
<p>
One way to improve performance of the ZPU is to implement some of
the EMULATE instructions in hardware.
337 </td>
338 <td>
340 </td>
341 </tr>
342 <tr>
343 <td>
344 PUSHPC
345 </td>
346 <td>
347 emulated
348 </td>
349 <td>
350 Pushes program counter onto the stack.
351 </td>
352 <td>
354 </td>
355 </tr>
356 <tr>
357 <td>
358 POPPC
359 </td>
360 <td>
361 0000 0100
362 </td>
363 <td>
364 Pops address off stack and sets PC
365 </td>
366 <td>
368 </td>
369 </tr>
370 <tr>
371 <td>
372 LOAD
373 </td>
374 <td>
375 0000 1000
376 </td>
377 <td>
378 Pops address stored on stack and loads the value of that address onto stack.
Bits 0 and 1 of the address are always treated as 0 (i.e. ignored) by
the HDL implementations, and C code is guaranteed by the programming
model never to use a 32 bit LOAD on non-32 bit aligned addresses (i.e.
if a program does this, then it has a bug).
384 </td>
385 <td>
387 </td>
388 </tr>
389 <tr>
390 <td>
391 STORE
392 </td>
393 <td>
394 0000 1100
395 </td>
396 <td>
397 Pops address, then value from stack and stores the value into the memory location of the address.
Bits 0 and 1 of the address are always treated as 0.
400 </td>
401 <td>
403 </td>
404 </tr>
405 <tr>
406 <td>
407 PUSHSP
408 </td>
409 <td>
410 0000 0010
411 </td>
412 <td>
413 Pushes stack pointer.
414 </td>
415 <td>
417 </td>
418 </tr>
419 <tr>
420 <td>
421 POPSP
422 </td>
423 <td>
424 0000 1101
425 </td>
426 <td>
427 Pops value off top of stack and sets SP to that value. Used to allocate/deallocate space on stack for variables or when changing threads.
428 </td>
429 <td>
431 </td>
432 </tr>
433 <tr>
<td>
ADD
</td>
<td>
0000 0101
</td>
<td>
Pops two values off the stack, adds them and pushes the result.
442 </td>
443 <td>
445 </td>
446 </tr>
447 <tr>
<td>
AND
</td>
<td>
0000 0110
</td>
<td>
Pops two values off the stack, performs a bitwise AND and pushes the result onto the stack.
456 </td>
457 <td>
459 </td>
460 </tr>
461 <tr>
<td>
OR
</td>
<td>
0000 0111
</td>
<td>
Pops two values off the stack, performs a bitwise OR and pushes the result.
470 </td>
471 <td>
473 </td>
474 </tr>
475 <tr>
<td>
NOT
</td>
479 <td>
480 0000 1001
481 </td>
482 <td>
483 Bitwise inverse of value on stack
485 </td>
486 <td>
488 </td>
489 </tr>
490 <tr>
491 <td>
492 FLIP
493 </td>
494 <td>
495 0000 1010
496 </td>
497 <td>
498 Reverses the bit order of the value on the stack, i.e. abc->cba, 100->001, 110->011, etc.
500 The raison d'etre for this instruction is mainly to emulate other instructions.
501 </td>
502 <td>
504 </td>
505 </tr>
506 <tr>
<td>
NOP
</td>
510 <td>
511 0000 1011
512 </td>
513 <td>
514 No operation, clears IDIM flag as side effect, i.e. used between two
515 consecutive IM instructions to push two values onto the stack.
516 </td>
517 <td>
519 </td>
520 </tr>
521 <tr>
522 <td>
523 PUSHSPADD
524 </td>
525 <td>
527 </td>
528 <td>
529 a=sp; <br>
530 b=popIntStack()*4;<br>
531 pushIntStack(a+b);<br>
532 </td>
533 <td>
535 </td>
536 </tr>
538 <tr>
539 <td>
540 POPPCREL
541 </td>
542 <td>
544 </td>
545 <td>
546 setPc(popIntStack()+getPc());
547 </td>
548 <td>
550 </td>
551 </tr>
552 <tr>
<td>
SUB
</td>
556 <td>
558 </td>
559 <td>
560 int a=popIntStack();<br>
561 int b=popIntStack();<br>
562 pushIntStack(b-a);<br>
563 </td>
564 <td>
566 </td>
567 </tr>
568 <tr>
<td>
XOR
</td>
572 <td>
574 </td>
575 <td>
576 pushIntStack(popIntStack() ^ popIntStack());
577 </td>
578 <td>
580 </td>
581 </tr>
582 <tr>
583 <td>
584 LOADB
585 </td>
586 <td>
588 </td>
589 <td>
590 8 bit load instruction. Really only here for compatibility with
591 C programming model. Also it has a big impact on DMIPS test.
593 pushIntStack(cpuReadByte(popIntStack())&0xff);
594 </td>
595 <td>
597 </td>
598 </tr>
599 <tr>
600 <td>
601 STOREB
602 </td>
603 <td>
605 </td>
606 <td>
607 8 bit store instruction. Really only here for compatibility with
608 C programming model. Also it has a big impact on DMIPS test.
610 addr = popIntStack();<br>
611 val = popIntStack();<br>
612 cpuWriteByte(addr, val);
613 </td>
614 <td>
616 </td>
617 </tr>
618 <tr>
619 <td>
620 LOADH
621 </td>
622 <td>
624 </td>
625 <td>
627 16 bit load instruction. Really only here for compatibility with
628 C programming model.
631 pushIntStack(cpuReadWord(popIntStack()));
632 </td>
633 <td>
635 </td>
636 </tr>
637 <tr>
638 <td>
639 STOREH
640 </td>
641 <td>
643 </td>
644 <td>
645 16 bit store instruction. Really only here for compatibility with
646 C programming model.
648 addr = popIntStack();<br>
649 val = popIntStack();<br>
650 cpuWriteWord(addr, val);
651 </td>
652 <td>
654 </td>
655 </tr>
656 <tr>
657 <td>
658 LESSTHAN
659 </td>
660 <td>
662 </td>
663 <td>
664 Signed comparison<br>
665 a = popIntStack();<br>
666 b = popIntStack();<br>
667 pushIntStack((a < b) ? 1 : 0);<br>
668 </td>
669 <td>
671 </td>
672 </tr>
673 <tr>
674 <td>
675 LESSTHANOREQUAL
676 </td>
677 <td>
679 </td>
680 <td>
681 Signed comparison<br>
682 a = popIntStack();<br>
683 b = popIntStack();<br>
684 pushIntStack((a <= b) ? 1 : 0);
685 </td>
686 <td>
688 </td>
689 </tr>
690 <tr>
691 <td>
692 ULESSTHAN
693 </td>
694 <td>
696 </td>
697 <td>
698 Unsigned comparison<br>
699 long a;//long is here 64 bit signed integer<br>
700 long b;<br>
701 a = ((long) popIntStack()) & INTMASK; // INTMASK is unsigned 0x00000000ffffffff<br>
702 b = ((long) popIntStack()) & INTMASK;<br>
703 pushIntStack((a < b) ? 1 : 0);
704 </td>
705 <td>
707 </td>
708 </tr>
709 <tr>
710 <td>
711 ULESSTHANOREQUAL
712 </td>
713 <td>
715 </td>
716 <td>
717 Unsigned comparison<br>
718 long a;//long is here 64 bit signed integer<br>
719 long b;<br>
720 a = ((long) popIntStack()) & INTMASK; // INTMASK is unsigned 0x00000000ffffffff<br>
721 b = ((long) popIntStack()) & INTMASK;<br>
722 pushIntStack((a <= b) ? 1 : 0);
723 </td>
724 <td>
726 </td>
727 </tr>
728 <tr>
729 <td>
730 EQBRANCH
731 </td>
732 <td>
734 </td>
735 <td>
736 int compare;<br>
737 int target;<br>
738 target = popIntStack() + pc;<br>
739 compare = popIntStack();<br>
740 if (compare == 0)<br>
741 {<br>
742 setPc(target);<br>
743 } else<br>
744 {<br>
setPc(pc + 1);<br>
}<br>
747 </td>
748 <td>
750 </td>
751 </tr>
752 <tr>
753 <td>
754 NEQBRANCH
755 </td>
756 <td>
758 </td>
759 <td>
760 int compare;<br>
761 int target;<br>
762 target = popIntStack() + pc;<br>
763 compare = popIntStack();<br>
764 if (compare != 0)<br>
765 {<br>
766 setPc(target);<br>
767 } else<br>
768 {<br>
769 setPc(pc + 1);<br>
770 }<br>
771 </td>
772 <td>
774 </td>
775 </tr>
776 <tr>
777 <td>
778 MULT
779 </td>
780 <td>
782 </td>
783 <td>
784 Signed 32 bit multiply <br>
785 pushIntStack(popIntStack() * popIntStack());
786 </td>
787 <td>
789 </td>
790 </tr>
791 <tr>
<td>
DIV
</td>
795 <td>
797 </td>
798 <td>
799 Signed 32 bit integer divide.<br>
800 a = popIntStack();<br>
801 b = popIntStack();<br>
802 if (b == 0)<br>
803 {<br>
// undefined<br>
}<br>
pushIntStack(a / b);<br>
807 </td>
808 <td>
810 </td>
811 </tr>
812 <tr>
<td>
MOD
</td>
816 <td>
818 </td>
819 <td>
820 Signed 32 bit integer modulo.<br>
821 a = popIntStack(); <br>
822 b = popIntStack();<br>
823 if (b == 0)<br>
824 {<br>
825 // undefined <br>
826 }<br>
827 pushIntStack(a % b); <br>
828 </td>
829 <td>
831 </td>
832 </tr>
833 <tr>
834 <td>
835 LSHIFTRIGHT
836 </td>
837 <td>
839 </td>
840 <td>
841 unsigned shift right.<br>
842 long shift;<br>
843 long valX;<br>
844 int t;<br>
845 shift = ((long) popIntStack()) & INTMASK;<br>
846 valX = ((long) popIntStack()) & INTMASK;<br>
847 t = (int) (valX >> (shift & 0x3f));<br>
848 pushIntStack(t);<br>
849 </td>
850 <td>
852 </td>
853 </tr>
854 <tr>
855 <td>
856 ASHIFTLEFT
857 </td>
858 <td>
860 </td>
861 <td>
862 arithmetic(signed) shift left.<br>
864 long shift;<br>
865 long valX;<br>
866 shift = ((long) popIntStack()) & INTMASK;<br>
867 valX = ((long) popIntStack()) & INTMASK;<br>
868 int t = (int) (valX << (shift & 0x3f));<br>
869 pushIntStack(t);<br>
870 </td>
871 <td>
873 </td>
874 </tr>
875 <tr>
876 <td>
877 ASHIFTRIGHT
878 </td>
879 <td>
881 </td>
882 <td>
arithmetic (signed) shift right.<br>
884 long shift;<br>
885 int valX;<br>
886 shift = ((long) popIntStack()) & INTMASK;<br>
887 valX = popIntStack();<br>
888 int t = valX >> (shift & 0x3f);<br>
889 pushIntStack(t);<br>
891 </td>
892 <td>
894 </td>
895 </tr>
897 <tr>
898 <td>
899 CALL
900 </td>
901 <td>
903 </td>
904 <td>
905 call procedure.<br>
906 <br>
907 int address = pop();<br>
908 push(pc + 1);<br>
909 setPc(address); <br>
910 </td>
911 <td>
913 </td>
914 </tr>
915 <tr>
916 <td>
917 CALLPCREL
918 </td>
919 <td>
921 </td>
922 <td>
923 call procedure pc relative<br>
924 <br>
925 int address = pop();<br>
926 push(pc + 1);<br>
927 setPc(address+pc); </td>
928 <td>
930 </td>
931 </tr>
934 <tr>
<td>
EQ
</td>
<td>
</td>
<td>
pushIntStack((popIntStack() == popIntStack()) ? 1 : 0);
</td>
<td>
</td>
945 </tr>
946 <tr>
<td>
NEQ
</td>
<td>
</td>
<td>
pushIntStack((popIntStack() != popIntStack()) ? 1 : 0);
</td>
<td>
</td>
957 </tr>
958 <tr>
<td>
NEG
</td>
<td>
</td>
<td>
pushIntStack(-popIntStack());
</td>
<td>
</td>
969 </tr>
972 </table>
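<p>The pseudo code in the table uses 64 bit signed integers to express unsigned operations. As a sketch only, the same behaviour for a few rows can be written in C with fixed-width unsigned types. The function names are invented for this example; the first argument stands for the value popped first and the second for the value popped second.</p>

```c
#include <stdint.h>

/* ULESSTHAN: a is popped first, b second; pushes 1 when a < b (unsigned). */
uint32_t zpu_ulessthan(uint32_t a, uint32_t b)
{
    return (a < b) ? 1u : 0u;
}

/* LESSTHAN: the same comparison on signed values. */
uint32_t zpu_lessthan(int32_t a, int32_t b)
{
    return (a < b) ? 1u : 0u;
}

/* LSHIFTRIGHT: the shift count is masked to 6 bits as in the table.
 * The value is widened to 64 bits so that counts 32..63 are well
 * defined in C and yield 0. */
uint32_t zpu_lshiftright(uint32_t shift, uint32_t value)
{
    return (uint32_t)((uint64_t)value >> (shift & 0x3fu));
}

/* LOAD/STORE: bits 0 and 1 of the address are ignored. */
uint32_t zpu_word_address(uint32_t addr)
{
    return addr & ~(uint32_t)3u;
}
```

<p>Widening to 64 bits before shifting mirrors the table's use of <code>long</code>; shifting a 32 bit value by 32 or more would otherwise be undefined in C.</p>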
974 <a name="interrupts"/>
975 <h2>Interrupts</h2>
976 The ZPU supports interrupts.
To trigger an interrupt, the interrupt signal must be asserted. The ZPU does
not define any interrupt disabling mechanism; this must be implemented by the
interrupt controller and controlled via memory mapped IO.
982 Interrupts are masked when the IDIM flag is set, i.e.
983 with consecutive IM instructions.
985 The ZPU has an edge triggered interrupt. As the ZPU notices that the interrupt
986 is asserted, it will execute the interrupt instruction. The interrupt signal
987 must stay asserted until the ZPU acknowledges it.
989 When the interrupt instruction is executed, the PC will be pushed onto the
990 stack and the PC will be set to the interrupt vector address (0x20).
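<p>The interrupt entry described above can be sketched in C as follows. This is a behavioural model under stated assumptions, not the HDL: <code>zpu_model</code>, <code>zpu_try_interrupt</code> and the tiny 64 word RAM are hypothetical, and the stack is assumed to grow towards lower addresses, as in the IM definition in the instruction set table.</p>

```c
#include <stdint.h>
#include <stdbool.h>

#define ZPU_INTERRUPT_VECTOR 0x20u

/* Hypothetical, heavily simplified CPU state. */
typedef struct {
    uint32_t pc;
    uint32_t sp;       /* byte address, stack grows downwards */
    bool     idim;     /* set in the middle of an IM sequence */
    uint32_t mem[64];  /* tiny word-addressed RAM, for illustration only */
} zpu_model;

/* Returns true if the interrupt was taken. */
bool zpu_try_interrupt(zpu_model *z, bool irq_asserted)
{
    if (!irq_asserted || z->idim)
        return false;              /* not asserted, or masked by IDIM */

    z->sp -= 4;
    z->mem[z->sp / 4] = z->pc;     /* push the PC onto the stack */
    z->pc = ZPU_INTERRUPT_VECTOR;  /* continue at the interrupt vector */
    return true;
}
```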
Note that the GCC compiler requires four registers r0,r1,r2,r3 for some
rather uncommon operations. These 32 bit registers are mapped to memory locations 0x0,
0x4, 0x8, 0xc. The default interrupt vector at address 0x20 will load the
value of these memory locations onto the stack, call _zpu_interrupt and
restore them.
998 See <a href="../hdl/zpu4/test/interrupt/">zpu/hdl/zpu4/test/interrupt/</a> for C code and <a href ="../hdl/example/simzpu_interrupt.do">zpu/hdl/example/simzpu_interrupt.do</a>
999 for simulation example.
1001 <a name="startup"/>
1002 <h2>Custom startup code (aka crt0.s)</h2>
1003 To minimize the size of an application, one important trick is to
1004 strip down the startup code. The startup code contains microcode for emulation
1005 of instructions that may never be used by a particular application, or are made redundant because the instructions are implemented in RTL.
1007 The startup code is found in the GCC source code under gcc/libgloss/zpu,
1008 but to make the startup code more available, it has been duplicated
1009 into <a href="../sw/startup">zpu/sw/startup</a>
1011 On the <a href="#todo">TODO</a> list is work to make it easier to reduce code size.
1013 TODO is the following actually useful? if not remove or elaborate.
1015 To minimize startup size, see <a href="../roadshow/roadshow/codesize/">codesize</a>
1016 demo. This is pretty standard GCC stuff and simple enough once you've
1017 been over it a couple of times.
1020 <a name="vectors"/>
1021 <h3>Vectors</h3>
1022 <table border="1">
1023 <tr><td>Address</td><td>Name</td><td>Description</td></tr>
1024 <tr>
1025 <td>0x000</td>
1026 <td>Reset</td>
1027 <td>
1028 1.When the ZPU boots, this is the first instruction to be executed.
1029 <br>
1030 2.The stack pointer is initialised to maximum RAM address
1031 </td>
1032 </tr>
1033 <tr>
1034 <td>0x020</td>
1035 <td>Interrupt</td>
1036 <td>
1037 This is the entry point for interrupts.
1038 </td>
1039 </tr>
1040 <tr>
1041 <td>0x040-</td>
1042 <td>Emulated instructions</td>
1043 <td>
Entry point for emulated opcode 34; the entry point of each following emulated opcode is 32 bytes further on. Note that opcode 32 and opcode 33 are not normally used to emulate instructions, as their memory addresses are already used by the boot vector, GCC registers and the interrupt vector.
1045 </td>
1046 </tr>
1047 </table>
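<p>The relation between an EMULATE opcode and its entry in the vector table follows from the EMULATE row of the instruction set table (PC is set to 0x0+xxxxx*32). A minimal sketch in C, with the hypothetical helper name <code>zpu_emulate_vector</code>:</p>

```c
#include <stdint.h>

/* Returns the vector address for an EMULATE opcode (001x xxxx),
 * or -1 if the opcode does not belong to the EMULATE group. */
int32_t zpu_emulate_vector(uint8_t opcode)
{
    if ((opcode & 0xe0u) != 0x20u)
        return -1;
    return (int32_t)(opcode & 0x1fu) * 32;
}
```

<p>With this mapping opcode 34 lands on 0x040, while opcodes 32 and 33 would land on 0x000 and 0x020, the reset and interrupt vectors.</p>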
1049 <hr> <!-- +++++++++++++++++++++++++++++++++++++++++++++++++++++ -->
1051 <a name="implementations"/>
1052 <h1>Core Implementations</h1>
zpu4 (superseding zpu3) is original work by &Oslash;yvind Harboe. All other implementations derive from zpu4.
High on the <a href="#todo">TODO</a> list is to reduce the number of implementations, taking the best from each. For example, interrupts are not universally implemented, IO naming is inconsistent and memory architectures differ.
Ultimately we should try to get closer to the opencores coding standard. You can find the document in the opencores cvsroot/common.
For now, if you are starting a design, zpu4 or zealot are probably the safest choices. zealot offers more customization through generics, but lacks interrupts. zpu4 gets more attention. Take your pick.
1061 <a name="performance"/>
1062 <h2>Performance Summary</h2>
1064 <a href="#todo">TODO</a> fill in performance table for Altera.
1066 Tests are done with the <a href="#zealot">Zealot</a>
1067 SoC-System and Xilinx ISE 12.2 with standard settings.
For the MachXO2 device, Lattice Diamond 3.1 with Synplify Pro I-2013.09L was used. fmax values are in MHz.
1070 <TABLE WIDTH=604 BORDER=1 BORDERCOLOR="#000000" CELLPADDING=7 CELLSPACING=0 STYLE="page-break-after: avoid">
1071 <TR VALIGN=TOP>
1072 <TD WIDTH=85> <P><B>CORE/Config</B></P> </TD>
1073 <TD WIDTH=85> <P><B>Spartan-3</B></P> </TD>
1074 <TD WIDTH=85> <P><B>Spartan-3E</B></P> </TD>
1075 <TD WIDTH=85> <P><B>Spartan-6</B></P> </TD>
1076 <TD WIDTH=85> <P><B>Virtex-5</B></P> </TD>
1077 <TD WIDTH=85> <P><B>MachXO2</B></P> </TD>
1078 <TD WIDTH=85> <P><B>DMIPS</B></P> </TD>
1079 </TR>
1081 <TR VALIGN=TOP>
1082 <TD WIDTH=85> <P>
1083 zpu4 small
1084 maxAddrBit=16
1085 </P> </TD>
1086 <TD WIDTH=85> <PRE>
1087 <!-- Spartan-3 -->
1088 591 LUT
1089 389 REG
1090 0 MULT18x18
1091 16 BRAM
1092 90 fmax
1093 </PRE> </TD>
1094 <TD WIDTH=85> <PRE>
1095 <!-- Spartan-3E -->
1096 626 LUT
1097 389 REG
1098 0 MULT18x18
1099 16 BRAM
1100 100 fmax
1101 </PRE> </TD>
1102 <TD WIDTH=85> <PRE>
1103 <!-- Spartan-6 -->
1104 639 LUT
1105 372 REG
1106 0 MULT18x18
1107 16 BRAM
1108 100 fmax
1109 </PRE> </TD>
1110 <TD WIDTH=85> <PRE>
1111 <!-- Virtex-5 -->
1112 561 LUT
1113 391 REG
1114 0 MULT18x18
1115 8 BRAM (RAMB36)
1116 175 fmax
1117 </PRE> </TD>
1118 <TD WIDTH=85> <PRE>
1119 <!-- MachXO2 -->
1120 886 LUT4
1121 459 REG
1123 4 EBR
1124 75 fmax
1125 </PRE> </TD>
1126 <TD WIDTH=85> <!-- DMIPS --> <P>0.5</P> </TD>
1127 </TR>
1129 <TR VALIGN=TOP> <TD WIDTH=85> <P>zpu4 medium</P> </TD>
1130 <TD WIDTH=85> <PRE>
1131 <!-- Spartan-3 -->
1132 1760 LUT
1133 514 REG
1134 3 MULT18x18
1135 16 BRAM (RAMB16)
1136 75 fmax
1137 </PRE> </TD>
1138 <TD WIDTH=85> <PRE>
1139 <!-- Spartan-3E -->
1140 1754 LUT
1141 509 REG
1142 3 MULT18x18
1143 16 BRAM (RAMB16)
1144 75 fmax
1145 </PRE> </TD>
1146 <TD WIDTH=85> <PRE>
1147 <!-- Spartan-6 -->
1148 1162 LUT
1149 481 REG
1150 3 MULT (DSP48A1)
1151 16 BRAM (RAMB16)
1152 80 fmax
1153 </PRE> </TD>
1154 <TD WIDTH=85> <PRE>
1155 <!-- Virtex-5 -->
1156 1299 LUT
1157 490 REG
1158 3 MULT (DSP48E)
1159 8 BRAM (RAMB36)
1160 125 fmax
1161 </PRE> </TD>
1162 <TD WIDTH=85> <PRE>
1163 <!-- MachXO2 -->
1164 2429 LUT4
1165 755 REG
1167 4 EBR
1168 65 fmax
1169 </PRE> </TD>
1170 <TD WIDTH=85><!-- DMIPS --><P>2.6</P> </TD>
1171 </TR>
1173 </TABLE>
1175 <a name="zpu4_small"/>
1176 <h2>zpu4 small</h2>
1177 Found in <a href="../hdl/zpu4/core/zpu_core_small.vhd">zpu/zpu/hdl/zpu4/core/zpu_core_small.vhd</a>
1179 The small ZPU4 implements the minimum instruction set. It is optimized for size and simplicity
1180 serving as a reference in both regards.
1182 It uses a RAM (dual port RAM w/read/write to both ports) as data & code storage and
1183 is implemented as a simple state machine.
Essentially it has four states:
1186 <ol>
1187 <li>Fetch - starts fetch of next instruction
1188 <li>FetchNext - sets up operands for execute cycle
1189 <li>Decode - decodes instruction
1190 <li>Execute - well.. executes instruction
1191 </ol>
The tricky bit is that there is a tiny bit of interleaving of
states since the BRAM takes a cycle to perform a fetch/store. The above are the
normal states the ZPU cycles through unless memory fetches, jumps, etc. take
place.
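<p>The normal cycle through the states listed above can be written down as a small C sketch. It only models the straight-line order Fetch, FetchNext, Decode, Execute and assumes that Execute returns to Fetch; the interleaving and the extra states for memory fetches and jumps are not modelled. The enum and function names are invented for this example and do not match the VHDL identifiers.</p>

```c
/* Hypothetical names for the four normal states of the small core. */
typedef enum {
    ZPU_FETCH,       /* starts fetch of next instruction */
    ZPU_FETCH_NEXT,  /* sets up operands for execute cycle */
    ZPU_DECODE,      /* decodes instruction */
    ZPU_EXECUTE      /* executes instruction */
} zpu_state;

/* Next state in the normal, uninterrupted cycle. */
zpu_state zpu_next_state(zpu_state s)
{
    switch (s) {
    case ZPU_FETCH:      return ZPU_FETCH_NEXT;
    case ZPU_FETCH_NEXT: return ZPU_DECODE;
    case ZPU_DECODE:     return ZPU_EXECUTE;
    case ZPU_EXECUTE:    return ZPU_FETCH;
    }
    return ZPU_FETCH;    /* not reached for valid states */
}
```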
1197 <a name="zpu4_medium"/>
1198 <h2>zpu4 medium</h2>
1199 Found in <a href="../hdl/zpu4/core/zpu_core.vhd">zpu/zpu/hdl/zpu4/core/zpu_core.vhd</a>
1201 The medium ZPU4 has a single port memory interface. All data, code and IO is
1202 accessed through this memory interface.
It performs better (despite having less memory bandwidth than zpu_core_small.vhd)
since it implements many more instructions in hardware.
1207 <a name="alzpu_pipe"/>
1208 <h2>Alvaro's pipelined ZPU</h2>
All the rage on the mailing list. TBA.
1211 <a name="zealot"/>
1212 <h2>Zealot</h2>
1213 Small found in <a href="../hdl/zealot/zpu_small.vhdl">zpu/zpu/hdl/zealot/zpu_small.vhdl</a>
1215 Medium found in <a href="../hdl/zealot/zpu_medium.vhdl">zpu/zpu/hdl/zealot/zpu_medium.vhdl</a>
1217 README found in <a href="../hdl/zealot/0README.txt">zpu/zpu/hdl/zealot/0README.txt</a>
1219 The Zealot version of ZPU was contributed by Salvador E. Tropea.
1221 The key features are:
1224 <ul>
1225 <li>Includes a very basic <a href="#memorymap">PHI I/O</a> synthesizable core.
It implements the 64 bit clock counter (timer), GPIO and the UART. This is enough
1227 to run the DMIPS benchmark and a hello world application. I tested the UART
1228 @ 9600 bps and @ 115200 bps.</li>
1229 <li>The ZPU can be customized using generics. It allows the use of more
1230 than one core in the same project without problems.</li>
1231 <li>Implements the lshiftright instruction in hardware, this gives around
1232 10% boost in the DMIPS benchmark (Medium version).</li>
<li>You can disable various instruction groups and leave them to the
emulation software, so you can experiment with various LUTs vs DMIPS
configurations (Medium version).</li>
<li>The medium version provides approx. 2.6 DMIPS @ 50 MHz and the small
0.5 DMIPS @ 50 MHz.</li>
1238 <li>Enhanced trace module, it includes the assembler for the executed
1239 instruction and can also measure how much stack was consumed during the
1240 execution.</li>
1241 <li>Includes ready to use memory images for a hello world program and the
1242 DMIPS benchmark.</li>
1243 <li>Memory and trace blocks outside ZPU. This provides better modularity.</li>
1244 <li>Much better documented code than the original version.</li>
1245 </ul>
1247 Simulation and implementation files are provided. You need 16 kB of BRAMs
1248 for the "hello world" example and 32 kB for the DMIPS benchmark. The medium
1249 version takes around 1030 slices and 3 multipliers and the small version
1250 around 430 slices.<p>
1252 The generics for the Zealot Medium ZPU are:<p>
1254 <ul>
1255 <li><b>WORD_SIZE</b> (integer:=32) Data width, only 32 bits are really
1256 tested/supported. Adding support for 16 bits should be simple, but the
1257 toolchain needs to support it.</li>
1258 <li><b>ADDR_W</b> (integer:=16) Address bus width memory+I/O space. The MSB
1259 selects the address space (1=I/O).</li>
1260 <li><b>MEM_W</b> (integer:=15) Memory address bus width. It includes program,
1261 data and stack sections.</li>
<li><b>D_CARE_VAL</b> (std_logic:='X') Value used to fill the unused bits.
For simulations this should be '0', for synthesis this is a value that your
tools interpret as "don't care". Xilinx tools could benefit from using
'X'. This is particularly true when assigning default values and for unreached
cases. Note that I didn't find it useful.</li>
1267 <li><b>MULT_PIPE</b> (boolean:=false) Enables the multiplication pipeline.
1268 This can allow faster clocks but will make the mult instruction slower (more
1269 clocks consumed).</li>
<li><b>BINOP_PIPE</b> (integer range 0 to 2:=0) Enables the pipeline for
the -, =, &lt; and &lt;= operations. This can allow faster clocks but will
make these instructions slower (more clocks consumed). The value is the
number of extra clocks added.</li>
1274 <li><b>ENA_LEVEL0</b> (boolean:=true) Enables the hardware implementation of
1275 eq, neqbranch, loadb and pushspadd instructions.</li>
1276 <li><b>ENA_LEVEL1</b> (boolean:=true) Enables the hardware implementation of
1277 lessthan, ulessthan, mult, storeb, callpcrel and sub instructions.</li>
1278 <li><b>ENA_LEVEL2</b> (boolean:=false) Enables the hardware implementation of
1279 lessthanorequal, ulessthanorequal, call and poppcrel instructions.</li>
1280 <li><b>ENA_LSHR</b> (boolean:=true) Enables the hardware implementation of
1281 lshiftright instruction.</li>
1282 <li><b>ENA_IDLE</b> (boolean:=false) Enables the enable_i usage. This signal
1283 can hold the CPU in an idle state if after reset this signal remains active.
1284 When disabled the enable_i signal isn't used and the idle state is removed.</li>
<li><b>FAST_FETCH</b> (boolean:=true) This version of the ZPU fetches 4
instructions at once (32 bits), then they are decoded (2 cycles) and finally
1287 executed. The decoded instructions are stored in a "decode cache", the first
1288 instruction is immediately moved to the "current instruction" register and a
1289 "special instruction" replaces the first slot. This "special instruction"
1290 makes the CPU go to the fetch state. When you enable this generic the FSM
1291 does the fetch instead of waiting one clock cycle to go to the fetch state.
1292 This makes instructions run a little bit faster, but it can cost area and/or
1293 frequency.</li>
1294 </ul>
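<p>The relationship between ADDR_W and MEM_W can be sketched as a small address decoder. This is a minimal model, not the VHDL: it assumes byte addresses, uses the default generic values listed above, and assumes that memory addresses wrap within MEM_W bits. The function name <code>decode</code> is hypothetical.</p>

```python
ADDR_W = 16  # default from the generic list: memory + I/O address width
MEM_W = 15   # default from the generic list: memory address width


def decode(addr, addr_w=ADDR_W, mem_w=MEM_W):
    """Return ("io" or "mem", offset) for an address on the ADDR_W-bit bus."""
    if not 0 <= addr < (1 << addr_w):
        raise ValueError("address does not fit in ADDR_W bits")
    msb = (addr >> (addr_w - 1)) & 1  # the MSB selects the address space
    if msb:
        # I/O space: strip the select bit to get the offset
        return "io", addr & ((1 << (addr_w - 1)) - 1)
    # Memory space: assumed to wrap within MEM_W bits
    return "mem", addr & ((1 << mem_w) - 1)


print(decode(0x0000), decode(0x7FFF), decode(0x8000))
print("memory bytes:", 1 << MEM_W)
```

<p>With the defaults, the memory space is 2^15 bytes, which matches the 32 kB needed for the DMIPS benchmark.</p>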
1297 <a name="zy2000"/>
1298 <h2>ZY2000</h2>
Found in <a href="../hdl/zy2000/zpu_core.vhd">zpu/zpu/hdl/zy2000/zpu_core.vhd</a>.
Modified version of zpu4 medium for use with a wishbone bridge.
1302 The ZY2000 is a complete implementation including: ZPU, DRAM, soft-MAC, wishbone bridges, GPIO subsystem, etc. This also included an eCos HAL w/TCP/IP support.
1304 <a name="verilogwip"/>
1305 <h2>Verilog translation</h2>
Found in <a href="../../wip/ZPU_CORE/src/zpu_core.v">zpu/wip/ZPU_CORE/src/zpu_core.v</a>.
The Verilog version of the ZPU (zpu4) was contributed by Jurij Kostasenko. No one appears to be maintaining it, but it should be a useful starting point for further work. There are some useful scripts there.
1310 <a name="implementing"/>
1311 <h2>Implementing your own ZPU</h2>
One of the neat things about the ZPU is that the instruction set and architecture
are very small, so it is easy to implement a ZPU from scratch or modify the
existing ZPU implementations.
Implementing a ZPU can be done without understanding the toolchain in
detail: HDL skills and a rudimentary
understanding of standard GCC/GDB usage are sufficient.
1320 A few tips:
1321 <ul>
<li>Run zpu_core.vhd or zpu_core_small.vhd and generate an instruction trace
from ModelSim or similar. To check that your own implementation is correct,
verify that the instruction traces for the new and old
ZPU implementations match. This gives you a simple way to do regression
tests as you develop your ZPU.
1327 <li>To improve performance, you can add more instructions. The EMULATE instructions
1328 are optional in HDL since they will be emulated in software if they are not
1329 implemented in HDL. This allows you to run the ZPU executables unmodified
1330 regardless of which EMULATE instructions you implement.
1331 <li>Run the DMIPS test to measure your overall performance
1332 <li>Run the histogram.perl script on the instruction trace to generate
1333 histograms of the instructions. Profiling is essential to making
1334 the right choices w.r.t. optimization for your application.
1335 </ul>
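<p>The trace-matching tip above can be sketched as a line-by-line comparison. This is a minimal sketch that assumes the two traces are plain text files whose lines are directly comparable once surrounding whitespace is stripped; the real trace format may need extra normalization. The function names and the file names in the usage note are hypothetical.</p>

```python
from itertools import zip_longest


def first_mismatch(ref_lines, new_lines):
    """Return (line_number, ref, new) for the first differing line, or None."""
    pairs = zip_longest(ref_lines, new_lines)
    for number, (ref, new) in enumerate(pairs, start=1):
        # A missing line on either side (None) also counts as a mismatch.
        if ref is None or new is None or ref.strip() != new.strip():
            return number, ref, new
    return None


def compare_trace_files(ref_path, new_path):
    """Compare two trace files and return the first mismatch, or None."""
    with open(ref_path) as ref, open(new_path) as new:
        return first_mismatch(ref, new)
```

<p>For example, <code>compare_trace_files("old_trace.txt", "new_trace.txt")</code> returns <code>None</code> when the traces agree, which makes it easy to call from a regression script.</p>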
1337 <hr> <!-- +++++++++++++++++++++++++++++++++++++++++++++++++++++ -->
1340 <a name="refdesign"/>
1341 <h1>Reference Designs</h1>
1342 The zpu core is independent of IO and memory architecture. Here are three levels of reference designs a user can refer to in order to get started in their own design, regardless of chosen core.
1344 TODO converge on a single IO structure for core implementations.
1346 TODO re-org CVS to make it easy to keep appropriate SW, RTL(verilog and VHDL) , scripts, verification stuff together.
1349 <a name="ref_min"/>
1350 <h2>Minimal (core+RAM)</h2>
The minimum design is a zpu core with true dual port RAMs attached. This is handy for size/fmax trials in a particular FPGA, and maybe HDL regression. It may not be a very useful starting point, unless you can DMA all your IO.
1353 TODO provide FPGA scripts.
1355 TODO provide HDL regression environment.
1357 <a name="ref_basic"/>
1358 <h2>Basic (core+RAM+UART+Timer)</h2>
The minimum design required for hello_world and DMIPS applications. Requires more RAM and a UART (or something similar) for stdio. This is handy as a starting point for a new user's design, to run DMIPS evaluation, and maybe for HDL regression.
1361 TODO provide FPGA scripts.
1363 TODO provide HDL regression environment.
1365 <a name="ref_soc"/>
1366 <h2>SOC (core+RAM+Wishbone+++)</h2>
Large design(s) for one or more chosen eval boards. Features are dictated by the board and available IP.
1369 <a name="rams"/>
1370 <h2>Common - RAM models</h2>
Single (1RW), simple dual (1R+1W), true dual (1RW+1RW), and Xilinx distributed dual (1RW+1R) RAM models. Parameterized depth/width, and loadable from file. The goal is for the ROM to be independent of the Verilog/VHDL implementation of the RAM.
1373 TODO RAM model contribution needed. What is in opencore/common is not adequate.
1375 <a name="wishbone"/>
1376 <h2>Common - Wishbone</h2>
In <a href="../hdl/wishbone" target="_blank">hdl/wishbone</a> there is an implementation
of a wishbone bridge. It was designed to work with <a href="#zy2000">ZY2000</a>.
1380 TODO make wishbone bridge re-usable with all cores
1382 <a name="uart"/>
1383 <h2>Common - UART</h2>
All self-respecting embedded projects should have a debug channel
to print stuff to. Typically this is a standard RS232 or UART, but
it can also be something more exotic like a DCC JTAG channel.
The point is that characters (bytes) are sent to/from the ZPU
via some terminal.
1392 The ZPU defines in the memory map a UART / debug channel. This
1393 should be implemented by some suitable debug channel for
1394 the device in which the ZPU is implemented.
1396 www.opencores.org has several UART implementations. This is one
1397 of the simpler ones:
1399 <a href="http://www.opencores.org/projects.cgi/web/uart/overview">
1400 http://www.opencores.org/projects.cgi/web/uart/overview</a>
1401 <h3>Implementing your own UART / debug channel</h3>
1402 The first thing you need to do is to choose a debug channel for your
1403 hardware. This could be a UART, but it doesn't have to be.
Secondly, you should write a small HDL module that interfaces between
the ZPU's memory-mapped debug channel and the UART. This should
be relatively simple, as all you need to do is let the ZPU
query the in/out FIFO busy flags and read/write
data to the UART via the memory map.
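<p>From the software side, the busy-flag polling described above can be modeled as follows. This is a sketch, not driver code: the address and bit positions are taken from the TX entry of the Phi memory map later in this document (0x080A000C, bit 8 = TX buffer ready, bits 7:0 = TX byte), while <code>FakeBus</code>, <code>putchar</code> and <code>puts</code> are hypothetical names standing in for real memory-mapped access.</p>

```python
UART_TX = 0x080A000C  # TX register address from the Phi memory map
TX_READY = 1 << 8     # bit 8: 1 = TX buffer ready, 0 = not ready (full)


class FakeBus:
    """Hypothetical stand-in for memory-mapped I/O; not part of the ZPU tools."""

    def __init__(self):
        self.sent = bytearray()

    def read(self, addr):
        # This model is always ready; real hardware clears bit 8 while full.
        return TX_READY if addr == UART_TX else 0

    def write(self, addr, value):
        if addr == UART_TX:
            self.sent.append(value & 0xFF)  # only bits 7:0 carry the byte


def putchar(bus, byte):
    """Wait for room in the TX buffer, then write one byte."""
    while not bus.read(UART_TX) & TX_READY:
        pass
    bus.write(UART_TX, byte & 0xFF)


def puts(bus, text):
    """Send an ASCII string one byte at a time."""
    for byte in text.encode("ascii"):
        putchar(bus, byte)


bus = FakeBus()
puts(bus, "Hello world!")
print(bytes(bus.sent))
```

<p>The HDL module only needs to expose the ready flag on reads and accept the byte on writes for this loop to work.</p>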
1412 TODO explicit example with UART from opencores in the above ref designs.
1414 <!-- SPI controller -->
<a name="spicontroller"/>
<h2>SPI flash controller (read-only)</h2>
This is a simple read-only SPI flash controller, with the following characteristics:
<ul>
<li>Fast-READ only implementation.</li>
<li>32-bit only access.</li>
<li>Fast sequential read access - Uses low-clock approach.</li>
</ul>
<h3>Version</h3>
The current version is 1.2. This is also the first public version available.
<h3>Timing overview</h3>
<p>Simple timing overview, with one nonsequential access to address 0x0, followed by a sequential access to address 0x4.
This is a post-route simulation done with Xilinx tools, using a ZPU to access the SPI.</p>
<div>
<img src="images/spi_timing_overview.png">
<p>Image 1: Timing overview</p>
</div>
In Image 2, you can see the clock almost perfectly centered on the data when we write to the SPI flash.
1440 <div>
1441 <img src="images/spi_readfast_timing.png">
1442 <p>Image 2: Issuing commands to the SPI</p>
1443 </div>
1445 As you can see from Image 3, I assume the worst-case read delay from SPI (which is 15ns, as you can see from the marker).
1447 <div>
1448 <img src="images/spi_read_timing.png">
1449 <p>Image 3: Reading from the SPI</p>
1450 </div>
1452 <h3>Usage</h3>
1454 Simple description of SPI controller interface:
1456 <table border="1">
1457 <tr>
1458 <th>Symbol</th>
1459 <th>Direction</th>
1460 <th>Bit width</th>
1461 <th>Purpose</th>
1462 </tr>
1463 <tr><td>adr</td><td>Input</td><td>24</td><td>Address where to read from SPI</td></tr>
1464 <tr><td>dat_o</td><td>Output</td><td>32</td><td>Data read from SPI</td></tr>
1465 <tr><td>clk</td><td>Input</td><td>1</td><td>Input clock. Used for both interface and SPI</td></tr>
1466 <tr><td>ce</td><td>Input</td><td>1</td><td>Chip Enable</td></tr>
1467 <tr><td>rst</td><td>Input</td><td>1</td><td>Asynchronous reset</td></tr>
1468 <tr><td>ack</td><td>Output</td><td>1</td><td>Data valid ACK</td></tr>
1469 <tr><td>SPI_CLK</td><td>Output</td><td>1</td><td>SPI output clock</td></tr>
1470 <tr><td>SPI_MOSI</td><td>Output</td><td>1</td><td>SPI output data from controller to chip</td></tr>
1471 <tr><td>SPI_MISO</td><td>Input</td><td>1</td><td>SPI input data from chip to controller</td></tr>
1472 <tr><td>SPI_SELN</td><td>Output</td><td>1</td><td>SPI nSEL (deselect, active low) signal</td></tr>
1473 </table>
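<p>To make the 24-bit <code>adr</code> input concrete, here is a sketch of the byte sequence a Fast-READ access typically shifts out on SPI_MOSI. It assumes the framing used by many common SPI NOR flash parts (opcode 0x0B, three address bytes MSB first, one dummy byte); whether spi_controller.v uses exactly this framing is an assumption, so check the Verilog source and your flash datasheet. The function name is hypothetical.</p>

```python
FAST_READ = 0x0B  # Fast Read opcode on many SPI NOR flash parts (assumed here)


def fast_read_frame(adr):
    """Bytes sent on MOSI before the flash starts returning data on MISO."""
    if not 0 <= adr < (1 << 24):
        raise ValueError("adr must fit in 24 bits")
    return bytes([
        FAST_READ,
        (adr >> 16) & 0xFF,  # address bits 23:16
        (adr >> 8) & 0xFF,   # address bits 15:8
        adr & 0xFF,          # address bits 7:0
        0x00,                # dummy byte required by Fast Read
    ])


frame = fast_read_frame(0x000004)
print(frame.hex(), "->", len(frame) * 8, "clocks before data")
```

<p>Under these assumptions a nonsequential access costs 40 SPI clocks of command overhead before the 32 data bits arrive, which is why sequential reads that skip the command phase are much faster.</p>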
1475 <h3>License</h3>
1476 The Verilog implementation is released under BSD license. See the file itself for more licensing details.
<h3>Download</h3>
1479 Download the Verilog code here: <a href="/files/electronics/spi/spi_controller.v">spi_controller.v</a>
1481 <h3>Troubleshooting</h3>
The current implementation is timed and optimized for my own setup. Your parameters might not be the same
as my defaults, so read the code carefully. If you run into any issues, let me know.
1488 <hr> <!-- +++++++++++++++++++++++++++++++++++++++++++++++++++++ -->
1490 <a name="tools"/>
1491 <h1>Working with the tools and core</h1>
1492 TODO discussion of tools needed and choose some to be supported by project. Need to deal with cygwin vs linux, VHDL vs verilog, open vs closed.... plus language support in simulators is sometimes lacking.
1494 Xilinx ISE webpack is available for windows and linux
1495 <br>
1496 Altera Quartus web edition is windows only.
1497 <br>
1498 Lattice ispLEVER starter edition is windows only.
1500 None appear to come with a standalone simulator anymore. Not sure if any built in simulators are worth looking at... never have been in the past.
1503 Popular Simulation tools for this kind of project: Modelsim, GHDL, veriwell, cver, icarus, gtkwave... others?
1506 <a name="setuplinux"/>
1507 <h2>Setup - Linux toolchain</h2>
1508 You will need Java installed to run the simulator and some other stuff.
1510 TODO setup.sh script needs to detect linux/cygwin, and should have install path option.
1511 <pre>
1512 $ cd zpu/zpu/sw # path as appropriate
1513 $ sh setup.sh # untars the tool chain to ... TODO
$ . env.sh # puts the tools in your path
1515 </pre>
1517 <a name="setupcygwin"/>
1518 <h2>Setup - Cygwin toolchain</h2>
Install <a href="http://www.cygwin.com">Cygwin</a>.
1520 You will need Java installed to run the simulator and some other stuff.
1521 <pre>
1522 $ cd zpu/zpu/sw # path as appropriate
1523 $ sh setup.sh # unzips the tool chain to /tmp/zpu/install/bin
$ . env.sh # puts the tools in your path
1525 </pre>
1527 <a name="gcc2ram"/>
1528 <h2>GCC to RAM</h2>
1529 TODO some of this is generic, some is zpu4 specific. Should move to refdesign section when ref designs exist.
The instructions are stored big endian. That is, the first instruction is stored in the most significant byte of each 32-bit word, and the fourth is in the least significant byte.
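<p>A minimal sketch of this byte ordering, assuming four 8-bit opcodes per 32-bit memory word. The opcode values below are arbitrary placeholders and the function names are hypothetical.</p>

```python
import struct


def pack_instructions(opcodes):
    """Pack four 8-bit opcodes into one 32-bit word, first opcode in the MSB."""
    if len(opcodes) != 4:
        raise ValueError("expected exactly four opcodes")
    (word,) = struct.unpack(">I", bytes(opcodes))  # ">I" = big-endian 32-bit
    return word


def unpack_instructions(word):
    """Split a 32-bit word back into four opcodes in execution order."""
    return list(struct.pack(">I", word))


word = pack_instructions([0x01, 0x02, 0x03, 0x04])
print(hex(word), unpack_instructions(word))
```

<p>The same ordering applies when a raw binary image is split into 32-bit words for BRAM initialization.</p>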
1533 <h3>Generating VHDL BRAM initialization </h3>
1534 <pre>
1535 $ zpu-elf-objcopy -O binary hello.elf hello.bin
1536 $ java -classpath ../simulator/zpusim.jar com.zylin.zpu.simulator.tools.MakeRam hello.bin &gt;hello.bram
1537 </pre>
1538 <h3>Build another test application for example simulation</h3>
1539 Here is how to build a rom image for an application using the
1540 zpu/example simulation files.
1541 <pre>
1542 $ cd zpu/roadshow/roadshow/dhrystone
1543 $ sh build.sh
1544 $ cd zpu/hdl/example
1545 $ gcc zpuromgen.c
1546 $ ./a
1547 Usage: ./a binary_file
1548 $ ./a ../../roadshow/roadshow/dhrystone/dhrystone.bin >app.txt
1549 </pre>
1550 Copy and paste app.txt into helloworld.vhd.
1553 TODO need to merge following with above.
The ZPU comes with a standard GCC toolchain and an instruction set simulator. This allows compiling, running &amp; debugging simple test programs. The simulator has
some very basic peripherals defined: counter, timer interrupt and a debug output port.
1559 <h3>Hello world example</h3>
The ZPU toolchain comes with newlib &amp; libstdc++ support, which means that many C/C++ programs can be compiled without modification.
1561 <p>
1562 <pre>
1563 $ cd zpu/sw/helloworld
1564 $ zpu-elf-gcc -Os -phi hello.c -o hello.elf -Wl,--relax -Wl,--gc-sections
1565 or ? TODO which one
1566 $ zpu-elf-gcc -phi hello.c -o hello.elf
1567 $ zpu-elf-size hello.elf
1568 </pre>
1571 <a name="hdlsim"/>
1572 <h2>HDL simulation (ZPU4)</h2>
1573 TODO some of this is generic, some is zpu4 specific. Should move to refdesign section when ref design exists.
New users will also find scripts in the zealot area that may be useful.
1577 You'll find a working simulation script in hdl/example/simzpu_small.do and hdl/example_medium/simzpu_medium.do, which
1578 show simulation of the small(zpu_core_small.vhd) and medium sized ZPU(zpu_core.vhd). hdl/example/simzpu_interrupt.do
1579 shows use of interrupts.
1581 When implementing the ZPU, copy the following files and modify them to your needs:
1582 <ol>
1583 <li>hdl/example/zpu_config.vhd - set up RAM size here
1584 <li>hdl/example/helloworld.vhd - dual port BRAM implementation.
1585 </ol>
1586 Obviously you must also connect the ZPU to the rest of your IO subsystem. IO is memory mapped(read/write) in the ZPU.
1588 <h3>Running example simulation</h3>
1589 The hdl/example directory has a simulation written for Xilinx WebPack ModelSim. From the ModelSim command prompt:
1590 <ol>
1591 <li>cd c:/&lt;installfolder&gt;/hdl/example
1592 <li>do zpusim_small.do
1593 </ol>
1595 After running the hello world simulation (see zpusim.do), two files are written to the hdl/example directory:
1596 <ol>
1597 <li>log.txt - contains the "Hello world!" text written to the debug channel/simplified UART.
1598 <li>trace.txt - a trace file for the CPU. The instruction set simulator has the capability of taking
1599 this file as input in order to verify that the HDL implementation matches the instruction set simulator.
1600 When a mismatch is found, the GDB debugger will break. Very handy for debugging custom ZPU implementations.
1601 </ol>
1604 <a name="gdbsim"/>
1605 <h2>GDB simulation</h2>
1606 <ol>
1607 <li>cd zpu/sw/helloworld
<li>Launch the simulator from a separate bash shell:<p>
1609 java -classpath ../simulator/zpusim.jar -Xmx512m com.zylin.zpu.simulator.Phi 4444
1611 <img src="images/zpusim.PNG" border=0>
1612 <li>Launch GDB:<p>
1613 ../install/bin/zpu-elf-gdb hello.elf
1614 <li>Connect to target, load and run application:<p>
<pre>
(gdb) target remote localhost:4444
(gdb) load
(gdb) continue
</pre>
1621 <img src="images/gccgdb.PNG">
1623 </ol>
1626 <a name="simulator"/>
1627 <h1>Simulator</h1>
1628 <P>The ZPU simulator is integrated into the Zylin Embedded CDT plugin
1629 to ease debugging of ZPU applications:</P>
1630 <P><A HREF="http://www.zylin.com/embeddedcdt.html">http://www.zylin.com/embeddedcdt.html</A></P>
1631 <P>The ZPU simulator has many features besides debugging an
1632 application:</P>
1633 <UL>
<LI><P STYLE="margin-bottom: 0in">taking output from simulation (e.g.
ModelSim) and matching it against the Java simulator, thus making
it much easier to debug HDL implementations and also getting real
world timing information
</P>
1639 <LI><P STYLE="margin-bottom: 0in">can generate gprof output
1640 </P>
1641 <LI><P>generate various statistics
1642 </P>
1643 </UL>
1644 <P>The plugin is still pretty rough around the edges, and needs to
1645 get GUI support for enabling the ModelSim trace input feature.</P>
1646 <P ALIGN=CENTER><IMG SRC="images/compile.PNG" NAME="graphics7" ALIGN=BOTTOM WIDTH=669 HEIGHT=302 BORDER=0><BR><I>Compiling
1647 ZPU application</I></P>
1648 <P ALIGN=CENTER><IMG SRC="images/simulator.PNG" NAME="graphics9" ALIGN=BOTTOM WIDTH=722 HEIGHT=583 BORDER=0><BR><I>Setting
1649 up the simulator</I></P>
1650 <P ALIGN=CENTER><IMG SRC="images/simulator2.PNG" NAME="graphics11" ALIGN=BOTTOM WIDTH=722 HEIGHT=583 BORDER=0><BR><I>Choosing
1651 ZPU executable</I></P>
1652 <P ALIGN=CENTER STYLE="margin-bottom: 0in"><IMG SRC="images/simulator3.PNG" NAME="graphics13" ALIGN=BOTTOM WIDTH=1100 HEIGHT=720 BORDER=0><BR><I>Debug
1653 session</I></P>
1654 <P STYLE="margin-bottom: 0in"><BR>
1655 </P>
1658 <hr> <!-- +++++++++++++++++++++++++++++++++++++++++++++++++++++ -->
1660 <a name="misc"/>
1661 <h1>Misc</h1>
1662 TODO Stuff that could probably find a better home.
1664 <a name="tuning"/>
1665 <h2>Speeding up the ZPU</h2>
1666 There are two aspects of speeding up the ZPU: making it perform better
1667 for a particular application and toying around with the ZPU architecture.
1668 <h3>Performance tips</h3>
1669 <ol>
1670 <li>Profile. Create a small sample and run in a simulator that is as close
1671 to the real deployment as possible. zpu4/core/histogram.perl is a script
1672 that will tell you which instructions take the most time.
<li>Using the profile output, decide which emulated instructions
it makes sense to implement in HDL for your particular application. Modifying
zpu_core_small.vhd is not particularly hard. Most instructions can be
transliterated into zpu_core_small.vhd from zpu_core.vhd without much
trouble.
1678 <li>The memory subsystem may well turn out to be where you should concentrate
1679 your efforts.
1680 </ol>
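<p>The profiling step can be approximated with a few lines of scripting when histogram.perl is not at hand. The sketch below is not a port of that script: it assumes a plain-text trace in which the instruction mnemonic is a whitespace-separated field at a known column, and it counts how often each instruction occurs rather than how many cycles it takes. The function name and the <code>column</code> parameter are hypothetical.</p>

```python
from collections import Counter


def opcode_histogram(lines, column=0):
    """Count mnemonics found in the given whitespace-separated column."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) > column:  # skip blank or short lines
            counts[fields[column]] += 1
    return counts.most_common()  # most frequent first


sample = ["im 1", "load", "im 2", "add", "im 3", "load"]
for mnemonic, count in opcode_histogram(sample):
    print(mnemonic, count)
```

<p>Emulated instructions that show up near the top of such a list are the first candidates for a hardware implementation.</p>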
1681 <h3>Toying around with the architecture</h3>
1682 Again: profile 90% of the time and spend the remaining 10% tinkering
1683 with the architecture.
1684 <ul>
1685 <li>There is a DMIPS program you can use to measure the performance of
1686 the ZPU in lieu of profiling a real application. The latter is obviously
1687 a superior solution.
1688 <li>Again: use histogram.perl to figure out which instructions you should add
1689 in HDL.
1690 <li>Tinker a bit with Fmax to find the maximum speed rating for your design.
1691 <li>zpu_core_small.vhd should be ca. 1 DMIPS and zpu_core.vhd should yield
1692 about 5-10 DMIPS before adding instructions runs out of steam.
1693 </ul>
If you need to get ca. 20-50 DMIPS out of the ZPU you will have to
write a heavily pipelined architecture with caches (if you are running
against DRAM). This is *tricky*, but some proof of concept work was
done to show 20 DMIPS with the ZPU (the actual result was discarded since
it was not complete and contained fatal flaws).
1700 Achieving above 50-100 DMIPS with the current ZPU architecture is probably
1701 a non-starter and a more conventional RISC design makes more sense here.
The unique advantage of the ZPU is its size, in terms of both HDL and code size.
1707 <a name="codesize"/>
1708 <h2>Optimizing for code size</h2>
1709 The ZPU toolchain produces highly compact code.
1710 <ol>
1711 <li>Since the ZPU GCC toolchain supports standard ANSI C, it is easy to stumble across
1712 functionality that takes up a lot of space. E.g. the standard printf() function is a beast. Some compilers drop e.g. floating point support
1713 from the printf() function and thus boast a "smaller" printf() when in fact they have a non-standard printf(). newlib has a standard printf() function
1714 and an alternative iprintf() function that works only on integers.
<li>The ZPU ships with default startup code that works across various configurations of the ZPU, so be warned that there is some overhead that will
not occur in the final application (anywhere between 1 and 4 kBytes).
<li>Compilation and linker options matter. The ZPU benefits greatly from the "-Wl,--relax -Wl,--gc-sections" options, which are not used by
all architectures (e.g. GCC ARM does not implement/need -Wl,--relax).
1719 </ol>
1720 <h3>Small code example</h3>
1721 <code>
1722 zpu-elf-gcc -Os -abel smallstd.c -o smallstd.elf -Wl,--relax -Wl,--gc-sections<br>
1723 zpu-elf-size small.elf<br>
1724 <br>
1725 $ zpu-elf-size small.elf<br>
1726 text data bss dec hex filename<br>
1727 2845 952 36 3833 ef9 small.elf<br>
1728 <br>
1729 </code>
1731 <h3>Even smaller code example</h3>
1732 If the ZPU implements the optional instructions, the RAM overhead can be reduced significantly.
1734 <code>
1735 zpu-elf-gcc -Os -abel crt0_phi.S small.c -o small.elf -Wl,--relax -Wl,--gc-sections -nostdlib <br>
1736 zpu-elf-size small.elf<br>
1737 <br>
1738 $ zpu-elf-size small.elf<br>
1739 text data bss dec hex filename<br>
1740 56 8 0 64 40 small.elf<br>
1741 <br>
1742 </code>
1744 <a name="ecos"/>
1745 <h2>Installing eCos build tools</h2>
1746 <code>
1747 tar -xjvf ecossnapshot.tar.bz2<br>
1748 tar -xjvf repository.tar.bz2<br>
1749 tar -xjvf ecostools.tar.bz2<br>
1750 # run this every time you open the shell<br>
1751 export PATH=$PATH:`pwd`/ecos-install<br>
1752 export ECOS_REPOSITORY=`pwd`/ecos/packages:`pwd`/repository<br>
1753 </code>
1754 <h3>Compiling eCos tests</h3>
1755 <code>
1756 ecosconfig new phi default<br>
1757 ecosconfig tree<br>
1758 make<br>
1759 cd kernel/current<br>
1760 make tests<br>
1761 </code>
1763 <h2>Code size ZPU</h2>
1764 <pre>
1765 $ zpu-elf-size *
1766 text data bss dec hex filename
1767 15761 1504 12060 29325 728d bin_sem0
1768 16907 1512 14436 32855 8057 bin_sem1
1769 17105 1524 30032 48661 be15 bin_sem2
1770 17186 1512 14436 33134 816e bin_sem3
1771 18986 1500 12036 32522 7f0a clock0
1772 15812 1504 13236 30552 7758 clock1
1773 25095 1972 13224 40291 9d63 clockcnv
1774 16437 1500 13224 31161 79b9 clocktruth
1775 15762 1504 12060 29326 728e cnt_sem0
1776 17124 1512 14436 33072 8130 cnt_sem1
1777 35947 1564 22512 60023 ea77 dhrystone
1778 16428 1500 13228 31156 79b4 except1
1779 15751 1504 12052 29307 727b flag0
1780 19145 1512 15624 36281 8db9 flag1
1781 20053 1516 102908 124477 1e63d fptest
1782 15998 1496 12092 29586 7392 intr0
1783 16080 1496 12200 29776 7450 kalarm0
1784 15327 1496 12036 28859 70bb kcache1
1785 15549 1496 13224 30269 763d kcache2
1786 18291 1500 12260 32051 7d33 kclock0
1787 16231 1500 13232 30963 78f3 kclock1
1788 16572 1496 13228 31296 7a40 kexcept1
1789 15618 1496 12060 29174 71f6 kflag0
1790 19287 1500 15624 36411 8e3b kflag1
1791 16887 1516 15628 34031 84ef kill
1792 16186 1496 12128 29810 7472 kintr0
1793 19724 1504 14516 35744 8ba0 klock
1794 18283 1500 14592 34375 8647 kmbox1
1795 15539 1496 12064 29099 71ab kmutex0
1796 16524 1504 15664 33692 839c kmutex1
1797 18272 1712 20348 40332 9d8c kmutex3
1798 18682 1608 20352 40642 9ec2 kmutex4
1799 15619 1496 14412 31527 7b27 ksched1
1800 15567 1496 12060 29123 71c3 ksem0
1801 17063 1500 14436 32999 80e7 ksem1
1802 15504 1496 13228 30228 7614 kthread0
1803 16167 1496 14412 32075 7d4b kthread1
1804 18281 1512 14580 34373 8645 mbox1
1805 20611 1508 14940 37059 90c3 mqueue1
1806 15672 1504 12064 29240 7238 mutex0
1807 16678 1516 15664 33858 8442 mutex1
1808 17694 1508 16868 36070 8ce6 mutex2
1809 18203 1720 20344 40267 9d4b mutex3
1810 16352 1508 14428 32288 7e20 release
1811 15890 1500 14412 31802 7c3a sched1
1812 44196 1612 286332 332140 5116c stress_threads
1813 17891 1524 16864 36279 8db7 sync2
1814 16943 1512 15644 34099 8533 sync3
1815 15467 1496 13064 30027 754b thread0
1816 16134 1496 14420 32050 7d32 thread1
1817 17560 1512 15636 34708 8794 thread2
1818 16279 1500 24028 41807 a34f thread_gdb
1819 17051 1504 20376 38931 9813 timeslice
1820 17146 1504 21564 40214 9d16 timeslice2
1821 37313 1512 422380 461205 70995 tm_basic
1822 </pre>
1823 <h3>Code size ARM (non-thumb)</h3>
1824 Thumb does not compile out of the box w/AT91 EB40a for which this test was made.<p>
1825 <pre>
1826 $ arm-elf-size *
1827 text data bss dec hex filename
1828 25204 692 16976 42872 a778 bin_sem0
1829 26644 700 22096 49440 c120 bin_sem1
1830 26996 712 55584 83292 1455c bin_sem2
1831 27008 700 22100 49808 c290 bin_sem3
1832 28992 688 16944 46624 b620 clock0
1833 25456 692 19532 45680 b270 clock1
1834 34572 1160 19520 55252 d7d4 clockcnv
1835 26224 688 19508 46420 b554 clocktruth
1836 25204 692 16976 42872 a778 cnt_sem0
1837 26888 700 22108 49696 c220 cnt_sem1
1838 44180 752 27416 72348 11a9c dhrystone
1839 26088 688 19520 46296 b4d8 except1
1840 25236 692 16968 42896 a790 flag0
1841 29532 700 24668 54900 d674 flag1
1842 29508 704 109652 139864 22258 fptest
1843 25932 684 17016 43632 aa70 intr0
1844 25824 684 17112 43620 aa64 kalarm0
1845 24728 684 16956 42368 a580 kcache1
1846 25168 684 19512 45364 b134 kcache2
1847 28112 688 17168 45968 b390 kclock0
1848 25976 688 19524 46188 b46c kclock1
1849 26372 684 19512 46568 b5e8 kexcept1
1850 25140 684 16968 42792 a728 kflag0
1851 29824 688 24660 55172 d784 kflag1
1852 26896 704 24656 52256 cc20 kill
1853 26088 684 17028 43800 ab18 kintr0
1854 30812 692 22176 53680 d1b0 klock
1855 28504 688 22260 51452 c8fc kmbox1
1856 24984 684 16984 42652 a69c kmutex0
1857 26504 692 24704 51900 cabc kmutex1
1858 28792 900 34892 64584 fc48 kmutex3
1859 29264 796 34896 64956 fdbc kmutex4
1860 25240 684 22084 48008 bb88 ksched1
1861 25044 684 16968 42696 a6c8 ksem0
1862 26988 688 22100 49776 c270 ksem1
1863 25028 684 19512 45224 b0a8 kthread0
1864 25996 684 22080 48760 be78 kthread1
1865 28552 700 22252 51504 c930 mbox1
1866 31324 696 22612 54632 d568 mqueue1
1867 25108 692 16980 42780 a71c mutex0
1868 26464 704 24700 51868 ca9c mutex1
1869 27624 696 27280 55600 d930 mutex2
1870 28596 908 34884 64388 fb84 mutex3
1871 26156 696 22100 48952 bf38 release
1872 25460 688 22084 48232 bc68 sched1
1873 56356 828 45892 103076 192a4 stress_threads
1874 27900 712 27288 55900 da5c sync2
1875 26760 700 24692 52152 cbb8 sync3
1876 24924 684 19356 44964 afa4 thread0
1877 25868 684 22084 48636 bdfc thread1
1878 27452 700 24680 52832 ce60 thread2
1879 26136 688 42704 69528 10f98 thread_gdb
1880 27212 692 34916 62820 f564 timeslice
1881 52728 700 123332 176760 2b278 tm_basic
1882 </pre>
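<p>To put the two listings side by side, the text-segment sizes of a few tests can be compared directly. The numbers below are copied from the ZPU and ARM tables above; the selection of tests and the function name are arbitrary choices for illustration.</p>

```python
# Text-segment sizes in bytes, copied from the two listings above.
ZPU_TEXT = {"bin_sem0": 15761, "dhrystone": 35947, "tm_basic": 37313}
ARM_TEXT = {"bin_sem0": 25204, "dhrystone": 44180, "tm_basic": 52728}


def text_ratio(name):
    """ZPU text size as a fraction of the ARM text size for one test."""
    return ZPU_TEXT[name] / ARM_TEXT[name]


for name in ZPU_TEXT:
    print(f"{name}: ZPU text is {text_ratio(name):.0%} of ARM text")
```

<p>For these three tests the ZPU text segment is smaller than the ARM (non-thumb) one in every case, while the data segment is larger in the ZPU builds.</p>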
1884 <a name="memorymap"/>
1885 <h2>Phi memory map</h2>
1886 TODO This probably belongs in the refdesign section. For now leaving it here because zealot refers to it. Not sure what else uses it.
1888 The ZPU architecture does not define a memory map as such, but the GCC + libgloss + ecos hal library uses the
1889 memory map below. "Phi" is just a three letter word for the particular memory layout below that came about
1890 while developing the ZPU.
1892 <TABLE WIDTH=604 BORDER=1 BORDERCOLOR="#000000" CELLPADDING=7 CELLSPACING=0 STYLE="page-break-after: avoid">
1893 <COL WIDTH=85>
1894 <COL WIDTH=42>
1895 <COL WIDTH=136>
1896 <COL WIDTH=283>
1897 <TR VALIGN=TOP>
1898 <TD WIDTH=85>
1899 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2><B>Address</B></FONT></FONT></P>
1900 </TD>
1901 <TD WIDTH=42>
1902 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2><B>Type</B></FONT></FONT></P>
1903 </TD>
1904 <TD WIDTH=136>
1905 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2><B>Name</B></FONT></FONT></P>
1906 </TD>
1907 <TD WIDTH=283>
1908 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2><B>Description</B></FONT></FONT></P>
1909 </TD>
1910 </TR>
1912 <TR VALIGN=TOP>
1913 <TD WIDTH=85>
1914 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0000</FONT></FONT></P>
1915 </TD>
1916 <TD WIDTH=42>
1917 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Write</FONT></FONT></P>
1918 </TD>
1919 <TD WIDTH=136>
1920 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">ZPU
1921 enable</FONT></FONT></P>
1922 </TD>
1923 <TD WIDTH=283>
1924 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
1925 [31:1] Not used</FONT></FONT></P>
1926 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
1927 [0] Enable ZPU operations</FONT></FONT></P>
1928 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 0 ZPU
1929 is held in Idle mode</FONT></FONT></P>
1930 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 1 ZPU
1931 running</FONT></FONT></P>
1932 </TD>
1933 </TR>
1936 <TR VALIGN=TOP>
1937 <TD WIDTH=85>
1938 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0004</FONT></FONT></P>
1939 </TD>
1940 <TD WIDTH=42>
1941 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Read/</FONT></FONT></P>
1942 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Write</FONT></FONT></P>
1943 </TD>
1944 <TD WIDTH=136>
1945 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">GPIO data</FONT></FONT></P>
1946 </TD>
1947 <TD WIDTH=283>
1948 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit [31:0] input data 31:0</FONT></FONT></P>
1949 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit [31:0] output data 31:0</FONT></FONT></P>
1950 </TD>
1951 </TR>
1953 <TR VALIGN=TOP>
1954 <TD WIDTH=85>
1955 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0008</FONT></FONT></P>
1956 </TD>
1957 <TD WIDTH=42>
1958 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Read/</FONT></FONT></P>
1959 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Write</FONT></FONT></P>
1960 </TD>
1961 <TD WIDTH=136>
1962 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">GPIO direction</FONT></FONT></P>
1963 </TD>
1964 <TD WIDTH=283>
1965 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit [31:0] data direction 31:0</FONT></FONT></P>
1966 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0 output</FONT></FONT></P>
1967 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">1 input (default)</FONT></FONT></P>
1968 </TD>
1969 </TR>
1971 <TR VALIGN=TOP>
1972 <TD WIDTH=85>
1973 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A000C</FONT></FONT></P>
1974 </TD>
1975 <TD WIDTH=42>
1976 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Read/</FONT></FONT></P>
1977 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Write</FONT></FONT></P>
1978 </TD>
1979 <TD WIDTH=136>
1980 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">ZPU
1981 Debug channel / UART to ARM7 TX</FONT></FONT></P>
1982 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"><B>NOTE!
1983 ZPU side</B></FONT></FONT></P>
1984 </TD>
1985 <TD WIDTH=283>
1986 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
1987 [31:9] Not used</FONT></FONT></P>
1988 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
1989 [8] TX buffer ready (valid on ready)</FONT></FONT></P>
1990 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 0 TX
1991 buffer not ready (full)</FONT></FONT></P>
1992 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 1 TX
1993 buffer ready</FONT></FONT></P>
1994 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
1995 [7:0] TX byte (valid on write)</FONT></FONT></P>
1996 </TD>
1997 </TR>
1998 <TR VALIGN=TOP>
1999 <TD WIDTH=85>
2000 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0010</FONT></FONT></P>
2001 </TD>
2002 <TD WIDTH=42>
2003 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Read</FONT></FONT></P>
2004 </TD>
2005 <TD WIDTH=136>
2006 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">ZPU
2007 Debug channel / UART to ARM7 RX</FONT></FONT></P>
2008 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"><B>NOTE!
2009 ZPU side</B></FONT></FONT></P>
2010 </TD>
2011 <TD WIDTH=283>
2012 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2013 [31:9] Not used</FONT></FONT></P>
2014 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2015 [8] RX buffer data valid</FONT></FONT></P>
2016 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 0 RX
2017 buffer not valid</FONT></FONT></P>
2018 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 1 RX
2019 buffer valid</FONT></FONT></P>
2020 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2021 [7:0] RX byte (when valid)</FONT></FONT></P>
2022 </TD>
2023 </TR>
2024 <TR VALIGN=TOP>
2025 <TD WIDTH=85>
2026 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0014</FONT></FONT></P>
2027 </TD>
2028 <TD WIDTH=42>
2029 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Read/</FONT></FONT></P>
2030 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Write</FONT></FONT></P>
2031 </TD>
2032 <TD WIDTH=136>
2033 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Counter(1)</FONT></FONT></P>
2034 </TD>
2035 <TD WIDTH=283>
2036 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2037 [0] Reset counter (valid for write)</FONT></FONT></P>
2038 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 0 N/A</FONT></FONT></P>
2039 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 1 Reset
2040 counter</FONT></FONT></P>
2041 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2042 [1] Sample counter (valid for write)</FONT></FONT></P>
2043 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 0 N/A</FONT></FONT></P>
2044 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 1 Sample
2045 counter</FONT></FONT></P>
2046 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2047 [31:0] Counter bit 31:0</FONT></FONT></P>
2048 </TD>
2049 </TR>
2050 <TR VALIGN=TOP>
2051 <TD WIDTH=85>
2052 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0018</FONT></FONT></P>
2053 </TD>
2054 <TD WIDTH=42>
2055 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Read</FONT></FONT></P>
2056 </TD>
2057 <TD WIDTH=136>
2058 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Counter(2)</FONT></FONT></P>
2059 </TD>
2060 <TD WIDTH=283>
2061 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2062 [31:0] Counter bit 63:32</FONT></FONT></P>
2063 </TD>
2064 </TR>
2065 <TR VALIGN=TOP>
2066 <TD WIDTH=85>
2067 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0020</FONT></FONT></P>
2068 </TD>
2069 <TD WIDTH=42>
2070 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Read
2071 / Write</FONT></FONT></P>
2072 </TD>
2073 <TD WIDTH=136>
2074 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Global_Interrupt_mask</FONT></FONT></P>
2075 </TD>
2076 <TD WIDTH=283>
2077 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2078 [31:1] Not used</FONT></FONT></P>
2079 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2080 [0] Global intr. Mask</FONT></FONT></P>
2081 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 0 Interrupts
2082 enabled</FONT></FONT></P>
2083 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 1 Interrupts
2084 disabled</FONT></FONT></P>
2085 </TD>
2086 </TR>
2087 <TR VALIGN=TOP>
2088 <TD WIDTH=85>
2089 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0024</FONT></FONT></P>
2090 </TD>
2091 <TD WIDTH=42>
2092 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Write</FONT></FONT></P>
2093 </TD>
2094 <TD WIDTH=136>
2095 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">UART_INTERRUPT_ENABLE</FONT></FONT></P>
2096 </TD>
2097 <TD WIDTH=283>
2098 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2099 [31:1] Not used</FONT></FONT></P>
2100 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2101 [0] Debug channel / UART RX interrupt enable</FONT></FONT></P>
2102 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 0 Interrupt
2103 disable</FONT></FONT></P>
2104 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 1 Interrupt
2105 enable</FONT></FONT></P>
2106 </TD>
2107 </TR>
2108 <TR VALIGN=TOP>
2109 <TD WIDTH=85>
2110 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0028</FONT></FONT></P>
2111 </TD>
2112 <TD WIDTH=42>
<P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Read/</FONT></FONT></P>
2114 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Write</FONT></FONT></P>
2115 </TD>
2116 <TD WIDTH=136>
2117 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">UART_interrupt</FONT></FONT></P>
2118 </TD>
2119 <TD WIDTH=283>
2120 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2121 [31:1] Not used</FONT></FONT></P>
2122 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2123 [0] Debug channel / UART RX interrupt pending (Read)</FONT></FONT></P>
2124 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 0 No
2125 interrupt pending</FONT></FONT></P>
2126 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 1 Interrupt
2127 pending</FONT></FONT></P>
2128 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2129 [0] Clear UART interrupt (Write)</FONT></FONT></P>
2130 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 0 N/A</FONT></FONT></P>
2131 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 1 Interrupt
2132 cleared</FONT></FONT></P>
2133 </TD>
2134 </TR>
2135 <TR VALIGN=TOP>
2136 <TD WIDTH=85>
2137 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A002C</FONT></FONT></P>
2138 </TD>
2139 <TD WIDTH=42>
2140 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Write</FONT></FONT></P>
2141 </TD>
2142 <TD WIDTH=136>
2143 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Timer_Interrupt_enable</FONT></FONT></P>
2144 </TD>
2145 <TD WIDTH=283>
2146 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2147 [31:1] Not used</FONT></FONT></P>
2148 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2149 [0] Timer interrupt enable</FONT></FONT></P>
2150 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 0 Interrupt
2151 disable</FONT></FONT></P>
2152 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 1 Interrupt
2153 enable</FONT></FONT></P>
2154 </TD>
2155 </TR>
2156 <TR VALIGN=TOP>
2157 <TD WIDTH=85>
2158 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0030</FONT></FONT></P>
2159 </TD>
2160 <TD WIDTH=42>
2161 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Read
2162 /</FONT></FONT></P>
2163 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Write</FONT></FONT></P>
2164 </TD>
2165 <TD WIDTH=136>
2166 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Timer_interrupt</FONT></FONT></P>
2167 </TD>
2168 <TD WIDTH=283>
2169 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2170 [31:2] Not used</FONT></FONT></P>
2171 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2172 [0] Timer interrupt pending (Read)</FONT></FONT></P>
2173 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 0 No
2174 interrupt pending</FONT></FONT></P>
2175 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 1 Interrupt
2176 pending</FONT></FONT></P>
2177 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2178 [1] Reset Timer counter (Write)</FONT></FONT></P>
2179 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 0 N/A</FONT></FONT></P>
2180 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 1 Timer
2181 counter reset</FONT></FONT></P>
2182 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2183 [0] Clear Timer interrupt (Write)</FONT></FONT></P>
2184 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 0 N/A</FONT></FONT></P>
2185 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> 1 Interrupt
2186 cleared</FONT></FONT></P>
2187 </TD>
2188 </TR>
2189 <TR VALIGN=TOP>
2190 <TD WIDTH=85>
2191 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0034</FONT></FONT></P>
2192 </TD>
2193 <TD WIDTH=42>
2194 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Write</FONT></FONT></P>
2195 </TD>
2196 <TD WIDTH=136>
2197 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Timer_Period</FONT></FONT></P>
2198 </TD>
2199 <TD WIDTH=283>
2200 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2201 [31:0] Interrupt period (write)</FONT></FONT></P>
2202 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> Number
2203 of clock cycles</FONT></FONT></P>
2204 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"> between
2205 timer interrupts</FONT></FONT></P>
2206 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt"><B>NOTE!
</B>The timer will start at the Timer_Period value and count <B>down</B>
2208 to zero, and generate an interrupt</FONT></FONT></P>
2209 </TD>
2210 </TR>
2211 <TR VALIGN=TOP>
2212 <TD WIDTH=85>
<P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">0x080A0038</FONT></FONT></P>
2214 </TD>
2215 <TD WIDTH=42>
2216 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Read</FONT></FONT></P>
2217 </TD>
2218 <TD WIDTH=136>
2219 <P LANG="en-US" CLASS="western" ALIGN=CENTER><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Timer_Counter</FONT></FONT></P>
2220 </TD>
2221 <TD WIDTH=283>
2222 <P LANG="en-US" CLASS="western"><FONT FACE="Arial, sans-serif"><FONT SIZE=2 STYLE="font-size: 9pt">Bit
2223 [31:0] Timer counter (read)</FONT></FONT></P>
2224 <P LANG="en-US" CLASS="western"><BR>
2225 </P>
2226 </TD>
2227 </TR>
2300 </TABLE>
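<p>The register layout above can be exercised from C roughly as follows. This is a minimal sketch, not code from the ZPU tree: all macro and function names are hypothetical, and only the addresses and bit positions are taken from the table. The helpers take a register pointer so that they can also be tried against an ordinary variable on a host machine. It is an assumption that writing the sample bit latches the counter so that the two halves read back consistently.</p>

```c
#include <stdint.h>

/* Addresses copied from the table above; the macro names are hypothetical. */
#define ZPU_UART_TX      0x080A000Cu
#define ZPU_UART_RX      0x080A0010u
#define ZPU_COUNTER_LO   0x080A0014u
#define ZPU_COUNTER_HI   0x080A0018u

#define UART_TX_READY    (1u << 8)  /* bit 8 of the TX register */
#define UART_RX_VALID    (1u << 8)  /* bit 8 of the RX register */
#define COUNTER_SAMPLE   (1u << 1)  /* bit 1 of Counter(1), write only */

/* Turn a table address into a register pointer (only meaningful on target). */
#define ZPU_REG(addr) ((volatile uint32_t *)(uintptr_t)(addr))

/* Send one byte if the TX buffer is ready. Returns 1 on success, 0 if full. */
static int uart_try_putc(volatile uint32_t *tx, uint8_t c)
{
    if ((*tx & UART_TX_READY) == 0)
        return 0;
    *tx = c;
    return 1;
}

/* Fetch one byte if RX data is valid. Returns the byte, or -1 if none. */
static int uart_try_getc(volatile uint32_t *rx)
{
    uint32_t v = *rx;
    if ((v & UART_RX_VALID) == 0)
        return -1;
    return (int)(v & 0xFFu);
}

/* Combine Counter(1) (bits 31:0) and Counter(2) (bits 63:32). */
static uint64_t counter_combine(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Request a sample, then read both halves (assumes the sample is latched). */
static uint64_t counter_read(volatile uint32_t *lo, volatile uint32_t *hi)
{
    *lo = COUNTER_SAMPLE;
    uint32_t l = *lo;
    uint32_t h = *hi;
    return counter_combine(l, h);
}
```

<p>On the target, a call would look like <code>uart_try_putc(ZPU_REG(ZPU_UART_TX), 'A')</code>, retried until it returns 1.</p>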
2302 <hr> <!-- +++++++++++++++++++++++++++++++++++++++++++++++++++++ -->
2304 <a name="todo"/>
2305 <h1>TODO</h1>
2307 <a name="todolist"/>
2308 <h2>TODO list</h2>
2309 <ul>
<li>fix the TODOs in this doc that are just doc fixes
2311 <li>organize the TODO list by priority and assign responsibility... if there are takers.
2312 <li>converge on a single IO for core implementations.
2313 <li>fill in performance table for Altera and Lattice.
<li>re-org CVS to make it easy to keep the appropriate SW, RTL (Verilog and VHDL), scripts, and verification stuff together, with separation of tools, core, common, and ref designs.
2315 <li>provide FPGA scripts.
2316 <li>provide HDL regression environment.
2317 <li>RAM model contribution needed. What is in opencore/common is not adequate.
2318 <li>make wishbone bridge re-usable with all cores
2319 <li>explicit example with UART from opencores in the above ref designs.
<li>discuss which tools are needed and choose some to be supported by the project. Need to deal with cygwin vs linux, VHDL vs verilog, open vs closed.... plus language support in simulators is sometimes lacking.
2321 <li>setup.sh script needs to detect linux/cygwin, and should have install path option.
2322 <li>shaping up the www.opencores.org pages.
2323 <li>BSD and GPL licenses in the appropriate places.
<li>Some pages at <A HREF="http://www.zylin.com/zpu.htm">http://www.zylin.com/zpu.htm</A> currently explain the ZPU. According to OpenCores policy this information should be moved to www.opencores.org. Patches to do so are gratefully accepted!
2325 <li>eCos HAL could be less RAM hungry
2326 <li>Needs GDB stub support in eCos
<li>Could do with a Verilog implementation (ca. 600 lines to translate)
<li>Make little endian throughout. Currently instructions are stored big endian, loadb and storeb are big endian, but the data bus is treated as little endian. This creates some problems in type conversion.
2329 </ul>
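<p>To illustrate the endianness item at the end of the list: the two conventions differ only in which byte lane of a 32 bit word a byte address selects. The sketch below is a plain C model for illustration, not the ZPU RTL, and the function names are hypothetical.</p>

```c
#include <stdint.h>

/* Big endian: the byte at the lowest address is the most significant one. */
static uint8_t byte_of_word_be(uint32_t word, uint32_t addr)
{
    unsigned shift = 8u * (3u - (addr & 3u));
    return (uint8_t)(word >> shift);
}

/* Little endian: the byte at the lowest address is the least significant one. */
static uint8_t byte_of_word_le(uint32_t word, uint32_t addr)
{
    unsigned shift = 8u * (addr & 3u);
    return (uint8_t)(word >> shift);
}
```

<p>For the word 0x11223344, byte address 0 selects 0x11 under the big endian rule and 0x44 under the little endian rule, which is the kind of mismatch that shows up when a pointer is cast between word and byte types.</p>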
2331 <a name="repository"/>
2332 <h2>Repository Re-org</h2>
I am proposing the following structure for the repository. It roughly follows the way I've organized this document, with separation of core, common, and three SOC ref designs. New users go straight to the SOC that best matches their needs.
2334 <pre>
2335 zpu/bin # scripts and toolchain? Want toolchain installed with project. Tidier when working in multi user / multi project environment
2336 zpu/doc #
2337 zpu/core/rtl # RTL for the various core implementations.
2338 zpu/core/sw # crt0.s ?
2339 zpu/common/rtl # Re-use RTL such as RAM and UART
zpu/common/sim # Re-use RTL and tools for regression testing
2341 zpu/common/sw # ?
2342 zpu/soc/minimal # Three levels of ref designs described above
2343 /basic
2344 /board
2345 zpu/soc/*/rtl # top level, arbiter, etc
2346 zpu/soc/*/sw # helloworld, dmips, etc. makefile/ROMS
2347 zpu/soc/*/sim # regression test area. makefile/scripts
2348 zpu/soc/*/fpga # syn and par area. makefile/scripts
2349 zpu/tools # zip/tarball of tool chains, simulator
2350 </pre>
2351 Not sure where ecos fits.
2353 <a name="nextgen"/>
2354 <h2>Next generation ZPU</h2>
Based on feedback, here is a loose "consensus" on goals for the next generation
of the ZPU, with some tentative ideas on implementation.
2357 <h3>Goals</h3>
2358 <ol>
2359 <li>Reduce minimum code size footprint, i.e. BRAM code overhead. Non-trivial
2360 usable applications in 4kBytes of BRAM (single BRAM block).
2361 <li>Reduce minimum FPGA logic footprint by 20% or more. Goal &lt;300 LUT for
2362 32 bit ZPU
2363 <li>Weed out unnecessary ZPU variations and merge in useful
features to a few recommended ZPU implementations.
2365 <li>Will someone be willing to contribute a heavily pipelined ZPU?
Performance goal of 10 DMIPS w/DRAM &amp; cache.
2367 This ZPU could run a TCP/IP stack with relevant performance to compete
2368 with stripped down ARM7 type systems.
2369 </ol>
2370 <h2>GCC changes</h2>
2371 The GCC changes planned are 100% backwards compatible with default
2372 options. However, a raft of options will be added to disable
2373 functionality so as to allow study and experimentation with the
2374 ZPU architecture.
2375 <ol>
<li>Add options that allow defining a single entry point for all unknown instructions. Precisely
2377 how unknown instructions are handled will be defined by the HDL implementation.
2378 Currently the GCC backend places relatively strict limitations on how unknown/emulated
2379 instructions are handled. This will allow HDL implementations to have
2380 sparser instruction set support. Also this can allow sparse implementations
of emulated instructions. This is especially important to reduce minimal
2382 BRAM requirements for small applications.
2383 <li>GCC needs 4 "hard" registers. These are today mapped to memory. GCC
2384 will allow specifying what address to use or alternatively not to use
2385 memory mapped hard registers at all.
2386 <li>Strip away unused instructions from GCC and add options to GCC for not
2387 emitting more advanced instructions. This will e.g. convert MULT/DIV into
2388 function calls to libgcc and thus make it easier to determine that
2389 microcode is not needed.
2390 </ol>
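<p>As an illustration of the last item, a multiply that is turned into a function call ends up in a software routine of roughly the following shape. This is a minimal shift-and-add sketch with a hypothetical name (<code>soft_mul32</code>); it is not the code libgcc actually ships, where the corresponding routine is called <code>__mulsi3</code>.</p>

```c
#include <stdint.h>

/* 32 bit multiply using only add and shift, so no MULT instruction is needed. */
static uint32_t soft_mul32(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    while (b != 0) {
        if (b & 1u)
            result += a;   /* add the current partial product */
        a <<= 1;           /* next bit of b is worth twice as much */
        b >>= 1;
    }
    return result;
}
```

<p>The result is the product modulo 2^32, so the same routine serves signed and unsigned operands.</p>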
2392 <a name="float"/>
2393 <h1>Floating point support</h1>
2394 The ZPU does not currently have floating point support. Feedback
2395 from users indicates that single precision floating point support for
addition, multiplication and float-to-integer conversion would
2397 be useful for small ZPU programs that sit in a tight control
2398 loop. Essentially the ZPU is then measuring something, doing a
2399 few calculations and then modifying the control signal.
Such control loops can be written in fixed point math, but that
adds engineering effort and reduces the clarity of the software
implementation. Performance will probably also be worse than
with a hardware floating point version.
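<p>To give a feel for the bookkeeping that fixed point requires, here is a minimal sketch of one step of a proportional control loop. The Q16.16 format, the type name and the function names are all assumptions made for the example; nothing here is taken from the ZPU tree.</p>

```c
#include <stdint.h>

typedef int32_t q16_16;          /* 16 integer bits, 16 fraction bits */
#define Q16_ONE 65536            /* the value 1.0 in Q16.16 */

/* Convert a small integer to Q16.16. */
static q16_16 q16_from_int(int32_t x)
{
    return (q16_16)(x * Q16_ONE);
}

/* Multiply two Q16.16 values; the 64 bit intermediate avoids overflow and
   the division rescales the result (truncating toward zero). */
static q16_16 q16_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * (int64_t)b) / Q16_ONE);
}

/* One controller step: output = gain * (setpoint - measured). */
static q16_16 control_step(q16_16 setpoint, q16_16 measured, q16_16 gain)
{
    return q16_mul(gain, setpoint - measured);
}
```

<p>With floating point the body of <code>control_step</code> would be a single expression on <code>float</code> values, with no scaling constants or widening casts to get right.</p>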
2405 <h2>Pipelined floating point module</h2>
2406 Design needs to be nailed down.
2407 <b>Goals:</b>
2408 <ul>
2409 <li> 32 bit single precision floating point
2410 <li> FADD => add two floats
2411 <li> FMULT => multiply two floats
2412 <li> FINT => convert float to int
2413 </ul>
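<p>As a reference point for the FINT goal, the conversion can be modelled in C directly on the bit pattern. This sketch assumes the IEEE 754 single precision layout, truncation toward zero, and saturation on overflow; none of these choices are fixed by the text above, and the function name is hypothetical.</p>

```c
#include <stdint.h>

/* Convert a single precision bit pattern to a 32 bit integer. */
static int32_t fint_from_bits(uint32_t bits)
{
    uint32_t sign = bits >> 31;
    int32_t  exp  = (int32_t)((bits >> 23) & 0xFFu) - 127;  /* unbiased */
    uint32_t mant = (bits & 0x7FFFFFu) | 0x800000u;         /* implicit 1 */

    if (exp < 0)
        return 0;                           /* magnitude below 1, zero, denormals */
    if (exp > 30)
        return sign ? INT32_MIN : INT32_MAX; /* too large, infinity or NaN */

    /* The mantissa holds 23 fraction bits; shift so that none remain. */
    uint32_t mag = (exp >= 23) ? (mant << (exp - 23)) : (mant >> (23 - exp));
    return sign ? -(int32_t)mag : (int32_t)mag;
}
```

<p>For example, the pattern 0xC0200000 (-2.5) converts to -2 and 0x3F000000 (0.5) converts to 0.</p>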
The problem is divided into the following parts:

<ol>
<li>One top level VHDL module for each of the operations above.
<li>Integration into the ZPU cores is a separate problem that will not be
addressed in this project.
<li>Add a memory mapped coprocessor interface to the above. This
yields an example of a coprocessor which can be used for any
custom calculations and allows interest to be gauged.
</ol>
2425 Throughput:
2427 <ol>
2428 <li>pipelined design where throughput is one operation per cycle
2429 with a fixed number of cycles delay.
2430 <li>there is no flow control or enable signal.
2431 </ol>
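<p>The throughput requirements can be pictured with a small software model: one new operation is accepted on every clock, and its result appears a fixed number of clocks later, with nothing to stall or acknowledge. The sketch below models a multiplier this way. The depth of 4 and all names are assumptions made for the example, since the design is not yet nailed down.</p>

```c
#include <stdint.h>
#include <string.h>

#define FP_LATENCY 4  /* assumed pipeline depth; not specified in the text */

typedef struct {
    float stage[FP_LATENCY];  /* stage[0] is the newest result */
} fmult_pipe;

/* Clear all stages to 0.0 (all-zero bits, assuming IEEE 754 floats). */
static void fmult_pipe_reset(fmult_pipe *p)
{
    memset(p, 0, sizeof *p);
}

/* One clock: accept a new operand pair and return the product of the pair
   that was accepted FP_LATENCY clocks earlier. There is no enable input. */
static float fmult_pipe_clock(fmult_pipe *p, float a, float b)
{
    float out = p->stage[FP_LATENCY - 1];
    for (int i = FP_LATENCY - 1; i > 0; i--)
        p->stage[i] = p->stage[i - 1];
    p->stage[0] = a * b;
    return out;
}
```

<p>Because there is no flow control, the caller has to count clocks itself to know which result belongs to which operands.</p>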
GCC support is not hard, but modifying GCC should be considered only after
interest in this feature beyond a coprocessor has been gauged.
2438 <h2>VHDL module interface</h2>
2440 Patches anyone???
2442 </body>
</html>