FPGA SoC On-Chip Buses
Usenet Postings
Subject: Re: Microcomputer buses for use inside FPGA/ASIC devices?
Newsgroups: comp.arch.fpga
Date: Sat, 24 Jul 1999 21:36:20 -0700

Wade D. Peterson wrote in message <7ndpnl$pcu$1@news1.tc.umn.edu>...
>I'm working on a project where we're doing a microcomputer bus (kind of like
>VMEbus or PCIbus) for use *INSIDE* of FPGAs and ASICs. It's for hooking
>system-on-chip (SOC) components together. If anyone has done this before, or
>know of any references to this kind of project, I'd like to hear about it.
>If anybody knows of similar technology, I'd like to hear about it. If there are
>more, then my intention is to start a FAQ database on our website for all to
>use.

My 1995 J32 system had a 32-bit on-chip peripheral bus.

The left 60% of the XC4010 was a 32-bit RISC processor, using a 32-bit
long line bus to multiplex amongst the various execution stage results
(including add/sub, logic, 1-, 2-, 4-bit shifts left and right, load
data, sign extension data, return address). This used approximately
16x11=176 TBUFs.

The right half of the XC4010 was a 32-bit long line peripheral bus.
It had 4 byte-wide lanes. The processor was byte addressable with byte,
16-bit halfword, and 32-bit word data types.

Call the processor result bus P[31:0], the peripheral data bus D[31:0],
and the external RAM data bus XD[31:0]. I used these sets of TBUFs
(approx. 144 TBUFs + 32 OBUFTs):

* store byte, halfword, word:
    D[7:0]   <- P[7:0]
    D[15:8]  <- P[15:8]
    D[31:16] <- P[31:16]
* load byte, halfword, word:
    P[7:0]   <- D[7:0]
    P[15:8]  <- D[15:8]
    P[31:16] <- D[31:16]
* store various byte lanes to external RAM (OBUFTs)
    XD[7:0]   <- D[7:0]
    XD[15:8]  <- D[15:8]
    XD[23:16] <- D[23:16]
    XD[31:24] <- D[31:24]
* load various byte lanes from external RAM
    D[7:0]   <- XD[7:0]
    D[15:8]  <- XD[15:8]
    D[23:16] <- XD[23:16]
    D[31:24] <- XD[31:24]
* copy bytes/halfwords to upper byte lanes
    D[15:8]  <- D[7:0]
    D[23:16] <- D[7:0]
    D[31:24] <- D[15:8]
* copy bytes from upper byte lanes
    D[7:0]  <- D[15:8]
    D[7:0]  <- D[23:16]
    D[15:8] <- D[31:24]

In case you are interested, here is some of the source code which
generated this. It is my own "CNets HDL", a C++ class library for
emitting XNF. ff() is a flip-flop, tbuf() is a tbuf. Note the use of
tlocs (LOCs for TBUFs).

    void Mem::emit(Control& c) {
        net(zad24n) = adn(23,20) == 0U;
        net(zad20n) = adn(19,16) == 0U;
        ff(selROM, zad24n & zad20n, c.marce, _, init(1));
        ff(selRAM, ~adn[23] & ~(zad24n & zad20n), c.marce);

        ackROM = start & selROM;
        ack = ackROM | ackRAM | ackUART;

        for (unsigned i = 0; i < 4; i++)
            bytesel[i] = (byte & ad(1,0) == i) | (half & ad(1,1) == (i>>1)) | word;

        // processor to internal dbus interface
        ff(doutbytet, ~write, start, _, init(1));
        ff(douthalft, ~(write & (byte|half)), start, _, init(1));
        ff(doutwordt, ~(write & (byte|half|word)), start, _, init(1));

        // dbus internal/external interface:
        // emit 3state drivers to copy external dbus to/from internal dbus
        bus(dbusin, cbit);
        bus(dpads, cbit);
        for (i = 0; i < cbit; i++) {
            tsIgnore(dpads[i]);
            iopad(dpads[i], ploc(dpadlocs[i]));
            ibuf(dbusin[i], dpads[i]);
            unsigned t = 1 + even(i);
            tbuf(xd[i], dbusin[i], dinbyteextt[i / 8]);
            obuft(dpads[i], xd[i], doutextt);
        }

        // byte/halfword load/store alignment logic
        ff(b1b0t, ~( write & byte & ad[0]), start, _, init(1));
        ff(b2b0t, ~( write & (byte|half) & ad(1,0) == 2), start, _, init(1));
        ff(b3b1t, ~( write & ((byte&(ad(1,0)==3))|(half&ad[1]))), start, _, init(1));
        ff(b0b1t, ~(~write & byte & ad[0]), start, _, init(1));
        ff(b0b2t, ~(~write & (byte|half) & ad(1,0) == 2), start, _, init(1));
        ff(b1b3t, ~(~write & ((byte&(ad(1,0)==3))|(half&ad[1]))), start, _, init(1));

        for (i = 0; i < 8; i++) {
            unsigned t = 1 + even(i);
            tbuf(xd[i+ 8], xd[i   ], b1b0t, tloc(rowForBit(i+ 8),20,t));
            tbuf(xd[i+16], xd[i   ], b2b0t, tloc(rowForBit(i+16),20,t));
            tbuf(xd[i+24], xd[i+ 8], b3b1t, tloc(rowForBit(i+24),19,t));
            tbuf(xd[i   ], xd[i+ 8], b0b1t, tloc(rowForBit(i   ),19,t));
            tbuf(xd[i+ 8], xd[i+24], b1b3t, tloc(rowForBit(i+ 8),18,t));
            tbuf(xd[i   ], xd[i+16], b0b2t, tloc(rowForBit(i   ),17,t));
        }
    }

The on-chip "peripherals" were a UART and on-chip RAM and ROM, enough to
boot and print a "hello world" message. There was also an integrated
DRAM controller.

You can see a floorplan of this at
http://www3.sympatico.ca/jsgray/sld021.htm.

Old articles which touched on this subject:
http://deja.com/getdoc.xp?AN=120389301&fmt=text
http://deja.com/getdoc.xp?AN=136481723&fmt=text
http://deja.com/getdoc.xp?AN=280290025&fmt=text
http://deja.com/getdoc.xp?AN=398007481&fmt=text

Recently I designed another on-chip bus with particular
CPU-to-bus-controller and bus-controller-to-peripheral interfaces.
Please write me for more information.

Jan Gray


Subject: Re: Microcomputer buses for use inside FPGA/ASIC devices?
Newsgroups: comp.arch.fpga
Date: Sat, 24 Jul 1999 21:48:41 -0700

I wrote:
>...The left 60% of the XC4010 was a 32-bit RISC processor.
>...This used approximately 16x11=176 TBUFs.

Sigh. Rather, 32x11 = 352 TBUFs.

Jan Gray


Subject: Re: Microcomputer buses for use inside FPGA/ASIC devices?
Newsgroups: comp.arch.fpga
Date: Mon, 26 Jul 1999 21:34:30 -0700

Wade D. Peterson wrote in message <7nf1rv$5r$1@news1.tc.umn.edu>...
>1) When you say "on-chip peripheral bus" is this your terminology, or are you
>refering to a so-called 'OPB' bus that I'm seeing on some cores? For example, I
>believe that ARM processors use something called an 'OPB' bus.

My terminology, just a descriptive phrase. (It hosted on-chip memory
elements and peripheral elements and interfaced to off-chip memory.)

>2) Do you think your peripheral bus is portable across multiple FPGA
>architectures, or is it limited to Xilinx?

It is portable, but not especially so; portability was not a design
goal.

1. design tool: the CNets C++ class library would need to be
   retargeted. Easy for Orca or Virtex, somewhat less so for other
   families.
2. implementation: used generic logic expressions and flip-flops, but
   there were lots of 3-state buffers, and the design was optimized
   using LOC constraints that would not apply to a non-XC4000.
3. interfaces (signaling): would work unchanged across architectures.

(I do not propose the J32 bus for any purpose. I thought it might be of
historical interest.)

>> Old articles which touched on this subject:
>I tried these links, but they appear to be dead.

Try again!

>> Recently I designed another on-chip bus with particular
>> CPU-to-bus-controller and bus-controller-to-peripheral interfaces. ...
>Do you have anything written up on these.

Sorry, the docs are not yet ready for publication. But I think some of
the design space issues are:

* zero, one, or more processors? on-chip or off-chip processor? :-)
* clocking -- do CPU clocks equal bus clocks? 1-1? 2-1? 1-2?
* processor has one memory port or two (Harvard)?
* one bus (share processor result bus with on-chip data bus) or two?
* any access to processor resources (e.g. reg file ports)?
* byte addressing? byte/halfword/word types? byte-lane shifting?
* is the on-chip bus connected to an off-chip I/O or memory bus?
  same width? same clock discipline?
* wait state insertion?
* multi-master? arbitration?
* interrupt requests?
* DMA requests?
* pipelined bus transactions?
* split transactions?

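
To make the "byte addressing / byte-lane shifting" item in the list above concrete, here is a small software model of what such alignment logic has to compute on a 32-bit bus with four byte lanes. It is only a sketch, not the J32 or CNets source: the names Size, laneMask, alignStore and alignLoad are hypothetical, lane i is assumed to carry bus bits 8i+7..8i at address offset i, and loads are zero-extended for simplicity.

```cpp
#include <cstdint>

// Hypothetical names throughout; a software model, not hardware source.
enum class Size { Byte, Half, Word };

// Which of the four byte lanes take part in an access.
// Bit i of the result is set when lane i (bus bits 8i+7..8i) is selected.
unsigned laneMask(Size size, std::uint32_t addr) {
    unsigned mask = 0;
    for (unsigned i = 0; i < 4; i++) {
        bool sel = (size == Size::Byte && (addr & 3u) == i)
                || (size == Size::Half && ((addr >> 1) & 1u) == (i >> 1))
                || (size == Size::Word);
        if (sel) mask |= 1u << i;
    }
    return mask;
}

// Store alignment: move right-justified data up into the addressed lanes.
// Unselected lanes are returned as zero in this model.
std::uint32_t alignStore(Size size, std::uint32_t addr, std::uint32_t data) {
    switch (size) {
    case Size::Byte: return (data & 0xFFu) << (8 * (addr & 3u));
    case Size::Half: return (data & 0xFFFFu) << (16 * ((addr >> 1) & 1u));
    default:         return data;
    }
}

// Load alignment: the inverse, bringing the addressed lanes down to bit 0
// (zero-extended; sign extension is left out of this sketch).
std::uint32_t alignLoad(Size size, std::uint32_t addr, std::uint32_t bus) {
    switch (size) {
    case Size::Byte: return (bus >> (8 * (addr & 3u))) & 0xFFu;
    case Size::Half: return (bus >> (16 * ((addr >> 1) & 1u))) & 0xFFFFu;
    default:         return bus;
    }
}
```

For example, under these assumptions a byte store to an address ending in binary 11 selects only lane 3, and alignStore places the byte in bus bits 31..24. A bus without byte-lane shifting would leave this work to each peripheral or to software.
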
In my current work-in-progress, the bus is: 1-1 with on-chip CPU's
clock, Harvard, one bus, byte addressable, byte/16-bit-word data types,
attached to a double-cycled external data bus, with arbitrary
wait-states, interrupts, DMA, and pipelined bus transactions.

Other comments.

FPGA Device Architects: this on-chip bus stuff is so much easier if you
follow the XC4000 lead and provide the abstraction of long, wide,
partitionable buses with *abundant* 3-state drivers -- one per logic
cell is good. The bus control itself can be built in programmable
logic.

Finally, in designing an on-chip bus with an eye on standardization,
note some interesting design tensions:

1. malleable or fixed bus topologies and clocking disciplines?
   -- why not take advantage of FPGA flexibility and define a general
   bus architecture space, making allowance for one or more 8-, 16-,
   32-, even arbitrary k-bit buses, and other dimensions of the design
   space I described above? Then customers can specialize designs to
   suit.
   -- Oops, that adds complexity and makes validation much harder.

2. lightweight or heavyweight? My current bus has a control overhead of
   ~2 CLBs per peripheral. At the opposite extreme, imagine an on-chip
   PCI bus. The latter would offer many features, like configuration
   registers, but these would be of little value in a cheap SOC in an
   XCS10XL or 20XL.

I can't wait to see an on-chip bus standard (or standards) for FPGAs --
then we might finally see a marketplace of plug-and-play processor and
peripheral cores.

Jan Gray
Copyright © 2000, Gray Research LLC. All rights reserved.