The Myriad Uses of Block RAM


Subject: Xilinx Virtex announced, what to do with big blocks of RAM
Date: Tue, 27 Oct 1998 11:40:13 -0800
X-Unsent: 1 [ed: note, never posted]

Now both Altera and Xilinx offer products with large blocks of embedded
RAM.  What shall we do with it?

Simply put, if a function is, has, or can be implemented using a few KB or
less of RAM or ROM, it can probably be built from the embedded RAM or ROM
in a suitable programmable logic device.
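As a back-of-the-envelope check (a sketch of mine, assuming 4 Kbit blocks, the block RAM size Virtex announced), a full truth table for a function of n inputs and w output bits takes 2^n * w bits:

```python
# Rough sizing check: does an n-input, w-output-bit function fit in block RAM?
# BLOCK_BITS and the helper name are illustrative assumptions (4096-bit blocks).

BLOCK_BITS = 4096  # one Virtex-sized block RAM

def blocks_needed(n_inputs, out_bits):
    """Number of 4 Kbit block RAMs needed to hold a full truth table."""
    table_bits = (1 << n_inputs) * out_bits
    return -(-table_bits // BLOCK_BITS)  # ceiling division

print(blocks_needed(9, 8))    # 512 x 8 = 4096 bits: 1 block
print(blocks_needed(12, 1))   # 4096 x 1: 1 block
print(blocks_needed(16, 16))  # 64K x 16: 256 blocks, no longer "a few KB"
```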

So here are some obvious potential applications:

* register file; many-hundred-bit-word register file; vector register file;
windowed register file, fixed or variable size, optionally tiled;

* multiple register file contexts, including user/kernel/interrupt handler
shadow contexts, or multiple threads' contexts;

* operand/data stacks, control (incl. return address) stacks; stack elements
referenced either directly or only top of stack; unified locals+operands
stack; storing one or more activation records; systems which automatically
store and reload same, burst or trickle-back;

* m-read n-write multiported versions of these register files or stacks, via
the embedded RAM's inherent multiport access, or time-multiplexed access, or
replicated copies, or supporting multiple concurrent writes using
{ replication + each write updates only one replica and updates 'which
replica valid' state + read access selects valid replica };
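A software model of that last scheme (my sketch; the class and names are illustrative): two one-write-port replicas, per-address "which replica is valid" state, and reads that select the valid replica:

```python
# Sketch: a 2-write-port RAM built from two 1-write-port replicas.
# Each write port updates only its own replica; a per-address entry records
# which replica holds the freshest value, and reads select accordingly.
# (Concurrent writes are assumed to target different addresses.)

DEPTH = 16

class TwoWritePortRAM:
    def __init__(self):
        self.replica = [[0] * DEPTH, [0] * DEPTH]
        self.valid = [0] * DEPTH   # which replica holds the live value

    def write(self, port, addr, data):
        self.replica[port][addr] = data
        self.valid[addr] = port

    def read(self, addr):
        return self.replica[self.valid[addr]][addr]

ram = TwoWritePortRAM()
ram.write(0, 3, 111)   # port 0 writes address 3
ram.write(1, 7, 222)   # port 1 writes address 7, same cycle
ram.write(1, 3, 333)   # later, port 1 overwrites address 3
print(ram.read(3), ram.read(7))  # -> 333 222
```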

* hybrid schemes with small fast multiported register files/stacks using
fine grained embedded RAMs, backed by larger multiple frame/context storage
using large embedded RAM blocks, providing fast call/return and/or fast
context switching;

* multiplier, divider, and/or trigonometric lookup tables, or partial lookup
tables, coefficient lookup tables, interpolant estimate tables;
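One classic table-based multiplier, modelled in software (my sketch, not from the posting): quarter-square multiplication, where a 511 x 16 table of floor(x^2/4), comfortably one block RAM, turns an 8x8 multiply into two lookups and a subtract:

```python
# Quarter-square multiplication: a * b = floor((a+b)^2/4) - floor((a-b)^2/4),
# exact because a+b and a-b always have the same parity. One table of
# floor(x^2/4) for x in 0..510 serves all 8-bit operand pairs.

qsq = [(x * x) // 4 for x in range(511)]  # 511 x 16-bit ROM contents

def mul8(a, b):
    """8x8 unsigned multiply via two table lookups and a subtract."""
    if a < b:
        a, b = b, a
    return qsq[a + b] - qsq[a - b]

print(mul8(200, 123))  # -> 24600
```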

* control stores: wide encoded state machines, microcode, nanocode
(multiple-level structures), possibly writeable;

* branch prediction mechanisms including branch target address caches,
branch target instruction caches, return address caches, branch history
tables;

* instruction buffers; loop buffers; on-chip instruction and/or data cache
data, direct mapped or small-n-way set associative; cache tags (for on- or
off-chip cache data); MOESI style cache coherence bits; snoop tags; schemes
combining and/or concurrently accessing data+tags in same RAM block; victim
buffers; write buffers and write-accumulation structures; predecoded,
decompressed, or canonicalized instruction caches; any of these optionally

* segmentation registers, translation lookaside buffers, and other
per-segment, per-page, or per-region memory mapping state (real address,
present, valid, and/or dirty bits), with direct mapped or sequentially
probed entries, and grouped, random, sequential, linked list, and other
such entry replacement policies/mechanisms;

* per-task state tables, including priorities, task state, next-task info,
attributes, and masks; fast dedicated thread local storage;

* debug support tables including breakpoint code address and/or count
registers, breakpoint data address and/or value registers, nonsequential IP
history, branch taken/not taken history, memory access history; dynamically
reconfiguring an FPGA processor to insert such debugging features on demand;

* on-chip RAM or scratchpad RAM; multiple banks of same supporting
multiported access, or interleaving; optionally preinitialized; on-chip ROM;
use of these for interpreter or emulator code or data;

* use of on-chip RAM to buffer, exchange, or manage data between on- or
off-chip processor and on- or off-chip peripherals or coprocessors;

* DMA staging buffers; off-chip memory multiple-outstanding-transaction

* I/O buffers/FIFOs/queues in general, linear, circular, or linked list;

* on-chip/off-chip memory/peripheral controller's table mapping addresses to
peripheral selects and wait state control timings;

* DRAM open page registers;

* garbage collection support: read, write barriers via page table attribute
bits or region table address checks; card marking bit array (one bit per
memory tile of 256 bytes or so).
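The card-marking scheme in miniature (names are mine): the write barrier sets one bit per 256-byte tile of the heap, and the collector rescans only the marked cards:

```python
# Card-marking write barrier sketch: one dirty bit per 256-byte heap tile.
# A 64 KB heap needs only 256 bits of card state, an easy fit in block RAM.

CARD_SHIFT = 8                 # 2**8 = 256-byte cards
HEAP_BYTES = 64 * 1024

cards = [0] * (HEAP_BYTES >> CARD_SHIFT)

def write_barrier(addr):
    """Run on every pointer store: mark the card containing addr."""
    cards[addr >> CARD_SHIFT] = 1

write_barrier(0x1234)           # pointer store into the heap at 0x1234
print(cards[0x12], sum(cards))  # card 0x12 is marked; 1 dirty card total
```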

* on-chip message/cell buffers; queues; virtual channel message/cell
buffers; node/address-to-info maps; use of same for message passing or
shared memory multiprocessors and packet/cell switched network interconnect;

* buffers for temporary storage of messages pending segmentation, and
buffers for subsequent reassembly of cells into messages;

* audio input or output buffers or delay lines; envelopes; audio samples,
wave tables; tone generators;

* video line input or output buffers or delay lines; sprite or overlay
storage; stipples; stencils; character or pattern generator ROM;

* graphics: display list, render command queue, vertex lists; transformation
matrices or stacks of same; per span or chunk compositing buffer; rendering
buffers for accumulating the current scan line's spans' colour, alpha,
and/or Z- information; texture cache;

* DCT and IDCT support (8x8 pixel blocks, quant coeff tables, Huffman
tables);

* RAMDAC colour LUTs mapping colour index to colour, or mapping colour to
gamma corrected colour; VGA, XGA, etc. emulation state;

* bus interface configuration state memory, incl. PCI configuration memory;
peripheral device command FIFO, response FIFO;

* self-diagnosis: storage of one or more samples of selected, captured
readback data, readback bit skip counts;

* some of the above replicated on-chip; or shared amongst multiple on-chip
clients, including multiple processors;

* some combination of above stored together in a single embedded RAM block
or a bank of same;

* hybrid uses of large embedded RAM blocks together with smaller distributed
RAM blocks to achieve large storage capacity with highly multiported access
to a subset of that storage;

and, lest we forget,

* arbitrary lookup tables for functions of 8, 9, ... 12 input bits yielding
up to 16, 8, ... 1 output bits, respectively.
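The tradeoff behind those last numbers, for a 4096-bit block (an assumed block size, per the Virtex announcement): each added input bit doubles the table depth and halves the available output width:

```python
# Aspect-ratio tradeoff of one 4096-bit block RAM used as a lookup table:
# n input bits address a 2**n deep table, leaving 4096 / 2**n output bits.

BLOCK_BITS = 4096
for n in range(8, 13):
    depth = 1 << n
    width = BLOCK_BITS // depth
    print(f"{n} inputs -> {depth} x {width}")
```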

Should be fun...

(Absent from my list above are those uses of small RAMs that require content
addressability and/or heavy multiporting: reservation stations, out-of-order
instruction issue/retire queues, fully associative TLBs, fully associative
caches, some compression algorithms, IP routing, etc.  I do not expect to
see content addressable RAM as an embedded block any time soon.)
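To see why content addressability is the sticking point (a sketch with illustrative names): a direct mapped table is one indexed RAM read plus one tag compare, while a fully associative one must compare the key against every entry at once, i.e. one comparator per entry, which is a CAM in hardware:

```python
# Direct mapped lookup: one RAM read, one compare.
# Fully associative lookup: the loop below is N parallel comparators (a CAM)
# in hardware; plain RAM blocks give you only the indexed read.

ENTRIES = 8

dm_table = [None] * ENTRIES          # direct mapped: indexed by low bits
def dm_insert(vpn, ppn):
    dm_table[vpn % ENTRIES] = (vpn, ppn)
def dm_lookup(vpn):
    entry = dm_table[vpn % ENTRIES]  # single indexed read
    return entry[1] if entry and entry[0] == vpn else None

fa_table = []                        # fully associative: search every entry
def fa_insert(vpn, ppn):
    fa_table.append((vpn, ppn))
def fa_lookup(vpn):
    for tag, ppn in fa_table:        # all compares happen at once in a CAM
        if tag == vpn:
            return ppn
    return None

dm_insert(0x42, 7); fa_insert(0x42, 7)
print(dm_lookup(0x42), fa_lookup(0x42), dm_lookup(0x43))  # -> 7 7 None
```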

"It's a good time to be us,"
Jan Gray

Copyright © 2000, Gray Research LLC. All rights reserved.
Last updated: Feb 03 2001