FPGA CPU News of April 2001


Sunday, April 29, 2001
FPGA-FAQ.com now has the indexed comp.arch.fpga archives online. These archives are a treasure trove of history, lore, and prior art.

I wrote the Python code that formatted and indexed Philip Freidin's raw archives of almost seven years of comp.arch.fpga messages, augmented by those archives kept by Markus Wannemacher. I hope you like it.

The most prolific contributor (by far) is Ray Andraka of the Andraka Consulting Group, whose 1300+ articles total 95,000 lines / 560,000 words / 4 MB of text!

Now that Google Groups beta has Usenet archives back to 1995, more or less, you can also search Google's comp.arch.fpga archives.

Thursday, April 26, 2001
Nick Flaherty, Electronics Times: Altera goes for I/O bandwidth on FPGAs.

Crista Souza, EBN: Altera aims APEX II beyond periphery.

Three articles from Murray Disman, consulting editor for ChipCenter's PLD division.

1) Altera Introduces APEX-II Family:

"Altera has done the industry a great favor by changing the way it numbers its parts. It has replaced the now meaningless system gates used by some companies in the part's designator by a number that represents how many logic elements are contained in the part."

2) Altera Demos FPGA With Embedded ARM9:

"Altera has launched a barrage of publicity surrounding the first availability of the 32-bit ARM922T-based EPXA10 Excalibur device. Shipments are some three months behind the original schedule. It turns out it is not as simple as might be thought to integrate an embedded processor and an FPGA fabric."

3) Xilinx Releases Soft 32-bit Processor Core.

Tuesday, April 24, 2001
FCCM is next week. Fun plus. This year, I don't think I'll be there, alas.

Genuinely interesting
Proceler's application-specific soft processors. Articles about Proceler. Richard Goering, EE Times: C design goes 'soft'.

Inner loop datapaths (1997):

"In theory, you could compile your dusty deck C, C++, Java, FORTRAN, Scheme, etc. and run it immediately on your FPGA CPU. Then automatically (profile driven) or through explicit directives, you can compile the inner loops to a custom datapath. This can either be manifest as an on-chip command oriented coprocessor, or in some cases as new instructions. The latter has the potential advantage of very high custom operation issue rates (today, 66 MHz) and access to processor register file, etc."

So perhaps "theory" is becoming "practice".

Altera Apex II
Altera press release: Altera Breaks Into High-Performance System Datapath with Introduction of APEX II Family. Data sheet.

0.15 micron, 8LM, all copper, 1.5V supply voltage. Lots of high speed I/O, lots and lots of LUTs; at 89,280 LUTs (and 1.5 Mb of ESB RAM), the high-end EP2A90 should be comparable in capacity to the announced 93,184 LUT (and 3 Mb block RAM) XC2V8000.

LogicLock (from data sheet):

"The LogicLock feature allows the designer to make pin and timing assignments, verify functionality and performance, and then set constraints to lock down the placement and performance of a specific block of logic using LogicLock constraints."

If it is anything like Xilinx's RLOCs it should prove invaluable to Altera expert designers, and should help Altera boost the performance of their soft IP cores -- such as Nios.

(By the way, it is important that such placement constraints can be schematic and HDL source code annotations -- it is rather less useful if the constraints are applied to post-synthesis entities. Is that the case with LogicLock?)

Noteworthy: no announcements of APEX II embedded hard processor cores, nor of embedded hard multipliers, nor of something comparable to the XCITE controlled impedance technology.

Dog gates
I enjoyed this bit from the APEX II Questions and Answers:

"Q. Why is the naming convention for APEX II devices different than other Altera LUT-based PLD families"

"A. The APEX II nomenclature is based on logic elements (LEs) rather than system gates. As PLDs become more and more complex, it becomes increasingly difficult to represent logic density, features, and memory using a single unit of measure. An LE-based nomenclature more accurately communicates the logic capacity of programmable logic devices, instead of using gate counts that do not follow a defined, PLD industry standard. Disproportionate growth in embedded memory and logic elements leads to unbalanced weighting in gate enumeration and ultimately to misleading density representations. An LE-based nomenclature will facilitate better device selection and avoid confusion that could result from using a gate-based nomenclature." (emphasis added)

I agree. Good for Altera. See also marketing gates redux. Even the XC40150XV has more LUTs than the XC2V1000, and the XCV1000 offers more LUTs than the XC2V2000. Let's "size" devices by their LUT counts (and optionally by the number and size of embedded RAM blocks).

[updated 04/25/01:]
Altera press release: Out with Gate Counts, In with Logic Elements.

Brown and Rose, Univ. of Toronto: Architecture of FPGAs and CPLDs: A Tutorial:

"Gate count is an especially contentious issue in the FPGA industry, and so the numbers given in this paper for all manufacturers should not be taken too seriously. Wags have taken to calling them 'dog' gates, in reference to the traditional ratio between human and dog years."

Monday, April 23, 2001
I've really enjoyed reading some of the class handouts from Prof. André DeHon's three-term course on computer architecture at Caltech: CS/EE 184abc. These pages provide a fresh perspective on computer architecture, including new insights on computing with programmable logic structures. Highly recommended.

Xilinx sees the light
Anthony Cataldo, EE Times: Altera, Xilinx prep high-end PLDs as revenues dive:

"Xilinx, for its part, recently announced plans to field its own embedded 32-bit soft core, called Microblaze. The company is touting its comparatively high 125-MHz operation and low gate count. A designer could conceivably put up to 100 of the cores on one FPGA, Xilinx said." (emphasis added)

Let's see. At a stated 800-900 LUTs per processor ("comparatively low gate count" compared with Nios, perhaps, but still 4-5X larger than the 16-bit gr1040 and 3X larger than the 32-bit gr1050) and assuming no structure sharing, 100 processors would require 80,000-90,000 LUTs.

That would seem to rule out the XCV3200E (64,896 LUTs) and the XC2V6000 (67,584 LUTs). Perhaps it refers to the planned XC2V8000 or XC2V10000 (122,880 LUTs). But with "only" 192 block RAMs per XC2V10000, should we infer that each Microblaze can operate on one block RAM?

Xilinx: what device can you host 100 Microblazes in?
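
The LUT arithmetic above can be checked with a short Python sketch. It uses only the figures quoted in this issue (800-900 LUTs per core, the device LUT counts, 192 block RAMs in the XC2V10000) and assumes, as the text does, no structure sharing. The function names are illustrative, and the result is an upper bound by LUT count alone, not a claim about what the tools can actually place and route.

```python
# All figures are the ones quoted in the text; none are measured here.
LUTS_PER_CORE = (800, 900)      # stated MicroBlaze size range
CORES_CLAIMED = 100

DEVICE_LUTS = {
    "XCV3200E": 64_896,
    "XC2V6000": 67_584,
    "XC2V8000": 93_184,         # figure from the April 24 entry
    "XC2V10000": 122_880,
}

def luts_needed(cores, luts_per_core):
    """LUTs for `cores` copies, assuming no structure sharing."""
    return cores * luts_per_core

def cores_that_fit(device_luts, luts_per_core):
    """Upper bound by LUTs alone: ignores routing, interconnect, block RAM."""
    return device_luts // luts_per_core

small, large = LUTS_PER_CORE
print(f"{CORES_CLAIMED} cores need "
      f"{luts_needed(CORES_CLAIMED, small):,}-{luts_needed(CORES_CLAIMED, large):,} LUTs")

for name, luts in DEVICE_LUTS.items():
    print(f"{name}: {cores_that_fit(luts, large)}-{cores_that_fit(luts, small)} cores")

# Block RAM budget on the XC2V10000 if 100 cores share 192 block RAMs.
print("whole block RAMs per core:", 192 // CORES_CLAIMED)
```

By this count the two largest announced Virtex-II parts clear 100 cores on LUTs, while the XCV3200E and XC2V6000 do not, and each core would get only one whole block RAM.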

See also Multiprocessors and hard CPU cores do not moot soft cores and 500 soft CPUs per chip?.

I know from my experiences implementing sixty 180-LUT (8x6 CLB) gr1040s in one XCV600E that there's a big difference between mere feasibility (dividing {48x72 CLBs + 72 block RAMs} by {8x6 CLBs + 1 block RAM} => up to 72 CPUs/chip) and an implementation in hand (getting the tools to behave, and developing a compact interprocessor interconnect with a reasonable programming model).
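
The "mere feasibility" division can be written out as a minimal Python sketch. It assumes each CPU is a rigid rectangular tile of CLBs plus a fixed number of block RAMs, which is exactly the simplification the paragraph warns about; `max_cpus` is an illustrative name, not part of any tool.

```python
def max_cpus(chip_clbs, tile_clbs, chip_brams, brams_per_cpu):
    """Upper bound on CPUs per chip.

    chip_clbs and tile_clbs are (a, b) CLB array dimensions. The bound is
    the smaller of how many whole tiles fit in the array and how many CPUs
    the block RAMs can serve. It says nothing about interconnect or tools.
    """
    tiles = (chip_clbs[0] // tile_clbs[0]) * (chip_clbs[1] // tile_clbs[1])
    by_bram = chip_brams // brams_per_cpu
    return min(tiles, by_bram)

# Figures from the text: 48x72 CLBs and 72 block RAMs; 8x6 CLBs + 1 block RAM per CPU.
limit = max_cpus((48, 72), (8, 6), 72, 1)
built = 60
print(f"feasible: {limit}, implemented: {built} ({built / limit:.0%} of the bound)")
```

Here the CLB and block RAM limits both come out at 72, and the sixty CPUs actually implemented use about 83% of that bound.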

'"We think we've reached the point where we'll get these soft processors into the mainstream," said Richard Sevcik, senior vice president of software and cores for Xilinx.'
In my opinion, it was Bryan Hoyer's Altera Nios program that finally brought soft FPGA CPU cores into the "mainstream".

Saturday, April 21, 2001
An early FPGA CPU project: Yoav Freund et al., UCSC: XC3020-based CPU (1990).

Thursday, April 19, 2001
The GCC imperative
As I've said before, lcc compiler support is nice and fairly easy to do, but the "key to the treasure chest" of open source software (C/C++ runtime libraries, TCP stacks, Linux, etc.) is proper GCC support. Take this announcement from Altera -- entirely predicated on the Nios GCC port.

Altera: Altera Taps into Linux and Ethernet Markets With Nios Embedded Processor Core:

"Nios Linux Development Kit ..."

"A complete Linux development environment including, uCLinux source code, SDRAM controller and memory module, Ethernet expansion board, host adapter board, hardware reference design, and web server application software provides a robust Linux development suite to enable rapid system development. ..."

"The Nios Linux Development Kit will cost $2,495 and is scheduled to ship in July 2001."

This press release had another interesting tidbit: "Since its introduction in September of 2000, more than 1000 Nios Development Kits have been sold to SOPC developers."

Well, since its introduction in March of 2000, well over 1500 copies of the XSOC/xr16 Kit have been downloaded by visitors such as yourselves. So for the time being, we can amuse ourselves with the tag line

XSOC -- Perhaps the World's Most Popular FPGA SoC Kit :-)

Sunday, April 15, 2001
Fun with Virtex Delay-Locked Loops
My XCV600E prototyping board has a 50 MHz oscillator. I need 80-100 MHz. What to do? Remove the fixed oscillator and solder in an oscillator socket? Naah.

First I instantiated a CLKDLL and fed its CLK2X output through a BUFG into my clk network. Result -- design runs at 100 MHz. Great.

Next, for 80 MHz, I instantiated three CLKDLLs:

  1. to multiply by 2
  2. to multiply that by 2
  3. to divide that by 2.5
This makes 100 MHz from 50 MHz, 200 MHz from 100 MHz, and 80 MHz from 200 MHz.
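
As a quick arithmetic check of the three steps, here is a small Python sketch (not HDL) that walks the frequency through the chain. `dll_chain` and the `("mul", ...)` / `("div", ...)` stage encoding are made up for this illustration; the sketch does not check whether each intermediate frequency is within the DLL's rated operating range.

```python
from fractions import Fraction

def dll_chain(f_in_mhz, stages):
    """Return the output frequency (MHz) after each stage.

    Each stage is ("mul", factor) or ("div", factor). Fractions keep the
    divide-by-2.5 step exact.
    """
    f = Fraction(f_in_mhz)
    outputs = []
    for kind, factor in stages:
        if kind == "mul":
            f *= Fraction(factor)
        elif kind == "div":
            f /= Fraction(factor)
        else:
            raise ValueError(f"unknown stage kind: {kind!r}")
        outputs.append(f)
    return outputs

STAGES = [("mul", 2), ("mul", 2), ("div", "2.5")]
for (kind, factor), f in zip(STAGES, dll_chain(50, STAGES)):
    print(f"{kind} {factor}: {float(f):g} MHz")
```

The overall ratio is 4 / 2.5 = 8/5, which is why the chain has to reach 200 MHz before the divide step.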

Here's the code I used, in case you are curious. (No guarantees that this is the right and proper way to do this.)

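  // Nets. lock2xq and lock4xq carry the delayed LOCKED signals.
  wire clkin_, clk1x, clk2x, lock2x, lock2xq, clk2x2, clk4x, lock4x,
    lock4xq, clk4x2, clkdiv, clk;

  // The 50 MHz board oscillator enters through a clock input buffer.
  IBUFG clkin__(.I(clkin), .O(clkin_));

  // Stage 1: 50 MHz in, 100 MHz out on CLK2X.
  CLKDLL dll2x(.CLKIN(clkin_), .CLKFB(clk1x), .RST(1'b0),
    .CLK0(clk1x), .CLK90(), .CLK180(), .CLK270(),
    .CLK2X(clk2x), .CLKDV(), .LOCKED(lock2x))
    /* synthesis xc_props="LOC=DLL2S" */;

  // Delay LOCKED through a 16-tap shift register (A = 1111) so that
  // stage 2 stays in reset until stage 1 has locked.
  SRL16 lock2xq_(.D(lock2x), .CLK(clk2x), .Q(lock2xq),
    .A3(1'b1), .A2(1'b1), .A1(1'b1), .A0(1'b1));
  wire rst4x = ~lock2xq;

  // Stage 2: 100 MHz in, 200 MHz out on CLK2X.
  CLKDLL dll4x(.CLKIN(clk2x), .CLKFB(clk2x2), .RST(rst4x),
    .CLK0(clk2x2), .CLK90(), .CLK180(), .CLK270(),
    .CLK2X(clk4x), .CLKDV(), .LOCKED(lock4x))
    /* synthesis xc_props="LOC=DLL2P" */;

  // Same hold-off for stage 3, clocked by the 200 MHz output.
  SRL16 lock4xq_(.D(lock4x), .CLK(clk4x), .Q(lock4xq),
    .A3(1'b1), .A2(1'b1), .A1(1'b1), .A0(1'b1));
  wire rstdiv = ~lock4xq;

  // Stage 3: 200 MHz / 2.5 = 80 MHz on CLKDV.
  CLKDLL dlldiv(.CLKIN(clk4x), .CLKFB(clk4x2), .RST(rstdiv),
    .CLK0(clk4x2), .CLK90(), .CLK180(), .CLK270(),
    .CLK2X(), .CLKDV(clkdiv), .LOCKED())
    /* synthesis xc_props="LOC=DLL3P,CLKDV_DIVIDE=2.5" */;

  // Drive the global clock network with the 80 MHz result.
  BUFG clkbuf(.I(clkdiv), .O(clk));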
Note the two SRL16 instances. These 16-tap shift registers seem to be necessary to hold the downstream CLKDLLs in reset until each upstream CLKDLL locks.

XAPP132: Using the Virtex Delay-Locked Loop:

"In order to achieve lock, the DLL may need to sample several thousand clock cycles. After the DLL achieves lock the LOCKED signal activates. ...

When using this circuit it is vital to use the SRL16 cell to reset the second DLL after the initial chip reset. If this is not done, the second DLL may not recognize the change of frequencies from when the input changes from a 1x (25/75) waveform to a 2x (50/50) waveform."
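
The hold-off described here can be modelled in a few lines of Python. This is a behavioural sketch only, under stated assumptions: the shift register powers up as all zeros, the address pins are tied to 1111 as in the Verilog, and LOCKED stays high once asserted. The class and function names are illustrative, and the model ignores what the upstream clock does before lock.

```python
class SRL16:
    """Behavioural sketch of a 16-bit shift register with a 4-bit tap address.

    Assumes the register powers up as all zeros (INIT = 0).
    """

    def __init__(self):
        self.bits = [0] * 16          # bits[0] is the most recent sample

    def clock(self, d):
        """Rising clock edge: shift the new sample in, drop the oldest."""
        self.bits = [d & 1] + self.bits[:15]

    def q(self, addr=0b1111):
        """Output is the bit selected by A3..A0; 1111 selects the last tap."""
        return self.bits[addr]


def edges_until_reset_released(lock_edge, max_edges=64):
    """Count clock edges until rst = ~Q goes low.

    lock_edge is the first edge (1-based) at which LOCKED is sampled high.
    Returns None if reset is still asserted after max_edges.
    """
    srl = SRL16()
    for edge in range(1, max_edges + 1):
        srl.clock(1 if edge >= lock_edge else 0)
        if srl.q() == 1:              # rst = ~Q has just gone low
            return edge
    return None


print(edges_until_reset_released(1))    # LOCKED high from the first edge
print(edges_until_reset_released(10))   # LOCKED first sampled high at edge 10
```

In this model the downstream reset is released on the sixteenth clock edge counted from the one that first samples LOCKED high, so the downstream DLL starts only after the upstream output has been flagged as locked for that long.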

"What's an SRL16?", you may be wondering. See Ken Chapman, Xilinx: The SRL16E: Part 1. Part 2. Part 3.

Monday, April 9, 2001
Congratulations to Xilinx on their MicroBlaze announcement. 50 dhrystones is an excellent result for a soft core.

Xilinx Announces MicroBlaze: World's Fastest FPGA Soft Processor. Architecture. Overview. The MicroBlaze Soft Processor versus the Competition. "The MicroBlaze processor needs only half as many Luts to deliver twice the performance as proven with the industry standard Dhrystone 2.1 Benchmark." Catchy slogan.

Let us hope this is the start of a trend to evaluate FPGA soft cores using both performance and performance/logic cell figures of merit. See also my FPGA soft CPU benchmarking comments.
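
To make the performance/logic cell figure of merit concrete, here is a minimal Python sketch. It takes the slogan's two ratios at face value ("half as many LUTs", "twice the performance") and shows what they imply for performance per LUT; it uses no benchmark data, and the function names are illustrative.

```python
from fractions import Fraction

def perf_per_lut(performance, luts):
    """Figure of merit: benchmark performance divided by LUTs used."""
    return Fraction(performance) / Fraction(luts)

def merit_ratio(perf_ratio, lut_ratio):
    """Relative performance/LUT of core A versus core B.

    perf_ratio = perf_A / perf_B, lut_ratio = luts_A / luts_B.
    """
    return Fraction(perf_ratio) / Fraction(lut_ratio)

# "Twice the performance" with "half as many LUTs":
print(merit_ratio(2, Fraction(1, 2)))   # 4

# The same result from absolute values, using placeholder units
# (performance 1 vs 2, LUTs 2 vs 1); these are not real measurements.
print(perf_per_lut(2, 1) / perf_per_lut(1, 2))
```

If both halves of the slogan hold, the claimed advantage in performance per LUT is a factor of four, which is the number a combined figure of merit would report.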

Other comments. I really like the integration with the CoreConnect OPB bus. This should let you use your soft peripherals with both the coming embedded PowerPC hard core and the MicroBlaze soft processor core. As I wrote two years ago,

"I can't wait to see an on-chip bus standard (or standards) for FPGAs -- then we might finally see a marketplace of plug-and-play processors and peripherals cores."

I think that day is nearly here.

The MicroBlaze announcement is all about Virtex-II. I wonder if Xilinx will provide MicroBlaze for Virtex/E/Spartan-II?

Answer: yes: Michael Santarini, EE Times: Xilinx rolls soft core 32-bit MPU, eyes networking: "It can be implemented as a standalone processor in Spartan II or Virtex FPGAs".

Thursday, April 5, 2001
Xilinx Media Alert for Embedded Systems Conference:

"Processor solutions from Xilinx: MicroBlaze -- world's fastest FPGA soft processor. Running at 125 MHz, MicroBlaze beats the competition with twice the performance in half the size. Come learn more."

FPGA CPU News, Vol. 2, No. 4
Back issues: Vol. 2 (2001): Jan Feb Mar; Vol. 1 (2000): Apr Aug Sep Oct Nov Dec.
Opinions expressed herein are those of Jan Gray, President, Gray Research LLC.


Copyright © 2000-2002, Gray Research LLC. All rights reserved.
Last updated: May 14 2001