FPGA-FAQ.com now has the indexed
comp.arch.fpga archives online.
These archives are a treasure trove of history, lore, and prior art.
I wrote the Python code that formatted and indexed Philip Freidin's
raw archives of almost seven years of comp.arch.fpga messages,
augmented by the archives kept by Markus Wannemacher.
I hope you like it.
The most prolific contributor (by far) is
Ray Andraka
of the Andraka Consulting Group,
whose 1300+ articles total 95,000 lines / 560,000 words / 4 MB of text!
Now that Google Groups beta has Usenet archives back to 1995, more or less,
you can also search Google's
comp.arch.fpga archives.
|
|
FCCM is next week. Fun plus.
This year, I don't think I'll be there, alas.
Genuinely interesting: Proceler's application-specific soft processors.
Articles about Proceler.
Richard Goering, EE Times:
C design goes 'soft'.
Inner loop datapaths (1997):
"In theory, you could compile your dusty deck C, C++,
Java, FORTRAN, Scheme, etc. and run it immediately
on your FPGA CPU. Then automatically (profile driven)
or through explicit directives, you can compile the inner
loops to a custom datapath. This can either be manifest
as an on-chip command oriented coprocessor, or in some
cases as new instructions. The latter has the potential
advantage of very high custom operation issue rates
(today, 66 MHz) and access to processor register
file, etc."
So perhaps "theory" is becoming "practice".
Altera APEX II
Altera press release:
Altera Breaks Into High-Performance System Datapath with Introduction of APEX II Family.
Data sheet.
0.15 micron, 8LM, all copper, 1.5V supply voltage. Lots of high speed I/O, lots and lots of LUTs;
at 89,280 LUTs (and 1.5 Mb of ESB RAM), the high-end EP2A90 should be comparable
in capacity to the announced 93,184 LUT (and 3 Mb block RAM) XC2V8000.
LogicLock (from data sheet):
"The LogicLock feature allows the designer to make pin and
timing assignments, verify functionality and performance, and then set
constraints to lock down the placement and performance of a specific
block of logic using LogicLock constraints."
If it is anything like Xilinx's RLOCs, it should prove invaluable
to expert Altera designers, and should help Altera boost the
performance of their soft IP cores -- such as Nios.
(By the way, it is important that such placement constraints can be
expressed as schematic and HDL source code annotations -- it is rather
less useful if the constraints can only be applied to post-synthesis
entities. Is that the case with LogicLock?)
Noteworthy: no announcements of APEX II embedded hard processor cores,
nor of embedded hard multipliers, nor of something comparable to the XCITE
controlled impedance technology.
Dog gates
I enjoyed this bit from the
APEX II Questions and Answers:
"Q. Why is the naming convention for APEX II devices different than
other Altera LUT-based PLD families"
"A. The APEX II nomenclature is based on logic elements (LEs)
rather than system gates. As PLDs become more and more complex, it
becomes increasingly difficult to represent logic density, features,
and memory using a single unit of measure. An LE-based nomenclature
more accurately communicates the logic capacity of programmable logic
devices, instead of using gate counts that do not follow a defined, PLD
industry standard. Disproportionate growth in embedded memory and logic
elements leads to unbalanced weighting in gate enumeration and ultimately
to misleading density representations. An LE-based nomenclature will
facilitate better device selection and avoid confusion that could result
from using a gate-based nomenclature." (emphasis added)
I agree. Good for Altera.
See also marketing gates redux.
Even the XC40150XV has more LUTs than the XC2V1000,
and the XCV1000 offers more LUTs than the XC2V2000.
Let's "size" devices by their LUT counts (and optionally
by the number and size of embedded RAM blocks).
[updated 04/25/01:]
Altera press release: Out with Gate Counts, In with Logic Elements.
Brown and Rose, Univ. of Toronto: Architecture of FPGAs and CPLDs: A Tutorial:
"Gate count is an especially contentious issue in the FPGA industry,
and so the numbers given in this paper for all manufacturers should not
be taken too seriously. Wags have taken to calling them 'dog' gates,
in reference to the traditional ratio between human and dog years."
|
|
I've really enjoyed reading some of the class handouts from Prof.
André DeHon's three-term course on computer architecture at Caltech:
CS/EE 184abc.
These pages provide a fresh perspective on computer architecture,
including new insights on computing with programmable logic structures.
Highly recommended.
Xilinx sees the light
Anthony Cataldo, EE Times:
Altera, Xilinx prep high-end PLDs as revenues dive:
"Xilinx, for its part, recently announced plans to field its own
embedded 32-bit soft core, called Microblaze. The company is touting
its comparatively high 125-MHz operation and low gate count. A designer
could conceivably put up to 100 of the cores on one FPGA, Xilinx said."
(emphasis added)
Let's see.
At a stated 800-900 LUTs per processor ("comparatively low gate count"
compared with Nios, perhaps, but still 4-5X larger than the
16-bit gr1040 and 3X larger than the 32-bit gr1050)
and assuming no structure sharing, 100 processors would require
80,000-90,000 LUTs.
That would seem to rule out the XCV3200E (64,896 LUTs) and the XC2V6000 (67,584 LUTs).
Perhaps it refers to the planned XC2V8000 (93,184 LUTs) or XC2V10000 (122,880 LUTs).
But with "only" 192 block RAMs per XC2V10000 (fewer than two per processor),
should we infer that each Microblaze can operate on one block RAM?
Xilinx: what device can you host 100 Microblazes in?
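The arithmetic above can be checked with a short Python sketch. It uses only the LUT counts quoted in this issue and assumes, as the text does, no structure sharing. The names `DEVICE_LUTS`, `luts_needed`, and `devices_that_fit` are made up for illustration. The check is on raw LUT totals only and ignores routing, interconnect, and block RAM.

```python
# LUT capacities as quoted in this issue (raw totals, nothing else).
DEVICE_LUTS = {
    "XCV3200E": 64_896,
    "XC2V6000": 67_584,
    "XC2V8000": 93_184,
    "XC2V10000": 122_880,
}


def luts_needed(cores, luts_per_core):
    """Total LUTs for `cores` processors, assuming no structure sharing."""
    return cores * luts_per_core


def devices_that_fit(cores, luts_per_core):
    """Names of devices whose raw LUT count covers the requirement."""
    need = luts_needed(cores, luts_per_core)
    return [name for name, luts in DEVICE_LUTS.items() if luts >= need]


# 100 cores at the stated 800-900 LUTs each.
print(luts_needed(100, 800), luts_needed(100, 900))
print(devices_that_fit(100, 900))

# Block RAMs available per core on an XC2V10000 with 192 block RAMs.
print(192 / 100)
```

By raw LUT count alone, only the two largest planned Virtex-II parts clear the bar, which is the point of the question to Xilinx.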
See also Multiprocessors and
hard CPU cores do not moot soft cores and
500 soft CPUs per chip?.
I know from my experiences implementing sixty 180-LUT (8x6 CLB)
gr1040s in one XCV600E that there's a big difference between
mere feasibility (dividing {48x72 CLBs + 72 block RAMs} by {8x6 CLBs + 1 block RAM} => up to 72 CPUs/chip)
and an implementation in hand (getting the tools to behave, and
developing a compact interprocessor interconnect with a reasonable
programming model).
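As a rough sketch of that feasibility bound (and only the bound), the following Python divides the chip into CPU-sized tiles and then caps the result by block RAMs. The function name `max_cpus` and its parameters are illustrative. It assumes every tile is usable and leaves no room for interconnect or I/O logic, which is exactly the gap between "feasible" and "in hand".

```python
def max_cpus(chip_clbs, chip_brams, cpu_clbs, cpu_brams):
    """Upper bound on CPUs per chip from CLB tiling and block RAM count.

    chip_clbs and cpu_clbs are (a, b) CLB array dimensions; the CPU
    footprint is tiled along matching dimensions without rotation.
    """
    tiles = (chip_clbs[0] // cpu_clbs[0]) * (chip_clbs[1] // cpu_clbs[1])
    by_bram = chip_brams // cpu_brams
    return min(tiles, by_bram)


# XCV600E: 48x72 CLBs and 72 block RAMs; gr1040: 8x6 CLBs and 1 block RAM.
bound = max_cpus((48, 72), 72, (8, 6), 1)
print(bound)

# The implementation described above placed sixty CPUs, under the bound.
print(60 <= bound)
```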
'"We think we've reached the point where we'll get these soft processors
into the mainstream," said Richard Sevcik, senior vice president of
software and cores for Xilinx.'
In my opinion, it was Bryan Hoyer's Altera Nios program that finally brought
soft FPGA CPU cores into the "mainstream".
|
|
The GCC imperative
As I've said before, lcc compiler support is nice and fairly easy to do,
but the "key to the treasure chest" of open source software
(C/C++ runtime libraries, TCP stacks, Linux, etc.) is proper GCC support.
Take this announcement from Altera -- entirely predicated on the Nios GCC port.
Altera: Altera Taps into Linux and Ethernet Markets With Nios Embedded Processor Core:
"Nios Linux Development Kit ..."
"A complete Linux development environment including, uCLinux source code,
SDRAM controller and memory module, Ethernet expansion board, host
adapter board, hardware reference design, and web server application
software provides a robust Linux development suite to enable rapid
system development. ..."
"The Nios Linux Development Kit will cost $2,495 and is scheduled to ship in July 2001."
This press release had another interesting tidbit: "Since its introduction
in September of 2000, more than 1000 Nios Development Kits have been sold
to SOPC developers."
Well, since its introduction in March of 2000, well over 1500 copies
of the XSOC/xr16 Kit have been downloaded by visitors such as yourselves.
So for the time being, we can amuse ourselves with the tag line
XSOC -- Perhaps the World's Most Popular FPGA SoC Kit :-)
|
|
Fun with Virtex Delay-Locked Loops
My XCV600E prototyping board has a 50 MHz oscillator.
I need 80-100 MHz. What to do?
Remove the fixed oscillator and solder in an oscillator socket?
Naah.
First I instantiated a CLKDLL and fed its CLK2X output through
a BUFG into my clk network. Result -- design runs at 100 MHz.
Great.
Next, for 80 MHz, I instantiated three CLKDLLs:
- to multiply by 2
- to multiply that by 2
- to divide that by 2.5
This makes 100 MHz from 50 MHz, 200 MHz from 100 MHz, and 80 MHz from 200 MHz.
Here's the code I used, in case you are curious.
(No guarantees that this is the right and proper way to do this.)
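// Nets for the three-DLL chain. lock2xq and lock4xq are the delayed
// lock signals, declared here rather than left as implicit nets.
wire clkin_, clk1x, clk2x, lock2x, lock2xq, clk2x2, clk4x, lock4x, lock4xq,
     clk4x2, clkdiv, clk;

IBUFG clkin__(.I(clkin), .O(clkin_));

// Stage 1: 50 MHz in, 100 MHz out on CLK2X.
CLKDLL dll2x(.CLKIN(clkin_), .CLKFB(clk1x), .RST(1'b0),
             .CLK0(clk1x), .CLK90(), .CLK180(), .CLK270(),
             .CLK2X(clk2x), .CLKDV(), .LOCKED(lock2x))
    /* synthesis xc_props="LOC=DLL2S" */;

// Delay stage 1's LOCKED through a 16-tap shift register, then use it
// to release stage 2 from reset.
SRL16 lock2xq_(.D(lock2x), .CLK(clk2x), .Q(lock2xq),
               .A3(1'b1), .A2(1'b1), .A1(1'b1), .A0(1'b1));
wire rst4x = ~lock2xq;

// Stage 2: 100 MHz in, 200 MHz out on CLK2X.
CLKDLL dll4x(.CLKIN(clk2x), .CLKFB(clk2x2), .RST(rst4x),
             .CLK0(clk2x2), .CLK90(), .CLK180(), .CLK270(),
             .CLK2X(clk4x), .CLKDV(), .LOCKED(lock4x))
    /* synthesis xc_props="LOC=DLL2P" */;

// Same trick: hold stage 3 in reset until stage 2 has locked.
SRL16 lock4xq_(.D(lock4x), .CLK(clk4x), .Q(lock4xq),
               .A3(1'b1), .A2(1'b1), .A1(1'b1), .A0(1'b1));
wire rstdiv = ~lock4xq;

// Stage 3: 200 MHz in, divided by 2.5 to give 80 MHz on CLKDV.
CLKDLL dlldiv(.CLKIN(clk4x), .CLKFB(clk4x2), .RST(rstdiv),
              .CLK0(clk4x2), .CLK90(), .CLK180(), .CLK270(),
              .CLK2X(), .CLKDV(clkdiv), .LOCKED())
    /* synthesis xc_props="LOC=DLL3P,CLKDV_DIVIDE=2.5" */;

// Drive the global clock network with the 80 MHz result.
BUFG clkbuf(.I(clkdiv), .O(clk));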
Note the two SRL16 instances. These 16-tap shift registers seem
to be necessary to hold the downstream CLKDLLs in reset until
each upstream CLKDLL locks.
XAPP132: Using the Virtex Delay-Locked Loop:
"In order to achieve lock, the DLL may need to sample several thousand
clock cycles. After the DLL achieves lock the LOCKED signal activates. ...
When using this circuit it is vital to use the SRL16 cell to reset
the second DLL after the initial chip reset. If this is not done,
the second DLL may not recognize the change of frequencies from when
the input changes from a 1x (25/75) waveform to a 2x (50/50) waveform."
"What's an SRL16?", you may be wondering.
See Ken Chapman, Xilinx: The SRL16E:
Part 1.
Part 2.
Part 3.
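For anyone planning a similar chain, the frequency plan of the three-DLL cascade described above (x2, x2, then divide by 2.5) can be sanity-checked with a few lines of Python. This is only the arithmetic. The helper `dll_chain` is hypothetical, and it does not check whether each intermediate frequency is within a given device's CLKDLL operating range, which is something to confirm in the data sheet.

```python
from fractions import Fraction


def dll_chain(f_in_mhz, stages):
    """Return the output frequency (MHz) after each stage.

    Each stage is ("mul", factor) or ("div", factor). Factors are passed
    through str() so that a divider such as "2.5" stays exact.
    """
    f = Fraction(f_in_mhz)
    outputs = []
    for kind, factor in stages:
        k = Fraction(str(factor))
        f = f * k if kind == "mul" else f / k
        outputs.append(f)
    return outputs


# 50 MHz oscillator through CLK2X, CLK2X, then CLKDV with a divide of 2.5.
freqs = dll_chain(50, [("mul", 2), ("mul", 2), ("div", "2.5")])
print([float(f) for f in freqs])
```

Using exact fractions avoids any floating-point surprise from the 2.5 divider. The three stage outputs are 100, 200, and 80 MHz, matching the description.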
|
|
Congratulations to Xilinx on their MicroBlaze announcement.
50 dhrystones is an excellent result for a soft core.
Xilinx Announces MicroBlaze: World's Fastest FPGA Soft Processor.
Architecture.
Overview.
The MicroBlaze Soft Processor versus the Competition.
"The MicroBlaze processor needs only half as many Luts to deliver twice the performance as proven with the industry standard Dhrystone 2.1 Benchmark."
Catchy slogan.
Let us hope this is the start of a trend to evaluate FPGA soft cores
using both performance and performance/logic cell figures of merit. See also my FPGA soft CPU
benchmarking
comments.
Other comments. I really like the integration with the CoreConnect OPB bus.
This should let you use your soft peripherals with both the coming
embedded PowerPC hard core and the MicroBlaze soft processor core.
As I wrote two years ago,
"I can't wait to see an on-chip bus standard (or standards) for FPGAs -- then
we might finally see a marketplace of plug-and-play processors and
peripherals cores."
I think that day is nearly here.
The MicroBlaze announcement is all about Virtex-II. I wonder if Xilinx will also provide MicroBlaze for Virtex, Virtex-E, and Spartan-II?
Answer: yes.
Michael Santarini, EE Times:
Xilinx rolls soft core 32-bit MPU, eyes networking: "It can be implemented as a standalone processor in Spartan II or Virtex FPGAs".
|
|
Xilinx Media Alert for Embedded Systems Conference:
"Processor solutions from Xilinx: MicroBlaze -- world's fastest FPGA soft processor.
Running at 125 MHz, MicroBlaze beats the competition with twice the performance in half the size.
Come learn more."
|
FPGA CPU News, Vol. 2, No. 4
Back issues: Vol. 2 (2001): Jan Feb Mar; Vol. 1 (2000): Apr Aug Sep Oct Nov Dec.
Opinions expressed herein are those of Jan Gray, President, Gray Research LLC.
|