.NET
I've been writing some C# code to load the metadata tables of .NET assemblies.
This ground work will allow me to navigate to the IL (bytecodes)
for each method.
.NET has reflection interfaces in the
System.Reflection namespace.
But for my purposes they are too high level -- you can't seem to
obtain the raw IL for a method.
I could use the complementary unmanaged (COM) metadata interfaces
-- these certainly do provide the rva (funky offset) to the IL --
but they don't seem to expose a typelib to easily import their types and
interfaces into the .NET world. That is, these APIs are easy to use from
C++ (and other COM client languages) outside the CLR, but I want
to work entirely inside the CLR. That way my code will more easily
move to hypothetical CLR platforms that don't have COM or Win32.
So I have been loading (deserializing) the dozens of different tables
that constitute the serialized (physical representation) of the CLR metadata.
Rather than writing a ton of special-purpose read table methods,
I have put the extensible attribute facility to good use.
You can define new attribute classes, like
class Fixed : Attribute { public Fixed(int width) { ... } ... }
and then attribute your data types:
class Foo { [Fixed(8)] string Name; ... }
and then recover the attributes at runtime using reflection:
// approximately correct
foreach (FieldInfo field in typeof(Foo).GetFields(...)) {
    object[] attrs = field.GetCustomAttributes(typeof(Fixed), false);
    if (attrs.Length > 0) {
        Fixed f = (Fixed)attrs[0];   // "fixed" is a C# keyword
        byte[] bytes = binaryReader.ReadBytes(f.width);
        ...
    }
}
With attributes like [Fixed(n)] I simply write the class declarations
for each row in each table, judiciously annotating fields with various
custom attributes, and that creates enough type and attribute metadata
to drive a reflection-driven deserializer.
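The same declaration-driven approach can be sketched outside the CLR. Here is a hypothetical Python analog in which per-field descriptors play the role of the [Fixed(n)] custom attribute and drive one generic table reader; the field names and widths are invented for illustration, not the actual metadata-table layouts:

```python
import io
import struct

# Each field carries its own "attribute": a fixed byte width,
# standing in for C#'s [Fixed(n)] custom attribute.
class Fixed:
    def __init__(self, width):
        self.width = width

# A table row declares its fields once; the generic reader below
# "reflects" over the declaration to deserialize any such row.
class ModuleRow:
    fields = [
        ("generation", Fixed(2)),   # hypothetical layout, for illustration
        ("name_index", Fixed(4)),
        ("mvid_index", Fixed(4)),
    ]

def read_row(row_class, stream):
    """Generic, declaration-driven deserializer: one reader for all tables."""
    row = row_class()
    for name, attr in row_class.fields:
        data = stream.read(attr.width)
        # interpret the bytes as a little-endian unsigned integer
        setattr(row, name, int.from_bytes(data, "little"))
    return row

raw = struct.pack("<HII", 1, 0x1234, 0x5678)
row = read_row(ModuleRow, io.BytesIO(raw))
print(row.generation, row.name_index, row.mvid_index)  # 1 4660 22136
```

The payoff is the same as in the C# version: adding a new table means writing only its row declaration, never another hand-rolled reader.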
By the way,
.NET's C# and Managed C++, with these extensible type attributes, would
seem to be a beautiful platform for netlist generators, as well
as software-to-hardware synthesis tools, for it is so easy
to annotate your declarations with attributes to control
bit-widths, technology mappings, etc.
And using System.Reflection.Emit, you can hypothetically walk your hardware
representation (graph), cons up the IL for a custom purpose simulation
method, and the CLR will just-in-time-compile and run it for you.
Speaking of such tools ...
JHDL
Richard Goering, EE Times:
Lab to offer open-source Java-based FPGA tool.
The BYU Configurable Computing Lab's
JHDL tools will be
"available on an open-source basis within a few months".
(Earlier this summer, Mike Butts
published
an independent reimplementation of the
xr16 instruction set architecture, written in JHDL
(announcement).)
Power
Peter Clarke, EE Times:
Power called 'new frontier' of field-programmable logic.
At FPL Stanford's
Michael Flynn comments on FPL power issues.
At ISCA'00 last year, I attended a really great tutorial, Low Power Design:
From Soup to Nuts, presented by Mary Jane Irwin and
Vijaykrishnan Narayanan of Penn State. They made the point that there
are opportunities to save power at every abstraction level in a digital
system design.
Similarly, it seems to me that power reduction can be and must be
targeted in the FPGA VLSI implementation, in the FPGA architecture,
in the P&R tools, in the synthesis tools, and in the design
methodology.
(For example, perhaps some programmable interconnect could be speed optimized,
and other routing power optimized. The timing-driven P&R tools know
which nets are time critical and could put those on the speed-optimized
routing channels -- and conversely move slack-time nets to lower power
(smaller drivers/no drivers) channels.)
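That heuristic is easy to sketch. Here is a hypothetical greedy assignment in which nets with little timing slack claim the speed-optimized channels first; the net names, slack figures, and threshold are all invented for illustration:

```python
def assign_channels(net_slacks_ns, fast_capacity, threshold_ns=1.0):
    """Greedy assignment: the most timing-critical nets claim the
    fast channels first; everything else goes to low-power routing."""
    fast, low_power = [], []
    # visit nets from least slack (most critical) to most slack
    for net, slack in sorted(net_slacks_ns.items(), key=lambda kv: kv[1]):
        if slack < threshold_ns and len(fast) < fast_capacity:
            fast.append(net)
        else:
            low_power.append(net)
    return fast, low_power

slacks = {"clk_en": 0.2, "rd_data": 0.5, "led_out": 8.0, "dbg_bus": 12.0}
fast, low = assign_channels(slacks, fast_capacity=2)
print(fast)  # ['clk_en', 'rd_data']
print(low)   # ['led_out', 'dbg_bus']
```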
Power issues can be rather subtle. Consider this
comp.arch.fpga thread on high speed low power DES cores.
Someone wrote "But it should be unquestionable that clocked
CMOS devices draw more power than unclocked CMOS devices of the same
technology" and I disagreed
because in inadequately pipelined designs, the power (energy) saved not
driving a high fanout high frequency clock net might still be squandered
by high speed internal logic glitching.
Interesting tidbit from the EE Times article:
'Flynn questioned the conventional wisdom that it is inefficient to
implement software-programmable processors on a hardware-programmable
fabric. Papers on "soft" machines had appeared in IBM Corp.'s R&D journal
back in 1984, he said.'
I'll try to get a reference to this paper.
.NET in hardware?
OK, so you've been drinking the .NET Kool-Aid too. You're thinking,
you know, it's such fun to define a new instruction set,
and it's so straightforward to build a stack machine processor
that directly executes (the easy subset of the) JVM bytecodes --
whether or not that's a good idea --
and stack machines are such a good architectural match for
modern programmable logic devices with embedded block RAMs --
and so (you're wondering) would it make sense to build a CPU that directly
executes CLR IL instructions?
Not so fast. Read this nice short paper:
K John Gough,
Stacking them up: a Comparison of Virtual Machines
(PostScript)
(PDF):
"The underlying execution mechanism of .NET is an evaluation stack, and a
set of instruction [sic] which manipulate this stack. To take the
same example, the code required to take two local integer variables,
add them, and deposit the result in a third local variable would be --
ldloc.1 ; push local variable 1
ldloc.2 ; push local variable 2
add ; add the two top elements
stloc.3 ; pop result into variable 3"
"These instructions are all generic, with the type of the "add" being
determined from the inferred type of the stack contents.
In this case the type will be known from the declared types of the
local variables. ..."
"The most striking design difference [compared to JVM] is that the
.NET machine designers seem to have been willing to surrender the
option of interpretive execution. This choice is signalled by a
number of details. Perhaps the first hint is the presence in the
instruction set of generic instructions such as add with no
specified data type. In order to interpret such an instruction it would
be necessary for the interpreter to track the data type of the
top-of-stack element. This would appear to require more computation than the
rest of the interpreter fetch-execute cycle, thus exacting crippling
performance penalties. ..."
"However, it would be possible to perform an offline pre-processing
step which converts to a different form which is more interpreter-friendly.
Particularly if the conversion accepts responsibility for the type
safety issues, the form could be trimmed of its symbolic information,
and achieve much higher code density. This would be perfectly
acceptable for real "embedded" systems which did not have to resolve
the issues of dynamic loading of modules, and runtime type safety and
security model checks. ..."
I'm not sure it's "crippling", but I agree that .NET IL was clearly
designed to be just-in-time compiled to native code. Again,
harking back to my old Java processors piece, it seems to me that
a simple generic RISC, possibly enhanced by synchronization or write
barrier support, plus a JIT compiler, would make a great embedded
.NET engine -- showing superior absolute speed and superior
price/performance (benchmarks-per-LUT).
Nevertheless do not be surprised when you read announcements
of "Java processor retreads" that now also run some (easy) subset of IL
in hardware. And perhaps two or three years from now, all those RISC
processor designs that sport bonus Java interpreter instruction
decoders (apparently chasing Java2 Micro Edition design-ins)
may "suddenly" spring hardware assist for running the .NET Compact
Framework platform.
Multithreading and garbage collection
Hardware support for CLR and JVM object system semantics,
including generational garbage collection, is a promising idea.
Indeed, since the "sure thing" trends in mainstream CPU architecture
over the next decade are multithreading and chip multiprocessing,
I'm particularly interested in non-pausing, multithreaded, concurrent
collectors (where you don't stop the many mutator threads to scavenge
live objects), where hardware assisted read barriers (or that ilk)
may help to yield acceptable performance.
Another approach: one thing that's cool about the .NET CLR architecture, compared to
most JVM implementations that I've seen, is the notion of an AppDomain --
a lightweight process-like abstraction that provides complete application
isolation -- such that several applications may be executing side-by-side
in the same process.
(This may not seem very compelling for your average Windows application
-- who cares if you can host a word processor and a spreadsheet in the
same process -- but it is helpful to host hundreds of application
sessions on an (application) server, whose underlying host operating system
may not cope well with hundreds of host processes.)
I mention AppDomains because they are an easy way to apply the manifold
threads (hardware contexts) we will have at our disposal circa 2006,
even without a non-pausing garbage collector.
Rather than have 16 physical threads running helter-skelter over an
undifferentiated mass of VM threads, and stopping all those threads
for each (pausing) generational GC, you can use AppDomains to
partition the mutator threads (and their heaps) into cliques and pause
one clique while letting the others run free.
I wonder if Microsoft takes advantage of this in their
multithread/server-optimized CLR garbage collector.
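The scheduling property of the clique idea can be modeled in a few lines: partition mutator threads into AppDomain-like groups, each notionally owning its own heap, and pause only one group per collection. The domain and thread names below are invented; this just illustrates who stops and who keeps running:

```python
class Domain:
    """An AppDomain-like clique: a named group of mutator threads
    sharing one heap."""
    def __init__(self, name, threads):
        self.name = name
        self.threads = set(threads)

def collect(domains, victim):
    """Return (paused, running) thread sets when only the victim
    domain's heap is collected."""
    paused = set(victim.threads)
    running = set()
    for d in domains:
        if d is not victim:
            running |= d.threads   # other cliques run free
    return paused, running

word = Domain("word", ["w1", "w2"])
excel = Domain("excel", ["x1", "x2", "x3"])
paused, running = collect([word, excel], victim=word)
print(sorted(paused))   # ['w1', 'w2']
print(sorted(running))  # ['x1', 'x2', 'x3']
```

With an undifferentiated heap, a pausing collection would stop all five threads; with two cliques, at most the victim's two stop.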
Prior Work
- First Workshop on Hardware Support for Objects and Microarchitectures for Java.
- Second Annual Workshop on Hardware Support for Objects and Microarchitectures for Java.
- (This year's (third) workshop was cancelled.)
Drinking the .NET Kool-Aid
I spent three intense twelve-hour days last week getting my brain enlarged at
Developmentor's totally awesome
Conference.NET conference.
It was perhaps the best conference I've ever attended.
Over and over, meaty technical sessions presented
at a sublime level of detail and concreteness.
My interests were mostly low level, so I attended (some overlap) these classes:
Inside the CLR Type System via C#; WSDL, Schemas, and SOAP;
Asynchronous Programming; The HTTP Pipeline; Debugging and Diagnostics;
System.XML; Context; Remoting; XML Schema; Hailstorm Overview;
Interop; Transaction Management; Managed Execution Unmasked;
.NET Reverse Engineering Techniques; Code Access Security;
plus keynotes from David Chappell, Brian Harry (CLR), and
Scott Guthrie (ASP.NET), and amazing emceeing
by Don ("MsgWaitForMultipleObjectsEx -- kiss kiss kiss kiss!") Box.
Conference materials.
Recommended:
TechNetCasts:
Don Box, The .NET Platform -- How We Got Here, Where Are We Going?
(streaming MP3):
"I have brought you into this room for a reason. I hold in my hand, two pills.
A red pill and a blue pill. If you take the blue pill,
you'll wake up tomorrow as if this conference never happened,
you'll go back to your COM programming, it will be great, you'll be happy,
life will be fantastic.
If you take the red pill, you'll find exactly how deep the rabbit hole goes. ..."
"What happened to the people who didn't make the transition
from physical memory to virtual memory? Charles Darwin happened to them.
Natural selection took its course.
They left the industry or they became games programmers. ...
"I have news for you people. Another flood is coming.
The waters are rising and they're going to wash away
the people who don't move with this platform shift."
"People that stand behind and say "I need my virtual memory and threads
-- if you take it away from me I'm screwed," are going to get screwed,
and it's going to be OK.
Some of them will make the change later and become bitter;
some of them will leave the industry; and some of them will
just go write low level grungy plumbing and that's OK ... "
"There's a flood coming and it's going to wash away people who don't make this change.
... if you talk to people who really look at the trends in this industry
... it feels ... we are moving to a world where there are basically
two places that code runs -- the JVM and the CLR. ..."
"Why? It's just too hard to get programs to work with virtual memory
and threads ... We're moving to a world based upon types, objects, and values.
And really the tradeoff is productivity versus control."
[transcribed by JG from the audio]
David Chappell, Succeeding with .NET
(streaming MP3).
"Bottom line -- if you can't stand change, get out of the software business."
.NET is a sea change for mainstream (Windows) software development.
So much of the .NET Framework is about raising the level of abstraction
to improve programmer productivity;
gone is the error-prone tedium of writing dual-interface COM objects;
gone is the apartheid betwixt C++, VB, and ASP/script programmers;
and, as with Java, gone are a great many tiresome burdens and categories
of bugs. Unlike Java, .NET embraces many programming languages and
(better) preserves legacy code and platform expertise alike.
Developers that take the time to kick the .NET tires are going to love it.
So I'm going to be spending a lot of time exploring the .NET Framework
in the weeks ahead, and I may even start a new blog to record my travails.
You may be wondering, how is .NET relevant to FPGA CPUs?
Among other things, the .NET CLR looks like a great platform to build
multi-language cross-compilers for new instruction set architectures.
Yesterday, for curiosity's sake, I rebuilt the lcc-xr16 compiler with the
beta 2 Visual C++ .NET compiler, specifying the -clr flag
-- and now it runs nicely in managed code under the
Common Language Runtime. (Try that with a JVM!)
More FPGA coprocessors with SDRAM DIMM interfaces
Earlier I wrote about the Nuron AcB (Adaptable Computing Board).
Here are two university projects in the same vein,
the latter of which is discussed in this
thread in comp.arch.fpga.
-
P.H.W. Leong, et al, Chinese Univ. Hong Kong,
"Pilchard - A Reconfigurable Computing Platform with Memory Slot Interface",
(Postscript),
presented at the 2001 IEEE Symposium on Field-Programmable Custom Computing Machines
(FCCM).
Write combining writes to FPGA at over 400 MB/s, reads at 64 MB/s (both
excellent results), but no word on access latencies -- how long would
a write and read-back take (number of instruction issue slots)?
-
Penn State Univ. Center for Electronic Design, Communications, & Computing:
SmartDIMM.
Xilinx Terabit Networking Forum
Today I attended the local video downlink of this
forum
(thank you, Avnet, for your downlink hospitality).
I didn't find it quite so riveting as February's
XtremeDSP Simulcast,
perhaps because it seemed slightly less technical than the latter,
and perhaps because there was no match for the Erich Goetting
Virtex-II/Pro disclosures.
But, knowing little of network device interface standards,
and curious about the myriad acronyms in the Xilinx
Virtex-II literature, I did value today's session for
providing background and context. In fact, I learned a ton
of new concepts and factoids -- and so now I can drop router
interface buzzwords at parties.
This forum might as well have been called the SelectIO Forum, because
programmable I/O was the key enabler of everything we saw today.
Thanks to this forum, I now understand some of the challenges in building
10 and 40 Gb/s routers and how FPGAs may fit in. I now have some context
-- the what, where, and why of interfaces with names and acronyms like
HyperTransport, POS-PHY L4/L5, UTOPIA, Flexbus 3/4, SPI-4 P1, SPI-4 P2,
SPI-5, SFI-4, SFI-5, CSIX, and so forth. What interfaces are HSTL or
LVDS, parallel or serial, how wide, where the clock is, and so forth.
Good stuff.
There was good background on the Mindspeed Skyrail 3.125 Gb/s
links coming in Virtex-II Pro.
And I learned little bits of trivia, like there is both a 10 Gb ethernet (LAN)
and a 10 Gb ethernet (WAN) that run at different data rates.
Looking back at the big menagerie of interfaces,
some interfaces were clearly solving distinct problems (say HyperTransport
and POS PHY L4/L5), but I honestly didn't understand why other
interfaces (like CSIX) could not be built atop the latter.
And I guess I completely missed RapidIO's raison d'être...
If you missed this forum, look for it as a webcast at xilinx.com
in the days to come.
(By the way, it is ironic that there was no live webcast
for a high speed networking forum.)
Xilinx open kimono
Mark Aaldering of Xilinx gave an interesting talk on some of the very
impressive high speed interface cores that Xilinx has built, which
include a 10 Gb/s ethernet MAC, CSIX, PL4, and PCI-X. (Incidentally,
the PL4 interface uses the Virtex-II clean clock muxing feature.)
Aaldering said that in addition to the forthcoming Virtex-II Pro
multigigabit (MGT) serial links at 3.125 Gb/s, you can expect
two next-generation offerings with 6.25 Gb/s and later 10 Gb/s.
(In an earlier session, Rich Sevcik said approximately that
"in the next year or two we're going to be moving that to 10 gig technology".)
I also learned that you will be able to bond together 4 of
the new MGT links to build a 4-bit wide 10 Gb/s link.
In a flashback to Goetting's February talk,
Aaldering showed a slide with (apparently) that same huge FPGA
with 4 embedded PowerPC 405s and apparently some two dozen
3.125 Gb/s serial links -- which Aaldering said would be
coming (I assume that means would be announced) at the
end of the year.
Back then, I wrote:
"Whether this diagram depicts a hypothetical planned device,
a trial balloon, a clever misdirection, or something else, does not matter.
It is clear that this shows some flavor of the shape of
things to come. It's time to start thinking imaginatively
about how to best use such a monster -- not to mention a rack full of them."
No misdirection, rather truly the shape of things to come.
But it still boggles the mind.
Xilinx: Xilinx Ships Complete 10 Gigabit Ethernet Solution.
Anthony Cataldo, EE Times:
Xilinx ships 10-Gbit core for Ethernet.
'For box-to-box connections, Xilinx intends to support Infiniband using a
special device that embeds a PowerPC in an FPGA fabric, which the company
has been jointly developing with IBM Corp. for two years. "Infiniband
is a big one," Hedayati said. "You need to have 3.125 serial I/Os and
it requires such a huge software approach that you need a processor
on-chip. We're working on a custom job where the processor is sitting
in the center to get the maximum performance. This is not a standard
core cell. The processor is getting immersed in the FPGA fabric."'
[updated 08/21/01]
Murray Disman, ChipCenter:
Xilinx Ships 10Gb Ethernet Core.
On FPGAs as PC Coprocessors, redux
On comp.arch.fpga, Jason Morris asked:
"What is the best reconfigurable PCI processor board to use as a
computation accelerator?"
and Dave Feustel replied:
"You will find some relevant observations about this
at http://www.fpgacpu.org. The real short version is
that the PCI bus interface is so slow relative to the
speeds of the cpu and accelerator that it usually
isn't worth adding an accelerator if the interface
is via the PCI bus. But if the accelerator and
the cpu chip are directly connected (possibly
by the AMD HyperTransport bus) things improve considerably"
See FPGAs as coprocessors (1996):
"So as long as FPGAs are attached on relatively glacially slow I/O buses
-- including 32-bit 33 MHz PCI -- it seems unlikely they will be of
much use in general purpose PC processor acceleration. Sure, for
applications such as cryptography, image and signal processing, they
might be a win (***), given a semi-autonomous problem which either fits
in the FPGA and local storage, or which can employ DMA to stream data
into or through the FPGA without much CPU intervention or management."
Plug-in FPGA coprocessors
While I felt PCI's write+read latency was so slow as to rule out many
interesting coprocessor applications, there are other interfaces in
a PC that provide an opportunity to plug in some programmable logic
(and with less latency than PCI). These interfaces include:
- the processor package pins themselves
- the north bridge -- a custom north bridge could include programmable logic or fast interface to same -- more on this below
- the built-in glueless multiprocessor interface -- make a PLD that looks like a peer MPU and intervenes in snoopy cache transactions
- the old and obsolete external async SRAM L2 cache -- poor: often eight x8 DIPs
- the newer (and still obsoleted) 160-pin COAST (cache on a stick) sync SRAM L2 cache module (circa 1997/98?) -- good: high speed, synchronous, slot + daughter card
- DRAM and now SDRAM main memory SIMMs/DIMMs
- the AGP socket -- perhaps less latency, but just as "relatively glacially slow" as PCI (if you take into account the last five years' CPU and FPGA speed improvements)
I used to favor the COAST module approach. Today, the SDRAM (DDR SDRAM,
etc.) DIMM socket makes a ubiquitous, low latency interface. It should
be possible to overlay memory mapped programmable logic control registers
on a traditional SDRAM module.
There is a company, Nuron, that is
(was?) doing something like that. Their web site used to describe the
Nuron AcB, an SDRAM DIMM module with an integrated 600K or 1M gate FPGA.
Applications would include encryption, compression, etc. Based upon
their literature, Nuron held my regard as one of the most promising
reconfigurable computing accelerator architectures.
(One challenge with this strategy is you are "strapped onto a treadmill",
chasing rapidly changing, increasingly complex interfaces that evolve
without any concern for your accelerator plug-in. While a COAST interface
seemed perfect circa 1997 or so, COAST is dead and forgotten, as would
be an EDO DRAM SIMM or DIMM approach. And back in 1996 I thought Xilinx
should have built a glueless interface to the Pentium or Pentium Pro
for the XC6200. I wrote: "I expect the PPro external bus to be just as
ubiquitous and as long lived as have been the 486 and Pentium buses."
How could I have been more wrong! Soon Intel brought out Slot-1, and
later, switched back to PGA sockets. Similarly, SDRAM DIMM is obsoleted
by new standards -- DDR-SDRAM DIMMs and D-RDRAMs.)
Ubiquitous, built-in FPGA coprocessors?
Hybrid devices, with fixed logic alongside programmable logic, or one
embedded within the other, are an emerging trend. Indeed, now that both
ASIC houses and PLD vendors alike are discussing licensing of programmable
logic fabrics for use in ASICs, perhaps the time has finally come to embed
$1 of programmable logic directly in the processor or the north bridge.
In theory, it might be a good use of an increasingly generous transistor
budget, and could give a leg up on competitors' processors / chip sets.
But in these two very cost sensitive, high volume markets, I doubt
we'll see embedded programmable logic until it can be shown that doing
so would dramatically speed up some or most end-users' computing tasks
to the extent (as with 3-D acceleration) that they would be willing to
pay more for the more-hardware-accelerated product.
If most users spent their days waiting on Photoshop filters, speech
recognition, or various kinds of encryption, perhaps we'd already be
using FPGA coprocessors. But most users spend their time at the computer
staring at an idling Internet Explorer, Outlook, or Word, where the local
delays, if any, are cache and page misses rather than signal processing.
Perhaps it's a chicken-and-egg problem. Perhaps we are not seeing
computationally demanding applications (that go beyond what a
general purpose CPU can do) because there are no programmable hardware
accelerators available to make such applications feasible. So another
stumbling block in forging a volume reconfigurable computing coprocessor
industry is software interface standards.
Ideally you would see Microsoft put forward a hypothetical, evolving,
"DirectCoprocessor" or "DirectDSP" interface, reminiscent of Direct3D,
that would provide APIs for common, computationally expensive, stream-
and signal-processing computing tasks, such as encryption, compression,
searching, sorting, correlating, voice recognition, image processing,
etc. This would bridge the chasm between applications and the hardware
application accelerators, and help fertilize an economically viable
and competitive marketplace for such products. But before Microsoft
would recognize a need for such an abstraction layer, the basic approach
would probably have to be proven in a number of successful domain-specific
products.
Perhaps if ubiquitous speech recognition, or vision, or high speed
encryption, or wireless, emerges as a computing bottleneck, there will
emerge such domain-specific products where a programmable logic approach
is shown to be superior (esp. time-to-market and field-upgradability)
to a cheaper, fixed-function approach. Don't hold your breath.
New interconnect standards?
Dave Feustel mentioned
HyperTransport. There we have a standard high bandwidth electrical interface that can be directly connected to modern FPGAs. Some questions for those of you more in the know --
- Is HyperTransport low latency (vs PCI or PCI-X)?
- Will commodity PCs or servers ever sport HyperTransport slots? (I suspect not.)
- PCI-X defines an add-in card physical interface, right? What is best case total latency for a read and a write to a PCI-X add-in card?
- What about 3GIO aka
Arapahoe? Will this support expansion cards (or just be a point-to-point inter-IC interconnect)? (If so, when? 2003?)
Putting all those dual-ported block RAMs to good use dept.
Craig Matsumoto, EE Times:
IP Semi crafts network processor for FPGA. "SpeedRouter appears to be the first FPGA-based network processor, but as such, it may require more user programming than most other off-the-shelf NPUs."
IP Semiconductors'
SPEEDRouter V1.1 Product Specification describes a
"programmable lookup and classification engine". There is no evidence
of any traditional general purpose (programmable) processor(s) under the hood. Instead,
it appears that router features are programmed as Verilog modules
with "well defined" datapath interfaces that are then compiled together with
the SPEEDRouter framework to achieve a packet processor with that
set of features.
(Please let me know if I have misunderstood or misstated the
SPEEDRouter approach.)
(By the way, you could use the same approach for adding
customizable processing stages in the midst of other deeply-pipelined
FIFO datapath applications. For instance,
the rendering pipelines of modern 3-D graphics accelerators are
some dozens of stages deep -- and so one can certainly imagine
a hybrid FPGA/ASIC device with embedded programmable logic
for implementing features such as programmable shaders.
See section 9.6 of Michael McCool's paper,
SMASH: A Next-Generation API for Programmable Graphics Accelerators. Hmm.
This (programmable shader in programmable logic) would actually be
closer to dynamic reconfigurable computing, e.g. changing the circuit
every few thousand (or million) cycles or so.
Similarly, I wonder when (if) we'll see an
FPGA-based packet processor using partial reconfiguration
to customize the datapath according to changing routing requirements.)
Stated size/speed (V600E-8): 4973 CLB slices + 32 BRAM, 80 MHz.
(About 72% of slices and 44% of BRAMs of the 6912 slice + 72 BRAM XCV600E.)
Here are some articles on network
processors from the Linley Group.
See also our earlier coverage of FPGA network processors.
Another reconfigurable computing success story
Brian Von Herzen, President of
Rapid Prototypes, Inc., writes:
"I am happy to report on a new reconfigurable design we have just
completed that made the cover of Electronic Design Magazine, June 18."
"A single Xilinx Virtex E-8 device serves as a 250 MHz logic analyzer with
108 channels that can capture continuously for 256 million
samples. Whenever a knob is turned on the logic analyzer, the Virtex
device reconfigures itself for the new pattern. This approach
utilizes the partial reconfiguration capability of the FPGA to provide dynamic
on-the-fly reprogramming every time a search pattern is changed. The net
result is an economical and high performance design that leverages the
unique capabilities of a reconfigurable FPGA. Without the fine control
over the partial reconfiguration available in the Virtex-E hardware and
applications notes, such a design would not be possible in a single
device. New features in the Xilinx design software should facilitate
reconfigurable designs in the future."
Dave Bursky, Electronic Design:
Single Instrument Analyzes Protocols And Logic:
"When the LVDS data enters the analyzer, it first encounters a 36-term
pattern-recognition array. These terms are used to drive the real-time
filter, the 12-by 12-level trigger sequencer, the six real-time statistic
counters, and the capture engine that determines which data samples
are stored in the buffer memory. Support for 36 terms by 108 channels
is another benefit to partial reconfiguration of the FPGA. As the user
defines the patterns in software, the hardware for recognizing those
patterns is generated in its simplest form and downloaded into the FPGA
on-the-fly."
Data Transit Corp.'s
Bus Doctor.
Bits and bites.
Here are the accepted papers for the
upcoming FPL 2001
conference at Queen's University Belfast, Aug 27-29, 2001.
It looks like quite a good program.
David Conroy's PDP-8/X
site is worth another look.
He has now put up the source code to the processor and IOU, and
his design uses a C++ netlist generator class library very
similar to CNets. Interesting reading.
What became of
Metaflow's
LeonCenter.com?
And what became of T7L?
If I recall correctly, they were perhaps the first to make a
commercial venture of FPGA CPU intellectual property. They sold
a family of scalable RISC CPUs and tools. Some old links:
press release;
IEEE announcement.
And here is an old slide deck, Roll Your Own RISC, from a 1998 talk at
Embedded Systems Conference, by Tom Cantrell
(Circuit Cellar)
and Philip Freidin
(Fliptronics).
(I was there. The XSOC project was born at that conference.)
More catch up.
Xilinx continues to add new and interesting (so-called)
techXclusives.
For instance, here is Ken Chapman on
composing Virtex-II multipliers.
[06/25/01] Peter Clarke, EE Times:
Xilinx, ASIC vendors talk licensing.
System ACE
[05/12/01] Xilinx announces System ACE. Ho hum. I'd rather see a built-in, glueless interface
to low-cost bytewide flash ROMs for cost-sensitive Spartan-II-type designs,
like the master parallel mode that XC4000s have.
Murray Disman, ChipCenter: Xilinx Announces System ACE Technology.
QuickMIPS
[06/18/01] QuickLogic announces
QuickMIPS.
Murray Disman, ChipCenter:
Quicklogic Announces QuickMIPS Architecture.
Chris Evans-Pughe, Electronics Weekly:
PLDs Get to the Core of Matters -- QuickMIPS contains hardwired 32-bit MIPS core:
"Based around MIPS Technologies Inc.'s 4Kc core, QuickMIPS will have
blocks such as Ethernet MAC, PCI, memory controller, FireWire and USB
embedded on-chip. Two 32-bit AMBA buses will give high- or low-speed
access to and from the CPU and programmable logic. Master-and-slave bus
interconnects will ease the addition of AMBA-bus compliant intellectual
property (IP) in the programmable logic."
FPGA CPU News, Vol. 2, No. 8
Back issues: Vol. 2 (2001): Jan Feb Mar Apr May Jun Jul; Vol. 1 (2000): Apr Aug Sep Oct Nov Dec.
Opinions expressed herein are those of Jan Gray, President, Gray Research LLC.