Silicon Graphics, Inc.
Origin Technology

From Busses to Modular, 
Distributed Crossbars

Not All Shared-memory
Systems Are Equal



John R. Mashey,
Director, Systems Technology

Many people confuse the different kinds of computer system architectures that are all labeled shared-memory systems, and therefore consider them the same. A good analogy is believing that cars, trucks, and airplanes are equivalent because they all have wheels!

This page is a brief tutorial on popular kinds of shared-memory systems.


Figure 1 shows some common system architectures used to build (mostly) Shared-memory Multiprocessors (SMPs).

Shared-bus Systems
The most familiar design is the "fixed bus" or shared-bus multiprocessor system. The bus is a path shared by all processors, but usable by only one at a time to handle transfers between CPU and memory. This is a long-established design in the minicomputer and microprocessor business, and many vendors build such systems as servers, and sometimes as workstations or PCs. Typical systems have backplane busses 256 bits wide and no more than a few feet long, which limits the number of CPUs that can be supported. By communicating on the bus, all CPUs see all memory requests and can keep their local cache memories synchronized: this is called cache coherency.
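
As a rough illustration of why a shared bus limits the CPU count, the sketch below splits a fixed bus bandwidth evenly among the CPUs. It assumes the 1.2 GB/s Challenge/Onyx figure quoted later on this page and ignores caches, arbitration overhead, and contention. The function name and the even-split model are illustrative assumptions, not measurements.

```python
def per_cpu_bus_share(bus_gb_per_s, cpus):
    """Return the GB/s available to each CPU if the bus is split evenly.

    Simplifying assumption: only one transfer uses the bus at a time,
    and every CPU gets an equal share of the total bandwidth.
    """
    if cpus < 1:
        raise ValueError("need at least one CPU")
    return bus_gb_per_s / cpus


if __name__ == "__main__":
    BUS_GB_PER_S = 1.2  # Challenge/Onyx bus bandwidth quoted in this article
    for cpus in (1, 2, 4, 8, 16, 32):
        share = per_cpu_bus_share(BUS_GB_PER_S, cpus)
        print(f"{cpus:2d} CPUs: {share:.3f} GB/s each")
```

Real systems do better than this naive split because caches absorb most memory references, but the trend is the same: every added CPU competes for one fixed resource.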

Such systems include the Silicon Graphics Challenge/Onyx systems, OCTANE desktop, Sun's UltraEnterprise (300-6000), Digital's 8400, and many others - most server vendors offer such systems.

Switch-based SMP / Central Crossbar
Mainframes and supercomputers have often used a crossbar "switch" to build SMP systems with higher bandwidth than is feasible with busses, since the switch allows multiple paths to be active at once. This normally incurs substantially higher cost, as the crossbar is constructed in one (or a few) large units, and a large crossbar contains many large chips with many pins. Such systems include most mainframes, the CRAY T90, and Sun's new UltraEnterprise 10000.
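
The bandwidth and cost trade-off described above can be sketched with an idealized model. It assumes a full crossbar in which each memory port serves one transfer at a time, and it uses the crosspoint count as a crude stand-in for hardware cost. Both function names and the cost proxy are assumptions made for illustration.

```python
def max_concurrent_transfers(cpu_ports, memory_ports, interconnect):
    """Upper bound on simultaneous CPU-to-memory transfers.

    A shared bus carries one transfer at a time; an idealized full
    crossbar can connect every CPU port to a distinct memory port.
    """
    if interconnect == "bus":
        return 1
    if interconnect == "crossbar":
        return min(cpu_ports, memory_ports)
    raise ValueError(f"unknown interconnect: {interconnect!r}")


def crossbar_crosspoints(cpu_ports, memory_ports):
    """Number of switching points in a full crossbar (a rough cost proxy)."""
    return cpu_ports * memory_ports


if __name__ == "__main__":
    for ports in (4, 8, 16):
        paths = max_concurrent_transfers(ports, ports, "crossbar")
        points = crossbar_crosspoints(ports, ports)
        print(f"{ports}x{ports} crossbar: {paths} paths, {points} crosspoints")
```

In this model the number of concurrent paths grows linearly with the port count while the crosspoint count grows with its square, which is one way to see why a large central crossbar is expensive.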

Shared-bus and central crossbar systems are usually called UMAs, or Uniform Memory Access systems; that is, any CPU is equally distant in time from all memories. It is sad, but true, that common usage in the computer industry applies the same UMA acronym to Unified Memory Architecture, meaning desktop machines that contain one memory for all uses, like SGI's O2. (This terminology clash is not Silicon Graphics' fault!)

Clusters
Many vendors connect bus-based SMPs into larger groups called clusters, using network interconnects of various kinds, generally with lower bandwidth and higher latency than the SMP bus.

MPP - Massively Parallel Processing
Many vendors have built systems that could handle hundreds of processors, where each processor (or small group of processors) had its own separate memory system. The processors might communicate via explicit message-passing, or they might share a global address space covering all of the memories, but such systems seldom support cache coherency, requiring software synchronization instead. Well-known systems include the Thinking Machines CM-5, IBM's SP/2, and Cray Research's T3D and T3E, but many others were built by now-defunct companies.

CC-NUMA - Cache-Coherent Non-Uniform Memory Access
Recently, vendors have begun shipping new systems that are constructed by connecting SMP nodes into systems that can be scaled larger than bus-based SMPs (as MPPs could be scaled). The nodes are connected by an interconnect, whose speed and nature vary widely. Normally, the memory "near" a CPU can be accessed faster than memories that are "further away". This attribute leads to the "Non" in Non-Uniform. Experience shows that a modest amount of non-uniformity works fine, whereas a large ratio of remote to local access time makes programmers switch to message-passing, rather than using shared memory.

CC-NUMA systems include the Convex Exemplar, Sequent NUMA-Q, and Silicon Graphics/CRAY S2MP (Origin and Onyx2). The Convex and Sequent systems use a "ring" interconnect, i.e., each SMP plugs into a ring that requires a message and reply to travel entirely around the ring. The S2MP systems are rather different, as they use crossbar switches for high bandwidth and low latency. Unlike central crossbars, these are modular, distributed crossbars, so the systems can start small and be scaled up by buying more crossbars in an incremental fashion. For various reasons, we expect more systems to evolve towards CC-NUMA, and the rest of this page explains why and how.
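
To make the point about the remote-to-local ratio concrete, here is a minimal sketch of average memory access time as a weighted mix of local and remote accesses. The latencies and remote fractions in the usage section are hypothetical values chosen only for illustration. They are not figures for any system named above, and the function name is an assumption.

```python
def average_access_ns(local_ns, remote_ns, remote_fraction):
    """Weighted average access time for a mix of local and remote accesses.

    remote_fraction is the share of memory accesses (0.0 to 1.0) that
    go to memory attached to another node.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


if __name__ == "__main__":
    LOCAL_NS = 300.0  # hypothetical local latency, for illustration only
    for ratio in (2.0, 10.0):  # hypothetical remote:local ratios
        for remote_fraction in (0.1, 0.5):
            avg = average_access_ns(LOCAL_NS, LOCAL_NS * ratio, remote_fraction)
            slowdown = avg / LOCAL_NS
            print(f"ratio {ratio:4.1f}, {remote_fraction:.0%} remote: "
                  f"{slowdown:.2f}x local latency")
```

With a small ratio, even a program that places its data carelessly pays a modest penalty. With a large ratio, the penalty grows quickly with the remote fraction, which is the pressure toward explicit message-passing that the text describes.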

Figure 2 (20K) shows the crucial technology trend: storage capacity (DRAM and disk) increases by 4X every 3 years. Starting in 4Q94, two vendors (Digital and Silicon Graphics) began shipping systems that 1) used 64-bit CPUs, 2) had 64-bit or 64/32-bit operating systems, and 3) might actually be purchased with 4 GB or more of physical memory, a real reason to want 64 bits. This continued growth puts pressure on system design, as one would like bandwidth to grow in rough proportion to storage capacity. Be warned: these charts use a logarithmic scale on the vertical axis.

Unfortunately, as shown in Figure 3 (24K), bandwidths have not been tracking the 4X-every-3-years growth rate of storage capacity.

For years, shared-bus speed increased in parallel with storage, up to 1993, when the Challenge/Onyx systems offered 1.2 GB/s. Since then, however, classic bus-based SMPs have improved at only about 2X every 3 years, as seen in Digital's 8400 (1.6 GB/s) in 1995 and Sun's UltraEnterprise 6000 (2.5 GB/s) in mid-1996. For various electrical-engineering reasons, it is getting more difficult to build economical busses that go very much faster.
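
The two growth rates can be compared with a little arithmetic. The sketch below converts a "factor per period" into an annual factor, estimates how far bandwidth falls behind storage over time, and checks the bus figures quoted above against the 2X-every-3-years rate. The function names are illustrative, and treating 1993 as 1993.0 and mid-1996 as 1996.5 is an assumption.

```python
def annual_factor(factor, years):
    """Convert growth of `factor` over `years` into a per-year factor."""
    return factor ** (1.0 / years)


def gap_after(years, storage_factor=4.0, bandwidth_factor=2.0, period=3.0):
    """How many times storage outgrows bandwidth after `years`."""
    return (storage_factor / bandwidth_factor) ** (years / period)


def per_period_factor(start_value, end_value, elapsed_years, period=3.0):
    """Growth between two data points, rescaled to a fixed period."""
    return (end_value / start_value) ** (period / elapsed_years)


if __name__ == "__main__":
    print(f"storage:   {annual_factor(4.0, 3.0):.2f}x per year")
    print(f"bandwidth: {annual_factor(2.0, 3.0):.2f}x per year")
    for years in (3, 6, 9):
        print(f"after {years} years storage leads by {gap_after(years):.1f}x")
    # 1.2 GB/s in 1993 to 2.5 GB/s in mid-1996 (dates are approximations)
    observed = per_period_factor(1.2, 2.5, 1996.5 - 1993.0)
    print(f"observed bus growth: {observed:.2f}x per 3 years")
```

Under these assumptions the gap doubles every 3 years, so a design whose bandwidth was well matched to its memory capacity falls a factor of 4 behind within 6 years.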

Also, the speed of individual I/O channels is low and increasing fairly slowly, as shown in Figure 4 (22K). Most desktops support one PCI bus (100 MB/s or 0.1 GB/s), which unfortunately is no faster than the GIO32 bus of the original Silicon Graphics Indigo. The double-speed PCI64 (200 MB/s or 0.2 GB/s) is similar to the GIO64 bus used in Indigo2 and Indy systems; it is useful as a commodity, industry-standard bus, but is not a step up.

While CPUs get upgraded often, the investment in I/O busses and boards is so much higher that people change them far less often. Thus, it is a good idea to change I/O busses rarely, but with larger improvements each time. The new Xtalk (or XIO) connection used in S2MP and OCTANE runs at 1.2 GB/s, a 5X jump over GIO64 and PCI64. For context, that means that a single high-speed I/O connection could consume an entire Challenge or Onyx bus. Something better is needed.

One could build a central crossbar system to get more bandwidth, but this gets expensive quickly. For example, a Sun UltraEnterprise 10000 has a list price of about US$875,000 for a 16-CPU system (four US$125,000 boards, each with 4 CPUs). Simple arithmetic shows that a minimal entry system with 4 CPUs would cost about US$500,000 (US$875K - 3*US$125K). Thus, as with bus-based SMPs, the entry system must pay for the bandwidth necessary for the largest system, and that bandwidth must generally be paid for at the beginning.
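
The arithmetic above can be written out as a short sketch. It uses only the list prices quoted in the text and follows the same reasoning: start from the full 16-CPU price and subtract one board price for each board left out. Pricing intermediate configurations this way is an assumption, as are the function names.

```python
FULL_SYSTEM_USD = 875_000  # 16-CPU list price quoted in the text
BOARD_USD = 125_000        # price of one 4-CPU board quoted in the text
CPUS_PER_BOARD = 4
MAX_BOARDS = 4


def system_price(boards):
    """Estimated price with `boards` CPU boards installed.

    Assumption: each board removed from the full system saves exactly
    one board price, as in the article's own calculation.
    """
    if not 1 <= boards <= MAX_BOARDS:
        raise ValueError("boards must be between 1 and 4")
    return FULL_SYSTEM_USD - (MAX_BOARDS - boards) * BOARD_USD


def price_per_cpu(boards):
    """Estimated price divided by the number of CPUs installed."""
    return system_price(boards) / (boards * CPUS_PER_BOARD)


if __name__ == "__main__":
    for boards in range(1, MAX_BOARDS + 1):
        cpus = boards * CPUS_PER_BOARD
        print(f"{cpus:2d} CPUs: US${system_price(boards):,} total, "
              f"US${price_per_cpu(boards):,.0f} per CPU")
```

On these figures the 4-CPU entry system costs more than twice as much per CPU as the full 16-CPU system, because the cost of the central crossbar is paid up front regardless of how many boards are installed.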

Figure 5 (23K) shows another approach, the one taken by S2MP systems. Both I/O bandwidth and interconnection bandwidth can be purchased incrementally, so that the bandwidths can start low and be scaled high. It took a great deal of difficult engineering to make this work and still be economical!

While the details are discussed elsewhere, this is a quick high-level view of ways in which people build scalable computer systems. For many reasons, as described in the references, we expect to see more CC-NUMA systems over the next few years.

Not all are equal!

References

1. Daniel E. Lenoski,
Scalable Shared-Memory Multiprocessing and the
Silicon Graphics S2MP Architecture,
Distinguished Lecture Series XIV, 1996,
University Video Communications,
P. O. Box 5129,
Stanford, CA 94309 USA, 408-379-0100,
URL: http://www.uvc.com/

2. D. Lenoski and W-D Weber,
Scalable Shared-Memory Multiprocessing,
Morgan Kaufmann Publishers, San Francisco, 1995,
ISBN 1-55860-315-8.

3. John L Hennessy & David A Patterson,
Computer Architecture: A Quantitative Approach, Second Edition,
Morgan Kaufmann Publishers, San Francisco, CA, 1996,
ISBN 1-55860-329-8.


Copyright © 1997 Silicon Graphics, Inc. All Rights Reserved.