Next generation consoles and how to misuse statistics.


Microsoft recently released a document to various gaming organizations. Only one site I know of published it with a huge warning up front that it came directly from Microsoft with no editing whatsoever and was extremely slanted. Further research shows that its slant approaches vertical. Since then I’ve noticed on many gaming sites that Microsoft fanboys have been abusing these numbers, which they apparently don’t even understand, to tout the superiority of their choice between two systems that aren’t even OUT yet. The Sony fanboys’ understanding of the technical issues seems even more lacking, and they seem incapable of coming up with a coherent response. I’m going to start with the Microsoft document and refute some of its points, then move on to some common misconceptions I’ve seen on the various gaming boards. It will get a bit hairy and technical here. Beware.


First, the link to the MS document. Drivel
  1. The first graph: “General Purpose Performance Comparison”

    This shows the Xbox bar to be 3x as tall as the PS3 bar. That is a true statement only if you consider full modern PowerPC SISD (Single Instruction Single Data) chips to be the only source of “General Purpose Performance.” The special purpose chips can easily be used for general purpose calculations as well, since they are in fact full-fledged System-on-Chip computers with a very simplified pipeline.

    If you examine the total processing power of each CPU, you get the following:
    Cell has a primary processing unit that is a full-fledged, modern PPC on a brand new core design. The primary processor has two pipelines and can run two non-dependent tasks at once. The SIMD cores (also referred to as SPEs) are a brand new RISC design that resembles the original PowerPC chips, except modified for vector processing. They have direct access to 256 KB of local memory that is positioned in a similar manner to L1 cache and should have roughly the same speed, but without the hardware overhead of synchronizing it with L2 cache and main memory. (This task is done by the central core in software.) The SPEs are dual pipeline as well and operate on 128-bit blocks of 32-bit words every instruction. This means that if you need to do the same thing to a bunch of data at once, you can do it 4x as fast on an SPE as you could on the main processor. Counting 1 main core and 8 SPE cores at two pipelines each, you have 18 pipelines.

    The Xbox 360 CPU is apparently 3 PowerPC cores arranged around a common L2 cache. Each core is dual pipeline, and each core has 2 vector units, 1 integer unit, 1 floating point unit, 1 branch unit, and 1 load-store unit. This gives 2 pipelines per core that can be either SISD or SIMD, but not both simultaneously. Contrary to what I had originally assumed, these processors have almost no advantage in branch prediction over the Cell.

    Cell: 18 total processing pipelines: 2 SISD, 8 SIMD (Single Instruction Multiple Data) Arithmetic, 8 SIMD Branch/Load
    Xbox: 6 total processing pipelines: 6 SISD or SIMD

    Cell then has three times the pipelines of the Xbox 360 (18 against 6), and more of these can be used to calculate vectors of data rather than a single machine word at a time. They can just as easily be used to calculate one machine word at a time as long as you align the words on the vector boundaries. Theoretically then, barring differences because of the Xbox 360’s more CISC version of the PowerPC core, the Cell’s 10 arithmetic pipelines against the Xbox 360’s 6 should let it do close to twice as many floating point AND integer operations per second, not even counting the SIMD capabilities and using the SIMD processors as if they were SISD. On the flip side, 16 of Cell’s pipelines are extremely stripped down, require more compiler optimization, and could end up with larger code and more cycles per instruction. AltiVec and the SPEs are able to process a similar amount per pipeline, both handling about 128 bits of information at a time.
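The pipeline totals above are simple enough to tally by hand, but a minimal sketch makes the ratios explicit. It uses only the core and pipeline counts quoted in this section; the dictionary layout and names are made up for illustration and do not come from any vendor document.

```python
# Pipeline tally from the counts quoted in the text.
# The data layout and names are illustrative assumptions.
CELL = {
    "PPE": {"cores": 1, "pipes_per_core": 2},
    "SPE": {"cores": 8, "pipes_per_core": 2},
}
XBOX360 = {
    "PowerPC core": {"cores": 3, "pipes_per_core": 2},
}


def total_pipelines(chip):
    """Sum cores * pipelines-per-core over every unit type on the chip."""
    return sum(u["cores"] * u["pipes_per_core"] for u in chip.values())


cell_total = total_pipelines(CELL)
xbox_total = total_pipelines(XBOX360)

# Arithmetic-capable pipelines on Cell: 2 on the PPE plus 1 per SPE
# (the second SPE pipeline is the branch/load one in the tally above).
cell_arith = 2 + 8

print("Cell pipelines:", cell_total)
print("Xbox 360 pipelines:", xbox_total)
print("Total ratio:", cell_total / xbox_total)
print("Arithmetic-only ratio:", round(cell_arith / xbox_total, 2))
```

The total ratio and the arithmetic-only ratio differ because 8 of Cell's 18 pipelines only handle branches and loads, which is worth keeping in mind when comparing raw pipeline counts.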

  2. Second Graph: “CPU Floating Point Performance Comparison”
    Here they show the Xbox bar at 55% to 60% of what the Cell can do. While this could be accurate judging by the above discussion, the accompanying text, “Cell’s claimed advantage is on streaming floating point work which is done on its seven DSP processors,” contains several misconceptions. The first is that the SPEs are mere DSPs. They are actually full-fledged processors in their own right. Granted, they are very primitive in design compared to the main core in order to keep their size small, and they lack most of the code protections and multiuser features that we’ve come to expect from a modern processor design. The next misconception is that they are only good for floating point performance. While it is true that they are more optimized for floating point, their integer performance is also good. An SPE uses the same registers for both types of data, however, and it calculates floating point earlier in the pipeline, which makes floats slightly faster than integers. The SPEs are 32-bit processors that calculate 128 bits of data per instruction on each pipe.

  3. Third Graph: “GPU Shader Operations per Second”
    I do not have any information at all about the PS3 graphics chip, so I can’t do much speculation on this graph.

  4. Fourth Graph: “Total Memory System Bandwidth”
    This shows Xbox with 278 GB/s and PS3 with 48 GB/s.
    This part of the presentation HAS to have been written by a marketing exec and not anyone with any technical background whatsoever. Further research shows that their Total Memory System Bandwidth counts the bandwidth of the GPU’s internal cache memory, going between the GPU and its own cache. This accounts for 256 GB/s of their proposed 278 GB/s. A further reading of the release from ATI shows that the GPU’s bandwidth to main memory is a mere 25.6 GB/s. The Xbox CPU is likely on the same memory bus as the GPU and can boast a similar 25.6 GB/s. The PS3 actually has an 8-way point-to-point memory bus that directly connects its RAM chips to its processor. Each link on this bus operates at 3.2 GB/s, and there are 8 links, giving it a total bandwidth of 25.6 GB/s. A graph with two equal bars is much less entertaining than one with wildly unequal bars, however.
    Actual CPU to Memory transfer speed: 25.6 GB/s for both PS3 and Xbox360.

    As far as the 256 GB/s bandwidth of the GPU goes, they’re comparing apples to oranges here. Pulling numbers out of my own backside, the closest comparison on the PS3 Cell would be the speed of the L1 and L2 caches to the main processor, and the speed of the SPEs to their local RAM. The SPEs are said to fetch two 64-bit instructions and up to 512 bits of data per cycle at 3.2 GHz. This clocks in at 230 gigabytes per second for EACH SPE. (This assumes they don’t need separate instructions to pull vectors from memory, which doesn’t seem to be the case because the instructions include space for 2 source vector addresses and a destination vector address.) Given that this would be over a terabyte per second taking into account the 7 SPEs, I’m not even going to bother adding the L1/L2 cache speed in.
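The bandwidth arithmetic in this item can be rechecked with a few lines. This is a rough sketch that uses only the link count, link speed, clock, and per-cycle fetch sizes quoted above; the variable names are illustrative, and whether the instruction fetch should be counted alongside the data is an assumption on my part.

```python
# Figures quoted in the text; names are illustrative assumptions.
LINKS = 8
GB_PER_LINK = 3.2                    # GB/s per point-to-point memory link
ps3_main_gbps = LINKS * GB_PER_LINK  # main memory bandwidth

CLOCK_HZ = 3.2e9                     # SPE clock
INSTR_BITS = 2 * 64                  # two 64-bit instructions per cycle
DATA_BITS = 512                      # up to 512 bits of data per cycle
SPES = 7

# Assumption: instruction fetch and data both count toward the total.
bytes_per_cycle = (INSTR_BITS + DATA_BITS) / 8
spe_gbps = bytes_per_cycle * CLOCK_HZ / 1e9      # decimal gigabytes
spe_gibps = bytes_per_cycle * CLOCK_HZ / 2**30   # binary gigabytes
all_spes_gbps = SPES * spe_gbps

print("PS3 main memory GB/s:", round(ps3_main_gbps, 1))
print("Bytes per cycle per SPE:", bytes_per_cycle)
print("Per-SPE local store GB/s:", spe_gbps)
print("Per-SPE local store GiB/s:", round(spe_gibps, 1))
print("Seven SPEs GB/s:", all_spes_gbps)
```

Depending on whether gigabyte means 10^9 or 2^30 bytes and on what is counted per cycle, the per-SPE result moves around a bit, but it stays in the same neighborhood as the figure quoted above and the seven-SPE total clears a terabyte per second either way.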

  5. On the second page they reiterate their CPU performance graphs and complain about splitting the work between 8 processors. I posted elsewhere about how the 8 cores are just an extension of the already proven 2-step vector pipeline that the PS2 Emotion Engine has. The fact is that programming games on symmetric processors is insanely hard, and you typically end up pipelining various tasks between threads anyway. We don’t know enough about the XNA programming environment or the CELL programming environment at this time to say which one makes it easier to balance the threads across the multiple processors. They also repeat the falsehood that the SPEs are not good for general purpose programming.

  6. They reiterate the above memory bandwidth graph on a later page along with a few others. They state that the Xbox 360 has 256 GB/s of EDRAM bandwidth and 22.4 GB/s of GDDR3 RAM bandwidth. This GDDR3 estimate is actually lower than ATI’s stated memory bus speed. The EDRAM is the 10 megabytes of cache located on the GPU, which is not accessible by the CPU at that speed.
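Adding up the two figures from that page shows where the headline number in the bandwidth graph comes from. A small sketch, using only the numbers quoted in this post (the variable names are mine):

```python
# Numbers quoted in the Microsoft document and the ATI release, per the text.
EDRAM_GBPS = 256.0      # GPU to its own on-die EDRAM
GDDR3_GBPS = 22.4       # Microsoft's stated main memory figure
ATI_BUS_GBPS = 25.6     # ATI's stated GPU-to-main-memory figure
HEADLINE_GBPS = 278     # "Total Memory System Bandwidth" bar

total = EDRAM_GBPS + GDDR3_GBPS
edram_share = EDRAM_GBPS / total

print("Sum of the two figures:", round(total, 1))
print("Matches the headline bar:", int(total) == HEADLINE_GBPS)
print("Share contributed by EDRAM:", f"{edram_share:.0%}")
print("GDDR3 shortfall vs ATI figure:", round(ATI_BUS_GBPS - GDDR3_GBPS, 1))
```

In other words, roughly nine tenths of the headline bar is bandwidth the CPU never sees.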

  7. The next one we haven’t seen before is “GPU Programmable Shader GFLOPS”
    This is another case of “we don’t have enough information.” They pull some guesstimates out of their backside on this comparing it to existing Nvidia GPUs. They aren’t worth talking about.

  8. Moving on to the common misconceptions, we see people arguing that Xbox 360’s 12x DVD drive is faster than Sony’s 4x Blu-ray drive. This is much the same argument that you used to see with the 48x CD drives against 10x DVD drives. The base Blu-ray drive is much faster than the base DVD drive that they are comparing the x speeds to. A recent article stated that Blu-ray is capable of 36 MB/s on a 1x drive, while DVD only does 5 MB/s. Arguments about Blu-ray only being 1.5x as fast as DVD also seem incorrect.
    A person on another forum pointed out the following table for Blu-ray vs. DVD performance.
    1x Blu-ray ROM = 54 Megabits/sec = 6.75 MB/sec
    2x Blu-ray ROM = 108 Megabits/sec = 13.5 MB/sec
    4x Blu-ray ROM = 216 Megabits/sec = 27.0 MB/sec
    1x DVD ROM = 11.1 Megabits/sec = 1.39 MB/sec
    4x DVD ROM = 44.4 Megabits/sec = 5.55 MB/sec
    12x DVD ROM = 133 Megabits/sec = 16.65 MB/sec
    I’ll do some further checking on this. If true, it would mean that Softpedia is talking out of their hind end about Blu-ray’s 36 MB/s performance. It did seem rather fishy that the number was so high.
    The numbers I’ve found seem to match his on DVD, but are actually slower for Blu-ray, which suggests Softpedia’s 36 was megabits rather than megabytes.
    1x Blu-ray 1.0 = 36.5 Megabits/sec = 4.56 MB/sec
    2x Blu-ray 1.0 = 73 Megabits/sec = 9.12 MB/sec
    4x Blu-ray 1.0 = 146 Megabits/sec = 18.25 MB/sec
    This would still mean that a 4x Blu-ray drive would beat the 12x DVD drive. Will Sony put a 4x in the system? I don’t see how they can afford not to.
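The drive comparison comes down to multiplying each format's 1x rate by the drive's speed rating and dividing by 8 to get from megabits to megabytes. Here is a minimal sketch using the 1x rates from the two tables above; the function and dictionary names are made up for illustration.

```python
# 1x rates in megabits per second, taken from the tables in the text.
BASE_MBIT_PER_S = {
    "DVD": 11.1,
    "Blu-ray (forum table)": 54.0,
    "Blu-ray 1.0": 36.5,
}


def drive_mb_per_s(fmt, multiplier):
    """Megabytes per second for an Nx drive of the given format."""
    return BASE_MBIT_PER_S[fmt] * multiplier / 8


dvd_12x = drive_mb_per_s("DVD", 12)
bd_4x = drive_mb_per_s("Blu-ray 1.0", 4)
bd_4x_forum = drive_mb_per_s("Blu-ray (forum table)", 4)

# Speed rating a Blu-ray 1.0 drive would need to match a 12x DVD drive.
break_even = BASE_MBIT_PER_S["DVD"] * 12 / BASE_MBIT_PER_S["Blu-ray 1.0"]

print("12x DVD MB/s:", round(dvd_12x, 2))
print("4x Blu-ray 1.0 MB/s:", round(bd_4x, 2))
print("4x Blu-ray (forum table) MB/s:", round(bd_4x_forum, 2))
print("Blu-ray rating needed to match 12x DVD:", round(break_even, 2))
```

With the slower set of Blu-ray numbers, the break-even point sits a little under 4x, so a 2x drive would lose to the 12x DVD drive and a 4x drive would edge it out.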

  9. “All of the demos the PS3 had were prerendered, while all of Xbox 360’s demos were live.”
    Another common misconception. While most of the PS3 demos were prerendered, the Unreal Tournament 2007 demo was actually done live on the hardware. The prerendering is understandable because the PS3 hardware is at least 6 months farther from release than the Xbox 360 is. What most people fail to note, however, is that the Xbox 360’s demo kiosks were NOT running on Xbox 360 hardware. See the AnandTech article in the references below for pictures of them running on Mac G5 computers. You can hardly expect a system that is 6 months minimum from release to be fully operational, just as it’s unrealistic to expect all the demos for a system that is still almost a year from release to be running on the real hardware. The fact that Epic managed to get a real live demo running on the actual hardware for the show was astounding.



References:


  1. Real World Tech ISSCC 2005: The CELL Microprocessor

  2. Electronic Design: CELL Processor Gets Ready To Entertain The Masses

  3. Softpedia: The chronicles of a futile battle: Blu-Ray vs. HD-DVD

  4. ISSCC 2005: A Streaming Processor unit for a Cell Processor

  5. Ars Technica: Introducing the IBM/Sony/Toshiba Cell Processor

  6. Tech Report: Details of ATI’s Xbox 360 GPU unveiled

  7. Anandtech: E3 2005 - Day 1: The Xbox 360 Update

  8. Forum post on the actual speed of Blu-ray.

  9. Ars Technica: Xbox360




All in all, it’s not the power of the systems but the games, and how the developers use that power, that will determine the winners of the console war. I suggest everyone chill out and wait until the PS3 is released to start bickering about which console is best. For now, going by the stats, it’s far from certain which one is truly better.