The Best Server CPUs part 2: the Intel "Nehalem" Xeon X5570
by Johan De Gelas on March 30, 2009 3:00 PM EST
Posted in IT Computing
Market Analysis
We'll wrap up with a quick look at the complete market to see how the most interesting CPUs from Intel and AMD compare. In the first column you will find the market. The second column shows the percentage of server shipments to this market. Some markets, such as ERP, OLTP, and OLAP, generate more revenue for server manufacturers; however, we have no recent numbers on this, so we'll just keep it in mind. The green zones of the market are the ones where we have a decent benchmark that AMD wins, the blue ones represent the Intel zone, and the red parts are, for now, unknown. Let's first look back at the situation from a few months ago.
AMD "Shanghai" Opteron 2.7GHz vs. Xeon "Harpertown" 3GHz
Market | Importance | First bench | Second bench | Benchmarks/remarks |
ERP, OLTP | 10-14% | 21% | 5% | SAP, Oracle |
Reporting, OLAP | 10-17% | 27% | | MySQL |
Collaborative | 14-18% | N/a | | |
Software Dev. | 7% | N/a | | |
e-mail, DC, file/print | 32-37% | N/a | | |
Web | 10-14% | 2% | | |
HPC | 4-6% | 28% | -3% to 66% | LS-DYNA, Fluent |
Other | 2%? | -18% | -15% | 3DSMax, Cinebench |
Virtualization | 33-50% | 34% | | VMmark |
The market was almost completely green. AMD's "Shanghai" Opteron was reigning supreme in the HPC and virtualization market. It was clearly in the lead in the OLTP and OLAP market and it had a small advantage in the web market and probably also in the collaborative software market. Since the AMD servers also consumed less power (the Xeons used power hungry FB-DIMMs), you could say that AMD was the "smarter" choice in about 90-98% of the market.
Then a Tsunami called "Nehalem" was launched…
Nehalem Performance Overview
Server Software Market | Importance | Benchmarks used | Intel Xeon X5570 vs. Opteron 2384 | Intel Xeon X5570 vs. Xeon 5450 |
ERP, OLTP | 10-14% | SAP SD 2-tier (industry standard benchmark) | 81.40% | 119% |
ERP, OLTP | 10-14% | Oracle Charbench (freely available benchmark) | 84.70% | 94% |
ERP, OLTP | 10-14% | Dell DVD Store (open source benchmark tool) | 66.20% | 78% |
Reporting, OLAP | 10-17% | MS SQL Server (real-world vApus benchmark) | 76.50% | 107% |
Collaborative | 14-18% | MS Exchange LoadGen (Microsoft's own load generator for MS Exchange) | Estimated 75-95% | 93% |
e-mail, DC, file/print | 32-37% | See MS Exchange | | |
Software Dev. | 7% | None | | |
Web | 10-14% | MCS eFMS (real-world vApus benchmark) | 36.80% | 39% |
HPC | 4-6% | LS-DYNA (industry standard) | 57.00% | 101% |
HPC | <1% | LINPACK | 15.00% | 1% |
Other | 2%? | 3DSMax (our own bench) | 50.30% | 24% |
Virtualization | 50% | VMmark (industry standard) | 58.70% | 114% |
…and nothing that was not called Xeon X55xx was still standing. The Xeon X55xx series simply crushes the competition and reduces the older Xeons to expensive space heaters, with the exception of the rendering and dense matrix HPC market. If you are consolidating your servers, buying a new heavyweight back end database server or mail server, there is only one choice at this moment: the Xeon X55xx series. Period.
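The two tables can be cross-checked against each other. What follows is a minimal sketch, not part of the original test suite. It assumes that each percentage expresses how much faster the first system is than the second (so 81.40% is read as 1.814 times the performance), and that the Opteron 2384 and Xeon 5450 are the 2.7GHz "Shanghai" and 3GHz "Harpertown" parts of the earlier table. The function name and the dictionary are illustrative only.

```python
# Percent advantages of the Xeon X5570 taken from the Nehalem table:
# (vs. Opteron 2384, vs. Xeon 5450).
NEHALEM_ADVANTAGE = {
    "SAP SD 2-tier": (81.4, 119.0),
    "Oracle Charbench": (84.7, 94.0),
    "LS-DYNA": (57.0, 101.0),
    "3DSMax": (50.3, 24.0),
    "VMmark": (58.7, 114.0),
}


def implied_opteron_vs_xeon5450(vs_opteron_pct, vs_xeon_pct):
    """Return the implied Opteron 2384 advantage over the Xeon 5450, in percent.

    Assumption: "N%" means the X5570 delivers (1 + N/100) times the performance.
    """
    x5570_over_opteron = 1 + vs_opteron_pct / 100
    x5570_over_xeon = 1 + vs_xeon_pct / 100
    # Opteron / Xeon 5450 = (X5570 / Xeon 5450) / (X5570 / Opteron)
    return (x5570_over_xeon / x5570_over_opteron - 1) * 100


if __name__ == "__main__":
    for name, (vs_opteron, vs_xeon) in NEHALEM_ADVANTAGE.items():
        implied = implied_opteron_vs_xeon5450(vs_opteron, vs_xeon)
        print(f"{name}: Opteron 2384 vs. Xeon 5450 = {implied:+.1f}%")
```

Under that reading, the implied figures land within about a point of the earlier table: roughly 20.7% for SAP, 5.0% for Oracle, 28.0% for LS-DYNA, 34.8% for VMmark, and -17.5% for 3DSMax. That agreement suggests the percentages are indeed relative performance advantages.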
AMD after the Sledgehammer blow
Is this the end of the line for the Sunnyvale-based company? Is the launch of Bulldozer the day that never comes? Is AMD broken, beat and scarred? Scarred: who would not be after this kind of blow? Beaten? For now. But not broken; AMD dies hard. After more than a full year of rather poor execution (Q2 2007 to Q3 2008), AMD is finally shaping up and executing like in the K7-K75 days. The 45nm process technology is very healthy and the speed path problems of Barcelona have been fixed in Shanghai. The result is that only four months after the successful launch of the 2.7GHz Shanghai, we are already seeing a speed bump while the power dissipation stays the same. The 2.9GHz chip was flying towards our lab while I was writing this conclusion; we'll add it as soon as possible.
The 2.9GHz part will not be able to come close to the top Nehalems; however, with the right pricing it might be an attractive alternative to the lower-end Xeon 55xx series. Considering that a triple-channel board equipped with DDR3 will result in a somewhat more expensive server, AMD might still be able to compete at the lower end. What is more, faster versions of Shanghai strengthen the position of AMD in the small but profitable octal CPU market. For example, 2.9GHz will allow Sun and HP to produce monster servers that can support more than 20 tiles and performance scores above 30 in VMmark. Faster versions of Shanghai with vast amounts of memory should also keep the 4-way server market open for AMD.
The hex-core version of Shanghai, "Istanbul", is already running VMware ESX 3.5, which indicates that the launch of AMD's hex-core is going to come sooner than expected. AMD will have to surprise us with better than expected power consumption and clock speeds, but if it does, AMD might be in the race again. We doubt AMD will be able to outperform the best Xeon 55xx, but at least it has a chance to stay competitive with the midrange Intel options. Until then, aggressive pricing is the only weapon left.
44 Comments
snakeoil - Monday, March 30, 2009
oops, it seems that hyperthreading is not scaling very well, too bad for Intel
eva2000 - Tuesday, March 31, 2009
Bloody awesome results for the new 55xx series. Can't wait to see some of the larger vBulletin forums online benefiting from these monsters :)
ssj4Gogeta - Monday, March 30, 2009
huh?
ltcommanderdata - Monday, March 30, 2009
I was wondering if you got any feeling whether Hyperthreading scaled better on Nehalem than Netburst? And if so, do you think this is due to improvements made to HT itself in Nehalem, just due to Nehalem's 4+1 instruction decoders and more execution units, or because software is better optimized for multithreading/hyperthreading now? Maybe I'm thinking mostly desktop, but HT had kind of a hit-or-miss reputation in Netburst, and it'd be interesting to see if it just came before its time.
TA152H - Monday, March 30, 2009
Well, for one, the Nehalem is wider than the Pentium 4, so that's a big issue there. On the negative side (with respect to HT increase, but really a positive) you have better scheduling with Nehalem, in particular, memory disambiguation. The weaker the scheduler, the better the performance increase from HT, in general.
I'd say it's both. Clearly, the width of Nehalem would help a lot more than the minor tweaks. Also, you have better memory bandwidth, and in particular, a large L1 cache. I have to believe it was fairly difficult for the Pentium 4 to keep feeding two threads with such a small L1 cache, and then you have the additional L2 latency vis-a-vis the Nehalem.
So, clearly the Nehalem is much better designed for it, and I think it's equally clear software has adjusted to the reality of more computers having multiple processors.
On top of this, these are server applications they are running, not mainstream desktop apps, which might show a different profile with regards to Hyper-threading improvements.
It would have to be a combination.
JohanAnandtech - Monday, March 30, 2009
The L1 cache and the way that the Pentium 4 decoded were an important (maybe even the most important) factor in the mediocre SMT performance. Whenever the trace cache missed (and it was quite small, roughly the equivalent of 16 KB), the Pentium 4 had only one real decoder. This means that you have to feed two threads with one decoder. In other words, whenever you get a miss in the trace cache, HT did more harm than good in the Pentium 4. That is clearly not the case in Nehalem, with its excellent decoding capabilities and larger L1.
And I fully agree with your comments, although I don't think mem disambiguation has a huge impact on the "usefulness" of SMT. After all, there are lots of reasons why the ample execution resources are not fully used: branches, L2-cache misses etc.
IntelUser2000 - Tuesday, March 31, 2009
Not only that, Pentium 4 had the Replay feature to try to make up for having such a long pipeline stage architecture. When Replay went wrong, it would use resources that would be hindering the 2nd thread.
Core uarch has no such weaknesses.
SilentSin - Monday, March 30, 2009
Wow...that's just ridiculous how much improvement was made, gg Intel. Can't wait to see how the 8-core EX's do; if this launch is any indication, that will change the server landscape overnight.
However, one thing I would like to see compared, or slightly modified, is the power consumption figures. Instead of an average amount of power used at idle or load, how about a total consumption figure over the length of a fixed benchmark (i.e., how much power was used while running SPECint). I think that would be a good metric to illustrate very plainly how much power is saved from the greater performance with a given load. I saw the chart in the power/performance improvement on the Bottom Line page but it's not quite as digestible or as easy to compare as a straight kW per benchmark figure would be. Perhaps give it the same time range as the slowest competing part completes the benchmark in. This would give you the ability to make a conclusion like "In the same amount of time the Opteron 8384 used to complete this benchmark, the 5570 used x watts less, and spent x seconds in idle". Since servers are rarely at 100% load at all times it would be nice to see how much faster it is and how much power it is using once it does get something to chew on.
Anyway, as usual that was an extremely well done write up, covered mostly everything I wanted to see.
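The metric proposed in this comment can be sketched as a small calculation: integrate power over a window as long as the slowest system's run, with the faster system idling once it finishes. Strictly, that gives an energy figure in watt-hours rather than watts. The function name and all wattages and timings below are hypothetical placeholders, not measurements from this review.

```python
def energy_over_window_wh(load_watts, idle_watts, busy_seconds, window_seconds):
    """Energy in watt-hours used over a fixed window: busy first, then idle."""
    if busy_seconds > window_seconds:
        raise ValueError("the run does not finish inside the window")
    idle_seconds = window_seconds - busy_seconds
    joules = load_watts * busy_seconds + idle_watts * idle_seconds
    return joules / 3600  # 1 Wh = 3600 J


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    window = 600  # seconds the slower system needs to finish the benchmark
    fast_busy = 350  # seconds the faster system needs for the same work
    slow = energy_over_window_wh(load_watts=300, idle_watts=150,
                                 busy_seconds=window, window_seconds=window)
    fast = energy_over_window_wh(load_watts=320, idle_watts=140,
                                 busy_seconds=fast_busy, window_seconds=window)
    print(f"slower system: {slow:.1f} Wh, faster system: {fast:.1f} Wh")
    print(f"faster system idles for {window - fast_busy} s "
          f"and uses {slow - fast:.1f} Wh less")
```

Note that a faster system can draw more power under load and still come out ahead over the window, because it spends the remaining time at idle power.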
7Enigma - Wednesday, April 1, 2009
I think that is a very good method for determining total power consumption. Obviously this doesn't show cpu power consumption, but more importantly the overall consumption for a given unit of work.
Nice thinking.
JohanAnandtech - Wednesday, April 1, 2009
I am trying hard, but I do not see the difference with our power numbers. This is the average power consumption of one CPU during 10 minutes of DVD-store OLTP activity. As readers have the performance numbers, you can perfectly calculate performance/watt or per kWh. Per server would be even better (instead of per CPU) but our servers were too different.
Or am I missing something?
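The calculation described in this reply can be sketched as follows, assuming an average power figure over a 10-minute run and a throughput expressed per minute. The function names, the throughput unit, and the example numbers are hypothetical and do not come from the review's measurements.

```python
def performance_per_watt(throughput, avg_watts):
    """Throughput (e.g. orders per minute) delivered per watt of average power."""
    return throughput / avg_watts


def work_per_kwh(throughput_per_minute, avg_watts, minutes=10):
    """Total work done during the run divided by the energy it used, in kWh."""
    work = throughput_per_minute * minutes
    kwh = avg_watts * (minutes / 60) / 1000
    return work / kwh


if __name__ == "__main__":
    # Hypothetical example: 12000 orders per minute at an average of 80 W.
    print(f"{performance_per_watt(12000, 80):.0f} orders/min per watt")
    print(f"{work_per_kwh(12000, 80):,.0f} orders per kWh")
```

Because the run length cancels out, work per kWh is just performance per watt scaled by a constant; the two metrics only diverge once idle time is included in the measurement window.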