Intel Woodcrest, AMD's Opteron and Sun's UltraSparc T1: Server CPU Shoot-out
by Johan De Gelas on June 7, 2006 12:00 PM EST
Posted in IT Computing
MySQL Results: Scaling
Back to our main subject: our astute readers have probably already noticed a weird anomaly, so let us analyze it further. If you look closely at both sets of measurements, quad-core and dual-core x86, you'll notice that the scaling is negative. To make this clearer, we averaged the results for all concurrency levels of 5 and higher.
MySQL Linux (Queries/s) | Sun T1 4/8 cores 1 GHz | MSI K2-102A2M Opteron 275 | Xeon 5160 Woodcrest 3 GHz | MSI K2-102A2M Opteron 280 |
Average Dual-core (T1: quad-core) | 362 | 749 | 996 | 805 |
Average Quad-core (T1: octal-core) | 433 | 590 | 904 | 622 |
Speedup Dual to Quad | 20% | -21% | -9% | -23% |
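The "Speedup Dual to Quad" row can be reproduced from the two rows of averages. The short Python sketch below is only an illustration of that arithmetic; the dictionary layout and the function name `speedup_percent` are our own choices for this example and are not part of the benchmark setup.

```python
# Average queries/s taken from the "MySQL Linux" table:
# (fewer cores, more cores). For the T1 that is 4 -> 8 cores,
# for the x86 systems it is 2 -> 4 cores.
results = {
    "Sun T1 1 GHz": (362, 433),
    "Opteron 275": (749, 590),
    "Xeon 5160 Woodcrest 3 GHz": (996, 904),
    "Opteron 280": (805, 622),
}


def speedup_percent(fewer_cores_qps, more_cores_qps):
    """Relative throughput change, in percent, when the core count doubles."""
    return (more_cores_qps / fewer_cores_qps - 1.0) * 100.0


# Round to whole percentages, as in the table.
speedups = {
    name: round(speedup_percent(low, high))
    for name, (low, high) in results.items()
}

for name, pct in speedups.items():
    print(f"{name}: {pct:+d}%")
```

A negative value means the system delivered fewer queries per second with twice the cores.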
This is nothing short of amazing. It looks like a measurement error, but it is not: these benchmarks have been checked, verified and checked again, and they are accurate. The x86 systems running Linux perform better with two cores than with four, while the T1 running Solaris actually gains performance going from 4 to 8 cores.
So who is guilty: Linux or the Opteron system? We had to test with Solaris on the Opteron to be sure. However, the ServerWorks chipset of our MSI 1U server was not supported by x86 Solaris, so we went back to our homebuilt server, based on the MSI K8N Master2-FAR.
MySQL Solaris (Queries/s) | Sun T1 4/8 cores 1 GHz | Opteron 280 Solaris | Opteron 280 Linux |
Average Dual-core (T1: quad-core) | 362 | 456 | 799 |
Average Quad-core (T1: octal-core) | 433 | 605 | 625 |
Speedup Dual to Quad | 20% | 33% | -22% |
And this puts the performance of our UltraSparc T1 in a whole different perspective. First of all, it is clear that while MySQL might not be the most scalable database, the current Linux kernel is not helping matters: on Solaris the Opteron 280 gains 33% going from two to four cores, while on Linux it loses 22%. We did tune the Linux kernel in two ways: the 2.6.15 kernel was optimized for either Intel's or AMD's architecture, and the kernel for the AMD systems also got NUMA support.
So what is going on here? After talking to our MySQL guru (P. Zaitsev), it turns out that in some circumstances MySQL might cause trouble for the Linux mutex (mutual exclusion) implementation, a problem known as "mutex ping-pong". A mutex makes sure that a thread cannot access data in main memory while another thread holds the lock on that data.
However, it seems to be more of a MySQL problem than a Linux one, as other databases such as DB2 scale very well on Linux. For DB2 under the same load we measured a performance increase of no less than 80-85% when going from two to four cores. Also, with some workloads, the bad scaling kicks in later than it does with our "Select dominated" load. Intel's performance labs told us that they also ran into the same problem.
These issues are not as severe as the problems we encountered with MySQL on Mac OS X. Note that Apple seems to have recognized the problem and seems to offer a workaround. We'll report back with other MySQL workloads to investigate the MySQL scaling problem further.
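To make the term "mutex" concrete, here is a minimal Python sketch of mutual exclusion in general. It is an assumption-laden illustration: it uses Python's standard `threading.Lock`, not the Linux mutex implementation or any MySQL code, and it does not reproduce the "ping-pong" behavior described above. The names `counter`, `counter_lock`, `worker` and `run` are hypothetical and exist only for this example.

```python
import threading

counter = 0
counter_lock = threading.Lock()  # the mutex guarding `counter`


def worker(iterations):
    """Increment the shared counter, taking the lock for every update."""
    global counter
    for _ in range(iterations):
        # Only one thread at a time can hold the lock; any other thread
        # that reaches this point has to wait until it is released.
        with counter_lock:
            counter += 1


def run(num_threads=4, iterations=10_000):
    """Start the workers, wait for them to finish, return the final count."""
    global counter
    counter = 0
    threads = [
        threading.Thread(target=worker, args=(iterations,))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


total = run()
print(total)
```

Because every thread competes for the same lock, adding threads to a loop like this mostly adds waiting rather than throughput. That is the general reason why a heavily contended mutex can limit scaling, whatever the exact mechanism in MySQL on Linux turns out to be.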
PostgreSQL Results
PostgreSQL 8.0.7, another open source database, uses processes and not threads to deal with connections. The consequence is that the benchmark numbers are a lot more stable: once each core is busy with its process, you get almost maximum performance. In other words, the results didn't change much between 5, 10 or 25 concurrent users. To keep things simple, we only list the numbers with 20 users, which results in peak performance. The queries per second numbers at 5 and 25 were only a few percent lower. We did not include the T2000 Sun Server as the optimal PostgreSQL configuration is still under investigation.
PostgreSQL 8.0.7 (Queries/s) | |
DL385 1 x Opteron 280 | 517 |
Intel 2 x Xeon "Irwindale" 3.6 GHz | 448 |
MSI 1U 1 x Opteron 275 | 490 |
MSI 1U 1 x Opteron 280 | 524 |
Intel 1 x Xeon 5160 WC 3 GHz | 673 |
Another clear victory for Woodcrest. On the Opteron, every 10% increase in clock speed seems to result in a 7% performance increase. So if we extrapolate, a 3 GHz Opteron would arrive at 616 queries per second.
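The extrapolation can be written out as a small Python sketch. Two things here are assumptions, because this section does not state them: the clock speeds of the Opteron 275 and 280 (2.2 GHz and 2.4 GHz), and the use of the MSI 1U Opteron 280 result (524 queries/s) as the baseline, which is the figure that happens to reproduce 616. The function name `extrapolate_qps` is hypothetical.

```python
# Assumed clock speeds (not stated in this section).
OPTERON_275_GHZ = 2.2
OPTERON_280_GHZ = 2.4

# PostgreSQL results for the MSI 1U systems, in queries/s.
OPTERON_275_QPS = 490
OPTERON_280_QPS = 524

# "Every 10% increase in clock speed ... a 7% performance increase."
PERF_GAIN_PER_CLOCK_GAIN = 0.7


def extrapolate_qps(target_ghz, base_ghz=OPTERON_280_GHZ,
                    base_qps=OPTERON_280_QPS,
                    ratio=PERF_GAIN_PER_CLOCK_GAIN):
    """Linear extrapolation: performance grows at `ratio` times the clock gain."""
    clock_gain = target_ghz / base_ghz - 1.0
    return base_qps * (1.0 + ratio * clock_gain)


# Ratio actually observed between the two measured Opterons.
observed_ratio = (OPTERON_280_QPS / OPTERON_275_QPS - 1.0) / (
    OPTERON_280_GHZ / OPTERON_275_GHZ - 1.0
)

estimate = extrapolate_qps(3.0)
print(f"observed ratio: {observed_ratio:.2f}")
print(f"estimated 3 GHz Opteron: {round(estimate)} queries/s")
```

Under these assumptions the two measured points give a ratio slightly above 0.7, so the rounded 7% figure makes the estimate a little conservative. With only two data points, the result is a rough projection rather than a measurement.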
91 Comments
snorre - Thursday, June 8, 2006
Anandtech is going down the drain, there are no doubts left about it IMHO.
"Woodcrest" may be a nice improvement for Intel, but comparing it to clearly crippled (both software and hardware wise) Opteron systems is pretty lame by any standard.
Remember: Fool us once shame on us, fool us twice shame on YOU!
This is your third strike in my book, so now you're officially out in THG hell.
I hope you wake up and smell the coffee soon...
Slappi - Thursday, June 8, 2006
Exactly.
I just can't believe what I am seeing here.
This site was once THE HARDWARE SITE for me and I always recommended it to others.
If Intel has a better chip hey that's great! But.... what is with the OBVIOUS underhanded reporting against AMD and for INTEL that has been going on here for the past few months?
It is so blatant here that I am starting to wonder if Intel's new chips are a lot of smoke and mirrors. If it is such a great chip it should speak for itself, not with all this closed testing and crippled AMD machines. Makes me wonder.
You would think after reading all the Anand Intel press that the new CPUs could cure cancer and cook dinner.
duploxxx - Thursday, June 8, 2006
i can give 2 pages full of rather strange figures and comparisons about this review. but i hope you'll bring the readers the windows benches fast and compare with other published benches so everybody can see that the linux optimization can shift wherever you want.
you use a workstation/budget motherboard against the intel server board. use a sun galaxy or hp proliant.
the specint and specfp are not correct, even intel gives way other numbers
some benches are done with one socket others with 2 socket. why?
mysql benches are optimized for two cores, that's very clear.. the performance drop on opteron is much more than the one on woodcrest. knowing the architecture of the opteron this should be the other way round. the opteron is lacking here due to the motherboard
you can extrapolate it in a different way showing different results, again you use 2 different opterons and use this difference to calculate 3.0, both setups are workstation and therefore performance is wrong. some benches you even talk and calculate 2 systems but not showing on the graphs.
your conclusion: is rather funny. you state that the woodcrest is the best performing server on a platform that has maybe 2% worldwide support with benches that can not be compared to other publication. no linear power consumption with other servers because no equal hardware setup and most systems use 2gb/cpu thats a +28w consumption for the woodcrest.
as stated from line 1 give some real world benches where people can compare with other posted results.
zsdersw - Thursday, June 8, 2006
The MSI K8N Master2-FAR board is a server motherboard. So are the boards in the other two Opteron servers.
MrKaz - Thursday, June 8, 2006
I don’t know if you all have already realized, but that is what the 4x4 boards will look like.
And that’s NOT a server board, ONLY ONE of the processors is accessing the memory directly and that must IMPACT the performance.
http://www.msi.com.tw/images/product_img/mbd_img/9...
AnandThenMan - Thursday, June 8, 2006
Anyone that calls that MSI mobo a "server board" is a freakin retard.
As for this "review" it has to be the worst on Anandtech in at least 6 months.
zsdersw - Thursday, June 8, 2006
I guess MSI themselves must be retards then. Look where it's listed: http://www.msi.com.tw/program/products/server/svr/...
ashyanbhog - Thursday, June 8, 2006
for those who think the MSI board must be good because they list it on their server pages,
Just look at the memory banks
MSI has a single bank, forcing the 2nd CPU to share the memory channel, reducing memory bandwidth to both CPUs, and increasing memory latencies. They are discarding NUMA capabilities to keep the price at around $250
http://www.msi.com.tw/program/products/server/svr/...
Now check the Tyan k8we and Supermicro h8dci boards linked below. Notice that they all carry two separate memory banks, giving each processor its own dedicated bank. This doubles the available memory bandwidth and keeps latencies low.
http://www.tyan.com/products/html/thunderk8we.html
http://www.supermicro.com/Aplus/motherboard/Optero...
Iwill D8kn is another similar board that I can recall. They all recommend that you put at least one card in each bank in a two processor setup to utilize the extra bandwidth.
But adding this extra bank comes at a cost, all the above boards are priced around the $500 mark. It's common knowledge in the AMD community that one needs to get the boards with separate memory banks if one is looking for a high performance machine.
If you still have doubts, check the review on GamePC, linked below. Notice that the Tyan TIGER k8we (with a single memory channel to both CPUs, like the MSI board) is beaten in every benchmark by the Tyan THUNDER k8we (which has dedicated memory channels for both CPUs)
BasMSI - Friday, June 9, 2006
MSI lists them as Workstation boards, not server boards.
See link: http://www.msi.com.tw/program/products/server/svr/...
They should have used the K8D-Master series, those are server boards and do have NUMA.
zsdersw - Friday, June 9, 2006
It's under the "Server and Rackmount" section of their website.