
Original Link: https://www.anandtech.com/show/1066
AMD's Athlon XP 3000+: Barton cuts it close
by Anand Lal Shimpi on February 10, 2003 2:24 AM EST - Posted in CPUs
The past year has seen Intel regain much of the face they had lost in the CPU industry. The sole reason we are at 3.06GHz today with Hyper-Threading support is because, ever since the release of the Pentium 4, AMD had turned up the heat on Intel. Call it the waking of a giant or competition at its best, but no matter how you slice it, the past twelve months have shown us Intel at their finest.
Intel’s 0.13-micron transition was executed without a hitch, and their Northwood Pentium 4 core more than made up for the disappointment that was the Willamette. By the end of 2002, the Pentium 4 was no longer a joke but rather a serious alternative to the Athlon XP – as well as the world’s fastest desktop microprocessor.
The success of the Pentium 4 in 2002 was not only because of the Northwood core, but also because of Intel’s re-entry into the desktop chipset market. Although they have irritated motherboard manufacturers with their seemingly infinite revisions to the original 845 chipset, the quality and performance of the 845 chipsets have helped move the Pentium 4 into more and more PCs.
With all of this said, almost one year ago, AMD managed to create quite a bit of worry in Intel’s Santa Clara offices, with their demonstration of A0 Hammer silicon during IDF week. The Athlon was AMD’s first very well-executed processor, and if they could pull off the Hammer launch just as well, then Intel would surely be in trouble. A 2GHz Hammer (Athlon 64) launched toward the end of last year would have put the Pentium 4 to shame, and that is the reason Intel made the decision to pull in the release of the Hyper-Threading enabled 3.06GHz Pentium 4. Originally Hyper-Threading was to be a Prescott-only feature, due out in the second half of 2003, but the fear of AMD marketing the world’s only 64-bit desktop processor led Intel to war with the world’s only SMT desktop processor. In the end, it was better safe than sorry for Intel, as we have all heard the news of AMD’s latest Athlon 64 delay – it won’t be until September of this year before we see the Athlon 64 on desktops.
This time around, the delay can’t be blamed on manufacturing or the design of the chip, as the Opteron is still scheduled to launch relatively on-time in April. AMD’s official stance is that they are waiting for a 64-bit version of Windows XP to ship Athlon 64 with, which will make marketing the chip and AMD’s 64-bit strategy much easier to live out. However, after so many delays, how can AMD get away with pushing Athlon 64 back another seven months?
Looking back at what we learned at Comdex, Intel isn’t going to be aggressively ramping clock speeds of their Pentium 4 until Prescott at the end of this year. The fact of the matter is that, at 3.06GHz, the 0.13-micron Northwood core is producing an incredible amount of heat and making it beyond 3.2GHz will be tough given the current manufacturing process.
It is actually a very good thing for Intel that AMD has delayed the Athlon 64 even further; a competitive launch from AMD at this point would push Intel to release an even hotter and lower yield Pentium 4 that we could only hope would not be reminiscent of the recalled 1.13GHz Pentium III. In that case, Intel wasn’t ready to release any faster CPUs when AMD turned the competition up a notch, resulting in a CPU being pushed into the public’s hands that didn’t meet Intel’s standards and leaving the boys in blue with omelet-ridden faces.
With the Pentium 4 only hitting 3.2GHz in the near future, Intel is hoping to keep the market buying and upgrading their systems by bringing a few other features to the Pentium 4. AMD’s only threats for the first half of this year are an 800MHz FSB, Hyper-Threading and mainstream dual-channel DDR chipsets from Intel.
Taking those threats into account, AMD concluded that they would be able to remain competitive without playing the Hammer-card just yet. The focus has been Athlon 64 for the past several months, but after missing numerous internal deadlines and dealing with the difficulty of launching a brand new CPU architecture under a very limited budget, the first half of 2003 will be carried by a little-talked-about core called Barton.
If you had asked us a year ago if we would be pitting yet another Athlon XP against Intel’s fastest, we wouldn’t have believed you. But as usual, it’s the unexpected case that ends up as reality, so today we take a look at the newest extension of the Athlon XP family: Barton.
What's a Barton?
With AMD's 0.13-micron transition finally going as planned and 90nm still a 2004 item, it is no surprise that Barton is another 0.13-micron core. Replacing the short-lived Thoroughbred, Barton adds one major feature to the Athlon XP's list - a 512KB L2 cache. Other than a larger cache (and a correspondingly larger die), Barton is no different from Thoroughbred.
Thoroughbred (left) vs. Barton (right)
The larger L2 cache obviously improves performance a bit (we will explain exactly how and why later), and with that, AMD has given the Barton cores a slightly modified model rating system.
The Barton cores are going to be made available in three clock speeds, all using the new 333MHz FSB. The speeds and model ratings are summarized by the table below:
AMD Athlon XP Model Numbers

| CPU Name | Clock Speed |
|---|---|
| Athlon XP 3000+ (Barton) | 2.167GHz |
| Athlon XP 2800+ (Barton) | 2.083GHz |
| Athlon XP 2800+ (333MHz FSB) | 2.25GHz |
| Athlon XP 2700+ (333MHz FSB) | 2.167GHz |
| Athlon XP 2600+ (333MHz FSB) | 2.083GHz |
| Athlon XP 2600+ | 2.13GHz |
| Athlon XP 2500+ (Barton) | 1.83GHz |
| Athlon XP 2400+ | 2.00GHz |
| Athlon XP 2200+ | 1.80GHz |
| Athlon XP 2100+ | 1.73GHz |
| Athlon XP 2000+ | 1.67GHz |
| Athlon XP 1900+ | 1.60GHz |
| Athlon XP 1800+ | 1.53GHz |
| Athlon XP 1700+ | 1.47GHz |
| Athlon XP 1600+ | 1.40GHz |
| Athlon XP 1500+ | 1.33GHz |
As you can see, the new Athlon XP 3000+ actually carries a lower clock speed than the previous flagship 2800+ (2.167GHz vs. 2.25GHz). The idea behind this is that the larger L2 cache makes up for the negative difference in clock speed, but as you'll see, this isn't always the case.
The Barton 3000+ and 2800+ will be available immediately, with the higher volume 2500+ coming at the end of February.
The other thing the larger cache brings to the table is a much larger die; unfortunately, this die drives the manufacturing cost of Barton noticeably higher than that of the previous Thoroughbred core. Whereas Thoroughbred was made up of around 37.6 million transistors, Barton weighs in with a plentiful 54.3 million transistors. The added transistors increase the die size from 84 mm^2 to approximately 101 mm^2. Recently, we had the opportunity to ask AMD's CTO, Fred Weber, about manufacturing, and one of the tidbits of information he left us with was that AMD's manufacturing sweet-spot exists between 50 mm^2 and 100 mm^2. Once dies get above 100 mm^2, they start to be significantly more expensive than those that can fit within that 50 - 100 mm^2 range; as you can tell, Barton is at the very edge of that scale, making it difficult for AMD to maintain as large a price advantage over Intel.
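To put the figures above in proportion, here is a small arithmetic sketch in Python. It uses only the transistor counts, die sizes and the 100 mm^2 sweet-spot limit quoted in this section; the variable names are ours and purely illustrative.

```python
# Figures quoted in the article (millions of transistors, die area in mm^2).
thoroughbred = {"transistors_m": 37.6, "die_mm2": 84.0}
barton = {"transistors_m": 54.3, "die_mm2": 101.0}
SWEET_SPOT_MAX_MM2 = 100.0  # upper end of the 50 - 100 mm^2 range cited by AMD

# Transistors added by the extra 256KB of L2 cache.
added_transistors_m = barton["transistors_m"] - thoroughbred["transistors_m"]

# Relative growth of the transistor budget and of the die area.
transistor_growth = barton["transistors_m"] / thoroughbred["transistors_m"] - 1
die_growth = barton["die_mm2"] / thoroughbred["die_mm2"] - 1

print(f"Added transistors: {added_transistors_m:.1f} million")
print(f"Transistor growth: {transistor_growth:.1%}")
print(f"Die area growth:   {die_growth:.1%}")
print(f"Past the sweet spot: {barton['die_mm2'] > SWEET_SPOT_MAX_MM2}")
```

The transistor count grows by roughly 44% while the die grows by only about 20%, which is what you would expect when most of the added transistors are densely packed cache cells.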
Being on the edge of this sweet-spot doesn't mean AMD will be selling Athlon XPs for as much as Intel does their Pentium 4s, but it does mean that prices either have to go up to compensate or that profits have to go down. Given AMD's current financial status, we would say that the former option is more likely.
Here's AMD's pricing structure for the Athlon XP line as of the time of publication:
As you can see, the Athlon XP 3000+ is not a cheap part, but neither is the 3.06GHz Pentium 4 it's up against.
Now that you've been exposed to Barton at a high level, let's talk about its major attraction - this added L2 cache.
What are the Benefits of a Larger Cache?
Bigger is better, right? So a 512KB L2 cache must be better than a 256KB one - after all, AMD wouldn't spend 17 million transistors for no gain. Although it's very true that a larger cache is generally beneficial, the real question is how beneficial and in what situations. To answer that question, we should have a quick lesson in caches and what makes them so useful.
Think of a cache as a bridge between two entities - a slower and a faster one. In this case, the cache we are talking about is part of a multilevel cache system and it helps to bridge the gap between the CPU and main memory.
It's no surprise that main memory runs significantly slower than today's CPUs. Not only does memory run at significantly slower clock speeds (e.g. 200MHz for DDR400) than today's CPUs, but main memory is physically located very far away from the processor. Our multi-gigahertz CPUs have to waste well over 100 clock cycles to retrieve data from main memory as their requests must cross over slow front-side buses, through an external memory controller, to the memory and back. Making this trip can wreak havoc on performance, especially for CPUs with very long pipelines, as these pipelines generally remain idle if the data necessary to populate them has to be fetched from main memory.
The idea behind a processor's caches is that you store important data in these high speed memories (now located on the processor's die itself), so that most of the time, your CPU doesn't have to make the long trip to main memory. The reason caches are split into multiple levels is because the larger your cache is, the longer it takes to fetch data. Therefore, it ends up being that having one smaller but very low latency cache combined with a larger and somewhat higher latency (but still significantly quicker than main memory) cache provides the best balance of performance in today's microprocessors. These two caches are the Level 1 (L1) and Level 2 (L2) caches you hear about all the time.
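The trade-off described above is often summarized as an average memory access time. The sketch below is a simplified model, not a measurement: the L1 and L2 latencies and all miss rates are hypothetical values chosen for illustration, and only the "well over 100 cycles" cost of a trip to main memory comes from the text.

```python
def average_access_cycles(l1_latency, l1_miss_rate, l2_latency, l2_miss_rate, memory_latency):
    """Average cycles per access for a two-level cache in front of main memory."""
    # Every access pays the L1 latency; L1 misses additionally pay the L2
    # latency; accesses that also miss in L2 pay the main memory latency.
    return l1_latency + l1_miss_rate * (l2_latency + l2_miss_rate * memory_latency)


# Hypothetical numbers: 3-cycle L1, 20-cycle L2, 150-cycle main memory.
with_l2 = average_access_cycles(3, 0.05, 20, 0.20, 150)

# The same L1 with no L2 behind it: every L1 miss goes straight to memory.
without_l2 = 3 + 0.05 * 150

print(f"With L2:    {with_l2:.1f} cycles per access")
print(f"Without L2: {without_l2:.1f} cycles per access")
```

Even with these made-up numbers the shape of the result holds: a second, larger cache level pays for its own latency as long as it catches a good share of the L1 misses.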
Caches work based on two major principles - spatial and temporal locality. These two principles are simple; spatial locality states that if you are accessing data, then the data around it will be accessed soon, and temporal locality states that if you are accessing data, chances are that you'll access that same piece of data again. In practice, this means that frequently accessed data is kept in cache, as well as data physically around it. Since caches are of relatively small sizes (rightfully so, it would be cost and performance prohibitive to have main memory-sized caches), the algorithms they use to make sure that the right information remains in the cache are even more critical to performance than the sheer size of the cache.
With Barton, AMD left their L1 the same as before, but increased their L2 cache size by a total of 256KB. AMD didn't change any of the specifications of the cache (e.g. it is still a 16-way set associative L2 cache). Luckily, AMD increased the cache size without sacrificing access time, but where will the added L2 cache help?
Let's look at those two principles we mentioned before, spatial and temporal locality. If an application's usage pattern does not abide by either one of these principles, then it doesn't matter how much cache you add, the performance will not improve. So what are some examples of applications that are and are not cache-friendly?
For starters, let's talk about things that don't abide by the principle of temporal locality - mainly multimedia applications, and encoding applications in particular. If you think about how encoding works, the data is never reused; it is simply encoded on a bit-by-bit basis and then the original data is never touched again. At the other end of the spectrum, we have things like office applications that happily abide by the principle of temporal locality. In these sorts of applications, you are often re-using data, performing very similar tasks on it over and over again and thus making great use of larger caches.
The principle of spatial locality applies to a much wider range of applications, including multimedia encoding applications because of the fact that data is generally stored in contiguous form in main memory and is thus very cache-friendly. Spatial locality is why you will see some improvement from larger caches even in applications that don't exhibit much temporal locality.
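The two access patterns discussed above can be illustrated with a toy cache simulator. This is a minimal sketch under loud assumptions: it models a fully associative cache with LRU replacement and 64-byte lines, which is simpler than the 16-way set associative L2 described earlier, and the "streaming" and "looping" workloads are synthetic stand-ins for encoding and office-style usage.

```python
from collections import OrderedDict


def hit_rate(addresses, cache_lines, line_size=64):
    """Fraction of accesses served by a fully associative LRU cache."""
    cache = OrderedDict()  # keys are line numbers, ordered oldest to newest
    hits = 0
    total = 0
    for addr in addresses:
        total += 1
        line = addr // line_size
        if line in cache:
            hits += 1
            cache.move_to_end(line)  # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / total


KB = 1024
LINES_256KB = 256 * KB // 64
LINES_512KB = 512 * KB // 64

# Streaming: 1MB read once, 8 bytes at a time (spatial locality only).
stream = list(range(0, 1024 * KB, 8))

# Looping: a 384KB working set walked four times (temporal locality too).
loop = [a for _ in range(4) for a in range(0, 384 * KB, 8)]

for name, trace in (("stream", stream), ("loop", loop)):
    small = hit_rate(trace, LINES_256KB)
    large = hit_rate(trace, LINES_512KB)
    print(f"{name}: 256KB -> {small:.3f}, 512KB -> {large:.3f}")
```

In this model the streaming trace gets the same hit rate from both cache sizes, because its only hits come from neighbouring words in a line that was just fetched. The looping trace gains from the larger cache only because its working set happens to fit in 512KB but not in 256KB, which is why the benefit of Barton's extra cache depends so heavily on the application.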
AMD’s Cache Benefits vs. Intel’s Cache Benefits
All caches are not created equal and thus you should not expect AMD to benefit as much as Intel did from going to a 512KB L2 cache. Intel follows a much more conventional L1/L2 cache architecture that uses what is known as the inclusive principle; the inclusive principle states that the contents of the L1 cache are also included in the L2 cache. The obvious downside to this is that the L2 cache contains some data that is redundant that the CPU will never use (if it needs it, it will get it from the faster L1 cache). From the CPU's point of view, an inclusive cache just means it has less room to store its much needed data in, but from the standpoint of the rest of the system an inclusive cache does provide one advantage - if data is updated in main memory (e.g. through DMA), the memory controller only has to check the L2 cache to update data, and there is no need to check L1 for coherency. This is a small but important benefit to an inclusive cache architecture.
The opposite, obviously, is a cache subsystem that follows the exclusive principle - such as the Athlon XP's cache. In this case, the contents of the L1 cache are not duplicated in the L2 cache, thus favoring cache size over the added latency of checking for two levels of cache coherency in DMA situations. The exclusive approach makes much more sense for AMD, considering the Athlon XP has an extremely large 128KB L1 cache that would be very costly to duplicate in L2 (compared to Intel's 8KB L1 Data cache that is easily duplicated in L2).
Both architectures have their pros and cons, but are best suited for the particular CPU we are talking about. Recognizing the differences, however, helps us understand why AMD will benefit differently from Intel when it comes to the 256KB to 512KB cache leap, but this still isn't the full story.
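One way to see why the same 256KB to 512KB step means different things to the two designs is to count how much unique data each hierarchy can hold. The sketch below is a deliberately simplified capacity model based on the inclusive and exclusive principles as described above; it ignores latency, associativity and the Pentium 4's instruction-side caches, and the function name is ours.

```python
def unique_capacity_kb(l1_kb, l2_kb, exclusive):
    """Unique data held across L1 and L2 under a simple capacity model."""
    # Exclusive: L1 contents are not duplicated in L2, so the sizes add up.
    # Inclusive: L1 contents are also held in L2, so L2 bounds the total.
    return l1_kb + l2_kb if exclusive else l2_kb


# Athlon XP: 128KB L1, exclusive L2 (256KB on Thoroughbred, 512KB on Barton).
athlon_before = unique_capacity_kb(128, 256, exclusive=True)
athlon_after = unique_capacity_kb(128, 512, exclusive=True)

# Pentium 4: 8KB L1 data cache, inclusive L2 (256KB Willamette, 512KB Northwood).
p4_before = unique_capacity_kb(8, 256, exclusive=False)
p4_after = unique_capacity_kb(8, 512, exclusive=False)

athlon_growth = athlon_after / athlon_before - 1
p4_growth = p4_after / p4_before - 1

print(f"Athlon XP:  {athlon_before}KB -> {athlon_after}KB (+{athlon_growth:.0%})")
print(f"Pentium 4:  {p4_before}KB -> {p4_after}KB (+{p4_growth:.0%})")
```

Under this model the Athlon XP was already holding 384KB of unique data before Barton, so doubling the L2 raises its total by about two thirds, whereas the Pentium 4's total doubles.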
In order to see the differences, we compared four CPUs - a 2.167GHz Barton to a 2.167GHz Thoroughbred, and a 2.00GHz Northwood to a 2.00GHz Willamette. The performance benefit of going to a 512KB L2 cache for both the Athlon XP and Pentium 4 is shown in the chart below:
As you can see, the Pentium 4 consistently received a bigger performance boost from the move to a 512KB L2 cache. Intel seems to think that this is related to AMD's use of an exclusive cache architecture, but it also may have something to do with the fact that the Pentium 4 is penalized much more from having to wait to go to memory than the Athlon XP. What's very interesting to note is the performance improvement the Pentium 4 realizes in situations where a simple increase in cache size shouldn't boost performance that much, such as the encoding applications and 3D rendering apps.
What about the 400MHz FSB?
After Comdex, the word on the street was that AMD would be moving Barton to a 400MHz FSB in the near future but that the CPU would debut with a 333MHz FSB. As you can tell by today's release, we are still dealing with 333MHz FSB CPUs, but what is there to be said about the potential impact of a 400MHz FSB?
A larger L2 cache means that Barton has to go to main memory much less often (assuming that our applications do abide by the principles of spatial and temporal locality), which means that it has to send requests and receive data across the FSB much less frequently compared to an identically clocked Thoroughbred.
Since Barton is being launched at speeds slower than the fastest Thoroughbred, the immediate need for a 400MHz FSB isn't apparent - remember, FSB traffic should be reduced by the larger L2 cache. However, as Barton ramps up in clock speed, the move to a 400MHz FSB may become more appetizing as higher clocked Athlon XPs will require data at a faster rate to keep their pipelines filled.
So today, Barton would benefit less from a 400MHz FSB than the Thoroughbred core, which isn't much at this point either. Remember that the main benefit of the 333MHz FSB was latency reduction because of the fact that the FSB and memory bus were finally operating at the same clock speed once again, and not because of the increase in FSB bandwidth.
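For reference, the raw bandwidth difference between the two bus speeds is simple arithmetic. The sketch below assumes a 64-bit (8-byte) wide front-side bus, which is not stated in this article, and treats the "333MHz" and "400MHz" figures as effective transfer rates in millions of transfers per second.

```python
FSB_WIDTH_BYTES = 8  # assumption: 64-bit wide front-side bus


def fsb_bandwidth_mb_s(effective_mhz, width_bytes=FSB_WIDTH_BYTES):
    """Peak bus bandwidth in MB/s for a given effective transfer rate."""
    return effective_mhz * width_bytes


bw_333 = fsb_bandwidth_mb_s(333)
bw_400 = fsb_bandwidth_mb_s(400)

print(f"333MHz FSB: {bw_333} MB/s")
print(f"400MHz FSB: {bw_400} MB/s")
print(f"Increase:   {bw_400 / bw_333 - 1:.0%}")
```

A roughly 20% increase in peak bandwidth only translates into performance if the CPU is actually waiting on the bus, which is the point made above about Barton's larger cache reducing FSB traffic.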
Headroom with Barton?
More transistors and a die-size that is on the edge of AMD's manufacturing sweet-spot mean that Barton isn't going to be any sort of overclocking monster, but just for kicks, we decided to see how far we could push the core.
Using conventional cooling, we were able to hit 2.324GHz at stock voltage, an overclock of 7%, but maintaining stability was another question. We're sure that with little effort the 2.2 - 2.3GHz range is attainable, but anything above that will require a bit more. With AMD working on an Athlon XP 3200+ to go up against Intel's forthcoming Pentium 4 3.2GHz, we can at least see that the current Barton cores shouldn't have much of a problem taking AMD there.
The Test
Windows XP Professional Test Bed

| Hardware | Configuration |
|---|---|
| CPU (AMD) | Athlon XP 3000+ (2.167GHz) Barton, Athlon XP 2800+ (2.25GHz), Athlon XP 2700+ (2.167GHz), Athlon XP 2600+ (2.083GHz), Athlon XP 2400+ (2.00GHz), Athlon XP 2200+ (1.80GHz), Athlon XP 2100+ (1.73GHz), Athlon XP 2000+ (1.67GHz), Athlon XP 1900+ (1.60GHz), Athlon XP 1800+ (1.53GHz), Athlon XP 1700+ (1.47GHz), Athlon XP 1600+ (1.40GHz), Athlon XP 1500+ (1.33GHz) |
| CPU (Intel) | Pentium 4 3.06GHz, Pentium 4 2.80GHz, Pentium 4 2.66GHz, Pentium 4 2.53GHz, Pentium 4 2.50GHz, Pentium 4 2.40GHz, Pentium 4 2.26GHz, Pentium 4 2.20GHz, Pentium 4 2.0AGHz, Pentium 4 2.0GHz, Pentium 4 1.9GHz, Pentium 4 1.8AGHz, Pentium 4 1.8GHz, Pentium 4 1.7GHz, Pentium 4 1.6AGHz, Pentium 4 1.6GHz, Pentium 4 1.5GHz |
| Motherboard | ASUS A7N8X - NVIDIA nForce2 Chipset; Intel D850EMV2 - Intel 850E Chipset |
| RAM | 2 x 256MB DDR400 CAS2 Corsair XMS3200 DIMM; 2 x 256MB PC1066 Samsung RIMMs |
| Sound | None |
| Hard Drive | 80GB Western Digital Special Edition 8MB Cache ATA/100 HDD |
| Video Cards | ATI Radeon 9700 Pro |
Content Creation Performance
For "Content Creation" performance we use two benchmarks - the new Content Creation Winstone 2003 and Internet Content Creation SYSMark 2002. Since SYSMark isn't the best for comparing AMD and Intel CPUs, the focus here should be on Content Creation Winstone 2003, so we start with a description of the benchmark from its creators, VeriTest:
Multimedia Content Creation Winstone is a system-level, application-based benchmark that measures a PC's overall performance when running top, Windows-based, 32-bit, multimedia content creation applications on Windows 2000 (SP2 or higher), Windows 98, Windows ME, and Windows XP. Multimedia Content Creation Winstone 2003 uses the following applications:
Adobe® Photoshop® 7.0
Adobe® Premiere® 6.0
Macromedia® Director 8.5.1
Macromedia® Dreamweaver 4
Microsoft® Windows Media™ Encoder 7.01.00.3055
Netscape® 6.2.3
NewTek's LightWave® 7.5
Sonic Foundry® Sound Forge® 6.0
Following the lead of real users, Multimedia Content Creation Winstone 2003 keeps multiple applications open at once and switches among those applications. Multimedia Content Creation Winstone 2003 is a single large test that runs the above applications through a series of scripted activities and returns a single score. Those activities focus on what we call "hot spots," periods of activity that make your PC really work--the times where you're likely to see an hourglass or a progress bar.
Content Creation Winstone 2003

| CPU | Score |
|---|---|
| Intel Pentium 4 3.06GHz | 49.5 |
| Intel Pentium 4 3.06GHz HT | 48.6 |
| Intel Pentium 4 2.80GHz | 46.8 |
| Intel Pentium 4 2.66GHz | 45.3 |
| Intel Pentium 4 2.53GHz | 43.7 |
| Intel Pentium 4 2.50GHz | 42 |
| AMD Athlon XP 2800+ (2.25GHz) | 41.6 |
| AMD Athlon XP 3000+ (2.167GHz) Barton | 40.9 |
| Intel Pentium 4 2.40GHz | 40.6 |
| AMD Athlon XP 2700+ (2.167GHz) | 40.4 |
| Intel Pentium 4 2.26GHz | 40.1 |
| AMD Athlon XP 2600+ (2.083GHz) | 39.1 |
| Intel Pentium 4 2.20GHz | 38.1 |
| AMD Athlon XP 2400+ (2.00GHz) | 37.1 |
| Intel Pentium 4 2.0AGHz | 35.5 |
| AMD Athlon XP 2200+ (1.80GHz) | 34.3 |
| AMD Athlon XP 2100+ (1.73GHz) | 33.5 |
| Intel Pentium 4 2.0GHz | 33 |
| Intel Pentium 4 1.8AGHz | 32.8 |
| AMD Athlon XP 2000+ (1.67GHz) | 32.5 |
| Intel Pentium 4 1.9GHz | 31.8 |
| AMD Athlon XP 1900+ (1.60GHz) | 31.4 |
| Intel Pentium 4 1.8GHz | 30.6 |
| AMD Athlon XP 1800+ (1.53GHz) | 30.5 |
| Intel Pentium 4 1.6AGHz | 29.7 |
| AMD Athlon XP 1700+ (1.47GHz) | 29.5 |
| Intel Pentium 4 1.7GHz | 29.2 |
| AMD Athlon XP 1600+ (1.40GHz) | 28.4 |
| Intel Pentium 4 1.6GHz | 27.9 |
| AMD Athlon XP 1500+ (1.33GHz) | 27.3 |
| Intel Pentium 4 1.5GHz | 26.5 |
In this predominantly multimedia-based benchmark, the added L2 cache doesn't help AMD at all, and the old 2800+ manages to outperform the new Barton-based 3000+. In the end, the Pentium 4 comes out on top by a decent margin.
The important thing to take away from this CPU scaling graph is how well the Pentium 4 fared after the move to the Northwood core (look at the 2000 mark on the graph). The performance gap between the Pentium 4 and Athlon XP seems to be growing in favor of Intel, as is evident from the increasing differential towards the end of the graph.
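To quantify the "decent margin" mentioned above, the short Python sketch below computes relative differences from three of the Content Creation Winstone 2003 scores reported in this section. The dictionary keys are shortened labels of our own.

```python
# Content Creation Winstone 2003 scores taken from the results above.
scores = {
    "Pentium 4 3.06GHz": 49.5,
    "Athlon XP 2800+": 41.6,
    "Athlon XP 3000+ (Barton)": 40.9,
}


def lead_over(winner, loser):
    """Relative advantage of one score over another, as a fraction."""
    return scores[winner] / scores[loser] - 1


p4_lead = lead_over("Pentium 4 3.06GHz", "Athlon XP 3000+ (Barton)")
old_xp_lead = lead_over("Athlon XP 2800+", "Athlon XP 3000+ (Barton)")

print(f"Pentium 4 3.06GHz over XP 3000+: {p4_lead:.1%}")
print(f"XP 2800+ over XP 3000+:          {old_xp_lead:.1%}")
```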
Content Creation Performance (continued)
SYSMark 2002 isn't the world's best CPU benchmark for comparing the Athlon XP to the Pentium 4, but it does still make for a good comparison of processors within a given family. Thus we include it here to compare the Barton core to its predecessors, but first here's a list of the applications found in the Internet Content Creation suite of the benchmark:
Adobe Photoshop® 6.01
Adobe Premiere® 6.0
Microsoft Windows Media Encoder 7.1
Macromedia Dreamweaver 4
Macromedia Flash 5
Despite the shortcomings of the benchmark, ICC SYSMark 2002 paints a picture very similar to what we saw in Content Creation Winstone 2003. The main difference here is that the Barton core actually gives AMD a relatively decent boost.
Once again we see that the Pentium 4 only becomes competitive above 2GHz (the sub-2GHz Northwoods were left out of the CPU scaling charts for simplicity's sake), but the performance difference between the Pentium 4 and Athlon XP is somewhat exaggerated by this particular benchmark. What SYSMark is good for however is comparing CPUs within a particular microprocessor family, in which case you can see how much of a benefit the added L2 cache of the 3000+ gives the Athlon XP by that spike at the very end of the green line.
General Usage Performance
Although not as performance-critical as content creation applications, it is the set of everyday applications like Office and other general usage programs that the majority of users find themselves interacting with the most; thus performance here is also very important.
We start with VeriTest's Business Winstone 2002:
The Business Winstone tests are "market-centered" tests. Business applications are the popular applications employed by most users every day.
Five Microsoft Office 2002 applications (Access, Excel, FrontPage, PowerPoint, and Word)
Microsoft Project 2000
Lotus Notes
WinZip 8.0
Norton AntiVirus
Netscape Communicator
The Athlon XP does extremely well in business/general usage applications as is made evident by Business Winstone 2002. The primary reason for this is that these applications are predominantly integer applications, meaning their code makes use of the CPU's integer execution units. By nature, integer code has a great deal of conditional branches, mostly in the form of equality testing (e.g. if x = 0 then y) which can greatly penalize a long-pipeline architecture such as that employed by the Pentium 4. The Northwood core helped the Pentium 4 keep up in these situations but overall, the Athlon XP is still the best bang for your buck here and the highest performer with the XP 3000+.
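The effect of mispredicted branches on a long pipeline can be sketched with a textbook cycles-per-instruction model. Everything numeric here is an assumption for illustration: the base CPI, the share of branch instructions and the misprediction rate are hypothetical, and the flush penalties of 10 and 20 cycles are loosely based on commonly cited pipeline depths for the Athlon XP and Pentium 4 rather than on anything measured in this review.

```python
def cpi_with_branches(base_cpi, branch_fraction, mispredict_rate, flush_penalty):
    """Average cycles per instruction once branch mispredictions are included."""
    # Each mispredicted branch costs roughly one pipeline refill.
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty


# Hypothetical workload: 20% branches, 5% of them mispredicted.
short_pipe = cpi_with_branches(1.0, 0.20, 0.05, flush_penalty=10)
long_pipe = cpi_with_branches(1.0, 0.20, 0.05, flush_penalty=20)

print(f"Shorter pipeline CPI: {short_pipe:.2f}")
print(f"Longer pipeline CPI:  {long_pipe:.2f}")
```

With identical code and identical prediction accuracy, the deeper pipeline loses twice as many cycles to each misprediction, which is the penalty the paragraph above refers to.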
General Usage Performance (continued)
Next we have the Office portion of SYSMark 2002; once again, not the end-all be-all comparison benchmark for differing architectures, but good at comparing performance within individual processor families.
The applications tested include:
Microsoft Word 2002
Microsoft Excel 2002
Microsoft PowerPoint 2002
Microsoft Outlook 2002
Microsoft Access 2002
Netscape Communicator® 6.0
Dragon NaturallySpeaking Preferred v.5
WinZip 8.0
McAfee VirusScan 5.13
Gaming Performance - Unreal Tournament 2003 (Flyby)
With this review we continue to use the final retail version of Unreal Tournament 2003 as a benchmark tool. The benchmark works similarly to the demo, except there are higher detail settings that can be chosen. As we've mentioned before, in order to make sure that all numbers are comparable you need to be sure to do the following:
By default the game will detect your video card and assign its internal defaults based on the capabilities of your video card to optimize the game for performance. In order to fairly compare different video cards you have to tell the engine to always use the same set of defaults which is accomplished by editing the .bat files in the X:\UT2003\Benchmark\ directory.
Add the following parameters to the statements in every one of the .bat files located in that directory:
-ini=..\\Benchmark\\Stuff\\MaxDetail.ini -userini=..\\Benchmark\\Stuff\\MaxDetailUser.ini
For example, botmatch-antalus.bat will look like this after the additions:
..\System\ut2003 dm-antalus?spectatoronly=true?numbots=12?quickstart=true -benchmark -seconds=77 -exec=..\Benchmark\Stuff\botmatchexec.txt -ini=..\\Benchmark\\Stuff\\MaxDetail.ini -userini=..\\Benchmark\\Stuff\\MaxDetailUser.ini -nosound
Remember to do this to all of the .bat files in that directory before running Benchmark.exe.
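If you would rather not edit every file by hand, the steps above can be scripted. The following Python sketch is our own illustration, not a tool supplied by Epic or used in this review. It assumes that the launch line in each .bat file is the one that mentions "ut2003", appends the two parameters to the end of that line (the article's example places them before -nosound), and skips lines that already carry an -ini= switch. Back up the directory before trying anything like this.

```python
from pathlib import Path

# Parameters quoted in the article, backslashes kept exactly as printed there.
EXTRA = r"-ini=..\\Benchmark\\Stuff\\MaxDetail.ini -userini=..\\Benchmark\\Stuff\\MaxDetailUser.ini"


def add_params(bat_text: str) -> str:
    """Append EXTRA to every line that launches ut2003 and lacks an -ini= switch."""
    patched = []
    for line in bat_text.splitlines():
        # Assumption: the launch line is the one that mentions "ut2003".
        if "ut2003" in line.lower() and "-ini=" not in line:
            line = line.rstrip() + " " + EXTRA
        patched.append(line)
    # Preserve a trailing newline if the original text had one.
    return "\n".join(patched) + ("\n" if bat_text.endswith("\n") else "")


def patch_directory(benchmark_dir: str) -> None:
    """Rewrite every .bat file in benchmark_dir in place."""
    for bat in Path(benchmark_dir).glob("*.bat"):
        bat.write_text(add_params(bat.read_text()))
```

Calling `patch_directory` with the path to your own Benchmark directory applies the change to each file; running it a second time leaves already patched lines untouched.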
The performance here is very competitive between the Athlon XP and the Pentium 4, with the new 3000+ just barely edging out the 3.06GHz Pentium 4.
This type of CPU scaling graph seems to be very common, with the higher speed Athlon XPs and Pentium 4s closing in on each other quite quickly in an effort to keep the race very close.
Gaming Performance - Unreal Tournament 2003 (Botmatch)
With previous versions of UT2003, Botmatch couldn't be used to compare different systems as there was a bug in the benchmark that could cause inflated numbers on AMD systems vs. Intel systems. We went to Epic with the problem and they provided us with a beta patch in time for this review; the fix will make it into the next publicly available patch release in several weeks.
For those of you that aren't familiar, the Botmatch test focuses mostly on physics and artificial intelligence performance in UT2003, the two areas that are the most CPU dependent in the game.
Here we see a continued advantage by the Athlon XP, with the XP 3000+ still outpacing the 3.06GHz Pentium 4.
Gaming Performance - 3DMark 2001SE
Although not much of a real-world gaming benchmark, 3DMark 2001SE has become a very popular method of measuring system performance so we include it here to compare the impact of CPU speed on 3D graphics performance.
Remember that the job of the CPU here is mainly to send vertices to the GPU and to handle any physics/AI going on in the game scenarios.
The 3000+ earns its rating by coming in right next to the 3.06GHz Pentium 4.
Gaming Performance - Quake III Arena
An extremely dated benchmark, Quake III Arena has become much more of a CPU and platform test than anything because of the fact that current generation graphics cards are nowhere near stressed by it. We used our old 1.29f build of the game with the classic demo "four" at High Quality defaults, with everything maxed out at 1024x768.
A 7% advantage goes to the 3.06GHz Pentium 4 here, holding the Athlon XP 3000+ back.
Notice that the Pentium 4 overtakes the Athlon XP at just past the 2400 mark; what happens at this point is the introduction of the 533MHz FSB for the Pentium 4, indicating that the performance here was FSB/memory bus limited for the Pentium 4.
Gaming Performance - Jedi Knight 2
The performance between the two flagships continues to be very close under our second Quake III based game - Jedi Knight 2.
Gaming Performance - Comanche 4
Video Encoding Performance - DiVX/XMpeg 4.5
What was once reserved for "professional" use only has now become a task for many home PCs - media encoding. Today's media encoding requirements are more demanding than ever and are still some of the most intensive procedures you can run on your PC.
We'll start off with a "quick" conversion of a DVD rip (more specifically, Chapter 40 from the Star Wars Episode I DVD) to a DiVX MPEG-4 file. We used the latest DiVX codec (5.03) in conjunction with Xmpeg 4.5 to perform the encoding at 720 x 480.
We set the encoding speed to Fastest, disabled audio processing and left all of the remaining settings on their defaults. We recorded the last frame rate given during the encoding process as the progress bar hit 100%.
The Pentium 4 dominates in this test and the boost from Hyper-Threading is nothing short of impressive; the added L2 cache of Barton isn't helping much here.
This chart helps you see exactly how much of a boost Hyper-Threading gives Intel; that last spike is all because of Hyper-Threading.
Video Encoding Performance - Windows Media Encoder 9.0
For our next video encoding test we took Windows Media Encoder 9.0 and encoded the same chapter from the Star Wars Episode I DVD into a 2Mbps VBR WMV file using Media Encoder's built in 2Mbps DVD VBR settings. The time reported is in minutes to encode, lower being better obviously:
Although the performance advantage isn't as severe, we still see the Pentium 4 doing very well in this encoding test. The performance boost from Hyper-Threading is still significant, and as we explained earlier, the lack of temporal locality in this sort of a usage model keeps the added L2 cache from letting the 3000+ outperform even its 2800+ predecessor.
Video Encoding Performance - Quicktime 6.0 Professional
Our final encoding test takes the same source file and encodes it to a MPEG-4 Quicktime file for streaming over the Internet. We used the Normal MPEG-4 settings for the conversion, and the time was recorded in minutes (lower being better).
The performance gap narrows between the Pentium 4 3.06 and Barton, but the favor still goes to the Pentium 4 here.
3D Rendering Performance - 3dsmax R5
When the Athlon was first released over 3 years ago, 3D Studio MAX was a strong point of its performance. The Athlon's raw FPU performance was right up 3dsmax's alley and thus it put Intel's competing solutions (at the time, the Pentium III) to shame. Things have changed a bit; the latest version of 3ds max (R5) does have some Pentium 4 optimizations that keep things quite competitive between the Athlon XP and the Pentium 4.
For our 3ds max 5 benchmarks we chose all of the benchmark scenes that ship with the product - SinglePipe2.max, Underwater_Environment_Finished.max, 3dsmax5_rays.max, cballs2.max and vol_light2.max.
3D Rendering Performance - 3dsmax R5 (2)
3D Rendering Performance - 3dsmax R5 (3)
3D Rendering Performance - 3dsmax R5 (4)
3D Rendering Performance - 3dsmax R5 (5)
3D Rendering Performance - Maya 4.0.1
3D Rendering Performance - Lightwave 3D 7.5
While 3dsmax 5 is SSE2 optimized, the level of optimization is nowhere near what NewTek reported with Lightwave upon releasing version 7.0b. The performance improvements offered by the new SSE2 optimized version were all above 20% using NewTek's supplied benchmarking scenes.
We chose three benchmarks to use, two of the lesser SSE2 optimized scenes and another that is more optimized, just to get an idea of the potential that exists for Pentium 4 users running heavily optimized applications.
3D Rendering Performance - Lightwave 3D 7.5 (2)
3D Rendering Performance - Lightwave 3D 7.5 (3)
Final Words
AMD's first CPU of the year is still not the elusive Hammer, but as the benchmarks show, it doesn't need to be. In many cases the Athlon XP 3000+ can outperform the 3.06GHz Pentium 4, while in others it manages to tie with Intel's flagship and yet in others it falls behind just as much. The overall performance is close enough to warrant the 3000+ rating in some cases, but there's no question that it is a very close call between the two top performing CPUs. Looking at the CPU scaling charts alone you can get an idea of how competitive the two CPU families have become, as the Pentium 4 improved in performance and the Athlon XP continued to mature.
The areas in which the Athlon XP does quite well, including the new Barton core, are its conventional strong points. In business applications it dominates the Pentium 4, showing off a very conservative model rating, and in games the chip is quite competitive with Intel, but once we shift to the newer multimedia, encoding and rendering environments the Athlon XP is no longer able to do so well. This goes back to AMD's philosophy of building the best hardware to run software without requiring much optimization; unfortunately for their teams in Austin, a lot of the multimedia, encoding and rendering applications we're talking about are very Pentium 4-friendly these days. Then we have the issue of workloads that benefit from Hyper-Threading, an area we did not stress much in this review but one that carries much potential to differentiate the Athlon XP from the Pentium 4. When Prescott hits with its improved Hyper-Threading and larger caches that are conducive to better HT performance, it will be interesting to see how negative AMD remains on Intel's latest feature.
It is very interesting to note the relatively small performance improvement that resulted from the additional L2 cache, at least when you compare the impact of Barton to the impact Northwood had on the Pentium 4. We have to wonder if the added L2 cache and its accompanying 17 million transistors are worth it for the Athlon XP at this point; but with increasing clock speed becoming quite difficult without SOI or 90nm and no desire to add any additional functionality to the K7 architecture with Hammer just around the corner, those 17 million transistors may have been the simplest ticket to improving performance (as strange as that may sound).
With Barton launched, the focus once again shifts to Athlon 64 but we have a feeling it will be a very close battle throughout 2003 for AMD and Intel. The Athlon 64 may unequivocally tilt the balance in favor of AMD, but then there's Prescott to worry about. If the 512KB improvement numbers were any indication, moving to a full 1MB L2 cache in Prescott, combined with a larger L1 cache and improved Hyper-Threading could make for a powerful competitor. Between now and then, AMD will have to contend with not higher clock speeds, but much higher performing platforms as Intel readies their line of Springdale and Canterwood 800MHz FSB chipsets for launch in the second quarter.
Barton will keep AMD in the game, but Hammer is still quite necessary...