Conclusion & End Remarks

We’ve been hearing about Arm in the server space for many years now, with many people claiming that “it’s coming” and “it’ll be great”, only for the hype to fizzle out into relative disappointment once the performance of the chips was put under the microscope. Thankfully, this is not the case for the Graviton2: not only were Amazon and Arm able to deliver on all of their promises, but they’ve also hit it out of the park in terms of value against the incumbent x86 players.

The Graviton2 is the quintessential reference Neoverse N1 platform as envisioned by Arm, aiming for nothing less than disruption of the datacentre market and making Arm servers a competitive reality. The chip is not only able to compete in terms of raw throughput thanks to its 64 physical cores in a single socket, but it also manages to showcase competitive single-thread performance, keeping pace with the AMD and Intel systems on the market.

The Amazon chip isn’t perfect: we definitely would have wanted to see more L3 cache integrated into the mesh interconnect, as the 32MB does seem quite mediocre for handling 64 cores, and the chip does suffer from this in terms of its performance scaling in memory-heavy workloads. Only Amazon knows whether this is a real-world bottleneck for the chip in the kinds of workloads that are typical in the cloud.

Performance-wise, there’s a big empty outline of an elephant in the room that’s been missing from our data today, and that’s AMD’s new EPYC2 Rome processors. AMD has shown that it was able to vastly scale performance and do away with a lot of the limitations of the first-generation EPYC processors that we saw today. Even if we can somewhat estimate how Rome would perform against the Graviton2, we don’t have any idea what kind of pricing Amazon will launch the new c5a type instances at.

In terms of value, the Graviton2 seemingly ends up with top grades and puts the competition to shame. This is due not only to the Graviton2’s performance and efficiency, but also to the fact that Amazon is now vertically integrated for its EC2 hardware platforms. If you’re an EC2 customer today and aren’t tied to x86 for whatever reason, you’d be stupid not to switch over to Graviton2 instances once they become available, as the cost savings will be significant.

What does this mean for non-Amazon users? Well, the Arm server has become a reality, and companies such as Ampere, with their new Altra server chips, are trying to quickly follow up with the same recipe as the Graviton2 and offer similar ready-made meals for the non-Amazons of the world. These chips, however, will have to compete with AMD’s Rome, and later in the year the new Milan, which won’t be easy. Meanwhile, Intel doesn’t seem to be a likely competitor in the short term while they’re attempting to resolve their issues.

Long-term, things are looking bright for the Arm ecosystem. Arm themselves are aiming to maintain a 20-25% compound annual growth rate for performance, and Ampere has already stated they’re looking at yearly hardware refreshes. We don’t know Amazon’s plans, but I imagine they’ll be similar, though possibly skipping some generations. Around the 2022 timeframe we should see Matterhorn-based products, Arm’s new Very Large™ CPU microarchitecture, which should again accelerate things dramatically. In a similar sense, the newly founded Nuvia has lofty goals for their entrance into the datacentre market, and they do have the design talent with a track record to possibly deliver in a few years’ time.
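
To put the 20-25% compound annual growth rate into perspective, here is a minimal sketch of how such a rate compounds over a few product generations. The function name and the one-to-three-year horizons are illustrative assumptions of ours, not figures from Arm's roadmap.

```python
# Sketch: cumulative performance multiplier from a fixed annual growth rate.
# The horizons below (1 to 3 years) are assumed purely for illustration.

def compounded_gain(annual_rate: float, years: int) -> float:
    """Return the cumulative performance multiplier after `years` of growth."""
    return (1.0 + annual_rate) ** years


if __name__ == "__main__":
    for rate in (0.20, 0.25):  # the 20-25% range quoted in the text
        for years in (1, 2, 3):
            gain = compounded_gain(rate, years)
            print(f"{rate:.0%} per year over {years} year(s): {gain:.2f}x")
```

At the low end of the range, two years of compounding works out to 1.2 × 1.2 = 1.44x the starting performance; at the high end, it is 1.25 × 1.25 ≈ 1.56x.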

The Graviton2 is a great product, and we’re looking forward to seeing more such successful designs from the Arm ecosystem.

96 Comments

  • SarahKerrigan - Tuesday, March 10, 2020 - link

    That single-thread performance is extremely impressive. The multithreaded scaling is ugly, though. Back when N1 was announced, ARM seemed to think 1MB/core was a good spot for Neoverse LLC - I wonder why both Graviton and Altra are going for considerably less.
  • shing3232 - Tuesday, March 10, 2020 - link

    It's gonna be costly (die- and power-wise) to build an interconnect for 64C with good performance. By that time, it would lose its power/perf edge, I suppose.
  • Tabalan - Tuesday, March 10, 2020 - link

    Scaling might not be optimal, but performance losses are to be expected if you greatly reduce available cache. In the end, MT performance is still far ahead of the competition.
  • ballsystemlord - Thursday, March 12, 2020 - link

    You have to remember that the competition is not 64 cores, but 64 vCPUs. The difference is 60% or more. The Arm Graviton2 is being placed in the best possible light by this comparison.
  • ballsystemlord - Thursday, March 12, 2020 - link

    I mean 60% for the cores that are actually 1 thread. As in, the performance boost by turning on SMT is 40% best case scenario.
  • autarchprinceps - Sunday, October 25, 2020 - link

    I have to disagree. You seem to forget that the arm chip is cheaper. It’s an additional win if it manages to integrate more cores and yet still achieve a comparable single threaded performance. It’s not unfair to compare two products with one seeming to have a stat advantage from the start, if it’s still cheaper or costs the same. Why should a customer care?
  • zamroni - Thursday, March 12, 2020 - link

    L caches use SRAM, which needs 6 transistors per bit.
    So every 1MB needs at least 48 million transistors, without counting the transistors for the controller.
  • RallJ - Tuesday, March 10, 2020 - link

    The comparisons made are of the whole-core performance of Graviton against just the thread performance of Xeon/EPYC. It's very problematic.

    Also, the TDP rating for the Graviton is off by 50% based on what was reported at re:Invent.
  • Andrei Frumusanu - Tuesday, March 10, 2020 - link

    I go over the core/SMT topic in the article. It's only a problem from a hardware comparison aspect, but it's very much the correct comparison from a cloud product offering perspective. The value proposition also does not change depending on core count; the instances are priced at similar tiers.
