Performance & Efficiency Cores For Servers

HotChips 2023 was held August 27-29, 2023, at Stanford University in California and was the first in-person edition of the conference in four years. It ran in a hybrid format, with over 500 participants on site and over 1,000 attending online. Topics covered a broad range of advancements in computing, connectivity, and computer architecture.

Both AMD and Intel gave presentations on their new server-class processors. I found these talks interesting for several reasons, but primarily because each company decided to produce both a performance version and an energy-efficient version of its cores for the server market.

We can go back to ARM’s big.LITTLE concept from over a decade ago and MediaTek’s Tri-Gear designs, which combined cores with different performance vs. efficiency profiles to achieve better performance and power efficiency. A key difference here is that those cores were targeted at the mobile application processor market and were implemented together on the same piece of silicon. The new cores announced by AMD and Intel are targeted at servers and are currently not implemented side by side on the same chip.

Another similarity between the two companies’ solutions is the use of chiplets for the core complex dies that plug into a platform framework using separate I/O dies to connect the system together.

Chris Gianos, Intel Fellow, presented the slide below in figure 1:

Fig. 1: Intel modular SoC architecture.

This slide diagrams how compute chiplets can be added to the system to create different products. The P-Core, used in Granite Rapids, is “drop-in” compatible with the E-Core used in Sierra Forest. The two cores are different designs, though, targeted specifically at performance and efficiency respectively. The E-Core doesn’t include the AVX-512 or AMX instructions found in the P-Core, so customers need to decide in advance how much performance per core they need and what types of applications they want to run.

Kai Troester, AMD Fellow, presented the Zen 4 core along with Ravi Bhargava, AMD Sr. Fellow. Compared to previous-generation AMD cores, Zen 4 has larger and improved conditional branch predictors, with fetch bandwidth improved to up to 2 taken branch predictions per cycle. It also incorporates a larger op cache for better redirect latency and power. Zen 4 includes AVX-512 support and, with V-Cache, up to 96MB of L3 per 8 cores.

Zen 4c is a compact version of Zen 4, optimized for density and power efficiency. It has the same IPC and features as Zen 4, with a lower maximum frequency, smaller area, and increased power efficiency. Zen 4c can fit 33% more cores into the same power envelope at the same IPC as Zen 4. Kai said it’s “the same core (as Zen 4), but smaller because it has a lower frequency target so it’s more power efficient.” The diagram from his presentation, shown below in figure 2, illustrates the frequency given up in exchange for the gains in area and power efficiency.

Fig. 2: Zen 4 vs. Zen 4c IPC, frequency, area, and power efficiency tradeoff.

AMD has already used chiplets as part of its system architecture and continues that practice, using Zen 4 performance cores for its Genoa parts with up to 96 cores and Zen 4c efficiency cores for its Bergamo parts with up to 128 cores. The diagram below in figure 3 shows how the two cores can be used in the same system framework to produce different products better targeted at specific markets.

Fig. 3: Zen 4 vs. Zen 4c configuration.
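
As a quick sanity check on the core counts quoted above, the short Python sketch below computes the relative increase in cores going from a 96-core Genoa part to a 128-core Bergamo part. The function and constant names are hypothetical and used only for illustration; the two core counts are the maximums given in the text.

```python
def core_count_increase(base_cores: int, dense_cores: int) -> float:
    """Return the fractional increase in core count from base to dense."""
    return dense_cores / base_cores - 1.0


GENOA_MAX_CORES = 96     # Zen 4 part, maximum quoted in the text
BERGAMO_MAX_CORES = 128  # Zen 4c part, maximum quoted in the text

increase = core_count_increase(GENOA_MAX_CORES, BERGAMO_MAX_CORES)
print(f"Bergamo offers {increase:.0%} more cores than Genoa")
```

The result is one third more cores, which lines up with the 33% figure AMD gave for Zen 4c.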

Ravi mentioned that the common I/O Die continues to build upon AMD’s Infinity Fabric to scale across multiple core complex dies to create a broad family of server parts.

Dan Soltis, Sr. Principal Engineer at Intel, said that the E-Core used in Sierra Forest incorporates dual 3-wide out-of-order decoders, which are more power efficient and have better latency than a single 6-wide decoder. The E-Core is also single-threaded, another difference from the P-Core. For a chiplet with 4 cores plus L2, all of these components share the same clock and VDD, and the L2 is the interface to the mesh, so all 4 cores undergo dynamic voltage and frequency scaling (DVFS) together. Dan also presented the slide shown below in figure 4, which represents the tradeoff in performance and efficiency between the new P-Core and E-Core and the older Sapphire Rapids core. The chart also shows the ~2.5x scaling improvement going from Sapphire Rapids to Sierra Forest.

Fig. 4: Intel’s P-Core and E-Core deployment view.

In figure 4, designs that are up and to the right are superior to others. This shows the advantage that the E-Core (Sierra Forest) systems have in terms of efficiency and the relative performance advantage of the P-Core (Granite Rapids).
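
To make the earlier point about the 4-core E-Core group sharing one clock and VDD more concrete, here is a minimal Python sketch of what scaling together implies. It is not Intel's actual power management algorithm: the function name, the example frequencies, and the policy of running at the highest requested frequency are all assumptions made purely for illustration.

```python
def shared_domain_frequency(requests_ghz: list[float]) -> float:
    """Pick one frequency for all cores in a shared clock/voltage domain.

    Assumed policy (not from the talk): run at the highest request so the
    most demanding core is satisfied.
    """
    if not requests_ghz:
        raise ValueError("the group must contain at least one core")
    return max(requests_ghz)


# Four cores in one group, each asking for a different frequency in GHz.
# The numbers are made up for illustration.
requests = [1.2, 2.4, 0.8, 1.6]
group_freq = shared_domain_frequency(requests)

# Every core gets the same operating point, whatever it asked for.
applied = [group_freq for _ in requests]
print(applied)
```

The only point of the sketch is that a lightly loaded core cannot drop to a lower operating point on its own while it shares a domain with busier cores.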

The use of chiplets by major server CPU providers has made it easier to tailor parts for different markets, and differentiated cores expand on that capability to better address different market needs. It should be exciting to watch these products compete in the marketplace and to see how the underlying chiplet framework enables more innovation and tailoring to address new markets in the future.

This article originally appeared on Semiconductor Engineering
