AMD has filed a patent application that describes spreading the rendering load across multiple GPU chiplets. A game scene is divided into individual blocks and distributed to the chiplets to make better use of the shaders; a two-level binning scheme is used for this.
AMD patent application distributes rendering across GPU chiplets to make better use of shaders
The new patent application gives more insight into what AMD plans to do with next-generation GPU and CPU technology in the years to come. At the end of June, fifty-four AMD patent applications were published. It's unclear which of the more than fifty published applications will actually find their way into AMD's products, but they do outline the approaches the company is pursuing over the coming years.
One application, spotted by community member @ETI1120 on ComputerBase, is patent application US20220207827. It describes processing image data in two stages so that the rendering load can be distributed efficiently across multiple GPU chiplets. AMD originally filed it with the US Patent Office late last year.
When image data is rasterized on a GPU in the conventional way, the shader units (the ALUs) all perform the same task: they assign a color value to each individual pixel. The textured polygons that fall on a given pixel of the game scene determine that pixel's color. The task itself stays identical in principle and differs only in the texture data found at each pixel. This approach is called SIMD, or Single Instruction, Multiple Data.
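To make the SIMD principle more tangible, here is a minimal CPU-side sketch in C++. The PixelInput struct and the shade routine are invented for illustration and are not taken from the patent; the point is that a single shading function is applied unchanged to every pixel, and only the per-pixel data differs.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical per-pixel input: the texture sample and light intensity that
// happen to fall on this pixel. Only this data varies from pixel to pixel.
struct PixelInput {
    std::array<std::uint8_t, 3> texel;  // sampled texture color (RGB)
    float                       light;  // precomputed light intensity, 0..1
};

// The "single instruction": one shading routine, applied unchanged to every pixel.
std::array<std::uint8_t, 3> shade(const PixelInput& in) {
    std::array<std::uint8_t, 3> out{};
    for (int c = 0; c < 3; ++c)
        out[c] = static_cast<std::uint8_t>(in.texel[c] * in.light);
    return out;
}

int main() {
    // The "multiple data": a whole frame of per-pixel inputs.
    std::vector<PixelInput> frame(1920 * 1080, PixelInput{{200, 120, 80}, 0.5f});

    std::vector<std::array<std::uint8_t, 3>> colors(frame.size());
    for (std::size_t i = 0; i < frame.size(); ++i)  // a GPU runs these iterations in lockstep
        colors[i] = shade(frame[i]);

    std::printf("first pixel: %u %u %u\n",
                static_cast<unsigned>(colors[0][0]),
                static_cast<unsigned>(colors[0][1]),
                static_cast<unsigned>(colors[0][2]));
}
```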
For most of today’s games, shading isn’t the only task a GPU handles. Several post-processing passes follow the initial shading, for example anti-aliasing, shadowing, and occlusion of the game environment. Ray tracing, meanwhile, runs alongside shading and adds yet another kind of computation.
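As a rough illustration of such a frame pipeline (the Frame and Pass types and the pass names below are generic placeholders, not AMD terminology), the post-processing steps can be thought of as a chain that runs after the initial shading:

```cpp
#include <cstdio>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a frame buffer that each pass reads and writes.
struct Frame { int width, height; };

using Pass = std::function<void(Frame&)>;

int main() {
    Frame frame{1920, 1080};

    // The initial shading pass is followed by dependent passes that each
    // work on the previous result.
    std::vector<std::pair<std::string, Pass>> pipeline = {
        {"pixel shading",     [](Frame&) { /* color every pixel */ }},
        {"anti-aliasing",     [](Frame&) { /* smooth edges */ }},
        {"shadowing",         [](Frame&) { /* apply shadow maps */ }},
        {"ambient occlusion", [](Frame&) { /* darken occluded areas */ }},
    };

    for (auto& [name, pass] : pipeline) {
        pass(frame);
        std::printf("finished pass: %s\n", name.c_str());
    }
    // Ray tracing, by contrast, runs alongside shading rather than after it.
}
```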
In today’s games, this compute load scales almost ideally across several thousand compute units on the GPU. That sets GPUs apart from CPUs, where applications have to be written specifically to take advantage of additional cores. The GPU’s scheduler handles this division of labor, splitting the work into smaller tasks for the compute units, a process known as binning. The image to be rendered is divided into separate blocks, each containing a fixed number of pixels. Each block is computed by a subunit of the graphics processor, and the results are then synchronized and assembled into the finished frame. Pixels awaiting calculation keep being assigned to blocks until the graphics card is fully utilized; shader computing power, memory bandwidth, and cache sizes are all taken into account.
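Here is a simplified sketch of the binning idea described above, with an invented tile size and a plain round-robin assignment to subunits; a real GPU scheduler also factors in the shader throughput, memory bandwidth, and cache sizes mentioned above.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// A bin: a rectangle of pixels plus the compute subunit it is handed to.
struct Bin {
    int x0, y0, x1, y1;  // pixel rectangle covered by this bin
    int unit;            // index of the compute subunit assigned to it
};

// Cut the frame into fixed-size tiles and hand them out round-robin so that
// every subunit stays busy.
std::vector<Bin> binFrame(int width, int height, int tile, int numUnits) {
    std::vector<Bin> bins;
    int next = 0;
    for (int y = 0; y < height; y += tile)
        for (int x = 0; x < width; x += tile)
            bins.push_back({x, y,
                            std::min(x + tile, width), std::min(y + tile, height),
                            next++ % numUnits});
    return bins;
}

int main() {
    for (const Bin& b : binFrame(1920, 1080, 512, 4))
        std::printf("bin (%d,%d)-(%d,%d) -> subunit %d\n",
                    b.x0, b.y0, b.x1, b.y1, b.unit);
}
```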
AMD explains in the patent that this splitting and re-joining requires a deep, complete data connection between all parts of the GPU, which is where chiplets become a problem: data links that are not located on the same die have high latency, which slows the process down.
CPUs made the transition to chiplets relatively painlessly because their work can already be spread across multiple cores, which maps naturally onto chiplets. GPUs do not offer the same flexibility, which leaves their scheduling in a position comparable to that of an early dual-core processor.
AMD recognizes this and attempts to address it by modifying the rasterization pipeline so that tasks can be dispatched across multiple GPU chiplets, much as with CPUs. This requires a more advanced binning technique, which the company calls “two-level binning”, also known as “hybrid binning”.
In hybrid binning, the division happens in two separate steps instead of going straight to pixel-by-pixel blocks. The first step is the geometry calculation that turns the 3D environment into a two-dimensional image. This step, known as vertex shading, takes place before rasterization, runs on a first GPU chiplet, and generates comparatively little load. Once it is complete, the game scene is divided into coarse bins, each of which is assigned to a single GPU chiplet. Only then do the routine tasks such as rasterization and post-processing begin.
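The two-level split can be sketched roughly as follows; the chiplet count, bin sizes, and round-robin assignment are assumptions made purely for illustration, the patent does not prescribe them.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <vector>

struct Tile { int x0, y0, x1, y1; };

// Split a rectangle into tiles with the given edge length.
std::vector<Tile> split(const Tile& area, int tile) {
    std::vector<Tile> out;
    for (int y = area.y0; y < area.y1; y += tile)
        for (int x = area.x0; x < area.x1; x += tile)
            out.push_back({x, y,
                           std::min(x + tile, area.x1), std::min(y + tile, area.y1)});
    return out;
}

int main() {
    const int width = 3840, height = 2160;
    const int numChiplets = 4;                   // invented count, for illustration
    const int coarseTile = 1080, fineTile = 64;  // invented bin sizes

    // Level 1: after vertex shading, the 2D scene is cut into coarse bins and
    // each coarse bin is handed to one GPU chiplet.
    std::vector<Tile> coarse = split({0, 0, width, height}, coarseTile);
    for (std::size_t i = 0; i < coarse.size(); ++i) {
        int chiplet = static_cast<int>(i % numChiplets);

        // Level 2: the owning chiplet subdivides its coarse bin into fine bins,
        // which its own compute units rasterize, shade, and post-process locally.
        std::vector<Tile> fine = split(coarse[i], fineTile);
        std::printf("coarse bin %zu -> chiplet %d (%zu fine bins)\n",
                    i, chiplet, fine.size());
    }
}
```

The point of the coarse level is that each chiplet can then work on its own bin largely independently, which reduces the off-die traffic the patent identifies as the bottleneck.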
It’s unclear when AMD intends to put this technique into products, or whether the patent will even be granted. It does, however, give us a glimpse of how GPU processing could become more efficient in the future.
Sources: ComputerBase, FreePatentsOnline