Omegaflow: a high-performance dependency-based architecture

Abstract

This paper investigates how to better track and deliver dependencies in dependency-based cores so as to exploit as much instruction-level parallelism (ILP) as possible. To this end, we first propose an analytical performance model for the state-of-the-art dependency-based core, Forwardflow, and identify two vital factors that limit its performance upper bound. We then propose Omegaflow, a dependency-based architecture that adopts three new techniques to address these factors. Experimental results show that Omegaflow improves IPC by 24.6% over the state-of-the-art design and reaches 94.4% of the performance of an OoO architecture with an ideal scheduler, without increasing the clock cycle, while consuming only 8.82% more energy than Forwardflow.

Publication
In Conference on Supercomputing

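Omegaflow is a research project from my PhD. It is an instruction scheduling and execution architecture based on the idea of forward dataflow. Unlike the traditional Tomasulo algorithm, a forward dataflow architecture has each producer instruction record the locations of its consumers and actively deliver data to them once its computation completes. As a result, after an instruction completes, only a bounded number of tokens carrying wakeup information need to be passed to wake up all consumer instructions that depend on it. In the traditional Tomasulo algorithm, by contrast, implicit renaming requires broadcasting the register tag and value to the entire issue queue, while explicit renaming requires broadcasting the register tag to the entire issue queue. Because instruction dependencies exhibit locality, a forward dataflow architecture can partition the scheduling and execution engine into multiple groups, with significantly less communication between groups than within them, which yields a scalable instruction window. Omegaflow proposes a tool for analyzing the performance upper bound of forward dataflow architectures, and it improves the speed of token processing and transmission. (That said, as reported in the paper, forward dataflow architectures still cannot outperform the traditional Tomasulo algorithm.) The Omegaflow code is available at: Omegaflow project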
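To make the contrast between token passing and broadcast concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not Omegaflow's or Forwardflow's actual hardware structures: the `Inst` class, the `consumers` slot list, and both functions are hypothetical names introduced only for illustration. Each producer stores the window slots of its consumers and sends one point-to-point token per consumer, whereas a broadcast-style wakeup is modeled as touching every entry in the window.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Inst:
    """One in-flight instruction in a toy instruction window (hypothetical model)."""
    name: str
    pending: int                                   # source operands still missing
    consumers: list[int] = field(default_factory=list)  # window slots of dependents


def forward_wakeup(window: list[Inst], producer_slot: int) -> tuple[list[str], int]:
    """Send one token to each recorded consumer of the completed producer.

    Returns the names of instructions that became ready and the token count.
    """
    woken: list[str] = []
    tokens = 0
    for slot in window[producer_slot].consumers:
        tokens += 1                    # one point-to-point token per consumer
        window[slot].pending -= 1      # the token supplies one missing operand
        if window[slot].pending == 0:
            woken.append(window[slot].name)
    return woken, tokens


def broadcast_wakeup_cost(window: list[Inst]) -> int:
    """Assumed cost of a tag broadcast: every window entry is compared."""
    return len(window)


# i0 produces a value read by i1 and i3; i2 waits on some other producer.
window = [
    Inst("i0", pending=0, consumers=[1, 3]),
    Inst("i1", pending=1),
    Inst("i2", pending=1),
    Inst("i3", pending=2),
]
woken, tokens = forward_wakeup(window, 0)
print(woken, tokens, broadcast_wakeup_cost(window))
```

In this toy window the producer sends two tokens, one per consumer, while the broadcast model touches all four entries. The gap grows with the window size, which is the scalability argument made above; the sketch ignores token latency, which is the cost Omegaflow targets.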

Zhou, Yaoyang
Architect of LLM DSA; Maintainer of the u-arch simulator for Xiangshan; PhD in Computer Architecture

I specialize in LLM DSA and CPU micro-architecture.