Publications

Sextans: A Streaming Accelerator for General-Purpose Sparse-Matrix Dense-Matrix Multiplication

Sparse-Matrix Dense-Matrix multiplication (SpMM) is the key operator for a wide range of applications including scientific computing, …

RapidStream: Parallel Physical Implementation of FPGA HLS Designs

FPGAs require a much longer compilation cycle than conventional computing platforms like CPUs. In this paper, we shorten the overall …

Accelerating SSSP for Power-Law Graphs

The single-source shortest path (SSSP) problem is one of the most important and well-studied graph problems widely used in many …

Extending High-Level Synthesis for Task-Parallel Programs

C/C++/OpenCL-based high-level synthesis (HLS) is becoming increasingly popular for field-programmable gate array (FPGA) accelerators in …

HBM Connect: High-Performance HLS Interconnect for FPGA HBM

With the recent release of High Bandwidth Memory (HBM) based FPGA boards, developers can now exploit unprecedented external memory …

AutoBridge: Coupling Coarse-Grained Floorplanning and Pipelining for High-Frequency HLS Design on Multi-Die FPGAs

Despite an increasing adoption of high-level synthesis (HLS) for its design productivity advantages, there remains a significant gap in …

When HLS Meets FPGA HBM: Benchmarking and Bandwidth Optimization

With the recent release of High Bandwidth Memory (HBM) based FPGA boards, developers can now exploit unprecedented external memory …

Exploiting Computation Reuse for Stencil Accelerators

The stencil kernel is an important type of kernel used extensively in many application domains. Over the years, researchers have been …

Analysis and Optimization of the Implicit Broadcasts in FPGA HLS to Improve Maximum Frequency

Designs generated by high-level synthesis (HLS) tools typically achieve a lower frequency compared to manual RTL designs. In this work, …

HeteroHalide: From Image Processing DSL to Efficient FPGA Acceleration

The domain-specific language (DSL) for image processing, Halide, has generated a lot of interest because of its capability of …

FLASH: Fast, ParalleL, and Accurate Simulator for HLS

A large semantic gap between a high-level synthesis (HLS) design and a low-level RTL simulation environment often creates a barrier for …

Rapid Cycle-Accurate Simulator for High-Level Synthesis

A large semantic gap between the high-level synthesis (HLS) design and the low-level (on-board or RTL) simulation environment often …

HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Reconfigurable Computing

In pursuit of improved compute performance under strict power constraints, there is an increasing need for deploying …

SODA: Stencil with Optimized Dataflow Architecture

Stencil computation is one of the most important kernels in many application domains such as image processing, solving partial …

GraphH: A Processing-in-Memory Architecture for Large-scale Graph Processing

Large-scale graph processing requires high-bandwidth data access. However, as graph computing continues to scale, it becomes …

An Optimal Microarchitecture for Stencil Computation with Data Reuse and Fine-Grained Parallelism

Stencil computation is one of the most important kernels for many applications such as image processing, solving partial differential …

ForeGraph: Exploring Large-scale Graph Processing on Multi-FPGA Architecture

The performance of large-scale graph processing suffers from challenges including poor locality, lack of scalability, random access …

FPGP: Graph Processing Framework on FPGA – A Case Study of Breadth-First Search

Large-scale graph processing is gaining increasing attention in many domains. Meanwhile, FPGA provides a power-efficient and highly …

NXgraph: An Efficient Graph Processing System on a Single Machine

Recent studies show that graph processing systems on a single machine can achieve competitive performance compared with cluster-based …

Test–Retest Reliability of Graph Metrics in High-resolution Functional Connectomics: A Resting-State Functional MRI Study

Background: The combination of resting-state functional MRI (R-fMRI) technique and graph theoretical approaches has emerged as a …