FPGA for Aggregate Processing: The Good, The Bad, and The Ugly
M. Eryilmaz,
Aarati Kakaraparthy,
Jignesh M. Patel,
Rathijit Sen,
and Kwanghyun Park
In International Conference on Data Engineering (ICDE)
2021
In this paper, we focus on current CPU-FPGA
architectures and study their usability for database management
systems. To focus our scope, we choose aggregation as the
query processing primitive for this investigation. We implement
a fully pipelined stall-free module that performs aggregation
on the FPGA, and also describe a performance model that
predicts the runtime of this module with 99% accuracy. We
study the performance of this module on two different CPU-FPGA
architectures, namely remote-main-memory and bump-in-the-wire.
Compared to an implementation of aggregation on CPU,
we find that the former is 1.7× slower whereas the latter is 2.2×
faster. This significant performance gap suggests two important
architectural considerations when designing CPU-FPGA systems,
namely the bandwidth ceiling and the resource ceiling, while also
highlighting issues of switching times and programmer efficiency.
We consider broader hardware trends to study the suitability
of the two FPGA architectures for accelerating the aggregation
operation, and find that the performance gap is likely to persist in
the foreseeable future. Based on these observations, we discuss some
challenges and opportunities for CPU-FPGA architectures.