Overlay architectures implemented on FPGA devices have been proposed as a means to increase FPGA adoption in general-purpose computing. They provide software-like benefits such as flexibility and programmability, which makes it easier to build dedicated compilers. However, existing overlays are generic and both resource- and power-hungry, with performance usually an order of magnitude lower than bare-metal implementations. As a result, FPGA overlays have been confined to research and a few niche applications. In this paper, we introduce Application-Specific FPGA Overlays (AS-Overlays), which can bring bare-metal performance to FPGA overlays, thus opening the door to broader adoption.
In this paper, we propose an automated framework that takes a TensorFlow inference graph as input and generates high-performance accelerators on FPGA by assembling pre-implemented CNN components like puzzle pieces, based on the graph topology. Using pre-implemented components allows us to use only the minimum resources necessary, to predict performance, and to gain productivity. We adopt a unified representation based on systolic arrays to perform the computationally intensive operations of the model, and we provide a novel analysis of design trade-offs for FPGA CNN accelerators. Experimental results demonstrate the high performance, low latency, and flexibility of the proposed framework.
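To make the systolic-array idea concrete, the following is a minimal software sketch of an output-stationary systolic array computing a matrix product, the kind of operation that dominates CNN layers once convolutions are lowered to matrix multiplication. It is an illustration under our own assumptions, not the hardware design of the framework: the function name `systolic_matmul`, the output-stationary dataflow, and the skewed input schedule are all hypothetical choices made for this example.

```python
def systolic_matmul(a, b):
    """Simulate an output-stationary systolic array computing a @ b.

    Assumed dataflow (hypothetical, for illustration): processing element
    PE(i, j) holds the accumulator for c[i][j]. Rows of `a` enter from the
    left and columns of `b` enter from the top, each skewed by one cycle
    per row/column, so PE(i, j) sees a[i][k] and b[k][j] at cycle i + j + k.
    """
    m, k_dim, n = len(a), len(b), len(b[0])
    assert all(len(row) == k_dim for row in a), "inner dimensions must match"

    acc = [[0] * n for _ in range(m)]
    # The last operand pair reaches PE(m-1, n-1) at cycle m + n + k_dim - 3,
    # so the array needs m + n + k_dim - 2 cycles in total.
    cycles = m + n + k_dim - 2

    for t in range(cycles):
        for i in range(m):
            for j in range(n):
                k = t - i - j  # index of the operand pair arriving this cycle
                if 0 <= k < k_dim:
                    acc[i][j] += a[i][k] * b[k][j]
    return acc, cycles


if __name__ == "__main__":
    result, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(result, cycles)
```

In this sketch every PE performs at most one multiply-accumulate per cycle, and the cycle count grows with the sum of the matrix dimensions rather than their product, which is the property that makes the structure attractive for regular, compute-bound workloads.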
In this paper, we investigate the potential of an immersion technology system implemented on embedded devices. Our system consists of distributed smart cameras with overlapping views, covering numerous viewpoints of a monitored scene so that each smart camera knows the position of its neighbors. The system provides an on-demand panoramic field of view (FOV) in real time. To generate the panoramic view, we design and implement an image stitching system using images captured from a subset of adjacent embedded cameras. We verify the effectiveness of our method in terms of quality of result (QoR) and computational efficiency. Initial results show up to 12 FPS with 8 MP cameras.