Transparent Offloading of Cloud FPGA Accelerators using OpenMP

Speaker: Guido Araujo, University of Campinas.

Abstract: The sheer amount of computing resources required to run modern cloud workloads has put a lot of pressure on the design of power-efficient cluster nodes. To address this problem, Intel (HARP) and Xilinx (AWS F1) have proposed CPU-FPGA integrated architectures that can deliver efficient power-performance execution. The integration of FPGA acceleration modules into software is a challenging endeavour that requires a seamless programming model. In this presentation we describe HardCloud (www.hardcloud.org), an extension of the OpenMP standard that aims at easing the task of offloading computation to cloud FPGA accelerators.
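As an illustration of the directive-based offloading style that HardCloud extends, the sketch below uses standard OpenMP 4.5 target constructs in C. The abstract does not spell out HardCloud's FPGA-specific clauses, so the mapping shown here is a generic assumption rather than the actual HardCloud syntax.

    #include <stdio.h>

    #define N 1024

    /* Minimal sketch of directive-based offloading with standard OpenMP 4.5
     * "target" constructs. HardCloud's FPGA-specific clauses are not given in
     * the abstract, so this is plain OpenMP, not HardCloud syntax. */
    int main(void) {
        float a[N], b[N], c[N];
        for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

        /* Offload the loop to an accelerator device; with HardCloud the
         * target device would be a cloud FPGA rather than a GPU. */
        #pragma omp target parallel for map(to: a, b) map(from: c)
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[42] = %f\n", c[42]);
        return 0;
    }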

Short bio

Guido Araujo received his PhD in Electrical Engineering from Princeton University in 1997. He is a full professor of Computer Science and Engineering at UNICAMP. His current research interests include code optimization, parallelizing compilers, FPGA-based acceleration and cloud computing, which are explored in close cooperation with industry partners.

Scalable FPGA-based Accelerators in the Cloud

Speaker: Christoforos Kachris, InAccel.

Abstract: In this talk we will show how machine learning applications (e.g. logistic regression and K-means), based on Spark or other frameworks written in Python, Scala and Java, can be accelerated by up to 10x using hardware accelerators deployed in the cloud, without any changes to the Spark code. The talk will describe how to seamlessly utilize FPGAs in the cloud in order to speed up your applications and at the same time reduce OpEx, without the need to change your code.

Short bio

Dr. Christoforos Kachris is the CEO and co-founder of InAccel and a research associate at the National Technical University of Athens. He has more than 15 years of experience in the domains of reconfigurable computing, hardware acceleration and computer architecture. He is also the technical project manager of the H2020 VINEYARD project, which focuses on the efficient utilization of hardware accelerators in the cloud.

FPGAs and the Cloud – An Endless Tale of Virtualization, Elasticity and Efficiency

Speaker: Oliver Knodel, Helmholtz-Zentrum Dresden-Rossendorf (HZDR).

Abstract: The flexible use of reconfigurable devices within a cloud context requires abstraction from the actual hardware through virtualization in order to offer these resources to service providers. In this talk, we present our Reconfigurable Common Computing Frame (RC2F) approach – inspired by system virtual machines – for the profound virtualization of reconfigurable hardware in cloud services. Using partial reconfiguration, our framework abstracts a single physical FPGA into multiple independent virtual FPGAs (vFPGAs). A user can request vFPGAs of different sizes for optimal resource utilization and energy efficiency of the whole cloud system. To enable such flexibility, we create homogeneous partitions on top of an inhomogeneous FPGA fabric, abstracting from physical locations and static areas. On the host side, our Reconfigurable Common Cloud Computing Environment (RC3E) offers different service models and manages the allocation of the dynamic vFPGAs.

Short bio

Since 2018, Oliver Knodel has worked as a data scientist in the Computational Science Group of the Helmholtz-Zentrum Dresden-Rossendorf (HZDR). One of his tasks is the integration of FPGAs into the dataflow starting at particle accelerators and ending in the HZDR data centre. In 2018 he finished his doctoral thesis (Dr.-Ing.), entitled "Reconfigurable Hardware in the Context of Cloud-Architectures", at the Department of Computer Science of Technische Universität Dresden (TUD). From 2011 to 2018 he was a research assistant at the Chair for VLSI Design, Diagnostics and Architecture at TUD. His research interests are focused on highly parallel architectures, high-performance computing, cloud computing and, especially, application areas for reconfigurable hardware in data centres and clouds.

The UNILOGIC scalable heterogeneous HPC architecture

Speaker: Iakovos Mavroidis, Technical University of Crete (TUC).

Abstract: In order to reach exascale performance, current HPC servers need to be improved. Simple scaling is not a feasible solution due to increasing utility costs and power consumption limitations. The ECOSCALE H2020 project tackles this challenge by introducing UNILOGIC, a novel heterogeneous energy-efficient architecture. The proposed architecture follows a hierarchical approach in which the system is partitioned into multiple autonomous Workers, i.e. compute nodes. Workers are interconnected in a tree-like structure in order to form larger Partitioned Global Address Space (PGAS) partitions, which are further hierarchically interconnected via an MPI protocol. The UNILOGIC architecture supports shared partitioned reconfigurable resources that can be accessed by any Worker in a PGAS partition and, more importantly, automated hardware synthesis of these resources from an OpenCL-based programming model. Finally, the architecture partitions the hardware resources into reconfigurable slots and employs partial reconfiguration to dynamically configure them at runtime.
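To give a flavour of the OpenCL-based programming model from which such hardware is synthesized, the sketch below shows a minimal OpenCL C kernel of the kind a flow like this could map onto a reconfigurable slot. The kernel is a generic vector addition; none of the names are taken from the ECOSCALE toolchain.

    /* Minimal OpenCL C kernel sketch of the kind that an OpenCL-based HLS
     * flow could synthesize into a reconfigurable slot. This is generic
     * vector addition, not ECOSCALE-specific code. */
    __kernel void vadd(__global const float *a,
                       __global const float *b,
                       __global float *c,
                       const unsigned int n)
    {
        size_t i = get_global_id(0);   /* one work-item per output element */
        if (i < n)
            c[i] = a[i] + b[i];
    }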

Short bio

Dr. Iakovos Mavroidis is a member of the Telecommunication Systems Institute in Greece. He received his M.Sc. degree in Electrical Engineering and Computer Science from the University of California at Berkeley in 2001. From 2001 to 2002 he was with MIPS Technologies. In 2011, he received his Ph.D. degree from the Department of Electronic and Computer Engineering at the Technical University of Crete in Greece. From 2011 to 2016 he was a Visiting Professor in Computer Science at the University of Crete, Greece. He was the coordinator of the RAPID H2020 project and he is currently coordinating the ECOSCALE H2020 project.

Unconventional Computing with Reconfigurable Devices in the Cloud

Speaker: Michaela Blott, Xilinx.

Abstract: Conventional von Neumann architectures are suffering from rising power densities, and performance scaling is slowing down with next-generation technology nodes. At the same time, we are faced with an explosion of data and sky-high compute requirements associated with the roll-out of machine learning algorithms. Reconfigurable logic can tailor the hardware to the application through customized datapaths and memory architectures, thereby achieving much higher energy efficiency than conventional CPU- and GPU-based solutions. This has stimulated interest in its exploitation within power-hungry data centers, with recent benchmarks showing that FPGA-based application acceleration can bring orders-of-magnitude improvements in performance and performance per Watt compared to its conventional counterparts. During this talk, we broadly characterize a range of applications and explore how these unconventional, customized compute architectures can unleash new levels of performance scalability and compute efficiency.

Short bio

Michaela Blott is a Principal Engineer at Xilinx Research, where she heads a team of international scientists driving research into new application domains for FPGAs, such as machine learning and hyperscale deployments, and previously high-speed networking. She graduated from the University of Kaiserslautern in Germany and brings over 25 years of experience in computer architecture, FPGA and board design, having worked in both research institutions (ETH and Bell Labs) and development organizations. She is strongly involved with the international research community as technical co-chair of FPL’2018, as an industry advisor on numerous projects, and as a member of numerous technical program committees (DATE, FPGA, FPL, GLOBALSIP, HiPEAC).

Architectures and Synthesis for Streaming Accelerators on FPGA

Speaker: John McAllister, Queen’s University Belfast.

Abstract: FPGAs are currently making major inroads into mainstream computing applications, but the technology's roots are very firmly in the field of custom logic design. This combination makes them a fascinating, and extremely taxing, focus for design conventions and tools from the two differing domains. This talk proposes that, if both sides of the design coin (software programmability and custom logic) are treated as complementary, important benefits follow in the productivity of the accelerator design process and in the performance of the results. By framing high-level synthesis of streaming operators as a custom multicore processor design problem, accelerators with prescribed performance and cost requirements can be automatically derived, at the push of a button, from an input program. This further supports more abstract programming models to ease system specification. The talk will investigate a number of design examples and present emerging FPGA accelerator design challenges.

Short bio

John McAllister is a Senior Lecturer and Fellow of the Institute of Electronics, Communications and Information Technology (ECIT) at Queen’s University Belfast and an Associated Researcher at INSA Rennes. His research addresses architectures and synthesis for Field Programmable Gate Array (FPGA) accelerators. He was founding Chief Technology Officer of Analytics Engines Ltd. and founding Chief Editor of the IEEE Signal Processing Society Resource Center (rc.signalprocessingsociety.org). He is an Area Editor of IEEE Trans. Signal Processing and Vice-Chair of the IEEE Technical Committee on Design and Implementation of Signal Processing Systems.

The NECSTLab Multi-Faceted Experience with AWS F1 – Teaching, Research Framework and Application Stack

Speaker: Marco Santambrogio, Politecnico di Milano.

Abstract: The increasing demand for computing power in fields such as biology, finance and machine learning is pushing the adoption of reconfigurable hardware in order to keep up with the required performance level at a sustainable power consumption. Within this context, FPGA devices represent an interesting solution as they combine the benefits of power efficiency, performance and flexibility. Nevertheless, the steep learning curve and experience needed to develop efficient FPGA-based systems represent one of the main limiting factors for a broad utilization of such devices. In this talk, I will first present CAOS, a framework which helps the application designer identify acceleration opportunities and guides them through the implementation of the final FPGA-based system. The CAOS platform targets the full stack of the application optimization process, starting from the identification of the kernel functions to accelerate, to the optimization of such kernels and to the generation of the runtime management and the configuration files needed to program the FPGA. After CAOS, I will present the HUGenomics project, which is based on the CAOS framework. The unique genetic profile of a species is leading to the development of customized treatments, from personalized medicine to agrigenomics, but the exponential growth of available genomic data requires a computational effort that may limit the progress of these fields. The HUGenomics framework aims at facilitating the genome assembly process by means of both hardware-accelerated algorithms and scientific data visualization tools. Indeed, the system raises the level of abstraction, allowing users to easily integrate custom algorithms into the hardware pipeline without any knowledge of the underlying architecture.

Short bio

Marco Santambrogio has been an Associate Professor at Politecnico di Milano since 2018, and an Adjunct Professor at the College of Engineering of the University of Illinois at Chicago (UIC) since 2009. He was a Research Affiliate with the Computer Science and Artificial Intelligence Laboratory (CSAIL) at the Massachusetts Institute of Technology (MIT) from 2010 to 2015. He received his laurea (M.Sc. equivalent) degree in Computer Engineering from the Politecnico di Milano (2004), an M.Sc. degree in Computer Science from the University of Illinois at Chicago (UIC) in 2005, and his PhD degree in Computer Engineering from the Politecnico di Milano (2008). Prof. Santambrogio was a postdoctoral fellow at CSAIL, MIT.

Building massive-scale super-computing systems in the Cloud

Speaker: Tobias Becker, Maxeler.

Abstract: With an increasing number of phones, smart devices and other sensors, steadily growing amounts of data are being generated and stored in the Cloud. The processing of this data in the Cloud is highly challenging in terms of computational requirements, bandwidth and energy consumption. In order to meet the data processing demands of the next decade, fundamental changes are needed in the way we compute and transfer data. Focusing on data movement first is key to building massive-scale Cloud systems that optimise not only how data is processed but also where it is processed. Adding customisable compute capabilities at the edge and in the Cloud will enable efficient and application-aware systems that can perform all the required data processing within affordable energy budgets at all levels.

Short bio

Dr. Tobias Becker is the Head of MaxAcademy at Maxeler Technologies, where he coordinates Maxeler’s research activities and university relations. Before joining Maxeler, he held positions as a researcher in the Department of Computing at Imperial College London and at Xilinx, Inc. He received a Ph.D. degree in Computing from Imperial College London and a Dipl.-Ing. degree in Electrical Engineering from the Technical University of Karlsruhe (now KIT).