# Open source placement and routing tools for FPGAs and their extensions to multi-die FPGAs

Prof. Dirk Stroobandt Ghent University, Belgium

#### **Presentation overview**

- Connection-based physical design
  - Our FPGA CAD framework
  - Liquid placement tool
  - Croute and RWRoute
- Multi-die placement and routing
  - LiquidMD
  - CRouteMD (in progress)

## FPGA CAD FRAMEWORK

#### FPGA CAD FRAMEWORK



#### FPGA CAD FRAMEWORK



https://GitHub.UGent.be/UGent-HES/FPGA-CAD-Framework

Liquid: High-quality Scalable Placement for Large Heterogeneous FPGAs

#### Placement problem



#### Analytical placement



### Liquid placement technique

- Each net is spring between extreme blocks
- Long springs pull harder
- Extra spring for highly critical source-sink connections





E. Vansteenkiste, S. Lenders, and D. Stroobandt, "Liquid: Fast placement prototyping through steepest gradient descent movement," in Field Programmable Logic and Applications (FPL), 2016 26th International Conference on. IEEE, 2016, pp. 1–4.

### Liquid placement technique

Is it necessary to exactly solve the linear system?



Legalization partly destroys the solution

Optimize system by moving each block several times in the direction that reduces the placement cost the most









E. Vansteenkiste, S. Lenders, and D. Stroobandt, "Liquid: Fast placement prototyping through steepest gradient descent movement," in Field Programmable Logic and Applications (FPL), 2016 26th International Conference on. IEEE, 2016, pp. 1–4.

#### **Optimized placement**



#### 1. Assign to nearest hard block column



#### 2. Column swap -> satisfy capacity constraints



#### 3. Greedy legalization



#### 4. Simulated annealing-based optimization



### Liquid versus vpr

- Large heterogeneous Titan23 benchmark designs
- Model of Altera's Stratix IV FPGA

|        | Total wire-<br>length [M] | Critical path<br>delay [ns] | Runtime [s] |
|--------|---------------------------|-----------------------------|-------------|
| VPR    | 3.83                      | 21.8                        | 924         |
| Liquid | 3.85                      | 21.7                        | 39          |
| ratio  | 1.00                      | 0.99                        | 0.04        |

#### scalability



S.-Y. Chen and Y.-W. Chang, "Routing-architecture-aware analytical placement for heterogeneous FPGAs," in *Proceedings of the 52nd Annual Design Automation Conference*. ACM, 2015, p. 27.

**CRoute:** A Fast High-quality **Timing-driven Connection-based FPGA Router** 





#### Net-based routing: Pathfinder

```
while (IllegalRoutingResourcesExist()):
    for each Net n do:
        if (firstIteration or n.congested()):
            ripUpRouting(n)
            route(n)
            n.resources().updatePresentCongestionCost()
```

```
allResources().updateHistoryCost()
updatePresentCongestionMultiplier()
allResources().updatePresentCongestionCost()
```

```
function route(Net n):
    routingTree = {source}
    for each Sink s of n:
        path = Dijkstra(routingTree, s)
        routingTree = routingTree U path
```











#### connection-based routing

```
while (IllegalRoutingResourcesExist()):
    for each Conn c do:
        if (firstIteration or c.congested()):
            ripUpRouting(c)
            route(c)
            c.resources().updatePresentCongestionCost()
```

```
allResources().updateHistoryCost()
updatePresentCongestionMultiplier()
allResources().updatePresentCongestionCost()
```

E. Vansteenkiste, K. Bruneel, and D. Stroobandt, "A connection-based router for FPGAs," in *Field-Programmable Technology (FPT)*, 2013 *International Conference on*. IEEE, 2013, pp. 326–329.







#### connection-based routing

```
while (IllegalRoutingResourcesExist()):
    for each Conn c do:
        if (firstIteration or c.congested()):
            ripUpRouting(c)
            route(c)
            c.resources().updatePresentCongestionCost()
```

```
allResources().updateHistoryCost()
updatePresentCongestionMultiplier()
allResources().updatePresentCongestionCost()
```

```
function route(Conn c):
```

```
path = Dijkstra(c.getSink())
```

E. Vansteenkiste, K. Bruneel, and D. Stroobandt, "A connection-based router for FPGAs," in *Field-Programmable Technology (FPT)*, 2013 *International Conference on*. IEEE, 2013, pp. 326–329.



#### Runtime



#### Runtime



#### runtime



First Iteration Reroute Connections Static Timing Analysis Other



#### Runtime



First Iteration Reroute Connections Static Timing Analysis Other



#### Wire-length



#### Critical path delay



# **RWRoute: An Open-source Timing-driven Router for Commercial FPGAs**



## Background: RapidWright



 Companion open-source framework for Vivado
 Enables custom crafted implementations
 Enables targeting commercial FPGAs

Chris Lavin and Alireza Kaviani, RapidWright: Enabling Custom Crafted Implementations for FPGAs, FCCM 2018.

### **CRoute: Sharing Mechanism and Drawback**



Scales down the cost of using a node *n* with a sharing factor, i.e., sharing(*n*)

sharing(n) = # connections using the node n
 Unaware of the criticality of connection under

consideration

- Encourages resource sharing even when a connection is long and timing-critical
- Limits critical path delay optimization

Source:

Elias Vansteenkiste, Karel Bruneel and D. Stroobandt, A connection-based router for FPGAs, FPT 2013.

# RWRoute: Criticality-aware Sharing Mechanism

- sharing'(n) = (1 criticality)<sup> $\lambda$ </sup> x sharing(n)
  - $\circ \lambda$ : user-defined sharing exponent
  - $> \lambda = 0$ : the criticality-unaware sharing mechanism as that of CRoute
  - $\lambda \ge 1$ : effective criticality-aware sharing mechanism
    - Encourage timing optimization for critical connections



#### Runtime and QoR Comparisons with Vivado

4.9X runtime speedup
 10% longer wirelength
 10% longer critical path delay







#### Scalability Comparison with Vivado



## Comparisons Regarding Rosetta Benchmarks





134

150

131

1.1X ~ 4.6X runtime speedup
 Comparable QoR

44

#### How Do You Use RWRoute?



# Multi-die placement and routing

#### Introduction





- Partitioning using hMETIS
- Factors influencing the partitioning quality.



#### Partitioning

- Choosing the optimal value of UB is a tradeoff between
  - Minimizing the cutsize and obtaining balanced partitions



Results published in the proceedings of SLIP workshop





- SLL legalization is simpler.
- Positions fixed at the start of the flow -> parallel independent placement

#### Interconnection demand



- From placement results
  - Get block positions
  - Get nets crossing the virtual boundary.
  - Nets crossing virtual boundary = Nets occupying an SLL.
- % $Utilisation = \frac{Occupied SLLs}{Available SLLs}$

#### **Interconnection demand**

#### Multi-Die vs Single-die Placement

Interconnection Demand



Circuit

#### Runtime



Runtime distribution: Single die, Multi-die (Sequential), Multi-die (Parallel)

#### WL estimation

#### Multi-Die vs Single-die Placement

Estimated TWL



Circuit

#### **CPD** estimation

#### Multi-Die vs Single-Die Placement

Estimated CPD



### CRouteMD

- We are currently working on extending CRoute to a multi-die parallel router
- Unfortunately no results yet



Many thanks to my Ph.D. students who collaborated to obtain these results: Elias Vansteenkiste, Dries Vercruyce, Yun Zhou, Raveena Raikar

