DLVM

Modern Compiler Infrastructure for Deep Learning Systems

Introduction

Deep learning software demands reliability and performance. However, many existing deep learning frameworks are software libraries that act as an unsafe DSL embedded in Python, paired with a computation graph interpreter.

We present DLVM, a design and implementation of a compiler infrastructure with a linear algebra intermediate representation, algorithmic differentiation by adjoint code generation, domain-specific optimizations, and a code generator targeting GPUs via LLVM.

Designed as a modern compiler infrastructure inspired by LLVM, DLVM is more modular and more generic than existing deep learning compiler frameworks, and supports tensor DSLs with high expressivity. With our prototypical staged DSL embedded in Swift, we argue that the DLVM system enables modular, safe, and performant frameworks for deep learning.


DLVM started as a research project at the University of Illinois at Urbana-Champaign and is now driven by a small community of researchers and developers.

Demos

NNKit is a staged DSL embedded in Swift. Tensor programs are written as ordinary Swift closures; instead of executing eagerly, they are staged into Rep values and just-in-time compiled through DLVM when applied to data, as the example below shows. A sketch of the staging idea follows the code.

// Staged function representing f(x, w, b) = dot(x, w) + b
let f: Rep<(Float2D, Float2D, Float1D) -> Float2D> =
    lambda { x, w, b in x • w + b.rankLifted() }

// Staged function 'g', type-inferred from 'f'
let g = lambda { x, w, b in
    let linear = f[x, w, b] // staged function application
    return tanh(linear)
}

// Gradient of 'g' with respect to arguments 'w' and 'b'
let dg = gradient(of: g, withRespectTo: (1, 2), keeping: 0)
// 'dg' has type:
// Rep<(Float2D, Float2D, Float1D) -> (Float2D, Float2D, Float2D)>

// Call staged function on input data 'x', 'w' and 'b'
let (dg_dw, dg_db, result) = dg[x, w, b]
// At runtime, 'dg' gets just-in-time compiled through DLVM,
// and computes ( dg/dw, dg/db, g(x, w, b) )
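
To make the staging step concrete, here is a minimal, self-contained sketch of the idea behind Rep-style staging. All names below are hypothetical illustrations, not NNKit's actual implementation: operators on staged values record an expression tree instead of computing, and that tree is what a JIT can later lower to DLVM IR.

infix operator •: MultiplicationPrecedence

// A staged expression: a description of a computation, not its result.
indirect enum Expr {
    case input(String)
    case dot(Expr, Expr)
    case add(Expr, Expr)
    case tanh(Expr)
}

// Stand-in for a staged 2-D tensor value (cf. Rep<Float2D>).
struct Staged {
    let expr: Expr
}

// Operators on staged values build the tree rather than perform arithmetic.
func • (lhs: Staged, rhs: Staged) -> Staged { Staged(expr: .dot(lhs.expr, rhs.expr)) }
func + (lhs: Staged, rhs: Staged) -> Staged { Staged(expr: .add(lhs.expr, rhs.expr)) }
func tanh(_ x: Staged) -> Staged { Staged(expr: .tanh(x.expr)) }

let x = Staged(expr: .input("x"))
let w = Staged(expr: .input("w"))
let b = Staged(expr: .input("b"))

// "Running" the program records it; printing shows the recorded tree.
let g = tanh(x • w + b)
print(g.expr)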

The DLVM Intermediate Representation (IR) is the core language of the DLVM system: a typed, SSA-form representation in which tensor shapes and element types appear directly in the types of values, as the listings below show.

The Swift code above is JIT-compiled by NNKit to the following DLVM IR:

// Dimension-erased functions are flexible because input shapes are dynamic.
// They may be slower and less optimized than their shape-specialized counterparts.

// f(x, w, b) = dot(x, w) + pad(b, at: 0)
func @f: (<_ x _ x f32>, <_ x _ x f32>, <_ x f32>) -> <_ x _ x f32> {
'entry(%x: <_ x _ x f32>, %w: <_ x _ x f32>, %b: <_ x f32>):
    %0.0 = dot %x: <_ x _ x f32>, %w: <_ x _ x f32>
    %0.1 = padShape %b: <_ x f32> at 0
    %0.2 = add %0.0: <_ x _ x f32>, %0.1: <1 x _ x f32>
    return %0.2: <_ x _ x f32>
}
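
The padShape instruction corresponds to rankLifted() in the NNKit source above: it inserts a size-1 dimension so that the vector b can broadcast against each row of the matrix. A tiny plain-Swift illustration of the same idea (not DLVM API):

// padShape turns <3> into <1 x 3>, which then adds elementwise to a <1 x 3> row.
let bias: [Float] = [0.1, 0.2, 0.3]
let lifted: [[Float]] = [bias]               // rank-lifted: 1 x 3
let row: [[Float]] = [[1, 2, 3]]             // 1 x 3
let sum = zip(lifted[0], row[0]).map { $0.0 + $0.1 }
print(sum)                                   // [1.1, 2.2, 3.3]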

// Gradient declaration in DLVM IR: [gradient @f wrt 1, 2 seedable]
// Seedable: able to take back-propagated gradient as a seed for AD
// df(x, w, b, seed) = ( df/dw, df/db )
func @df: (<_ x _ x f32>, <_ x _ x f32>, <_ x f32>, <_ x _ x f32>)
         -> (<_ x _ x f32>, <_ x f32>) {
'entry(%x: <_ x _ x f32>, %w: <_ x _ x f32>, %b: <_ x f32>, %seed: <_ x _ x f32>):
    // Backward pass: df/dw = dot(x^T, seed), df/db = sum(seed, along: 0)
    %0.0 = reduce %seed: <_ x _ x f32> by add init 0: f32 along 0
    %0.1 = transpose %x: <_ x _ x f32>
    %0.2 = dot %0.1: <_ x _ x f32>, %seed: <_ x _ x f32>
    %0.3 = literal (%0.2: <_ x _ x f32>, %0.0: <_ x f32>): (<_ x _ x f32>, <_ x f32>)
    return %0.3: (<_ x _ x f32>, <_ x f32>)
}
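
The backward pass above implements the standard adjoint rules for f(x, w, b) = dot(x, w) + b: the gradient with respect to w is dot(transpose(x), seed), and the gradient with respect to b sums the seed along axis 0 (the axis that padShape broadcast over). A small self-contained numeric check of these rules, using plain Swift arrays rather than DLVM or NNKit:

typealias Matrix = [[Float]]

// Naive matrix product, enough to check the adjoint rules on a tiny case.
func dot(_ a: Matrix, _ b: Matrix) -> Matrix {
    (0..<a.count).map { i in
        (0..<b[0].count).map { j in
            (0..<b.count).reduce(Float(0)) { $0 + a[i][$1] * b[$1][j] }
        }
    }
}

func transpose(_ a: Matrix) -> Matrix {
    (0..<a[0].count).map { j in a.map { $0[j] } }
}

// x is 2x3; seed (the upstream gradient of the 2x2 output) is all ones.
let x: Matrix = [[1, 2, 3], [4, 5, 6]]
let seed: Matrix = [[1, 1], [1, 1]]

let dw = dot(transpose(x), seed)            // 3x2, the shape of w
let db = (0..<seed[0].count).map { j in     // length 2, the shape of b
    seed.reduce(Float(0)) { $0 + $1[j] }
}
print(dw) // [[5.0, 5.0], [7.0, 7.0], [9.0, 9.0]]
print(db) // [2.0, 2.0]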

... // @g and @dg omitted here for brevity
// In shape-specialized functions, input shapes are statically known.
// This enables more optimizations and results in better performance.
// Shape-specialized for x: <1 x 784>, w: <784 x 10>, b: <10>

// f(x, w, b) = dot(x, w) + pad(b, at: 0)
func @f: (<1 x 784 x f32>, <784 x 10 x f32>, <10 x f32>) -> <1 x 10 x f32> {
'entry(%x: <1 x 784 x f32>, %w: <784 x 10 x f32>, %b: <10 x f32>):
    %0.0 = dot %x: <1 x 784 x f32>, %w: <784 x 10 x f32>
    %0.1 = padShape %b: <10 x f32> at 0
    %0.2 = add %0.0: <1 x 10 x f32>, %0.1: <1 x 10 x f32>
    return %0.2: <1 x 10 x f32>
}

// Gradient declaration in DLVM IR: [gradient @f wrt 1, 2 seedable]
// Seedable: able to take back-propagated gradient as a seed for AD
// df(x, w, b, seed) = ( df/dw, df/db )
func @df: (<1 x 784 x f32>, <784 x 10 x f32>, <10 x f32>, <1 x 10 x f32>)
         -> (<784 x 10 x f32>, <10 x f32>) {
'entry(%x: <1 x 784 x f32>, %w: <784 x 10 x f32>, %b: <10 x f32>, %seed: <1 x 10 x f32>):
    // Backward pass: df/dw = dot(x^T, seed), df/db = squeeze(seed, at: 0)
    %0.0 = squeezeShape %seed: <1 x 10 x f32> at 0
    %0.1 = transpose %x: <1 x 784 x f32>
    %0.2 = dot %0.1: <784 x 1 x f32>, %seed: <1 x 10 x f32>
    %0.3 = literal (%0.2: <784 x 10 x f32>, %0.0: <10 x f32>): (<784 x 10 x f32>, <10 x f32>)
    return %0.3: (<784 x 10 x f32>, <10 x f32>)
}

... // @g and @dg omitted here for brevity
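
As a rough illustration of what statically known shapes buy (a sketch with hypothetical helpers, not DLVM's actual shape system), the snippet below infers the result shape of dot(x, w) + padShape(b, at: 0) purely from the input shapes. When shapes are compile-time constants, mismatches surface before any tensor data exists:

typealias Shape = [Int]

// dot: <m x k> with <k x n> yields <m x n>; anything else is a shape error.
func inferDot(_ a: Shape, _ b: Shape) -> Shape? {
    guard a.count == 2, b.count == 2, a[1] == b[0] else { return nil }
    return [a[0], b[1]]
}

// padShape inserts a size-1 dimension at the given axis.
func inferPad(_ a: Shape, at axis: Int) -> Shape {
    var s = a
    s.insert(1, at: axis)
    return s
}

// add broadcasts size-1 dimensions, as in the IR above.
func inferAdd(_ a: Shape, _ b: Shape) -> Shape? {
    guard a.count == b.count,
          zip(a, b).allSatisfy({ $0.0 == $0.1 || $0.0 == 1 || $0.1 == 1 })
    else { return nil }
    return zip(a, b).map { max($0.0, $0.1) }
}

// The shapes from the specialized @f above: x <1 x 784>, w <784 x 10>, b <10>.
if let mm = inferDot([1, 784], [784, 10]),
   let out = inferAdd(mm, inferPad([10], at: 0)) {
    print(out) // [1, 10], known before any data flows
}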

More information about NNKit and DLVM IR will be published soon.

Publications

Richard Wei, Lane Schwartz, and Vikram Adve. DLVM: A modern compiler infrastructure for deep learning systems. arXiv:1711.03016.

Projects

All projects are written in Swift.

†: open sourcing in progress