Gradient-based optimization is a cornerstone of modern deep learning research, so efficiently computing the gradient of a model has become an essential issue. Automatic Differentiation (AD) frees us from deriving gradients by hand, so we can focus on the model architecture without worrying about the gradient. Meanwhile, the idea of Differentiable Programming has emerged, and gradient calculation is no longer limited to the optimization field. However, integrating an AD system with the software development stack is far from trivial. I will introduce the basics of the different ways differentiation has been carried out in the past, and also the AMAZING package Zygote.jl — the 21st century package for gradient calculation.
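To give a flavor of what AD does (this is an illustrative sketch only, not Zygote.jl itself, which works on Julia code), here is a minimal forward-mode AD implementation using dual numbers; the names `Dual` and `derivative` are invented for this example:

```python
import math

class Dual:
    """Dual number v + d*eps with eps^2 = 0: carries a value and a derivative."""
    def __init__(self, val, dot=0.0):
        self.val = val  # function value
        self.dot = dot  # derivative with respect to the chosen input

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def sin(x):
    # chain rule for sin: sin(u)' = cos(u) * u'
    if isinstance(x, Dual):
        return Dual(math.sin(x.val), math.cos(x.val) * x.dot)
    return math.sin(x)

def derivative(f, x):
    """Seed the input with derivative 1.0 and read off f'(x)."""
    return f(Dual(x, 1.0)).dot

# d/dx (x*x + sin(x)) = 2x + cos(x); at x = 0 this evaluates to 1.0
print(derivative(lambda x: x * x + sin(x), 0.0))
```

The point is that the derivative comes out of ordinary program execution with overloaded arithmetic; Zygote.jl instead transforms Julia source code to produce reverse-mode gradients, which scales to functions with many inputs.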
Collaborative note: https://hackmd.io/@coscup/SyeklQe4r