Friday, April 19, 2019
Tuesday, April 16, 2019
Convolution(1)
- How do you compute the output size of a conv layer?
Suppose:
- W := input size
- S := stride
- N := output size
- F := filter size
- P := padding size
Think about where the head (leftmost cell) of the filter lands for each output position: the last of the $N$ placements starts at index $(N-1)\,S+1$ of the padded input and covers $F-1$ more cells, and it must end exactly at the right edge of the padded input, so
$W+2P=(F-1)+(N-1)*S+1$
which gives
$N=\frac{W+2P-F}{S}+1$
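As a quick sanity check, here is a minimal Python sketch of the formula (the function name and the example sizes are my own):

```python
def conv_output_size(W, F, S, P):
    """Output size along one dimension of a convolution.

    W: input size, F: filter size, S: stride, P: padding per side.
    """
    assert (W + 2 * P - F) % S == 0, "filter placements should tile the padded input exactly"
    return (W + 2 * P - F) // S + 1

print(conv_output_size(W=7, F=3, S=1, P=0))   # 5:  7x7 input, 3x3 filter, stride 1, no padding
print(conv_output_size(W=32, F=3, S=1, P=1))  # 32: "same" padding keeps the size
```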
Sunday, April 7, 2019
Friday, April 5, 2019
Graph Convolution Neural Nets? Are you out of your mind? How can you do this on a graph?
Recently I have been working on reproducing the paper Graph Convolutional Matrix Completion. A natural question to ask is: what is convolution on a graph?
Convolution was first defined on continuous spaces; signal-processing people later extended the idea to discrete spaces (think of image processing, or the convolutional kernels used in convolutional neural networks). But what about graphs?
The idea may have first appeared in this paper by Joan Bruna, and was later extended by Defferrard et al. (2016).
The version I encountered is that of Thomas N. Kipf et al. (2017), which I will introduce here.
The basic setting is as follows:
- $x\in R^N$ is a feature vector on the nodes of a graph (one scalar per node)
- The convolutional kernel/filter $g_\theta=\mathrm{diag}(\theta)$, $\theta\in R^N$, is defined in the Fourier (spectral) domain
The spectral convolution on the graph is then $g_\theta * x = U g_\theta U^T x$, where $U$ is the matrix of eigenvectors of the normalized graph Laplacian $L=I_N-D^{-1/2}AD^{-1/2}=U \Lambda U^T$, $\Lambda$ is the diagonal matrix of its eigenvalues, and $D_{ii}=\sum_j A_{ij}$ is the degree matrix of the adjacency matrix $A$.
Note:
- In spectral-based models, graphs are assumed to be undirected; the normalized graph Laplacian is one common matrix representation of such a graph.
- Graph Fourier Transform: $\mathcal{F}(x)= U^T x$
- Inverse Graph Fourier Transform: $\mathcal{F}^{-1}(\hat{x})= U \hat{x}$
- Derivation:
\begin{align}
g_\theta * x & = \mathcal{F}^{-1} (\mathcal{F}(x) \odot \mathcal{F}(g)) \\
& = U\big((U^T x) \odot (U^T g)\big) \\
& = U \,\mathrm{diag}(U^T g)\, U^T x = U g_\theta U^T x
\end{align}
where the last step uses $g_\theta = \mathrm{diag}(\theta)$ with $\theta = U^T g$, i.e. the filter is parameterized directly by its Fourier coefficients.
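To make this concrete, here is a small NumPy sketch (the toy graph and all variable names are my own, not from the paper) that builds the normalized Laplacian of a tiny graph, takes its eigendecomposition, and filters a node signal as $U\,\mathrm{diag}(\theta)\,U^T x$:

```python
import numpy as np

# A toy undirected graph: 4 nodes on a path
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
N = A.shape[0]

# Normalized graph Laplacian L = I_N - D^{-1/2} A D^{-1/2}
d = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L = np.eye(N) - D_inv_sqrt @ A @ D_inv_sqrt

# Eigendecomposition L = U Lambda U^T (L is symmetric, so eigh is appropriate)
lam, U = np.linalg.eigh(L)

x = np.random.randn(N)          # a signal on the nodes
theta = np.random.randn(N)      # one free spectral coefficient per eigenvalue

x_hat = U.T @ x                 # graph Fourier transform F(x)
filtered = U @ (theta * x_hat)  # g_theta * x = U diag(theta) U^T x
```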
They are not satisfied yet: evaluating $U g_\theta U^T x$ is expensive (multiplying by $U$ costs $O(N^2)$, and the eigendecomposition of $L$ itself may be prohibitive for large graphs), so they use an approximation derived in Hammond et al. (2011):
$g_{\theta'}(\Lambda) \approx \sum_{k=0}^{K} \theta'_k T_k(\tilde{\Lambda})$
where $T_k(x)$ are the Chebyshev polynomials, defined by $T_0(x)=1$, $T_1(x)=x$, $T_k(x)=2xT_{k-1}(x)-T_{k-2}(x)$, and
$\tilde{\Lambda}=\frac{2}{\lambda_{max}} \Lambda - I_N$, where $\lambda_{max}$ is the largest eigenvalue of $L$ (the rescaling maps the eigenvalues into $[-1,1]$, where the Chebyshev polynomials are well behaved).
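The point of the Chebyshev expansion is that $T_k(\tilde{L})\,x$ (with $\tilde{L}=\frac{2}{\lambda_{max}}L - I_N$) can be built up with the recurrence alone, i.e. with $K$ sparse matrix-vector products and no eigendecomposition at all. A rough sketch, reusing `L` and `x` from the snippet above (the function name and the choice of `K` are mine):

```python
def chebyshev_filter(L, x, thetas, lam_max=2.0):
    """Approximate g_{theta'} * x = sum_k theta'_k T_k(L_tilde) x via the recurrence.

    Needs only repeated (sparse) mat-vec products with L, no eigendecomposition.
    """
    N = L.shape[0]
    L_tilde = (2.0 / lam_max) * L - np.eye(N)
    Tx_prev, Tx = x, L_tilde @ x              # T_0(L~) x = x,  T_1(L~) x = L~ x
    out = thetas[0] * Tx_prev
    if len(thetas) > 1:
        out = out + thetas[1] * Tx
    for theta_k in thetas[2:]:
        Tx_prev, Tx = Tx, 2 * (L_tilde @ Tx) - Tx_prev   # T_k = 2 L~ T_{k-1} - T_{k-2}
        out = out + theta_k * Tx
    return out

# K = 2, i.e. three coefficients theta'_0, theta'_1, theta'_2
y = chebyshev_filter(L, x, thetas=np.random.randn(3))
```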
They restrict the expansion to $K=1$ and assume $\lambda_{max}\approx 2$.
Now:
$g_{\theta'}*x\approx \theta'_0 x+\theta'_1(L-I_N)x=\theta'_0 x-\theta'_1 D^{-\frac{1}{2}}AD^{-\frac{1}{2}}x$
Further, they set $\theta=\theta'_0=-\theta'_1$, which collapses the filter to a single parameter: $g_\theta * x \approx \theta\,(I_N + D^{-\frac{1}{2}}AD^{-\frac{1}{2}})\,x$.
Finally, they apply the renormalization trick to prevent the exploding/vanishing gradient problems that arise when this operator (whose eigenvalues lie in $[0,2]$) is applied repeatedly in a deep network:
- $\tilde{A}=A+I_N$
- $\tilde{D}_{ii}=\sum_j \tilde{A}_{ij}$
- $I_N+D^{-\frac{1}{2}}AD^{-\frac{1}{2}} \rightarrow \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$
Matrix representation:
Suppose $X\in R^{N \times C}$ is the signal/input matrix, where $C$ is the number of input channels/features, $\Theta\in R^{C \times F}$ is a matrix of filter parameters, and $Z \in R^{N \times F}$ is the convolved signal (output) matrix. Then:
$Z=\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} X \Theta $
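Putting it all together, one propagation step is just two matrix products after pre-computing the renormalized adjacency. A rough NumPy sketch (again reusing the toy `A` from above; the function name and the channel sizes `C`, `F` are arbitrary choices of mine):

```python
def gcn_layer(A, X, Theta):
    """One graph convolution: Z = D~^{-1/2} A~ D~^{-1/2} X Theta."""
    N = A.shape[0]
    A_tilde = A + np.eye(N)                    # add self-loops
    d_tilde = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d_tilde))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt  # renormalized adjacency
    return A_hat @ X @ Theta                   # (N, C) -> (N, F)

C, F = 3, 2
X = np.random.randn(N, C)        # node features
Theta = np.random.randn(C, F)    # filter parameters
Z = gcn_layer(A, X, Theta)       # convolved signal, shape (N, F)
```

In the actual GCN this step is followed by a nonlinearity, i.e. layers of the form $H^{(l+1)}=\sigma(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)})$.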
Ref:
1. Kipf, T. N. and Welling, M. (2017). Semi-Supervised Classification with Graph Convolutional Networks. ICLR 2017.