## Friday, April 19, 2019

## Tuesday, April 16, 2019

### Convolution(1)

- How to compute the size of the Conv layer?

Supoose:

- S:= # Stride
- N := output size
- F:= Filter size
- P:=Padding size

Think about all these

**heads**
$W+2P=(F-1)+(N-1)*S+1$

which gives

$N=\frac{W+2P-F}{S}+1$

## Sunday, April 7, 2019

## Friday, April 5, 2019

### Graph Convolution Neural Nets? Are you out of your mind? How can you do this on a graph?

Recently I am working on reproducing the paper Graph Convolutional Matrix Completion . A natural question to ask is what is

Convolution at first is defined for continuous space, later maybe signal processing guy has extended this idea to discrete space(say image processing, the convolutional kernel you used in convolutional neural network). But what about graphs?

The first idea may appeared in Joan Bruna's this paper. Then extended by Defferrard et al(2016).

The version I encountered is Thomas N. Kipf et al.(2017), which I will introduce here.

The basic setting is here:

where $U$ is the matrix of eigenvectors of the

\begin{align}

g_\theta * x & = \mathcal{F}^{-1} (\mathcal{F}(x) \odot \mathcal{F}(g)) \\

& = Ug_\theta U^T x

\end{align}

$g_\theta'(\Lambda) \approx \sum_{k=0}^K \theta_k^{'}T_k(\tilde{\Lambda})$

where $T_k(x)$ is the Chebyshev polynomials. $T_0(x)=1, T_1(x)=x, T_k(x)=2xT_{k-1}(x)-T_{k-2}(x) $

$\tilde{\Lambda}=\frac{2}{\lambda_{max}} \Lambda - I_N$, $\lambda_{max}$ is the largest eigenvalue of $L$

They limit the $K=1$, and assume $\lambda_{max}\approx2$

Now:

$g_{\theta'}*x\approx \theta_0'x+\theta_1^{'}(L-I_N)x=\theta_0^{'}x-\theta_1^{'}D^{-\frac{1}{2}}AD^{-\frac{1}{2}}x$

Further, they assume $\theta=\theta_0^{'}=-\theta_1^{'}$

Apply

Ref:

1.

**convolution**on graph.Convolution at first is defined for continuous space, later maybe signal processing guy has extended this idea to discrete space(say image processing, the convolutional kernel you used in convolutional neural network). But what about graphs?

The first idea may appeared in Joan Bruna's this paper. Then extended by Defferrard et al(2016).

The version I encountered is Thomas N. Kipf et al.(2017), which I will introduce here.

The basic setting is here:

- $x\in R^N$ is a feature vector of the nodes of a graph
- Convolutional kernel/filter $g_\theta=diag(\theta), \theta\in R^N$ lies in the Fourier domain

where $U$ is the matrix of eigenvectors of the

**normalized graph Laplacian**$L=I_N-D^{-1/2}AD^{-1/2}=U \Lambda U^T$**,**where $D_{i, i}=\sum_j {A_{i, j }}$

**Note:**- In spectral-based models, graph are assumed to be undirected. Normalized graph Laplacian matrix is one representation.
- Graph Fourier Transform: $\mathcal{F}(x)= U^T x$
- Inverse Graph Fourier Transform: $\mathcal{F}^{-1}(\hat{x})= U \hat{x}$
- Derivation:

\begin{align}

g_\theta * x & = \mathcal{F}^{-1} (\mathcal{F}(x) \odot \mathcal{F}(g)) \\

& = Ug_\theta U^T x

\end{align}

**They are not satisfied, yet. Computing $g_\theta$ is expensive, so they use an approximation derived in Hammond et al.(2011), where:**

$g_\theta'(\Lambda) \approx \sum_{k=0}^K \theta_k^{'}T_k(\tilde{\Lambda})$

where $T_k(x)$ is the Chebyshev polynomials. $T_0(x)=1, T_1(x)=x, T_k(x)=2xT_{k-1}(x)-T_{k-2}(x) $

$\tilde{\Lambda}=\frac{2}{\lambda_{max}} \Lambda - I_N$, $\lambda_{max}$ is the largest eigenvalue of $L$

They limit the $K=1$, and assume $\lambda_{max}\approx2$

Now:

$g_{\theta'}*x\approx \theta_0'x+\theta_1^{'}(L-I_N)x=\theta_0^{'}x-\theta_1^{'}D^{-\frac{1}{2}}AD^{-\frac{1}{2}}x$

Further, they assume $\theta=\theta_0^{'}=-\theta_1^{'}$

Apply

**re-normalization trick**to prevent exploding/vanishing gradients problem:- $\tilde{A}=A+I_N$
- $\tilde{D}_{ii}=\sum_j \tilde{A}_{ij}$
- $I_N+D^{-\frac{1}{2}}AD^{-\frac{1}{2}} \rightarrow \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$

**Matrix representation:**

Suppose $X\in R^{N \times C} $ is the signal/input, where C is the dim of input channels/features. $\Theta\in R^{C \times F }$ a matrix of filter parameters. $Z \in R^{N \times F}$ is the convolved

input matrix.

$Z=\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} X \Theta $

Ref:

1.

Subscribe to:
Posts (Atom)