Wider nets without the memory cost
(Work in progress)
Wider artificial neural networks require more memory to train, run and store. Randomly permuted parameters (RPP) is proposed as a method for widening nets without increasing memory consumption. Using RPP, network width can be multiplied by $k$, and instead of the parameter count, $q$, increasing multiplicatively to $ \bar{q} = q * k $, it either stays the same, or increases additively to $\bar{q} \approx q + k$, if biases are not reused. RPP works by using a hidden weights matrix, from which a larger, permuted weights matrix, is created. The elements of the permuted weights matrix are pointers to elements of the hidden weights matrix, and each element of the hidden weights matrix appears $k$ times in the permuted weights matrix. Preliminary experiments demonstrate empirically that this method is promising.
Method
A standard neural network layer can be written as:
Using RPP, each column of $W$ is randomly permuted $k$ times, to create $k$ new columns in weights matrix, $W_p$.
Where $p_{h,i,j}$ is an element of $P$, the *random permutation tensor* used to index $W$. The values of $P$ are needed for both forward and back propagation, and should stay constant for a given network layer, but $P$ does not need to be stored in memory, because it can be generated by a deterministic random algorithm. ($P$ is constrained such that the indices correspond to column-wise permutations as described above.) Alternatively, a single $P$ can be stored in memory, and used for multiple layers (this works for layers that have the same shaped weight matrix), in which case the number of model parameters for a network of depth $d$ would become, $\bar{p} = m*n*d + m*n*k$, instead of $\bar{p} = m*n*d*k$ if the layers were widened the standard way. # Preliminary results ![](/assets/posts/permute_fashion_mnist.png) The above plot shows RPP layers performing as well as layers with 100 times more parameters.