
Hopfield Networks Visualized

Hopfield networks were first introduced in 1982 by John Hopfield [1]. They differ from many modern neural network architectures in a few key ways, which we will walk through below.


Simple Example

Let's start with a simple example. Hopfield networks are fully connected neural networks with bidirectional weights. The neuron states are bipolar, either -1 or 1, unlike most neural networks, where neurons take on continuous values. The weights are calculated using the Hebbian learning rule, and the network is typically used to perform associative recall: given a noisy or partial input, it can recall the associated memory. [2]



Example of a 3x3 grid "memory"


The following visual walks through a Hopfield network recalling a memory. The network has been trained on this "memory", which is just a greyscale image with 9 pixels.

Given any corrupted or partial version of this input, the Hopfield network can "recall" the original over time through asynchronous updates: the state s_i of each neuron is updated, one at a time, until the states converge to the memory state.





Training

Like other neural networks, the Hopfield network consists of nodes and weights. The weight matrix of a Hopfield network is set with the following rule:



W = \frac{1}{n}\sum_{i=1}^{m}s_{i}s_{i}^{T}


where s_i is the state vector of the i-th "memory" (a configuration of neuron states), m is the number of memories, and n is the number of nodes. This is essentially a sum of the outer product of each memory with itself, scaled by the number of nodes. The result is a symmetric weight matrix, where the entry at ij corresponds to the weight between nodes i and j. The weights on the diagonal of the matrix are set to 0, because Hopfield networks do not have self-connections.



W_{ij} = W_{ji} \quad \forall \, i \neq j
W_{ii} = 0 \quad \forall \, i


We can apply this rule to our example memory. For readability, the 1/n factor is omitted below; it only rescales the weights and does not change the sign of any neuron's net input, so recall behaves the same:



W= \begin{bmatrix} -1 \\ 1 \\ -1 \\ -1 \\ 1 \\ -1 \\ -1 \\ 1 \\ 1 \end{bmatrix} \cdot \begin{bmatrix} -1 & 1 & -1 & -1 & 1 & -1 & -1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & -1 & 1 & 1 & -1 & 1 & 1 & -1 & -1 \\ -1 & 1 & -1 & -1 & 1 & -1 & -1 & 1 & 1 \\ 1 & -1 & 1 & 1 & -1 & 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & 1 & -1 & 1 & 1 & -1 & -1 \\ -1 & 1 & -1 & -1 & 1 & -1 & -1 & 1 & 1 \\ 1 & -1 & 1 & 1 & -1 & 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & 1 & -1 & 1 & 1 & -1 & -1 \\ -1 & 1 & -1 & -1 & 1 & -1 & -1 & 1 & 1 \\ -1 & 1 & -1 & -1 & 1 & -1 & -1 & 1 & 1 \end{bmatrix}


Then we set the diagonal to zero:



W= \begin{bmatrix} 0 & -1 & 1 & 1 & -1 & 1 & 1 & -1 & -1 \\ -1 & 0 & -1 & -1 & 1 & -1 & -1 & 1 & 1 \\ 1 & -1 & 0 & 1 & -1 & 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & 0 & -1 & 1 & 1 & -1 & -1 \\ -1 & 1 & -1 & -1 & 0 & -1 & -1 & 1 & 1 \\ 1 & -1 & 1 & 1 & -1 & 0 & 1 & -1 & -1 \\ 1 & -1 & 1 & 1 & -1 & 1 & 0 & -1 & -1 \\ -1 & 1 & -1 & -1 & 1 & -1 & -1 & 0 & 1 \\ -1 & 1 & -1 & -1 & 1 & -1 & -1 & 1 & 0 \end{bmatrix}


import numpy as np

# the single 3x3 "memory", stored as a bipolar (-1/1) array
X = np.array([[-1, 1, -1], [-1, 1, -1], [-1, 1, 1]])

# n is the number of nodes (neurons) in the network,
# which is also the size of the memory
sample = X.flatten()
n = sample.shape[0]

# Hebbian rule: outer product of the memory with itself,
# normalized by n, with self-connections removed
weights = np.outer(sample, sample) / n
np.fill_diagonal(weights, 0)
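
As a quick sanity check (my own addition, not part of the original code), we can confirm the symmetry and zero-diagonal properties stated above:

# W should be symmetric with no self-connections
assert np.allclose(weights, weights.T)
assert np.all(np.diag(weights) == 0)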


Now our simple Hopfield network is trained and ready to recall the memory. Let's walk through the recall process using asynchronous updates.



Recall

The recall process starts with a partial or corrupted "memory" and slowly reconstructs the full memory. In our case there is only one memory, so the network will converge to the same state from almost any starting point (strictly speaking, the inverse of the memory is also a stable state). We will start from a randomly initialized state vector.



s = \begin{bmatrix} 0.03 \\ 0.37 \\ 0.18 \\ 0.78 \\ 0.38 \\ -0.82 \\ 0.02 \\ -0.38 \\ 0.68 \end{bmatrix}
Corrupted starting memory
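
In code, we might draw such a starting state at random (a sketch of mine; any corrupted version of the memory would work just as well):

# a random starting state with continuous values in [-1, 1],
# standing in for a corrupted version of the memory
states = np.random.uniform(-1, 1, n)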


At each time step, a node is chosen at random and its state is updated according to the following rule [3]:



s_i = \begin{cases} 1 & \sum_{j=1}^{n}W_{ij}s_j > 0 \\ -1 & \sum_{j=1}^{n}W_{ij}s_j \le 0 \\ \end{cases}



# asynchronous update: pick one neuron at random and
# set it to the sign of its net input
i = np.random.randint(n)
net_input = np.dot(weights[i], states)
states[i] = 1 if net_input > 0 else -1


The s vector is repeatedly updated until the network converges to a stable state. The following visual shows the network converging to the memory state.
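
Putting the pieces together, a full recall procedure might look like the following sketch. The function name, the sweep structure, and the convergence check are my own assumptions; the underlying update is the same asynchronous rule as above.

def recall(weights, states, max_sweeps=100):
    """Update neurons asynchronously until a full sweep changes nothing."""
    n = len(states)
    for _ in range(max_sweeps):
        prev = states.copy()
        # one sweep: update every neuron once, in random order
        for i in np.random.permutation(n):
            states[i] = 1 if np.dot(weights[i], states) > 0 else -1
        if np.array_equal(states, prev):
            break  # stable state reached
    return states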




Energy

The convergence of the network to a stable state can be thought of as the minimization of an "energy" function of the network. Throughout the recall process, the energy is non-increasing at each time step. The energy function is defined as:



E=-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}W_{ij}s_is_j


Or in vector form:



E=-\frac{1}{2}s^TWs



# energy in vector form: E = -1/2 * s^T W s
# (with a 1-D state vector, no reshape is needed)
energy = -0.5 * np.dot(states, np.dot(weights, states))
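
As a quick worked check, we can evaluate the energy of the stored memory itself. With the 1/n-normalized weights from the code, every off-diagonal term satisfies W_{ij}s_is_j = (s_is_j)^2 / 9 = 1/9, and there are 9^2 - 9 = 72 such terms, so:

E = -\frac{1}{2}\sum_{i \neq j} W_{ij}s_is_j = -\frac{1}{2}\cdot\frac{72}{9} = -4

This is the minimum energy for our single-memory network, so recall descends toward it.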


The energy of the network can be thought of as how "stable" the network is. The lower the energy, the more stable, and the closer the network is to converging on a memory state. Hopfield networks are designed to minimize this energy function.
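
To produce a plot like the one below, we can record the energy after every update during recall (a small sketch of my own, reusing the update rule from earlier):

# track the energy at each asynchronous update step
energies = []
for _ in range(100):
    i = np.random.randint(n)
    states[i] = 1 if np.dot(weights[i], states) > 0 else -1
    energies.append(-0.5 * np.dot(states, np.dot(weights, states)))
# the recorded energies are non-increasing and flatten out
# once the network reaches a stable state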



Energy over time


Thoughts

Hopfield networks are particularly interesting to me because they are non-differentiable and, as recurrent networks, allow for a variable number of computations before converging on a solution. This is in contrast to feedforward networks, which perform a fixed number of computations.

Be sure to check out the sources below to see the inspirations and resources used in this article. The code for this Hopfield network is on my GitHub.

Sources