TensorFlow Implementation of an AutoEncoder

I. Overview

An AutoEncoder is a learning method that first encodes high-dimensional data into a compressed, lower-dimensional representation and then runs the opposite decoding process to reconstruct the input. During training, the decoded output is compared with the original data, and the weights and biases are adjusted to reduce the loss function, which steadily improves the model's ability to recover the original data. Once training is complete, the output of the encoding half serves as a low-dimensional "feature" representation of the original data. A trained autoencoder can therefore compress high-dimensional data to a desired dimension, and the principle is similar to PCA.
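The encode, decode, and compare cycle described above can be sketched on a toy input before moving to MNIST. This is a minimal illustration in plain Python, not the article's TensorFlow model: the helper names (`sigmoid`, `dense`, `random_layer`), the 4 -> 2 -> 4 layer sizes, and the sample values are all assumptions chosen for brevity, and the weights are random and untrained.

```python
import math
import random


def sigmoid(z):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, weights, biases):
    """One fully connected layer, sigmoid(xW + b), for a single sample x."""
    return [
        sigmoid(sum(xi * weights[i][j] for i, xi in enumerate(x)) + biases[j])
        for j in range(len(biases))
    ]


def random_layer(n_in, n_out, rng):
    """Random (untrained) parameters: weights [n_in][n_out], biases [n_out]."""
    weights = [[rng.gauss(0.0, 1.0) for _ in range(n_out)] for _ in range(n_in)]
    biases = [rng.gauss(0.0, 1.0) for _ in range(n_out)]
    return weights, biases


rng = random.Random(0)
enc_w, enc_b = random_layer(4, 2, rng)  # encoder: 4 dimensions -> 2
dec_w, dec_b = random_layer(2, 4, rng)  # decoder: 2 dimensions -> 4

x = [0.0, 0.25, 0.75, 1.0]         # toy "image" with pixel values in [0, 1]
code = dense(x, enc_w, enc_b)      # compressed representation
x_hat = dense(code, dec_w, dec_b)  # reconstruction of the input

# Mean squared reconstruction error: the quantity training would reduce.
loss = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
print("code:", code)
print("reconstruction error:", loss)
```

With untrained weights the reconstruction error is arbitrary; training consists of adjusting the weights and biases so that this number shrinks.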

II. Model implementation

1. AutoEncoder

First, we implement feature compression and decompression on the MNIST dataset and visualize the decompressed data next to the original data for comparison.

Look at the code first:

import tensorflow as tf 
import numpy as np 
import matplotlib.pyplot as plt
 
# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=False) 
 
learning_rate = 0.01 
training_epochs = 10 
batch_size = 256 
display_step = 1 
examples_to_show = 10 
n_input = 784 
 
# tf Graph input (only pictures) 
X = tf.placeholder("float", [None, n_input])
 
# Store the parameters of each hidden layer in dictionaries
n_hidden_1 = 256  # number of neurons in the first encoding layer
n_hidden_2 = 128  # number of neurons in the second encoding layer
# The decoder's weights and biases mirror the encoder's in reverse order
# Each weight matrix has shape [inputs, outputs] for its layer; each bias has one entry per output unit
weights = {
    'encoder_h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'encoder_h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'decoder_h1': tf.Variable(tf.random_normal([n_hidden_2, n_hidden_1])),
    'decoder_h2': tf.Variable(tf.random_normal([n_hidden_1, n_input])),
}
biases = {
    'encoder_b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'encoder_b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'decoder_b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'decoder_b2': tf.Variable(tf.random_normal([n_input])),
}
 
# Each layer has the form sigmoid(xW + b)
# Build the encoder
def encoder(x):
    layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, weights['encoder_h1']),
                                   biases['encoder_b1']))
    layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['encoder_h2']),
                                   biases['encoder_b2']))
    return layer_2


# Build the decoder
def decoder(x):
    layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, weights['decoder_h1']),
                                   biases['decoder_b1']))
    layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['decoder_h2']),
                                   biases['decoder_b2']))
    return layer_2
 
# Modeling
encoder_op = encoder(X) 
decoder_op = decoder(encoder_op) 
 
# Predictions
y_pred = decoder_op 
y_true = X 
 
# Define cost functions and optimizers
cost = tf.reduce_mean(tf.pow(y_true - y_pred, 2))  # mean squared error
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
 
with tf.Session() as sess:
    # tf.initialize_all_variables() is no longer valid from
    # 2017-03-02 if using tensorflow >= 0.12
    if int((tf.__version__).split('.')[1]) < 12 and int((tf.__version__).split('.')[0]) < 1:
        init = tf.initialize_all_variables()
    else:
        init = tf.global_variables_initializer()
    sess.run(init)
    # Compute the number of batches per epoch so that each epoch covers the training set once
    total_batch = int(mnist.train.num_examples/batch_size)  # total number of batches
    for epoch in range(training_epochs):
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)  # max(x) = 1, min(x) = 0
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer, cost], feed_dict={X: batch_xs})
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(c))
    print("Optimization Finished!")

    # Reconstruct the first few test images: originals in the top row, reconstructions in the bottom row
    encode_decode = sess.run(
        y_pred, feed_dict={X: mnist.test.images[:examples_to_show]})
    f, a = plt.subplots(2, 10, figsize=(10, 2))
    for i in range(examples_to_show):
        a[0][i].imshow(np.reshape(mnist.test.images[i], (28, 28)))
        a[1][i].imshow(np.reshape(encode_decode[i], (28, 28)))
    plt.show()

Code Interpretation:

First, import the libraries and the dataset, and define the hyperparameters (learning rate, number of training epochs, batch size, and so on) in one place so that they are easy to find and modify later. Because the autoencoder's network is very regular, with every layer having the form xW + b, the weights W and biases b of all layers are collected in dictionaries, and the dictionary keys identify the layer each parameter belongs to.

The model is built in two parts, an encoder and a decoder. Every layer uses the Sigmoid activation function, and the decoder usually uses the same activation function as the encoder. The two parts usually mirror each other: here the encoder reduces 784 dimensions to 256 and then to 128, and the decoder correspondingly expands 128 dimensions to 256 and then back to 784.

The cost function is the mean squared error between the decoder's output and the original input, and the optimizer is AdamOptimizer. In the training phase, each epoch runs through enough batches to cover the training data once. After training, the reconstructions are visualized next to the original images, as shown below, and the degree of restoration is high. Increasing the number of training epochs or adding layers to the autoencoder can yield better reconstructions.
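The mirrored encoder and decoder shapes can be derived from a single list of layer widths. The helper below is a hypothetical illustration (`mirrored_shapes` is not part of the article's code or of TensorFlow); it only computes the weight shapes and bias sizes that the dictionaries above spell out by hand.

```python
def mirrored_shapes(layer_sizes):
    """Return (encoder, decoder) lists of (weight_shape, bias_size) pairs.

    layer_sizes lists the widths from the input down to the code,
    for example [784, 256, 128].
    """
    encoder = [
        ((n_in, n_out), n_out)
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]
    # The decoder walks the same widths in reverse order.
    reversed_sizes = layer_sizes[::-1]
    decoder = [
        ((n_in, n_out), n_out)
        for n_in, n_out in zip(reversed_sizes[:-1], reversed_sizes[1:])
    ]
    return encoder, decoder


encoder_shapes, decoder_shapes = mirrored_shapes([784, 256, 128])
for name, shapes in (("encoder", encoder_shapes), ("decoder", decoder_shapes)):
    for weight_shape, bias_size in shapes:
        print(name, "W:", weight_shape, "b:", bias_size)
```

Each weight matrix has shape (inputs, outputs) and each bias has one entry per output unit, which is why the decoder's first layer is (128, 256) while the encoder's last layer is (256, 128).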

Run results:

2. Encoder

The Encoder works on the same principle as the AutoEncoder. Here we plot the low-dimensional features produced by the encoder to show intuitively how the data cluster. Specifically, the 784-dimensional MNIST data is reduced step by step, from 784 to 128 to 64 to 10 and finally to 2 dimensions, and the result is displayed in a 2-dimensional coordinate system. The one difference is that the last layer of the encoder does not use the Sigmoid activation function; it uses the default linear activation instead, so its output ranges over (-∞, +∞) rather than (0, 1).
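The effect of dropping the Sigmoid in the last encoder layer can be seen on a few hand-picked pre-activation values. This is a small sketch in plain Python; the sample values are arbitrary assumptions, not outputs of the trained model.

```python
import math


def sigmoid(z):
    """Logistic function: output always lies strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Example values of xW + b for one unit (arbitrary, for illustration only).
pre_activations = [-8.0, -1.0, 0.0, 1.0, 8.0]

sigmoid_outputs = [sigmoid(z) for z in pre_activations]  # bounded to (0, 1)
linear_outputs = list(pre_activations)                   # passed through unchanged

for z, s, l in zip(pre_activations, sigmoid_outputs, linear_outputs):
    print("z = %5.1f   sigmoid = %.4f   linear = %5.1f" % (z, s, l))
```

With Sigmoid, every 2-D code would be confined to the unit square; the linear layer lets the points spread over the whole plane, which makes the scatter plot easier to read.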

Full Code:

import tensorflow as tf 
import matplotlib.pyplot as plt
 
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=False) 
 
learning_rate = 0.01 
training_epochs = 10 
batch_size = 256 
display_step = 1 
n_input = 784 
X = tf.placeholder("float", [None, n_input])
 
n_hidden_1 = 128 
n_hidden_2 = 64 
n_hidden_3 = 10 
n_hidden_4 = 2 
weights = {
    'encoder_h1': tf.Variable(tf.truncated_normal([n_input, n_hidden_1])),
    'encoder_h2': tf.Variable(tf.truncated_normal([n_hidden_1, n_hidden_2])),
    'encoder_h3': tf.Variable(tf.truncated_normal([n_hidden_2, n_hidden_3])),
    'encoder_h4': tf.Variable(tf.truncated_normal([n_hidden_3, n_hidden_4])),
    'decoder_h1': tf.Variable(tf.truncated_normal([n_hidden_4, n_hidden_3])),
    'decoder_h2': tf.Variable(tf.truncated_normal([n_hidden_3, n_hidden_2])),
    'decoder_h3': tf.Variable(tf.truncated_normal([n_hidden_2, n_hidden_1])),
    'decoder_h4': tf.Variable(tf.truncated_normal([n_hidden_1, n_input])),
}
biases = {
    'encoder_b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'encoder_b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'encoder_b3': tf.Variable(tf.random_normal([n_hidden_3])),
    'encoder_b4': tf.Variable(tf.random_normal([n_hidden_4])),
    'decoder_b1': tf.Variable(tf.random_normal([n_hidden_3])),
    'decoder_b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'decoder_b3': tf.Variable(tf.random_normal([n_hidden_1])),
    'decoder_b4': tf.Variable(tf.random_normal([n_input])),
}
def encoder(x):
    layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, weights['encoder_h1']),
                                   biases['encoder_b1']))
    layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['encoder_h2']),
                                   biases['encoder_b2']))
    layer_3 = tf.nn.sigmoid(tf.add(tf.matmul(layer_2, weights['encoder_h3']),
                                   biases['encoder_b3']))
    # The last encoder layer has no activation function, so the 2-D code is not confined to (0, 1)
    layer_4 = tf.add(tf.matmul(layer_3, weights['encoder_h4']),
                     biases['encoder_b4'])
    return layer_4

def decoder(x):
    layer_1 = tf.nn.sigmoid(tf.add(tf.matmul(x, weights['decoder_h1']),
                                   biases['decoder_b1']))
    layer_2 = tf.nn.sigmoid(tf.add(tf.matmul(layer_1, weights['decoder_h2']),
                                   biases['decoder_b2']))
    layer_3 = tf.nn.sigmoid(tf.add(tf.matmul(layer_2, weights['decoder_h3']),
                                   biases['decoder_b3']))
    layer_4 = tf.nn.sigmoid(tf.add(tf.matmul(layer_3, weights['decoder_h4']),
                                   biases['decoder_b4']))
    return layer_4
 
encoder_op = encoder(X) 
decoder_op = decoder(encoder_op) 
 
y_pred = decoder_op 
y_true = X 
 
cost = tf.reduce_mean(tf.pow(y_true - y_pred, 2))  # mean squared error
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
 
with tf.Session() as sess:
    # tf.initialize_all_variables() is no longer valid from
    # 2017-03-02 if using tensorflow >= 0.12
    if int((tf.__version__).split('.')[1]) < 12 and int((tf.__version__).split('.')[0]) < 1:
        init = tf.initialize_all_variables()
    else:
        init = tf.global_variables_initializer()
    sess.run(init)
    total_batch = int(mnist.train.num_examples/batch_size)
    for epoch in range(training_epochs):
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)  # max(x) = 1, min(x) = 0
            _, c = sess.run([optimizer, cost], feed_dict={X: batch_xs})
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(c))
    print("Optimization Finished!")

    # Encode the test images to 2 dimensions and color each point by its digit label
    encoder_result = sess.run(encoder_op, feed_dict={X: mnist.test.images})
    plt.scatter(encoder_result[:, 0], encoder_result[:, 1], c=mnist.test.labels)
    plt.colorbar()
    plt.show()

Results:

The results show that the 2-dimensional encoded features cluster well: each color in the figure represents one digit, and points of the same color group together.

Of course, this experiment is only a simple introduction to AutoEncoder. To obtain more discriminative features and reach the desired results, more complex autoencoder structures should be designed.

This is the whole content of this article.