The moving average maintains a shadow variable for each target variable. It does not interfere with the updates to the original variable; instead, the shadow variable is used in place of the original variable during testing or actual prediction (i.e., when not training).
1. Moving-average object initialization
`ema = tf.train.ExponentialMovingAverage(decay, num_updates)`
Parameter `decay`: each update moves the shadow variable towards the original variable:
`shadow_variable = decay * shadow_variable + (1 - decay) * variable`
Parameter `num_updates` (optional): when supplied, the effective decay used in the early training steps becomes:
`min(decay, (1 + num_updates) / (10 + num_updates))`
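As a quick illustration of how `num_updates` damps the decay early in training, here is a small plain-Python sketch of the formula above (the helper name and chosen step values are just for illustration, not part of the TensorFlow API):

```python
# Plain-Python sketch of the dynamic decay; effective_decay is an
# illustrative helper, not a TensorFlow function.
def effective_decay(decay, num_updates):
    return min(decay, (1 + num_updates) / (10 + num_updates))

for step in (0, 10, 100, 10000):
    print(step, effective_decay(0.99, step))
# 0 0.1
# 10 0.55
# 100 0.9181818181818182
# 10000 0.99
```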
2. Add/update variables
Add the target variables for which shadow variables should be maintained.

Note that maintenance is not automatic: the op returned by apply() must be run in every training step. It is therefore common to use tf.control_dependencies to bind it to train_op, so that every run of train_op also updates the shadow variables (see the sketch below).

`ema.apply([var0, var1])`
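A minimal sketch of that binding, assuming a toy one-variable loss and a plain gradient-descent optimizer (both made up for illustration; the official example in section 5 shows the same pattern):

```python
import tensorflow as tf

# Hypothetical one-variable "model", just to have something to optimize.
w = tf.Variable(1.0, name="w")
loss = tf.square(w - 3.0)

global_step = tf.Variable(0, trainable=False)
opt_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss, global_step=global_step)

ema = tf.train.ExponentialMovingAverage(0.99, num_updates=global_step)
# Bind the shadow-variable update to the optimizer step: every run of train_op
# first applies the gradients, then refreshes the shadow variables.
with tf.control_dependencies([opt_op]):
    train_op = ema.apply(tf.trainable_variables())
```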
3. Get the shadow variable value
This step is not required when building the graph; it simply fetches the value of a target variable from the set of shadow variables:
`sess.run([ema.average(var0), ema.average(var1)])`
4. Save & Load Shadow Variables
We know that in TensorFlow the moving averages of variables are maintained by shadow variables, so to get the moving average of a variable you need to read the shadow variable, not the variable itself.
Save shadow variables
Once the ExponentialMovingAverage object has been created and applied, a normal Saver.save() also stores the shadow variables; the naming convention is "v/ExponentialMovingAverage" for a variable named "v".
```python
import tensorflow as tf

if __name__ == "__main__":
    v = tf.Variable(0., name="v")
    # Set the decay coefficient of the moving-average model
    ema = tf.train.ExponentialMovingAverage(0.99)
    # Have variable v use the moving-average model; tf.all_variables() would cover all variables
    op = ema.apply([v])
    # Get the name of the variable v
    print(v.name)  # v:0
    # Create the object that saves the model
    save = tf.train.Saver()
    sess = tf.Session()
    # Initialize all variables
    init = tf.initialize_all_variables()
    sess.run(init)
    # Reassign the value of the variable v
    sess.run(tf.assign(v, 10))
    # Apply the moving-average update
    sess.run(op)
    # Save the model file
    save.save(sess, "./")
    # Output the value of the variable v before and after applying the moving average
    print(sess.run([v, ema.average(v)]))  # [10.0, 0.099999905]
```
Load shadow variables and map to variables
When loading the model, use Saver's variable-name mapping; this approach works for virtually any variable (see: Summary of TensorFlow model loading methods).
```python
v = tf.Variable(1., name="v")
# Define the Saver that loads the model, mapping the shadow-variable name to v
saver = tf.train.Saver({"v/ExponentialMovingAverage": v})
sess = tf.Session()
saver.restore(sess, "./")
print(sess.run(v))  # 0.0999999
```
One thing to note here is that the mapping passed to the Saver is {"v/ExponentialMovingAverage": v} rather than {"v": v}. With the latter you would get 10 instead of 0.0999999, because it restores the variable itself rather than its shadow variable.
The drawback of reading the model file this way is that you have to spell out a long list of variable names.
Use of the variables_to_restore function
```python
v = tf.Variable(1., name="v")
# The decay parameter of the moving-average model does not affect the restored value of v
ema = tf.train.ExponentialMovingAverage(0.99)
print(ema.variables_to_restore())
# {'v/ExponentialMovingAverage': <tf.Variable 'v:0' shape=() dtype=float32_ref>}
sess = tf.Session()
saver = tf.train.Saver(ema.variables_to_restore())
saver.restore(sess, "./")
print(sess.run(v))  # 0.0999999
```
variables_to_restore recognizes variables in the network and automatically generates shadow variable names.
By using the variables_to_restore function, the shadow variables are mapped directly onto the variables themselves when the model is loaded, so to obtain the moving average of a variable we only need to read the variable itself, not its shadow variable.
5. Official Documentation Example
In the official documentation example, every run of the update op returned by apply also trains the model once, because apply is bound after the optimizer op; in fact, this relationship can also be reversed. Examples can be found in "tf real google", p. 128.
Example usage when creating a training model:

```python
# Create variables.
var0 = tf.Variable(...)
var1 = tf.Variable(...)
# ... use the variables to build a training model...
...
# Create an op that applies the optimizer. This is what we usually
# would use as a training op.
opt_op = opt.minimize(my_loss, [var0, var1])

# Create an ExponentialMovingAverage object
ema = tf.train.ExponentialMovingAverage(decay=0.9999)

with tf.control_dependencies([opt_op]):
    # Create the shadow variables, and add ops to maintain moving averages
    # of var0 and var1. This also creates an op that will update the moving
    # averages after each training step. This is what we will use in place
    # of the usual training op.
    training_op = ema.apply([var0, var1])

...train the model by running training_op...
```
6. batch_norm example
Unlike the examples above, inside batch_norm it is not convenient to bind the moving-average update to train_op (which lives outside the function body). Instead, the update op is bound via tf.control_dependencies to the node that outputs the two statistics, so the shadow variables are refreshed whenever the batch statistics are computed during training.
```python
def batch_norm(x, beta, gamma, phase_train, scope='bn', decay=0.9, eps=1e-5):
    with tf.variable_scope(scope):
        # beta = tf.get_variable(name='beta', shape=[n_out],
        #                        initializer=tf.constant_initializer(0.0), trainable=True)
        # gamma = tf.get_variable(name='gamma', shape=[n_out],
        #                         initializer=tf.random_normal_initializer(1.0, stddev), trainable=True)
        batch_mean, batch_var = tf.nn.moments(x, [0, 1, 2], name='moments')
        ema = tf.train.ExponentialMovingAverage(decay=decay)

        def mean_var_with_update():
            ema_apply_op = ema.apply([batch_mean, batch_var])
            with tf.control_dependencies([ema_apply_op]):
                # tf.identity converts the value to a Tensor and incorporates it into the graph;
                # otherwise, since a Variable is session-independent, it would not be constrained
                # by the graph's control_dependencies
                return tf.identity(batch_mean), tf.identity(batch_var)

        mean, var = tf.cond(phase_train,
                            mean_var_with_update,
                            lambda: (ema.average(batch_mean), ema.average(batch_var)))
        normed = tf.nn.batch_normalization(x, mean, var, beta, gamma, eps)
        return normed
```
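A usage sketch of the function above; the input shape, channel count, and placeholder names are assumptions made for illustration:

```python
# Assumed NHWC input with 64 channels; phase_train switches between batch
# statistics (training) and their moving averages (inference).
x = tf.placeholder(tf.float32, [None, 32, 32, 64])
phase_train = tf.placeholder(tf.bool, name='phase_train')

beta = tf.Variable(tf.zeros([64]), name='beta')
gamma = tf.Variable(tf.ones([64]), name='gamma')

y = batch_norm(x, beta, gamma, phase_train)
# Feed {phase_train: True} during training so the moving averages are updated,
# and {phase_train: False} at test time so the stored averages are used.
```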
This is the whole content of this article.