1. Why look at gradients
For beginners, the network often fails to converge and the loss behaves strangely (it simply does not go down), which usually points to a problem with the gradients during backpropagation:
(1) The absolute value of the derivative keeps shrinking toward 0 as it propagates backward: this is gradient vanishing.
(2) The absolute value of the derivative keeps growing, becoming extremely large and divergent: this is gradient explosion.
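The chain rule makes the two cases above concrete: backpropagation multiplies one derivative factor per layer, so the product either shrinks toward 0 or blows up. A plain-Python sketch (the factor values 0.5 and 1.5 and the layer count are made up for illustration):

```python
def chained_gradient(factor, n_layers):
    # Backprop multiplies one derivative factor per layer (chain rule).
    # Factors below 1 shrink the product toward 0 (vanishing);
    # factors above 1 blow it up (explosion).
    grad = 1.0
    for _ in range(n_layers):
        grad *= factor
    return grad

print(chained_gradient(0.5, 50))  # ~8.9e-16, effectively vanished
print(chained_gradient(1.5, 50))  # ~6.4e+08, exploded
```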
So when the loss looks abnormal, check whether the gradient is exploding or vanishing. If it is exploding, the weights W in the network will also become very large; you can control this manually, for example by using a smaller scale at initialization (there are certainly other methods I don't know about; if you do, please tell me ~~). If the gradient is vanishing, you can try architectures such as ResNet or DenseNet.
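Besides the initialization trick mentioned above, a standard remedy for gradient explosion (not mentioned in the original text, added here as common practice) is gradient clipping by global norm. A minimal NumPy sketch; the gradient values are made up for illustration:

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    # Compute the joint L2 norm of all gradient tensors, and if it
    # exceeds clip_norm, scale every gradient down by the same ratio.
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if global_norm > clip_norm:
        grads = [g * (clip_norm / global_norm) for g in grads]
    return grads

grads = [np.array([30.0, 40.0])]        # global norm is 50
clipped = clip_by_global_norm(grads, 5.0)
print(clipped[0])                        # rescaled to norm 5 -> [3. 4.]
```

TensorFlow ships the same operation as `tf.clip_by_global_norm`, typically applied between computing and applying gradients.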
2. How to view gradients in TensorFlow
tf.gradients(y, x) returns the derivative of y with respect to x (dy/dx); note that x and y must be connected in the computation graph!
So tf.gradients(y_, weight1) will do~
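A minimal runnable sketch of this, assuming a tiny linear layer (the shapes, variable name weight1, and feed values are made up for illustration; written in TF1 graph style via tf.compat.v1 so it also runs under TF2):

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

# A toy graph: one linear layer and a squared loss.
x = tf.compat.v1.placeholder(tf.float32, shape=[None, 2])
weight1 = tf.Variable(tf.ones([2, 1]))
y_ = tf.matmul(x, weight1)
loss = tf.reduce_mean(tf.square(y_))

# tf.gradients(loss, weight1) returns d(loss)/d(weight1).
grads = tf.gradients(loss, weight1)

with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    g = sess.run(grads, feed_dict={x: [[1.0, 2.0]]})
    print(g)  # the gradient values, one array per variable
```

Printing the fetched gradient values like this lets you see directly whether they are collapsing toward 0 or blowing up.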
That's all I have to share about viewing gradients in TensorFlow. I hope it gives you a useful reference, and I hope you'll continue to support me.