There are many different types of layers in neural networks.
Are they physically identical? What are their topological differences?
- Fully connected layer
- Attention layer
- ...
How do other people think about model training efficiency?
Is it just number of resources x running time?
Let's investigate this and fix the target metric.
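
As a starting point, the naive "resources x running time" metric can be written down directly. This is only a sketch: the `TrainingRun` record and its fields (`n_devices`, `wall_time_s`, `final_loss`) are hypothetical names made up for illustration, and the two runs are invented numbers, not measurements.

```python
from dataclasses import dataclass


@dataclass
class TrainingRun:
    n_devices: int      # hypothetical: number of accelerators or instances used
    wall_time_s: float  # wall-clock training time in seconds
    final_loss: float   # loss reached at the end of the run


def resource_time(run: TrainingRun) -> float:
    """Naive cost: number of resources x running time, in device-seconds."""
    return run.n_devices * run.wall_time_s


# Two made-up runs that reach the same loss with different resource usage.
run_a = TrainingRun(n_devices=4, wall_time_s=100.0, final_loss=0.30)
run_b = TrainingRun(n_devices=1, wall_time_s=300.0, final_loss=0.30)

print(resource_time(run_a))  # 400.0 device-seconds
print(resource_time(run_b))  # 300.0 device-seconds
```

Under this metric run_b is cheaper even though it takes three times longer, which is one reason the target metric still needs to be fixed explicitly.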
How can I physically abstract the loss function as a global potential?
- Cross entropy
- RMSE
- ...
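
One possible way to make the "loss as potential" picture concrete is to treat the loss as a scalar potential U and the negative gradient as a force. The sketch below assumes the "position" is the prediction vector (or the logits) rather than the network weights, and it uses finite differences instead of an autodiff library; the helper name `force` is my own, not from any library.

```python
import math


def rmse(pred, target):
    """Root mean squared error between two equal-length sequences."""
    n = len(pred)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / n)


def cross_entropy(logits, label):
    """Softmax cross entropy for one sample; label is the true class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def force(potential, x, eps=1e-6):
    """Force = -grad(potential), estimated by central finite differences."""
    f = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        f.append(-(potential(xp) - potential(xm)) / (2 * eps))
    return f


# RMSE potential: the force pushes the prediction toward the target [1, 2].
f_rmse = force(lambda p: rmse(p, [1.0, 2.0]), [0.0, 0.0])

# Cross-entropy potential: the force is (one-hot - softmax) on the logits.
f_ce = force(lambda z: cross_entropy(z, 0), [0.0, 0.0, 0.0])
```

For the cross-entropy case with three equal logits, the force on the true class is about +2/3 and about -1/3 on each of the others, so the two losses give differently shaped force fields around the same target.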
What is the physical meaning of sharing weight gradients?
What happens when I apply different forms of data sharing?
Clarify the random-walk behavior when training uses
- SGD optimizer
- Adam optimizer
- ...
Moreover, can we find which optimizer makes the agent behave like an active particle?
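
A toy experiment for the random-walk question could look like the sketch below. Everything here is an assumption for illustration: a single parameter in a quadratic potential U(x) = k x^2 / 2, Gaussian noise standing in for minibatch noise, and hand-written SGD and Adam update rules (standard forms, but not taken from any particular library). The mean squared displacement (MSD) as a function of lag is one common way to tell diffusive motion (MSD grows roughly linearly) from persistent, active-particle-like motion (MSD grows roughly quadratically at short lags).

```python
import math
import random


def noisy_grad(x, rng, k=1.0, sigma=1.0):
    """Gradient of U(x) = k x^2 / 2 plus Gaussian noise (toy minibatch noise)."""
    return k * x + rng.gauss(0.0, sigma)


def run_sgd(steps, lr, rng, x0=0.0):
    """Plain SGD trajectory of a single parameter."""
    x, traj = x0, [x0]
    for _ in range(steps):
        x -= lr * noisy_grad(x, rng)
        traj.append(x)
    return traj


def run_adam(steps, lr, rng, x0=0.0, b1=0.9, b2=0.999, eps=1e-8):
    """Adam trajectory with bias-corrected first and second moments."""
    x, m, v, traj = x0, 0.0, 0.0, [x0]
    for t in range(1, steps + 1):
        g = noisy_grad(x, rng)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
        traj.append(x)
    return traj


def msd(traj, lag):
    """Time-averaged mean squared displacement at a given lag."""
    diffs = [(traj[i + lag] - traj[i]) ** 2 for i in range(len(traj) - lag)]
    return sum(diffs) / len(diffs)


sgd_traj = run_sgd(2000, 0.01, random.Random(0))
adam_traj = run_adam(2000, 0.01, random.Random(0))
for lag in (1, 10, 100):
    print(lag, msd(sgd_traj, lag), msd(adam_traj, lag))
```

Comparing how the two MSD columns scale with lag is the quantity of interest; a real study would use actual network weights and real minibatch gradients instead of this one-dimensional toy.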
To test training methods automatically, we need to build a training protocol.
It should satisfy the following:
- Connect to AWS instance / run training
- Abstraction of model and training
- Evaluate training results
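
The requirements above might be sketched as a small interface. This is a rough draft under assumptions: the names `Trainable`, `Runner`, `LocalRunner`, `Result`, and `ToyJob` are all hypothetical, and only a local runner is shown. A runner that connects to an AWS instance would be another `Runner` subclass and is deliberately left out here, since its details are not decided yet.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Protocol


class Trainable(Protocol):
    """Abstraction of model and training: anything with these two methods."""

    def train_step(self) -> float: ...
    def evaluate(self) -> float: ...


@dataclass
class Result:
    steps: int
    final_train_loss: float
    eval_loss: float


class Runner(ABC):
    """Where the job runs (local machine, remote instance, ...)."""

    @abstractmethod
    def run(self, job: Trainable, steps: int) -> Result: ...


class LocalRunner(Runner):
    """Runs the training loop in the current process."""

    def run(self, job: Trainable, steps: int) -> Result:
        loss = float("nan")
        for _ in range(steps):
            loss = job.train_step()
        return Result(steps, loss, job.evaluate())


class ToyJob:
    """Toy job: fits w to the target value 3.0 by gradient descent on (w - 3)^2."""

    def __init__(self, lr: float = 0.1) -> None:
        self.w, self.lr = 0.0, lr

    def train_step(self) -> float:
        err = self.w - 3.0
        self.w -= self.lr * 2 * err  # gradient of (w - 3)^2 is 2 (w - 3)
        return err ** 2              # loss before the update

    def evaluate(self) -> float:
        return (self.w - 3.0) ** 2


result = LocalRunner().run(ToyJob(), steps=50)
print(result)
```

Keeping the job (model and training) separate from the runner (where it executes) means the same job could be evaluated locally first and on a remote instance later without changing the job code.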