Update README.md
24
README.md
@@ -1,3 +1,25 @@
# ParallelTrainMethod

What is the best way to train models in parallel?
The most common approach is to share weight gradients between model replicas.
Is that also the most efficient approach?
This repository investigates several alternatives using physical insight.

## Introduction
When a single model is trained with the SGD optimizer, its weights move through weight space like a random walk.
The training procedure is therefore effectively a random target search with multiple targets.
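
To make the random walk picture concrete, here is a minimal sketch, not the method used in this repository. It assumes a toy double-well loss `L(w) = (w**2 - 1)**2` and models minibatch noise as Gaussian noise added to the gradient. The names `loss_grad` and `sgd_walk` are hypothetical, and treating the two minima at `w = -1` and `w = +1` as the "targets" is an interpretation, not something the text states.

```python
import random


def loss_grad(w):
    """Gradient of the toy double-well loss L(w) = (w**2 - 1)**2."""
    return 4.0 * w * (w * w - 1.0)


def sgd_walk(w0, lr=0.05, noise_std=1.0, steps=1000, seed=0):
    """Run noisy gradient descent and return the list of visited weights.

    Assumption: minibatch noise is modeled as additive Gaussian noise
    on the gradient, so each update is a drift plus a random kick.
    """
    rng = random.Random(seed)
    w = w0
    trajectory = [w]
    for _ in range(steps):
        noisy_grad = loss_grad(w) + rng.gauss(0.0, noise_std)
        w -= lr * noisy_grad
        trajectory.append(w)
    return trajectory


if __name__ == "__main__":
    traj = sgd_walk(w0=0.0, noise_std=2.0)
    # The final weight shows which minimum the walker ended up near.
    print("final weight:", traj[-1])
```

With `noise_std=0.0` the walker simply slides into the nearest minimum. With noise it wanders and may hop between the two wells, which is the search behavior described above.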

In parallel training, sharing weight gradients can be treated as an interaction between agents.
The problem then becomes a random target search by multiple interacting agents.
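
The coupling introduced by gradient sharing can be sketched as follows. This is an illustration under assumptions, not this repository's implementation. Each agent draws its own noisy gradient on the same toy loss `L(w) = (w**2 - 1)**2`, and all agents then apply the mean gradient. The names `shared_gradient_step` and `train_parallel` are hypothetical.

```python
import random


def loss_grad(w):
    """Gradient of the toy loss L(w) = (w**2 - 1)**2 (an illustrative choice)."""
    return 4.0 * w * (w * w - 1.0)


def shared_gradient_step(weights, lr, noise_std, rng):
    """Apply one synchronous step in which every agent uses the mean gradient."""
    # Each agent draws its own noisy gradient, mimicking a private minibatch.
    grads = [loss_grad(w) + rng.gauss(0.0, noise_std) for w in weights]
    mean_grad = sum(grads) / len(grads)
    # Sharing couples the agents: every weight moves by the same amount.
    return [w - lr * mean_grad for w in weights]


def train_parallel(n_agents=4, w0=0.0, lr=0.05, noise_std=2.0, steps=500, seed=0):
    """Run n_agents replicas from the same start and return their final weights."""
    rng = random.Random(seed)
    weights = [w0] * n_agents
    for _ in range(steps):
        weights = shared_gradient_step(weights, lr, noise_std, rng)
    return weights


if __name__ == "__main__":
    print(train_parallel())
```

Because every agent receives the same update, agents that start at the same weights stay identical. Averaging `n_agents` independent noise terms also reduces the noise standard deviation by a factor of `sqrt(n_agents)`, so in this picture gradient sharing amounts to a single walker with weaker noise.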

In general, the search behavior depends on:
- Domain: topology of the weight space
- Random walk behavior: Gaussian, Lévy walk, active particle, etc.
- Potential
  - Global: loss function
  - Pairwise: interaction between agents
- Other
  - Initial configuration
  - Restart rate
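
Two of the variables listed above, the random walk behavior and the restart rate, can be illustrated with a small one-dimensional search. This is a rough sketch under assumptions. The heavy-tailed step uses Pareto-distributed jump lengths as a stand-in for Lévy-type motion, and a restart resets the walker to its initial position. All names (`gaussian_step`, `heavy_tailed_step`, `search_time`) and default values are hypothetical.

```python
import random


def gaussian_step(rng, scale=1.0):
    """Draw a step from a normal distribution (ordinary diffusion)."""
    return rng.gauss(0.0, scale)


def heavy_tailed_step(rng, alpha=1.5, scale=1.0):
    """Draw a signed step whose length follows a Pareto law (always >= scale)."""
    length = scale * rng.paretovariate(alpha)
    return length if rng.random() < 0.5 else -length


def search_time(step_fn, target=10.0, radius=1.0, restart_rate=0.0,
                x0=0.0, max_steps=100_000, seed=0):
    """Return the number of steps until the walker lands within radius of target.

    Returns None if the target is not found within max_steps. Only the
    landing point is checked, so a long jump can pass over the target.
    """
    rng = random.Random(seed)
    x = x0
    for t in range(1, max_steps + 1):
        if rng.random() < restart_rate:
            x = x0  # stochastic restart to the initial configuration
        else:
            x += step_fn(rng)
        if abs(x - target) <= radius:
            return t
    return None


if __name__ == "__main__":
    for name, fn in [("gaussian", gaussian_step), ("heavy-tailed", heavy_tailed_step)]:
        print(name, search_time(fn, restart_rate=0.01))
```

Averaging `search_time` over many seeds for different step functions and restart rates gives one possible measure of search efficiency.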

I will investigate how these variables affect training efficiency.