# ParallelTrainMethod

What is the best way to train models in parallel?
The most common way to train a model in parallel is to share weight gradients.
Is this also the most efficient way?
This repository explores several alternatives, guided by physical insight.

## Introduction

When a single model is trained with the SGD optimizer, its weights move like a random walk.
The training procedure is therefore effectively a random search with multiple targets.

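As a minimal sketch of this picture (not the repository's code), the snippet below treats SGD as gradient descent with Gaussian gradient noise on a toy one-dimensional double-well loss. The loss, the noise model, and the names `grad_loss`, `sgd_first_passage`, `sigma`, and `tol` are all assumptions made for illustration. The two minima play the role of targets, and the function reports when the weight first comes close to one of them.

```python
import random

TARGETS = (-1.0, 1.0)  # minima of the toy loss below (assumed for illustration)


def grad_loss(w):
    # Gradient of the toy double-well loss L(w) = (w**2 - 1)**2.
    return 4.0 * w * (w * w - 1.0)


def sgd_first_passage(w0=0.0, lr=0.01, sigma=5.0, tol=0.05,
                      max_steps=20_000, seed=0):
    """Return (step, target) for the first target reached, or (None, None)."""
    rng = random.Random(seed)
    w = w0
    for step in range(max_steps):
        for t in TARGETS:
            if abs(w - t) < tol:
                return step, t
        # Mini-batch noise is modelled as additive Gaussian noise on the gradient.
        noisy_grad = grad_loss(w) + sigma * rng.gauss(0.0, 1.0)
        w -= lr * noisy_grad
    return None, None


if __name__ == "__main__":
    step, target = sgd_first_passage()
    print(f"reached target {target} after {step} steps")
```

Averaging the returned step count over many seeds would give one possible estimate of search efficiency for a single agent.
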
When models are trained in parallel, the shared weight gradient can be treated as an interaction between agents.
In this case, the problem becomes a random target search by multiple interacting agents.

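One hedged way to make the interaction concrete is to let every agent mix its own noisy gradient with the gradient averaged over all agents. The `coupling` parameter, the toy double-well loss, and the function names below are hypothetical and chosen only for this sketch. With `coupling=1.0` and identical initial weights, the agents stay identical, which mirrors ordinary gradient averaging; with `coupling=0.0` they search independently.

```python
import random


def grad_loss(w):
    # Toy double-well loss L(w) = (w**2 - 1)**2; its minima at -1 and +1 act as targets.
    return 4.0 * w * (w * w - 1.0)


def run_agents(init, coupling, lr=0.01, sigma=5.0, tol=0.05,
               max_steps=20_000, seed=0):
    """Return (first_hit_step or None, final weights) for interacting agents."""
    rng = random.Random(seed)
    weights = list(init)
    for step in range(max_steps):
        # Stop as soon as any agent is within tol of a target.
        if any(min(abs(w - 1.0), abs(w + 1.0)) < tol for w in weights):
            return step, weights
        # Each agent sees its own noisy gradient (Gaussian noise is an assumption).
        grads = [grad_loss(w) + sigma * rng.gauss(0.0, 1.0) for w in weights]
        mean_grad = sum(grads) / len(grads)
        # coupling = 0: independent agents; coupling = 1: all apply the averaged gradient.
        weights = [w - lr * ((1.0 - coupling) * g + coupling * mean_grad)
                   for w, g in zip(weights, grads)]
    return None, weights


if __name__ == "__main__":
    for c in (0.0, 0.5, 1.0):
        hit, _ = run_agents([0.0] * 4, coupling=c)
        print(f"coupling={c}: first hit at step {hit}")
```

Comparing first-hit steps across `coupling` values and seeds is one way to ask whether full gradient sharing is the most efficient interaction.
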
In general, the search behavior depends on:

- Domain: topology of the weight space
- Random walk behavior: Gaussian, Lévy walk, active particle, etc.
- Potential
  - Global: loss function
  - Pairwise: interaction between agents
  - Etc.
- Initial configuration
- Restart rate

I will investigate how these variables affect the efficiency of model training.
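A few of these variables can be exposed as parameters of a toy search, sketched below under stated assumptions: a potential-free walker on a one-dimensional ring (a simple closed domain), Gaussian or heavy-tailed steps, and a per-step restart probability. The function `search_steps` and all of its parameters are hypothetical, and the Pareto-distributed jump is only a crude stand-in for a Lévy walk.

```python
import random


def search_steps(step_kind="gaussian", restart_rate=0.0, x0=0.0, target=3.0,
                 ring_length=20.0, scale=0.1, tol=0.1,
                 max_steps=50_000, seed=0):
    """Steps until a walker on a ring first lands within tol of target, else None."""
    rng = random.Random(seed)
    x = x0 % ring_length
    for step in range(max_steps):
        # Shortest distance to the target along the ring.
        d = abs(x - target) % ring_length
        if min(d, ring_length - d) < tol:
            return step
        if rng.random() < restart_rate:
            # Restart: jump back to the initial configuration.
            x = x0 % ring_length
            continue
        if step_kind == "gaussian":
            dx = scale * rng.gauss(0.0, 1.0)
        elif step_kind == "levy":
            # Pareto-distributed jump length with a random sign (heavy-tailed).
            dx = scale * rng.paretovariate(1.5) * rng.choice((-1.0, 1.0))
        else:
            raise ValueError(f"unknown step_kind: {step_kind}")
        x = (x + dx) % ring_length
    return None


if __name__ == "__main__":
    for kind in ("gaussian", "levy"):
        for rate in (0.0, 0.001):
            print(kind, rate, search_steps(kind, restart_rate=rate))
```

A global potential or a pairwise interaction could be added as an extra drift term in `dx`; both are left out here to keep the sketch short.
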