# ParallelTrainMethod

What is the best way to train a model in parallel?
The most common approach is to share weight gradients across workers.
Is this the most efficient approach?
This repository investigates several alternatives using physical insight.

## Introduction
When a single model is trained with an SGD optimizer, its weights move like a random walk.
The training procedure is therefore effectively a random target search with multiple targets.

When models are trained in parallel, sharing weight gradients can be treated as an interaction between agents.
The problem then becomes a random target search by multiple interacting agents.
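
To make this picture concrete, here is a minimal sketch, not the method used in this repository. It assumes a hypothetical 1D loss with several local minima (the "targets"), Gaussian gradient noise standing in for minibatch noise, and a hypothetical `coupling` parameter that mixes each agent's own gradient with the average over all agents. All names (`loss`, `grad`, `run_agents`, `coupling`) are illustrative assumptions.

```python
import math
import random


def loss(x):
    # Hypothetical 1D loss with several local minima (the search "targets").
    return math.sin(3.0 * x) + 0.1 * x * x


def grad(x):
    # Analytic derivative of the hypothetical loss above.
    return 3.0 * math.cos(3.0 * x) + 0.2 * x


def run_agents(n_agents=4, steps=500, lr=0.01, noise=1.0, coupling=0.0, seed=0):
    """Noisy gradient descent for several agents.

    coupling = 0.0: independent agents, each using only its own noisy gradient.
    coupling = 1.0: every agent applies the same averaged gradient
                    (an assumed toy model of gradient sharing).
    Returns the initial and final positions.
    """
    rng = random.Random(seed)
    x0 = [rng.uniform(-4.0, 4.0) for _ in range(n_agents)]
    x = list(x0)
    for _ in range(steps):
        # Each agent sees its own noisy gradient (noise mimics minibatch sampling).
        g = [grad(xi) + noise * rng.gauss(0.0, 1.0) for xi in x]
        g_mean = sum(g) / n_agents
        # Mix the agent's own gradient with the shared average.
        x = [
            xi - lr * ((1.0 - coupling) * gi + coupling * g_mean)
            for xi, gi in zip(x, g)
        ]
    return x0, x


if __name__ == "__main__":
    for c in (0.0, 0.5, 1.0):
        _, final = run_agents(coupling=c)
        print(c, [round(loss(xi), 3) for xi in final])
```

In this toy model, `coupling=1.0` moves all agents by the same amount at every step, so their relative positions never change, while `coupling=0.0` gives independent random searchers. Intermediate values are one possible way to explore the interaction strength.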

In general, the search behavior depends on:
- Domain: topology of the weight space
- Random walk behavior: Gaussian, Lévy walk, active particle, etc.
- Potential
    - Global: loss function
    - Pairwise: interaction between agents
- Other
    - Initial configuration
    - Restart rate
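
Two of the variables above, the random walk behavior and the restart rate, can be illustrated with a short sketch. It assumes a 1D domain, a single target at a hypothetical position, and no potential. The heavy-tailed steps are drawn from a Pareto distribution as a simple stand-in for Lévy-type motion (strictly a Lévy flight with instantaneous jumps, not a Lévy walk). The names `step`, `search`, `restart_rate`, and all default values are assumptions for illustration.

```python
import random


def step(rng, kind="gaussian", scale=1.0, alpha=1.5):
    """Draw one displacement for the chosen random-walk behavior."""
    if kind == "gaussian":
        return rng.gauss(0.0, scale)
    if kind == "levy":
        # Heavy-tailed step length (Pareto tail with exponent alpha),
        # applied in a random direction.
        length = scale * rng.paretovariate(alpha)
        return length if rng.random() < 0.5 else -length
    raise ValueError(f"unknown walk kind: {kind}")


def search(target=5.0, radius=0.5, start=0.0, kind="gaussian",
           restart_rate=0.0, max_steps=10_000, seed=0):
    """Return the number of steps needed to hit the target, or None.

    At each step the walker is reset to `start` with probability
    `restart_rate`; otherwise it takes one random step.
    """
    rng = random.Random(seed)
    x = start
    for t in range(1, max_steps + 1):
        if rng.random() < restart_rate:
            x = start  # stochastic restart to the initial configuration
        else:
            x += step(rng, kind)
        if abs(x - target) <= radius:
            return t
    return None


if __name__ == "__main__":
    for kind in ("gaussian", "levy"):
        for r in (0.0, 0.01):
            print(kind, r, search(kind=kind, restart_rate=r))
```

Averaging the returned hitting time over many seeds would give a rough measure of search efficiency for each combination of walk type and restart rate.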

I will investigate how the efficiency of model training depends on these variables.