
Commit 7856b40

add basic info on training the models
1 parent adae90b commit 7856b40

File tree

  • ImageNet/training_scripts/imagenet_training

1 file changed: +29 -0 lines changed

ImageNet/training_scripts/imagenet_training/README.md

Lines changed: 29 additions & 0 deletions
@@ -12,6 +12,35 @@ The root folder of the repository contains reference train, validation, and infe
### training Code

The code here is licensed Apache 2.0. I've taken care to make sure any third party code included or adapted has compatible (permissive) licenses such as MIT, BSD, etc. I've made an effort to avoid any GPL / LGPL conflicts. That said, it is your responsibility to ensure you comply with licenses here and conditions of any dependent licenses. Where applicable, I've linked the sources/references for various components in docstrings. If you think I've missed anything please create an issue.

### How to train the models

All of the simplenet variants have been trained in the same way, using the same basic training regime; the only differences between them are:

1. different weight decay,
2. different dropout rates, or
3. disabling label smoothing.

Since I only had access to a single GPU, and training these models takes a huge amount of time, I had to come up with a plan to shorten that time.
To this end, I first train a variant without any dropout, with a specified weight decay,
then periodically save checkpoints (aside from the top 10 best checkpoints) so that when a model plateaus, I can resume
from a recent checkpoint with dropout, or a slightly different weight decay. This is not ideal, but it works well enough when you have no access to decent hardware.

The models are trained like this:
e.g., let's try to train the simplenetv1_5m_m2 variant; we start by:
```
./distributed_train.sh 1 /media/hossein/SSD_IMG/ImageNet_DataSet/ --model simplenet --netidx 0 --netscale 1.0 --netmode 2 -b 256 --sched step --epochs 900 --decay-epochs 1 --decay-rate 0.981 --opt rmsproptf --opt-eps .001 -j 20 --warmup-lr 1e-3 --weight-decay 0.00003 --drop 0.0 --amp --lr .0195 --pin-mem --channels-last --model-ema --model-ema-decay 0.9999
```
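
The step schedule above (`--sched step --decay-epochs 1 --decay-rate 0.981`) multiplies the learning rate by the decay rate once per epoch, so the LR decays exponentially from the base `--lr` of 0.0195. As a rough illustration only (ignoring the `--warmup-lr` phase):

```
# Rough sketch of the effective LR under the flags above (warmup ignored);
# the step scheduler applies the decay once every --decay-epochs epochs.
base_lr, decay_rate, decay_epochs = 0.0195, 0.981, 1

def lr_at_epoch(epoch: int) -> float:
    return base_lr * decay_rate ** (epoch // decay_epochs)

for e in (0, 100, 251, 500, 899):
    print(f"epoch {e:3d}: lr ~= {lr_at_epoch(e):.6f}")
```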

Then, when you see signs of overfitting, you resume from a recent checkpoint with dropout (in this case I resumed from epoch 251) with these changes (slightly lowered weight decay, added dropout, and removed label smoothing):
```
./distributed_train.sh 1 /media/hossein/SSD_IMG/ImageNet_DataSet/ --model simplenet --netidx 0 --netscale 1.0 --netmode 2 -b 256 --sched step --epochs 900 --decay-epochs 1 --decay-rate 0.981 --opt rmsproptf --opt-eps .002 -j 20 --warmup-lr 1e-3 --weight-decay 0.00002 --drop 0.0 --amp --lr .0195 --pin-mem --channels-last --model-ema --model-ema-decay 0.9999 --resume output/train/20221204-092911-simpnet-224_simplenetv1_2_netmode2_wd3e-5/checkpoint-251.pth\ \(copy\).tar --drop-rates '{"11":0.02,"12":0.05,"13":0.05}' --smoothing 0.0
```
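
The `--drop-rates` argument takes a JSON string mapping layer indices to dropout probabilities. Purely as an illustrative sketch (the fork's actual implementation may differ, and the `model.features` layout is an assumption), applying such a mapping by hand could look like:

```
# Hypothetical illustration only: set per-layer dropout probabilities from a
# JSON mapping like the one passed via --drop-rates above.
import json
import torch.nn as nn

def apply_drop_rates(model: nn.Module, drop_rates_json: str) -> None:
    rates = json.loads(drop_rates_json)
    for idx, p in rates.items():
        # assumes the blocks live in an indexable `model.features` container
        for m in model.features[int(idx)].modules():
            if isinstance(m, (nn.Dropout, nn.Dropout2d)):
                m.p = float(p)

# apply_drop_rates(model, '{"11": 0.02, "12": 0.05, "13": 0.05}')
```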

Then we take the average of some of the best checkpoints so far, and if we are not satisfied, we can resume with the averaged weights and train more. We achieved 71.936 this way.
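
Checkpoint averaging here just means taking the element-wise mean of the weights from several saved checkpoints (the timm codebase these scripts are based on also ships an `avg_checkpoints.py` helper for this). A minimal sketch, assuming the checkpoints store their weights under a `state_dict` key and with placeholder paths:

```
# Minimal weight-averaging sketch; checkpoint paths are placeholders.
import torch

paths = ["checkpoint-300.pth.tar", "checkpoint-310.pth.tar", "checkpoint-320.pth.tar"]

avg_state = None
for path in paths:
    state = torch.load(path, map_location="cpu")["state_dict"]
    if avg_state is None:
        avg_state = {k: v.clone().float() for k, v in state.items()}
    else:
        for k, v in state.items():
            avg_state[k] += v.float()

avg_state = {k: v / len(paths) for k, v in avg_state.items()}
torch.save({"state_dict": avg_state}, "checkpoint-averaged.pth.tar")
```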

Final notes:

- All variants are trained using the same training regime, with a batch size of 256, on a single GPU.
- The small variants, such as the 1.5m ones, are trained with a weight decay of 1e-5; larger models usually use 2e-5 or 3e-5, depending on whether dropout is used or not.

With decent hardware one should hopefully be able to achieve higher accuracy. If time permits, I'll try to improve upon the results.

## Citing

### BibTeX
