### Training Code
The code here is licensed under Apache 2.0. I've taken care to make sure any third-party code included or adapted has a compatible (permissive) license such as MIT, BSD, etc., and I've made an effort to avoid any GPL/LGPL conflicts. That said, it is your responsibility to ensure you comply with the licenses here and with the conditions of any dependent licenses. Where applicable, I've linked the sources/references for various components in docstrings. If you think I've missed anything, please create an issue.
### How to train the models:
All of the simplenet variants have been trained in the same way, using the same basic training regime. The only differences between them, as sketched below the list, are:
1. a different weight decay,
2. a different dropout rate, or
3. label smoothing disabled.
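
For concreteness, here is a rough sketch of how those three knobs would look as command-line flags, assuming a timm-style `train.py` (the flag names are an assumption on my part, not verified against this repo):

```bash
# Hypothetical flag mapping, assuming a timm-style train.py (an assumption,
# not taken from this repo's docs):
#   weight decay    -> --weight-decay 2e-5
#   dropout rate    -> --drop 0.05
#   label smoothing -> --smoothing 0.1   (pass --smoothing 0.0 to disable it)
```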
Since I only had access to a single GPU, and training these models takes a huge amount of time, I had to come up with a plan to shorten that time. To this end, I first train a variant without any dropout and with a specific weight decay, and periodically save checkpoints (in addition to the top-10 best checkpoints) so that when the model plateaus, I can resume from a recent checkpoint with dropout enabled, or with a slightly different weight decay. This is not ideal, but it works well enough when you have no access to decent hardware.
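
As a rough illustration of the periodic-checkpoint idea: a small shell loop that snapshots the rolling `last.pth.tar` once an hour, so that epochs which have rotated out of the top-10 list stay available to resume from. The run directory and file name are assumptions based on timm's checkpoint saver, not taken from this repo.

```bash
# Minimal sketch, assuming the saver writes a rolling last.pth.tar into the
# run directory (as timm's CheckpointSaver does); all paths are hypothetical.
mkdir -p snapshots
while true; do
  cp ./output/train/simplenetv1_5m_m2/last.pth.tar \
    "snapshots/last-$(date +%Y%m%d-%H%M).pth.tar"
  sleep 3600  # one snapshot per hour
done
```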
The models are trained like this:
For example, let's try to train the simplenetv1_5m_m2 variant. We start by running the training script:
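
The exact command isn't reproduced here; the following is a hedged sketch assuming a timm-style `train.py`, with the dataset path and hyperparameter values as placeholders rather than the settings actually used:

```bash
# Hypothetical starting command (flag names follow timm's train.py; the values
# are illustrative placeholders, not the settings used for the released weights)
python train.py /path/to/imagenet \
  --model simplenetv1_5m_m2 -b 256 \
  --weight-decay 2e-5 --drop 0.0 --smoothing 0.1
```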
Then, when you see signs of overfitting, you resume from a recent checkpoint with dropout enabled (in this case I resumed from epoch 251), with a slightly lowered weight decay, dropout added, and label smoothing removed:
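
Again as a hedged sketch (the checkpoint file name pattern and the exact values are assumptions; only the epoch number, 251, comes from the text above):

```bash
# Hypothetical resume command reflecting the described changes: slightly lower
# weight decay, dropout added, label smoothing turned off. The checkpoint path
# assumes timm's checkpoint-<epoch>.pth.tar naming.
python train.py /path/to/imagenet \
  --model simplenetv1_5m_m2 -b 256 \
  --resume ./output/train/simplenetv1_5m_m2/checkpoint-251.pth.tar \
  --weight-decay 1e-5 --drop 0.05 --smoothing 0.0
```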
Then we take the average of some of the best checkpoints so far; if we are not satisfied, we can resume from the averaged weights and train some more. We achieved 71.936 top-1 this way.
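
A sketch of the averaging step, assuming timm's `avg_checkpoints.py` script (or an equivalent) is available; the paths and the `-n` value are placeholders:

```bash
# Hypothetical averaging step, assuming timm's avg_checkpoints.py is available:
# it averages the weights of the n best checkpoints into a single file.
python avg_checkpoints.py \
  --input ./output/train/simplenetv1_5m_m2 \
  --output simplenetv1_5m_m2_avg.pth -n 10
```

To continue training from the averaged weights, something like timm's `--initial-checkpoint` (which loads weights only, without optimizer state) would be the natural fit, again assuming timm conventions.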
Final notes:
- All variants are trained using the same training regime, with a batch size of 256, on a single GPU.
- The small variants, such as the 1.5m ones, are trained with a weight decay of 1e-5; larger models usually use 2e-5 or 3e-5, depending on whether dropout is used or not.
With decent hardware, one should hopefully be able to achieve higher accuracy. If time permits, I'll try to improve upon these results.