GoogLeNet (Inception V1) is a convolutional neural network designed for efficient image classification. It uses the Inception module to process multiple filter sizes in parallel, improving feature extraction while keeping computation low.
- Inception modules combine 1×1, 3×3, 5×5 convolutions and pooling in parallel
- Uses 1×1 convolutions and global average pooling to reduce computation and parameters
- Designed to achieve high accuracy with efficient use of resources
Key Features of GoogLeNet
1. 1×1 Convolutions
GoogLeNet uses 1×1 convolutions mainly for dimensionality reduction, which reduces computation and the number of trainable parameters while preserving important features.
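Example comparison (a 5×5 convolution producing 48 output channels from a 14×14×480 input, with 16 channels in the 1×1 reduction):
- Without 1×1 convolution: (14×14×48)×(5×5×480) = 112.9M operations

- With 1×1 convolution: (14×14×16)×(1×1×480) + (14×14×48)×(5×5×16) = 5.3M operations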

This is roughly a 21× reduction in computation for this layer, with little or no loss in accuracy.
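The arithmetic in the comparison above can be checked with a few lines of Python. This is only a sketch: `conv_ops` is a hypothetical helper that counts multiplications in the same way as the formulas above (output positions × output channels × kernel area × input channels) and ignores bias terms and additions.

```python
def conv_ops(out_h, out_w, out_c, kernel, in_c):
    """Multiplications for a kernel x kernel convolution that produces
    an out_h x out_w x out_c output from in_c input channels."""
    return (out_h * out_w * out_c) * (kernel * kernel * in_c)


# Direct 5x5 convolution: 480 input channels -> 48 output channels.
direct = conv_ops(14, 14, 48, 5, 480)

# 1x1 reduction to 16 channels, then the 5x5 convolution on those 16.
reduced = conv_ops(14, 14, 16, 1, 480) + conv_ops(14, 14, 48, 5, 16)

print(f"direct 5x5:       {direct:,}")                 # 112,896,000
print(f"1x1 then 5x5:     {reduced:,}")                # 5,268,480
print(f"reduction factor: {direct / reduced:.1f}x")    # 21.4x
```

Most of the saving comes from the 5×5 convolution seeing 16 input channels instead of 480.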
2. Global Average Pooling
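Instead of stacking large fully connected layers on top of the final feature maps, GoogLeNet uses Global Average Pooling, which averages each feature map into a single value. Only a single linear classification layer follows the pooled vector.
- Eliminates a large number of parameters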
- Reduces overfitting
- Improves generalization and accuracy
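A minimal pure-Python sketch of global average pooling follows, together with a rough parameter comparison. The function name `global_average_pool` and the toy input are illustrative only. The 7×7×1024 feature-map size and the 1000 classes are assumptions taken from the original GoogLeNet design for ImageNet, not from this article, and the "flatten" variant is a hypothetical alternative shown only for contrast.

```python
def global_average_pool(feature_maps):
    """Average each H x W feature map into a single value.

    feature_maps: list of channels, each a list of rows of floats.
    Returns one float per channel.
    """
    pooled = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Toy input: 2 channels, each 2x2.
maps = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
print(global_average_pool(maps))  # [2.5, 2.0]

# Classifier weights (biases ignored) for an assumed 7x7x1024 final feature map
# and 1000 classes.
h, w, c, classes = 7, 7, 1024, 1000
flatten_fc_weights = h * w * c * classes   # flatten, then fully connected
gap_fc_weights = c * classes               # global average pool, then linear

print(f"{flatten_fc_weights:,} vs {gap_fc_weights:,}")  # 50,176,000 vs 1,024,000
```

Under these assumptions the pooled classifier needs 49 times fewer weights, because the spatial dimensions are averaged away before the linear layer.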
3. Inception Module
The Inception module is the core building block of GoogLeNet. It applies multiple operations in parallel:
- 1×1 convolutions
- 3×3 convolutions
- 5×5 convolutions
- 3×3 max pooling
All outputs are concatenated to capture multi-scale features efficiently without increasing computation significantly.
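The concatenation can be made concrete by tracking channel counts and multiply-accumulates per branch. The sketch below uses the filter counts of the Inception (3a) module as listed in the original GoogLeNet paper's architecture table (192 input channels on a 28×28 map); those numbers are not given in this article, and the class and function names are hypothetical. It assumes every branch preserves the spatial size, so the outputs can be stacked along the channel axis.

```python
from dataclasses import dataclass


@dataclass
class InceptionConfig:
    in_channels: int
    c1x1: int          # 1x1 branch
    c3x3_reduce: int   # 1x1 reduction before the 3x3 convolution
    c3x3: int
    c5x5_reduce: int   # 1x1 reduction before the 5x5 convolution
    c5x5: int
    pool_proj: int     # 1x1 projection after the 3x3 max pooling


def output_channels(cfg):
    """Channels after concatenating the four branch outputs."""
    return cfg.c1x1 + cfg.c3x3 + cfg.c5x5 + cfg.pool_proj


def branch_macs(cfg, h, w):
    """Multiply-accumulates per branch on an h x w map (pooling itself is free)."""
    positions = h * w
    return {
        "1x1": positions * cfg.in_channels * cfg.c1x1,
        "3x3": positions * (cfg.in_channels * cfg.c3x3_reduce
                            + 9 * cfg.c3x3_reduce * cfg.c3x3),
        "5x5": positions * (cfg.in_channels * cfg.c5x5_reduce
                            + 25 * cfg.c5x5_reduce * cfg.c5x5),
        "pool": positions * cfg.in_channels * cfg.pool_proj,
    }


# Assumed Inception (3a) configuration.
inception_3a = InceptionConfig(192, 64, 96, 128, 16, 32, 32)
macs = branch_macs(inception_3a, 28, 28)

print(output_channels(inception_3a))   # 256
print(f"{sum(macs.values()):,}")       # 128,049,152
```

The 3×3 branch dominates the cost here, which is why its 1×1 reduction matters most.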

4. Auxiliary Classifiers
To reduce vanishing gradient problems, GoogLeNet uses auxiliary classifiers during training.
Each classifier includes:
- Average pooling
- 1×1 convolution
- Fully connected layers
- Softmax output
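These inject additional gradient signal into the middle of the network, which helps stabilize training and acts as a regularizer. They are used only during training and are removed at inference time.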
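One way to picture how the auxiliary classifiers contribute is through the training loss. The sketch below adds the auxiliary cross-entropy losses to the main loss with a weight of 0.3, the value reported in the original GoogLeNet paper (it is not stated in this article). The helper names are hypothetical, and the pure-Python softmax is for illustration only.

```python
import math

AUX_WEIGHT = 0.3  # assumption: weight from the original paper


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability of the target class."""
    return -math.log(softmax(logits)[target])


def training_loss(main_logits, aux_logits_list, target, aux_weight=AUX_WEIGHT):
    """Main loss plus down-weighted auxiliary losses (training only)."""
    main = cross_entropy(main_logits, target)
    aux = sum(cross_entropy(a, target) for a in aux_logits_list)
    return main + aux_weight * aux


def inference_scores(main_logits):
    """At inference only the main classifier is used; auxiliary heads are dropped."""
    return softmax(main_logits)


# Toy 3-class example with one main head and two auxiliary heads.
loss = training_loss([2.0, 0.5, 0.1], [[1.0, 0.2, 0.3], [0.8, 0.9, 0.1]], target=0)
print(round(loss, 4))
```

Because the auxiliary terms only enter the loss, removing the heads after training changes nothing about the main network's predictions.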
5. Model Architecture
GoogLeNet is 22 layers deep when counting only layers with parameters (27 if pooling layers are included). The design emphasizes computational efficiency, making it feasible to run even on hardware with limited resources. The layer-by-layer structure is outlined below.

The architecture also contains two auxiliary classifiers, connected to the outputs of the Inception (4a) and Inception (4d) modules.
Inception V1 architecture
- Input Layer: Accepts a 224×224 RGB image
- Initial Convolutions and Pooling: Applies convolution and max pooling layers to extract low-level features and reduce spatial dimensions
- Local Response Normalization (LRN): Normalizes feature maps early to improve generalization
- Inception Modules: Apply 1×1, 3×3, 5×5 convolutions and 3×3 max pooling in parallel, then concatenate outputs to capture multi-scale features
- Auxiliary Classifiers: Intermediate branches with pooling, convolutions, fully connected layers, and softmax used to improve training stability
- Final Layers: Uses global average pooling followed by a fully connected layer and softmax for final classification
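The way the spatial resolution shrinks through the stages listed above can be traced with the standard output-size formula, floor((n + 2p − k) / s) + 1. In the sketch below the kernel sizes and strides follow the original GoogLeNet paper, while the padding values are assumptions chosen so that each strided stage halves the map cleanly; actual implementations may use different padding or ceil-mode pooling to get the same sizes. Inception modules are treated as size-preserving.

```python
def out_size(n, kernel, stride, pad=0):
    """Output width/height of a convolution or pooling layer."""
    return (n + 2 * pad - kernel) // stride + 1


# (stage name, kernel, stride, padding); None marks size-preserving stages.
stages = [
    ("conv 7x7/2", 7, 2, 3),
    ("max pool 3x3/2", 3, 2, 1),
    ("conv 3x3/1 (after 1x1 reduction)", 3, 1, 1),
    ("max pool 3x3/2", 3, 2, 1),
    ("inception 3a-3b", None, None, None),
    ("max pool 3x3/2", 3, 2, 1),
    ("inception 4a-4e", None, None, None),
    ("max pool 3x3/2", 3, 2, 1),
    ("inception 5a-5b", None, None, None),
    ("global avg pool 7x7/1", 7, 1, 0),
]

size = 224  # input is a 224x224 RGB image
sizes = [size]
for name, kernel, stride, pad in stages:
    if kernel is not None:
        size = out_size(size, kernel, stride, pad)
        sizes.append(size)
    print(f"{name:<34} -> {size}x{size}")

print(sizes)  # [224, 112, 56, 56, 28, 14, 7, 1]
```

The two auxiliary classifiers branch off inside the 14×14 stage, and the final 1×1 output of the average pooling feeds the linear layer and softmax.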
Performance and Results
- Winner of ILSVRC 2014 in both classification and detection tasks
- Achieved a top-5 error rate of 6.67% in image classification
- An ensemble of six GoogLeNet models achieved 43.9% mAP (mean Average Precision) on the ImageNet detection task