Question: If you have 10 filters that are 3×3×3 in one layer of a neural network, how many parameters does that layer have?

Answer: each filter has 3×3×3 = 27 weights plus one bias, so the layer has (27 + 1) × 10 = 280 parameters. In general, if layer l is a convolution layer with n_c filters of size f×f applied to an input with n_c_prev channels, it has (f × f × n_c_prev + 1) × n_c parameters.
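As a quick sanity check, the same count can be read off a one-layer keras model (a minimal sketch; the 32×32 spatial size is an arbitrary assumption, since only the channel count affects the parameter count):

```r
library(keras)

# One conv layer: 10 filters of size 3x3 over a 3-channel input
m <- keras_model_sequential() %>%
  layer_conv_2d(filters = 10, kernel_size = c(3, 3),
                input_shape = c(32, 32, 3))

count_params(m)  # (3 * 3 * 3 + 1) * 10 = 280
```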
LeNet-5 (LeCun et al., 1998. Gradient-Based Learning Applied to Document Recognition):

| Layer | Activation Shape | Activation Size | # Parameters |
|---|---|---|---|
| Input | (32, 32, 1) | 1024 | 0 |
| CONV1 (f=5, s=1) | (28, 28, 6) | 4704 | (5×5×1+1)×6 = 156 |
| POOL1 (f=2, s=2) | (14, 14, 6) | 1176 | 0 |
| CONV2 (f=5, s=1) | (10, 10, 16) | 1600 | (5×5×6+1)×16 = 2416 |
| POOL2 (f=2, s=2) | (5, 5, 16) | 400 | 0 |
| FC3 | (120, 1) | 120 | 400×120+120 = 48120 |
| FC4 | (84, 1) | 84 | 120×84+84 = 10164 |
| Softmax | (10, 1) | 10 | 84×10+10 = 850 |
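The architecture in the table can be rebuilt in keras to verify the counts (a sketch: average pooling and tanh activations are assumptions in the spirit of the original LeNet-5, and neither affects the parameter counts):

```r
library(keras)

lenet <- keras_model_sequential() %>%
  layer_conv_2d(filters = 6, kernel_size = c(5, 5), activation = "tanh",
                input_shape = c(32, 32, 1)) %>%       # CONV1: 156 params
  layer_average_pooling_2d(pool_size = c(2, 2)) %>%   # POOL1: 0 params
  layer_conv_2d(filters = 16, kernel_size = c(5, 5),
                activation = "tanh") %>%              # CONV2: 2416 params
  layer_average_pooling_2d(pool_size = c(2, 2)) %>%   # POOL2: 0 params
  layer_flatten() %>%                                 # (5, 5, 16) -> 400
  layer_dense(units = 120, activation = "tanh") %>%   # FC3: 48120 params
  layer_dense(units = 84, activation = "tanh") %>%    # FC4: 10164 params
  layer_dense(units = 10, activation = "softmax")     # Softmax: 850 params

summary(lenet)  # per-layer parameter counts match the table
```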
The three building blocks of a CNN:
- Convolution
- Pooling
- Fully Connected
Typical keras workflow: define the model structure, compile it, and then train it by calling the fit() method of your model.

```r
library(keras)

# Assumes input_shape, num_classes, batch_size, epochs, and the training
# arrays x_train / y_train have been defined beforehand

# Define model structure
cnn_model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3),
                activation = "relu", input_shape = input_shape) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
  layer_dropout(rate = 0.25) %>%
  layer_flatten() %>%
  layer_dense(units = 128, activation = "relu") %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = num_classes, activation = "softmax")

# Compile model
cnn_model %>% compile(
  loss = loss_categorical_crossentropy,
  optimizer = optimizer_adadelta(),
  metrics = c("accuracy")
)

# Train model, holding out 20% of the training data for validation
cnn_history <- cnn_model %>%
  fit(
    x_train, y_train,
    batch_size = batch_size,
    epochs = epochs,
    validation_split = 0.2
  )
```
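Once fit() returns, the training history can be plotted and the model evaluated (a sketch assuming a held-out x_test / y_test pair exists):

```r
plot(cnn_history)                       # loss and accuracy curves per epoch

cnn_model %>% evaluate(x_test, y_test)  # test-set loss and accuracy
```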
Classic CNN architectures:
- LeNet-5: LeCun et al., 1998. Gradient-Based Learning Applied to Document Recognition
- AlexNet: Krizhevsky et al., 2012. ImageNet Classification with Deep Convolutional Neural Networks
- VGG-16: Simonyan & Zisserman, 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition
- ResNets: He et al., 2015. Deep Residual Learning for Image Recognition
Compute cost of neural architecture search (NAS) methods:
- NASNet: 1800 GPU days (≈5 years on a single GPU)
- AmoebaNet: 3150 GPU days
- DARTS: 4 GPU days
- ENAS: roughly 1000× cheaper than standard NAS