Update the model structure and goals
I added a new method for evaluating and loading the data, a new processing method, and the variational method, trying to simplify understanding and aiming for a more confident and stable method for the results.
Co-Authored-By: Dong Han <dong.han@uconn.edu>
Showing 3 changed files with 2,188 additions and 8 deletions.
b3b4b4d
This decoder implementation builds a convolutional neural network that maps a latent representation to an image by gradually upsampling the spatial dimensions while reducing the number of channels.
Some key points:
It first maps the latent code to a hidden representation using a fully connected layer. This acts as the "bottleneck" that compresses information.
It then reshapes the hidden representation into a 3D tensor and uses transposed convolutions (a form of learned upsampling) to increase the spatial dimensions. Each transposed conv gradually increases the (width, height) while reducing the number of channels.
The final layers upsample to the desired (width, height) output, and event_dims treats the channel dimension as probabilistic event dimensions.
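A minimal sketch of such a decoder, assuming a 32-dimensional latent code and a 1x64x64 output (the layer sizes here are illustrative, not the ones in this commit):

import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Map a latent vector back to an image by gradual upsampling (illustrative sizes)."""
    def __init__(self, latent_dim=32):
        super().__init__()
        # Fully connected "bottleneck": latent code -> hidden representation
        self.fc = nn.Linear(latent_dim, 128 * 4 * 4)
        self.net = nn.Sequential(
            # Each transposed conv doubles the spatial size while reducing channels.
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),  # 4x4 -> 8x8
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),   # 8x8 -> 16x16
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, kernel_size=4, stride=2, padding=1),   # 16x16 -> 32x32
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=4, stride=2, padding=1),    # 32x32 -> 64x64
        )

    def forward(self, z):
        h = self.fc(z)             # (batch, 128*4*4)
        h = h.view(-1, 128, 4, 4)  # reshape into a 3D feature map
        return self.net(h)         # (batch, 1, 64, 64)

z = torch.randn(8, 32)
print(Decoder()(z).shape)  # torch.Size([8, 1, 64, 64])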
This overall structure is commonly used in variational autoencoders (VAEs) to map a compressed latent representation back to the original input distribution. Some relevant papers that utilize similar architectures:
Auto-Encoding Variational Bayes (https://arxiv.org/abs/1312.6114) - Original VAE paper
Neural Discrete Representation Learning (https://arxiv.org/abs/1711.00937) - Uses transposed convolutions in decoder
Understanding disentangling in β-VAE (https://arxiv.org/abs/1804.03599)
ReLU Activations
The ReLU activation function is applied after each transposed convolution, for example:
nn.ConvTranspose2d(128, 64, 5, 2),
nn.ReLU(),
The ReLU simply thresholds the activations from the previous layer at 0, so:
ReLU(x) = max(0, x)
This introduces non-linearity, which helps the model learn more complex mappings from the latent representation to the image. Without the ReLU (or another activation function), the model would be limited to linear transformations, reducing its representational capacity.
The non-linearity also helps the network reconstruct more accurate images and propagate useful gradients during training.
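A quick demonstration of the thresholding (hypothetical input values):

import torch

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])
print(torch.relu(x))  # tensor([0.0000, 0.0000, 0.0000, 0.5000, 2.0000])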
Transposed Convolutions
The transposed convolution layers upsample the spatial dimensions using learned convolution filters. Some key points:
They upsample and expand the image's spatial size in a learnable way, through optimized convolution kernels.
Each transposed conv gradually increases the (width, height) while reducing the number of channels, to reach the final image size.
Parameters like kernel size, stride, and padding determine the exact upsampling factor.
So the transposed convolutions provide an efficient way to learn an upsampling pathway from the compressed latent representation, increasing the spatial dimensions until the final image is generated.
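For reference, with PyTorch's defaults (output_padding=0, dilation=1), the output size of a transposed convolution is out = (in - 1) * stride - 2 * padding + kernel_size. A quick check against the layer shown above:

import torch
import torch.nn as nn

# nn.ConvTranspose2d(128, 64, 5, 2) with the default padding=0:
# out = (in - 1) * 2 + 5, so a 7x7 feature map becomes 17x17.
layer = nn.ConvTranspose2d(128, 64, kernel_size=5, stride=2)
x = torch.randn(1, 128, 7, 7)
print(layer(x).shape)  # torch.Size([1, 64, 17, 17])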
NOTE: I don't use residual layers. I know those allow deeper networks, but our signal is not complex enough to require more layers. Maybe I am doing something wrong, but I'll note it here.
b3b4b4d
I see. I did not know the importance of the ReLU layer and forgot to add it after each BatchNorm layer. I will ask Darren to try it tomorrow @dac20022.
Dear Darren @dac20022 ,
Please save the training loss of the models with ReLU everywhere right after the BN layer and compare the training losses. For each of the six training loss lines, we should have a ReLU version of it, and please plot the training loss curves if possible.
https://github.uconn.edu/dac20022/Pulsewatch_autoencoder/blob/Cassey/Time_Frequency/plot_Losslists.m
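On the Python side, a minimal sketch of overlaying the two sets of curves (the file names and variant labels here are hypothetical; the linked plot_Losslists.m script is the MATLAB counterpart):

import matplotlib.pyplot as plt
import numpy as np

# Hypothetical: one per-epoch loss array per model variant, e.g. saved during
# training with np.save("loss_baseline.npy", np.array(loss_list)).
variants = {
    "baseline": np.load("loss_baseline.npy"),
    "ReLU after BN": np.load("loss_relu.npy"),
}

for name, losses in variants.items():
    plt.plot(losses, label=name)
plt.xlabel("Epoch")
plt.ylabel("Training loss")
plt.legend()
plt.savefig("loss_curves.png")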
ResNet V2 model structure with a ReLU layer everywhere:
https://arxiv.org/abs/1603.05027
https://github.com/pytorch/vision/blob/c1e2095c3a16fbe7db25b9e2f206025488c2c203/torchvision/models/resnet.py#L94C11-L94C11
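The pre-activation ordering from that paper (BN -> ReLU -> conv inside each residual branch) looks roughly like this; a sketch for reference, not the torchvision code:

import torch.nn as nn

class PreActBlock(nn.Module):
    """ResNet V2 "pre-activation" residual block: BN -> ReLU -> conv, twice."""
    def __init__(self, channels):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.conv1(self.relu(self.bn1(x)))
        out = self.conv2(self.relu(self.bn2(out)))
        return out + x  # identity shortcut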