ARM-Stator-Vanes-Deliverables

Pre-Training Image Preprocessing

Before training, raw images must be masked and cropped to extract the in-focus subregions. This can be done quickly on a batch of raw images with the included Batch_Bitmask_Images.m script, which requires MATLAB release R2019b or later.

The Batch_Bitmask_Images.m script takes 4 inputs, defined at the top of the script:

  • bmsk_input_dir defines the folder location where the bitmasks are located. Each bitmask should be saved as a .JPG file by its bitmask identifying number in the following format: POSE(bmskID).JPG
  • img_input_dir defines the folder location where the raw images to be processed are located. Each image should be saved as a .JPG file with its corresponding region and part identifying numbers in the following format: part_(partID)_subregion_(regionID).JPG
  • poses_in_waypoints is a cell array that defines the mapping between regions and waypoints. Each element of the cell array contains a numeric array of regions that are captured from the waypoint (camera/part pose) of the associated cell array index. For example, {[2], [1 3]} indicates that subregion 2 is captured from waypoint 1, and subregions 1 and 3 are captured from waypoint 2.
  • img_output_dir defines the folder location where the masked and cropped images will be deposited after processing.
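The poses_in_waypoints mapping can be easy to misread, so here is an illustrative sketch (in Python, purely for explanation; the actual script is MATLAB) that inverts it to show which waypoint each subregion comes from:

```python
# Illustrative Python sketch of the poses_in_waypoints mapping used by the
# MATLAB script; not part of the deliverables.

def region_to_waypoint(poses_in_waypoints):
    """Invert the waypoint -> regions mapping into region -> waypoint."""
    mapping = {}
    for waypoint, regions in enumerate(poses_in_waypoints, start=1):
        for region in regions:
            mapping[region] = waypoint
    return mapping

# The example from the text: {[2], [1 3]} in MATLAB cell-array notation.
print(region_to_waypoint([[2], [1, 3]]))  # {2: 1, 1: 2, 3: 2}
```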

After configuring the input parameters above at the top of the Batch_Bitmask_Images.m script, run the script to process all of the raw images for training.
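Conceptually, the mask-and-crop step zeroes out pixels outside the bitmask and crops to the mask's bounding box. A minimal NumPy sketch of that idea, assuming a binary bitmask (this is an illustration, not the MATLAB implementation):

```python
import numpy as np

# Illustrative mask-and-crop: black out pixels outside the bitmask, then
# crop to the bitmask's bounding box. Assumes a boolean (H, W) bitmask.

def mask_and_crop(image, bitmask):
    masked = np.where(bitmask, image, 0)   # zero everything outside the mask
    rows = np.any(bitmask, axis=1)
    cols = np.any(bitmask, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    return masked[r0:r1 + 1, c0:c1 + 1]    # tight crop around the subregion

# Tiny made-up example: a 4x4 image with a 2x2 in-focus subregion.
img = np.arange(16).reshape(4, 4)
msk = np.zeros((4, 4), dtype=bool)
msk[1:3, 1:3] = True
print(mask_and_crop(img, msk).shape)  # (2, 2)
```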

Initial bitmask creation uses raw images captured from a part whose subregions have been physically partitioned and marked. Bitmasks can be created with the GIMP image editing software; more detailed instructions are included in Bitmask Creation Procedure.pdf.

Training the Neural Network

Running Environment

  • numpy, pandas, tensorflow, matplotlib, csv, and datetime must be importable before running the code (csv and datetime are part of the Python standard library).
  • Run 'pip install tensorflow' in a terminal to install TensorFlow, and install the other third-party libraries the same way.
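A quick way to confirm the environment before training (a convenience sketch, not part of the package):

```python
# Convenience check, before training, that the required libraries can be
# imported. Standard-library modules (csv, datetime) are always available;
# only the third-party packages normally need installing via pip.

def missing_libraries(names):
    """Return the subset of names that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing

# Standard-library modules should never be reported missing:
print(missing_libraries(["csv", "datetime"]))  # []
```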

Configuration Steps

For training, the data loader reads a folder containing all the images and a .csv file containing all the labels.

  • Define the paths to the image folder and the .csv file
# folder path of images
DATADIR = '/content/grad/My Drive/Colab Notebooks/GKN456_0119_EN'
# file path of csv file
csv_path = '/content/grad/My Drive/Colab Notebooks/GKN456_EN.csv'
  • Set the destination folder where the training results and trained models will be saved
save_dir = '/content/grad/My Drive/Colab Notebooks'
  • Fine-tune the training parameters
    The two main parameters to adjust are the minibatch size and the maximum number of epochs. The default minibatch size is 64, but it can be adjusted based on the number of training images; the recommended ratio of minibatch size to number of training images is 3:20. The default maximum is 30 epochs; if the training plot has not converged by the maximum epoch, increase the maximum gradually.
  1. batch_size (lines 144 & 153)
batch_size=64
  2. max_epochs (line 169)
history = Model_Training(train_data_generator,valid_data_generator,150)
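The 3:20 ratio above can be expressed as a small helper (hypothetical; the training script itself hard-codes batch_size=64):

```python
# Hypothetical helper applying the recommended 3:20 ratio between the
# minibatch size and the number of training images, capped at the default 64.

def suggested_batch_size(n_images, ratio=3 / 20, cap=64):
    return max(1, min(cap, round(n_images * ratio)))

print(suggested_batch_size(200))   # 30
print(suggested_batch_size(1000))  # capped at 64
```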

Input & Output

  • Naming Convention
    The most important rule is that every image filename must be unique. A single unique number per image is sufficient, e.g. 5069.jpg. A descriptive name such as Part01_section19.jpg is unnecessary and risks filename collisions across different datasets. Instead, document the part number and section number in detail in the .csv file.
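For example, a manifest could be built this way (a hypothetical helper; the column names and starting ID are illustrative assumptions):

```python
# Hypothetical sketch of the naming convention: each image gets a unique
# numeric filename, and the part/section metadata lives in the CSV instead.

def build_manifest(records, start_id=5001):
    """records: iterable of (part_no, section_no, label) tuples."""
    return [{"filename": f"{i}.jpg", "part": part,
             "section": section, "label": label}
            for i, (part, section, label) in enumerate(records, start=start_id)]

manifest = build_manifest([(1, 19, "good"), (1, 20, "bad")])
print(manifest[0]["filename"])  # 5001.jpg
```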

  • Data File

Data files consist of two components: the image folder and the .csv file.

  1. Image Folder

The image folder contains all the images, good and bad, each with a unique filename.

  2. .csv file

The .csv file documents the part information, including the dataset number, part number, section number, filename, and label.

The most important fields for training are the filename and the label. They are referenced in the code, so make sure the names in the .csv file are accurate.

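A quick sanity check on the label file might look like this (the sample rows are made up; only the filename and label column names come from the text):

```python
import csv
import io

# Made-up sample rows; in practice the reader would open the real .csv file.
sample = "filename,label\n5069.jpg,good\n5070.jpg,bad\n"

rows = list(csv.DictReader(io.StringIO(sample)))
# Every row must carry both a filename and a label before training starts.
assert all(r["filename"] and r["label"] for r in rows)
print([r["label"] for r in rows])  # ['good', 'bad']
```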

  • Training Data and Validation Data
  1. In the training process, the training and validation splits are preset by the 5-fold cross-validation training to make the best use of the existing data, so all of the data can be placed in one folder for training.

  2. To validate data from another source with the pretrained model, apply the trained network only to that validation data to test its performance.

Run the Python file ImageInspector.py to validate the trained models with data from another source.

  • Trained Model and Training Results
  1. The trained models are saved as a model file and a weight file, under network_models:

    • model file: model1.json weight file: model_weight1.h5
    • model file: model2.json weight file: model_weight2.h5
    • model file: model3.json weight file: model_weight3.h5
    • model file: model4.json weight file: model_weight4.h5
    • model file: model5.json weight file: model_weight5.h5
  2. The training results are saved as a .csv file showing the prediction of each of the 5 models, and the corresponding probability, for every image.
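Since five models are produced, their per-image predictions can be combined, for example by a simple majority vote (an illustrative sketch; the package's own aggregation may differ):

```python
from collections import Counter

def ensemble_vote(predictions):
    """Majority vote over one label per model for a single image."""
    return Counter(predictions).most_common(1)[0][0]

# Five models voting on one image (made-up labels: 1 = defect, 0 = good).
print(ensemble_vote([1, 1, 0, 1, 0]))  # 1
```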

  • Testing with the Trained Models
    Once training produces the trained models, they can be tested with all of the model files using test.py, which has been integrated into the software package. Some instructions for preliminary validation follow.

Define the file path of the saved model in config.yml and the file path of the test images in test.py.

To clarify: this section describes optional extra preliminary testing with the trained models; it is not a required step. In practice, testing is integrated into the software package, and you only need to replace the old trained model files with the newly trained ones.
