Matching images using Image Hashing

This is a brief post of my notes describing how I matched similar images in an archive of photographs. I am using the techniques described by Adrian Rosebrock in his excellent article Image hashing with OpenCV and Python. The images used are from the Pompeii Artistic Landscape Project and provided courtesy of Pompeii in Pictures.

Image hashing matches images by reducing each image to a single number that represents a very simplified form of it, like the one below.

Original image before image hashing. Images courtesy of Pompeii in Pictures.

First, the color of the image is simplified. The image is converted to grayscale. See below:

Image converted to grayscale. Images courtesy of Pompeii in Pictures.

Next, the image is simplified by size: it is resized to 9 pixels wide by 8 pixels high.

Image resized to 9 pixels wide by 8 pixels high. The green and red rectangles are relevant to describe the next step.

Adrian Rosebrock uses a difference hash based on brightness to create a 64-bit binary number. Each bit is 1 or 0. Two horizontally adjacent pixels are compared: if the right pixel is brighter than the left, the bit is 1; otherwise it is 0. See below:

The result of the image hash for the image above. The 1 inside the green square is the result of comparing the 2 pixels in the green rectangle in the picture above. The same is true for the 0 inside the red square: inside the red rectangle two images above, the pixel on the left is brighter, so the result is 0.

This process produces a 64-bit binary number: 0101001100000011101001111000101110011101000011110000001001000011

This converts to decimal 5981808948155449923.

Matches

A match of copies of an image.
An interesting match of similar images.

References

Dunn, Jackie and Bob Dunn. Pompeii in Pictures.

Rosebrock, Adrian. Image hashing with OpenCV and Python.

Building a Wall Construction Detection Model with Keras.

I am building a project to detect wall construction types from images of Pompeii. I am using Waleed Abdulla’s Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow. I also follow the technique Jason Brownlee describes in How to Train an Object Detection Model with Keras, where he detects kangaroos in images. Instead of kangaroos, I want to detect the type of construction used in building walls.

This is a brief post describing the preparation of images for training as well as the initial results. The images used are from the Pompeii Artistic Landscape Project and provided courtesy of Pompeii in Pictures. The original images were photographed by Buzz Ferebee and they have been altered by the program used for predictions. An example of an image showing the model’s detection of construction type opus incertum is below. Cinzia Presti created the data used to select the images for training.

The red rectangles note the model’s prediction of opus incertum as a wall construction type. Image courtesy of Pompeii in Pictures. Originally photographed by Buzz Ferebee.

To build this model, I selected images for training. Because the construction type is visible in only parts of each image, rectangles in each image mark where the construction type appears.

Image showing areas designated for training the model to detect opus incertum. File name: 00096.jpg. Image courtesy of Pompeii in Pictures. Originally photographed by Buzz Ferebee.

Each image has a corresponding XML file containing the coordinates of the rectangles that enclose the objects to train on. See file 00096.xml below:

<annotation>
	<folder>opus_incertum</folder>
	<filename>00096.jpg</filename>
	<path>/home/student/data_5000_project/data/images/construction_types/raw/opus_incertum/pompeiiinpictures Ferebee 20600 May 2016 DSCN8319.JPG</path>
	<source>
		<database>pompeiiinpictures.com</database>
	</source>
	<size>
		<width>1024</width>
		<height>768</height>
		<depth>3</depth>
	</size>
	<segmented>0</segmented>
	<object>
		<name>opus_incertum</name>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>16</xmin>
			<ymin>579</ymin>
			<xmax>257</xmax>
			<ymax>758</ymax>
		</bndbox>
	</object>
	<object>
		<name>opus_incertum</name>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>507</xmin>
			<ymin>563</ymin>
			<xmax>703</xmax>
			<ymax>749</ymax>
		</bndbox>
	</object>
	<object>
		<name>opus_incertum</name>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>829</xmin>
			<ymin>570</ymin>
			<xmax>1007</xmax>
			<ymax>752</ymax>
		</bndbox>
	</object>
</annotation>

The program that creates the XML annotation files also saves each image with a standard numeric file name (e.g. 00001.jpg) and a width of 1024 pixels.

Initial Results

The “Actual” column of images below shows images used in training the model. The white rectangles show the bounding boxes contained in each image’s corresponding XML file. Some images have no white rectangle: I judged that they did not contain a good enough sample for training, so I made no XML file for them.

The “Predicted” column shows what the model considers to be opus incertum construction. It is frequently correct, but it also makes errors: in row 5, for example, the blue sky is recognized as stonework. I want to see if further training can correct this.

A couple of things to note: it’s bad practice to run a model on the images used to train it, but I am doing so here only to verify it functions. Later, I also need to see how the model performs on images with no opus incertum.

References

Abdulla, Waleed. Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow. GitHub repository. Github, 2017. https://github.com/matterport/Mask_RCNN.

Brownlee, Jason. How to Train an Object Detection Model with Keras. Machine Learning Mastery. https://machinelearningmastery.com/how-to-train-an-object-detection-model-with-keras/.

Dunn, Jackie and Bob Dunn. Pompeii in Pictures.

Ferebee, Buzz. Pompeii Photographic Archive. 2015-2017.

Presti, Cinzia. Image Classification Workspace.

Using Box.com’s API to get images.

This is a note about how I connected to Box.com using its API so that a Python program could download images and metadata.

Details of the API are here: https://developer.box.com/en/guides/authentication/oauth2/with-sdk/

To connect to Box, I needed to make an app. See the link “My Apps” in the SDK link above.

Click My Apps

Create a new app.

Click Create New App

Give your app a name.

I used OAuth 2.0 Authentication

I used standard OAuth 2.0. You will need your Client ID and Client Secret later in this process. Protect this information and don’t put it directly in your code.
I used my website as the Redirect URI and limited the Scope to read only.

I put the client_id and client_secret values into a json file that looks like this:

{
"client_id":"ryyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy",
"client_secret":"Vzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
}

Here is the code to connect:

!pip install boxsdk
!pip install auth
!pip install redis
!pip install mysql.connector
!pip install requests

from boxsdk import OAuth2

import json
#Set the file we want to use for authenticating a Box app
#The json file stores the client_id and client_secret so we don't have it in the code.
# The json file looks like this:
#{
#"client_id":"___the_codes_for_client_id___",
#"client_secret":"___the_codes_for_client_secret___"
#}

oauth_settings_file = 'C:\\ProgramData\\box_app_test.json'
with open(oauth_settings_file, "r") as read_file:
    oauth_data = json.load(read_file)
print(oauth_data["client_id"])
print(oauth_data["client_secret"])

oauth = OAuth2(
    client_id=oauth_data["client_id"],
    client_secret=oauth_data["client_secret"]
)

auth_url, csrf_token = oauth.get_authorization_url('https://jeffblackadar.ca')
print("Click on this:")
print(auth_url)
print(csrf_token)
print("Copy the code that follows code= in the URL.  Paste it into the oauth.authenticate('___the_code___') below.  Be quick, the code lasts only a few seconds.")

I ran the code above in a Jupyter notebook. The output is:

ryyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
Vzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
Click on this:
https://account.box.com/api/oauth2/authorize?state=box_csrf_token_Qcccccccccccccccccccccc&response_type=code&client_id=ryyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy&redirect_uri=https%3A%2F%2Fjeffblackadar.ca
box_csrf_token_Qcccccccccccccccccccccc
Copy the code that follows code= in the URL.  Paste it into the oauth.authenticate('the_code') below.  Be quick, the code lasts only a few seconds.

When the URL above is clicked, you are sent to the Redirect URI set earlier. But first you must authenticate with Box.com using your password, to make sure only authorized users can read your content.

Log in with the user ID that has access to your content.
Click Grant access to Box
Copy the code (just the code) and paste it into the Python program below.

Paste the code above into the statement below. You need to work quickly: the code is valid for only a few seconds. There is a better way to do this, but this is what works at the moment; please let me know of improvements.

from boxsdk import Client

# Make sure that the csrf token you get from the `state` parameter
# in the final redirect URI is the same token you get from the
# get_authorization_url method to protect against CSRF vulnerabilities.
#assert 'THE_CSRF_TOKEN_YOU_GOT' == csrf_token
access_token, refresh_token = oauth.authenticate('qzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz')
client = Client(oauth)

Then run this test. It will list all of the files in the folders on Box.com.

def process_subfolder_test(client, folder_id, folder_name):
    print("this folder: "+folder_name)
    items = client.folder(folder_id=folder_id).get_items()
    for item in items:
        print('{0} {1} is named "{2}"'.format(item.type.capitalize(), item.id, item.name))
        if(item.type.capitalize()=="Folder"):
            process_subfolder_test(client, item.id,folder_name+"/"+item.name)
        if(item.type.capitalize()=="File"):
            #print(item)
            print('File: {0} is named: "{1}" path: {2} '.format(item.id, item.name, folder_name+"/"+item.name))            
    return

process_subfolder_test(client, '0',"")

Here is the test output:

this folder: 
Folder 98208868103 is named "lop"
this folder: /lop
Folder 98436941432 is named "1963"
this folder: /lop/1963
File 588118649408 is named "Elizabeth II young 2019-08-10 15_41_20.591925.jpg"
File: 588118649408 is named: "Elizabeth II young 2019-08-10 15_41_20.591925.jpg" path: /lop/1963/Elizabeth II young 2019-08-10 15_41_20.591925.jpg 
File 588114839194 is named "Elizabeth II young 2019-08-10 15_41_52.188758.jpg"
File: 588114839194 is named: "Elizabeth II young 2019-08-10 15_41_52.188758.jpg" path: /lop/1963/Elizabeth II young 2019-08-10 15_41_52.188758.jpg 
File 587019307270 is named "eII2900.png"
File: 587019307270 is named: "eII2900.png" path: /lop/eII2900.png 
File 587019495720 is named "eII2901.png"
File: 587019495720 is named: "eII2901.png" path: /lop/eII2901.png 
File 587019193229 is named "eII2903.png"
File: 587019193229 is named: "eII2903.png" path: /lop/eII2903.png 

Performance of model parameters

I processed 1458 models in this spreadsheet (see models tab). As mentioned in my previous post, these are the parameters:

  • model_number – the identification number
  • batch_size – the size of the batch. 8 or 16
  • filters1 – the number of filters for layer 1 (possible values 32, 64 or 128), used as model.add(Conv2D(filters=filters1, ...))
  • dropout1 – dropout for layer 1, added only if greater than 0 (possible values 0, 0.25, 0.5):

if(dropout1>0):
    model.add(Dropout(dropout1))

  • filters2 – the number of filters for layer 2. (32,64 or 128)
  • dropout2 – dropout for layer 2. (0,0.25,0.5)
  • filters3 – the number of filters for layer 3. (32,64 or 128)
  • dropout3 – dropout for layer 3. (0,0.25,0.5)
  • loss – the loss recorded after running the model.
  • accuracy – the accuracy recorded after running the model.
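As a check, 1458 is exactly the number of combinations of the values above: 2 batch sizes, and 3 filter counts plus 3 dropout values for each of three layers (2 × 3⁶ = 1458). A hypothetical reconstruction of the nested loops that generate the grid:

```python
import itertools

batch_sizes = [8, 16]
filter_counts = [32, 64, 128]
dropouts = [0, 0.25, 0.5]

rows = []
model_number = 1  # the first generated model is numbered 2
# one row per combination of batch size, filters and dropout per layer
for bs, f1, d1, f2, d2, f3, d3 in itertools.product(
        batch_sizes, filter_counts, dropouts,
        filter_counts, dropouts, filter_counts, dropouts):
    model_number += 1
    rows.append([model_number, bs, f1, d1, f2, d2, f3, d3])

print(len(rows))                # 1458
print(rows[0][0], rows[-1][0])  # 2 1459
```

The model numbers 2 through 1459 match the ranges in the heatmap captions below.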

A review of the spreadsheet shows that many of the models I ran have poor accuracy, as low as chance: 1 in 3 (0.333333333333333) for predicting a match among three coin obverse portraits (Elizabeth II, George VI and Abraham Lincoln). I did find some models with an accuracy above 80%, yet I wanted to see if there were patterns I could use to improve my set of models, so I plotted Seaborn heatmaps of the models (below) for batch sizes of 8, 16 and both together.

Heatmap of models 2 – 730 (batch size = 8). There is a slightly negative relationship between accuracy and dropout1. It is possible it would be more efficient to use dropout values of 0 or 0.25 and not 0.5.
Heatmap of models 731 – 1459 (batch size = 16).
Heatmap of models 2 – 1459 (batch sizes of 8,16).

The last two rows of the heatmap, which relate loss and accuracy to the model parameters, show a slightly negative relationship between accuracy and dropout1. It may be more efficient to use dropout values of 0 or 0.25, not 0.5, when running these models again. There also seems to be a slightly positive relationship between batch size and accuracy, possibly indicating that larger batch sizes lead to more accurate models. I have been running a set of models with a batch size of 32 to see if this pattern becomes stronger (same spreadsheet, models tab). I am also going to validate my approach through additional personal (not machine) learning.
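The relationships read off the heatmap come from a correlation matrix of the models sheet. Here is a toy sketch with made-up numbers that mirror the patterns described above (the real data is in the spreadsheet):

```python
import pandas as pd

# toy stand-in for the models sheet: dropout1 hurts accuracy,
# larger batches help slightly
df = pd.DataFrame({
    "batch_size": [8, 8, 16, 16],
    "dropout1":   [0.0, 0.5, 0.0, 0.5],
    "accuracy":   [0.80, 0.55, 0.85, 0.60],
})
corr = df.corr()
print(corr["accuracy"])

# the heatmap itself is then a single Seaborn call:
# import seaborn as sns
# sns.heatmap(corr, annot=True, cmap="coolwarm", center=0)
```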

Regularizing the image recognition model.

In Deep Learning with Python, François Chollet provides a “universal workflow of machine learning” (Chapter 4, page 114). I have been using his steps to seek the best performing image recognition model. I tried iterations of various models with different numbers of layers, filters and dropouts. An example of a model that did not provide a satisfactory level of accuracy is below.

def createModel5fail2():

    #tried kernel_size=(5,5), 

    from keras import models
    model = models.Sequential()    

    model.add(Conv2D(filters=32, 
               kernel_size=(5,5), 
               strides=(1,1),
               padding='same',
               input_shape=(image_width, image_height,NB_CHANNELS),
               data_format='channels_last'))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2,2),
                     strides=2))

    model.add(Conv2D(filters=64,
               kernel_size=(5,5),
               strides=(1,1),
               padding='valid'))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2,2),
                     strides=2))
    
    model.add(Flatten())        
    model.add(Dense(128))
    model.add(Activation('relu'))

    model.add(Dropout(0.25))
    
    #number of classes
    # 1,0,0 E-II
    # 0,1,0 G-VI
    # 0,0,1 Lincoln
    model.add(Dense(3, activation='softmax'))

    return model 

In order to be more methodical and to record my results, I added a spreadsheet of model parameters (see models tab). These are the parameters:

  • model_number – the identification number
  • batch_size – the size of the batch. 8 or 16
  • filters1 – the number of filters for layer 1, used as model.add(Conv2D(filters=filters1, ...))
  • dropout1 – dropout for layer 1, added only if greater than 0:

if(dropout1>0):
    model.add(Dropout(dropout1))

  • filters2 – the number of filters for layer 2.
  • dropout2 – dropout for layer 2.
  • filters3 – the number of filters for layer 3.
  • dropout3 – dropout for layer 3.
  • loss – the loss recorded after running the model.
  • accuracy – the accuracy recorded after running the model.

The code to create the spreadsheet of parameters is here. (It’s just nested loops.) Below is the code to create a model from parameters fed from the spreadsheet. In the course of writing up this post, I found 2 bugs in the code below that are now corrected. Because of the bugs I need to re-run my results.

def createModelFromSpreadsheet():

    from keras import models
    model = models.Sequential()
    

    model.add(Conv2D(filters=filters1, 
               kernel_size=(2,2), 
               strides=(1,1),
               padding='same',
               input_shape=(image_width, image_height,NB_CHANNELS),
               data_format='channels_last'))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2,2),
                     strides=2))
    if(dropout1>0):
        model.add(Dropout(dropout1))
    
    model.add(Conv2D(filters=filters2,
               kernel_size=(2,2),
               strides=(1,1),
               padding='valid'))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2,2),
                     strides=2))

    if(dropout2>0):
        model.add(Dropout(dropout2))

    if(filters3>0): 
        model.add(Conv2D(filters=filters3,
               kernel_size=(2,2),
               strides=(1,1),
               padding='valid'))
        model.add(Activation('relu'))
        model.add(MaxPooling2D(pool_size=(2,2),
                     strides=2))

        if(dropout3>0):
            model.add(Dropout(dropout3))

    
    model.add(Flatten())        
    model.add(Dense(512))
    model.add(Activation('relu'))
    model.add(Dropout(0.25))
    
    #number of classes
    # 1,0,0 E-II
    # 0,1,0 G-VI
    # 0,0,1 Lincoln
    model.add(Dense(3, activation='softmax'))

    return model
  

Below is the code to loop through each row of the model spreadsheet, create a model from the parameters, fit it and record the result.

number_of_models_to_run = 40
for number_of_models_to_run_count in range (0,number_of_models_to_run):
    model_row = int(worksheet_config.cell(1, 2).value)

    BATCH_SIZE = int(worksheet_models.cell(model_row, 2).value,0) #, 'batch_size')
    filters1 = int(worksheet_models.cell(model_row, 3).value,0) #, 'filters1')
    dropout1 = float(worksheet_models.cell(model_row, 4).value) #, 'dropout1')
    filters2 = int(worksheet_models.cell(model_row, 5).value,0) #, 'filters2')
    dropout2 = float(worksheet_models.cell(model_row, 6).value) #, 'dropout2')
    filters3 = int(worksheet_models.cell(model_row, 7).value,0) #, 'filters3')
    dropout3 = float(worksheet_models.cell(model_row, 8).value) #, 'dropout3')

    print(str(model_row)+" "+str(BATCH_SIZE)+" "+str(filters1)+" "+str(dropout1)+" "+str(filters2)+" "+str(dropout2)+" "+str(filters3)+" "+str(dropout3))
    # NB_CHANNELS = # 3 for RGB images or 1 for grayscale images
    NB_CHANNELS =  1
    NB_TRAIN_IMG = 111
    # NB_VALID_IMG = # Replace with the total number validation images  
    NB_VALID_IMG = 54


    #*************
    #* Change model
    #*************
    model2 = createModelFromSpreadsheet()
    model2.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
    model2.summary()

    epochs = 100

    # Fit the model on the batches generated by datagen.flow().
    history2 = model2.fit_generator(datagen.flow(tr_img_data , tr_lbl_data, batch_size=BATCH_SIZE),
                                  #steps_per_epoch=int(np.ceil(tr_img_data .shape[0] / float(batch_size))),
                                  steps_per_epoch=NB_TRAIN_IMG//BATCH_SIZE,
                                  epochs=epochs,
                                  validation_data=(val_img_data, val_lbl_data),
                                  validation_steps=NB_VALID_IMG//BATCH_SIZE,
                                  shuffle=True,
                                  workers=4)

    evaluation = model2.evaluate(tst_img_data, tst_lbl_data)
    print(evaluation)
    print(evaluation[0])
    #record results
    worksheet_models.update_cell(model_row, 10, evaluation[0])
    worksheet_models.update_cell(model_row, 11, evaluation[1])
    worksheet_config.update_cell(1, 2, str(model_row+1))
    if(evaluation[1]>0.75):
        # assigning to the loop variable does not end a for loop; break does
        print("Good Model - stopped")
        break

In the course of running these models, one provided 77% image recognition accuracy when tested, so I saved its weights. Because of the bugs I found, I am re-running my results to see if I can reproduce that model and find a better one.

Image Classification – Tuning models

Since the start of September I have been working to improve my image classification model. The positive result is that I have a model capable of categorizing 3 different types of coins; however, the model is not yet as accurate as it needs to be. For reference, here is my working code.

Categorizing three different types of coin images.

I have added photos of Abraham Lincoln to the collection of coin photos I am using for training. Each class of photo is “one hot label” encoded to give it an identifier that can be used in the model: 1,0,0 = Elizabeth; 0,1,0 = George VI and 0,0,1 = Abraham Lincoln. (Continuing this pattern, additional classes of coins can be added for training.) Below is the code that does this based on the first three characters of the photo’s file name.

def one_hot_label(img):
    label = img.split('.')[0]
    label = label[:3]
    if label == 'eII':
        ohl = np.array([1,0,0])
    elif label == 'gvi':
        ohl = np.array([0,1,0])
    elif label == 'lin':
        ohl = np.array([0,0,1])
    return ohl
(above) An example of an image of Abraham Lincoln used in training the model. This image has a label of 0,0,1 to indicate that it belongs to the same class as other images of Lincoln. (I am a little concerned that the numbers of the year and letters from “Liberty” will interfere with the training.)

The model I have trained can recognize Abraham Lincoln more often than not.

predict_for('/content/drive/My Drive/coin-image-processor/portraits/test/all/linc4351.png')
produced a result of [0. 0. 1.], which is correct. The model fails to accurately predict some of the other images of Lincoln.

Model Accuracy

When training the model I monitor the loss and accuracy for both training and validation. Training accuracy measures how well the model is performing on its training data; validation accuracy checks the model’s effectiveness against a separate set of validation images. A model is functioning well if both are high.

 Epoch 16/150 13/13 [==============================] - 0s 23ms/step - loss: 0.8050 - acc: 0.5769 - val_loss: 10.7454 - val_acc: 0.3333 

As shown above, at this point in training, the training accuracy (acc:) is low (57.7%) and the validation accuracy (val_acc:) is even lower (33%). For a prediction among 3 different types of coins, this model is validated to be no more accurate than random guessing.

A graph of the accuracy of a model over 150 epochs of training.

The red line of the training accuracy in the graph above shows a model that becomes more accurate over time. The accuracy of the model is very low initially, but it does climb almost continuously.

The validation accuracy of the model also begins quite low. Consider the area of the graph inside the magenta box denoted by (T). During this training, val_acc stalls at 33% between epochs 5 and 25. During my experiments with different model configurations, if I saw this stall happen I would terminate the training to save time. Considering what happened here, I should let the models run longer. This model eventually achieved a validation accuracy of 78%, the best result I had in the past couple of days.

Overfitting

The validation accuracy of this model peaks at epoch 88. As it declines, the training accuracy continues to climb. This is a sign of overfitting: the model is learning features present in the training data that won’t generally be present in other images. An overfit model is not useful for recognizing images from outside its training set. This suggests the model should be trained for approximately 88 epochs, not 150. At the same time, this particular model still needs work: even with a validation accuracy of 77%, it is likely still overfit, given its training accuracy of 90%, so it will probably make errors of prediction when used with new images of our coin subjects.
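Picking the stopping point can be automated: Keras offers EarlyStopping and ModelCheckpoint callbacks for this, but the underlying idea is simply to find the epoch with the peak validation accuracy in the recorded history. A minimal sketch (`best_epoch` is my own helper, not part of Keras):

```python
def best_epoch(val_acc_history):
    # return the 1-based epoch with the highest validation accuracy;
    # training much beyond it risks overfitting
    best_index = max(range(len(val_acc_history)),
                     key=lambda i: val_acc_history[i])
    return best_index + 1
```

Fed the 150 recorded val_acc values from the run above, this would point at epoch 88.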

Image Classification – First Results

An objective of this research is to demonstrate a means to automatically classify an image of an artifact using computer vision. I am using a method and code from Dattaraj Rao’s book Keras to Kubernetes: The Journey of a Machine Learning Model to Production. In Chapter 5 he demonstrates the use of machine learning to classify logo images of Pepsi and Coca Cola. I have used his code in an attempt to classify coin images of George VI and Elizabeth II.

Code for this is here: https://github.com/jeffblackadar/image_work/blob/master/Keras_to_Kubernetes_Rao.ipynb

The images I am using are here.

Below are my initial results; the prediction is shown below the image.

[[1.]] Prediction for /content/drive/My Drive/coin-image-processor/photos/george_vi/gvi3330.png: george_vi
[[0.]] Prediction for /content/drive/My Drive/coin-image-processor/photos/elizabeth_young/eII2903.png: elizabeth_young

…So far so good…

[[0.]] Prediction for /content/george_test_1.jpg: elizabeth_young.

[1 – footnote]

As noted above, this prediction failed.

I am not sure why yet, but here is my experience so far. On my first run through, the prediction failed for the first image of George VI too. I got the correct result when I used a larger image size for training and validation.

train_generator = train_datagen.flow_from_directory(
        training_dir,
        target_size=(300, 300),

(above) The original code uses an image size of 150 x 150, so I doubled it in each line of the program where that size is used. I may need to use a size larger than 300 x 300.

The colours of my coin images are somewhat uniform, while Rao’s example uses Coke’s red and white logo versus Pepsi’s logo with blue in it. Does color play a more significant role in image classification using Keras than I thought? I will look at what is happening during model training to see if I can address this issue.

Data Augmentation

I have a small number of coin images, yet effective training of an image recognition model requires many different images. Rao uses data augmentation to expand a small set of images into a larger training set by distorting them. This is particularly useful when training a model to recognize images taken by cameras from different angles, as happens in outdoor photography. A portion of Rao’s code is below. Since the coin images I am using are photographed from directly above, I have reduced the level of distortion (shear, zoom, width and height shift).

#From:
# Keras to Kubernetes: The Journey of a Machine Learning Model to Production
# Dattaraj Jagdish Rao
# Pages 152-153

from keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt
%matplotlib inline
training_dir = "/content/drive/My Drive/coin-image-processor/portraits/train"
validation_dir = "/content/drive/My Drive/coin-image-processor/portraits/validation"
gen_batch_size = 1
# This is meant to train the model for images taken at different angles.
# I am going to assume pictures of coins are taken from directly above, so there is little variation.
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.05,
    zoom_range=0.05,
    fill_mode="nearest",
    width_shift_range=0.05,
    height_shift_range=0.05,
    rotation_range=20,
    horizontal_flip=False)
train_generator = train_datagen.flow_from_directory(
        training_dir,
        target_size=(300, 300),
        batch_size=32,
        class_mode='binary')
class_names = ['elizabeth_young','george_vi']
print("generating images")
ROW = 10
plt.figure(figsize=(20,20))
for i in range(ROW*ROW):
    plt.subplot(ROW,ROW,i+1)
    plt.xticks([])
    next_set = train_generator.next()
    plt.imshow(next_set[0][0])
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.xlabel(class_names[int(next_set[1][0])])
plt.show()
Sample of images produced from data augmentation.

My next steps to improve these results are to examine what happens as the models train and to train them longer using larger image sizes.

References

Rao, Dattaraj. Keras to Kubernetes: The Journey of a Machine Learning Model to Production. 2019.

[1 – footnote] This test image is from a Google search. The original image is from: https://www.cdncoin.com/1937-1964-60-Coin-Set-in-Case-p/20160428003.htm

Image inpainting – first results.

First results of image inpainting using Mathias Gruber’s PConv-Keras: https://github.com/MathiasGruber/PConv-Keras (I took a shortcut on training the model for this.)

I have a few dozen pictures of Elizabeth II from Canadian 1 cent pieces. I want to see if I can train a model that can inpaint a partial image. Haiyan Wang, Zhongshi He, Dingding Chen, Yongwen Huang, and Yiman He have written an excellent study of this technique in their article “Virtual Inpainting for Dazu Rock Carvings Based on a Sample Dataset” in the Journal on Computing and Cultural Heritage. [1]

I am using Mathias Gruber’s PConv-Keras repository https://github.com/MathiasGruber/PConv-Keras in GitHub to do image inpainting. He has impressive results and as a caveat for my results, I am not yet training the model used for inpainting nearly as long as Gruber does. I am using Google Colab and it is not meant for long running processes so I am using a small number of steps and epochs to train the model. Even with this constraint I am seeing potential results.

The steps used to set up Mathias Gruber’s PConv-Keras in Google Colab are here. Thanks to Eduardo Rosas for the instructions that let me get this set up.

Using Gruber’s PConv-Keras I have been able to train a model to perform image inpainting. My next steps are to refine the model, train it more deeply and look for improved results. The code and results I am working on are on my Google Drive at this time. The images I am using are here.

This week I also improved the program that processes coin images. I get better results by having a higher tolerance for impurities in the white background when finding whitespace (I use white_mean = 250, not 254 or 255). This version is in GitHub.

1 Wang, Haiyan, Zhongshi He, Dingding Chen, Yongwen Huang, and Yiman He. 2019. “Virtual Inpainting for Dazu Rock Carvings Based on a Sample Dataset.” Journal on Computing and Cultural Heritage 12 (3): 1-17.

Automatically cropping images

As mentioned in previous posts, I need numerous images to train an image recognition model. My goal is to have many examples of the image of Elizabeth II like the one below. To be efficient, I want to process many photographs of 1 cent coins with a program, so that program must be able to reliably find the centre of the portrait.

To crop the image I used two methods: 1. remove whitespace from the outside inward and 2. find the edge of the coin using OpenCV’s cv2.HoughCircles function.

Removing whitespace from the outside inward is the simpler of the two methods. To do this I assume the edges of the image are white: color 255 in a grayscale image. If the mean of the pixel colors in an entire column is 255, the whole column can be considered whitespace. If the mean is lower than 255, I assume the column contains part of the darker coin. Cropping the image at the x value of this column removes the whitespace from the left edge of the image.

for img_col_left in range(1,round(gray_img.shape[1]/2)):
    if np.mean(gray_img,axis = 0)[img_col_left] < 254:
        break 

The for loop of this code starts from the first pixel and moves toward the center. If the mean of the column is less than 254 the loop stops since the edge of the coin is found. I am using 254 instead of 255 to allow for some specks of dust or other imperfections in the white background. Using a for loop is not efficient and this code should be improved, but I want to get this working first.
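One way to avoid the loop is to compute all the column means at once with NumPy. This is a sketch of a vectorized alternative under the same 254 threshold; `first_nonwhite_column` is a name I've chosen for illustration:

```python
import numpy as np

def first_nonwhite_column(gray_img, thresh=254):
    # mean brightness of every column, computed in one shot
    col_means = gray_img.mean(axis=0)
    # index of the first column that is not (almost) pure white
    nonwhite = np.flatnonzero(col_means < thresh)
    return int(nonwhite[0]) if nonwhite.size else 0
```

The same idea works for the top, bottom and right edges by changing the axis or reversing the search direction.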

Before the background is cropped, the image is converted to black and white and then grayscale in order to simplify the edges. Here is the procedure at this point.

import numpy as np
import cv2
import time
from google.colab.patches import cv2_imshow

def img_remove_whitespace(imgo):
    print("start " + str(time.time()))
    # convert to black and white (binary) to simplify the edges
    # 128 is the middle of the 0-255 grayscale range
    thresh = 128

    # pixels above the threshold become 255 (white); the rest become 0 (black)
    img_binary = cv2.threshold(imgo, thresh, 255, cv2.THRESH_BINARY)[1]
    #cv2_imshow(img_binary) 
    
    gray_img = cv2.cvtColor(img_binary, cv2.COLOR_BGR2GRAY)
    #cv2_imshow(gray_img) 
    print(gray_img.shape)

    # Thanks https://likegeeks.com/python-image-processing/
    # croppedImage = img[startRow:endRow, startCol:endCol]
      
    # allow for means of 254 (slightly less than totally white) to tolerate specks
    # count in from the top edge until the mean of a row is less than 254
    for img_row_top in range(0,round(gray_img.shape[0]/2)):    
        if np.mean(gray_img,axis = 1)[img_row_top] < 254:
            break 
    print(img_row_top)
    # count in from the bottom edge
    for img_row_bottom in range(gray_img.shape[0]-1,round(gray_img.shape[0]/2),-1):
        if np.mean(gray_img,axis = 1)[img_row_bottom] < 254:
          break 
    print(img_row_bottom)    
    # count in from the left edge
    for img_col_left in range(0,round(gray_img.shape[1]/2)):
        if np.mean(gray_img,axis = 0)[img_col_left] < 254:
            break 
    print(img_col_left)    
    # count in from the right edge
    for img_col_right in range(gray_img.shape[1]-1,round(gray_img.shape[1]/2),-1):
        if np.mean(gray_img,axis = 0)[img_col_right] < 254:
            break
    print(img_col_right)
          
    # add 1 so the last non-white row and column are included in the slice
    imgo_cropped = imgo[img_row_top:img_row_bottom+1,img_col_left:img_col_right+1,0:3]
    print("Whitespace removal")
    print(imgo_cropped.shape)
    
    # cv2_imshow(imgo_cropped) 
    print("end " + str(time.time()))
    return(imgo_cropped)

A problem with this method is that some images have shadows that prevent the procedure from finding the true edge of the coin. (See below.)

Image with shadow. Whitespace detection at the edges won’t work.

For cases like the image above, I tried to use OpenCV’s Hough Circles to find the boundary of the coin. Thanks to Adrian Rosebrock’s tutorial “Detecting Circles in Images using OpenCV and Hough Circles” I was able to apply this here.

In my case, cv2.HoughCircles found too many circles: 95 of them in one image, for example, almost none of which corresponded to the edge of the coin. I tried several methods to identify the circle that represented the edge. I sorted the circles by radius, reasoning that the largest circle was the edge; it was not always. I also looked for large circles lying completely inside the image, but got erroneous results there too. (See below.) Perhaps I am using the function incorrectly, but I have decided this method is not reliable enough to be worthwhile, so I am going to stop using it. The code to use Hough circles is below. Warning: there is likely a problem with it.

print("Since finding whitespace did not work, we will find circles. This will take more time")      
# gray_img, output and good_coin_radius are defined earlier in the routine
circles = cv2.HoughCircles(gray_img, cv2.HOUGH_GRADIENT, 1.2, 100)

# ensure at least some circles were found
if circles is not None:
    print("Circles")
    print(circles.shape)

    # convert the (x, y) coordinates and radius of the circles to integers
    circles = np.round(circles[0, :]).astype("int")
    # sort by the third element (the radius), largest first
    circles2 = sorted(circles, key=lambda c: c[2], reverse=True)
    print("There are " + str(len(circles2)) +" circles found in this image")
    for cir in range(0,len(circles2)):    
        x = circles2[cir][0]
        y = circles2[cir][1]
        r = circles2[cir][2]
        print()
        # accept only circles within 10% of the expected coin radius...
        if r < good_coin_radius*1.1 and r > good_coin_radius*0.9:
            # ...whose centre is far enough from the image edges
            # (x is a column, compared against the width, shape[1];
            #  y is a row, compared against the height, shape[0])
            if (x > (good_coin_radius*0.9) and x < (output.shape[1]-(good_coin_radius*0.9))):
                if (y > (good_coin_radius*0.9) and y < (output.shape[0]-(good_coin_radius*0.9))):
                    print("I believe this is the right circle.")  
                    print(circles2[cir])
                    cv2.circle(output, (x, y), r, (0, 255, 0), 4)        
                    cv2.rectangle(output, (x - 5, y - 5), (x + 5, y + 5), (0, 128, 255), -1)  
                    cv2_imshow(output)
                    # NumPy indexing is [row, col], so y comes first
                    output = output[y-r:y+r,x-r:x+r]
                    width_half = round(output.shape[1]/2)
                    height_half = round(output.shape[0]/2)
                    # draw a thick white ring to mask everything outside the coin
                    cv2.circle(output,(width_half, height_half), int(round(r*1.414)), (255,255,255), int(round(r*1.4)))
                    output = img_remove_whitespace(output)
                    cv2_imshow(output)
                    return(output)
False positive Hough circle representing the edge of the coin.

My conclusion is that I am only going to use coins from the left half of each picture I take, since the photo flash works better there and there are fewer shadows. I will take care to remove debris around the coins that interferes with finding whitespace. Failing that, the routine rejects photos that it can't crop to the expected size of the coin. This results in the loss of some photos, but that is acceptable here since I don't need every photo to train the image recognition model. Below is a rejected image. The cleaned-up code I am using right now is here.

Creating a set of images for image recognition.

iPhone camera gantry to take photos in focus at 3x magnification.

I would like to train an image recognition model with my own images to see how well it works. Here I want to use the obverse of coins to make a model to recognize the portraits of Elizabeth II (younger), Elizabeth II (more mature), George VI and Abraham Lincoln.

Initially I used 5 cent coins but I found they reflected too much light to take a good photograph so I switched to 1 cent coins. I also started with a camera on a Microsoft Surface Pro computer, taking pictures of 9 coins at a time in order to try to be efficient, but I did not get the higher image quality I believe I need.

Microsoft Surface Pro camera taking pictures using a Python program in Google Colab.
Photograph taken using the Surface Pro camera.
Photo taken with iPhone: 3x magnification, square layout, flash on white paper background.

The next step is to remove the background using OpenCV in Python and crop the image down to just the coin. I don't want the image recognition model to recognize the portrait by the name printed beside it, so I will crop the image again to contain only the portrait.

I believe this type of image processing can be applied to historical artifacts photographed against a neutral background. I am concerned the coins are too worn and have too little variation in colour to make a good model, but it will be useful to learn whether that is the case.

My thanks to the Saskatoon Coin Club for their excellent page describing the obverse designs of Canadian one cent coins.