Butterfly Classifier
Using Computer Vision to classify Butterflies into their respective species
Description
Butterflies are insects in the macrolepidopteran clade Rhopalocera from the order Lepidoptera, which also includes moths. Adult butterflies have large, often brightly coloured wings, and conspicuous, fluttering flight.
There are over 12000 species of butterflies across the world. In this notebook we are going to build computer vision software that can classify butterflies from around 52 common species.
#Importing the required libraries
!pip install -Uqq fastbook
import fastbook
fastbook.setup_book()
from fastbook import *
from fastai.vision.widgets import *
We are going to download the images from Bing Image Search using the API key above and store them in a directory on Google Drive.
species=('Monarch butterfly','Painted lady butterfly','Cabbage white butterfly','Red admiral butterfly','Large white butterfly','Old world swallowtail butterfly','Speckled wood butterfly','Eastern tiger swallowtail butterfly','Green-veined white butterfly','Orange-tip butterfly','Mourning cloak butterfly','Common blue butterfly','Glanville fritillary butterfly','Peacock butterfly','Gulf fritillary butterfly','Small copper butterfly','Clouded yellow butterfly','Meadow brown butterfly','Pipevine swallowtail butterfly','Common buckeye butterfly','Small tortoiseshell butterfly','Cloudless sulphur butterfly','Common brimstone butterfly','Large blue butterfly','Holly blue butterfly','Gatekeeper butterfly','Marsh fritillary butterfly','Chequered skipper butterfly','Ringlet butterfly','Silver-studded blue butterfly','Great spangled fritillary butterfly','Postman butterfly','Postman butterfly','Grizzled skipper butterfly','Green hairstreak butterfly','Comma butterfly','Wood white butterfly','Black swallowtail butterfly','Large skipper butterfly','Large heath butterfly','Zebra Longwing Butterfly','Common wood-nymph butterfly','Clouded sulphur butterfly','American lady butterfly','Small blue butterfly','Adonis blue butterfly','Duke of burgundy butterfly','Small skipper butterfly','Edith s checkerspot butterfly','Grayling butterfly','Brown argus butterfly','Brown hairstreak butterfly')
len(species)
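Note that len counts repeated entries, and the tuple above lists 'Postman butterfly' twice, so the number of distinct names is lower than the tuple length. A minimal sketch of a duplicate check follows; sample_species is a short hypothetical excerpt standing in for the full tuple, and find_duplicates is an illustrative helper, not part of fastai.

```python
from collections import Counter

# Hypothetical excerpt standing in for the full `species` tuple.
sample_species = ('Monarch butterfly', 'Postman butterfly',
                  'Postman butterfly', 'Comma butterfly')

def find_duplicates(names):
    """Return names that appear more than once, ignoring case and surrounding spaces."""
    counts = Counter(name.strip().lower() for name in names)
    return sorted(name for name, n in counts.items() if n > 1)

duplicates = find_duplicates(sample_species)
distinct = len({name.strip().lower() for name in sample_species})
print(duplicates, distinct)
```

Running the same helper on the real tuple before downloading would show which names are repeated.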
We will collect pictures of 52 different species of butterflies.
path = Path('species')
if not path.exists():
    path.mkdir()
    for o in species:
        dest = (path/o)
        dest.mkdir(exist_ok=True)
        results = search_images_bing(key, f'{o}')
        download_images(dest, urls=results.attrgot('content_url'))
butterflies=get_image_files(path)
butterflies
From the above we can see that we have 7575 pictures of presumably 52 different species of butterflies.
From each species we have presumably about 150 pictures.
But this might not be the case, since some pictures can get corrupted during the download. In the next section we will check for those and remove them.
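To check the per-species numbers rather than presume them (7575 images over 52 names averages just under 146 per name), the files in each folder can be counted. This is a minimal sketch assuming the layout species/<species name>/<image file>; IMAGE_EXTS and count_images_per_class are hypothetical names introduced here for illustration.

```python
from collections import Counter
from pathlib import Path

# Assumed set of image extensions; adjust to match the downloaded files.
IMAGE_EXTS = {'.jpg', '.jpeg', '.png', '.gif', '.bmp'}

def count_images_per_class(root):
    """Count image files in each immediate subfolder of `root`."""
    counts = Counter()
    for folder in Path(root).iterdir():
        if folder.is_dir():
            counts[folder.name] = sum(
                1 for f in folder.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTS)
    return counts

# Only runs when the download folder from the cells above exists.
if Path('species').exists():
    for name, n in count_images_per_class('species').most_common():
        print(f'{n:4d}  {name}')
```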
failed=verify_images(butterflies)
failed
From the above we can see that 67 images were corrupted.
So I will delete them in the next code cell.
failed.map(Path.unlink);
Done
I will then create the image DataLoaders in the next code cell
species_blk = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(seed=42, valid_pct=0.2),
    get_y=parent_label,
    item_tfms=RandomResizedCrop(128, min_scale=0.3),
    batch_tfms=aug_transforms())
dls=species_blk.dataloaders(path)
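The splitter in the DataBlock above holds out 20 percent of the images for validation, and the fixed seed makes the split repeatable between runs. The idea can be sketched in plain Python as below. This is an illustration of a seeded random split, not fastai's own implementation, and random_split is a hypothetical helper.

```python
import random

def random_split(n_items, valid_pct=0.2, seed=42):
    """Shuffle indices 0..n_items-1 reproducibly and hold out `valid_pct` of them."""
    rng = random.Random(seed)          # private generator, so the split is repeatable
    idxs = list(range(n_items))
    rng.shuffle(idxs)
    cut = int(valid_pct * n_items)     # number of validation items
    return idxs[cut:], idxs[:cut]      # (train indices, validation indices)

train_idx, valid_idx = random_split(10)
print(len(train_idx), len(valid_idx))
```

Because the seed is fixed, calling the function again with the same arguments returns the same two index lists, which keeps validation metrics comparable across training runs.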
Let's take a look at some of the items in our validation set
dls.valid.show_batch(max_n=24, nrows=3)
Now that we have our butterflies with their correct labels in a format that can be taken by a CNN learner,
we will train with different predefined architectures and see how our accuracy behaves.
learn=cnn_learner(dls,resnet18,metrics=error_rate)
learn.fit_one_cycle(4)
So with resnet18 we could classify the butterflies correctly about 70 percent of the time.
Meaning that out of 100 butterflies the model saw, we got 70 of them right and missed 30. That's good for a start but not impressive.
Let me take a look at the butterflies we misclassified, so that I can see if this low accuracy can be justified.
#Confusion Matrix
missed = ClassificationInterpretation.from_learner(learn)
missed.plot_confusion_matrix(figsize=(20,20), dpi=60)
That did not work well for us because we have a lot of classes. Let's just look at the 10 butterflies the model got most wrong (the top losses).
missed.plot_top_losses(10, nrows=2)
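When a confusion matrix has too many classes to read as a plot, listing only its largest off-diagonal cells is another way to see which species get mixed up. fastai's ClassificationInterpretation offers a most_confused method for this; the sketch below only illustrates the idea in plain Python on a made-up 3-class matrix, with most_confused_pairs as a hypothetical helper.

```python
def most_confused_pairs(cm, vocab, min_val=1):
    """List (actual, predicted, count) for off-diagonal cells with count >= min_val, largest first."""
    pairs = [(vocab[i], vocab[j], cm[i][j])
             for i in range(len(vocab))
             for j in range(len(vocab))
             if i != j and cm[i][j] >= min_val]
    return sorted(pairs, key=lambda p: p[2], reverse=True)

# Hypothetical confusion matrix: rows are actual classes, columns are predictions.
vocab = ['Comma', 'Monarch', 'Peacock']
cm = [[8, 2, 0],
      [1, 9, 0],
      [0, 3, 7]]

for actual, predicted, count in most_confused_pairs(cm, vocab):
    print(f'{actual} predicted as {predicted}: {count}')
```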
From the above we can see that our dataset contains pictures of caterpillars. Of course they eventually turn into butterflies, but for now they are noise in our dataset.
So we are going to do some data cleaning, then fit with a different architecture and see how our model performs.
We are also going to fine-tune the model now.
cleaner = ImageClassifierCleaner(learn)
cleaner
Going through all 52 species is taking a lot of time and my computer is slow, but I have learnt that there are a lot of caterpillar pictures among the photos, and for this model to perform well we should remove all of them.
Maybe I will when I get a computer with a faster GPU, hehe.
I have removed some in the first five species though.
for idx in cleaner.delete(): cleaner.fns[idx].unlink()
# retraining the model
species_blk = species_blk.new(
    item_tfms=RandomResizedCrop(224, min_scale=0.5),
    batch_tfms=aug_transforms()
)
##We will use resnet50 on this and fine tune
dls = species_blk.dataloaders(path)
learn = cnn_learner(dls, resnet50, metrics=error_rate)
learn.fine_tune(10)
We managed an error rate of 0.182667.
This means that our model correctly classified approximately 82 out of every 100 butterfly pictures in the validation set.
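The step from error rate to "82 out of 100" is just accuracy = 1 - error rate. A tiny worked example using the figure reported above:

```python
error_rate = 0.182667                    # value reported after fine_tune above
accuracy = 1 - error_rate                # fraction of validation images classified correctly
correct_per_100 = round(accuracy * 100)  # rounded to whole pictures per hundred
print(f'accuracy = {accuracy:.4f}, about {correct_per_100} correct out of 100')
```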
Let me find a suitable learning rate, then train for a few more epochs and see if it does better.
learn.lr_find()
Learning rates between 1e-6 and 1e-4 gave a relatively low loss and were much more stable.
learn.unfreeze()
learn.lr_find()
learn.fit_one_cycle(4, lr_max=1e-4)
Well, the fine-tuning and the resnet50 sprinkled a little accuracy on top of what we had.
learn.export()
path = Path()
path.ls(file_exts='.pkl')
learn_inf = load_learner(path/'export.pkl')
learn_inf.dls.vocab
# creating upload and classify widgets
btn_upload = widgets.FileUpload()
btn_run = widgets.Button(description='Classify')
lbl_pred = widgets.Label()
out_pl = widgets.Output()
# creating the on click listener
def on_click_classify(change):
    # Use the most recently uploaded file
    img = PILImage.create(btn_upload.data[-1])
    out_pl.clear_output()
    with out_pl:
        display(img.to_thumb(128, 128))
    # Predict with the learner loaded from export.pkl
    pred, pred_idx, probs = learn_inf.predict(img)
    lbl_pred.value = f'Prediction: {pred}; Probability: {probs[pred_idx]:.04f}'
btn_run.on_click(on_click_classify)
# creating a VBox
VBox([widgets.Label('Upload Your Butterfly'),
btn_upload, btn_run, out_pl, lbl_pred])
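The click handler above shows only the single most likely species. Since predict also returns a probability for every class, a few runner-up guesses could be shown as well. This is a minimal sketch in plain Python; top_k is a hypothetical helper, and the vocabulary and probabilities are made up for illustration. In the app they would presumably come from learn_inf.dls.vocab and from the probabilities returned by predict, converted to a list.

```python
def top_k(vocab, probs, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical vocabulary and probabilities, for illustration only.
vocab = ['Comma butterfly', 'Monarch butterfly', 'Peacock butterfly']
probs = [0.05, 0.90, 0.05]

for label, p in top_k(vocab, probs, k=2):
    print(f'{label}: {p:.4f}')
```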