Problem Statement

Misdiagnosing the many diseases that affect agricultural crops can lead to misuse of chemicals, which in turn drives the emergence of resistant pathogen strains, higher input costs, and more outbreaks, with significant economic loss and environmental impact. Current disease diagnosis relies on human scouting, which is time-consuming and expensive. Computer-vision models promise greater efficiency, but the wide variance in symptoms caused by the age of infected tissue, genetic variation, and light conditions within trees reduces detection accuracy.

Specific Objectives

The objectives of the ‘Plant Pathology Challenge’ are to train a model on images from the training dataset to: 1) accurately classify a given image from the test dataset as healthy or as belonging to one of the disease categories; 2) accurately distinguish between many diseases, sometimes more than one on a single leaf; 3) deal with rare classes and novel symptoms; 4) address depth perception: angle, light, shade, and physiological age of the leaf; and 5) incorporate expert knowledge in identification, annotation, quantification, and guiding computer vision to search for relevant features during learning.

Resources

If you use the dataset for your project, please cite the preprint: https://arxiv.org/abs/2004.11958

Acknowledgments

We acknowledge financial support from Cornell Initiative for Digital Agriculture (CIDA) and special thanks to Zach Guillian for help with data collection.

Kaggle is excited to partner with research groups to push forward the frontier of machine learning. Research competitions make use of Kaggle's platform and experience, but are largely organized by the research group's data science team. Any questions or concerns regarding the competition data, quality, or topic will be addressed by them.

#collapse_hide
# Quote the version specifier so the shell does not treat ">" as an output redirect.
!pip install fastai2 graphviz ipywidgets matplotlib "nbdev>=0.2.12" pandas scikit_learn azure-cognitiveservices-search-imagesearch sentencepiece

Downgrade the fastcore library, because the version installed above is not compatible with the fastai2 vision library.

#collapse_hide
!pip install fastcore==1.0.0

Collecting fastcore==1.0.0
  Downloading https://files.pythonhosted.org/packages/cc/92/233661d730b5613b4daf473cd28005bf2294fb1a858ce0bac57fbb7fa5ec/fastcore-1.0.0-py3-none-any.whl
Requirement already satisfied: wheel in /usr/local/lib/python3.6/dist-packages (from fastcore==1.0.0) (0.35.1)
Requirement already satisfied: pip in /usr/local/lib/python3.6/dist-packages (from fastcore==1.0.0) (19.3.1)
Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from fastcore==1.0.0) (1.18.5)
Requirement already satisfied: packaging in /usr/local/lib/python3.6/dist-packages (from fastcore==1.0.0) (20.4)
Requirement already satisfied: dataclasses in /usr/local/lib/python3.6/dist-packages (from fastcore==1.0.0) (0.7)
Requirement already satisfied: pyparsing>=2.0.2 in /usr/local/lib/python3.6/dist-packages (from packaging->fastcore==1.0.0) (2.4.7)
Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from packaging->fastcore==1.0.0) (1.15.0)
ERROR: nbdev 1.0.14 has requirement fastcore>=1.0.5, but you'll have fastcore 1.0.0 which is incompatible.
Installing collected packages: fastcore
  Found existing installation: fastcore 1.0.9
    Uninstalling fastcore-1.0.9:
      Successfully uninstalled fastcore-1.0.9
Successfully installed fastcore-1.0.0

from fastai2.vision.all import *
from sklearn.metrics import roc_auc_score

data_path = Path("/content/drive/My Drive/Competition Datasets /Plant_Pathology_2020")

df = pd.read_csv(data_path/"train.csv")

df.head()

df.iloc[:, 1:].sum(axis=1).value_counts()

1    1821
dtype: int64

imglabels = list(df.columns[1:])

df["labels"] = df.apply(lambda x: imglabels[x.values[1:].argmax()], axis=1)
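
The `labels` column collapses the four one-hot columns into a single class name by taking the position of the largest value in each row. This works because the `value_counts` check shows that every row sums to 1. Here is a minimal standard-library sketch of the same idea; the rows and the `to_label` helper are made up for illustration and are not taken from the dataset:

```python
# Column order as in train.csv (everything after image_id).
label_names = ["healthy", "multiple_diseases", "rust", "scab"]

# Hypothetical one-hot rows with exactly one 1 per row.
rows = [
    [0, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
]

def to_label(one_hot, names):
    """Return the name at the index of the largest entry (argmax)."""
    best = max(range(len(one_hot)), key=one_hot.__getitem__)
    return names[best]

labels = [to_label(r, label_names) for r in rows]
print(labels)
```

Turning the targets into one string column lets the problem be treated as ordinary single-label classification.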

dls = ImageDataLoaders.from_df(df,
                               path=data_path, 
                               suff=".jpg", 
                               folder="images",
                               label_col="labels",
                               item_tfms=RandomResizedCrop(512, min_scale=0.5), # note that we use a bigger image size
                               batch_tfms=aug_transforms(),
                               valid_pct=0.05,
                               bs=16,
                               val_bs=16
                               )

dls.show_batch(max_n=16, nrows=2)
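
With `valid_pct=0.05` and 1821 training rows, the validation set is small, so the reported accuracy can only move in coarse steps. The arithmetic below is a rough sketch; it assumes the split size is the truncated product of the two numbers, which I have not checked against the library source:

```python
n_train = 1821      # rows in train.csv, per the value_counts output
valid_pct = 0.05    # as passed to ImageDataLoaders.from_df

# Assumption: the validation size is int(valid_pct * n_train).
n_valid = int(valid_pct * n_train)
step = 1 / n_valid  # smallest possible change in validation accuracy

print(n_valid, round(step, 4))
print(round(82 / n_valid, 6))  # 82 correct images out of the validation set
```

Under that assumption the validation set holds 91 images, and a logged accuracy such as 0.901099 corresponds to 82 correct predictions. Differences of one or two percentage points between epochs therefore amount to one or two images.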

def mean_roc_auc(preds, targets, num_cols=4):
    """The competition metric.

    Quoting: 'Submissions are evaluated on mean column-wise ROC AUC.
    In other words, the score is the average of the individual AUCs
    of each predicted column.'

    Unfortunately, we cannot use it as a validation metric: if all
    files in a batch have the same label, ROC AUC is undefined.
    """
    aucs = []
    preds = preds.detach().cpu().numpy()
    targets = targets.detach().cpu().numpy()

    for i in range(num_cols):
        # grab the i-th column of the network's output
        cpreds = preds[:, i]
        # boolean mask: which samples have the i-th label
        ctargets = targets == i
        aucs.append(roc_auc_score(ctargets, cpreds))
    return sum(aucs) / num_cols
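
To make the metric and its failure case concrete, here is a dependency-free sketch of mean column-wise ROC AUC. It computes each AUC as the fraction of (positive, negative) pairs ranked correctly, which is equivalent to the area under the ROC curve. The helper names and the toy predictions are hypothetical:

```python
def binary_auc(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a win. Raises ValueError if only one class is present.
    """
    pos = [s for y, s in zip(y_true, scores) if y]
    neg = [s for y, s in zip(y_true, scores) if not y]
    if not pos or not neg:
        raise ValueError("ROC AUC is undefined when only one class is present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_columnwise_auc(probs, targets):
    """Average the one-vs-rest AUC over all columns of `probs`."""
    num_cols = len(probs[0])
    aucs = []
    for i in range(num_cols):
        col = [row[i] for row in probs]
        is_class_i = [t == i for t in targets]
        aucs.append(binary_auc(is_class_i, col))
    return sum(aucs) / num_cols

# Made-up predictions for 4 samples and 2 classes; one pair is misranked per column.
probs = [[0.9, 0.1], [0.3, 0.7], [0.4, 0.6], [0.2, 0.8]]
targets = [0, 0, 1, 1]
score = mean_columnwise_auc(probs, targets)
print(score)  # 0.75
```

Calling `binary_auc([1, 1], [0.2, 0.9])` raises `ValueError`, which is the situation the docstring describes: a batch whose samples all share one label has no negative examples to rank against.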

learn = cnn_learner(dls, resnet50, metrics=[accuracy], model_dir="/content/drive/My Drive/Competition Datasets /Plant_Pathology_2020")

Downloading: "https://download.pytorch.org/models/resnet50-19c8e357.pth" to /root/.cache/torch/hub/checkpoints/resnet50-19c8e357.pth

learn.lr_find()

SuggestedLRs(lr_min=0.012022644281387329, lr_steep=0.0014454397605732083)

learn.fit(4, lr=1e-3)

learn.unfreeze()

learn.lr_find()

SuggestedLRs(lr_min=2.290867705596611e-05, lr_steep=6.309573450380412e-07)

learn.save("model")

learn.fit_one_cycle(16, lr_max=slice(1e-6,1e-5), cbs=[SaveModelCallback(every='epoch', monitor="accuracy")])

Better model found at epoch 0 with accuracy value: 0.901098906993866.
Better model found at epoch 1 with accuracy value: 0.9340659379959106.
Better model found at epoch 4 with accuracy value: 0.9450549483299255.
Better model found at epoch 6 with accuracy value: 0.9560439586639404.
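
Passing `lr_max=slice(1e-6,1e-5)` requests discriminative learning rates: earlier layer groups train with the smaller rate and the head with the larger one. My understanding is that the library spaces the rates geometrically across the parameter groups. The helper below re-implements that idea for illustration, and the group count of 3 is an assumption rather than something this notebook confirms:

```python
def geometric_lrs(start, stop, n_groups):
    """Return n_groups learning rates spaced evenly on a log scale."""
    if n_groups == 1:
        return [stop]
    ratio = (stop / start) ** (1 / (n_groups - 1))
    return [start * ratio**i for i in range(n_groups)]

# Assumed: 3 parameter groups (early body, late body, head).
lrs = geometric_lrs(1e-6, 1e-5, 3)
for i, lr in enumerate(lrs):
    print(f"group {i}: {lr:.2e}")
```

Under these assumptions the middle group would train at roughly 3.16e-06, the geometric mean of the two endpoints.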

learn.load("model")

<fastai2.learner.Learner at 0x7f1ab2352860>

test_image_ids = [img.split(".")[0] for img in os.listdir(data_path/"images") if img.startswith("Test")]
test_images = [data_path/"images"/f"{img}.jpg" for img in test_image_ids]
preds = learn.get_preds(dl=dls.test_dl(test_images, shuffle=False, drop_last=False))

# ensure that the order of columns in preds matches the imglabels
preds = preds[0].cpu().numpy()
vocab = list(dls[0].dataset.vocab)
column_permutation = [vocab.index(l) for l in imglabels]
preds = preds[:, column_permutation]
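
The permutation step matters whenever the model's class order differs from the column order the submission file expects. The toy example below uses plain lists; the shuffled `vocab` order and the probabilities are invented for illustration:

```python
# Hypothetical orders: the model's class vocabulary vs. the submission columns.
vocab = ["rust", "healthy", "scab", "multiple_diseases"]
wanted = ["healthy", "multiple_diseases", "rust", "scab"]

# One row of made-up probabilities, in vocab order.
row = [0.7, 0.1, 0.15, 0.05]

# For each wanted column, look up where it sits in the model output.
perm = [vocab.index(name) for name in wanted]
reordered = [row[j] for j in perm]

print(perm)
print(dict(zip(wanted, reordered)))
```

If the vocabulary already matches the CSV column order, the permutation is simply the identity and the reordering is a harmless no-op.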

submission = pd.DataFrame()
submission["image_id"] = test_image_ids
for i in range(len(imglabels)):
    submission[imglabels[i]] = preds[:, i]
submission.to_csv("submission.csv", index=False)

submission.head(10)

submission.to_csv("/content/drive/My Drive/Competition Datasets /Plant_Pathology_2020/submission.csv", index=False)

submission.shape

(1832, 5)

Output of df.head():

	image_id	healthy	multiple_diseases	rust	scab
0	Train_0	0	0	0	1
1	Train_1	0	1	0	0
2	Train_2	1	0	0	0
3	Train_3	0	0	1	0
4	Train_4	1	0	0	0

Output of learn.fit(4, lr=1e-3):

epoch	train_loss	valid_loss	accuracy	time
0	0.707231	0.460912	0.857143	03:02
1	0.413365	0.325106	0.879121	01:50
2	0.315370	0.257212	0.923077	01:51
3	0.317782	0.273335	0.912088	01:51

Output of learn.fit_one_cycle(16, lr_max=slice(1e-6,1e-5), ...):

epoch	train_loss	valid_loss	accuracy	time
0	0.288165	0.252785	0.901099	01:57
1	0.246356	0.231000	0.934066	01:58
2	0.224936	0.230817	0.923077	01:57
3	0.194872	0.214574	0.934066	01:57
4	0.184176	0.181867	0.945055	01:57
5	0.158892	0.180428	0.945055	01:58
6	0.171457	0.171559	0.956044	01:58
7	0.147328	0.156418	0.945055	01:59
8	0.149036	0.152680	0.945055	01:57
9	0.128518	0.154319	0.956044	01:58
10	0.139238	0.180397	0.945055	01:58
11	0.118221	0.157399	0.945055	01:58
12	0.104585	0.171786	0.945055	01:57
13	0.166329	0.163900	0.956044	01:58
14	0.136599	0.165283	0.945055	01:58
15	0.120882	0.154736	0.956044	01:58

Output of submission.head(10):

	image_id	healthy	multiple_diseases	rust	scab
0	Test_981	7.602632e-04	2.068442e-03	6.178252e-06	9.971651e-01
1	Test_98	1.365243e-04	3.923656e-03	2.894043e-07	9.959395e-01
2	Test_986	5.981431e-07	2.534528e-04	2.782608e-09	9.997459e-01
3	Test_976	1.648496e-05	5.323170e-05	9.999267e-01	3.539740e-06
4	Test_999	4.984141e-07	9.775054e-07	9.999986e-01	5.851503e-09
5	Test_996	4.980442e-05	2.681709e-01	7.314788e-01	3.003662e-04
6	Test_99	2.969532e-02	4.484442e-04	1.557501e-03	9.682987e-01
7	Test_980	1.123909e-03	1.314125e-02	1.206828e-06	9.857336e-01
8	Test_997	1.704059e-07	1.934623e-02	9.600331e-01	2.062041e-02
9	Test_995	2.850686e-03	2.235687e-01	7.695004e-01	4.080193e-03