Plant Identifier App for iOS using finetuned ResNet50 model

Started on August 31, 2024

Status: On-hold

Developing an AI-powered plant identifier to help identify plants and provide care instructions.

This page is a journal of my learning and development of the app.

I used Claude AI every step of the way to help me understand topics and code, and ChatGPT for prototyping and logo generation.

Why?

A plant identifier is one of the top 100 grossing apps in the Apple App Store. I want to understand how it works, and through this exercise we will try to build something similar.

Photo by Linh Le on Unsplash

What we need

Data
Model
iOS App
API

PlantNet is a plant identification dataset containing 300K images. As per the license, we can use the data for commercial and non-commercial purposes. Now that we have data, we need a model that can identify the plant in the image so we can provide the user with care instructions.

There has never been a better time to build with AI

Let's answer some questions

Are there existing pre-trained models?

Yes, we can use pretrained CNN models like ResNet, VGG, or Inception.

Can we include this model directly in our iOS app?

Initially I thought this was viable, based on my earlier experience of loading a model directly into an app. Check out my repo here, where I used a MobileNet model to classify images. That model was basic, so it was small enough to bundle with the app. This still seems viable: the model may be big, around 100MB, but this is what we need for our users to get instant results and offline identification.

Status: On Hold

This project is currently on hold due to the unavailability of a good plant dataset. I'm not in favor of making an API call to some external service, which would add a lot of latency. We will resume in a few months and work on web scraping to acquire the data and retrain our models.

Journal

09.22.2024

Client App (iOS) prototype

Download Plant Identifier Prototype.pdf

09.13.2024

Client App (iOS) Design

Now that we have a model fine-tuned from the pre-trained ResNet50, we have to create a client app where users can upload photos and request additional help and care instructions.

The first step is prototyping how the app should look.

For design inspiration, check out the pages below.

We can prototype on Figma.com, using any of the free design kits that are available.

This template seems to have most of the screens that we can re-use.

Onboarding views
Log-in views
Paywall

We need some more research on the product side to see what to include. The best place to start is the App Store reviews of the existing apps.

Logo Created

09.06.2024

Fine Tuning on AWS SageMaker

I uploaded the entire plant dataset to S3. I ran a script overnight to load all 300K files; I did not time it, but it probably took a few hours to upload 30GB worth of files.

We can't load all 30GB of files into a Jupyter notebook on SageMaker at once, so we stream the files in batches.

from torch.utils.data import DataLoader
from torchvision import transforms

# S3ImageDataset is a custom Dataset class (not part of torch or torchvision)
# that reads images from an S3 bucket on demand.

# Resize every image to 224x224 and convert it to a tensor
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Create the S3 datasets for train and val
train_dataset = S3ImageDataset(
    s3_bucket='plantnet-300k-data', s3_prefix='images/train', transform=transform)
val_dataset = S3ImageDataset(
    s3_bucket='plantnet-300k-data', s3_prefix='images/val', transform=transform)

# Load the data in batches; only the training set is shuffled
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, num_workers=4)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False, num_workers=4)
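
The snippet above relies on S3ImageDataset, which is not part of torch or torchvision. Below is a minimal sketch of what such a class might look like; it is not the exact class used in this project. Assumptions: the folder directly above each file is the class name (for example images/train/<class>/<file>.jpg), images are JPEG or PNG, and the client_factory parameter is a hypothetical hook added so the listing logic can be exercised without AWS. DataLoader accepts any object that implements __len__ and __getitem__, so the sketch does not subclass torch.utils.data.Dataset.

```python
import io


def _default_client():
    # Imported lazily so the module can be loaded without boto3 installed.
    import boto3
    return boto3.client("s3")


class S3ImageDataset:
    """Map-style dataset that reads one image from S3 per __getitem__ call."""

    IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png")

    def __init__(self, s3_bucket, s3_prefix, transform=None,
                 client_factory=_default_client):
        self.s3_bucket = s3_bucket
        self.transform = transform
        self._client_factory = client_factory
        self._client = None  # created on the first read

        # List the keys once up front; only the key strings are held in memory.
        paginator = client_factory().get_paginator("list_objects_v2")
        self.keys = sorted(
            obj["Key"]
            for page in paginator.paginate(Bucket=s3_bucket, Prefix=s3_prefix)
            for obj in page.get("Contents", [])
            if obj["Key"].lower().endswith(self.IMAGE_SUFFIXES)
        )
        # Assumption: the folder directly above each file is its class name.
        classes = sorted({key.split("/")[-2] for key in self.keys})
        self.class_to_idx = {name: i for i, name in enumerate(classes)}

    def __len__(self):
        return len(self.keys)

    def __getitem__(self, index):
        from PIL import Image  # Pillow, imported lazily like boto3

        if self._client is None:
            self._client = self._client_factory()
        key = self.keys[index]
        body = self._client.get_object(Bucket=self.s3_bucket, Key=key)["Body"].read()
        image = Image.open(io.BytesIO(body)).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image, self.class_to_idx[key.split("/")[-2]]
```

Because the read client is only created on the first __getitem__ call, each DataLoader worker process ends up creating its own client, as long as the dataset is not indexed before the workers start.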

I ran into an interesting issue while training on AWS SageMaker: the script would fail after a few epochs, reporting that it could not fetch a file from S3.

I was using ml.m5.large on AWS SageMaker.

Epoch 1 completed, but training failed at 64% of epoch 2. While troubleshooting, I asked both Claude AI and ChatGPT, and both suggested the changes below.

From Claude AI:

This helps clarify why the training process stopped during epoch 2. Let's analyze what happened:

Progress before error: The training was about 64% complete through epoch 2 (970/1525 batches) when the error occurred. This took approximately 1 hour and 37 minutes.
Error type: The main error is a ClientError from the botocore library, which is part of the AWS SDK for Python (Boto3).
Error details:
- The error occurred when trying to fetch data for the DataLoader.
- Specifically, it happened during a GetObject operation on an S3 bucket.
- The error message states: "An error occurred (InternalError) when calling the GetObject operation (reached max retries: 4): We encountered an internal error. Please try again."
Error context: This error suggests that the training data is being loaded from an Amazon S3 bucket. The training process was unable to retrieve an object (likely an image file) from S3 after multiple attempts.
Possible causes:
- Temporary S3 service disruption
- Network connectivity issues
- Permissions problems (though less likely given the "InternalError" message)
- Resource constraints or throttling
System response: After encountering this error, the system attempted to save the current model state before exiting.

This error explains why epoch 2 didn't complete. It's not a failure of the model or the training process itself, but rather an issue with data retrieval from cloud storage.

To address this issue and prevent it in future training runs, you could consider:

Implementing more robust error handling and retry logic in your data loading code.
Caching data locally if possible to reduce dependence on S3 during training.
Checking S3 and network connectivity before starting long training jobs.
Implementing checkpointing to allow resuming training from the last successful point.
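
The first suggestion, retry logic in the data loading code, can be sketched in plain Python. This is a minimal illustration, not code from the training script: fetch_with_retry, its parameters, and the fetch callable are hypothetical names, and a real version would catch botocore's ClientError rather than every Exception.

```python
import random
import time


def fetch_with_retry(fetch, key, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call fetch(key), retrying with exponential backoff plus jitter on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(key)
        except Exception:  # in real code: botocore.exceptions.ClientError
            if attempt == max_attempts:
                raise  # out of attempts, let the caller see the error
            # Wait 0.5s, 1s, 2s, ... plus a random extra so that parallel
            # workers do not all retry at the same moment.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
```

Inside a dataset's __getitem__, the S3 read would be wrapped as fetch_with_retry(read_image_bytes, key), where read_image_bytes is whatever function performs the GetObject call. botocore's built-in retry count can also be raised through botocore.config.Config(retries={"max_attempts": 10, "mode": "standard"}), which may be enough on its own.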

09.05.2024

Fine Tuning locally

I'll be using PyTorch to fine-tune the ResNet34 model, and all of the training will be done locally on my machine.

Before we start, let's do some research on the topic.

Data augmentation: for example, cropping our images a little or rotating them, so that the training data covers more real-world cases and the model generalizes better.

I stumbled on the Fast AI framework, which has a few features that help us out of the box.

Fast AI has built-in functionality that gives you the loss function and a confusion matrix.
It gives us samples of where the model is getting confused.

If you are using a Mac, it seems you will not have direct access to the GPU. Data augmentation may need GPU access, so we have to disable it for a local run.
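
One thing that may be worth checking before giving up on the GPU: PyTorch 1.12 and later include an MPS backend that exposes the GPU on Apple Silicon Macs. The helper below is a small sketch (pick_device is a made-up name) that falls back to the CPU when no GPU backend is available. Whether Fast AI's augmentations run on MPS is not something verified here.

```python
import torch


def pick_device():
    """Return the best available device: CUDA, then Apple's MPS, then CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # torch.backends.mps only exists in newer PyTorch builds, hence getattr.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()
```

The model and each batch would then be moved with .to(device) before training.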

Training on my Mac is a dead end, so I'm switching back to loading all the data to S3 and training on AWS SageMaker.

I've loaded all 300K files to AWS S3, but the SageMaker script got killed after a few epochs, which made it difficult to train.

So I'm falling back to training locally.

I'm using the pre-trained ResNet50 model. Below are the results after the first epoch, which took almost 50 minutes to train locally on my Mac.

We are going to run 30 epochs. Hopefully the model converges much sooner; if it does, the training script will exit early.
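
Exiting early when the model stops improving is usually called early stopping. The sketch below shows one common way to do it, based on validation loss; it is an assumption about how such logic might look, not the code of the training script used here. The EarlyStopping class, the patience value, and the loss values in the loop are all made up for illustration.

```python
class EarlyStopping:
    """Track validation loss and signal when it has stopped improving."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # smallest drop that counts as improvement
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Return (improved, should_stop) for the epoch that just finished."""
        improved = val_loss < self.best_loss - self.min_delta
        if improved:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return improved, self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=2)
saved_epochs = []
# Illustrative validation losses, one per epoch
for epoch, val_loss in enumerate([1.60, 0.90, 0.86, 0.87, 0.88], start=1):
    improved, should_stop = stopper.step(val_loss)
    if improved:
        saved_epochs.append(epoch)  # a real script would save a checkpoint here
    if should_stop:
        break
```

With these numbers the model is saved after epochs 1 to 3 and the loop stops after epoch 5, once two epochs in a row have failed to beat the best loss.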

Epoch 1

INFO:__main__:Epoch 1 completed
INFO:__main__:loss_train : 2.4369251771526095
INFO:__main__:loss_val : 1.5950532106632938
INFO:__main__:acc_train : 0.4754874628970629 / topk_acc_train : {1: 0.4754874628970629, 5: 0.7189032289804687}
INFO:__main__:acc_val : 0.6255222057972878 / topk_acc_val : {1: 0.6255222057972878, 5: 0.8552927566038948} / avgk_acc_val : {1: 0.6394048460697989, 5: 0.8794909698566746}

Epoch 18

This is the last epoch at which the model was saved.

INFO:__main__:Epoch 18 completed
INFO:__main__:loss_train : 0.591255208659147
INFO:__main__:loss_val : 0.8582512536248693
INFO:__main__:acc_train : 0.8244108627560308 / topk_acc_train : {1: 0.8244108627560308, 5: 0.9717976680496564}
INFO:__main__:acc_val : 0.7899929301368983 / topk_acc_val : {1: 0.7899929301368983, 5: 0.9536923966835915} / avgk_acc_val : {1: 0.7965164856353236, 5: 0.9700173533003407}

Epoch 20

By epoch 20 the model was no longer being saved, which means it was not improving significantly, so I killed the script.

INFO:__main__:Epoch 20 completed
INFO:__main__:loss_train : 0.5567932317672534
INFO:__main__:loss_val : 0.8666584216157698
INFO:__main__:acc_train : 0.8321430328473737 / topk_acc_train : {1: 0.8321430328473737, 5: 0.9750856852359009}
INFO:__main__:acc_val : 0.7886110932579214 / topk_acc_val : {1: 0.7886110932579214, 5: 0.9534353107526191} / avgk_acc_val : {1: 0.7954881419114339, 5: 0.9706279323864001}
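
The logs report both top-1 and top-5 accuracy. As a reading aid, here is a minimal pure-Python sketch of how top-k accuracy is usually computed: a prediction counts as correct when the true label is among the k highest-scoring classes. The function name and the toy scores are made up for illustration; the training script's own implementation may differ, and avgk_acc_val is not covered here.

```python
def topk_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score, cut to the first k
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Three samples, three classes: one row of scores per sample
toy_scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
]
toy_labels = [1, 1, 0]

top1 = topk_accuracy(toy_scores, toy_labels, k=1)
top2 = topk_accuracy(toy_scores, toy_labels, k=2)
```

Top-k accuracy can only stay the same or go up as k grows, which is why the top-5 numbers in the logs are always higher than the top-1 numbers.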

Finally, we are able to convert the .pt file to a CoreML model file, which can be embedded into an iOS app.

Ahhh! I'm so frustrated. I just tested the model, and it does not recognize simple things like basil leaves, because they are not in our training dataset.

So we spent 30 hrs training, and the 300K images do not even include the basic plants and leaves we use in American cuisine.

09.04.2024

PyTorch

The goal of this exercise is for us to learn enough about PyTorch to use existing models or fine-tune them on our dataset.

Resources: PyTorch tutorial

In layman's terms: if we have two features, age and income, we can predict whether a person will buy a car or not. This is called a decision tree. Instead of weights it uses thresholds, in this case age greater than 30 and income greater than 50k.

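def predict(age, income):
    # The thresholds (30 years, 50k income) are what a decision tree learns from data
    if age > 30:
        if income > 50_000:
            return "Buy"
        else:
            return "Don't Buy"
    else:
        return "Don't Buy"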

For Image Data:

Convolutional Neural Networks (CNNs): These models apply filters (using weights) to extract patterns like edges, textures, and shapes from images.
Generative Adversarial Networks (GANs): These are used for generating images and use weights to optimize a generator and a discriminator model in tandem.

Weights? Yes, CNNs and GANs are weight-based models. These models are highly successful in handling the complex spatial relationships in images.

We can use ResNet18 to classify images. We need to fine-tune this model on our dataset.

09.01.2024

Research

Today I'll be reading about the PlantNet dataset and how to train a model on AWS.

As part of my ML learning, I'm reading through the Kaggle ML course to understand the basic concepts.

How to use a model to predict?

This seems pretty simple.

You read the dataset.
Determine the features, in this example ['Rooms', 'Bathroom', 'Landsize', 'Lattitude', 'Longtitude'].
Use sklearn to predict the price.
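
The three steps above can be sketched with scikit-learn as follows. The rows and prices are made up so the example runs without a dataset file, the column names are kept exactly as listed above, and a decision tree regressor is used here as one possible model.

```python
from sklearn.tree import DecisionTreeRegressor

features = ["Rooms", "Bathroom", "Landsize", "Lattitude", "Longtitude"]

# Made-up rows in the same column order as `features`; not real housing data.
X = [
    [2, 1, 150.0, -37.80, 144.99],
    [3, 2, 230.0, -37.81, 144.95],
    [4, 2, 450.0, -37.75, 145.02],
    [3, 1, 300.0, -37.79, 144.90],
]
y = [850_000, 1_200_000, 1_600_000, 1_000_000]  # made-up prices

model = DecisionTreeRegressor(random_state=1)
model.fit(X, y)                  # learn from features and known prices
predictions = model.predict(X)   # predict prices for the same rows
```

Predicting on the rows the model was trained on only checks the mechanics; a real evaluation would hold out part of the data for validation.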

What I understood is that sklearn is ideal for structured data, such as tabular data.

For our use case, we have a dataset of images, so we need a deep learning library like PyTorch or TensorFlow. PyTorch is somewhat beginner-friendly, so let us start with that and understand the basics.