ExpenseSnap

Started on September 24, 2024

Status: In-progress

Take a snap of your grocery receipt and let us do magic

Idea

Let the user snap a receipt, and the app does the rest: expense tracking and data analytics, including spending habits.

What do we need?

  • ML model that can read a receipt and break down its contents.
  • Client app (iOS)
  • Data analytics backend (Core Data + SQLite) + Swift

Why

Why are we doing this? I would like to classify current apps into two categories:

  • Apps that provide cash rewards after you scan your shopping receipts, which means they make money by selling your shopping data - Big Red Flag 🚩
  • Apps that connect to your wallet and help budgeting.

We want to address a third category:

  • An app that focuses on your spending habits, whether online or local, digital or physical.
  • An additional focus on tax reporting: tag expenses that qualify for tax deductions for small business owners.

Your data stays encrypted, and your privacy is paramount to us. You control your data: the data controls are on the Settings page in the app, so go check it out.

Tasks

Kanban Board

Journal

10.19.2024

Hybrid Approach

I want to take some time to propose the following approach for our implementation. I do not want to perform all the data analytics in the cloud, which becomes a bigger challenge as I continue building the cloud solution. In a real-world scenario, users may upload one receipt per day, or sometimes they may upload a few receipts in quick succession. For instance, if they shop at multiple stores in an outlet mall, they expect us to quickly analyze how they spent their money.

They don't want insights on a T+1 basis; they want it instantly. Therefore, a real-time cloud solution with critical delivery SLAs may feel like overkill. It’s like sending SEAL Team Six to catch a pickpocket—possible, but expensive and time-consuming.

Since we are just starting out and don’t have many paying users, implementing such an elaborate cloud solution not only takes time but also goes against our principle of "move fast or die."

On-Device and Cloud Solution

Let’s imagine analyzing the data ingested by the user on-device, which means building the entire cloud ETL solution and aggregations on-device.

Pros

  • A star schema on-device keeps data analytics consistent across both the cloud and the device. The processing is deterministic: the same input produces the same output, regardless of where it runs.
  • Even if the user has 10K entries, this solution scales without a large memory footprint on the device. Additionally, we can encourage users to opt for a premium plan that offers cloud backup features.

Cons

  • We can't reuse the same code, as the cloud solution will likely use Spark or Pandas, while on-device development would be constrained to Swift.
  • The cloud solution will leverage built-in frameworks and packages for efficient ETL processes, whereas Swift on-device limits us to primitive data structures like arrays and dictionaries.
  • A custom solution will be needed to seamlessly sync results between on-device and cloud.
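
To make the second con concrete: the aggregation logic itself stays small when written against plain lists and dictionaries, which is roughly the shape a Swift port would take. A minimal sketch in Python, where the field names `store`, `category`, and `amount_cents` are assumptions for illustration and not the project's actual schema:

```python
from collections import defaultdict


def spend_by_category(items):
    """Sum item amounts (integer cents) per category using only lists and dicts."""
    totals = defaultdict(int)
    for item in items:
        totals[item["category"]] += item["amount_cents"]
    # Sort the keys so identical input always yields identical output,
    # no matter where the code runs.
    return dict(sorted(totals.items()))


# Hypothetical items from two receipts uploaded in quick succession.
items = [
    {"store": "Store A", "category": "produce", "amount_cents": 450},
    {"store": "Store A", "category": "dairy", "amount_cents": 399},
    {"store": "Store B", "category": "produce", "amount_cents": 250},
]
print(spend_by_category(items))
```

Amounts are kept as integer cents so that sums do not pick up floating-point rounding differences between platforms.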

Design Schema

If you reference Kimball’s data warehousing techniques, you’ll encounter the following store sales analysis schema. There’s nothing surprising here—same concept, shamelessly copied! Thank you, Kimball. I never thought I’d use your example straight from the textbook for a project, but here we are.

Fact Tables:

  • fact_receipt_items
  • fact_receipts

Dimension Tables:

  • dim_dates
  • dim_payment_method
  • dim_product
  • dim_stores
  • dim_users
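
As a rough sketch of how these tables could fit together in SQLite, the snippet below creates all seven in an in-memory database using Python's built-in `sqlite3` module. Only the table names come from the list above; every column name (for example `oauth_subject`, `total_cents`, `line_total_cents`) is an assumption made for illustration.

```python
import sqlite3

# Table names follow the design above; all columns are hypothetical.
SCHEMA = """
CREATE TABLE dim_users (user_id INTEGER PRIMARY KEY, oauth_subject TEXT NOT NULL UNIQUE);
CREATE TABLE dim_stores (store_id INTEGER PRIMARY KEY, store_name TEXT NOT NULL);
CREATE TABLE dim_dates (date_id INTEGER PRIMARY KEY, full_date TEXT NOT NULL UNIQUE);
CREATE TABLE dim_payment_method (payment_method_id INTEGER PRIMARY KEY, method_name TEXT NOT NULL);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT NOT NULL, category TEXT);
CREATE TABLE fact_receipts (
    receipt_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES dim_users(user_id),
    store_id INTEGER NOT NULL REFERENCES dim_stores(store_id),
    date_id INTEGER NOT NULL REFERENCES dim_dates(date_id),
    payment_method_id INTEGER REFERENCES dim_payment_method(payment_method_id),
    total_cents INTEGER NOT NULL
);
CREATE TABLE fact_receipt_items (
    receipt_item_id INTEGER PRIMARY KEY,
    receipt_id INTEGER NOT NULL REFERENCES fact_receipts(receipt_id),
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    quantity INTEGER NOT NULL DEFAULT 1,
    line_total_cents INTEGER NOT NULL
);
"""


def create_schema(conn):
    """Create the fact and dimension tables on the given connection."""
    conn.executescript(SCHEMA)


def table_names(conn):
    """Return the names of all tables, sorted alphabetically."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
create_schema(conn)
print(table_names(conn))
```

One receipt row in `fact_receipts` fans out to many rows in `fact_receipt_items`, so receipt-level and item-level questions can each be answered from their own fact table.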

09.28.2024

Storage

Here's a breakdown of the data flow ensuring proper data handling, security, and user-specific isolation.

1. User Authentication (OAuth)

Flow: The user first authenticates using OAuth. OAuth ensures secure user access without the app needing to store passwords.

Steps:

  • User logs in using OAuth (e.g., Google, Apple, or custom OAuth provider).
  • After successful authentication, the app receives an access token and a unique user ID for further data access and isolation.

Security:

  • OAuth provides token-based authentication, ensuring that each user's session is authenticated.
  • Use JWT (JSON Web Tokens) or OAuth Access Tokens to validate every request made by the user within the app.

Outcome: User is authenticated and linked to a unique user ID, and from here on, all data related to the user is associated with this ID.
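
To illustrate what "validate every request" involves, here is a minimal sketch of signing and verifying an HS256 JWT using only the Python standard library. It assumes the backend issues its own session token with a shared secret after the OAuth login. That is an assumption, not something the design above specifies. Tokens issued by a provider such as Google or Apple would instead be verified against the provider's public keys, and production code should use a vetted JWT library. The helper names `sign_token` and `verify_token` are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims: dict, secret: bytes) -> str:
    """Create an HS256-signed JWT carrying the given claims."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url_encode(signature)


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the claims if the signature and expiry are valid, else raise ValueError."""
    header_b64, payload_b64, signature_b64 = token.split(".")  # ValueError if malformed
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if claims.get("exp", 0) <= current:
        raise ValueError("token expired")
    return claims
```

The `sub` claim returned by `verify_token` would play the role of the unique user ID that every later step attaches to the user's data.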

2. Image Capture and Upload

Flow: Once authenticated, the user captures an image of a receipt, which is stored locally or in the cloud, depending on the architecture.

Steps:

  • The user snaps a photo using the iOS camera.
  • The image is stored locally on the device, likely in a temporary storage folder or memory cache.
  • If needed, the image is uploaded to a cloud service (e.g., AWS S3) for further processing.

Security:

  • The image is encrypted at rest on the device and in transit (SSL/TLS).
  • When uploaded to cloud storage (e.g., S3), the image is encrypted at rest using AES-256 encryption.
  • Each image is tagged with the user ID from OAuth to ensure data isolation.

Outcome: Image is uploaded and securely stored with an association to the user’s unique ID.
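
As a sketch of the tagging and encryption-at-rest points, the function below builds the keyword arguments for an S3 upload in the shape accepted by boto3's `put_object` call (`Bucket`, `Key`, `Body`, `ContentType`, `ServerSideEncryption`, `Metadata`). It does not call AWS itself. The key layout `receipts/<user_id>/...`, the bucket name, and the function name `build_upload_request` are assumptions for illustration.

```python
import uuid


def build_upload_request(user_id: str, bucket: str, image_bytes: bytes) -> dict:
    """Build put_object-style arguments that isolate and encrypt one receipt image."""
    if not user_id or "/" in user_id:
        # A slash in the ID would let one user write under another user's prefix.
        raise ValueError("invalid user id")
    return {
        "Bucket": bucket,
        # Hypothetical layout: one key prefix per user keeps objects isolated.
        "Key": f"receipts/{user_id}/{uuid.uuid4().hex}.jpg",
        "Body": image_bytes,
        "ContentType": "image/jpeg",
        # Ask S3 to encrypt the object at rest with AES-256 (SSE-S3).
        "ServerSideEncryption": "AES256",
        "Metadata": {"user-id": user_id},
    }


request = build_upload_request("user-123", "example-receipts-bucket", b"\xff\xd8\xff")
print(request["Key"])
```

With boto3 installed and credentials configured, the dictionary could be passed as `s3_client.put_object(**request)`.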

3. Data Extraction Using OpenAI

Flow: The app sends the image to an AI service (like OpenAI) to parse the text and extract receipt information (store name, items, prices, total).

Steps:

  • Once the image is uploaded, the app calls a backend API that interacts with the OpenAI model (or another text parsing model).
  • The OpenAI model processes the image and returns structured data such as store name, list of items, prices, and total amounts.

Security:

  • The interaction with the AI service should be encrypted with SSL/TLS.
  • If using OpenAI or a similar service, ensure requests to the backend API contain the user’s authentication token to avoid unauthorized access.

Outcome: The receipt data is parsed into structured data (e.g., JSON format) and returned to the app.
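
For illustration, here is a minimal sketch of turning such a JSON response into typed objects before showing it to the user. The JSON shape (`store_name`, `items`, `price_cents`, `total_cents`) is an assumption. The actual response format depends on the prompt and model used.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class ReceiptItem:
    name: str
    quantity: int
    price_cents: int


@dataclass
class Receipt:
    store_name: str
    items: List[ReceiptItem]
    total_cents: int


def parse_receipt(raw_json: str) -> Receipt:
    """Parse the (assumed) JSON shape into a Receipt; raises KeyError on missing fields."""
    data = json.loads(raw_json)
    items = [
        ReceiptItem(
            name=item["name"],
            quantity=int(item.get("quantity", 1)),  # default to 1 when omitted
            price_cents=int(item["price_cents"]),
        )
        for item in data["items"]
    ]
    return Receipt(data["store_name"], items, int(data["total_cents"]))


def items_subtotal_cents(receipt: Receipt) -> int:
    """Sum of quantity times unit price across all items."""
    return sum(item.quantity * item.price_cents for item in receipt.items)


sample = json.dumps({
    "store_name": "Store A",
    "items": [
        {"name": "Milk", "quantity": 2, "price_cents": 399},
        {"name": "Bread", "price_cents": 250},
    ],
    "total_cents": 1048,
})
receipt = parse_receipt(sample)
print(receipt.store_name, items_subtotal_cents(receipt), receipt.total_cents)
```

Comparing `items_subtotal_cents` with `total_cents` gives a cheap signal for the confirmation step: a gap may simply be tax, or it may mean an item was misread.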

4. User Confirmation

Flow: The structured data (receipt details) is displayed to the user for review and confirmation.

Steps:

  • The app presents the parsed data (e.g., store name, items, prices, and totals) in a readable format.
  • The user can review the extracted data and confirm or edit any inaccuracies before saving.

Security:

  • Since this data belongs to the authenticated user, ensure the data is displayed only for their session.

Outcome: Once the user confirms the data, it is considered ready to be stored.

5. Storing Structured Data in a Database

Flow: After user confirmation, the structured receipt data is stored securely in a database.

Steps:

  • The app sends the confirmed structured data to the backend.
  • The backend stores this data in a relational (e.g., PostgreSQL) or NoSQL (e.g., MongoDB) database.
  • Each entry is tagged with the user’s unique ID from OAuth to ensure data isolation.

Security:

  • Encryption at Rest: All sensitive data is encrypted in the database using AES-256 or similar encryption protocols.
  • Encryption in Transit: The app uses SSL/TLS to encrypt data sent to the backend.
  • Data Isolation: Ensure that the user ID is included in every query to prevent one user from accessing another user’s data.
  • Access Control: Implement strict database access controls to ensure only authorized services and users can read/write to the database.

Outcome: The structured data is now stored securely in the database, linked to the authenticated user.
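
The data isolation point can be sketched as follows: every read and write takes the user ID as a bound parameter, so no query can return another user's rows. SQLite is used here only as a stand-in for whichever database is chosen. The `receipts` table and its columns are assumptions for illustration.

```python
import sqlite3


def open_store():
    """Create an in-memory stand-in for the backend database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE receipts ("
        "receipt_id INTEGER PRIMARY KEY, "
        "user_id TEXT NOT NULL, "
        "store_name TEXT NOT NULL, "
        "total_cents INTEGER NOT NULL)"
    )
    return conn


def save_receipt(conn, user_id, store_name, total_cents):
    """Insert a confirmed receipt tagged with its owner's ID."""
    cur = conn.execute(
        "INSERT INTO receipts (user_id, store_name, total_cents) VALUES (?, ?, ?)",
        (user_id, store_name, total_cents),
    )
    conn.commit()
    return cur.lastrowid


def receipts_for_user(conn, user_id):
    """Return only the rows that belong to the given user."""
    # Bound parameters (?) also protect against SQL injection.
    return conn.execute(
        "SELECT store_name, total_cents FROM receipts WHERE user_id = ? ORDER BY receipt_id",
        (user_id,),
    ).fetchall()


conn = open_store()
save_receipt(conn, "user-a", "Store A", 1299)
save_receipt(conn, "user-b", "Store B", 4500)
print(receipts_for_user(conn, "user-a"))
```

Routing every query through helpers like these, rather than building SQL ad hoc, makes it harder to forget the `user_id` filter.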

6. Data Analytics and Enrichment

Flow: After data is stored, the app performs analytics (e.g., spend analysis) on the user's receipt data and enriches it with additional information.

Steps:

  • The backend processes the stored data to generate analytics such as spending patterns, total spending per month, or category-based expenses.
  • The enriched data (e.g., analytics, categorized expenses) is stored back into the database for quick retrieval.

Security:

  • Ensure that all analytic queries are user-specific. No aggregation should include data from other users unless anonymized.
  • Store any analytics data with the same encryption protocols.

Outcome: The enriched and processed data is securely stored and ready for visualization.
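
A minimal sketch of one such analytic, total spend per month and category for a single user, expressed as a SQL aggregation. The `expenses` table, its columns, and the sample rows are assumptions for illustration, and SQLite again stands in for the real backend database.

```python
import sqlite3


def monthly_category_spend(conn, user_id):
    """Return (month, category, total_cents) rows for one user only."""
    query = """
        SELECT substr(receipt_date, 1, 7), category, SUM(amount_cents)
        FROM expenses
        WHERE user_id = ?
        GROUP BY substr(receipt_date, 1, 7), category
        ORDER BY 1, 2
    """
    return conn.execute(query, (user_id,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE expenses (user_id TEXT, receipt_date TEXT, category TEXT, amount_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO expenses VALUES (?, ?, ?, ?)",
    [
        ("u1", "2024-09-27", "groceries", 2500),
        ("u1", "2024-09-28", "groceries", 1500),
        ("u1", "2024-10-01", "office", 999),
        ("u2", "2024-09-28", "groceries", 7000),  # another user's row, never included
    ],
)
conn.commit()
print(monthly_category_spend(conn, "u1"))
```

Dates are stored as ISO `YYYY-MM-DD` strings, so the first seven characters give the month and plain string ordering matches chronological ordering.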

7. User Access to Analytics and CPA Integration

Flow: The user can now view their spending charts and export or push their expenses to a CPA or tax service.

Steps:

  • The app queries the enriched data and presents analytics through charts or summaries (e.g., total monthly spend, top spending categories).
  • The user can also export this data in formats like CSV or PDF or integrate with third-party tax services (e.g., push data to a CPA).

Security:

  • If exporting data, ensure it is generated and transmitted securely (e.g., export data is encrypted, only the user can access their export files).
  • If integrating with third-party services (e.g., CPA services), use secure APIs and OAuth to ensure secure data sharing.

Outcome: Users can view their spending data, gain insights, and export it securely for tax purposes.
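
The CSV export could look roughly like the sketch below, which uses Python's built-in `csv` module. The column set, including the `tax_deductible` flag, is an assumption based on the tax-reporting goal stated earlier, not a defined export format.

```python
import csv
import io

FIELDS = ["date", "store", "category", "amount", "tax_deductible"]


def format_cents(cents: int) -> str:
    """Format a non-negative integer cent amount as dollars, e.g. 1234 -> '12.34'."""
    return f"{cents // 100}.{cents % 100:02d}"


def export_csv(rows) -> str:
    """Render one user's expense rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({
            "date": row["date"],
            "store": row["store"],
            "category": row["category"],
            "amount": format_cents(row["amount_cents"]),
            "tax_deductible": "yes" if row.get("tax_deductible") else "no",
        })
    return buffer.getvalue()


rows = [
    {"date": "2024-09-28", "store": "Store A", "category": "office",
     "amount_cents": 1234, "tax_deductible": True},
    {"date": "2024-09-29", "store": "Store B, Inc.", "category": "groceries",
     "amount_cents": 505},
]
print(export_csv(rows))
```

The `csv` module quotes fields that contain commas, so a store name like "Store B, Inc." stays in a single column when a CPA opens the file.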

8. Data Retention, Security, and Encryption Protocols

Data Retention: Define a clear data retention policy, such as keeping data for a certain number of years (e.g., for tax purposes) and automatically deleting older data. Ensure users can delete their data if they choose to.

Encryption:

  • At Rest: All user data (receipts, analytics, and exports) should be encrypted using AES-256 or equivalent encryption techniques.
  • In Transit: All communications between the app, backend, and external services should use SSL/TLS.

User Isolation: Ensure that every database query, data storage, and analytics computation includes the user’s unique ID to prevent cross-user data access. No user should be able to see another user's data.

Access Control: Implement strong access control mechanisms on both the app and the backend:

  • Use token-based access (OAuth tokens, JWTs) to ensure only authenticated users can interact with the system.
  • Ensure backend APIs validate every request with the user’s authentication token.
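
As a sketch of the retention policy and user-initiated deletion, the snippet below computes a cutoff date and removes older rows, and separately removes everything belonging to one user. The seven-year window in the example is a placeholder, since the actual retention period is not decided here. The `receipts` table and its columns are also assumptions.

```python
import sqlite3
from datetime import date


def retention_cutoff(today: date, keep_years: int) -> str:
    """Return the ISO date before which data is considered expired."""
    try:
        return today.replace(year=today.year - keep_years).isoformat()
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; fall back to Feb 28.
        return today.replace(year=today.year - keep_years, day=28).isoformat()


def purge_expired(conn, today: date, keep_years: int) -> int:
    """Delete receipts older than the retention window; return rows removed."""
    cur = conn.execute(
        "DELETE FROM receipts WHERE receipt_date < ?",
        (retention_cutoff(today, keep_years),),
    )
    conn.commit()
    return cur.rowcount


def delete_user_data(conn, user_id: str) -> int:
    """Delete everything belonging to one user; return rows removed."""
    cur = conn.execute("DELETE FROM receipts WHERE user_id = ?", (user_id,))
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE receipts (user_id TEXT NOT NULL, receipt_date TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO receipts VALUES (?, ?)",
    [("u1", "2015-03-01"), ("u1", "2024-09-28"), ("u2", "2023-01-15")],
)
conn.commit()
```

A scheduled job could call `purge_expired(conn, date.today(), keep_years=7)`, while `delete_user_data` would back the delete option in the app's settings.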

Visual Data Flow:

1. OAuth Authentication:

  • User authenticates using OAuth.
  • Receives access token and user ID.

2. Image Upload:

  • User snaps a photo and uploads it.
  • The image is stored securely (locally or cloud).

3. Data Extraction:

  • Image is sent to OpenAI or an AI service.
  • Structured data is returned (store name, items, prices).

4. User Confirmation:

  • User reviews and confirms the extracted data.

5. Data Storage:

  • Confirmed structured data is stored securely with the user’s ID in a database.

6. Analytics and Enrichment:

  • Data is analyzed, enriched (e.g., categorized expenses), and stored for later use.

7. Access and Export:

  • User views analytics (e.g., charts).
  • Optionally exports or pushes data to CPA or third-party tax services.

09.27.2024

Research

Checking out AWS Lambda and trying to get JSON responses from parsing receipts.

Built a Jenkins pipeline along with a new Lambda that processes input data and returns a JSON response with details from the receipt OCR extraction.

  • On-device OCR extracts the relevant contents, which are then sent to OpenAI for enrichment. The enriched output serves as the raw data ingestion layer.

  • A custom ETL implementation on top of this, which fixes missing values, normalizes data, and fixes anomalies before loading our star schema, is done.
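
To give a flavor of the cleaning stage, here is a minimal sketch of the three operations named above (fixing missing values, normalizing, and rejecting anomalies) applied to raw item dictionaries. The input field names (`name`, `quantity`, `price`) and the specific rules are assumptions for illustration. The real ETL rules may differ.

```python
from decimal import Decimal, InvalidOperation


def clean_items(raw_items):
    """Split raw items into (cleaned, rejected) lists."""
    cleaned, rejected = [], []
    for raw in raw_items:
        # Normalize: collapse whitespace and use consistent casing for names.
        name = " ".join(str(raw.get("name") or "").split()).title()
        # Fix missing values: a missing quantity defaults to 1.
        quantity = raw.get("quantity")
        if quantity is None:
            quantity = 1
        try:
            price_text = str(raw.get("price")).replace("$", "").replace(",", "")
            # Decimal avoids float rounding when converting dollars to cents.
            price_cents = int((Decimal(price_text) * 100).quantize(Decimal("1")))
        except (InvalidOperation, ValueError):
            rejected.append(raw)  # missing or unparseable price
            continue
        # Anomalies: unnamed items, negative prices, non-positive quantities.
        if not name or price_cents < 0 or quantity <= 0:
            rejected.append(raw)
            continue
        cleaned.append({"name": name, "quantity": quantity, "price_cents": price_cents})
    return cleaned, rejected


raw_items = [
    {"name": "  whole   MILK ", "quantity": None, "price": "$4.35"},
    {"name": "", "price": "1.00"},
    {"name": "Bananas", "quantity": 2, "price": "-0.59"},
    {"name": "Eggs", "quantity": 1, "price": None},
]
cleaned, rejected = clean_items(raw_items)
print(cleaned)
print(len(rejected), "rejected")
```

Keeping the rejected rows, rather than dropping them silently, leaves room to surface them to the user for correction.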

09.25.2024

Defining high-level requirements

In our previous experience with the plant identifier, we were not able to build a good ML model, and the entire app ran into problems as a result.

So we will have to first nail the ML portion of the app before we continue further.

  • Build an ML model that reads grocery receipts, and test it.
  • Do data analytics on the results, in Python first; later, at the application level, it will be in Swift.

ML Model

1. OCR (Optical Character Recognition) for Text Extraction

The first step in processing a grocery receipt is extracting the text from the image. The most commonly used models for OCR are:

  • Tesseract OCR: An open-source OCR engine, originally developed at HP and later sponsored by Google, that can recognize text from images. It’s a good starting point but may require some fine-tuning and pre-processing of the images for complex receipts.
  • Google Cloud Vision API: Offers advanced OCR capabilities and is highly accurate for structured documents like receipts. This is a paid service but can handle a wide variety of formats and fonts.
  • Amazon Textract: AWS’s OCR solution, which not only extracts text but also identifies the layout, forms, and tables in the image.
  • Microsoft Azure Computer Vision: Another highly reliable OCR service for extracting text and structured information from receipts.

These models are designed to read the text from the receipt, including product names, prices, and quantities.

2. Post-OCR Processing with NLP Models

After extracting the raw text, you’ll need to clean and structure it into a table. NLP models can help identify patterns and structure the data into meaningful categories like Item Name, Quantity, Price, etc. You’ll likely use a combination of:

  • Named Entity Recognition (NER): NER models can help detect specific entities in the receipt, such as product names, prices, dates, and store names. You can fine-tune a pre-trained model like BERT, or train a spaCy NER pipeline, for this purpose.
  • Regex (Regular Expressions): For simple rule-based extraction of fields like prices (which are often preceded by a $ symbol and contain a decimal point), regex can be helpful.
  • Table Extraction Algorithms: Libraries like Camelot and Tabula can extract table-like structures, but they are designed for text-based PDFs, so receipt images would need to go through OCR first.
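
As a small illustration of the regex approach, the sketch below pulls item names and prices out of OCR text where each item line ends in a price. The line layout, the pattern, and the list of summary labels to skip are assumptions. Real receipts vary widely and would need more rules.

```python
import re

# An item line: some text, whitespace, an optional "$", then a price like 4.35.
LINE_PATTERN = re.compile(r"^(?P<name>.+?)\s+\$?(?P<price>\d+\.\d{2})\s*$")

# Summary lines also end in a price but are not purchased items.
SUMMARY_LABELS = {"TOTAL", "SUBTOTAL", "TAX"}


def extract_items(ocr_text: str):
    """Return (name, price) string pairs for lines that look like items."""
    items = []
    for line in ocr_text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match and match.group("name").upper() not in SUMMARY_LABELS:
            items.append((match.group("name"), match.group("price")))
    return items


ocr_text = """STORE A
WHOLE MILK   $4.35
BANANAS 1.18
TOTAL  5.53
Thank you!"""
print(extract_items(ocr_text))
```

Lines without a trailing price, such as the store header and the footer, do not match and are ignored.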

OCR

I've decided to go with an OpenAI model, called through its API.