This project focuses on automatic character recognition using Principal Component Analysis (PCA). PCA is a simple method for identifying the principal components of a dataset. Each test sample is then compared in terms of those components and assigned to the most similar of several classes. This project matters beyond its own grade: doing it well also completes about 20% of the Final Project. The goal is to introduce students to a very basic form of machine learning using a familiar tool, MATLAB. Students may use Python instead if they are comfortable with it, though it is not required.
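PCA itself comes down to a few linear-algebra steps: center the data, take its singular value decomposition (SVD), and keep the top k right-singular vectors as the principal components. The minimal Python/NumPy sketch below assumes each sample is one row of a matrix `X`. The function names are hypothetical and not part of any course template. In MATLAB, the built-in `svd` (or `pca` from the Statistics and Machine Learning Toolbox) plays the same role.

```python
import numpy as np

def principal_components(X, k):
    """Return the mean, the top-k principal components, and their singular values.

    Hypothetical helper. X is an (n_samples, n_features) array; each row is one sample.
    """
    mean = X.mean(axis=0)
    Xc = X - mean  # center every feature at zero
    # Rows of Vt are the principal directions, ordered by descending singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return mean, Vt[:k], s[:k]

def project(x, mean, components):
    """Coordinates of sample(s) x in the principal-component basis (hypothetical helper)."""
    return (x - mean) @ components.T
```

Taking the SVD of the centered data gives the same directions as the eigenvectors of the covariance matrix, without forming that matrix explicitly. For points on the line y = x, the single retained component is (1/√2, 1/√2) up to sign.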
DATASET: CHARS74K DATASET
This is a database of handwritten characters. It includes both English and Kannada characters. This project uses the English characters only.
- 36 Classes (0-9, A-Z)
- 55 samples of each class
- Consists strictly of PNG images
- Original paper: T. E. de Campos, B. R. Babu and M. Varma. Character recognition in natural images. In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP), Lisbon, Portugal, February 2009.
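Because the dataset consists strictly of PNG images, each image has to become a fixed-length numeric vector before PCA can be applied. The Python/NumPy sketch below assumes the PNG files have already been read as 2D grayscale arrays, for instance with Pillow via `np.asarray(Image.open(path).convert("L"))`. The 32×32 target size and the function names are hypothetical choices. The nearest-neighbour resize is used only to avoid depending on an imaging library.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2D array (hypothetical helper, no imaging library)."""
    h, w = img.shape
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return img[rows[:, None], cols[None, :]]

def images_to_matrix(images, out_h=32, out_w=32):
    """Stack grayscale images into an (n_images, out_h * out_w) float matrix.

    Hypothetical helper; 32x32 is an assumed target size. Each image is resized
    to a common shape, scaled to [0, 1] if it is 8-bit, and flattened row-major
    into one row of the result.
    """
    rows = []
    for img in images:
        img = np.asarray(img)
        small = resize_nearest(img, out_h, out_w).astype(np.float64)
        if img.dtype == np.uint8:
            small /= 255.0
        rows.append(small.ravel())
    return np.vstack(rows)
```

Every row of the resulting matrix has the same length regardless of the original image size, which is the form the PCA step expects.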
PROJECT STEPS: THESE ARE THE HIGH-LEVEL STEPS FOR THIS PROJECT
These steps give students a high-level roadmap for completing this project successfully. As a general guideline, finishing one part per week is a good pace for completing the project on time.
- Part 1: Convert the data to a usable form for performing PCA.
- Part 2: Isolate training data from testing data. Generate the Principal Components.
- Part 3: Project the testing data onto the Principal Components to determine the most similar class.
- Part 4: Collect statistics and show results.
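One way to carry out Parts 2 through 4 is sketched below in Python/NumPy. It assumes the images already form a data matrix `X` (one flattened image per row) with a label array `y`. It also makes two choices the project description leaves open: a random per-class train/test split, and classification by the nearest class mean in the projected space. The function names, split size, and number of components `k` are hypothetical. Other classifiers, such as one PCA subspace per class with minimum reconstruction error, fit the same outline.

```python
import numpy as np

def split_per_class(X, y, n_train, seed=0):
    """Part 2 (hypothetical helper): n_train random samples per class go to training, the rest to testing."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    train_idx = np.array(train_idx, dtype=int)
    test_idx = np.array(test_idx, dtype=int)
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

def fit_pca_centroids(X_train, y_train, k):
    """Part 2: principal components of the training set and each class's mean in PC space."""
    mean = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    comps = Vt[:k]
    Z = (X_train - mean) @ comps.T  # training data in PC coordinates
    classes = np.unique(y_train)
    centroids = np.array([Z[y_train == c].mean(axis=0) for c in classes])
    return mean, comps, classes, centroids

def predict(X_test, mean, comps, classes, centroids):
    """Part 3: project test samples and assign each to the nearest class centroid."""
    Z = (X_test - mean) @ comps.T
    # Euclidean distance from every test sample to every class centroid.
    d = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]

def confusion_matrix(y_true, y_pred, classes):
    """Part 4: counts[i, j] = number of class-i test samples predicted as class j."""
    pos = {c: i for i, c in enumerate(classes)}
    counts = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(y_true, y_pred):
        counts[pos[t], pos[p]] += 1
    return counts
```

With 55 samples per class, a call such as `split_per_class(X, y, 40)` would leave 15 test samples per class. Overall accuracy is then `np.trace(cm) / cm.sum()`, and dividing the diagonal by the row sums gives per-class accuracy, which helps show which classes are confused with each other.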
PROJECT 1 DELIVERABLES
- Submit all files in a SINGLE zip file.
  - The zip file should include all MATLAB (or Python) code used to execute the project.
- Final Presentation
  - USE THE TEMPLATE GENERATED AS PART OF HW 1!
  - Required Sections:
    - Problem Introduction
    - Technical Solution Approach
    - Implementation and Results
    - Conclusion