Type of Document: Master's Thesis
Author: Tang, Qing
Author's Email Address: firstname.lastname@example.org
URN: etd-07052006-234928
Title: Two-Dimensional Penalized Signal Regression for Hand Written Digit Recognition
Degree: Master of Applied Statistics (M.Ap.Stat.)
Department: Experimental Statistics
Advisory Committee:
- Brian D. Marx, Committee Chair
- James P. Geaghan, Committee Member
- Kevin S. McCarter, Committee Member
Keywords:
- handwritten digit recognition
- USPS zip code
- P-spline signal regression (PSR)
Date of Defense: 2006-05-08
Availability: unrestricted
Abstract
Many attempts have been made to achieve successful recognition of handwritten digits. We report our results from applying a statistical method to handwritten digit recognition. A digitized handwritten numeral can be represented by a grayscale image. The image includes features that are mapped into two-dimensional space with row and column coordinates. Based on this structure, two-dimensional penalized signal logistic regression (PSR) is applied to the recognition of handwritten digits.
The data set is taken from the USPS zip code database, which contains 7219 training images and 2007 test images. All the images have been deslanted and normalized to 16 x 16 pixels with various grayscales. The PSR method constructs a coefficient surface using a rich two-dimensional tensor product B-spline basis, so that the surface is more flexible than needed. We then penalize the roughness of the coefficient surface with difference penalties on the coefficients associated with the rows and columns of the tensor product B-splines. The optimal penalty weight is found iteratively in several minutes. The method achieved a competitive overall recognition error rate of 8.97% on the test data set.
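The row and column difference penalties described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the thesis's implementation: the function names `differences` and `roughness_penalty`, the separate weights `lam_rows` and `lam_cols`, and the default second-order differences are all hypothetical choices made for the example, and the abstract does not say which difference order or how many penalty weights were used.

```python
def differences(values, order=1):
    """Return the order-th differences of a sequence of numbers."""
    out = list(values)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def roughness_penalty(coefs, lam_rows, lam_cols, order=2):
    """Weighted sum of squared differences over a grid of coefficients.

    coefs is a list of equal-length rows, one number per tensor product
    B-spline (hypothetical layout). lam_rows weights differences taken
    within each row; lam_cols weights differences taken within each column.
    """
    within_rows = sum(d * d for row in coefs for d in differences(row, order))
    within_cols = sum(d * d for col in zip(*coefs) for d in differences(col, order))
    return lam_rows * within_rows + lam_cols * within_cols


# A planar coefficient surface has zero second differences, so it is not penalized.
plane = [[i + 2 * j for j in range(4)] for i in range(5)]
# A checkerboard of alternating signs is about as rough as a surface can be.
rough = [[(-1) ** (i + j) for j in range(4)] for i in range(5)]

print(roughness_penalty(plane, 1.0, 1.0))
print(roughness_penalty(rough, 1.0, 1.0))
```

In a penalized fit, a term of this kind would be added to the negative log-likelihood, so larger weights pull the estimated coefficient surface toward a smoother shape.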
We also review an artificial neural network approach for comparison. PSR requires neither long learning time nor large memory resources. Another advantage of the PSR method is that our results are obtained on the original USPS data set without any further image preprocessing. We also found that the PSR algorithm copes well with the high diversity and variation that are two major features of handwritten digits.
Filename: Tang_thesis.pdf
Size: 800.17 Kb
Approximate Download Time (Hours:Minutes:Seconds):
- 28.8 Modem: 00:03:42
- 56K Modem: 00:01:54
- ISDN (64 Kb): 00:01:40
- ISDN (128 Kb): 00:00:50
- Higher-speed Access: 00:00:04