书签分享收藏举报版权申诉 / 7

立即下载加入VIP免费专享

当前位置：首页 > 其他 > 毕业论文（设计）-基于SVM 的手写体阿拉伯数字识别09494.doc

毕业论文（设计）-基于SVM 的手写体阿拉伯数字识别09494.doc

上传人：小小飞

文档编号：3945691

上传时间：2019-10-10

格式：DOC

页数：7

大小：192.52KB

《毕业论文（设计）-基于SVM 的手写体阿拉伯数字识别09494.doc》由会员分享，可在线阅读，更多相关《毕业论文（设计）-基于SVM 的手写体阿拉伯数字识别09494.doc（7页珍藏版）》请在三一文库上搜索。

1、专业好文档文章编号：1009-8119（2005）09-0041-03基于SVM的手写体阿拉伯数字识别张鸽陈书开（长沙理工大学计算机与通讯工程学院,长沙 410076）摘要支持向量机(SVM)是近年来在统计学习理论的基础上发展起来的一种新的模式识别方法,在解决小样本、非线性及高维模式识别问题中表现出许多特有的优势。介绍了在提取穿越次数特征、粗网格特征以及密度特征提取的基础上应用SVM进行手写体阿拉伯数字识别的方法。关键词 SVM，核函数，穿越次数特征，粗网格特征，密度特征Handwriting Numerals Recognition based on SVM Zhang Ge Che

2、n Shukai (Department of Computer and Communication，Changsha Univesity of Science and Technology，Changsha 410076)Abstract Supprot Vector Machine(SVM) is a new pattern recognition method developed in recent years on the goundation of statistical learning theory.It wins populatity due to many attractiv

3、e features and emphatical performance in the fields of nonlinear and high dimensional pattern recognition .The paper introduces a script arabic numerals recognition method applied SVM based on drawing out Traversing-times character and Wide-gridding character.Keywords SVM, Kenerl Function, Traversin

4、g-times character,Wide-gridding character, density character1 引言手写体阿拉伯数字识别是图象处理和模式识别领域中的研究课题之一。字符识别系统一般由图象采集、信号预处理、特征提取、分类识别等几个部分组成。识别系统的识别方式可分为联机手写体字符识别、脱机印刷体字符识别和脱机手写体字符识别等,其中脱机手写体字符由于书写者的因素,使其字符图像的随意性很大,例如,笔画的粗细、字体的大小、手写体的倾斜度、字符笔画的局部扭曲变形、字体灰度的差异等都直接影响到字符的正确识别。所以手写体数字字符的识别是数字字符识别领域内最具挑战性的课题。近年来，

5、支持向量机（ Support Vector Machines,SVM）的研究在广泛开展。支持向量机是V.Vipnik 等根据统计学习理论（Statistical Learning Theory简称 SLT）提出的一种新的机器学习方法，在解决小样本、非线性及高维模式识别问题中表现出许多特有的优势，已经在模式识别、函数逼近和概率密度估计等方面取得了良好的效果1。支持向量机从本质上讲是一种前向神经网络，根据结构风险最小化准则，在使训练样本分类误差极小化的前提下，尽量提高分类器的泛化推广能力。从实施的角度，训练支持向量机的核心思想等价于求解一个线性约束的二次规划问题，从而构造一个超平面作为决策平面，使

6、得特征空间中两类模式之间的距离最大，而且它能保证得到的解为全局最优解。本文即是采用SVM进行09的手写体阿拉伯数字的识别。 2 SVM基本原理2.1 线性可分情况SVM方法是从线性可分情况下的最优分类面(Optimal Hyperplane)提出的。所谓最优分类面就是要求分类线不但能将两类样本无错误的分开,而且要使两类之间的距离最大。设线性可分样本集为(xi, yi), i=1,2,n, xRd, y+1,-1是类别标号。d维空间中线性判别函数的一般形式为:g(x)=wx+b,分类面方程为:wx+b=0 (1)将判别函数进行归一化,使两类所有样本都满足 |g(x)|1,即,使离分类面最近的样本

7、的|g(x)|=1,这样分类间隔就等于2/w,因此间隔最大等价于使w(或w2)最小;而要求分类线对所有样本正确分类,就是要求其满足:yi(wxi)+b10,(i=1,2,n) (2)因此,满足上述条件且使w2最小的分类面就是最优分类面。这两类样本中离分类面最近的点且平行于最优分类面的超平面上的训练样本就是使式(2)中等号成立的那些样本,他们叫做支持向量(Support Vectors)。根据上面的讨论,最优分类面问题可以表示成如下的约束优化问题,即在式(2)的约束下,求函数:(w)= w2= (ww) (3)的最小值。这是一个二次规划问题,可定义以下的拉格朗日函数: L(w,b,a)= (ww

8、)aiyi(wxi)+b-1 (4)其中:ai0为Lagrange系数。求式(3)的极小值就是对w和b求拉氏函数的极小值。求L对w和b的偏微分，并令其等于0,可转化为对偶问题:在约束条件aiyi=0,ai0,i=1,2,n之下对ai求式(5)的最大值:W(a)= ai -aiajyiyj(xi.xj) (5)由KhnTucker定理可知,最优解满足:yi(wx+b)-1=0 i (6)显然,只有支持向量的系数ai不为0,即只有支持向量影响最终的划分结果。于是w可表示为:w=aiyixi (7)即最优分类面的权系数向量是训练样本向量的线性组合。若ai*为最优解,求解上述问题后得到的最优分类函数是

9、:f(x)=sgn(w*x)+b*=sgnai*yi(xix)+b* (8)其中:sgn()为符号函数,b*是分类的阈值,可以由任意一个支持向量用式(7)求得,或通过两类中任意一对支持向量取中值求得。对于给定的未知样本x,只需计算sgn(wx+b),即可判定x所属的分类。2.2 线性不可分情况对于线性不可分的样本,希望使误分类的点数最小,为此在式(2)中引入松弛变量i0,即：yi(wxi)+b-1+i0,(i=1,2,n) (9)在式(9)中,对于给定的常数C,求出使(w,)= (ww)+Ci (10)取极小值的w,b,这一优化问题同样需要变换为用拉格朗日乘子表示的对偶问题,变换的过程与前面线

10、性可分样本的对偶问题类似,结果也几乎完全相同,只是约束条件略有变化:aiyi=0,(0aiC,i=1,2,n) (11)其中:C反映了在复杂性和不可分样本所占比例之间的折中。2.3 支持向量机如果用内积K(x,x)代替最优分类面中的点积,就相当于把原特征空间变换到了某一新的特征空间,此时优化函数变为:W(a)= ai -aiajyiyj(xixj) (12)相应的判别函数也应变为: f(x)=sgnai*yik(xix)+b* (13)算法的其他条件均不变,这就是支持向量机。支持向量机的基本思想可以概括为:首先通过非线性变换将输入空间变换到一个高维空间,然后在这个新空间中求取最优线性分类面,而

11、这种非线性变换是通过定义适当的内积函数实现的。常用的核函数有以下几种:(1) 线性内积函数K(x,y)=xy (14)(2) 多项式内积函数K(x,y)=(xy)+1d (15)(3) 径向基内积函数K(x,y)=exp-|x-y|2/2 (16)(4) 二层神经网络内积函数K(x,y)=tanh(k(xy)+c) (17)3 预处理本文是对已经扫描后的手写体数字图像进行识别，在识别前由于得到的图像和图像中的数字的各种属性有很大差别,这给识别增加了难度,因而对待识别图像要进行预处理。3.1 归一化归一化处理包括两个部分:一是裁剪裁剪出图像中只有文字信息的部分作为我们研究处理的对象;二是图像的

12、缩放。本文首先求得图像中的数字上、下、左、右四个边界点，剪裁掉图像四个方向的边界空白区域，并求出待识别数字的高度和宽度，然后按比例将图像统一缩放为64*40像素。3.2 二值化由于手写体数字识别只需要处理图像中的字符信息,对颜色等信息不作处理,所以可对扫描得到的图像进行二值化处理。本文直接利用matlab中的im2bw函数将待识别图像转化为二进制图像。3.3 细化一个图像的“骨架”,是指图像中央的骨架部分。它是描述图像几何及拓扑性质的重要特征之一。求一个图像骨架的过程通常称为对图像的“细化”过程。本文直接利用matlab中的bwperim函数得到待识别数字的骨架。4 基于SVM的手写数字的识

13、别4.1 特征提取数字识别中，特征提取是一个非常重要的环节，特征提取的方法有多种，本文提取的特征为：穿越次数特征和粗网格特征以及密度特征。（1）穿越次数特征：水平或垂直扫描，统计由白像素到黑像素的变化次数。首先从图像的1/5高度处水平扫描像素，计算穿越次数，可以将10个数字分为两类，穿越次数为1的数字必然有1，4，7，9，穿越次数为2的数字可能有0，2，3，5，6，8。如图1所示：1/5高度处图11、4、7、90、2、3、5、6、81、74、90、82、3、5、6再次从图像的4/5高度处水平扫描，穿越次数为1的数字必然有1、7，穿越次数为2的数字必然有0、8。这样基本上做出了如图2的分类：图2

14、（2）粗网格特征：就是图像归一化，分成 nn的网格,统计每个网格内的黑像素的数量即得到一个以数值表示的网格特征（40维）。归一化后图像为6440像素大小，将其分成85的网格，每个网格包含的像素为88个，统计每个网格内的黑像素的数量，大于等于1统一做为1来处理。（3）密度特征：统计归一化成6440像素、再分成85网格后计算水平块和垂直块的黑像素密度特征（40维）。4.2 识别过程利用支持向量机进行手写体数字识别的分类函数形式上类似一个神经网络，其输出是若干中间层节点的线性组合，而每一个中间层节点对应于输入样本与一个支持向量的内积，因此也称为支持向量网络，如图3所示。输出层中间层输入层X1X2Xd

15、1d(x)a1y1a2y2banynK(x1,x)K(x2,x)K(xn,x)图3 支持向量网络对于已进行过预处理的待识别手写阿拉伯数字图像，先提取穿越次数特征，进行基本分类；再提取粗网格特征以及密度特征，形成一个80维的向量，然后用SVM进行数字的分类识别。由于SVM分类器是二分类器,如果要识别09的数字必须要多个SVM分类器组合起来使用 ,这样工作量比较大,因此本文采用的是在matlab的基础上结合LIBSVM软件中的训练程序svmtrain和识别程序svmpredict,它们不仅支持二分类,同时也可以支持多分类,它采用的核函数为径向基内积函数.本文对每个字符提取了100个样本，所以共有1

16、0组样本，每组包含100个样本。将每个字符对应的训练样本输入，程序将自动提取待识别图像的特征，形成向量，进行训练.训练完成后,将待识别图像输入,程序自动进行特征提取后，形成向量并分类.5 总结本文采用了多个不同的手写体数字样本进行训练,由于在特征提取时应用了穿越特征进行了基本的分类，因此识别率有所提高。神经网络也是经常使用的一种文字识别方法，本文在基于同样的特征提取以及同样的样本集和测试集前提下,用SVM进行手写体数字识别和与BP神经网络进行手写体数字识别进行比较，发现用SVM方法的识别速度相对比较快。SVM算法是将问题转化为凸二次优化问题6，得到的解是全局最优解；而BP网络进行识别得到的解可

17、能存在局部最优解，因此SVM识别方案解决了BP网络方法中无法避免的局部极值问题，在识别率上优于BP网络识别方案。参考文献1 边肇祺,张学工.模式识别.北京:清华大学出版社,20022 周昌乐 .手写汉字的机器识别.北京：科学出版社,19973 Vladimir N Vaapnik 著.张学工译.统计学习理论的本质.北京:清华大学出版社,20004 施善昌.自动识别技术原理及应用.北京:人民邮电出版社,19895 马立权等.手写数字识别中的预处理技术研究.仪器仪表学报,2001；66 Taylor J A.Self:A Flexible Self Assessment Package for D

18、istance and other learners.Computers&Education,1998；31:329328Editors note: Judson Jones is a meteorologist, journalist and photographer. He has freelanced with CNN for four years, covering severe weather from tornadoes to typhoons. Follow him on Twitter: jnjonesjr (CNN) - I will always wonder what i

19、t was like to huddle around a shortwave radio and through the crackling static from space hear the faint beeps of the worlds first satellite - Sputnik. I also missed watching Neil Armstrong step foot on the moon and the first space shuttle take off for the stars. Those events were way before my time

20、.As a kid, I was fascinated with what goes on in the sky, and when NASA pulled the plug on the shuttle program I was heartbroken. Yet the privatized space race has renewed my childhood dreams to reach for the stars.As a meteorologist, Ive still seen many important weather and space events, but right

21、 now, if you were sitting next to me, youd hear my foot tapping rapidly under my desk. Im anxious for the next one: a space capsule hanging from a crane in the New Mexico desert.Its like the set for a George Lucas movie floating to the edge of space.You and I will have the chance to watch a man take

22、 a leap into an unimaginable free fall from the edge of space - live.The (lack of) air up there Watch man jump from 96,000 feet Tuesday, I sat at work glued to the live stream of the Red Bull Stratos Mission. I watched the balloons positioned at different altitudes in the sky to test the winds, know

23、ing that if they would just line up in a vertical straight line we would be go for launch.I feel this mission was created for me because I am also a journalist and a photographer, but above all I live for taking a leap of faith - the feeling of pushing the envelope into uncharted territory.The guy w

24、ho is going to do this, Felix Baumgartner, must have that same feeling, at a level I will never reach. However, it did not stop me from feeling his pain when a gust of swirling wind kicked up and twisted the partially filled balloon that would take him to the upper end of our atmosphere. As soon as

25、the 40-acre balloon, with skin no thicker than a dry cleaning bag, scraped the ground I knew it was over.How claustrophobia almost grounded supersonic skydiverWith each twist, you could see the wrinkles of disappointment on the face of the current record holder and capcom (capsule communications), C

26、ol. Joe Kittinger. He hung his head low in mission control as he told Baumgartner the disappointing news: Mission aborted.The supersonic descent could happen as early as Sunday.The weather plays an important role in this mission. Starting at the ground, conditions have to be very calm - winds less t

27、han 2 mph, with no precipitation or humidity and limited cloud cover. The balloon, with capsule attached, will move through the lower level of the atmosphere (the troposphere) where our day-to-day weather lives. It will climb higher than the tip of Mount Everest (5.5 miles/8.85 kilometers), drifting

28、 even higher than the cruising altitude of commercial airliners (5.6 miles/9.17 kilometers) and into the stratosphere. As he crosses the boundary layer (called the tropopause), he can expect a lot of turbulence.The balloon will slowly drift to the edge of space at 120,000 feet (22.7 miles/36.53 kilo

29、meters). Here, Fearless Felix will unclip. He will roll back the door.Then, I would assume, he will slowly step out onto something resembling an Olympic diving platform.Below, the Earth becomes the concrete bottom of a swimming pool that he wants to land on, but not too hard. Still, hell be travelin

30、g fast, so despite the distance, it will not be like diving into the deep end of a pool. It will be like he is diving into the shallow end.Skydiver preps for the big jumpWhen he jumps, he is expected to reach the speed of sound - 690 mph (1,110 kph) - in less than 40 seconds. Like hitting the top of

31、 the water, he will begin to slow as he approaches the more dense air closer to Earth. But this will not be enough to stop him completely.If he goes too fast or spins out of control, he has a stabilization parachute that can be deployed to slow him down. His team hopes its not needed. Instead, he pl

32、ans to deploy his 270-square-foot (25-square-meter) main chute at an altitude of around 5,000 feet (1,524 meters).In order to deploy this chute successfully, he will have to slow to 172 mph (277 kph). He will have a reserve parachute that will open automatically if he loses consciousness at mach spe

33、eds.Even if everything goes as planned, it wont. Baumgartner still will free fall at a speed that would cause you and me to pass out, and no parachute is guaranteed to work higher than 25,000 feet (7,620 meters).It might not be the moon, but Kittinger free fell from 102,800 feet in 1960 - at the dawn of an infamous space race that captured the hearts of many. Baumgartner will attempt to break that record, a feat that boggles the mind. This is one of those monumental moments I will always remember, because there is no way Id miss this.

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

4 元

下载	加入VIP免费专享

版权申诉 word格式文档无特别注明外均可编辑修改；预览文档经过压缩，下载后原文更清晰！ 立即下载

配套讲稿：: 如PPT文件的首页显示word图标，表示该PPT已包含配套word讲稿。双击word图标可打开word文档。
特殊限制：: 部分文档作品中含有的国旗、国徽等图片，仅作为作品整体效果示例展示，禁止商用。设计者仅对作品中独创性部分享有著作权。
关键词：: 毕业论文设计-基于SVM 的手写体阿拉伯数字识别09494 毕业论文设计基于 SVM 手写体阿拉伯数字识别 09494

三一文库所有资源均是用户自行上传分享，仅供网友学习交流，未经上传用户书面授权，请勿作他用。

关于本文

本文标题：毕业论文（设计）-基于SVM 的手写体阿拉伯数字识别09494.doc
链接地址：https://www.31doc.com/p-3945691.html