Computer vision paper references (200 words)

Asked by 彷徨爱情 · 3 answers · 357 views

Too兔rich

Accepted answer

Abstract: Human-body recognition is one of the major hot topics in computer vision. Its research covers human detection and tracking, gesture recognition, action recognition, face recognition, gender recognition, and behavior and event recognition, and it has very broad application value. Among the many machine learning algorithms, random forests stand out for their inherent properties and excellent classification performance. A random forest is, in essence, an ensemble of tree predictors in which each tree depends on a random vector, and all the vectors in the forest are independent and identically distributed. This article briefly introduces the principle of random forests and discusses their recent applications in pose recognition and face recognition.

1. Overview of human-body recognition

Human-body recognition is one of the major hot topics in computer vision, covering human detection and tracking, gesture recognition, action recognition, face recognition, gender recognition, and behavior and event recognition. Its methods draw on nearly the whole theory and technique of pattern recognition: statistical theory, transform theory, contextual dependence, classification and clustering, machine learning, template matching, filtering, and so on. It has very broad application value.

Most face recognition and facial expression analysis algorithms require a geometric normalization of the face, based on the positions of facial landmarks (such as eye corners and mouth corners), before extracting face features. Even when the rough position of the face is known, precise landmark localization remains a difficult problem, mainly because of external interference and deformation of the face itself. Popular algorithms include heuristic-rule-based methods, principal component analysis (PCA), independent component analysis (ICA), methods based on the K-L transform, and elastic graph matching.

2. Random forests

As the name suggests, a random forest builds a forest in a random way. The forest consists of many decision trees, and the individual trees are uncorrelated with one another. Once the forest has been built, when a new input sample arrives, every decision tree judges independently which class the sample should belong to (for classification), and the class that receives the most votes becomes the prediction for that sample.

A random forest is a statistical learning method whose randomness has two sources. First, each training round draws a fixed number of samples with replacement from the original sample set, forming k mutually distinct sample sets. Second, each decision tree is built from a subset of attributes drawn at random from the full attribute set to serve as its candidate split attributes, so the k tree classifiers all differ. The k randomly generated decision trees together make up the random forest. Within each tree, the split attribute is repeatedly chosen as the attribute with the greatest information gain. Once the whole forest has been built, the final classification is decided by a voting mechanism that selects the most likely result.

Figure 1: The random forest construction process.

3. Random forests in human-body recognition

3.1 Pose recognition

Taking [1] as an example, the recognition process in the paper has two main steps. First, body-part labeling: the human body is segmented from a single depth image and key nodes are marked. Then, joint localization: the labeled body parts are mapped back into three-dimensional space to obtain highly reliable spatial positions for the key joints.

Figure 2: Depth image → body-part labeling → joint projection.

The paper's main contribution is to turn pose recognition into an object recognition problem: the pose is recovered by determining the spatial positions of the different body parts, achieving both low computational cost and high accuracy. Body-part labeling is cast as a per-pixel classification problem: for each pixel, a local depth-difference feature is computed from the depth image. This feature is a good combination of point features and gradient features.

As an example of how the same feature discriminates between locations: in panel (a) of the figure below, both probe offsets around the measurement point show a large depth difference, while in panel (b) the depth differences are clearly small. Feature values thus differ markedly at different pixel locations, and this is the basis for classification.

Figure 3: Examples of depth-image features.

On the choice of split attributes for the decision trees: because the pixel pairs and image features can be chosen almost arbitrarily, a huge number of candidate splits arises. For all sampled pixels, the information gain before and after each candidate split is compared, and the pair ψ = (θ, τ) with the greatest gain is taken as the current split node. (The information gain is related to whether an image patch is ultimately classified correctly, i.e., the probability that the patch belongs to the correct key-feature-point region.)

Figure 4: Classification in the decision trees.

After the decision trees have been built, the probability that a given leaf node belongs to a particular key-feature-point region can be obtained from the final classification statistics of the training images; this is the most important basis on which the random forest later detects feature points. For joint classification, the trained decision forest judges the joint attribute of every pixel and colors it accordingly. Because it rests on statistics over a large number of samples, the random forest is robust to the effects of lighting changes and deformation and can localize the key feature points in real time.

Figure 5: Pose-recognition results on depth images.

It should be said that, at the algorithmic level, this paper contributes little to random forests themselves, and the split functions are simple in form. What is laudable about this team is that they used computer graphics to synthesize a large quantity of human-body images of different shapes and poses as training data, which is an important reason the paper won the CVPR 2011 Best Paper award. Because its results were deployed in Kinect, the work had a huge industrial impact, reaching a commercial hardware platform and fueling the surge of interest in random forests in computer vision and multimedia processing.

3.2 Random forests for face recognition
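The construction procedure described in section 2 (bootstrap sampling, random attribute subsets, information-gain splits, majority voting) can be sketched in a few dozen lines of Python. This is a minimal illustrative implementation on invented toy data, not production code and not tied to any of the cited papers:

```python
# Minimal random-forest sketch illustrating the two sources of randomness:
# (1) a bootstrap sample of the training set for each tree,
# (2) a random subset of attributes considered at each split.
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(rows, labels, feat_subset):
    """Pick the (feature, threshold) with maximal information gain."""
    base = entropy(labels)
    best = (None, None, 0.0)
    for f in feat_subset:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            gain = base - (len(left) * entropy(left)
                           + len(right) * entropy(right)) / len(labels)
            if gain > best[2]:
                best = (f, t, gain)
    return best

def build_tree(rows, labels, n_feats, depth=0, max_depth=3):
    if depth == max_depth or len(set(labels)) == 1:
        return Counter(labels).most_common(1)[0][0]        # leaf: majority label
    feats = random.sample(range(len(rows[0])), n_feats)    # random attribute subset
    f, t, _ = best_split(rows, labels, feats)
    if f is None:
        return Counter(labels).most_common(1)[0][0]
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in li], [labels[i] for i in li],
                       n_feats, depth + 1, max_depth),
            build_tree([rows[i] for i in ri], [labels[i] for i in ri],
                       n_feats, depth + 1, max_depth))

def tree_predict(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def forest_fit(rows, labels, n_trees=25, n_feats=1):
    forest = []
    for _ in range(n_trees):
        boot = [random.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(build_tree([rows[i] for i in boot],
                                 [labels[i] for i in boot], n_feats))
    return forest

def forest_predict(forest, row):
    votes = Counter(tree_predict(t, row) for t in forest)   # majority vote
    return votes.most_common(1)[0][0]

# Toy 2-D data: class 0 clustered near (0, 0), class 1 near (5, 5).
random.seed(0)
X = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(30)] + \
    [(random.gauss(5, 1), random.gauss(5, 1)) for _ in range(30)]
y = [0] * 30 + [1] * 30
model = forest_fit(X, y)
print(forest_predict(model, (0, 0)), forest_predict(model, (5, 5)))
```

On this well-separated toy data the forest votes class 0 near the first cluster and class 1 near the second; the per-split `random.sample` over attributes is what makes the k trees differ even on identical data.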
Facial feature detection based on regression forests localizes the key feature points of a face by analyzing face image patches; building on this, the conditional regression forest method takes a global property of the face into account. Consider [2], a CVPR 2012 paper, which uses head orientation as that global property. It describes how to use a conditional regression forest to determine the positions of 10 facial feature points. The difference from earlier work is that a conditional constraint on head orientation is added on top of the random forest.

Figure 6: The 10 facial feature points.

Facial feature labeling is again turned into a classification problem over a large number of image patches, analogous to the local depth-difference features used in body recognition. For each patch, image features such as gray values, illumination-compensated values, and phase transforms, together with the distances from the patch center to each feature point, determine the patch's positional characteristics. The split attributes of the decision trees are again chosen by the principle of maximal information gain.

Figure 7: The conditional regression forest.

The paper's further contribution is a classification method based on conditional regression forests: the decision trees are partitioned by a head-orientation constraint, so that at detection time only the trees associated with the estimated head orientation perform the regression, which improves accuracy and reduces cost. The paper also explains how the trees are partitioned by head orientation, but since that is not central to the random forest algorithm itself, it is not discussed further here.

Another paper, [3], proposes a facial feature labeling method with higher accuracy and lower cost: labeling with structured-output random forests. The face is divided into 20 feature points, and each point carries not only its own independent patch classification label but also a judgment of its dependence on other points; for example, point 4 depends on the other lip feature points 3, 18, and 19. This greatly increases the accuracy of feature-point labeling.

The method still uses random forests; what differs is the introduction of relations to dependent points, as given in the paper's equations. The decision trees are still built by the information-gain principle, but a leaf node now yields both the independent division of a feature point and that point's contribution to the points that depend on it, and the final judgment for a feature point combines the original votes with the spatial constraints.

Figure 8: Facial feature labeling. Figure 9: Dependencies between decision trees.

For example, when the feature points of a face are classified with a plain random forest, each point is labeled independently; the red points in the figure mark the detected nose features. If the dependent nodes are used in the judgment instead, the nose points are constrained to lie near the other nose feature points, and after superposition a better result is obtained. Clearly, for such points, the structured-output approach is more accurate.

Figure 10: Structured-output results.

4. Summary of random forests

A large body of theoretical and empirical work has shown that random forests (RF) achieve high predictive accuracy, tolerate outliers and noise well, and are not prone to overfitting. RF can be regarded as a natural nonlinear modeling tool, and it is currently one of the hottest frontier research areas in data mining. Specifically, it has the following advantages:

1. by combining many classifiers, it can produce a highly accurate classifier;
2. it can handle a large number of input variables;
3. it can assess the importance of variables when deciding the class;
4. while the forest is being built, it produces an internal, unbiased estimate of the generalization error;
5. it includes a good method for estimating missing data and can maintain accuracy even when a large fraction of the data is missing;
6. it provides an experimental means of detecting interactions between variables;
7. the learning process is fast;
8. it tolerates outliers and noise well and is not prone to overfitting.

Random forests also have drawbacks:

1. for data whose categorical attributes have different numbers of levels, attributes with more levels exert a larger influence on the forest, so the attribute weights RF produces on such data are not trustworthy;
2. a single decision tree predicts poorly, because its attributes are chosen at random.

References:
[1] J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, A. Blake, "Real-time human pose recognition in parts from single depth images," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1297–1304, June 2011.
[2] M. Dantone, J. Gall, G. Fanelli, et al., "Real-time facial feature detection using conditional regression forests," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2578–2585, 2012.
[3] H. Yang, I. Patras, "Face parts localization using structured-output regression forests," Asian Conference on Computer Vision (ACCV), Daejeon, Korea, 2012.
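To make the conditional-forest idea of [2] concrete, here is a toy Python sketch. The coarse quantization of head yaw into three bins, the stub "trees", and their offset votes are all invented for illustration and are not taken from the paper; real conditional regression forests train full trees per pose bin:

```python
# Hedged sketch of tree selection in a conditional regression forest:
# trees are grouped by head-pose bin, and at test time only the trees
# matching the estimated pose vote on a feature-point offset.

def pose_bin(yaw_degrees):
    """Quantize estimated head yaw into one of three coarse bins (illustrative)."""
    if yaw_degrees < -20:
        return "left"
    if yaw_degrees > 20:
        return "right"
    return "frontal"

# Per-bin forests: each "tree" maps an image patch to a predicted (dx, dy)
# offset from the patch centre to a facial feature point. Plain functions
# stand in for real regression trees here.
forests = {
    "frontal": [lambda patch: (2.0, -1.0), lambda patch: (3.0, -0.5)],
    "left":    [lambda patch: (4.0,  0.0)],
    "right":   [lambda patch: (-4.0, 0.0)],
}

def conditional_predict(patch, estimated_yaw):
    """Average the offset votes of only the trees in the matching pose bin."""
    trees = forests[pose_bin(estimated_yaw)]
    votes = [t(patch) for t in trees]
    n = len(votes)
    return (sum(v[0] for v in votes) / n, sum(v[1] for v in votes) / n)

print(conditional_predict(patch=None, estimated_yaw=5))  # → (2.5, -0.75)
```

The design point this illustrates is the one the answer highlights: conditioning on a global property (head orientation) means each query consults only the trees trained for a compatible pose, improving accuracy while reducing the number of trees evaluated.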


江河装饰

Let me recommend the nine most important computer vision papers of recent years, as ranked by 学术范's standard evaluation system (readers who find the English difficult can use the site's translation feature):

1. Deep Residual Learning for Image Recognition

Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers, 8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to the ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

2. Very Deep Convolutional Networks for Large-Scale Image Recognition

Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

3. U-Net: Convolutional Networks for Biomedical Image Segmentation

Abstract: There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net.

4. Microsoft COCO: Common Objects in Context

Abstract: We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. This is achieved by gathering images of complex everyday scenes containing common objects in their natural context. Objects are labeled using per-instance segmentations to aid in precise object localization. Our dataset contains photos of 91 object types that would be easily recognizable by a 4-year-old. With a total of 2.5 million labeled instances in 328k images, the creation of our dataset drew upon extensive crowd worker involvement via novel user interfaces for category detection, instance spotting and instance segmentation. We present a detailed statistical analysis of the dataset in comparison to PASCAL, ImageNet, and SUN. Finally, we provide baseline performance analysis for bounding box and segmentation detection results using a Deformable Parts Model.

5. Rethinking the Inception Architecture for Computer Vision

Abstract: Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks. Since 2014 very deep convolutional networks started to become mainstream, yielding substantial gains in various benchmarks. Although increased model size and computational cost tend to translate to immediate quality gains for most tasks (as long as enough labeled data is provided for training), computational efficiency and low parameter count are still enabling factors for various use cases such as mobile vision and big-data scenarios. Here we are exploring ways to scale up networks in ways that aim at utilizing the added computation as efficiently as possible by suitably factorized convolutions and aggressive regularization. We benchmark our methods on the ILSVRC 2012 classification challenge validation set and demonstrate substantial gains over the state of the art: 21.2% top-1 and 5.6% top-5 error for single frame evaluation using a network with a computational cost of 5 billion multiply-adds per inference and with using less than 25 million parameters. With an ensemble of 4 models and multi-crop evaluation, we report 3.5% top-5 error and 17.3% top-1 error on the validation set and 3.6% top-5 error on the official test set.

6. Mask R-CNN

Abstract: We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without tricks, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition. Code will be made available.

7. Feature Pyramid Networks for Object Detection

Abstract: Feature pyramids are a basic component in recognition systems for detecting objects at different scales. But pyramid representations have been avoided in recent object detectors that are based on deep convolutional networks, partially because they are slow to compute and memory intensive. In this paper, we exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost. A top-down architecture with lateral connections is developed for building high-level semantic feature maps at all scales. This architecture, called a Feature Pyramid Network (FPN), shows significant improvement as a generic feature extractor in several applications. Using a basic Faster R-CNN system, our method achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles, surpassing all existing single-model entries including those from the COCO 2016 challenge winners. In addition, our method can run at 5 FPS on a GPU and thus is a practical and accurate solution to multi-scale object detection. Code will be made publicly available.

8. ORB: An efficient alternative to SIFT or SURF

Abstract: Feature matching is at the base of many computer vision problems, such as object recognition or structure from motion. Current methods rely on costly descriptors for detection and matching. In this paper, we propose a very fast binary descriptor based on BRIEF, called ORB, which is rotation invariant and resistant to noise. We demonstrate through experiments how ORB is two orders of magnitude faster than SIFT, while performing as well in many situations. The efficiency is tested on several real-world applications, including object detection and patch-tracking on a smart phone.

9. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

Abstract: In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or 'atrous convolution', as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-view, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but has a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed "DeepLab" system sets the new state-of-the-art at the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7 percent mIOU in the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online.

Full texts for all nine papers are available on 学术范 (xueshufan.com). Hope this helps!


汤汤小朋友

There are many classic papers in the CV field, and CV covers a very broad range; in other words, your question is very broad. You can look for classic papers in your own research direction. For example, Canny's 1986 paper on edge detection is a real classic, and so is Lowe's 2004 paper on feature-point matching. These papers became classics because their authors did deep and thorough work on one specific problem, and you can draw a great deal of inspiration from them. A simple way to identify a classic paper is to look at its citation count: generally, the more citations a paper has, the more the researchers in that field recognize the author's work.

