Notes on Computer Vision


Introduction

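The paper describes the model architecture and optimization ideas the authors used to win the ILSVRC 2015 scene classification challenge. The main points are:

  1. Relay BP;
  2. VGG+Inception+SPP;
  3. Class-aware Sampling.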

Relay BP

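Vanishing or exploding gradients, a common problem in CNN training, can already be addressed with Batch Normalization (BN) or an Auxiliary Loss (AL, see the DSN paper).

Although deeper networks have greater representational power, performance does not necessarily improve as layers are added, and can even degrade. As the table below shows, when the authors trained VGG-style models on the Places2 dataset, the top-5 error rose slowly as the depth increased from 19 to 25.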

Depth       19       22       25
Top5 Err.   18.93%   19.00%   19.21%

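Since this problem is not caused by gradients, and the authors observed no overfitting in their experiments, how can it be explained? Approaching the question from a data-processing perspective, the authors argue that during BP the error is propagated backward through too many layers, which causes information loss. They therefore propose Relay BP, which limits the error in BP to propagating backward through only a certain number of layers.

Relay BP works somewhat like DSN in that it introduces additional ALs for BP. The difference is that in DSN the loss of every auxiliary classifier (AC) is propagated layer by layer all the way to the lowest layer, whereas in Relay BP each AC is responsible for propagating backward through only a limited number of layers (at most N) before stopping. Another AC then takes over the relay and propagates the error further toward the lower layers, and so on. Two adjacent ACs overlap (for example, by n layers). Both N and n have to be tuned empirically.

For layers inside an overlap, the gradient is computed as a weighted average; for layers outside an overlap, the gradient computed by BP is used directly. The whole network can then be trained with SGD.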
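
As a rough illustration of the relay idea described above, the following sketch works out which losses reach which layer and how overlapping gradients might be merged. The function names, the layer indexing, and the uniform average are assumptions made for this example; the post only says that overlapping layers use a weighted average and does not give the weights.

```python
def relay_segments(num_layers, loss_tops, max_depth):
    """Map each layer index to the losses whose error reaches it.

    loss_tops[k] is the index of the topmost layer feeding loss k.
    Each loss propagates backward through at most max_depth layers (N).
    """
    plan = []
    for layer in range(num_layers):
        reaching = [k for k, top in enumerate(loss_tops)
                    if top - max_depth < layer <= top]
        plan.append(reaching)
    return plan


def combine_gradients(grads_by_loss, reaching):
    """Average the gradients of the losses that reach one layer.

    A uniform average is an assumption; any weighting could be used.
    """
    if not reaching:
        return 0.0
    return sum(grads_by_loss[k] for k in reaching) / len(reaching)


# Hypothetical 10-layer network with two losses, N = 6, overlap n = 2.
plan = relay_segments(num_layers=10, loss_tops=[5, 9], max_depth=6)
for layer, reaching in enumerate(plan):
    print(layer, reaching)
```

In this toy setup the lower loss covers layers 0 to 5 and the upper loss covers layers 4 to 9, so only layers 4 and 5 receive an averaged gradient.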

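Network Architecture

The authors used two network architectures, A and B.

Model A is a deeper version of VGG19: one extra conv layer is added after each of conv3, conv4 and conv5, giving a VGG22 model.

Model B uses the Inception structure. The first layer is a 7x7:2 conv followed by a 2x2:2 pool. Three groups of Inception follow, each containing 4 Inception modules and each followed by a 2x2:2 pool.

Both models use SPP with levels [7,3,2,1] for the last pool, followed by two 4096-unit FC layers and a 401-class Softmax. In addition, an extra loss is attached after the conv+pool at 28x28 resolution (conv4 in model A, the second Inception group in model B) and used for BP. It is introduced only once the learning rate has dropped to 0.001.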
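
To make the size of the SPP output concrete, here is a small arithmetic sketch. It assumes the usual SPP convention that a level n pools each channel into an n x n grid; the 512-channel figure is an assumption borrowed from standard VGG conv5 and is not stated in the post.

```python
def spp_bins(levels):
    """Number of pooled bins per channel for the given pyramid levels."""
    return sum(n * n for n in levels)


def spp_output_length(levels, channels):
    """Length of the flattened SPP vector fed to the first FC layer."""
    return spp_bins(levels) * channels


levels = [7, 3, 2, 1]
print(spp_bins(levels))                # bins per channel
print(spp_output_length(levels, 512))  # 512 channels is an assumption
```

With these levels each channel contributes 49 + 9 + 4 + 1 = 63 values, regardless of the spatial size of the feature map entering the pooling layer.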

Class-aware Sampling

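The Places2 dataset has 401 classes and 8.09 million training samples, with between 4,000 and 30,000 samples per class.

The scale of the dataset and its class imbalance make modeling challenging, so the authors adopt class-aware uniform sampling. It uses two kinds of lists (a class list and per-class image lists) and proceeds in two steps: first pick a class at random from the class list, then pick an image at random from that class's image list. Once every image in an image list has been selected, that list is shuffled. Once the class list has been traversed, it is shuffled as well.

This class-aware uniform sampling improves performance on the validation set by about 0.6%.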
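
The two-list procedure described above can be sketched roughly as follows. The class name `ClassAwareSampler`, its constructor arguments, and the choice to walk through pre-shuffled lists are assumptions for illustration, not the authors' implementation; every class is assumed to have at least one image.

```python
import random


class ClassAwareSampler:
    """Draw (class, image) pairs so that every class is visited equally often."""

    def __init__(self, images_by_class, seed=0):
        self.rng = random.Random(seed)
        # One image list per class, plus one list of all classes.
        self.image_lists = {c: list(imgs) for c, imgs in images_by_class.items()}
        self.class_list = list(self.image_lists)
        self.rng.shuffle(self.class_list)
        for imgs in self.image_lists.values():
            self.rng.shuffle(imgs)
        self.class_pos = 0
        self.image_pos = {c: 0 for c in self.class_list}

    def sample(self):
        # Step 1: take the next class; reshuffle once the class list is used up.
        if self.class_pos == len(self.class_list):
            self.rng.shuffle(self.class_list)
            self.class_pos = 0
        c = self.class_list[self.class_pos]
        self.class_pos += 1
        # Step 2: take the next image of that class; reshuffle when exhausted.
        imgs = self.image_lists[c]
        if self.image_pos[c] == len(imgs):
            self.rng.shuffle(imgs)
            self.image_pos[c] = 0
        img = imgs[self.image_pos[c]]
        self.image_pos[c] += 1
        return c, img


sampler = ClassAwareSampler({"kitchen": ["k1", "k2", "k3", "k4"], "beach": ["b1"]})
batch = [sampler.sample() for _ in range(6)]
print(batch)
```

Even though "kitchen" has four times as many images as "beach" in this toy input, both classes appear three times in the six draws, which is the balancing effect the sampling scheme is after.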

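Results

  • Adding one extra loss gives a tiny improvement in top-5 performance (0.00%-0.18%)
  • Relay BP gives a fairly large improvement in top-5 performance over plain BP (0.53%-1.17%)
  • The top-5 gain of single-model testing over center-crop testing is less pronounced than on ImageNet (only about 1.5%-1.8%)
  • Ensembling models A and B also gives only a modest gain, about 0.4%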

method               testing   A top1   A top5   B top1   B top5
loss1+BP             center    50.91    19.00    50.62    18.69
loss1&2+BP           center    50.72    18.84    50.59    18.68
loss1&2+Relay BP     center    49.75    17.83    49.77    17.86
loss1+BP             single    48.67    17.19    48.29    16.89
loss1&2+BP           single    48.55    17.05    48.27    16.89
loss1&2+Relay BP     single    47.86    16.35    47.72    16.36

Introduction

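The perceptron learns a linear function of the input.

Each weight vector corresponds to one hypothesis.

A prediction is correct if the sign of the linear function's output agrees with the true label.

The goal of learning is to find a good weight vector, one that makes few mispredictions.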

How to solve

Stochastic gradient descent on the loss function yields the perceptron update rule.
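
For reference, here is a minimal sketch of the textbook perceptron trained this way. It assumes labels in {-1, +1}, the standard perceptron loss max(0, -y * w.x), and a constant 1.0 feature appended to each input to act as the bias; these choices and the function names are assumptions, since the post does not spell out its formulas.

```python
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_perceptron(samples, lr=1.0, epochs=20, seed=0):
    """samples: list of (features, label) pairs with label in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0][0])
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the samples in random order each epoch
        for x, y in data:
            # The loss max(0, -y * w.x) has a nonzero subgradient only on a
            # mistake, so the SGD step is w <- w + lr * y * x in that case.
            if y * dot(w, x) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w


# Toy, linearly separable data; the last feature is the constant bias input.
toy = [((1.0, 2.0, 1.0), 1), ((2.0, 3.0, 1.0), 1),
       ((-1.0, -2.0, 1.0), -1), ((-2.0, -1.0, 1.0), -1)]
w = train_perceptron(toy)
print(w, [y * dot(w, x) > 0 for x, y in toy])
```

On separable data such as this toy set, the loop stops changing the weights once every sample is classified correctly.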


Caffe

Layer Name  Input Size    Filter Num  Filter Size  Stride(:Pad)  Output Size   RF Formula       RF Size
conv1       227x227x3     96          11x11        4             55x55x96      (x - 7) / 4      11
pool1       55x55x96      96          3x3          2             27x27x96      (x - 11) / 8     19
conv2       27x27x96      256         5x5          1:2           27x27x256     (x - 43) / 8     51
pool2       27x27x256     256         3x3          2             13x13x256     (x - 51) / 16    67
conv3       13x13x256     384         3x3          1:1           13x13x384     (x - 83) / 16    99
conv4       13x13x384     384         3x3          1:1           13x13x384     (x - 115) / 16   131
conv5       13x13x384     256         3x3          1:1           13x13x256     (x - 147) / 16   163
pool5       13x13x256     256         3x3          2             6x6x256       (x - 163) / 32   195
fc6         6x6x256       4096        6x6          1             1x1x4096      (x - 323) / 32   355
fc7         1x1x4096      4096        1x1          1             1x1x4096      (x - 323) / 32   355
fc8         1x1x4096      1000        1x1          1             1x1x1000      (x - 323) / 32   355

Zeiler-Fergus

Layer Name  Input Size    Filter Num  Filter Size  Stride(:Pad)  Output Size   RF Formula       RF Size
conv1       224x224x3     96          7x7          2:1           110x110x96    (x - 5) / 2      7
pool1       110x110x96    96          3x3          2:1           55x55x96      (x - 7) / 4      11
conv2       55x55x96      256         5x5          2             26x26x256     (x - 19) / 8     27
pool2       26x26x256     256         3x3          2:1           13x13x256     (x - 27) / 16    43
conv3       13x13x256     384         3x3          1:1           13x13x384     (x - 59) / 16    75
conv4       13x13x384     384         3x3          1:1           13x13x384     (x - 91) / 16    107
conv5       13x13x384     256         3x3          1:1           13x13x256     (x - 123) / 16   139
pool5       13x13x256     256         3x3          2             6x6x256       (x - 139) / 32   171
fc6         6x6x256       4096        6x6          1             1x1x4096      (x - 299) / 32   331
fc7         1x1x4096      4096        1x1          1             1x1x4096      (x - 299) / 32   331
fc8         1x1x4096      1000        1x1          1             1x1x1000      (x - 299) / 32   331

VGG-16

Layer Name  Input Size    Filter Num  Filter Size  Stride(:Pad)  Output Size   RF Formula       RF Size
conv1-1     224x224x3     64          3x3          1:1           224x224x64    (x - 2) / 1      3
conv1-2     224x224x64    64          3x3          1:1           224x224x64    (x - 4) / 1      5
pool1       224x224x64    64          2x2          2             112x112x64    (x - 4) / 2      6
conv2-1     112x112x64    128         3x3          1:1           112x112x128   (x - 8) / 2      10
conv2-2     112x112x128   128         3x3          1:1           112x112x128   (x - 12) / 2     14
pool2       112x112x128   128         2x2          2             56x56x128     (x - 12) / 4     16
conv3-1     56x56x128     256         3x3          1:1           56x56x256     (x - 20) / 4     24
conv3-2     56x56x256     256         3x3          1:1           56x56x256     (x - 28) / 4     32
conv3-3     56x56x256     256         3x3          1:1           56x56x256     (x - 36) / 4     40
pool3       56x56x256     256         2x2          2             28x28x256     (x - 36) / 8     44
conv4-1     28x28x256     512         3x3          1:1           28x28x512     (x - 52) / 8     60
conv4-2     28x28x512     512         3x3          1:1           28x28x512     (x - 68) / 8     76
conv4-3     28x28x512     512         3x3          1:1           28x28x512     (x - 84) / 8     92
pool4       28x28x512     512         2x2          2             14x14x512     (x - 84) / 16    100
conv5-1     14x14x512     512         3x3          1:1           14x14x512     (x - 116) / 16   132
conv5-2     14x14x512     512         3x3          1:1           14x14x512     (x - 148) / 16   164
conv5-3     14x14x512     512         3x3          1:1           14x14x512     (x - 180) / 16   196
pool5       14x14x512     512         2x2          2             7x7x512       (x - 180) / 32   212
fc6         7x7x512       4096        7x7          1             1x1x4096      (x - 372) / 32   404
fc7         1x1x4096      4096        1x1          1             1x1x4096      (x - 372) / 32   404
fc8         1x1x4096      1000        1x1          1             1x1x1000      (x - 372) / 32   404
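
The RF Formula and RF Size columns can be reproduced with a short recurrence: a layer with kernel size k and stride s grows the receptive field by (k - 1) times the cumulative stride of the layers before it, and multiplies the cumulative stride by s. The sketch below applies this to the VGG-16 layers listed above. Like the tables, it ignores padding and treats fc6 as a 7x7 convolution; the function names are made up for this example.

```python
def receptive_fields(layers):
    """layers: list of (name, kernel, stride).

    Returns a list of (name, offset, cumulative_stride, rf_size), where the
    output size is (x - offset) / cumulative_stride for an input of size x.
    """
    rf, jump = 1, 1
    rows = []
    for name, kernel, stride in layers:
        rf += (kernel - 1) * jump   # grow by (k - 1) input-space steps
        jump *= stride              # distance between adjacent outputs
        rows.append((name, rf - jump, jump, rf))
    return rows


def vgg16_layers():
    layers = []
    for block, n_convs in enumerate([2, 2, 3, 3, 3], start=1):
        for i in range(1, n_convs + 1):
            layers.append((f"conv{block}-{i}", 3, 1))
        layers.append((f"pool{block}", 2, 2))
    layers.append(("fc6", 7, 1))    # fc6 viewed as a 7x7 convolution
    return layers


for name, offset, jump, rf in receptive_fields(vgg16_layers()):
    print(f"{name:8s} (x - {offset}) / {jump}   RF = {rf}")
```

The same function works for the other two networks by passing their (name, kernel, stride) triples.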
to be continued.

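Install Ruby Installer and DevKit

Download the 64-bit version of RubyInstaller 2.1.5 from the link below and install it to D:\Ruby:

rubyinstaller-2.1.5-x64.exe

Download the 64-bit DevKit, DevKit-mingw64-64-4.7.2-20130224-1432-sfx.exe, from the same page (or from CSDN) and install it to D:\Ruby\Devkit:

DevKit-mingw64-64-4.7.2-20130224-1432-sfx.exe

Install and configure: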

cd D:\Ruby\DevKit
ruby dk.rb init
ruby dk.rb install

Install Jekyll

Change the gem source: remove the original source first, then add the taobao source (http://ruby.taobao.org, referenced here directly by IP):

gem sources -r http://rubygems.org/
gem sources -a http://223.6.253.37/
gem install jekyll
gem install rdiscount
gem install redcarpet

Run Jekyll

jekyll new Blog
cd Blog
jekyll serve

Open http://localhost:4000/ and you will see the site.

Jekyll Bootstrap is a theme for Jekyll that uses the Twitter Bootstrap CSS framework. The advantage of Jekyll is that it is self-hosted (or hosted on GitHub), and that you can write posts in Markdown in a text editor and simply git push them to a server, which produces the HTML files. The advantage of this theme is that you can start blogging almost right away without having to worry about making a theme.

Setting up

To start your own blog, simply git clone the repository on GitHub. You could also press the “fork” button on GitHub.

git clone git://github.com/nhoss2/jekyll-bootstrap.git

If you want to have your blog on github, make sure you change to the gh-pages branch.

git checkout gh-pages

Then you will need to edit the _config.yml file at the root of the repository.

To add your own posts, add a file named year-month-day-title.md to the _posts directory. Note: the file does not have to be Markdown.

To publish the post, just git push it to your own GitHub repo and you're set!

Things to change on _config.yml

There is a config file at the root called _config.yml. By default it looks like:

permalink: /:year/:title/
paginate: 10
exclude:
name: Jekyll Bootstrap
baseurl: /jekyll-bootstrap/

You will need to change the name and baseurl fields; the others are optional. The baseurl field is used for the CSS files and pagination. If you are hosting the blog on GitHub, change it to your repository name, unless the repository has the same name as your GitHub user name, in which case leave baseurl empty.

For more information on Jekyll, visit their wiki on github.

For more information on github pages: http://pages.github.com.