Notes on Computer Vision

Introduction

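The paper describes the model architectures and optimization ideas the authors used to win the ILSVRC 2015 scene classification competition. The main points are:

Relay BP;
VGG+Inception+SPP;
Class-aware Sampling.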

Relay BP

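Vanishing or exploding gradients, a common problem in CNN training, can already be handled with Batch Normalization (BN) or an auxiliary loss (AL, see the DSN paper).

Although deeper networks have greater representational power, performance does not necessarily improve as layers are added, and it may even degrade. As the table below shows, when the authors trained VGG-style models on the Places2 dataset, the top-5 error rose slowly as the depth increased from 19 to 25.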

Depth	19	22	25
Top5 Err.	18.93%	19.00%	19.21%

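Since this problem is not caused by gradients, and the authors observed no overfitting in their experiments, how can it be explained? Looking at it from a data-processing perspective, the authors argue that during BP the error is propagated back through too many layers, which causes a loss of information. They therefore propose Relay BP, which limits the error in BP to propagating back through only a certain number of layers.

Relay BP works somewhat like DSN in that it introduces extra auxiliary losses for BP. The difference is that in DSN the loss of every auxiliary classifier (AC) is propagated back layer by layer all the way to the bottom layer, whereas in Relay BP each AC propagates back through at most N layers and then stops. Another AC then takes over and relays the error further toward the bottom layers, and so on. Two adjacent ACs overlap (for example, by n layers). Both N and n have to be tuned empirically.

For layers inside an overlap, the gradient is computed as a weighted average; for layers outside any overlap, the gradient computed by BP is used directly. In this way the whole network can be trained with SGD.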
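The bookkeeping behind this scheme can be illustrated with a small Python sketch. It is only an illustration of the description above, not the authors' implementation: the function names, the layer indices (6 and 12), the value N = 8 and the equal weights are all assumptions chosen for the example, and no real backpropagation is performed.

```python
def relay_bp_assignment(loss_layers, max_back):
    """For each layer (1 = bottom), list the losses whose error reaches it.

    loss_layers: layer index each loss is attached to (hypothetical values).
    max_back:    N, the maximum number of layers a loss is propagated through.
    """
    top = max(loss_layers)
    reach = {layer: [] for layer in range(1, top + 1)}
    for loss_id, attach in enumerate(loss_layers):
        lowest = max(1, attach - max_back + 1)
        for layer in range(lowest, attach + 1):
            reach[layer].append(loss_id)
    return reach


def combine_gradients(grads, weights):
    """Weighted average of the gradients arriving at one overlapping layer."""
    return sum(w * g for w, g in zip(weights, grads)) / sum(weights)


# Two losses attached at layers 6 and 12, each propagated through at most 8 layers.
reach = relay_bp_assignment(loss_layers=[6, 12], max_back=8)
overlap = [layer for layer, losses in reach.items() if len(losses) > 1]
print(overlap)                                     # layers that receive both errors
print(combine_gradients([1.0, 3.0], [1.0, 1.0]))   # averaged gradient for such a layer
```

With these assumed values the lower loss covers layers 1 to 6 and the upper loss covers layers 5 to 12, so the overlap is n = 2 layers.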

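Network Architecture

The authors use two network architectures, model A and model B.

Model A is a deeper version of VGG19: one extra conv layer is added after each of conv3, conv4 and conv5, which gives a VGG22 model.

Model B uses the Inception structure. The first stage is a 7x7:2 conv followed by a 2x2:2 pool, followed by three groups of Inception modules. Each group contains 4 Inception modules and is followed by a 2x2:2 pool.

What the two models have in common is that the last pool uses SPP with levels [7,3,2,1], followed by two 4096-dimensional FC layers and one 401-class Softmax. In addition, an extra loss is attached after a conv+pool at the 28x28 resolution (conv4 in model A, the second Inception group in model B) and used for BP; it is introduced only once lr has dropped to 0.001.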
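As a quick check on what the [7,3,2,1] SPP feeds into the first FC layer, the following Python sketch counts the pooled bins. The function name is made up for illustration, and the channel count of 512 is an assumption (the usual width of the last VGG conv block), not a figure stated here.

```python
def spp_output_length(channels, levels=(7, 3, 2, 1)):
    """Length of the SPP feature vector: one value per bin per channel."""
    bins = sum(n * n for n in levels)  # 49 + 9 + 4 + 1 = 63 bins
    return channels * bins


print(spp_output_length(1))    # 63 bins per channel
print(spp_output_length(512))  # 32256, assuming 512 channels in the last conv layer
```

Because the number of bins is fixed by the pyramid levels, the FC input length does not depend on the spatial size of the last feature map.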

Class-aware Sampling

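The Places2 dataset has 401 classes and about 8.09 million training images, with between 4,000 and 30,000 images per class.

Such a large-scale dataset with imbalanced classes makes modeling challenging, so the authors use class-based uniform sampling. It keeps two kinds of lists (a class list and an image list per class) and works in two steps: first pick a class at random from the class list, then pick an image at random from that class's image list. Once every image in an image list has been picked, that list is shuffled. Once every class in the class list has been visited, the class list is shuffled as well.

This class-based uniform sampling improves performance on the validation set by about 0.6%.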
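The two-step sampling described above can be sketched in Python as follows. This is a minimal sketch under assumptions: the class name `ClassAwareSampler`, the `seed` parameter and the toy data are invented for illustration, and "random selection" is implemented as walking through a shuffled list and reshuffling once it is exhausted.

```python
import random


class ClassAwareSampler:
    """Draws (class, image) pairs so that every class is picked equally often."""

    def __init__(self, images_by_class, seed=0):
        self.rng = random.Random(seed)
        self.class_list = list(images_by_class)
        self.image_lists = {c: list(v) for c, v in images_by_class.items()}
        self.class_pos = 0
        self.image_pos = {c: 0 for c in self.class_list}
        self.rng.shuffle(self.class_list)
        for images in self.image_lists.values():
            self.rng.shuffle(images)

    def sample(self):
        # Step 1: next class; reshuffle after a full pass over the class list.
        if self.class_pos == len(self.class_list):
            self.rng.shuffle(self.class_list)
            self.class_pos = 0
        cls = self.class_list[self.class_pos]
        self.class_pos += 1
        # Step 2: next image of that class; reshuffle after a full pass.
        images = self.image_lists[cls]
        if self.image_pos[cls] == len(images):
            self.rng.shuffle(images)
            self.image_pos[cls] = 0
        image = images[self.image_pos[cls]]
        self.image_pos[cls] += 1
        return cls, image


# Toy imbalanced data: class "a" has 100 images, class "b" only 2.
data = {"a": [f"a{i}.jpg" for i in range(100)], "b": ["b0.jpg", "b1.jpg"]}
sampler = ClassAwareSampler(data)
draws = [sampler.sample() for _ in range(1000)]
counts = {c: sum(1 for cls, _ in draws if cls == c) for c in data}
print(counts)  # both classes are drawn equally often despite the imbalance
```

The rare class is simply revisited more often per image, which is how the imbalance is evened out.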

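Results

Adding an extra loss gives a slight improvement in top-5 performance (0.00%-0.18%)
Relay BP gives a considerably larger top-5 improvement than plain BP (0.53%-1.17%)
The top-5 gain of single model over center crop is less pronounced than on ImageNet (only about 1.5%-1.8%)
Ensembling models A and B also brings only a small gain, about 0.4%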

Method	Testing	A top-1 err. (%)	A top-5 err. (%)	B top-1 err. (%)	B top-5 err. (%)
loss1+BP	center	50.91	19.00	50.62	18.69
loss1&2+BP	center	50.72	18.84	50.59	18.68
loss1&2+Relay BP	center	49.75	17.83	49.77	17.86
loss1+BP	single	48.67	17.19	48.29	16.89
loss1&2+BP	single	48.55	17.05	48.27	16.89
loss1&2+Relay BP	single	47.86	16.35	47.72	16.36

Introduction

The perceptron learns a linear function $f(w, x)=w^Tx+b$.

Each $w$ corresponds to one hypothesis $h(x)=\operatorname{sign}(f(w, x))$.

A prediction is correct if $y\,(w^Tx)>0$, where the bias $b$ is absorbed into $w$ by appending a constant 1 to $x$.

The goal of learning is to find a good $w$ such that $h(x)$ makes few mispredictions.

How to solve

Use stochastic gradient descent to minimize the loss function.

$\begin{aligned} \mathcal J(\mathbf w) &= \frac{1}{N}\sum_{i=1}^N \max(0, -y_i\mathbf w^T \mathbf x_i) \\ \mathcal J_i(\mathbf w) &= \max(0, -y_i\mathbf w^T\mathbf x_i) \\ \frac{\partial \mathcal J_i}{\partial w_j} &= \begin{cases} 0, & \text{if } y_i\mathbf w^T\mathbf x_i>0 \\ -y_ix_{ij}, & \text{otherwise} \end{cases} \\ \frac{\partial \mathcal J_i}{\partial \mathbf w} &= \begin{cases} 0, & \text{if } y_i\mathbf w^T\mathbf x_i>0 \\ -y_i\mathbf x_i, & \text{otherwise} \end{cases} \end{aligned}$

Stepping against this gradient with a learning rate of 1 gives the update rule:

$\mathbf w = \mathbf w + y_i\mathbf x_i, \;\text{if it is a mistake}$
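The update rule can be tried out with a short Python sketch. It makes a few assumptions that are not in the text above: the bias is folded into $\mathbf w$ by appending a constant 1 to every input, samples are visited in a fixed order rather than drawn at random (to keep the run deterministic), a score of exactly 0 counts as a mistake, and the toy data set is invented for illustration.

```python
def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def train_perceptron(samples, max_epochs=100):
    """samples: list of (x, y), x a list of floats ending in 1.0, y in {-1, +1}."""
    w = [0.0] * len(samples[0][0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in samples:
            if y * dot(w, x) <= 0:  # mistake: apply w = w + y * x
                w = [wj + y * xj for wj, xj in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # a full pass without mistakes: stop
            break
    return w


def predict(w, x):
    return 1 if dot(w, x) > 0 else -1


# Linearly separable toy data; the last component of each x is the constant 1.
toy = [
    ([2.0, 2.0, 1.0], 1),
    ([3.0, 1.0, 1.0], 1),
    ([-1.0, -1.0, 1.0], -1),
    ([-2.0, 0.0, 1.0], -1),
]
w = train_perceptron(toy)
print(w, [predict(w, x) for x, _ in toy])
```

On linearly separable data such as this, the loop stops once a full pass produces no mistakes.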

Table of Contents:

Caffe Model
ZF Model
VGG-16 Model

Caffe

Layer Name	Input Size	Filter Num	Filter Size	Stride(:Pad)	Output Size	RF Formula	RF Size
conv1	227x227x3	96	11x11	4	55x55x96	(x - 7) / 4	11
pool1	55x55x96	96	3x3	2	27x27x96	(x - 11) / 8	19
conv2	27x27x96	256	5x5	1:2	27x27x256	(x - 43) / 8	51
pool2	27x27x256	256	3x3	2	13x13x256	(x - 51) / 16	67
conv3	13x13x256	384	3x3	1:1	13x13x384	(x - 83) / 16	99
conv4	13x13x384	384	3x3	1:1	13x13x384	(x - 115) / 16	131
conv5	13x13x384	256	3x3	1:1	13x13x256	(x - 147) / 16	163
pool5	13x13x256	256	3x3	2	6x6x256	(x - 163) / 32	195
fc6	6x6x256	4096	6x6	1	1x1x4096	(x - 323) / 32	355
fc7	1x1x4096	4096	1x1	1	1x1x4096	(x - 323) / 32	355
fc8	1x1x4096	1000	1x1	1	1x1x1000	(x - 323) / 32	355

Zeiler-Fergus

Layer Name	Input Size	Filter Num	Filter Size	Stride(:Pad)	Output Size	RF Formula	RF Size
conv1	224x224x3	96	7x7	2:1	110x110x96	(x - 5) / 2	7
pool1	110x110x96	96	3x3	2:1	55x55x96	(x - 7) / 4	11
conv2	55x55x96	256	5x5	2	26x26x256	(x - 19) / 8	27
pool2	26x26x256	256	3x3	2:1	13x13x256	(x - 27) / 16	43
conv3	13x13x256	384	3x3	1:1	13x13x384	(x - 59) / 16	75
conv4	13x13x384	384	3x3	1:1	13x13x384	(x - 91) / 16	107
conv5	13x13x384	256	3x3	1:1	13x13x256	(x - 123) / 16	139
pool5	13x13x256	256	3x3	2	6x6x256	(x - 139) / 32	171
fc6	6x6x256	4096	6x6	1	1x1x4096	(x - 299) / 32	331
fc7	1x1x4096	4096	1x1	1	1x1x4096	(x - 299) / 32	331
fc8	1x1x4096	1000	1x1	1	1x1x1000	(x - 299) / 32	331

VGG-16

Layer Name	Input Size	Filter Num	Filter Size	Stride(:Pad)	Output Size	RF Formula	RF Size
conv1-1	224x224x3	64	3x3	1:1	224x224x64	(x - 2) / 1	3
conv1-2	224x224x64	64	3x3	1:1	224x224x64	(x - 4) / 1	5
pool1	224x224x64	64	2x2	2	112x112x64	(x - 4) / 2	6
conv2-1	112x112x64	128	3x3	1:1	112x112x128	(x - 8) / 2	10
conv2-2	112x112x128	128	3x3	1:1	112x112x128	(x - 12) / 2	14
pool2	112x112x128	128	2x2	2	56x56x128	(x - 12) / 4	16
conv3-1	56x56x128	256	3x3	1:1	56x56x256	(x - 20) / 4	24
conv3-2	56x56x256	256	3x3	1:1	56x56x256	(x - 28) / 4	32
conv3-3	56x56x256	256	3x3	1:1	56x56x256	(x - 36) / 4	40
pool3	56x56x256	256	2x2	2	28x28x256	(x - 36) / 8	44
conv4-1	28x28x256	512	3x3	1:1	28x28x512	(x - 52) / 8	60
conv4-2	28x28x512	512	3x3	1:1	28x28x512	(x - 68) / 8	76
conv4-3	28x28x512	512	3x3	1:1	28x28x512	(x - 84) / 8	92
pool4	28x28x512	512	2x2	2	14x14x512	(x - 84) / 16	100
conv5-1	14x14x512	512	3x3	1:1	14x14x512	(x - 116) / 16	132
conv5-2	14x14x512	512	3x3	1:1	14x14x512	(x - 148) / 16	164
conv5-3	14x14x512	512	3x3	1:1	14x14x512	(x - 180) / 16	196
pool5	14x14x512	512	2x2	2	7x7x512	(x - 180) / 32	212
fc6	7x7x512	4096	7x7	1	1x1x4096	(x - 372) / 32	404
fc7	1x1x4096	4096	1x1	1	1x1x4096	(x - 372) / 32	404
fc8	1x1x4096	1000	1x1	1	1x1x1000	(x - 372) / 32	404
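The RF Size column in these tables can be reproduced with the usual recurrence: each layer enlarges the receptive field by (kernel - 1) times the product of all earlier strides. The Python sketch below is an illustration under assumptions: the function name and the `(name, kernel, stride)` layer encoding are made up here, padding is ignored (it does not change the receptive field size), and the RF Formula column is read as the output size for an input of size x without padding.

```python
def receptive_fields(layers):
    """layers: list of (name, kernel, stride). Returns [(name, rf, jump)].

    rf   = receptive field size on the input, in pixels
    jump = product of strides so far (distance between adjacent outputs)
    """
    rf, jump = 1, 1
    result = []
    for name, kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
        result.append((name, rf, jump))
    return result


# VGG-16: five blocks of 3x3 convs, each followed by a 2x2 pool with stride 2.
vgg16 = []
for block, n_convs in enumerate([2, 2, 3, 3, 3], start=1):
    vgg16 += [(f"conv{block}-{i}", 3, 1) for i in range(1, n_convs + 1)]
    vgg16.append((f"pool{block}", 2, 2))
vgg16.append(("fc6", 7, 1))  # fc6 treated as a 7x7 convolution

for name, rf, jump in receptive_fields(vgg16):
    print(name, rf, jump)  # last line: fc6 404 32
```

Since fc7 and fc8 act as 1x1 layers, they leave the receptive field unchanged.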

to be continued.

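Installing Ruby Installer and DevKit

Download the 64-bit RubyInstaller 2.1.5 from the address below and install it to D:\Ruby

rubyinstaller-2.1.5-x64.exe

From the same page, download the 64-bit DevKit, DevKit-mingw64-64-4.7.2-20130224-1432-sfx.exe, and install it to D:\Ruby\DevKit. It can also be downloaded from CSDN:

DevKit-mingw64-64-4.7.2-20130224-1432-sfx.exe

Install and configure: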

cd D:\Ruby\DevKit
ruby dk.rb init
ruby dk.rb install

Installing Jekyll

Change the gem source: first remove the original source, then add the taobao source (http://ruby.taobao.org, using its IP address directly)

gem sources -r http://rubygems.org/
gem sources -a http://223.6.253.37/
gem install jekyll
gem install rdiscount
gem install redcarpet

Running Jekyll

jekyll new Blog
cd Blog
jekyll serve

Open http://localhost:4000/ to see the site.

Jekyll Bootstrap is a theme for Jekyll that uses the Twitter Bootstrap CSS framework. The advantage of Jekyll is that it is self-hosted (or hosted on GitHub): you can write in Markdown in a text editor and simply git push new posts to a server, which produces the HTML files. The advantage of this theme is that you can start blogging almost right away without having to worry about making a theme.

Setting up

To start your own blog, simply git clone the repository on GitHub. You could also press the “fork” button on GitHub.

git clone git://github.com/nhoss2/jekyll-bootstrap.git

If you want to host your blog on GitHub, make sure you switch to the gh-pages branch.

git checkout gh-pages

Then you will need to edit the _config.yml file at the root of the repository.

To add your own posts, add a file to the _posts directory named year-month-day-title.md. Note: the file does not have to be Markdown.

To publish the post, just git push it to your own GitHub repo and you're set!

Things to change on `_config.yml`

There is a config file at the root called _config.yml. By default it looks like:

permalink: /:year/:title/
paginate: 10
exclude:
name: Jekyll Bootstrap
baseurl: /jekyll-bootstrap/

You will need to change the name and baseurl fields; the others are optional. The baseurl field is used for the CSS files and pagination. If you are hosting the blog on GitHub, change it to your repository name, unless your repository has the same name as your GitHub user name, in which case leave baseurl empty.

For more information on Jekyll, visit their wiki on GitHub.

For more information on GitHub Pages: http://pages.github.com.

Learning Deep Convolutional Neural Networks for Places2 Scene Recognition

Introduction

Relay BP

Network Architecture

Class-aware Sampling

Results

21 December 2015

paper reading

Perceptron

Introduction

How to solve

13 March 2015

machine learning

Receptive Field Computation for Deep Models

Caffe

Zeiler-Fergus

VGG-16

to be continued.

19 January 2015

deep learning

Jekyll Installation Guide

Installing Ruby Installer and DevKit

Installing Jekyll

Running Jekyll

13 January 2015

jekyll

Introducing Jekyll Bootstrap

Setting up

Things to change on `_config.yml`

03 September 2011

jekyll
