Caffe Command Line and Python Interfaces

The command-line interface: cmdcaffe

The caffe binary is generated when Caffe is compiled; it is located in CAFFE_ROOT/build/tools. The commands available there are:

./caffe train           # train or finetune a model
./caffe test            # score a model
./caffe device_query    # show GPU diagnostic information
./caffe time            # benchmark model execution time

Training

Caffe supports three ways of training a model.

  1. Train a model from scratch. Provide the path to a solver .prototxt file, e.g.:

    # Train; the CPU is used by default
    ./build/tools/caffe train \
    -solver examples/mnist/lenet_solver.prototxt
    # Train on the GPU with id 2
    ./build/tools/caffe train \
    -solver examples/mnist/lenet_solver.prototxt \
    -gpu 2
  2. Resume training from a snapshot. Provide the path to a .solverstate file:

    # Pass -snapshot to resume training
    ./build/tools/caffe train \
    -solver examples/mnist/lenet_solver.prototxt \
    -snapshot examples/mnist/lenet_iter_5000.solverstate

    If the maximum number of iterations originally configured is not enough, edit max_iter in the solver file examples/mnist/lenet_solver.prototxt, e.g. increase it from max_iter: 10000 to 20000.

  3. Fine-tune a pretrained model (transfer learning). Provide the path to a .caffemodel file:

    # Pass -weights to supply a pretrained model
    ./build/tools/caffe train \
    -solver examples/finetuning_on_flickr_style/solver.prototxt \
    -weights models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel

    A complete fine-tuning example can be found in examples/finetuning_on_flickr_style.

Multi-GPU parallelism

The -gpu flag takes the ids of the GPUs to use; -gpu 0,1,2,3 runs on four GPUs in parallel. With multiple GPUs, the same network configuration is replicated on every GPU, and each GPU processes a batch of batch_size samples, so the overall amount of data processed in parallel is batch_size × 4.

# Use all available GPU devices
caffe train -solver examples/mnist/lenet_solver.prototxt -gpu all
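As a sanity check on the batch_size × 4 claim above, the effective batch size is just the per-GPU batch times the number of GPUs. A minimal sketch (the per-GPU batch_size of 64 is a hypothetical value; the real one comes from the data layer of your train prototxt):

```python
# Effective batch size when training with caffe's -gpu flag.
batch_size_per_gpu = 64          # hypothetical; read from the train prototxt
gpu_ids = [0, 1, 2, 3]           # as passed via "-gpu 0,1,2,3"

effective_batch = batch_size_per_gpu * len(gpu_ids)
print(effective_batch)  # 256
```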

Checking a GPU

Use the following command to check whether a given GPU is working properly:

./build/tools/caffe device_query -gpu 0

This returns the hardware information of GPU 0:

I0603 16:40:28.905443 13455 caffe.cpp:138] Querying GPUs 0
I0603 16:40:28.927069 13455 common.cpp:178] Device id: 0
I0603 16:40:28.927090 13455 common.cpp:179] Major revision number: 6
I0603 16:40:28.927093 13455 common.cpp:180] Minor revision number: 1
I0603 16:40:28.927096 13455 common.cpp:181] Name: GeForce GTX 1050
I0603 16:40:28.927099 13455 common.cpp:182] Total global memory: 2099904512
I0603 16:40:28.927103 13455 common.cpp:183] Total shared memory per block: 49152
I0603 16:40:28.927106 13455 common.cpp:184] Total registers per block: 65536
I0603 16:40:28.927109 13455 common.cpp:185] Warp size: 32
I0603 16:40:28.927112 13455 common.cpp:186] Maximum memory pitch: 2147483647
I0603 16:40:28.927115 13455 common.cpp:187] Maximum threads per block: 1024
I0603 16:40:28.927119 13455 common.cpp:188] Maximum dimension of block: 1024, 1024, 64
I0603 16:40:28.927122 13455 common.cpp:191] Maximum dimension of grid: 2147483647, 65535, 65535
I0603 16:40:28.927125 13455 common.cpp:194] Clock rate: 1493000
I0603 16:40:28.927129 13455 common.cpp:195] Total constant memory: 65536
I0603 16:40:28.927151 13455 common.cpp:196] Texture alignment: 512
I0603 16:40:28.927155 13455 common.cpp:197] Concurrent copy and execution: Yes
I0603 16:40:28.927160 13455 common.cpp:199] Number of multiprocessors: 5
I0603 16:40:28.927183 13455 common.cpp:200] Kernel execution timeout: Yes

Testing accuracy

Testing reports the loss and accuracy of every batch as well as the overall average loss and accuracy. test runs only the forward pass, with no backward pass; that is, inference rather than training.

./build/tools/caffe test \
-model examples/mnist/lenet_train_test.prototxt \
-weights examples/mnist/lenet_iter_10000.caffemodel \
-gpu 0 \
-iterations 100

This runs the model structure defined in lenet_train_test.prototxt with the weights from lenet_iter_10000.caffemodel for 100 iterations over the test samples. The TEST-phase batch_size is 100, so iterations × batch_size = 10000, covering the entire MNIST test set. The test data itself comes from the data layer declared for the TEST phase in lenet_train_test.prototxt (the examples/mnist/mnist_test_lmdb database created by the MNIST example scripts).
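The overall score printed at the end of a test run is simply the mean of the per-batch numbers. A small sketch of both calculations (the per-batch accuracies are hypothetical values, not from a real run):

```python
# Coverage: iterations * TEST-phase batch_size samples are scored.
iterations = 100
batch_size = 100                 # TEST-phase batch_size in lenet_train_test.prototxt
samples_covered = iterations * batch_size
print(samples_covered)           # 10000: the whole MNIST test set

# caffe test averages the per-batch accuracies into the final score.
batch_accuracy = [0.99, 0.98, 1.00, 0.97]   # hypothetical per-batch values
overall = sum(batch_accuracy) / len(batch_accuracy)
print(round(overall, 4))         # 0.985
```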

Timing

Running ./build/tools/caffe time benchmarks a model, printing the forward and backward computation time of each layer.

  1. Time 10 iterations of LeNet on the CPU (the default is 50 iterations):

    ./build/tools/caffe time \
    -model examples/mnist/lenet_train_test.prototxt \
    -iterations 10

    Result:

    I0603 17:30:35.768501 15346 caffe.cpp:365] *** Benchmark begins ***
    I0603 17:30:35.768518 15346 caffe.cpp:366] Testing for 10 iterations.
    I0603 17:30:35.835475 15346 caffe.cpp:394] Iteration: 1 forward-backward time: 66 ms.
    I0603 17:30:35.902711 15346 caffe.cpp:394] Iteration: 2 forward-backward time: 67 ms.
    I0603 17:30:35.969769 15346 caffe.cpp:394] Iteration: 3 forward-backward time: 67 ms.
    I0603 17:30:36.036651 15346 caffe.cpp:394] Iteration: 4 forward-backward time: 66 ms.
    I0603 17:30:36.105055 15346 caffe.cpp:394] Iteration: 5 forward-backward time: 68 ms.
    I0603 17:30:36.174151 15346 caffe.cpp:394] Iteration: 6 forward-backward time: 69 ms.
    I0603 17:30:36.241129 15346 caffe.cpp:394] Iteration: 7 forward-backward time: 66 ms.
    I0603 17:30:36.308782 15346 caffe.cpp:394] Iteration: 8 forward-backward time: 67 ms.
    I0603 17:30:36.376447 15346 caffe.cpp:394] Iteration: 9 forward-backward time: 67 ms.
    I0603 17:30:36.443658 15346 caffe.cpp:394] Iteration: 10 forward-backward time: 67 ms.
    I0603 17:30:36.443676 15346 caffe.cpp:397] Average time per layer:
    I0603 17:30:36.443698 15346 caffe.cpp:400] mnist forward: 0.015 ms.
    I0603 17:30:36.443706 15346 caffe.cpp:403] mnist backward: 0.0009 ms.
    I0603 17:30:36.443711 15346 caffe.cpp:400] conv1 forward: 7.4511 ms.
    I0603 17:30:36.443714 15346 caffe.cpp:403] conv1 backward: 7.8538 ms.
    I0603 17:30:36.443718 15346 caffe.cpp:400] pool1 forward: 3.3165 ms.
    I0603 17:30:36.443740 15346 caffe.cpp:403] pool1 backward: 0.5728 ms.
    I0603 17:30:36.443745 15346 caffe.cpp:400] conv2 forward: 12.81 ms.
    I0603 17:30:36.443769 15346 caffe.cpp:403] conv2 backward: 25.1095 ms.
    I0603 17:30:36.443774 15346 caffe.cpp:400] pool2 forward: 1.5992 ms.
    I0603 17:30:36.443778 15346 caffe.cpp:403] pool2 backward: 0.5698 ms.
    I0603 17:30:36.443783 15346 caffe.cpp:400] ip1 forward: 2.6873 ms.
    I0603 17:30:36.443787 15346 caffe.cpp:403] ip1 backward: 4.9053 ms.
    I0603 17:30:36.443791 15346 caffe.cpp:400] relu1 forward: 0.0563 ms.
    I0603 17:30:36.443809 15346 caffe.cpp:403] relu1 backward: 0.0507 ms.
    I0603 17:30:36.443814 15346 caffe.cpp:400] ip2 forward: 0.1712 ms.
    I0603 17:30:36.443819 15346 caffe.cpp:403] ip2 backward: 0.2362 ms.
    I0603 17:30:36.443845 15346 caffe.cpp:400] loss forward: 0.0529 ms.
    I0603 17:30:36.443848 15346 caffe.cpp:403] loss backward: 0.0013 ms.
    I0603 17:30:36.443868 15346 caffe.cpp:408] Average Forward pass: 28.1725 ms.
    I0603 17:30:36.443892 15346 caffe.cpp:410] Average Backward pass: 39.3101 ms.
    I0603 17:30:36.443895 15346 caffe.cpp:412] Average Forward-Backward: 67.5 ms.
    I0603 17:30:36.443900 15346 caffe.cpp:414] Total Time: 675 ms.
    I0603 17:30:36.443918 15346 caffe.cpp:415] *** Benchmark ends ***
  2. Time 10 iterations on the GPU:

    ./build/tools/caffe time \
    -model examples/mnist/lenet_train_test.prototxt \
    -gpu 0 \
    -iterations 10

    Result:

    I0603 17:32:11.830056 15434 caffe.cpp:365] *** Benchmark begins ***
    ... # 省略
    I0603 17:32:11.876488 15434 caffe.cpp:414] Total Time: 43.9143 ms.
    I0603 17:32:11.876494 15434 caffe.cpp:415] *** Benchmark ends ***
  3. Time each layer of a trained model:

    ./build/tools/caffe time \
    -model examples/mnist/lenet_train_test.prototxt \
    -weights examples/mnist/lenet_iter_10000.caffemodel \
    -gpu 0 \
    -iterations 10
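The per-layer averages in a caffe time log are easy to post-process with a few lines of Python. This sketch sums the per-layer forward times from the CPU benchmark log above and recovers, up to rounding of the individual layer averages, the reported "Average Forward pass":

```python
import re

# Per-layer forward lines copied from the CPU benchmark log above.
log = """\
mnist forward: 0.015 ms.
conv1 forward: 7.4511 ms.
pool1 forward: 3.3165 ms.
conv2 forward: 12.81 ms.
pool2 forward: 1.5992 ms.
ip1 forward: 2.6873 ms.
relu1 forward: 0.0563 ms.
ip2 forward: 0.1712 ms.
loss forward: 0.0529 ms.
"""

# Extract "<layer> forward: <ms>" pairs into a dict.
pattern = re.compile(r"(\w+) forward: ([\d.]+) ms")
forward_ms = {m.group(1): float(m.group(2)) for m in pattern.finditer(log)}

total = sum(forward_ms.values())
print(round(total, 4))  # close to the reported 28.1725 ms Average Forward pass
```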

The Python interface: pycaffe

The pycaffe interface must be compiled first (make pycaffe) before it can be used.

The IPython notebooks under caffe/examples are worked examples of using pycaffe.
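A minimal sketch of the typical pycaffe inference workflow. The prototxt/caffemodel paths and the "data" blob name are placeholders for whatever your deploy net defines; caffe is imported inside the function, so actually running it requires a built pycaffe on PYTHONPATH:

```python
def run_inference(deploy_prototxt, caffemodel, input_blob):
    """Forward a preprocessed input batch through a trained net.

    deploy_prototxt / caffemodel are placeholder paths; input_blob is a
    numpy array already shaped like the net's input (N, C, H, W).
    """
    # Imported lazily: running this requires a compiled pycaffe (make pycaffe).
    import caffe

    caffe.set_mode_cpu()          # or caffe.set_mode_gpu(); caffe.set_device(0)
    net = caffe.Net(deploy_prototxt, caffemodel, caffe.TEST)

    # Resize the input blob to the batch shape and copy the data in.
    net.blobs["data"].reshape(*input_blob.shape)
    net.blobs["data"].data[...] = input_blob

    # forward() returns a dict mapping output blob names to numpy arrays.
    return net.forward()
```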