(This is a backup of the post from my WordPress blog https://no2147483647.wordpress.com/2015/12/07/deep-learning-for-hackers-with-mxnet-1/, and the Chinese version is also published on Logdown: http://phunter.logdown.com/posts/314562)
I am going to write a series of blog posts about implementing deep learning models and algorithms with MXnet. The topics include MNIST, LSTM/RNN, image recognition, neural art-style image generation, etc. Everything here is about programming deep learning (a.k.a. deep learning for hackers) rather than theory, so basic knowledge of machine learning and neural networks is a prerequisite. I assume readers know how neural networks work and what backpropagation is. If you have difficulties, please review Week 4 of Andrew Ng's Coursera class: https://www.coursera.org/learn/machine-learning.
Of course, this blog does not cover everything about deep learning, and understanding the fundamentals is very important. Readers who want in-depth theoretical knowledge should read some good tutorials, for example http://deeplearning.net/reading-list/tutorials/.
MXnet: lightweight, distributed, portable deep learning toolkit
MXnet is a deep learning toolkit written in C++11, and it comes from DMLC (Distributed (Deep) Machine Learning Common, http://dmlc.ml/). You may already know MXnet's famous DMLC sibling xgboost (https://github.com/dmlc/xgboost), a parallel gradient boosting decision tree library that has dominated many Kaggle competitions and is widely used in many projects.
MXnet is very lightweight, dynamic, portable, easy to distribute and memory efficient. One of its coolest features is that it can run on portable devices (e.g. image recognition on your Android phone). MXnet also has a clear design and clean C++11 code. Let's go star and fork it on GitHub: https://github.com/dmlc/mxnet
Recently MXnet has received much attention at conferences and on blogs for its speed and efficient memory usage. Professionals are comparing MXnet with Caffe, Torch7 and Google's TensorFlow, and these benchmarks show that MXnet is a rising star. Check out this recent tweet from Quora's Xavier Amatriain: https://twitter.com/xamat/status/665222179668168704
Install MXnet with GPU
MXnet natively supports multiple platforms (Linux, Mac OS X and Windows) and multiple languages (C++, Java, Python, R and Julia). It recently gained JavaScript support too; no joke, see MXnet.js. In this tutorial we use Ubuntu 14.04 LTS and Python as the example. A reminder: we use CUDA for GPU computing, and CUDA does not yet support Ubuntu 15.10 or newer (with gcc 5.2). So let's stay with 14.04 LTS, or at most 15.04.
The installation can be done on a physical machine with an nVidia CUDA GPU or on a cloud instance, for example an AWS GPU instance g2.2xlarge or g2.8xlarge. The following steps mostly come from the official installation guide http://mxnt.ml/en/latest/build.html#building-on-linux, with some CUDA-specific modifications.
Please note: when installing CUDA on AWS from scratch, some additional steps are needed, such as updating linux-image-extra-virtual and disabling nouveau. For more details, please refer to Caffe's guide: https://github.com/BVLC/caffe/wiki/Install-Caffe-on-EC2-from-scratch-(Ubuntu,-CUDA-7,-cuDNN)
Install dependencies
MXnet needs only minimal dependencies: gcc, BLAS, and OpenCV (optional). That is it. You can also install git, in case it is not installed yet.
sudo apt-get update
sudo apt-get install -y build-essential git libblas-dev libopencv-dev
Clone mxnet
git clone --recursive https://github.com/dmlc/mxnet
Just another reminder: --recursive is needed. MXnet depends on the DMLC common packages mshadow, ps-lite and dmlc-core, and --recursive clones all of them. Please don't compile yet; we need to install CUDA first.
Install CUDA
These CUDA installation steps also apply to other deep learning packages. Please go to https://developer.nvidia.com/cuda-downloads and select the CUDA installer for your system, for example Ubuntu 14.04. The deb (network) installer is suggested because it downloads from the closest Ubuntu mirror, which is usually fastest.
Or, here is the command-line-only solution:
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/cuda-repo-ubuntu1404_7.5-18_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1404_7.5-18_amd64.deb
sudo apt-get update
sudo apt-get install cuda
If everything goes well, check the video card status with nvidia-smi. It should list your GPU along with its memory usage and utilization.
The GPU model shown will vary; I am using a GTX 960 4GB (approximately $200 now). MXnet uses memory very efficiently, and 4GB is enough for most problems.
Optional: cuDNN. MXnet supports cuDNN too. cuDNN is nVidia's deep learning library, which optimizes operations such as convolution and pooling for better speed and memory usage. It can usually speed up MXnet by 40% to 50%. If interested, apply for the developer program at https://developer.nvidia.com/cudnn. Once approved, install cuDNN following the official instructions.
Compile MXnet with CUDA support
MXnet needs CUDA support turned on in its build configuration. Find config.mk in mxnet/make/, copy it to mxnet/, and edit these three lines:
USE_CUDA = 1
USE_CUDA_PATH = /usr/local/cuda
USE_BLAS = blas
The second line is the CUDA installation path, usually /usr/local/cuda or /usr/local/cuda-7.5. If you prefer another BLAS implementation, e.g. OpenBLAS or ATLAS, change USE_BLAS to openblas or atlas, and add the BLAS path to ADD_LDFLAGS and ADD_CFLAGS.
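If you prefer to script these edits rather than open an editor, here is a minimal Python sketch. It assumes config.mk uses simple `KEY = value` lines, as in the three lines above. The helper name set_config_values is hypothetical and not part of MXnet.

```python
import re


def set_config_values(text, updates):
    """Return config.mk text with each KEY in `updates` set to its new value.

    Assumes plain `KEY = value` lines (comments and `+=` lines are left alone);
    keys that are not found are appended at the end.
    """
    remaining = dict(updates)
    out_lines = []
    for line in text.splitlines():
        match = re.match(r"^\s*([A-Z_][A-Z0-9_]*)\s*=", line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            line = f"{key} = {remaining.pop(key)}"
        out_lines.append(line)
    # Append any keys that were not present in the original file
    for key, value in remaining.items():
        out_lines.append(f"{key} = {value}")
    return "\n".join(out_lines) + "\n"
```

In practice you would read mxnet/config.mk, pass its text through set_config_values with {"USE_CUDA": "1", "USE_CUDA_PATH": "/usr/local/cuda", "USE_BLAS": "blas"}, and write the result back.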
Now compile MXnet with CUDA (-j4 runs 4 compile jobs in parallel):
make -j4
One more reminder: if you have a non-CUDA video card (for example Intel Iris or AMD R9) or no video card at all, change USE_CUDA to 0. MXnet switches easily between CPU and GPU. Instead of the GPU version, you can compile a multi-threaded CPU version by setting USE_OPENMP = 1, or leave it at 0 and let BLAS handle multi-threading. Either way is fine with MXnet.
Install Python support
MXnet natively supports Python; simply run:
cd python; python setup.py install
Python 2.7 is suggested, while Python 3.4 is also supported. You might need setuptools and numpy if they are not yet installed. I personally suggest the Python distribution from Anaconda or Miniconda:
wget https://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh
bash Miniconda-latest-Linux-x86_64.sh
(answer some installation questions)
conda install numpy
Let's run MNIST, a handwritten digit recognizer
Now that we have a GPU-ready MXnet, let's run the first deep learning example: MNIST. MNIST is a handwritten digit dataset with 60,000 training samples and 10,000 test samples. Each sample is a 28x28 greyscale image of a digit. The goal is to train a small machine learning model that recognizes handwritten digits.
Let's run it:
cd mxnet/example/image-classification/
python train_mnist.py
train_mnist.py downloads the MNIST dataset the first time it runs, so please be patient.
Note: train_mnist.py uses only the CPU by default. MXnet switches easily between CPU and GPU, and since we have a GPU, let's turn it on:
python train_mnist.py --gpus 0
That is it. --gpus 0 means using the first GPU. If you have multiple GPUs, for example 4, set --gpus 0,1,2,3 to use all of them. While training runs on the GPU, nvidia-smi shows a python process using the GPU. MNIST is not a heavy task and MXnet uses GPU memory efficiently, so GPU utilization is only about 30-40% and memory usage is about 67MB.
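The --gpus value is simply a comma-separated list of device IDs. Here is a rough sketch of how such a flag can be turned into a device list. The helpers parse_gpus and describe_devices are hypothetical. As far as I know, the example's training code then wraps each ID in an MXnet GPU context; this sketch does not import MXnet, so it only prints labels.

```python
def parse_gpus(gpus_arg):
    """Turn a --gpus string such as "0,1,2,3" into a list of integer device IDs.

    Returns an empty list (meaning: run on the CPU) when the flag is missing or empty.
    """
    if not gpus_arg:
        return []
    ids = [int(part) for part in gpus_arg.split(",") if part.strip()]
    if len(set(ids)) != len(ids):
        raise ValueError(f"duplicate GPU id in {gpus_arg!r}")
    return ids


def describe_devices(gpus_arg):
    """Human-readable summary of where training would run."""
    ids = parse_gpus(gpus_arg)
    if not ids:
        return "cpu"
    return ", ".join(f"gpu({i})" for i in ids)


print(describe_devices(None))       # cpu
print(describe_devices("0"))        # gpu(0)
print(describe_devices("0,1,2,3"))  # gpu(0), gpu(1), gpu(2), gpu(3)
```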
Troubleshooting
When running on the GPU for the first time, you may see an error like this:
ImportError: libcudart.so.7.0: cannot open shared object file: No such file
This is because the CUDA shared libraries are not on the dynamic linker's search path. One fix is to add this line to ~/.bashrc:
export LD_LIBRARY_PATH=/usr/local/cuda-7.5/targets/x86_64-linux/lib/:$LD_LIBRARY_PATH
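Or build the path into MXnet by adding these lines to config.mk. Here -L tells the linker where the libraries are, and -Wl,-rpath records that directory so it is also searched at run time:
ADD_LDFLAGS = -L/usr/local/cuda-7.5/targets/x86_64-linux/lib/ -Wl,-rpath,/usr/local/cuda-7.5/targets/x86_64-linux/lib/
ADD_CFLAGS = -I/usr/local/cuda-7.5/targets/x86_64-linux/include/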
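Before launching training, you can check whether the CUDA runtime library is visible through LD_LIBRARY_PATH with a small sketch like this one. It only scans the directories listed in the variable; it does not consult ldconfig's cache or any rpath. The helper name is hypothetical.

```python
import os


def find_in_library_path(lib_name, ld_library_path=None):
    """Return the first directory in LD_LIBRARY_PATH that contains lib_name, or None.

    Only the directories in the variable are scanned; the system loader cache
    (ldconfig) and any rpath embedded in libmxnet.so are not checked.
    """
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    for directory in ld_library_path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, lib_name)):
            return directory
    return None


if __name__ == "__main__":
    # The file name comes from the error message above; adjust it to your CUDA version.
    hit = find_in_library_path("libcudart.so.7.0")
    print(hit if hit else "libcudart.so.7.0 not found in LD_LIBRARY_PATH")
```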
MNIST code secret revealed: designing a simple MLP
In train_mnist.py there is a function get_mlp(), which implements a multilayer perceptron (MLP). In MXnet, an MLP is defined layer by layer as symbols, like this in the code:
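import mxnet as mx

# Input: each sample is a flattened 28x28 image (784 values)
data = mx.symbol.Variable('data')
# Hidden layer 1: 128 nodes followed by ReLU
fc1 = mx.symbol.FullyConnected(data = data, name='fc1', num_hidden=128)
act1 = mx.symbol.Activation(data = fc1, name='relu1', act_type="relu")
# Hidden layer 2: 64 nodes followed by ReLU
fc2 = mx.symbol.FullyConnected(data = act1, name = 'fc2', num_hidden = 64)
act2 = mx.symbol.Activation(data = fc2, name='relu2', act_type="relu")
# Output layer: one node per digit; softmax turns the 10 scores into probabilities
# (newer MXnet releases name this layer SoftmaxOutput)
fc3 = mx.symbol.FullyConnected(data = act2, name='fc3', num_hidden=10)
mlp = mx.symbol.Softmax(data = fc3, name = 'mlp')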
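Here is a worked example of how big this MLP is: a short sketch counting its trainable parameters. A FullyConnected layer with n_in inputs and num_hidden outputs has n_in*num_hidden weights plus num_hidden biases (assuming the default of one bias per node). The Activation and Softmax layers add no parameters.

```python
# Layer sizes of the MLP above: 784 inputs -> fc1(128) -> fc2(64) -> fc3(10)
layer_sizes = [784, 128, 64, 10]


def count_fc_params(sizes):
    """Weights plus biases for a chain of fully connected layers."""
    total = 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix + bias vector
    return total


print(count_fc_params(layer_sizes))  # 109386
```

Almost all of the 109,386 parameters sit in fc1: 784*128 + 128 = 100,480.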
Let's understand what is going on in this neural network.
- Each sample (a digit) is a 28x28 pixel greyscale image. It can be represented as a vector of 28x28=784 float values, where each value is the grey level of one pixel.
- In an MLP, each layer needs a layer structure. For example, the first layer fc1 is a fully connected layer, mx.symbol.FullyConnected, which takes its input from data. This first layer has 128 nodes, defined by num_hidden.
- Each layer also needs an activation function, Activation, to connect it to the next layer; in other words, it transforms the values passed from the current layer to the next one. In this example, the Activation function connecting layers fc1 and fc2 is ReLU, short for rectified linear unit (or Rectifier), the function f(x)=max(0,x). ReLU is a very commonly used activation function in deep learning, mostly because it is cheap to compute and converges easily in gradient descent. In addition, we choose ReLU for MNIST because MNIST has sparse features, where most values are 0. For more information about ReLU, please check Wikipedia and other deep learning books or tutorials.
- The second layer fc2 is similar to the first layer fc1: it takes the output of fc1 as input and outputs to the third layer fc3.
- fc3 works like the previous layers. The only difference is that it is the output layer, with one node per digit, hence num_hidden=10. The Softmax on top turns these 10 outputs into the probability that the sample is each of the 10 digits.
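To make the list above concrete, here is a dependency-free sketch of one forward pass through the same 784-128-64-10 structure: fully connected layers, ReLU in between, and softmax at the end. The weights are random placeholders (MXnet initializes and learns its own during training). The helper names are mine, not MXnet's.

```python
import math
import random


def relu(vector):
    """ReLU activation: f(x) = max(0, x), applied element-wise."""
    return [max(0.0, x) for x in vector]


def fully_connected(vector, weights, bias):
    """Each output node is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def softmax(vector):
    """Turn raw scores into probabilities that sum to 1 (shifted for numerical stability)."""
    top = max(vector)
    exps = [math.exp(x - top) for x in vector]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(n_in, n_out, rng):
    """Placeholder weights and zero biases; a trained model would learn these."""
    weights = [[rng.gauss(0.0, 0.05) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
layers = [random_layer(784, 128, rng), random_layer(128, 64, rng), random_layer(64, 10, rng)]

image = [rng.random() for _ in range(28 * 28)]        # a fake 28x28 greyscale image, flattened
hidden = relu(fully_connected(image, *layers[0]))     # fc1 + relu1
hidden = relu(fully_connected(hidden, *layers[1]))    # fc2 + relu2
probs = softmax(fully_connected(hidden, *layers[2]))  # fc3 + softmax

print(len(probs), round(sum(probs), 6))  # 10 1.0
print("predicted digit:", probs.index(max(probs)))
```

With untrained weights the "predicted digit" is meaningless. Training adjusts the weights so that the right digit gets the highest probability.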
Besides the network structure, MXnet also needs to know the structure of the input. Since each sample is a 28x28 greyscale image, MXnet takes a vector of 28x28=784 grey values. The example provides a Python function get_iterator() that builds the data iterators feeding the network defined above. The detailed code in the example is very clean, so I won't copy and paste it here.
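The data side can be pictured with a plain-Python sketch: each 28x28 image is flattened row by row into a 784-element vector, and an iterator yields fixed-size batches. This is only an illustration. As far as I know, the real get_iterator() in train_mnist.py builds MXnet's own data iterators instead, and flatten and iterate_batches are hypothetical names.

```python
def flatten(image):
    """Flatten a 28x28 image (a list of rows) into one 784-element vector, row by row."""
    return [pixel for row in image for pixel in row]


def iterate_batches(samples, labels, batch_size):
    """Yield (data, label) batches; a final partial batch is dropped for simplicity."""
    for start in range(0, len(samples) - batch_size + 1, batch_size):
        yield samples[start:start + batch_size], labels[start:start + batch_size]


# Toy data: 250 blank "images" with dummy labels
images = [[[0.0] * 28 for _ in range(28)] for _ in range(250)]
labels = [i % 10 for i in range(250)]
vectors = [flatten(img) for img in images]

batches = list(iterate_batches(vectors, labels, batch_size=100))
print(len(vectors[0]), len(batches))  # 784 2
```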
The final step is training the model. If you know scikit-learn, MXnet's Python interface looks very familiar, right?
train_model.fit(args, net, get_iterator)
Congratulations! We have implemented an MLP! It is the first step of deep learning, and it is not so hard, right?
It is Q&A time now. Some of my dear readers may ask: "Do I need to design my MNIST recognizer, or any other deep learning network, exactly like you did here?" or "Hey, I see another function get_lenet() in the code; what is that?"
Here is the answer to both. Most real-life deep learning problems are about designing the network, and no, you don't have to design your network exactly like mine. Designing a neural network is an art, and each problem needs a different network. In the MNIST example code, get_lenet() implements Yann LeCun's convolutional network LeNet for digit recognition. LeNet is built mainly from Convolution, Activation and Pooling layers rather than only FullyConnected layers with ReLU: convolution needs a kernel size and a number of filters, and pooling needs a kernel size. FYI: a detailed explanation of the super cool convolutional network (ConvNet) can be found in Yann LeCun's tutorial: http://www.cs.nyu.edu/~yann/talks/lecun-ranzato-icml2013.pdf . Another good reference is "Understanding ConvNet" by Colah. I may later write a blog post explaining ConvNets, since they are my personal favorite.
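The kernel size and stride decide how the feature maps shrink from layer to layer. Here is a small sketch of that arithmetic for a LeNet-style stack on 28x28 input. The layer settings are assumptions typical of LeNet (5x5 kernels with 20 and 50 filters, 2x2 max pooling with stride 2), so check get_lenet() for the exact values.

```python
def conv_output_size(size, kernel, stride=1, pad=0):
    """Spatial size after a convolution or pooling window slides over the input."""
    return (size + 2 * pad - kernel) // stride + 1


def lenet_shapes(size=28):
    """Trace (channels, height, width) through an assumed LeNet-style stack."""
    shapes = [("data", (1, size, size))]
    for name, kernel, stride, filters in [
        ("conv1", 5, 1, 20), ("pool1", 2, 2, None),
        ("conv2", 5, 1, 50), ("pool2", 2, 2, None),
    ]:
        # Pooling keeps the channel count; convolution sets it to the number of filters
        channels = filters if filters is not None else shapes[-1][1][0]
        size = conv_output_size(size, kernel, stride)
        shapes.append((name, (channels, size, size)))
    return shapes


for name, shape in lenet_shapes():
    print(name, shape)
# data (1, 28, 28)
# conv1 (20, 24, 24)
# pool1 (20, 12, 12)
# conv2 (50, 8, 8)
# pool2 (50, 4, 4)

channels, height, width = lenet_shapes()[-1][1]
print("flattened for the fully connected layers:", channels * height * width)  # 800
```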
I have some fun homework for my dear readers. Tune some network parameters in get_mlp(), such as the number of nodes and the activation function, and see how they affect precision and accuracy. You can also try changing num_epoch (the number of passes over the training data) and learning_rate (the step size of gradient descent, a.k.a. the rate of learning) for better speed and precision. Please leave a comment with your network structure and accuracy. Kaggle also has an MNIST competition, and you can compete there with MXnet MNIST models; please mention that it is an MXnet model. The portal: https://www.kaggle.com/c/digit-recognizer
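To get a feel for what learning_rate and num_epoch do before tuning them on MNIST, here is a toy sketch of plain gradient descent on a one-dimensional quadratic loss, L(w) = (w - 3)^2. It is not the MNIST model. It only shows the general behavior: a small rate converges slowly, a moderate one converges quickly, and one that is too large diverges.

```python
def gradient_descent(learning_rate, num_epoch, w=0.0, target=3.0):
    """Minimize L(w) = (w - target)^2; its gradient is 2 * (w - target)."""
    for _ in range(num_epoch):
        grad = 2.0 * (w - target)
        w -= learning_rate * grad
    return w


for lr in (0.01, 0.1, 1.1):
    # 0.01 is still far from 3 after 20 steps, 0.1 is close, 1.1 blows up
    print(lr, round(gradient_descent(lr, num_epoch=20), 4))
```

Each step multiplies the distance to the optimum by (1 - 2 * learning_rate). This is why more epochs help a small rate, and why any rate above 1.0 makes this particular loss diverge.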
Outlook
Thanks for reading this first post of "Deep learning for hackers with MXnet". The following posts will include more MXnet examples. These may include RNN/LSTM for generating a Shakespeare script (well, something that looks like Shakespeare) and generative models that imitate Van Gogh to paint a cat. Some of these models are already available on the MXnet GitHub: https://github.com/dmlc/mxnet/tree/master/example. Is MXnet cool? Go star and fork it on GitHub: https://github.com/dmlc/mxnet
Hands-on deep learning with MXnet, part 1: installing GPU MXnet and running MNIST handwritten digit recognition
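I want to write a series of simple, hands-on deep learning tutorials. Using example code on MXnet, they will briefly explain some common directions and practical examples in deep learning. The series focuses on real examples: learning to solve practical problems from samples and code. I assume readers have basic knowledge of neural networks and deep learning, so you won't see long derivations or theoretical exposition here. The fundamentals are very important, though. Readers interested in theory can consult existing deep learning tutorials to strengthen their foundations; there are some good ones at http://deeplearning.net/reading-list/tutorials/, so I won't repeat the theory here.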
MXnet: a lightweight, distributed, portable deep learning platform
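MXnet is an open-source deep learning platform built by a group of smart, brave and hardworking young computer scientists. It is an important part of DMLC, the Distributed Machine Learning Common toolkit http://dmlc.ml/. If you know xgboost (https://github.com/dmlc/xgboost), the parallel GBT implementation, DMLC should not be unfamiliar. MXnet's strengths are that it is lightweight, highly portable, easy to distribute in parallel and efficient with GPU memory, and it can even run flexibly on mobile devices. Its code and usage are also concise and clear, which makes it well suited for learning and practice. Such an interesting deep learning platform deserves a star and a fork on GitHub: https://github.com/dmlc/mxnet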
Installing MXnet
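MXnet supports Linux, Windows and Mac. The main platform used in this post is Ubuntu 14.04 LTS. Note that this series uses CUDA for GPU computing. At the time of writing, CUDA did not yet support the environment and compiler of the latest Ubuntu 15.10 (mainly gcc 5.2), so I strongly suggest staying with 14.04 LTS, or at most 15.04.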
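You can install on a physical machine with an nVidia video card or on a cloud server with a GPU. If you choose a physical machine, do not install inside a virtual machine (for example, a virtual Linux running on native Windows), because most virtual machine software cannot access the host's video card directly. If you choose a cloud server, be sure to pick a GPU instance, such as AWS g2.2xlarge or g2.8xlarge, or a GPU instance on terminal.com. Note: terminal.com claims the instance type can be changed at runtime, but a CPU-only instance cannot be switched seamlessly to a GPU at runtime, so I suggest choosing a GPU instance from the start.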
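The following installation steps are based on the official documentation http://mxnt.ml/en/latest/build.html#building-on-linux. I modified them slightly for the CUDA installation and based on my own experience.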
Installing basic dependencies
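Another strength of MXnet is that it needs very few third-party packages: basically just the gcc compiler, BLAS, and optionally OpenCV. If git is not installed yet, you can install it along the way.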
sudo apt-get update
sudo apt-get install -y build-essential git libblas-dev libopencv-dev
Downloading mxnet
git clone --recursive https://github.com/dmlc/mxnet
Do not forget the --recursive flag. mxnet depends on the DMLC common toolkit http://dmlc.ml/, and --recursive automatically fetches dependencies such as mshadow. Don't compile yet; we still need to install CUDA.
Installing CUDA
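The CUDA installation method described here also applies to deep learning packages other than MXnet. We download the CUDA driver and toolkit from nVidia's official site: go to https://developer.nvidia.com/cuda-downloads and choose the matching installer. Readers in China are advised to use the deb (network) installer, so that Ubuntu picks a nearby domestic mirror, which may be faster.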
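If you use Ubuntu 14.04, you can skip the website and directly run the following commands, which download from the official site (the package is large, so be patient):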
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/cuda-repo-ubuntu1404_7.5-18_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1404_7.5-18_amd64.deb
sudo apt-get update
sudo apt-get install cuda
If everything installed successfully, you can use the nvidia-smi command to check the status of your video card.
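The video card model depends on your budget. Because MXnet uses GPU memory efficiently, a 4GB card is usually enough for most problems that would need much more memory in other toolkits.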
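Optional: MXnet also supports cuDNN, nVidia's deep learning acceleration library. It efficiently implements common deep learning operations such as convolution and improves both memory usage and speed. You can apply for the developer program at https://developer.nvidia.com/cudnn. If approved, download and install cuDNN following nVidia's official instructions.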
Compiling MXnet with GPU support
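MXnet needs a build option turned on to support CUDA. Go to the mxnet/ directory you cloned in the previous step and find the mxnet/make/ subdirectory. Copy its config.mk into the mxnet/ directory, open it in a text editor, and modify the following lines: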
USE_CUDA = 1
USE_CUDA_PATH = /usr/local/cuda
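The second line is the CUDA installation directory. With the default installation it is /usr/local/cuda or a versioned directory such as /usr/local/cuda-7.5. If you installed CUDA in a custom directory, change this line accordingly.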
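If you use another BLAS implementation such as ATLAS or OpenBLAS, extra changes are needed. For Ubuntu's ATLAS or OpenBLAS packages (sudo apt-get install libatlas-base-dev or sudo apt-get install libopenblas-dev), change the BLAS line to:
USE_BLAS = atlas
or, for OpenBLAS:
USE_BLAS = openblas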
After these changes, compile in the mxnet/ directory (-j4 is optional and runs 4 compile jobs in parallel):
make -j4
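Note: if you have no CUDA-capable card (e.g. Intel Iris or the AMD R series) or no video card at all, installing and compiling the GPU version of mxnet will fail. The fix is to change USE_CUDA = 1 back to USE_CUDA = 0 and make sure USE_OPENMP = 1. mxnet will then compile the CPU version and use OpenMP for multi-core CPU computation. Depending on the problem, the GPU version is usually about 20-30 times faster than the CPU version.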
Installing Python support
MXnet can be used from Python. Installing the Python package is as simple as:
cd python; python setup.py install
Python 2.7 is suggested. setuptools and numpy (sudo apt-get install python-numpy) need to be installed first. If numpy is hard to install on your system, consider a Python distribution such as Anaconda or Miniconda:
wget https://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh
bash Miniconda-latest-Linux-x86_64.sh
(answer some installation questions)
conda install numpy
Running MNIST handwritten digit recognition
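Update, November 19, 2015: the example here is based on the old mxnet/example directory structure. The new MNIST code is in mxnet/example/image-classification/, where GPU computing can be turned on with --gpus (gpu_id). Please update and see the new instructions: https://github.com/dmlc/mxnet/tree/master/example/image-classification .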
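Once MXnet is fully installed, try the simplest example: MNIST handwritten digit recognition. The MNIST dataset contains 60,000 training samples and 10,000 test samples, each a 28x28 greyscale image. MXnet's built-in MNIST example is in mxnet/example/mnist; let's run it first: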
cd mxnet/example/mnist
python mlp.py
mlp.py automatically downloads the MNIST dataset, so be patient the first time you run it.
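Note: mlp.py uses the CPU by default, so training runs but is slow. Since we have installed a GPU, we only need to change one line of code to run MXnet on the GPU: switch the CPU context in the FeedForward call to the GPU.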
model = mx.model.FeedForward(
ctx = mx.cpu(), symbol = mlp, num_epoch = 20,
learning_rate = 0.1, momentum = 0.9, wd = 0.00001)
becomes:
model = mx.model.FeedForward(
ctx = mx.gpu(), symbol = mlp, num_epoch = 20,
learning_rate = 0.1, momentum = 0.9, wd = 0.00001)
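Run it again: much faster, right? One of MXnet's strengths is its concise interface. While it runs, nvidia-smi shows the python process using the GPU. This is a relatively small problem and MXnet optimizes GPU memory well, so GPU utilization stays between 30% and 40%, with 67MB of GPU memory in use.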
Troubleshooting
When running the GPU example, you might encounter this error:
ImportError: libcudart.so.7.0: cannot open shared object file: No such file
This is because the CUDA shared libraries have not been added to the library search path. One fix is to add this line to ~/.bashrc:
export LD_LIBRARY_PATH=/usr/local/cuda-7.5/targets/x86_64-linux/lib/:$LD_LIBRARY_PATH
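Or, when compiling MXnet, add these lines to config.mk. -L points the linker at the libraries, and -Wl,-rpath records the directory for run time:
ADD_LDFLAGS = -L/usr/local/cuda-7.5/targets/x86_64-linux/lib/ -Wl,-rpath,/usr/local/cuda-7.5/targets/x86_64-linux/lib/
ADD_CFLAGS = -I/usr/local/cuda-7.5/targets/x86_64-linux/include/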
A brief walkthrough of the MNIST code: designing the simplest multilayer neural network
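mlp.py implements a multilayer perceptron (MLP), also called a multilayer neural network. In MXnet, implementing an MLP starts with defining its structure. For example, the three-layer MLP in the code looks like this: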
import mxnet as mx

data = mx.symbol.Variable('data')
fc1 = mx.symbol.FullyConnected(data = data, name='fc1', num_hidden=128)
act1 = mx.symbol.Activation(data = fc1, name='relu1', act_type="relu")
fc2 = mx.symbol.FullyConnected(data = act1, name = 'fc2', num_hidden = 64)
act2 = mx.symbol.Activation(data = fc2, name='relu2', act_type="relu")
fc3 = mx.symbol.FullyConnected(data = act2, name='fc3', num_hidden=10)
mlp = mx.symbol.Softmax(data = fc3, name = 'mlp')
A quick explanation of these lines: each MNIST sample is a 28x28 greyscale image.
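Each sample can be represented as a one-dimensional array of length 28x28=784, where each element is the grey level of one pixel. Each layer of the MLP needs its node structure defined. For example, fc1 is the first layer, which receives the input. It is defined as a fully connected layer, mx.symbol.FullyConnected, takes its input through data, and contains 128 nodes (num_hidden). Each layer also needs an activation function, Activation. For example, the activation between the first and second layers is relu (rectified linear unit, also called Rectifier). ReLU is the most common activation function in deep neural networks. This is mainly because it is cheap to compute and gradient descent with it does not easily diverge; in addition, the MNIST problem is fairly sparse, which suits ReLU. Since this post focuses on implementing a network, please refer to Wikipedia and other tutorials for background on ReLU. The second layer fc2 is similar to the first: it takes the output of fc1 as input and passes its output to the third layer. The third layer fc3 is similar to the first two, except that it is the output layer. It has one node for each of the 10 digits 0-9, hence num_hidden=10, and the Softmax on top turns these 10 outputs into the probability that the input image is each digit.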
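After designing the network structure, we need to tell MXnet the format of the input features. Each image is 28x28, and flattening the grey value of each pixel into a column vector gives 784 dimensions, so we tell MXnet that the input shape is 784. mnist_iterator, defined in data.py in the same directory, returns training and validation data iterators that feed our MLP 100 samples per batch: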
train, val = mnist_iterator(batch_size=100, input_shape = (784,))
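With batch_size=100, it is easy to work out how many parameter updates training performs. Here is a quick sketch of the arithmetic. It uses the 60,000 training samples mentioned earlier and the num_epoch = 20 from the FeedForward call, and it assumes every batch is full, which holds because 60,000 is divisible by 100.

```python
import math

train_samples = 60000  # MNIST training set size
batch_size = 100       # from mnist_iterator(batch_size=100, ...)
num_epoch = 20         # from FeedForward(..., num_epoch = 20, ...)

batches_per_epoch = math.ceil(train_samples / batch_size)
total_updates = batches_per_epoch * num_epoch
print(batches_per_epoch, total_updates)  # 600 12000
```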
Next, let MXnet build and run this model. It's that simple, and if you know scikit-learn it will feel familiar, right? (Remember the line we changed to run on the GPU?)
model = mx.model.FeedForward(
ctx = mx.gpu(), symbol = mlp, num_epoch = 20,
learning_rate = 0.1, momentum = 0.9, wd = 0.00001)
model.fit(X=train, eval_data=val)
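The learning_rate, momentum and wd arguments configure the SGD optimizer. Here is a rough, dependency-free sketch of one common formulation of SGD with momentum and weight decay (L2 regularization), applied to a single scalar weight. As far as I know, MXnet's SGD optimizer follows the same idea, but its exact details (such as gradient rescaling and clipping) are not reproduced here.

```python
def sgd_momentum_step(weight, grad, velocity, learning_rate=0.1, momentum=0.9, wd=0.00001):
    """One update: weight decay adds wd * weight to the gradient,
    and momentum keeps a decaying running sum of past steps."""
    velocity = momentum * velocity - learning_rate * (grad + wd * weight)
    return weight + velocity, velocity


# Minimize L(w) = (w - 3)^2 using the hyper-parameters from the FeedForward call above
w, v = 0.0, 0.0
for _ in range(300):
    w, v = sgd_momentum_step(w, 2.0 * (w - 3.0), v)
print(round(w, 3))
```

Because of weight decay, w settles very slightly below 3: wd pulls every weight a little toward zero.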
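At this point you can basically implement a multilayer perceptron (MLP). Congratulations: this is the first step toward mastering deep learning. MXnet's approach is much simpler than writing a configuration file, as in Caffe and some other tools. Most practical deep learning problems in industry and academia revolve around designing multilayer networks, and there are many interesting questions in designing the structure, the activation functions, and so on.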
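Some readers may ask whether an MLP has to be designed exactly like this MNIST recognizer. No; this three-layer network is just the simplest MLP example, and the layers don't have to look like this. Like art, designing a better and more efficient multilayer neural network has no end. For example, lenet.py in the same MNIST directory uses the convolutional network designed by Yann LeCun for digit recognition, where each layer performs Convolution, Activation and Pooling. To learn what these three are, see his deep learning tutorial; later posts may cover them too.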
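As homework, try different numbers of nodes and activation functions in mlp.py and see how they affect the recognition rate. You can also increase num_epoch or tune learning_rate and other parameters. Share your design and accuracy when you repost, comment or leave a message (there is no prize, though). Kaggle has a tutorial competition on the MNIST dataset: train your own MNIST model with MXnet and submit your results to compete. Remember to mention that you used MXnet! Portal: https://www.kaggle.com/c/digit-recognizer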
Afterword
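This is the first post of the series. I originally meant to write only about installing MXnet with GPU support. Then I decided to add an example that explains the model, as another kind of introductory deep learning tutorial. Later posts will pick examples that ship with mxnet to introduce some common and interesting deep learning models, such as RNN and LSTM, and their implementation in MXnet. One idea is an automatic lyricist that imitates the lyrics of Wang Feng. Such an interesting deep learning platform deserves a star and a fork on GitHub: https://github.com/dmlc/mxnet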