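NVIDIA driver, CUDA, cuDNN, and TensorFlow version compatibility
Always check that the CUDA, cuDNN, Docker, and nvidia-docker versions you install are compatible with one another. Mismatched versions run into compatibility problems.
NVIDIA driver and CUDA compatibility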
https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
CUDA and cuDNN compatibility
https://developer.nvidia.com/rdp/cudnn-archive
TensorFlow, CUDA, cuDNN, and Python compatibility
https://tensorflow.google.cn/install/source#linux
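Once the required versions have been read off the tables linked above, the comparison itself can be scripted. The following is a minimal sketch, not part of the original procedure: `version_ge` and `check_driver` are hypothetical helper names, the comparison assumes GNU `sort -V` is available, and the minimum version passed in must come from the release notes.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
# Assumes GNU sort, whose -V option sorts version strings numerically per component.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Hypothetical helper: compares an installed driver version against a required minimum.
check_driver() {
    installed="$1"
    required="$2"
    if version_ge "$installed" "$required"; then
        echo "driver $installed satisfies minimum $required"
    else
        echo "driver $installed is older than required $required"
        return 1
    fi
}
```

On a machine with a driver installed, the first argument could be taken from `nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1`. The second argument is whatever minimum the CUDA release notes list for the CUDA version you plan to install.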
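Upgrading the driver (optional)
If an older driver is already installed and needs to be upgraded, first stop every program that uses the NVIDIA driver (usually only Docker), then uninstall the old driver.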
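    # Stop everything that is using the NVIDIA driver (usually only Docker)
    service docker stop

    # List the loaded NVIDIA kernel modules, then unload the dependent
    # modules that lsmod reported before unloading the core module
    lsmod | grep nvidia
    rmmod nvidia_drm nvidia_modeset nvidia_uvm
    rmmod nvidia

    # List the installed NVIDIA packages, then remove them
    # (rpm -e does not expand wildcards, so the names are piped in)
    rpm -qa | grep nvidia
    rpm -qa | grep -e '-nvidia-' | xargs -r rpm -e --nodeps

    # Check the driver state
    nvidia-smi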
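The unload order matters because `rmmod` refuses to remove a module that another loaded module still depends on. As a rough sketch, the order can be derived from `lsmod` output. `nvidia_unload_order` is a hypothetical helper, and it assumes the usual dependency chain in which `nvidia_drm` depends on `nvidia_modeset` and every `nvidia_*` module depends on `nvidia`.

```shell
# Hypothetical helper: reads lsmod output on stdin and prints the loaded
# nvidia* modules in unload order: leaf modules first, then nvidia_modeset,
# then the core nvidia module last.
nvidia_unload_order() {
    awk '$1 ~ /^nvidia/ {
             rank = ($1 == "nvidia") ? 2 : (($1 == "nvidia_modeset") ? 1 : 0)
             print rank, $1
         }' |
        sort -s -n -k1,1 |
        cut -d " " -f 2
}
```

Running `lsmod | nvidia_unload_order` prints one module name per line, and each name can then be passed to `rmmod` in that order.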
Installing Python 3
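    # Create the install prefix and unpack the source tarball
    mkdir -p /usr/local/python3
    tar -zxvf Python-3.6.6.tgz
    cd Python-3.6.6

    # Build and install under the prefix, then link the interpreter into /usr/bin
    ./configure --prefix=/usr/local/python3
    make && make install
    ln -s /usr/local/python3/bin/python3 /usr/bin/python3

    # Add the following line to ~/.bash_profile
    vim ~/.bash_profile
    export PATH=$PATH:$HOME/bin:/usr/local/python3/bin

    # Verify the installation
    python3 -V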
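Editing the profile by hand works, but the same change can be made non-interactively. The sketch below uses a hypothetical helper, `append_once`, which adds a line to a file only when that exact line is missing, so running the setup twice does not duplicate the entry.

```shell
# Hypothetical helper: appends a line to a file only if that exact line
# is not already present (grep -x matches whole lines, -F disables regex).
append_once() {
    line="$1"
    file="$2"
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

For the step above this would be called as `append_once 'export PATH=$PATH:$HOME/bin:/usr/local/python3/bin' ~/.bash_profile`. The single quotes keep `$PATH` unexpanded so that it is evaluated at login time.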
Installing pip3
    tar -zxvf pip-9.0.1.tgz
    cd pip-9.0.1
    python3 setup.py install
    # pip is installed under the Python prefix chosen above
    ln -s /usr/local/python3/bin/pip3 /usr/bin/pip3
    pip3 -V
Installing CUDA
    # Install the local CUDA 9.0 repository package, then CUDA itself
    wget http://mirrors.*******/cuda-repo-rhel7-9-0-local-9.0.176-1.x86_64.rpm
    rpm -ivh cuda-repo-*.rpm
    yum clean expire-cache
    yum install -y cuda

    # Add the following export lines to ~/.bashrc, then reload it
    vim ~/.bashrc
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
    export CUDA_HOME=/usr/local/cuda
    export PATH=/usr/local/cuda/bin:$PATH
    source ~/.bashrc

    # Verify the installation with either command
    cat /usr/local/cuda/version.txt
    nvcc --version
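The verification step can also be turned into a check that fails loudly when the wrong release is installed. This is a sketch under the assumption that `version.txt` contains a line of the form `CUDA Version 9.0.176`, as it does in CUDA 9.0. Both function names are hypothetical.

```shell
# Hypothetical helper: extracts the dotted version number from text such as
# "CUDA Version 9.0.176" read on stdin.
cuda_version_from_text() {
    sed -n 's/^CUDA Version \([0-9][0-9.]*\).*$/\1/p' | head -n 1
}

# Hypothetical helper: succeeds when the version file reports the expected
# release, given either in full (9.0.176) or as a prefix (9.0).
cuda_matches() {
    version_file="$1"
    expected="$2"
    actual=$(cuda_version_from_text < "$version_file")
    case "$actual" in
        "$expected" | "$expected".*) return 0 ;;
        *) echo "expected CUDA $expected, found '$actual'" >&2
           return 1 ;;
    esac
}
```

For the install above, `cuda_matches /usr/local/cuda/version.txt 9.0` should succeed. The expected value should be whichever CUDA release the TensorFlow table requires.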
Installing cuDNN
Simply copy the relevant cuDNN files into the corresponding CUDA directories.
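    # Unpack the cuDNN archive. The "8.0" in this file name is the CUDA
    # version the archive targets; use the archive that matches your CUDA install.
    tar -xvf cudnn-8.0-linux-x64-v6.0.tgz

    # Copy the header and libraries into the CUDA directories
    # (/xxx is the directory where the archive was unpacked)
    cd /usr/local/cuda
    cp /xxx/cuda/include/cudnn.h include/
    cp /xxx/cuda/lib64/libcudnn* lib64/

    # Make the copied files readable by all users
    chmod a+r include/cudnn.h
    chmod a+r lib64/libcudnn*

    # Make sure ~/.bashrc contains the following export lines, then reload it
    vim ~/.bashrc
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
    export CUDA_HOME=/usr/local/cuda
    export PATH=/usr/local/cuda/bin:$PATH
    source ~/.bashrc

    # Verify the cuDNN version and that the library is visible to the linker
    cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
    ldconfig -v | grep cudnn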
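The `grep CUDNN_MAJOR -A 2` check prints three `#define` lines that still have to be read by eye. As a small sketch, the same macros can be assembled into a single version string. `cudnn_version` is a hypothetical helper, and it assumes the header defines `CUDNN_MAJOR`, `CUDNN_MINOR`, and `CUDNN_PATCHLEVEL` as `cudnn.h` does in this cuDNN generation (cuDNN 8 and later keep them in `cudnn_version.h` instead).

```shell
# Hypothetical helper: prints MAJOR.MINOR.PATCHLEVEL from the #define lines
# of the given cuDNN header, or nothing if the macros are not found.
cudnn_version() {
    header="$1"
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major != "") print major "." minor "." patch }
    ' "$header"
}
```

Calling `cudnn_version /usr/local/cuda/include/cudnn.h` after the copy step gives a value that can be compared directly with the cuDNN column of the TensorFlow table.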
Upgrading Docker to 18.09
    # Remove the old Docker package and the old binaries
    yum remove docker
    cd /usr/local/bin
    rm -rf docker*

    # Install the new packages
    rpm -ivh containerd.io-1.2.2-3.el7.x86_64.rpm
    rpm -ivh container-selinux-2.74-1.el7.noarch.rpm
    rpm -ivh docker-ce-cli-18.09.5-3.el7.x86_64.rpm
    rpm -ivh docker-ce-18.09.5-3.el7.x86_64.rpm

    # Copy the new binaries to /usr/local/bin
    cd /usr/bin
    cp docker* /usr/local/bin

    # Verify the installation
    systemctl status docker.service
    docker --version
Installing nvidia-docker
    # Install the packages in dependency order
    rpm -ivh libnvidia-container1-1.0.2-1.x86_64.rpm
    rpm -ivh libnvidia-container-tools-1.0.2-1.x86_64.rpm
    rpm -ivh nvidia-container-runtime-hook-1.4.0-2.x86_64.rpm
    rpm -ivh nvidia-container-runtime-2.0.0-3.docker18.09.5.x86_64.rpm
    rpm -ivh nvidia-docker2-2.0.3-3.docker18.09.5.ce.noarch.rpm

    # Verify the installation
    nvidia-docker --version

    # Make nvidia the default runtime by putting the following
    # JSON into /etc/docker/daemon.json
    vim /etc/docker/daemon.json
    {
        "default-runtime": "nvidia",
        "runtimes": {
            "nvidia": {
                "path": "/usr/bin/nvidia-container-runtime",
                "runtimeArgs": []
            }
        }
    }

    # Reload the configuration and restart Docker
    systemctl daemon-reload
    systemctl enable docker
    systemctl restart docker
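For unattended setups, the `daemon.json` edit can be scripted instead of made in an editor. The following is a minimal sketch. `write_nvidia_daemon_json` is a hypothetical helper that writes the same configuration shown above and keeps a `.bak` copy of any existing file. It overwrites the target, so settings already present in an existing `daemon.json` would need to be merged back by hand.

```shell
# Hypothetical helper: writes the nvidia default-runtime configuration to the
# given path, saving any existing file as <path>.bak first.
write_nvidia_daemon_json() {
    target="$1"
    if [ -f "$target" ]; then
        cp -p "$target" "$target.bak"
    fi
    cat > "$target" <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
}
```

It would be called as `write_nvidia_daemon_json /etc/docker/daemon.json` before the `systemctl` commands. After the restart, the runtime lines in the output of `docker info` should show nvidia as the default runtime.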