Graphics Drivers
nVidia is both the best and the worst graphics hardware for Linux. It is the worst because it is fraught with proprietary nonsense, and it is the best, well, because it works pretty well.
If you need a system where you can audit all the source code, nVidia hardware may not be an option. But if you just need some simple Linux workstations for 3d graphics, it might be the simplest option.
I find that using nVidia’s automagical installer/driver just works. Usually.
Also, for CentOS-specific packaging of Nvidia drivers, see my CentOS notes.
Drivers
At the current time (late 2012) the Linux drivers live here. Note that "Linux x86/IA32" is for 32 bit systems. (Check yours with something like file /sbin/init). These days, you probably want "Linux x86_64/AMD64/EM64T".
What version are you currently using? Check with this.
cat /proc/driver/nvidia/version
Installing and Updates
It turns out that GPU drivers are deeply in touch with the kernel. The driver itself is a kernel module. This module must match the kernel and must be built to fit. The Nvidia installer automagically takes care of all this (assuming you have a build environment with a compiler, etc).
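A quick way to check whether the module on disk matches the kernel you are actually running is to compare the module's vermagic with uname; if they disagree, the driver needs to be rebuilt.
uname -r                          # the running kernel
modinfo nvidia | grep vermagic    # the kernel the installed module was built against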
The problem is that whenever you update your machine and there is a kernel update (which is about every two weeks in my experience), the graphics will stop working. You must reboot into the new kernel (you can't fix it right after doing the update while running the previous kernel). Then you'll be in some no-man's-land text console with no prompt (CentOS6). Use "Alt-F2" to go to a console with a getty login prompt. Log in and reinstall the Nvidia driver. This is also the process after you first install CentOS.
I find that I do this so often that I have a tiny script to make it automatic so I don’t have to answer questions and generally hold its hand. My little script looks like:
#!/bin/bash
sh /pro/nvidia/current -a -q -X --ui=none -f -n
For Debian-style distributions this works.
#!/bin/bash
echo "Shutting down X server..."
sudo service lightdm stop
echo "Running NVIDIA kernel module installer..."
sudo sh ~/src/NVIDIA-Linux-x86_64-304.117.run -a -q -X --ui=none -f -n
And that lives in a directory with an assortment of drivers where current is a link to the one I need most often:
:->[host][~]$ ls /usr/local/nvidia/
NVIDIA-Linux-x86-304.64.run NVIDIA-Linux-x86_64-304.64.run
NVIDIA-Linux-x86_64-173.14.22-pkg2.run current
NVIDIA-Linux-x86_64-190.53-pkg2.run nvfix
NVIDIA-Linux-x86_64-195.36.15-pkg2.run
Update Process
When I update I usually do it remotely. I log in and do sudo yum -y update. Then if a new kernel has been installed, I do sudo reboot. Then wait a couple of minutes (sleep 111). And then log in again. This time everything seems fine and is updated, but the users sitting at the workstation will find a confusing text screen with no prompt. This is because graphics are actually dead. This is when you need to run the nvfix script shown above, that's sudo /usr/local/nvidia/nvfix of course since it must be run as root. Then you must sudo reboot again. At that point everything should be cool.
It's a good idea to wait and log back in when it comes up. I've had machines mysteriously not wake up after the reboot.
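Since the dance is always the same, it can mostly be scripted. Here is a rough sketch of the idea; the host name is hypothetical, the nvfix path is the one shown above, and I would not trust it unattended.
#!/bin/bash
WS=ws9.example.com                       # hypothetical workstation name
ssh -t $WS 'sudo yum -y update && sudo reboot'
sleep 111                                # give it a couple of minutes to come back
ssh -t $WS 'sudo /usr/local/nvidia/nvfix && sudo reboot'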
ElRepo
It might be smarter these days to try to use prepackaged proprietary drivers from the ElRepo repository.
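I have not standardized on this myself, but the usual ELRepo recipe on CentOS 7 looks something like the following sketch. The package names and release RPM URL change over time, so check elrepo.org for the current instructions.
sudo rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
sudo yum install https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm
sudo yum install kmod-nvidia    # prebuilt proprietary kernel module package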
One problem I had after upgrading from 7.x to 7.4 is that although the modules seem inserted and everything seems fine, no graphics happen. This talks about it and has some good general troubleshooting tips. It seems that lightdm wasn't starting or staying started. But doing systemctl start lightdm seems to have started it, and systemctl enable lightdm seems to have cured it.
Nouveau Issues
In CentOS 6 and later the default thing to do on installation is to use the new open source Nouveau drivers. That’s nice and I’m glad that someone’s working on a wholesome alternative. But the problem is that these drivers under-perform, by a factor of 2 in my tests. Test it yourself before committing.
Now the really gruesome bit is that you can't easily install the proprietary drivers while the Nouveau ones are in. Maybe nVidia will fix their installer to be less stupid but for now it's quite a chore to extricate the Nouveau driver. The best plan is often to reinstall CentOS and make sure you select the reduced graphics mode. I forget what it's called, but it doesn't just affect the installation graphics, it affects which drivers are installed. With the low quality (or whatever it's called) mode, the normal non-accelerated X drivers are installed and those can be replaced by the nVidia installer.
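If a full reinstall is impractical, the commonly cited alternative is to blacklist the Nouveau module and rebuild the initramfs so the nVidia installer can take over. I have not made a habit of this, so treat it as a sketch for CentOS 7 style systems (CentOS 6 boots differently).
cat <<'EOF' | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
EOF
sudo dracut --force                            # rebuild the initramfs without nouveau
sudo systemctl set-default multi-user.target   # come back up in text mode for the installer
sudo reboot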
Legacy
Sometimes you’ll have an older machine:
:->[ws9-ablab.ucsd.edu][~]$ lspci | grep -i [n]vi
01:00.0 VGA compatible controller: NVIDIA Corporation NV43
[GeForce 6600 GT] (rev a2)
And running the normal installer fails with some kind of message about legacy drivers. On the machine above I had to run NVIDIA-Linux-x86_64-304.64.run and then it worked. This version was found on the driver page above and called Latest Legacy GPU version (304.xx series). There are other legacy series like 71.86.xx, 96.43.xx, and 173.14.xx. Use what the installers suggest.
Manual Tweaking With xrandr
I had two vertical 1080x1920 monitors and the "Display" program in Mate was just garbling them. Here’s what I did to sort that out.
xrandr --fb 2160x1920 \
--output HDMI-1 \
--auto \
--pos 0x0 \
--output DVI-I-1 \
--auto \
--pos 1080x0
Or more recently with a different card…
xrandr --fb 2160x1920 \
--output HDMI-0 --auto --rotate left --pos 0x0 \
--output DVI-D-0 --auto --rotate right --pos 1080x0
Here’s another example of my 3 vertical HP monitor setup which each have the slightly unusual resolution of 1920x1200.
xrandr --fb 3600x1920 \
--output VGA-0 --auto --pos 0x0 \
--output DVI-D-0 --auto --pos 1200x0 \
--output HDMI-0 --auto --pos 2400x0
Also note these options, which I did not need, in case your setup requires them.
--rotate left
--output A --left-of B
In CentOS 7's Mate I'm finding that the System->Preference->Hardware->Displays tool just can't put my vertical monitors together properly. What works is to close that and use an xrandr command as shown above. Then go back to the Displays GUI tool when everything is correct. Then it will come up detected correctly and this is when you want to click "Apply" and then "Apply system-wide". I don't know what that writes, but once it's written, things work as they should. Well, not the display manager of course, but who cares about that?
Dummy
From the xpra Xdummy documentation. "Proprietary drivers often install their own copy of libGL which conflicts with the use of software GL rendering. You cannot use this GL library to render directly on Xdummy (or Xvfb)."
This is why you might have trouble using non-interactive rendering tools.
Here is one way Andrey got this problem solved. First he grabbed a libGL.so.1 from a Mesa system (no nvidia drivers). That can be stored locally with no privileges.
Then run the application with something like this.
LD_PRELOAD=/home/${USER}/tmp/libGL.so.1 /usr/bin/Xvfb :96 -cc 4 -screen 0 1024x768x16
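With that Xvfb running, applications can then be pointed at the display. For example, assuming the glxinfo tool (from mesa-utils) is installed, this is a quick check that GL calls are going to the software renderer.
DISPLAY=:96 glxinfo | grep -iE 'vendor|renderer'   # expect a Mesa/llvmpipe style renderer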
AMD
Just some quick notes on AMD/ATI drivers. AMD tries to match nVidia, but they’re a bit behind. However, here are some programs that might come in handy.
amdcccle
fglrxinfo
fgl_glxgears
CUDA And GPU Programming
Setup
You might need one or more of these.
apt install nvidia-driver
apt install nvidia-dev
apt install nvidia-support
apt install nvidia-cuda-toolkit
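After installing, a quick way to see which Nvidia related packages actually landed is something like this.
dpkg -l | grep -i nvidia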
Invalid (Rotated) Repo Signing Keys
I recently (turns out, after 2022-04-27) did a normal apt update on an unremarkable Ubuntu machine and got this disturbing unexpected error.
http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 InRelease' is no longer signed.
It looks like Nvidia is revoking repo keys on some kind of annoyingly frequent schedule now. I’m not exactly clear on the practical real security benefits. But whatever.
If you see this, this fixed it for me.
distro=ubuntu2004
arch=x86_64
wget https://developer.download.nvidia.com/compute/cuda/repos/$distro/$arch/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo mv -v /etc/apt/sources.list.d/cuda.list /tmp/
sudo apt update # Works now.
sudo apt upgrade
The solution is described here.
nvidia-smi - Checking GPU Action
The "smi" stands for System Management Interface. This command is important for (1) seeing what kind of GPU your system thinks it has access to and (2) how hard that GPU is working right now.
cat /proc/driver/nvidia/version
sudo apt install nvidia-smi
nvidia-smi
nvidia-smi -l 1 # One second refresh.
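For scripting or logging, nvidia-smi also has a query mode that prints selected fields as CSV; nvidia-smi --help-query-gpu lists all the available fields.
nvidia-smi --query-gpu=name,utilization.gpu,memory.used,memory.total,temperature.gpu --format=csv -l 5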
What processes are using your nvidia device? This can be interesting.
$ sudo fuser -v /dev/nvidia*
USER PID ACCESS COMMAND
/dev/nvidia0: root 752 F...m Xorg
/dev/nvidiactl: root 752 F...m Xorg
/dev/nvidia-modeset: root 752 F.... Xorg
Also for real time monitoring try this.
nvidia-smi pmon
Oh and check out nvtop! That's a very nice visualization tool.
sudo apt install nvtop
Of course if you’re using an Nvidia computer with an Nvidia distro, this won’t be available! Gah! Maybe install from source? Yes, that seems to work. Here’s the process that worked for me.
apt install cmake libncurses-dev
cd ~/src
git clone https://github.com/Syllo/nvtop.git
mkdir -p nvtop/build && cd nvtop/build
cmake .. -DNVIDIA_SUPPORT=ON
make
sudo make install
Or maybe no GPU shows up with lspci. Good to check that first.
Or maybe you need to ensure this library — or one like it — is present. Also nvtop needs this.
LD_PRELOAD=/usr/lib/nvidia-367/libnvidia-ml.so nvidia-smi
This is the Nvidia Management Library. This page offers downloads that may be helpful. Or they may be malware. Hard to say.
CUDA Specs From Software
Writing a program that needs some CUDA? How can you check if what you have is sufficient? After stumbling into some kind of bug with the photogrammetry project Meshroom, I wanted to know how to check my CUDA Compute Capability, whatever the hell that is. I dug into the AliceVision source code and pulled out the offending checks that said I did not have a CUDA-capable card. Specifically from here. I distilled it into the following short program which does all the checks Meshroom seems to know about. These checks seem generally useful so here is the program.
// Compile with `g++ -o ckgpu ckgpu.cpp -lcudart`
#include <string>
#include <iostream>
#include <sstream>
#include <cuda_runtime.h>

// ================== gpuSupportCUDA ==================
bool gpuSupportCUDA(int minComputeCapabilityMajor,
                    int minComputeCapabilityMinor,
                    int minTotalDeviceMemory=0) {
    int nbDevices = 0;
    cudaError_t success;
    success = cudaGetDeviceCount(&nbDevices);
    if (success != cudaSuccess) {
        std::cout << "cudaGetDeviceCount failed: " << cudaGetErrorString(success);
        nbDevices = 0;
    }
    if(nbDevices > 0) {
        for(int i = 0; i < nbDevices; ++i) {
            cudaDeviceProp deviceProperties;
            if(cudaGetDeviceProperties(&deviceProperties, i) != cudaSuccess) {
                std::cout << "Cannot get properties for CUDA gpu device " << i;
                continue;
            }
            if((deviceProperties.major > minComputeCapabilityMajor ||
                (deviceProperties.major == minComputeCapabilityMajor &&
                 deviceProperties.minor >= minComputeCapabilityMinor)) &&
               deviceProperties.totalGlobalMem >= (minTotalDeviceMemory*1024*1024)) {
                std::cout << "Supported CUDA-Enabled GPU detected." << std::endl;
                return true;
            } else {
                std::cout << "CUDA-Enabled GPU detected, but the compute capabilities is not enough.\n"
                          << " - Device " << i << ": " << deviceProperties.major << "." << deviceProperties.minor
                          << ", global memory: " << int(deviceProperties.totalGlobalMem / (1024*1024)) << "MB\n"
                          << " - Requirements: " << minComputeCapabilityMajor << "." << minComputeCapabilityMinor
                          << ", global memory: " << minTotalDeviceMemory << "MB\n";
            }
        } // End for i<nbDevices
        std::cout << ("CUDA-Enabled GPU not supported.");
    } // End if nbDevices
    else {
        std::cout << ("Can't find CUDA-Enabled GPU.");
    }
    return false;
} // End gpuSupportCUDA()

// ================== gpuInformationCUDA ==================
std::string gpuInformationCUDA() {
    std::string information;
    int nbDevices = 0;
    if( cudaGetDeviceCount(&nbDevices) != cudaSuccess ) {
        std::cout << ( "Could not determine number of CUDA cards in this system" );
        nbDevices = 0;
    }
    if(nbDevices > 0) {
        information = "CUDA-Enabled GPU.\n";
        for(int i = 0; i < nbDevices; ++i) {
            cudaDeviceProp deviceProperties;
            if(cudaGetDeviceProperties( &deviceProperties, i) != cudaSuccess ) {
                std::cout << "Cannot get properties for CUDA gpu device " << i;
                continue;
            }
            if( cudaSetDevice( i ) != cudaSuccess ) {
                std::cout << "Device with number " << i << " does not exist" ;
                continue;
            }
            std::size_t avail;
            std::size_t total;
            if(cudaMemGetInfo(&avail, &total) != cudaSuccess) {
                // if the card does not provide this information.
                avail = 0;
                total = 0;
                std::cout << "Cannot get available memory information for CUDA gpu device " << i << ".";
            }
            std::stringstream deviceSS;
            deviceSS << "Device information:" << std::endl
                     << "\t- id: " << i << std::endl
                     << "\t- name: " << deviceProperties.name << std::endl
                     << "\t- compute capability: " << deviceProperties.major << "." << deviceProperties.minor << std::endl
                     << "\t- total device memory: " << deviceProperties.totalGlobalMem / (1024 * 1024) << " MB " << std::endl
                     << "\t- device memory available: " << avail / (1024 * 1024) << " MB " << std::endl
                     << "\t- per-block shared memory: " << deviceProperties.sharedMemPerBlock << std::endl
                     << "\t- warp size: " << deviceProperties.warpSize << std::endl
                     << "\t- max threads per block: " << deviceProperties.maxThreadsPerBlock << std::endl
                     << "\t- max threads per SM(X): " << deviceProperties.maxThreadsPerMultiProcessor << std::endl
                     << "\t- max block sizes: " << "{" << deviceProperties.maxThreadsDim[0] << "," << deviceProperties.maxThreadsDim[1] << "," << deviceProperties.maxThreadsDim[2] << "}" << std::endl
                     << "\t- max grid sizes: " << "{" << deviceProperties.maxGridSize[0] << "," << deviceProperties.maxGridSize[1] << "," << deviceProperties.maxGridSize[2] << "}" << std::endl
                     << "\t- max 2D array texture: " << "{" << deviceProperties.maxTexture2D[0] << "," << deviceProperties.maxTexture2D[1] << "}" << std::endl
                     << "\t- max 3D array texture: " << "{" << deviceProperties.maxTexture3D[0] << "," << deviceProperties.maxTexture3D[1] << "," << deviceProperties.maxTexture3D[2] << "}" << std::endl
                     << "\t- max 2D linear texture: " << "{" << deviceProperties.maxTexture2DLinear[0] << "," << deviceProperties.maxTexture2DLinear[1] << "," << deviceProperties.maxTexture2DLinear[2] << "}" << std::endl
                     << "\t- max 2D layered texture: " << "{" << deviceProperties.maxTexture2DLayered[0] << "," << deviceProperties.maxTexture2DLayered[1] << "," << deviceProperties.maxTexture2DLayered[2] << "}" << std::endl
                     << "\t- number of SM(x)s: " << deviceProperties.multiProcessorCount << std::endl
                     << "\t- registers per SM(x): " << deviceProperties.regsPerMultiprocessor << std::endl
                     << "\t- registers per block: " << deviceProperties.regsPerBlock << std::endl
                     << "\t- concurrent kernels: " << (deviceProperties.concurrentKernels ? "yes":"no") << std::endl
                     << "\t- mapping host memory: " << (deviceProperties.canMapHostMemory ? "yes":"no") << std::endl
                     << "\t- unified addressing: " << (deviceProperties.unifiedAddressing ? "yes":"no") << std::endl
                     << "\t- texture alignment: " << deviceProperties.textureAlignment << " byte" << std::endl
                     << "\t- pitch alignment: " << deviceProperties.texturePitchAlignment << " byte" << std::endl;
            information += deviceSS.str();
        } // End for i<nbDevices
    } // End nbDevices>0
    else {
        information = "No CUDA-Enabled GPU.";
    }
    return information;
} // End gpuInformationCUDA()

int main(int argc, char **argv){
    gpuSupportCUDA(2,0);
    std::cout << gpuInformationCUDA();
    return 0;
}
As you can see, contrary to what Meshroom believes for some erroneous reason, I do have a GPU that can pass the very same checks that software uses.
$ g++ -o ckgpu ckgpu.cpp -lcudart
$ ./ckgpu
Supported CUDA-Enabled GPU detected.
CUDA-Enabled GPU.
Device information:
- id: 0
- name: GeForce GTX 1050 Ti
- compute capability: 6.1
- total device memory: 4039 MB
- device memory available: 3797 MB
- per-block shared memory: 49152
- warp size: 32
- max threads per block: 1024
- max threads per SM(X): 2048
- max block sizes: {1024,1024,64}
- max grid sizes: {2147483647,65535,65535}
- max 2D array texture: {131072,65536}
- max 3D array texture: {16384,16384,16384}
- max 2D linear texture: {131072,65000,2097120}
- max 2D layered texture: {32768,32768,2048}
- number of SM(x)s: 6
- registers per SM(x): 65536
- registers per block: 65536
- concurrent kernels: yes
- mapping host memory: yes
- unified addressing: yes
- texture alignment: 512 byte
- pitch alignment: 32 byte
vulkaninfo
This seems to be like the OpenGL testers but for Vulkan; it just shows what kind of Vulkan features seem to be supported.
$ vulkaninfo | grep -A7 VkPhysicalDeviceProperties
VkPhysicalDeviceProperties:
--------------------------
apiVersion = 4206797 (1.3.205)
driverVersion = 142622784 (0x8804040)
vendorID = 0x10de
deviceID = 0xa5ba03d7
deviceType = PHYSICAL_DEVICE_TYPE_INTEGRATED_GPU
deviceName = NVIDIA Tegra Xavier (nvgpu)
Note this can take up to 8 seconds to run and is run at every shell launch on Jetson Xavier.
lshw
What interesting thing does lshw say about your system?
$ sudo lshw -C system
mic-730ai
description: Computer
product: Jetson-AGX
vendor: Unknown
version: Not Specified
serial: 1421121018345
width: 64 bits
capabilities: smbios-3.0.0 dmi-3.0.0 smp cp15_barrier setend swp tagged_addr_disabled
configuration: boot=normal family=Unknown sku=Unknown
/proc/driver/nvidia
I think this only works if there is an nvidia in your lsmod.
$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 460.91.03 Fri Jul 2 06:04:10 UTC 2021
GCC version: gcc version 10.2.1 20210110 (Debian 10.2.1-6)
Here are some other things to maybe look for.
dmesg | grep -iE 'nvidia|nvrm|agp|vga'
ls -l /dev/dri/* /dev/nvidia*
jetson-release
This produces a lot of useful looking information, but I wonder if it’s just using envp and jtop.
$ jetson_release -v
'DISPLAY' environment variable not set... skipping surface info
- NVIDIA Jetson UNKNOWN
* Jetpack UNKNOWN [L4T 34.1.1]
* NV Power Mode: MODE_15W_DESKTOP - Type: 7
* jetson_stats.service: active
- Board info:
* Type: UNKNOWN
* SOC Family: tegra194 - ID:
* Module: UNKNOWN - Board: P2822-0000
* Code Name: galen
* CUDA GPU architecture (ARCH_BIN): NONE
* Serial Number: 1421121018345
- Libraries:
* CUDA: NOT_INSTALLED
* cuDNN: NOT_INSTALLED
* TensorRT: NOT_INSTALLED
* Visionworks: NOT_INSTALLED
* OpenCV: 4.5.4 compiled CUDA: NO
* VPI: NOT_INSTALLED
* Vulkan: 1.3.203
- jetson-stats:
* Version 3.1.4
* Works on Python 3.8.10
jetsonUtilities
This can be a helpful diagnostic.
git clone https://github.com/jetsonhacks/jetsonUtilities
cd ./jetsonUtilities
./jetsonInfo.py
Here’s what you don’t want to see.
NVIDIA Jetson UNKNOWN
L4T 34.1.1 [ JetPack UNKNOWN ]
Ubuntu 20.04.4 LTS
Kernel Version: 5.10.65-tegra
'DISPLAY' environment variable not set... skipping surface info
CUDA NOT_INSTALLED
CUDA Architecture: NONE
OpenCV version: 4.5.4
OpenCV Cuda: NO
CUDNN: NOT_INSTALLED
TensorRT: NOT_INSTALLED
Vision Works: NOT_INSTALLED
VPI: NOT_INSTALLED
Vulcan: 1.3.203
GPU Compiler Version
Is nvcc installed and working reasonably? Note this is probably not in a default path.
$ /usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_11_23:44:05_PST_2021
Cuda compilation tools, release 11.4, V11.4.166
Build cuda_11.4.r11.4/compiler.30645359_0
deviceQuery
If that looks a little too tricky, here's another approach to finding out exactly what you have. This requires that you have a working nvcc but other than that, this procedure was quite painless, produced a lot of good information, compiles on exotic architectures (such as Nvidia's own aarch64 Jetson products) and was in the helpful form of an illustrative code example. I approve!
git clone https://github.com/NVIDIA/cuda-samples
cd cuda-samples/1_Utilities/deviceQuery
make
./deviceQuery
Here’s the kind of output you should be hoping to see.
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1050 Ti"
CUDA Driver Version / Runtime Version 11.2 / 11.2
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 4040 MBytes (4235919360 bytes)
(006) Multiprocessors, (128) CUDA Cores/MP: 768 CUDA Cores
GPU Max Clock rate: 1392 MHz (1.39 GHz)
Memory Clock rate: 3504 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 1048576 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total shared memory per multiprocessor: 98304 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Managed Memory: Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.2, CUDA Runtime Version = 11.2, NumDevs = 1
Result = PASS
Kernel Module Checks
Obviously start with lsmod and maybe lsmod | grep nv to have a look at what kernel modules are active. You might be able to learn more with something like this.
$ modinfo /usr/lib/modules/5.10.65-tegra/kernel/drivers/gpu/nvgpu/nvgpu.ko | grep version
vermagic: 5.10.65-tegra SMP preempt mod_unload modversions aarch64
jetson-stats
This one seems to require a service, but it installed ok.
sudo pip3 install jetson-stats
sudo systemctl restart jetson_stats.service
jtop
It actually required a reboot but after that, yes, it is very nice! Kind of an htop sort of thing for the GPU cores.
Compiling
Using CUDA is pretty well behaved because it is easiest (maybe even required) to use nvcc, which seems to be a gcc wrapper that just includes all the right stuff properly.
nvcc -O3 -arch sm_30 -lineinfo -DDEBUG -c kernel.cu
nvcc -O3 -arch sm_30 -lineinfo -DDEBUG -o x.naive_transpose kernel.o
Concurrency
CUDA can concurrently do any of the following.
- Compute
- move data from host to device
- move data from device to host
- 4-way concurrency would also have CPU involved
- each thread can do basic 3-way so many more parallel concurrencies
This is serial (input, compute, output).
iiiiiiccccccoooooo
This is with concurrencies.
iiicccooo
iiicccooo
Nvidia offers a fancy visual profiler that does the visualizations quite nicely to optimize concurrency.
Organizational Structures
- SM - Streaming Multiprocessors
  - Scalar processors or "cores" (32 or so out of maybe 512)
  - Shared memory
  - L1 data cache
  - Shared registers
  - Special Function Units
  - Clocks
- blocks
  - warp
    - executed in parallel (SIMD)
    - contains 32 threads
- threads
  - registers are 32bit
  - global memory is not really system wide global.
  - coalesce loads and stores
- shared memory
  - the /cfs of GPUs
  - 32 banks of 4 bytes
  - Needs syncthreads()
- banks
  - like a doorway to provide access to threads
  - two threads accessing one bank get serialized
  - best to get each thread accessing their own unique bank
- streams
  - a queue of work
  - ordered list of commands
  - FIFO
  - multiple streams have no ordering between them
  - if not specified, goes to default stream, 0.
  - multistream programming needs >0 stream for async
- kernel - is the callback like function that runs in the CUDA cores.
  - __global__ void mykernelfn(const int a, const int b){...}
  - kernel<<<blocks,threads,[smem[,stream]]>>>();
Examples
Using GPU 0: Tesla K80
Matrix size is 16000
Total memory required per matrix is 2048.000000 MB
Total time CPU is 1.255781 sec
Performance is 3.261715 GB/s
Total time GPU is 0.067238 sec
Performance is 60.918356 GB/s
Using GPU 0: Tesla K80
Matrix size is 16000
Total memory required per matrix is 2048.000000 MB
Total time CPU is 1.256058 sec
Performance is 3.260996 GB/s
Total time GPU is 0.035628 sec
Performance is 114.964519 GB/s
Machine Learning And Jetson
Nvidia is into machine learning in a big way. They have specialized products and dev kits.
Resources
- https://github.com/dusty-nv/jetson-inference/
- Hello AI World
- Two Days to a Demo
- Jetson Nano
  - Model P3448-0000
  - Gigabit ethernet
  - (x4) USB3.0 ports
  - 4K HDMI and DisplayPort connector (groan)
  - MIPI CSI (Mobile Industry Processor Interface Camera Serial Interface) - listed as working with Raspberry Pi Camera Module V2
  - Dedicated UART header
  - 40 pin header (GPIO, I2C, UART)
  - J48 jumper - connected means micro-usb2.0 jack operates in device mode, otherwise power supply
  - J40 jumpers - power, reset, etc
  - J15 PWM fan header
  - J18 M.2 Key E connector
Tools
sudo nvpmodel -q # Check active power mode.
tegrastats # Sort of a top for jetson. Includes power too.
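To actually change the power mode, something like this should work; the mode numbers and what they mean vary by Jetson module, so check what the query reports first.
sudo nvpmodel -m 0    # switch to mode 0 (often the maximum performance profile)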
GPIO
echo 38 > /sys/class/gpio/export # Map GPIO pin
echo out > /sys/class/gpio/gpio38/direction # Set direction
echo 1 > /sys/class/gpio/gpio38/value # Bit banging
echo 38 > /sys/class/gpio/unexport # Unmap GPIO
cat /sys/kernel/debug/gpio # Diagnostic
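Stringing those steps together, here is a tiny sketch that blinks the (hypothetical) pin 38. The echoes need root, so run the whole thing as root or wrap each one in sudo sh -c.
echo 38 > /sys/class/gpio/export              # map the pin
echo out > /sys/class/gpio/gpio38/direction   # make it an output
for n in 1 2 3 4 5; do                        # blink five times
    echo 1 > /sys/class/gpio/gpio38/value; sleep .5
    echo 0 > /sys/class/gpio/gpio38/value; sleep .5
done
echo 38 > /sys/class/gpio/unexport            # unmap when done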
Video
Argus (libargus) = Nvidia’s library
12 CSI lanes.
nvarguscamerasrc
nvgstcapture # Camera view application
v4l2 puts video streams on /dev/video
- nvhost-msenc
- nvhost-nvdec
- gstinspect
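For a quick look at a CSI camera through nvarguscamerasrc, the commonly cited GStreamer test pipeline looks something like this. It is an untested sketch here, and the caps and the sink element (nvoverlaysink) vary by JetPack version.
gst-launch-1.0 nvarguscamerasrc ! 'video/x-raw(memory:NVMM),width=1920,height=1080,framerate=30/1' ! nvoverlaysink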