CUDA: A Linux Timer, GPU Information, and Function Qualifiers

The three function qualifiers in CUDA

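__global__: called from the CPU and executed on the GPU. It can call __device__ functions; launching itself or another __global__ function from device code requires dynamic parallelism (compute capability 3.5 or higher).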

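__host__: called from the CPU and executed on the CPU. It is the default and is usually omitted. Host code can launch __global__ functions but cannot call __device__ functions.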

__device__: called from the GPU and executed on the GPU. It can only be called by __global__ or __device__ functions, and it can call other __device__ functions.

Example

// Add 1 to one element of the vector.
// __device__ functions can be called by __global__ functions.
__device__ void addOne(float& z){
	z += 1;
}

// Add the element to itself (double it), then add one.
// __device__ functions can be called by other __device__ functions.
__device__ void addSelf(float& z){
	z += z;
	addOne(z);
}

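// N is the vector length. The original does not define it; this value is assumed for illustration.
#define N 1024

// 1) Mark the kernel (device code) with __global__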
__global__ void add(const float* x, const float* y, float* z){  
	int tid = threadIdx.x + blockIdx.x * blockDim.x;

	if (tid < N){
		z[tid] = x[tid] + y[tid];
		addSelf(z[tid]);
	}
}

Getting a feel for `Think Parallel`

On the CPU side, only add() can be called. Inside add(), each thread sums one pair of elements and then calls addSelf(). Because addSelf() is qualified with __device__, it can be called from add().

Inside addSelf(), the element is added to itself, and then another __device__ function, addOne(), is called to add one to it.
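To make the indexing concrete, here is a host-only C++ sketch (no GPU required) that replays what each thread of add() would compute. The names emulateAdd and blockSize and the round-up grid size are assumptions made for illustration; they do not appear in the original, and a real program would launch the kernel on the GPU instead.

```cpp
#include <vector>

// Host-side stand-ins for the __device__ helpers above.
inline void addOne(float& z) { z += 1.0f; }
inline void addSelf(float& z) { z += z; addOne(z); }

// Replays the kernel's indexing on the CPU: every (block, thread) pair
// computes one global index, and the bounds check skips surplus threads.
// emulateAdd and blockSize are hypothetical names for this sketch.
// Assumes x and y have the same length.
std::vector<float> emulateAdd(const std::vector<float>& x,
                              const std::vector<float>& y,
                              int blockSize) {
    const int n = static_cast<int>(x.size());
    std::vector<float> z(x.size(), 0.0f);
    // Round up so that gridSize * blockSize >= n.
    const int gridSize = (n + blockSize - 1) / blockSize;
    for (int blockIdx = 0; blockIdx < gridSize; ++blockIdx) {
        for (int threadIdx = 0; threadIdx < blockSize; ++threadIdx) {
            const int tid = threadIdx + blockIdx * blockSize;
            if (tid < n) {              // same guard as in the kernel
                z[tid] = x[tid] + y[tid];
                addSelf(z[tid]);        // z = 2 * (x + y) + 1
            }
        }
    }
    return z;
}
```

With 5 elements and a block size of 2, three blocks are needed and the last thread of the last block falls outside the vector, which is exactly the case the tid check protects against. The same function can serve as a CPU reference when checking the kernel's output.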

A timer on Linux

gettimeofday() is declared in <sys/time.h>:

#include <sys/time.h>
#include <stdio.h>

double cpuSecond() {
	struct timeval tp;
	gettimeofday(&tp,NULL);
	return ((double)tp.tv_sec + (double)tp.tv_usec*1.e-6);
}


int main(){
	double iStart = cpuSecond();
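	// Do whatever you want here. When timing a kernel, call cudaDeviceSynchronize()
	// before reading the clock again, because kernel launches are asynchronous.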
	double iElaps = cpuSecond() - iStart;
	printf("time: %.10f \n", iElaps);
}
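gettimeofday() reads the wall clock, which can jump if the system time is adjusted during a measurement. As a possible alternative, here is a sketch based on std::chrono::steady_clock, which is monotonic. This is a swapped-in technique rather than the one used above, and steadySecond is a hypothetical name.

```cpp
#include <chrono>

// Seconds since an arbitrary fixed point, read from a monotonic clock.
// steadySecond is a hypothetical counterpart of cpuSecond() above.
double steadySecond() {
    using clock = std::chrono::steady_clock;
    const std::chrono::duration<double> t = clock::now().time_since_epoch();
    return t.count();
}
```

It is used the same way as cpuSecond(): read it once before the work, once after, and subtract.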

Getting information about the GPU in use

This should be the first step when writing CUDA code: learn the basic facts about the tool you are using.

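When a machine has more than one GPU, you need to know how many GPUs there are, which one is used by default, and how to select a specific one.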

Number of available (CUDA-enabled) GPUs: cudaGetDeviceCount()


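int deviceCount = 0;
cudaError_t error_id = cudaGetDeviceCount(&deviceCount);
if (error_id != cudaSuccess) {   // report the failure instead of ignoring it
	printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(error_id));
}
printf("Device number: %d\n", deviceCount);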

The number of GPUs is stored in deviceCount, so a loop can print the information for each GPU:

for (int dev = 0; dev < deviceCount; ++dev) {
	cudaSetDevice(dev);            // select the GPU with index dev
	cudaDeviceProp deviceProp;      // property struct to be filled in
	cudaGetDeviceProperties(&deviceProp, dev);   // query the properties of this GPU

	printf("\nDevice %d: \"%s\"\n", dev, deviceProp.name);
	int driverVersion = 0, runtimeVersion = 0;   // declared here; the original left them undeclared
	cudaDriverGetVersion(&driverVersion);
	cudaRuntimeGetVersion(&runtimeVersion);

	printf("  CUDA Driver Version / Runtime Version          %d.%d / %d.%d\n", 
		driverVersion / 1000, (driverVersion % 100) / 10, runtimeVersion / 1000, (runtimeVersion % 100) / 10);
	printf("  CUDA Capability Major/Minor version number:    %d.%d\n", 
		deviceProp.major, deviceProp.minor);

}
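The arithmetic in the version printout works because CUDA reports a version as a single integer, 1000 * major + 10 * minor. A small host-only sketch of the decoding follows; CudaVersion and decodeVersion are names assumed for illustration and are not part of the CUDA API.

```cpp
// Splits a CUDA version integer (1000 * major + 10 * minor) into its parts.
// CudaVersion and decodeVersion are hypothetical helpers, not CUDA APIs.
struct CudaVersion {
    int majorVer;
    int minorVer;
};

CudaVersion decodeVersion(int version) {
    CudaVersion v;
    v.majorVer = version / 1000;        // e.g. 11040 / 1000 = 11
    v.minorVer = (version % 100) / 10;  // e.g. (11040 % 100) / 10 = 4
    return v;
}
```

For instance, a reported value of 11040 decodes to version 11.4.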

Which GPU is currently in use: cudaGetDevice()

// Print the index and name of the device that is currently in use
void setupDevice(){
	int dev;
	cudaGetDevice(&dev);
	cudaDeviceProp prop;
	cudaGetDeviceProperties(&prop, dev);
	printf("\nDevice name %d: %s \n", dev, prop.name);
}

Selecting which GPU to use: cudaSetDevice()

int dev = 2;
cudaSetDevice(dev); // use the GPU with index 2