Setting up CUDA 3.1 on Ubuntu 10.04 with an nVidia GeForce 8400 GS card

2010.10.12 20:03
System Description
OS: Ubuntu 10.04 64-bit
Graphics card: nVidia GeForce 8400 GS

Find out which video card you have
># lshw -c video

Install nVidia driver
># apt-get install build-essential linux-headers-2.6.32-25-server linux-source-2.6.32
Instead of using, download the latest nVidia driver
># ./
* Make sure X Windows is running after the reboot

Make sure PATH and LD_LIBRARY_PATH are set correctly
># vim ~/.bashrc
 export PATH=$PATH:/usr/local/cuda/bin
># source ~/.bashrc

># vim /etc/
 # libc default configuration
># ldconfig

Install the CUDA toolkit under /usr/local/cuda
># ./

Install the SDK as a regular user (ykoh)
ykoh@punky:~$ ./

Run a CUDA sample program and check whether it works.
ykoh@punky:~$ cd ~/NVIDIA_GPU_Computing_SDK/C
ykoh@punky:~$ make
* Some CUDA programs failed to compile, so let's debug them one at a time.
See the steps below.

* Set "verbose := 1" at the beginning of the C/common/ file so that make prints the full compile and link commands
># vi C/common/
 SM_VERSIONS   := 10 11 12 13 20
verbose := 1

1. /usr/bin/ld: cannot find -lGLU
Quick fix: copy the library to /usr/local/cuda/lib64
># ldconfig -p | grep GLU
        (libc6,x86-64) => /usr/lib/
># cp /usr/lib/ /usr/local/cuda/lib64/
># make

2. /usr/bin/ld: cannot find -lX11
># ldconfig -p | grep X11
        (libc6,x86-64) => /usr/lib/
># cp /usr/lib/ /usr/local/cuda/lib64/

* You may see similar errors for other libraries; fix them the same way as above:
/usr/bin/ld: cannot find -lXi
/usr/bin/ld: cannot find -lXmu
/usr/bin/ld: cannot find -lGL

3. /usr/bin/ld: cannot find -lglut
># ldconfig -p | grep glut
># apt-get install libglut3-dev

Run matrixMul
># cd ~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release
># ./matrixMul

We need to start X with the nvidia driver loaded.
* Make sure that the nvidia driver is loaded correctly
># startx
># lsmod | grep nvidia
nvidia              11086868  28

># ./matrixMul

Attribution

Frank kenshin579 programming/cuda cuda 3.1, Setup

How to upgrade the CUDPP library on CUDA 2.3

2010.01.14 15:37
I couldn't figure out which version of the CUDPP library is installed with CUDA 2.3, so I'm just going to upgrade to the latest version.

Compiling the latest version of the CUDPP library
># wget
># tar -xzvf cudpp_src_1.1.tar.gz
># cd cudpp_1.1 && make
># cd ../common
># make; make dbg=1; make emu=1; make emu=1 dbg=1

Integrating with the currently installed CUDA 2.3
># cd NVIDIA_GPU_SDK_2.3/C/common
># find . -name '*cudpp*'

># cd NVIDIA_GPU_SDK_2.3/C/src/cudpp_1.1
># find . -name '*cudpp*'
># cp -p ./lib/* $HOME/NVIDIA_GPU_SDK_2.3/C/common/lib/linux/
># cp -p ./cudpp/include/cudpp.h $HOME/NVIDIA_GPU_SDK_2.3/C/common/inc/cudpp/

Replace the existing files with the compiled versions, then test the CUDPP library by compiling the following sample code.

1. cat cudpp_1.1/cudpp/include/cudpp.h


Compile in DEBUG mode

2010.01.04 13:36
If you have only one file to compile, this command should be enough (-g generates debug information for host code, -G for device code):
># nvcc -g -G -o bitreverse

If you have more files to compile, it's better to edit the Makefile instead.
># vim
ifeq ($(dbg),1)
  COMMONFLAGS += -g -G <-- The -G option seems to be missing here; add it.
># vim Makefile

# Rules and targets

># make
># cd ~/NVIDIA_GPU_SDK_2.3/C/bin/linux/debug
># cuda-gdb


How to compile CUDPP in Visual Studio 2005

2009.12.29 21:52
If you want to compile a CUDA program that uses the CUDPP library, like the program below, you need to do the following:
1. Open the project's Properties.
2. Add cudpp32d_emu.lib to the linker input (Linker > Input > Additional Dependencies).


#include "cudpp/cudpp.h"

CUDPPHandle   mScanPlan;        // CUDPP plan handle for prefix sum

// Initialize scan
CUDPPConfiguration scanConfig;
scanConfig.algorithm = CUDPP_SCAN;
scanConfig.datatype  = CUDPP_UINT;
scanConfig.op        = CUDPP_ADD;
scanConfig.options   = CUDPP_OPTION_FORWARD | CUDPP_OPTION_EXCLUSIVE;  // options must be set; pick what computeGold expects
cudppPlan(&mScanPlan, scanConfig, N, 1, 0);

// Run the scan on the device
cudppScan(mScanPlan, d_ovalues, d_ivalues, N);

// Copy the result back and compute a CPU reference to compare against
cudaMemcpy(h_valuesSorted, d_ovalues, N * sizeof(uint), cudaMemcpyDeviceToHost);
uint *reference = (uint *)malloc(N * sizeof(uint));
computeGold(reference, h_values, N);

// Release the plan when done
cudppDestroyPlan(mScanPlan);

I still can't figure out how to create a CUDA project from scratch in Visual Studio 2005.
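The check relies on computeGold, which is not shown here. A minimal sketch of such a CPU reference in C, assuming the scan is configured as a forward exclusive sum scan (CUDPP_OPTION_FORWARD | CUDPP_OPTION_EXCLUSIVE with CUDPP_ADD); cpu_exclusive_scan and results_match are hypothetical names, not part of CUDPP or the SDK:

```c
#include <stddef.h>

/* Hypothetical CPU reference for a forward exclusive sum scan:
   out[0] = 0, out[i] = in[0] + ... + in[i-1].
   out and in must not overlap. */
void cpu_exclusive_scan(unsigned int *out, const unsigned int *in, size_t n)
{
    unsigned int sum = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = sum;   /* sum of all elements before i */
        sum += in[i];
    }
}

/* Returns 1 if the first n elements of a and b are equal, 0 otherwise. */
int results_match(const unsigned int *a, const unsigned int *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

For example, the input {3, 1, 7, 0, 4} scans to {0, 3, 4, 11, 11}; comparing this reference with the array copied back from d_ovalues tells you whether the CUDPP library works.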

On Linux, simply add one line to the Makefile.
># vim Makefile
EXECUTABLE  := radixSort
# Cuda source files (compiled with nvcc)
# C++ source files (compiled with gcc)
CCFILES   := testradixsort.cpp radixsort.cpp


cudaThreadSynchronize()

2009.12.28 18:41
This function is important for anyone who launches a kernel many times (for example, from a for loop). A CUDA kernel launch is asynchronous: it returns to the CPU immediately. As a result, your CPU-side for loop finishes almost instantly and queues all the launches at once, so a timer stopped right after the loop measures only the launch overhead, not the time the kernels actually take.

Calling cudaThreadSynchronize() makes the CPU wait until all previously launched kernels have finished.

So when and where should you call cudaThreadSynchronize()?
Simply call it right before you stop the timer (i.e., before measuring the end time).

  uint hTimer;
  cutilCheckError( cutCreateTimer(&hTimer) );

  printf("Allocating and initializing CUDA arrays...\n");
  CUDA_SAFE_CALL(cudaMalloc((void**)&dvalues, sizeof(uint) * N));
  CUDA_SAFE_CALL(cudaMemcpy(dvalues, values, sizeof(uint) * N, cudaMemcpyHostToDevice));
  cutilSafeCall( cudaThreadSynchronize() );

  printf("Running GPU bitonic sort...\n");
  int threadCount = 512;
  int blockCount = N / threadCount;
  int numKernelLaunch = 0;

  cutilCheckError( cutResetTimer(hTimer) );
  cutilCheckError( cutStartTimer(hTimer) );

  // bitonicSortBlock1
  bitonicSortBlock1<<<blockCount, threadCount>>>(dvalues, threadCount);
  numKernelLaunch++;

  // One kernel launch per (size, stride) step; each launch returns immediately
  for (int size = 2 * threadCount; size <= N; size <<= 1) {
    for (int stride = size / 2; stride > 0; stride >>= 1) {
      bitonicSort<<<blockCount, threadCount>>>(dvalues, size, stride);
      numKernelLaunch++;
    }
  }

  // Wait until all launched kernels have finished, then stop the timer
  cutilSafeCall( cudaThreadSynchronize() );
  cutilCheckError( cutStopTimer(hTimer) );
  printf("GPU time: %f ms (%d kernel launches)\n", cutGetTimerValue(hTimer), numKernelLaunch);
  cutilCheckError( cutDeleteTimer(hTimer) );
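To see why a loop like this issues so many kernel launches, here is a CPU sketch in C of the standard bitonic sorting network with the same (size, stride) loop structure. It assumes the array length is a power of two and that each GPU launch performs one compare-and-swap pass; bitonic_sort_cpu is a hypothetical name, not part of the original program, and unlike the GPU code it starts at size 2 instead of handing the first stages to a per-block kernel like bitonicSortBlock1.

```c
#include <stddef.h>

/* Hypothetical CPU version of a bitonic sorting network (ascending order).
   n must be a power of two. Each (size, stride) pass corresponds to one
   kernel launch in a GPU version; the function returns the number of passes. */
int bitonic_sort_cpu(unsigned int *a, size_t n)
{
    int passes = 0;
    for (size_t size = 2; size <= n; size <<= 1) {
        for (size_t stride = size / 2; stride > 0; stride >>= 1) {
            /* One pass: each element i is compared with partner i ^ stride */
            for (size_t i = 0; i < n; i++) {
                size_t partner = i ^ stride;
                if (partner > i) {
                    int ascending = (i & size) == 0;  /* direction of this subsequence */
                    if (ascending ? a[i] > a[partner] : a[i] < a[partner]) {
                        unsigned int tmp = a[i];
                        a[i] = a[partner];
                        a[partner] = tmp;
                    }
                }
            }
            passes++;
        }
    }
    return passes;
}
```

For n = 8 it performs 6 passes; in general it performs log2(n) * (log2(n) + 1) / 2 passes. On the GPU each pass would be a separate asynchronous launch, which is why the timer must only be stopped after cudaThreadSynchronize().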

Here are the different synchronization functions (the cu* functions belong to the driver API, the cuda* functions to the runtime API).
In order to reduce unwanted CPU utilization, the following APIs have been modified to yield the CPU while the device is busy:
- cuCtxSynchronize
- cuEventSynchronize
- cuStreamSynchronize
- cudaThreadSynchronize
- cudaEventSynchronize
- cudaStreamSynchronize

How to see the intermediate file (PTX) for CUDA program

2009.12.17 18:13
># make clean
># make keep=1

># ls

># vim bitonic.ptx


Enable Highlight on Visual Studio for CUDA

2009.11.26 17:36
Open up Visual Studio
Tools > Options > Text Editor > File Extension
Add the cu extension and map it to the Microsoft Visual C++ editor.


CUDA Visual Profiler

2009.11.11 11:34
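Visual Profiler for the GPU

The CUDA Visual Profiler is a graphical tool for profiling C applications running on the GPU. Application performance tuning generally proceeds by profiling the application and then modifying the code. The latest version of the CUDA Visual Profiler includes metrics for memory transactions, so developers can visually inspect one of the most important aspects to tune for better performance.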

$ cd /usr/local/cuda/cudaprof/bin
$ ./cudaprof &

Profiler Output Windows

GPU Time Summary Plot

What is the difference between the GPU Time Height Plot and the GPU Time Width Plot?
I'm not quite sure for the moment.

- gld: global load
- gst: global store
NOTE: The CUDA profiler counts coalesced writes differently from reads. GLD is incremented by 1 for each 32-, 64- or 128-byte gld request, but GST is incremented by 2 for 32-byte, 4 for 64-byte and 8 for 128-byte requests (so a 64-byte store is counted 4x as much as a 64-byte load).
- warp serialize: the number of warps that had to be serialized because of address conflicts in shared or constant memory.
- occupancy: the ratio of the number of warps actually active on a multiprocessor to the maximum number of warps that could be active on it if there were no register or shared memory usage constraints.
- CTA (cooperative thread array): a thread block, i.e. an array of threads that execute a kernel concurrently and can cooperate through shared memory.
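The occupancy definition can be turned into a rough calculation. Below is a simplified sketch in C for a compute capability 1.0/1.1 multiprocessor such as the GeForce 8400 GS (at most 24 active warps, 8 active blocks, 8192 registers and 16 KB of shared memory per multiprocessor). It ignores register and shared memory allocation granularity, so the profiler can report a lower value; occupancy_cc11 is a hypothetical name.

```c
/* Simplified occupancy estimate for one compute capability 1.0/1.1
   multiprocessor. Allocation granularity is ignored, so the real
   occupancy may be lower than this estimate. */
#define WARP_SIZE          32
#define MAX_WARPS_PER_SM   24
#define MAX_BLOCKS_PER_SM  8
#define REGS_PER_SM        8192
#define SMEM_BYTES_PER_SM  16384

/* Returns active warps / maximum warps, in [0, 1].
   threads_per_block must be positive. */
double occupancy_cc11(int threads_per_block, int regs_per_thread,
                      int smem_bytes_per_block)
{
    int warps_per_block = (threads_per_block + WARP_SIZE - 1) / WARP_SIZE;
    int blocks = MAX_BLOCKS_PER_SM;                  /* hardware block limit */

    int by_warps = MAX_WARPS_PER_SM / warps_per_block;
    if (by_warps < blocks) blocks = by_warps;        /* warp (thread) limit */

    if (regs_per_thread > 0) {                       /* register limit */
        int by_regs = REGS_PER_SM / (regs_per_thread * warps_per_block * WARP_SIZE);
        if (by_regs < blocks) blocks = by_regs;
    }
    if (smem_bytes_per_block > 0) {                  /* shared memory limit */
        int by_smem = SMEM_BYTES_PER_SM / smem_bytes_per_block;
        if (by_smem < blocks) blocks = by_smem;
    }
    return (double)(blocks * warps_per_block) / MAX_WARPS_PER_SM;
}
```

For example, 256 threads per block with 10 registers per thread fits 3 blocks (24 warps), i.e. occupancy 1.0; raising register use to 16 per thread limits it to 2 blocks, i.e. about 0.67.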


Installing CUDA on Ubuntu 8.04/9.04 with GeForce 9600 GT

2009.10.29 16:47
$ sudo lshw -C video
$ sudo ./
$ sudo ./
$ sudo ./
$ cd $HOME/NVIDIA_GPU_Computing_SDK/C
$ make
$ cd bin/linux/release
$ ./matrixMul

NOTE: I don't understand why I cannot run a CUDA program even though the nvidia driver is loaded.
It seems that another driver is loaded when X starts, but I can't figure out what it is.
Does anyone know?

$ sudo modprobe nvidia
$ lsmod | grep nvidia
nvidia               9618728  0
i2c_core               28128  1 nvidia

$ cd ~/NVIDIA_CUDA_SDK/bin/linux/release
$ ./matrixMul
cudaSafeCall() Runtime API error in file <>, line 108 : no CUDA-capable device is available.
$ startx
$ ./matrixMul
Processing time: 0.107000 (ms)

Press ENTER to exit...
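A likely explanation for the question above: with the NVIDIA Linux driver, the /dev/nvidiactl and /dev/nvidia0, /dev/nvidia1, ... device nodes that CUDA programs open are normally created when the X server starts, which would explain why matrixMul only works after startx. NVIDIA's CUDA Getting Started guide for Linux describes creating these nodes at boot (with mknod) on machines that do not run X. A small C sketch to check whether the nodes exist; is_char_device is a hypothetical helper, and the node names assume a single-GPU system.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Returns 1 if path exists and is a character device node, 0 otherwise.
   Hypothetical helper for checking /dev/nvidiactl and /dev/nvidia0. */
int is_char_device(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return 0;                       /* missing or not accessible */
    return S_ISCHR(st.st_mode) ? 1 : 0;
}
```

For example, is_char_device("/dev/nvidiactl") && is_char_device("/dev/nvidia0") should be true once the device nodes have been created.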


Setting up CUDA (emulation mode) on Ubuntu 8.10

2009.10.28 22:20

To install CUDA, you need to install the following three packages [1]:
1. CUDA Driver (only if you actually have an NVIDIA card)
2. CUDA Toolkit
3. CUDA SDK code samples

Here is what you have to do to run the CUDA emulator on your Linux machine.
If you have an NVIDIA graphics card, please refer to the guide for more information [2].
OS: Ubuntu 8.10
Kernel: Linux ubuntu 2.6.27-14-server #1 SMP

$ sudo apt-get install build-essential
$ sudo apt-get install libglu1-mesa-dev freeglut3 freeglut3-dev libxmu-dev kernel-package libxi-dev libglut3-dev

$ sudo ./
$ sudo ./

$ vim ~/.bashrc
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib
$ . ~/.bashrc

$ cd $HOME/NVIDIA_GPU_Computing_SDK/C
$ make emu=1
$ cd bin/linux/emurelease/

NOTE: to compile in debug mode
$ make dbg=1
$ cd bin/linux/debug

On Windows, you need Visual Studio 2005 to compile the source code.
Installing CUDA on Windows should be pretty straightforward.
Please refer to the Windows guide for more information [5].


1. Download the packages
2. CUDA_Getting_Started_2.3_Linux
3. NVIDIA_CUDA_Programming_Guide_2.3
4. Testcode
5. CUDA_Getting_Started_2.3_Windows