Compiling with CUDA support

Asked by Chris Tiee

I've recently developed an interest in GPU computing and I'm trying to get FEniCS to compile (via dorsal) with PETSc and CUSP, as dorsal's "USAGE" file instructs. I'm running OS X 10.8 (I used dorsal to install the unstable versions, which is currently the only way to install there). I set CUDA_DIR, but the build still fails to find the nvcc compiler. If I specify the path to nvcc explicitly via a configuration file tweak, it says one of two things:

"CUDA version error: PETSC currently requires CUDA version 4.0 or higher - when compiling with CUDA"... but this is with CUDA 5.0!

PETSc's docs don't explicitly say it supports CUDA 5.0; they only say it is known to work with 4.1 and 4.2. But when I downgrade (verified with nvcc --version), I get the even less informative

"CUDA compiler you provided with -with-cudac=${CUDA_DIR}/bin/nvcc does not work"

A final note, since I didn't see this issue mentioned anywhere: my laptop's NVIDIA graphics chip only supports single precision. I figure it is still worth trying as a separate build, since single precision should be sufficient for many applications anyway. I tweaked things, such as adding --with-precision=single as an extra line in petsc.package and giving the nvcc compiler an extra flag, but none of it made the slightest difference (the error messages were exactly the same for 5.0 and 4.2, respectively). I'd be interested in hearing other people's experiences CUDA-izing their FEniCS...
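
For the record, the kind of tweak I mean is an extra PETSc configure option in petsc.package, roughly like the line below. This is only a sketch: CONFOPTS is the option-collecting variable in my copy of the package file and may be named differently in yours.

CONFOPTS="${CONFOPTS} --with-precision=single"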

Question information

Language:
English
Status:
Solved
For:
FEniCS Project
Assignee:
No assignee
Solved by:
Chris Tiee

This question was reopened

Revision history for this message
Anders Logg (logg) said :
#1

You might find some useful information here:

http://fenicsproject.org/pub/misc/documents/msc-thesis-valdmanis-2012.pdf

I've used CUDA 4.2 myself and haven't tried 5.0. I don't know if
anyone has tried this on Mac before.

--
Anders

Revision history for this message
Felix Ospald (felix-ospald) said :
#2

The link is not working (Forbidden).

--
Kind regards,

Felix Ospald

Revision history for this message
Anders Logg (logg) said :
#3

Fixed now.

--
Anders

Revision history for this message
Chris Tiee (choni0281) said :
#4

Thanks, it looks pretty extensive... I'll take a look...

Revision history for this message
Chris Tiee (choni0281) said :
#5

After a bit more sleuthing, and learning a whole lot about Python along the way, I think I've got it. First off, I can't get it to work in single precision: a whole slew of packages seem to need double precision, and there are several compile errors that are flat out "cannot convert double to PetscComplex", which suggests this is a wild goose chase. But I did manage to get my hands on a computer with better GPU compute capability, so it is definitely worthwhile to report the findings.

The "at least version 4.0" issue was actually a problem caused by the testing code being unable to find various CUDA libraries. It tests the version of CUDA by trying to compile and run a test program--a program that simply tests the inequality CUDA_VERSION < 4.0. But the program itself fails to link, so the test program never even executes! The python script does not distinguish between the error modes (successful run but too old a version, or failure to run entirely).

Anyway, the fix is simple: add the following line to your platform file (/path/to/dorsal-core/main/FEniCS/platforms/contributed/my.custom.platform):

export DYLD_LIBRARY_PATH=${CUDA_DIR}/lib:${DYLD_LIBRARY_PATH}

As far as I can tell, this works with CUDA 5.0, so long as you clean out all your old CUDA files entirely first (I did rm -rf /usr/local/cuda). At least in my situation, installing an older version over a newer one messed up a lot of things: the "does not work" error came from configure being unable to find nvopencc, which in turn was caused by the symlinks getting mangled. So, moral: really scrub that clean.

Next, I had to add --with-cudac=\"${CUDA_DIR}/bin/nvcc -m64\" to the FEniCS/packages/petsc.package file (yes, with the backslashes), as well as --with-cuda-arch=sm_13, or whatever matches your card if you've got better compute capability (it allows as high as sm_21). The problem with plain "nvcc" instead of "nvcc -m64" is that it assumes a 32-bit build, which is incompatible with my 64-bit system.
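
For reference, the relevant addition to petsc.package ended up looking roughly like this. Treat it as a sketch: CONFOPTS is the variable collecting PETSc configure options in my copy of the package file, and yours may differ.

# extra PETSc configure options for the CUDA build
CONFOPTS="${CONFOPTS} --with-cudac=\"${CUDA_DIR}/bin/nvcc -m64\" --with-cuda-arch=sm_13"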

HOWEVER, I found that I had to use petsc-dev instead of petsc-3.3-p3 (as per PETSc's own recommendation on their site!), because the release would fail to compile due to some mismatch between "extern" and "static" variables. I managed this by changing the file name in FEniCS/packages/petsc.package. I also had to download petsc-dev.tar.gz manually and stick it into the build directory to stop dorsal from downloading from the site indicated in the package file, which otherwise gives a classic HTTP 404 error. With that, the compile goes through. I can't get SLEPc to compile, however; presumably I need the development version of that too, but for now I've simply commented it out in my platform file. Things are still building as we speak, so I'll be back with my final word (and whether I get the fabled speedup!) soon.

Revision history for this message
Chris Tiee (choni0281) said :
#6

Ahh, no dice! But here's the rest of the story. When building DOLFIN, the PETSC_TEST_RUNS check fails. To make sure that old files weren't getting in the way, I rebuilt everything from scratch, deleting the whole project directory; petsc-3.3-p3 compiles there without complaint. The whole project builds properly without cusp, and trashing and rebuilding with cusp gives the error again, so I think I've isolated the problem.

The CMakeError.log is at http://math.ucsd.edu/~ctiee/CMakeError.log.txt. It would seem that it can't find where it installed HYPRE, even though this works fine when compiling without cusp. It also can't find Trilinos for some reason, with or without cusp, even though the working configuration this "experimental" build is based on compiled fine with it. This could be because it downloads the latest DOLFIN revisions.

One more thing... I wonder if the fact that PETSc insists on being built with clang has something to do with it: it says it is ignoring the environment variables CC and CXX, which are what build everything else on my system with the system gcc and g++ (the llvm 4.2 ones installed by Xcode).

Chris

Revision history for this message
Chris Tiee (choni0281) said :
#7

(Repeating the same trick, that is, adding the library path manually to DYLD_LIBRARY_PATH, didn't help...)

Revision history for this message
Johannes Ring (johannr) said :
#8

Building PETSc with shared libraries (--with-shared-libraries=1) will probably help.
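
If that option isn't already in petsc.package, it would go alongside the other PETSc configure options, along these lines (a sketch only; CONFOPTS is assumed to be the variable collecting configure options in the package file):

CONFOPTS="${CONFOPTS} --with-shared-libraries=1"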

Revision history for this message
Chris Tiee (choni0281) said :
#9

Interesting. I do have that option set (unchanged from my working installation), but the .dylib file is nevertheless missing. I notice that the .dylib does get built when I configure without cusp. This actually explains a lot, even if there's still a problem: trying to install over my older version would compile correctly but crash at runtime, because it was using the old .dylib file, which I'd assumed would simply get replaced.

Revision history for this message
Chris Tiee (choni0281) said :
#10

OK. Comparing different installs (I re-ran without cusp), the major difference is that with cusp PETSc builds using "libfast" in the make tree, which seems to skip a lot of steps and looks like what you'd do to upgrade an existing installation (it does this even when building everything from scratch), whereas without cusp it builds normally, complete with the percentage-complete indicators. Unfortunately, this doesn't seem to be configurable, and I can't find where the decision is being made (too many Python scripts, makefiles, CMake files, generated code...).

Revision history for this message
Chris Tiee (choni0281) said :
#11

I asked the question on the petsc-dev mailing list. The libfast build is simply the legacy build that doesn't use CMake. Will update. If I finally get this working, do I get to have a contributed .platform file? =)

Revision history for this message
Chris Tiee (choni0281) said :
#12

The petsc-dev list reports that running make with CUDA only works with one process and without CMake ("make -j n all" will fail to build a shared library for n > 1). I fixed it by modifying the petsc(-dev).package file: add OLDPROCS=${PROCS} at the start of the file, then set PROCS=1 in the "if [ "${CUDA_DIR}" ] && ..." branch, and finally restore it in package_specific_register() with PROCS=${OLDPROCS}.
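
In outline, the change looks something like the fragments below. This is only a sketch of where each line goes: the CUDA branch and package_specific_register() already exist in my copy of the package file and may differ in yours.

# near the top of petsc.package: save the parallel-build setting
OLDPROCS=${PROCS}

# inside the existing 'if [ "${CUDA_DIR}" ] && ...' branch: force a serial build
PROCS=1

# inside the existing package_specific_register(): restore it for the packages built afterwards
PROCS=${OLDPROCS}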

With all that, it builds correctly. Running it still crashes, but I can see from the crash report that this is specifically a problem with cusp/thrust. At the end I've included the relevant part of the crash log (just in case anyone who is better than I am at deciphering the ridiculous error messages associated with templates wants to see it, LOL). Otherwise, though, I think it's solved on the FEniCS side, so I'm going to mark this as "Problem Solved":

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libpetsc.dylib 0x000000010f733453 void cusp::detail::device::spmv_csr_vector<cusp::csr_matrix<int, double, thrust::detail::cuda_device_space_tag>, double>(cusp::csr_matrix<int, double, thrust::detail::cuda_device_space_tag> const&, double const*, double*) + 7
1 libpetsc.dylib 0x000000010f733525 void cusp::detail::device::multiply<cusp::csr_matrix<int, double, thrust::detail::cuda_device_space_tag>, cusp::array1d<double, thrust::detail::cuda_device_space_tag>, cusp::array1d<double, thrust::detail::cuda_device_space_tag> >(cusp::csr_matrix<int, double, thrust::detail::cuda_device_space_tag> const&, cusp::array1d<double, thrust::detail::cuda_device_space_tag> const&, cusp::array1d<double, thrust::detail::cuda_device_space_tag>&, cusp::csr_format, cusp::array1d_format, cusp::array1d_format) + 127
2 libpetsc.dylib 0x000000010f61088b void cusp::detail::device::multiply<cusp::csr_matrix<int, double, thrust::detail::cuda_device_space_tag>, cusp::array1d<double, thrust::detail::cuda_device_space_tag>, cusp::array1d<double, thrust::detail::cuda_device_space_tag> >(cusp::csr_matrix<int, double, thrust::detail::cuda_device_space_tag> const&, cusp::array1d<double, thrust::detail::cuda_device_space_tag> const&, cusp::array1d<double, thrust::detail::cuda_device_space_tag>&) + 25
3 libpetsc.dylib 0x000000010f73354b void cusp::detail::dispatch::multiply<cusp::csr_matrix<int, double, thrust::detail::cuda_device_space_tag>, cusp::array1d<double, thrust::detail::cuda_device_space_tag>, cusp::array1d<double, thrust::detail::cuda_device_space_tag> >(cusp::csr_matrix<int, double, thrust::detail::cuda_device_space_tag> const&, cusp::array1d<double, thrust::detail::cuda_device_space_tag> const&, cusp::array1d<double, thrust::detail::cuda_device_space_tag>&, thrust::detail::cuda_device_space_tag, thrust::detail::cuda_device_space_tag, thrust::detail::cuda_device_space_tag) + 9
4 libpetsc.dylib 0x000000010f6108b1 void cusp::detail::multiply<cusp::csr_matrix<int, double, thrust::detail::cuda_device_space_tag>, cusp::array1d<double, thrust::detail::cuda_device_space_tag>, cusp::array1d<double, thrust::detail::cuda_device_space_tag> >(cusp::csr_matrix<int, double, thrust::detail::cuda_device_space_tag>&, cusp::array1d<double, thrust::detail::cuda_device_space_tag>&, cusp::array1d<double, thrust::detail::cuda_device_space_tag>&, cusp::known_format) + 25
5 libpetsc.dylib 0x000000010f733568 void cusp::multiply<cusp::csr_matrix<int, double, thrust::detail::cuda_device_space_tag>, cusp::array1d<double, thrust::detail::cuda_device_space_tag>, cusp::array1d<double, thrust::detail::cuda_device_space_tag> >(cusp::csr_matrix<int, double, thrust::detail::cuda_device_space_tag>&, cusp::array1d<double, thrust::detail::cuda_device_space_tag>&, cusp::array1d<double, thrust::detail::cuda_device_space_tag>&) + 16
6 libpetsc.dylib 0x000000010f60d53a MatMult_SeqAIJCUSP(_p_Mat*, _p_Vec*, _p_Vec*) + 637