Multithreading help: "GC object already tracked"

Asked by Nick Davies

Hi everyone, I have tried to enable multithreading in a few different ways.

First attempt: Ubuntu 12.04. The only change to a code that otherwise works is adding parameters["num_threads"] = 4. This gives a number of

 *** Warning: Form::coloring does not properly consider form type.

warnings (from what I have read, these are a reminder to the dev team about something), followed by

Fatal Python error: GC object already tracked

Second attempt: Arch Linux, same code as above. Same result.

Third attempt: Arch Linux with the line

parameters["graph_coloring_library"] = "Zoltan"

This gives an error at the solve() line about not being able to find the function color(), or something similar. (Sorry about the lack of detail: that computer is busy at the moment and I forgot to copy the error, but as soon as it finishes I will post it.) I can't try this on Ubuntu, as that install is not compiled with Trilinos.

"GC object already tracked" seems to be a fairly generic error, but after reading about it I'm still not 100% sure what it means or how to debug it. I know multithreading is experimental, but is this a known problem, or is there a common mistake in user code that causes it?

Also, I am hoping to use PaStiX for multithreading as well, but have no idea where to start. The book alludes to this being doable, but has it been done? Are there any resources that could help?

Thanks for any help

Question information

Language: English
Status: Answered
For: FEniCS Project
Assignee: No assignee
Nick Davies (ntd14) said :
#1

Sorry for the late update. Here is the output when Zoltan is used:

Solving nonlinear variational problem.
  *** Warning: Form::coloring does not properly consider form type.
  Coloring mesh.
ZOLTAN Parameter COLORING_PROBLEM = distance-1
ZOLTAN Parameter SUPERSTEP_SIZE = 1000
ZOLTAN Parameter COMM_PATTERN = S
ZOLTAN Parameter VERTEX_VISIT_ORDER = I
ZOLTAN Parameter COLORING_METHOD = F
ZOLTAN Parameter RECOLORING_TYPE = SYNCHRONOUS
ZOLTAN Parameter RECOLORING_PERMUTATION = NONDECREASING
ZOLTAN Parameter RECOLORING_NUM_OF_ITERATIONS = 0
ZOLTAN Parameter GRAPH_SYMMETRIZE = NONE
ZOLTAN Parameter GRAPH_SYM_WEIGHT = ADD
ZOLTAN Parameter GRAPH_BIPARTITE_TYPE = OBJ
ZOLTAN Parameter GRAPH_BUILD_TYPE = NORMAL
ZOLTAN Parameter GRAPH_FAST_BUILD_BASE = 0
Traceback (most recent call last):
  File "/home/nick/Dropbox/python_codes/isotropic_test.py", line 678, in <module>
    mesh, u_weight = growth()
  File "/home/nick/Dropbox/python_codes/isotropic_test.py", line 253, in growth
    form_compiler_parameters=ffc_options)
  File "/usr/lib/python2.7/site-packages/dolfin/fem/solving.py", line 268, in solve
    _solve_varproblem(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/dolfin/fem/solving.py", line 316, in _solve_varproblem
    solver.solve()
RuntimeError:

*** -------------------------------------------------------------------------
*** DOLFIN encountered an error. If you are not able to resolve this issue
*** using the information listed below, you can ask for help at
***
*** <email address hidden>
***
*** Remember to include the error message listed below and, if possible,
*** include a *minimal* running example to reproduce the error.
***
*** -------------------------------------------------------------------------
*** Error: Unable to complete call to function color().
*** Reason: Assertion color < num_colors failed.
*** Where: This error was encountered inside /home/nick/dolfin-git/src/dolfin/dolfin/mesh/MeshColoring.cpp (line 91).
*** Process: 0
*** -------------------------------------------------------------------------
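A note on the assertion above. The "Coloring mesh." step splits the cells into groups (colors) so that, presumably, cells sharing a vertex never end up in the same group; all cells of one color can then be assembled by different threads at the same time. Every cell's color index has to lie below the number of colors, and `color < num_colors` is that check, so the failure means at least one cell was given a color index outside the expected range. The following is only a plain-Python sketch of a greedy distance-1 coloring on a made-up toy mesh (cells as tuples of vertex indices, a hypothetical format), not DOLFIN's or Zoltan's code; it shows the two invariants such a coloring must satisfy.

```python
from collections import defaultdict


def greedy_cell_coloring(cells):
    """Greedy distance-1 coloring of mesh cells (illustrative sketch).

    cells: list of tuples of vertex indices (hypothetical toy format).
    Two cells are neighbours if they share at least one vertex, and
    neighbours must receive different colors. Returns (colors, num_colors).
    """
    # Map each vertex to the cells that touch it.
    vertex_to_cells = defaultdict(list)
    for c, verts in enumerate(cells):
        for v in verts:
            vertex_to_cells[v].append(c)

    colors = [-1] * len(cells)  # -1 means "not colored yet"
    for c, verts in enumerate(cells):
        # Colors already used by neighbouring cells.
        taken = {colors[n] for v in verts for n in vertex_to_cells[v]
                 if n != c and colors[n] >= 0}
        # Pick the smallest color no neighbour uses.
        color = 0
        while color in taken:
            color += 1
        colors[c] = color
    num_colors = max(colors) + 1 if colors else 0
    return colors, num_colors


def check_coloring(cells, colors, num_colors):
    """Assert the range invariant and the no-shared-vertex invariant."""
    for c, color in enumerate(colors):
        # Same condition as the failing assertion in the log above.
        assert 0 <= color < num_colors, f"cell {c}: color {color} out of range"
    for i in range(len(cells)):
        for j in range(i + 1, len(cells)):
            if colors[i] == colors[j]:
                assert not set(cells[i]) & set(cells[j]), (i, j)
    return True


if __name__ == "__main__":
    # Two squares, each split into two triangles.
    # Vertex numbering: 3 4 5 on top, 0 1 2 on the bottom.
    cells = [(0, 1, 4), (0, 4, 3), (1, 2, 5), (1, 5, 4)]
    colors, num_colors = greedy_cell_coloring(cells)
    print(colors, num_colors)  # [0, 1, 1, 2] 3
    check_coloring(cells, colors, num_colors)
```

Greedy coloring is chosen here only because it is short; real coloring libraries use other orderings and heuristics, but any valid result must pass the same two checks.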

A second update: I am no longer getting the garbage collection error; maybe this is now a memory race issue? This is the output using the default settings:

Solving nonlinear variational problem.
  *** Warning: Form::coloring does not properly consider form type.
  Coloring mesh.
  *** Warning: Form::coloring does not properly consider form type.
  *** Warning: Form::coloring does not properly consider form type.
[0]PETSC ERROR: PetscMallocValidate: error detected at PetscOptionsEnd_Private() line 475 in /home/nick/petsc/src/petsc-3.3-p6/src/sys/objects/aoptions.c
[0]PETSC ERROR: Memory at address 0x2d97870 is corrupted
[0]PETSC ERROR: Probably write past beginning or end of array
[0]PETSC ERROR: Last intact block allocated in PetscIntStackCreate() line 177 in /home/nick/petsc/src/petsc-3.3-p6/src/sys/plog/utils/stack.c
[0]PETSC ERROR: --------------------- Error Message ------------------------------------
[0]PETSC ERROR: Memory corruption!
[0]PETSC ERROR: !
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Petsc Release Version 3.3.0, Patch 6, Mon Feb 11 12:26:34 CST 2013
[0]PETSC ERROR: See docs/changes/index.html for recent updates.
[0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[0]PETSC ERROR: See docs/index.html for manual pages.
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Unknown Name on a arch-linu named localhost by nick Wed May 8 11:11:37 2013
[0]PETSC ERROR: Libraries linked from /home/nick/petsc/pkg/petsc/opt/petsc/arch-linux2-cxx-opt/lib
[0]PETSC ERROR: Configure run at Tue May 7 10:20:39 2013
[0]PETSC ERROR: Configure options --prefix=/home/nick/petsc/pkg/petsc/opt/petsc/arch-linux2-cxx-opt --PETSC_ARCH=arch-linux2-cxx-opt --with-shared-libraries=1 --with-clanguage=cxx --with-blacs-lib=/usr/lib/libscalapack.so --with-blacs-include=/usr/include --with-scalapack-lib=/usr/lib/libscalapack.so --with-scalapack-include=/usr/include --with-ptscotch-lib="[/usr/lib/libscotcherr,/usr/lib/libscotch,/usr/lib/libscotcherrexit,/usr/lib/libptscotcherr,/usr/lib/libptscotch,/usr/lib/libptscotcherrexit]" --with-ptscotch-include=/usr/include --download-hypre --download-superlu --download-metis --download-parmetis --with-boost-dir=/usr
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: PetscMallocValidate() line 138 in /home/nick/petsc/src/petsc-3.3-p6/src/sys/memory/mtr.c
[0]PETSC ERROR: PetscOptionsEnd_Private() line 475 in /home/nick/petsc/src/petsc-3.3-p6/src/sys/objects/aoptions.c
[0]PETSC ERROR: VecView_Private() line 288 in /home/nick/petsc/src/petsc-3.3-p6/src/vec/vec/interface/vector.c
[0]PETSC ERROR: VecAssemblyEnd() line 343 in /home/nick/petsc/src/petsc-3.3-p6/src/vec/vec/interface/vector.c
[0]PETSC ERROR: PetscMallocValidate: error detected at PetscOptionsEnd_Private() line 475 in /home/nick/petsc/src/petsc-3.3-p6/src/sys/objects/aoptions.c
[0]PETSC ERROR: Memory at address 0x2d97870 is corrupted
[0]PETSC ERROR: Probably write past beginning or end of array
[0]PETSC ERROR: Last intact block allocated in PetscIntStackCreate() line 177 in /home/nick/petsc/src/petsc-3.3-p6/src/sys/plog/utils/stack.c
[0]PETSC ERROR: --------------------- Error Message ------------------------------------
[0]PETSC ERROR: Memory corruption!
[0]PETSC ERROR: !
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Petsc Release Version 3.3.0, Patch 6, Mon Feb 11 12:26:34 CST 2013
[0]PETSC ERROR: See docs/changes/index.html for recent updates.
[0]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
[0]PETSC ERROR: See docs/index.html for manual pages.
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Unknown Name on a arch-linu named localhost by nick Wed May 8 11:11:37 2013
[0]PETSC ERROR: Libraries linked from /home/nick/petsc/pkg/petsc/opt/petsc/arch-linux2-cxx-opt/lib
[0]PETSC ERROR: Configure run at Tue May 7 10:20:39 2013
[0]PETSC ERROR: Configure options --prefix=/home/nick/petsc/pkg/petsc/opt/petsc/arch-linux2-cxx-opt --PETSC_ARCH=arch-linux2-cxx-opt --with-shared-libraries=1 --with-clanguage=cxx --with-blacs-lib=/usr/lib/libscalapack.so --with-blacs-include=/usr/include --with-scalapack-lib=/usr/lib/libscalapack.so --with-scalapack-include=/usr/include --with-ptscotch-lib="[/usr/lib/libscotcherr,/usr/lib/libscotch,/usr/lib/libscotcherrexit,/usr/lib/libptscotcherr,/usr/lib/libptscotch,/usr/lib/libptscotcherrexit]" --with-ptscotch-include=/usr/include --download-hypre --download-superlu --download-metis --download-parmetis --with-boost-dir=/usr
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: PetscMallocValidate() line 138 in /home/nick/petsc/src/petsc-3.3-p6/src/sys/memory/mtr.c
[0]PETSC ERROR: PetscOptionsEnd_Private() line 475 in /home/nick/petsc/src/petsc-3.3-p6/src/sys/objects/aoptions.c
[0]PETSC ERROR: VecView_Private() line 288 in /home/nick/petsc/src/petsc-3.3-p6/src/vec/vec/interface/vector.c
[0]PETSC ERROR: VecAssemblyEnd() line 343 in /home/nick/petsc/src/petsc-3.3-p6/src/vec/vec/interface/vector.c
  Newton iteration 1: r (abs) = 1.409e-06 (tol = 1.000e-10) r (rel) = 7.622e-11 (tol = 1.000e-09)
  Newton solver finished in 1 iterations and 1 linear solver iterations.

Any ideas?

Thanks everyone
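On the memory race question: presumably the point of coloring is that threaded assembly adds each cell's local contribution into a shared global vector, and two threads adding into the same entry at once is a data race. Processing one color at a time, with only the cells of that color in parallel, means no two concurrent tasks touch the same vertex. The sketch below illustrates that schedule with Python's `concurrent.futures`; the mesh, the precomputed coloring and the per-cell value are made-up toy data (`cell_value` is a hypothetical callback), and since CPython's GIL hides real races, it shows the structure of colored assembly rather than reproducing the corruption.

```python
from concurrent.futures import ThreadPoolExecutor


def assemble_colored(cells, colors, num_vertices, cell_value, num_threads=4):
    """Assemble a vertex-based global vector color by color (sketch).

    Assumes (does not check) that no two cells of the same color share a
    vertex, so concurrently running tasks never add into the same entry.
    """
    global_vec = [0.0] * num_vertices
    by_color = {}
    for c, color in enumerate(colors):
        by_color.setdefault(color, []).append(c)

    def add_cell(c):
        # Toy "element vector": the same value added to each vertex of the cell.
        for v in cells[c]:
            global_vec[v] += cell_value(c)

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for color in sorted(by_color):
            # list() waits for every cell of this color before moving on,
            # so two different colors are never assembled at the same time.
            list(pool.map(add_cell, by_color[color]))
    return global_vec


def assemble_serial(cells, num_vertices, cell_value):
    """Reference serial assembly for comparison."""
    global_vec = [0.0] * num_vertices
    for c, verts in enumerate(cells):
        for v in verts:
            global_vec[v] += cell_value(c)
    return global_vec


if __name__ == "__main__":
    cells = [(0, 1, 4), (0, 4, 3), (1, 2, 5), (1, 5, 4)]
    colors = [0, 1, 1, 2]  # a valid distance-1 coloring of these cells
    print(assemble_colored(cells, colors, 6, lambda c: 1.0))
    # [2.0, 3.0, 1.0, 1.0, 3.0, 2.0], identical to assemble_serial
```

If the coloring were invalid (two cells of one color sharing a vertex), a truly parallel version of `add_cell` could lose updates or corrupt memory, which is the kind of failure the question is asking about.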

Nick Davies (ntd14) said :
#2

However, the garbage collection error still appears when using Epetra:

Solving nonlinear variational problem.
  *** Warning: Form::coloring does not properly consider form type.
  Coloring mesh.
Fatal Python error: GC object already tracked
[localhost:01437] *** Process received signal ***
[localhost:01437] Signal: Aborted (6)
[localhost:01437] Signal code: (-6)
[localhost:01437] [ 0] /usr/lib/libpthread.so.0(+0xf0e0) [0x7f024f0d30e0]
[localhost:01437] [ 1] /usr/lib/libc.so.6(gsignal+0x39) [0x7f024ed4c1c9]
[localhost:01437] [ 2] /usr/lib/libc.so.6(abort+0x148) [0x7f024ed4d5c8]
[localhost:01437] [ 3] /usr/lib/libpython2.7.so.1.0(+0xf82fe) [0x7f024f3d82fe]
[localhost:01437] [ 4] /usr/lib/libpython2.7.so.1.0(PyType_GenericAlloc+0x97) [0x7f024f3787e7]
[localhost:01437] [ 5] /usr/lib/libpython2.7.so.1.0(+0x9c999) [0x7f024f37c999]
[localhost:01437] [ 6] /usr/lib/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f024f329c13]
[localhost:01437] [ 7] /usr/lib/python2.7/site-packages/dolfin/cpp/_function.so(+0x2399c) [0x7f022588799c]
[localhost:01437] [ 8] /usr/lib/python2.7/site-packages/dolfin/cpp/_function.so(_ZNK23SwigDirector_Expression4evalERN6dolfin5ArrayIdEERKS2_RKN3ufc4cellE+0x129) [0x7f02258890c9]
[localhost:01437] [ 9] /usr/lib/libdolfin.so.1.2(_ZNK6dolfin15GenericFunction8evaluateEPdPKdRKN3ufc4cellE+0x6e) [0x7f0240ea293e]
[localhost:01437] [10] /home/nick/.instant/cache/instant_module_823e490a08f046ecfde13b33dbc63f083ca7e008/_instant_module_823e490a08f046ecfde13b33dbc63f083ca7e008.so(_ZNK66ffc_form_1527181c5d2f58379ded1dc058a8f9a7084bb783_finite_element_213evaluate_dofsEPdRKN3ufc8functionEPKdiRKNS1_4cellE+0x56) [0x7f021abfc9c6]
[localhost:01437] [11] /usr/lib/libdolfin.so.1.2(_ZNK6dolfin15GenericFunction24restrict_as_ufc_functionEPdRKNS_13FiniteElementERKNS_4CellERKN3ufc4cellE+0x46) [0x7f0240ea2d46]
[localhost:01437] [12] /usr/lib/libdolfin.so.1.2(_ZN6dolfin3UFC6updateERKNS_4CellE+0x8e) [0x7f0240c5e4be]
[localhost:01437] [13] /usr/lib/libdolfin.so.1.2(+0x4cde46) [0x7f0240c58e46]
[localhost:01437] [14] /usr/lib/libdolfin.so.1.2(_ZN6dolfin15OpenMpAssembler34assemble_cells_and_exterior_facetsERNS_13GenericTensorERKNS_4FormERNS_3UFCEPKNS_12MeshFunctionImEESB_PSt6vectorIdSaIdEE+0x8de) [0x7f0240c5b2ee]
[localhost:01437] [15] /usr/lib/libdolfin.so.1.2(_ZN6dolfin15OpenMpAssembler8assembleERNS_13GenericTensorERKNS_4FormE+0x415) [0x7f0240c5dc45]
[localhost:01437] [16] /usr/lib/libdolfin.so.1.2(_ZN6dolfin9Assembler8assembleERNS_13GenericTensorERKNS_4FormE+0x346) [0x7f0240c21076]
[localhost:01437] [17] /usr/lib/libdolfin.so.1.2(_ZN6dolfin8assembleERNS_13GenericTensorERKNS_4FormE+0x28) [0x7f0240c357f8]
[localhost:01437] [18] /usr/lib/libdolfin.so.1.2(_ZN6dolfin26NonlinearVariationalSolver24NonlinearDiscreteProblem1FERNS_13GenericVectorERKS2_+0x79) [0x7f0240c6c709]
[localhost:01437] [19] /usr/lib/libdolfin.so.1.2(_ZN6dolfin12NewtonSolver5solveERNS_16NonlinearProblemERNS_13GenericVectorE+0x109) [0x7f0240dd4549]
[localhost:01437] [20] /usr/lib/libdolfin.so.1.2(_ZN6dolfin26NonlinearVariationalSolver5solveEv+0x4de) [0x7f0240c6e6fe]
[localhost:01437] [21] /usr/lib/python2.7/site-packages/dolfin/cpp/_fem.so(+0x644ec) [0x7f02255b34ec]
[localhost:01437] [22] /usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4c2f) [0x7f024f3bd2df]
[localhost:01437] [23] /usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x850) [0x7f024f3be280]
[localhost:01437] [24] /usr/lib/libpython2.7.so.1.0(+0x6dbcd) [0x7f024f34dbcd]
[localhost:01437] [25] /usr/lib/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f024f329c13]
[localhost:01437] [26] /usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x16a0) [0x7f024f3b9d50]
[localhost:01437] [27] /usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x850) [0x7f024f3be280]
[localhost:01437] [28] /usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4dc9) [0x7f024f3bd479]
[localhost:01437] [29] /usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x850) [0x7f024f3be280]
[localhost:01437] *** End of error message ***
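One way to read a backtrace like this is to list only the frame numbers and symbol names, innermost first. Read that way, the chain runs from OpenMpAssembler::assemble_cells_and_exterior_facets (frame 14) through GenericFunction::evaluate (frame 9) into SwigDirector_Expression::eval (frame 8), then into libpython (PyObject_Call, PyType_GenericAlloc) just before abort (frame 2). That suggests Python code, likely an Expression defined in Python, is being called from inside the threaded assembly loop. The helper below is a hypothetical, stdlib-only sketch that pulls those names out of Open MPI-style `[host:pid] [ n] lib(symbol+0xoff) [addr]` lines; the mangled C++ names it returns can then be piped through `c++filt` to make them readable.

```python
import re

# Matches the frame part of lines such as
# "[localhost:01437] [ 8] /usr/lib/.../_function.so(_ZNK23Swig...+0x129) [0x7f...]"
FRAME_RE = re.compile(r"\[\s*(\d+)\]\s+(\S+?)\(([^()+]*)\+0x[0-9a-fA-F]+\)")


def frames(log_text):
    """Return (frame_number, library_basename, symbol) for each frame line.

    symbol is '' when the frame has no exported name, e.g. '(+0xf0e0)'.
    Lines that are not backtrace frames are skipped.
    """
    out = []
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            num, lib, sym = m.groups()
            out.append((int(num), lib.rsplit("/", 1)[-1], sym))
    return out


if __name__ == "__main__":
    sample = "\n".join([
        "[localhost:01437] Signal: Aborted (6)",
        "[localhost:01437] [ 2] /usr/lib/libc.so.6(abort+0x148) [0x7f024ed4d5c8]",
        "[localhost:01437] [ 6] /usr/lib/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7f024f329c13]",
        "[localhost:01437] [ 7] /usr/lib/python2.7/site-packages/dolfin/cpp/_function.so(+0x2399c) [0x7f022588799c]",
    ])
    for num, lib, sym in frames(sample):
        print(num, lib, sym or "<no symbol>")
```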

Nick Davies (ntd14) said :
#3

Sorry for clogging up this thread, but I am really stumped on this one. It randomly gives the errors above on one run, then goes back to the usual GC error on the next, without any changes.

Solving nonlinear variational problem.
  *** Warning: Form::coloring does not properly consider form type.
  Coloring mesh.
Fatal Python error: GC object already tracked
Fatal Python error: GC object already tracked
[localhost:00788] *** Process received signal ***

Any ideas would be great; this is well beyond me.

Nick Davies (ntd14) said :
#4

The latest error is again at the solve() line:

Traceback (most recent call last):
  File "anisotropic_rebuild.py", line 644, in <module>
    mesh, u_weight = growth()
  File "anisotropic_rebuild.py", line 220, in growth
    solve(F == 0, u, bc, J=J)
  File "/usr/lib/python2.7/dist-packages/dolfin/fem/solving.py", line 268, in solve
    _solve_varproblem(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/dolfin/fem/solving.py", line 316, in _solve_varproblem
    solver.solve()
RuntimeError:

*** Error: Unable to access vector of degrees of freedom.
*** Reason: Cannot access a non-const vector from a subfunction.
*** Where: This error was encountered inside Function.cpp.
*** Process: 1

This error is repeated for every process.

I have found that this is often related to split(), but I don't use that command anywhere. Are there other commands that invoke split(), in particular solve, derivative, dot, inner, grad, det, tr, DirichletBC or Identity?

Again, if I run it in serial, it works fine. For example, both

$ mpirun -n 1 python anisotropic_rebuild.py

and running it as usual from Spyder work fine (this is without parameters["num_threads"] set), whereas

$ mpirun -n 4 python anisotropic_rebuild.py

gives the errors above.

Johannes Ring (johannr) said :
#5

FEniCS no longer uses Launchpad for Questions & Answers. Please consult the documentation on the FEniCS web page for where and how to (re)post your question: http://fenicsproject.org/support/
