User.Basis true gives seg fault

Asked by Leonardo Fonseca

Dear all, for the sake of experimentation for future use, I added the line User.Basis true to my .fdf file. I did it in a calculation involving only Pt and it worked fine. Then I did it in a calculation involving Ta and O and it also worked. But in a calculation involving all 3 elements the run crashed with the message below. All I did was to add that line and restart the job. I also added the line to other different .fdf files involving these 3 species and in all cases the job failed in the same way shown below. I wonder if anyone has already faced and solved this problem. Thanks a lot for your help!

Leo

************************** End of input data file *****************************

reinit: -----------------------------------------------------------------------
reinit: System Name: stoich
reinit: -----------------------------------------------------------------------
reinit: System Label: stoich
reinit: -----------------------------------------------------------------------
Siesta Version: siesta-4.1--736
Architecture : unknown
Compiler flags: mpifort -O2 -fPIC -ftree-vectorize
PP flags : -DMPI -DFC_HAVE_ABORT -DCDF -DNCDF -DNCDF_4
Libraries : /home/apascon/lib/librefblas.a /home/apascon/lib/libreflapack.a /home/apasconn/lib/libscalapack.a libncdf.a libfdict.a -lnetcdff -lnetcdf -lhdf5_hl -lhdf5 -lz
PARALLEL version
NetCDF support
NetCDF-4 support

* Running in serial mode with MPI
>> Start of run: 4-MAR-2018 17:41:25

initatom: Reading input for the pseudopotentials and atomic orbitals ----------

Reading PAOs and KBs from ascii files...
Species number: 1 Atomic number: 78 Label: Pt
Species number: 2 Atomic number: 8 Label: O
Species number: 3 Atomic number: 73 Label: Ta

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0 0x7FB556544467
#1 0x7FB556544AAE
#2 0x7FB55582F24F
#3 0x6E467A in __m_matel_registry_MOD_register_in_rf_pool
#4 0x6E4D82 in register_rfs_
#5 0x4F97D9 in __m_siesta_init_MOD_siesta_init
Segmentation fault (core dumped)

Question information

Language: English
Status: Answered
For: Siesta
Assignee: Nick Papior

Alberto Garcia (albertog) said :
#2

Could you please provide your fdf file and the .ion files?

Nick Papior (nickpapior) said :
#3

This seems related to https://answers.launchpad.net/siesta/+question/665219

I tried to reproduce with the ptcda-au test (uses 4 pseudos). However, I cannot reproduce.

I can, however, reproduce with the bug linked above.

Leonardo Fonseca (fonseca65) said :
#4

Dear Alberto,

Attached you will find the .fdf and .ion files, in addition to the .psf
files. If you change User.Basis to false then you should have no problem
running the .fdf. But with User.Basis true the code crashes with a seg. fault,
at least on my system.

Thanks for your help!

Leo

Leonardo Fonseca (fonseca65) said :
#5

I have been searching for the problem that causes the code to crash. It occurs in subroutine register_in_rf_pool, in the lines below:

matel_pool(gindex)%lcut = l
matel_pool(gindex)%rcut = func%cutoff

but I still do not understand why. I will keep searching since I do need this problem solved with some urgency. If you have any idea how to solve it let me know.
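As background on this class of crash: an assignment such as matel_pool(gindex)%lcut = l faults if the pool array is not allocated or if gindex lies outside its bounds. The following is a minimal sketch of a guarded registration, assuming a gfortran-compatible compiler. It is not the SIESTA source, and every name in it (pool_entry_t, pool, register_entry) is hypothetical.

```fortran
! Illustrative sketch only: NOT the SIESTA source. All names are hypothetical.
program pool_bounds_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)

  ! One pool slot, loosely modelled on the two fields assigned in the thread
  type :: pool_entry_t
    integer  :: lcut = -1
    real(dp) :: rcut = 0.0_dp
  end type pool_entry_t

  type(pool_entry_t), allocatable :: pool(:)
  logical :: ok

  allocate(pool(4))

  call register_entry(3, 2, 5.5_dp, ok)   ! index inside the allocated range
  print *, 'index 3 registered: ', ok
  call register_entry(7, 1, 4.0_dp, ok)   ! index beyond the allocated range
  print *, 'index 7 registered: ', ok

contains

  ! Store (l, cutoff) in slot gindex, refusing to write outside the pool
  subroutine register_entry(gindex, l, cutoff, ok)
    integer,  intent(in)  :: gindex, l
    real(dp), intent(in)  :: cutoff
    logical,  intent(out) :: ok

    ok = .false.
    if (.not. allocated(pool)) return
    if (gindex < lbound(pool, 1) .or. gindex > ubound(pool, 1)) return

    pool(gindex)%lcut = l
    pool(gindex)%rcut = cutoff
    ok = .true.
  end subroutine register_entry

end program pool_bounds_demo
```

Without the guard, the second call would write past the end of the array. If the compiler is gfortran, rebuilding with -g -fcheck=bounds -fbacktrace makes such an access abort with the array name and the offending index, which is usually more informative than a backtrace of bare addresses.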

Thanks for your help!

Leonardo Fonseca (fonseca65) said :
#6

I have done new tests on this problem. First I recompiled siesta with a lower (O1) and higher (O3) optimization level to see if the issue was created by the compiler. The problem appears at all levels of optimization.

Then I found out that a model bulk Pt with the 111 direction along z also gives the same error message when I use the option User.Basis true. Then I replaced Pt by Ta and the problem is still there. In a separate directory I have a very similar input file, but for Ta2O5. Using User.Basis true does not create any error message. Then I replaced Ta by Pt and again no error was found.

So the problem is not with the compiler, it is not with the .ion files and it is not specific to one particular .fdf file of mine.

Leonardo Fonseca (fonseca65) said :
#7

In my latest test I ran siesta version siesta-4.0b-494 on the same input file that crashed before when User.Basis was set to true. Now there is no crash and the code runs smoothly. Therefore, among the versions I have tested, the problem appears only in siesta-4.1--625 and siesta-4.1--736.

Leonardo Fonseca (fonseca65) said :
#8

Good afternoon Alberto and Nick,

I have submitted a few test results regarding the bug I found in the latest
versions of siesta, which sometimes appears when the keyword User.Basis is
set to true. About a month ago you asked me for my input files, but since
then I have not heard any comments on the problem. I wonder if it has been
considered and if there is some plan to release either some ideas on how to
circumvent it or a general fix. I understand that a fix may only be
released to all users with the next version by June or July. Since I cannot
avoid the combination of siesta's latest version and the User.Basis keyword
set to true in my current research, I wonder if you can provide me with the
corrected source files so I can recompile my code right away without
further delaying my project.

My best regards, Leo

Nick Papior (nickpapior) said :
#9

Dear Leonardo,

First, sorry for the long processing time. Sadly, the time and support available for these large projects are very limited, so all the help you provide is very welcome!

1) The attachment failed; sadly, one cannot attach files on the Launchpad site. So if you have some other way of sending your input, it would be nice. Was it you who provided the ptcda-au example? As I said, I can't reproduce on that one either.
2) I have tried to reproduce your Pt example (111 using GGA on siesta web-page), to no avail.
3) I have tried with both full debug options as well as high optimizations and lower optimizations, neither show the bug.
4) I also tried all above tests with 4.1-736 version, to no avail.

So as it stands it seems that your problem is localized on your machine/hardware, something that makes it impossible for me to debug. :(

If you can provide any additional details, it would be really helpful. But as it stands it can be bugs in the software setup, hardware failures... Or....

Nick Papior (nickpapior) said :
#10

Could you try the new patch in
https://bugs.launchpad.net/siesta/+bug/1751723

I tried to look around, and found a few inconsistencies that may be what is needed...

Leonardo Fonseca (fonseca65) said :
#11

Hi Nick,

I made the changes but compilation is having trouble in the routine below.
The added subroutine pseudo_init_constant is not found in module
pseudopotential. See compilation output at the end of this message:

=== modified file 'Src/basis_types.f'
--- Src/basis_types.f 2017-12-15 10:34:49 +0000
+++ Src/basis_types.f 2018-04-03 18:33:23 +0000
@@ -23,7 +23,7 @@
 !
 !
       use atmparams, only: lmaxd, nzetmx, nsemx, nkbmx
-      use pseudopotential, only: pseudopotential_t
+      use pseudopotential, only: pseudopotential_t, pseudo_init_constant
       use precision, only: dp
       use sys, only : die

@@ -290,10 +290,12 @@
       p%lmxldaupj_requested = -1
       p%nkbshells = -1
       p%nldaushells = -1
+      p%nldauprojs_lm = -1
       p%nshells_tmp = -1
       p%label = 'Unknown'
       p%semic = .false.
-      p%ionic_charge = huge(1.0_dp) ! To signal it was not set
+      p%ionic_charge = huge(1.0_dp) ! To signal it was not set
+      call pseudo_init_constant(p%pseudopotential)
       nullify(p%lshell)
       nullify(p%kbshell)
       nullify(p%tmp_shell)

The compilation gives the error

Compilation architecture to be used: unknown
If this is not what you want, create the right
arch.make file using the models in Src/Sys

Hit ^C to abort...

==> Incorporating information about present compilation (compiler and flags)
make "FPPFLAGS=-DMPI -DFC_HAVE_ABORT -DCDF -DNCDF -DNCDF_4" compinfo.o
make[1]: Entering directory `/home/apascon/siesta-4.1-b3/Obj'
mpifort -c -O2 -fPIC -ftree-vectorize -I/home/apascon/siesta-4.1-b3/Docs/build/netcdf/4.4.1.1/include -DMPI -DFC_HAVE_ABORT -DCDF -DNCDF -DNCDF_4 compinfo.F90
make[1]: Leaving directory `/home/apascon/siesta-4.1-b3/Obj'

mpifort -c -O2 -fPIC -ftree-vectorize -I/home/apascon/siesta-4.1-b3/Docs/build/netcdf/4.4.1.1/include /home/apascon/siesta-4.1-b3/Src/basis_types.f
/home/apascon/siesta-4.1-b3/Src/basis_types.f:26.51:

      use pseudopotential, only: pseudopotential_t, pseudo_init_constant
                                                   1
Error: Symbol 'pseudo_init_constant' referenced at (1) not found in module 'pseudopotential'
make: ** [basis_types.o] Error 1

Nick Papior (nickpapior) said :
#12

Could you try without that part of the patch? It shouldn't (hopefully) be relevant.

Otherwise, simply remove the pseudo_init_constant call and use statements.
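With pseudo_init_constant removed, the only added line left in the excerpt quoted in #11 is p%nldauprojs_lm = -1, which follows the pattern of the surrounding lines: every field is set to a sentinel so that code reading it later can tell "never set" from a real value. The thread does not establish whether that particular line is what cures the crash. The sketch below only illustrates the sentinel pattern itself, with hypothetical names (basis_def_t, init_basis_def, nprojs_lm) that are not taken from SIESTA.

```fortran
! Illustrative sketch only: NOT the SIESTA source. All names are hypothetical.
program sentinel_init_demo
  implicit none

  type :: basis_def_t
    integer :: nkbshells
    integer :: nprojs_lm
  end type basis_def_t

  type(basis_def_t) :: p

  ! Put every field into a known "not set" state before any input is read
  call init_basis_def(p)

  ! Stand-in for input parsing that fills in only some of the fields
  p%nkbshells = 2

  ! A negative value now reliably means "never set", not leftover memory
  if (p%nprojs_lm < 0) then
    print *, 'nprojs_lm was never set; falling back to 0'
    p%nprojs_lm = 0
  end if

  print *, 'nkbshells =', p%nkbshells, ' nprojs_lm =', p%nprojs_lm

contains

  subroutine init_basis_def(p)
    type(basis_def_t), intent(out) :: p
    p%nkbshells = -1
    p%nprojs_lm = -1
  end subroutine init_basis_def

end program sentinel_init_demo
```

If a field is left out of such an initializer, its value is undefined and can differ between machines, compilers, and runs. That would be one possible reason for a crash that reproduces on one system but not on another, though this is a general observation and not a diagnosis of this case.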

Leonardo Fonseca (fonseca65) said :
#13

OK. Without pseudo_init_constant siesta compiles. However, the problem
is still there. The seg fault shows up when I run in serial. If I run in
parallel the job gets stuck in the same place but no error message is
issued:

reinit: -----------------------------------------------------------------------
reinit: System Name: stoich
reinit: -----------------------------------------------------------------------
reinit: System Label: stoich
reinit: -----------------------------------------------------------------------
Siesta Version: siesta-4.1--736
Architecture : unknown
Compiler flags: mpifort -O2 -fPIC -ftree-vectorize
PP flags : -DMPI -DFC_HAVE_ABORT -DCDF -DNCDF -DNCDF_4
Libraries : /home/apascon/lib/librefblas.a /home/apascon/lib/libreflapack.a /home/apasconn/lib/libscalapack.a libncdf.a libfdict.a -lnetcdff -lnetcdf -lhdf5_hl -lhdf5 -lz
PARALLEL version
NetCDF support
NetCDF-4 support

* Running in serial mode with MPI
>> Start of run: 4-APR-2018 16:57:41

initatom: Reading input for the pseudopotentials and atomic orbitals ----------

Reading PAOs and KBs from ascii files...
Species number: 1 Atomic number: 78 Label: Pt
Species number: 2 Atomic number: 8 Label: O
Species number: 3 Atomic number: 73 Label: Ta

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0 0x7F7E82350467
#1 0x7F7E82350AAE
#2 0x7F7E8163B24F
#3 0x6E467A in __m_matel_registry_MOD_register_in_rf_pool
#4 0x6E4D82 in register_rfs_
#5 0x4F97D9 in __m_siesta_init_MOD_siesta_init
Segmentation fault (core dumped)

Leonardo Fonseca (fonseca65) said :
#14

Problem solved!!! I made a mistake in my first test after your latest
message. So, the patch solved the problem after I removed the
pseudo_init_constant declaration and calls.

Thanks a lot Nick!

Nick Papior (nickpapior) said :
#15

Super! Great.

Thanks for your persistence and large gathering of information.

Nick Papior (nickpapior) said :
#16

Fixed
