Problem running Fluidity on HECToR
Should I run flredecomp before I copy the files over to HECToR? I have tried it and I get errors:
Error reading halo file mesh/tank_0.halo
Zero process file not found
*** ERROR ***
Error message: Unable to read halos with name mesh/tank
(the two lines above are repeated once per MPI rank)
-------
MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
with errorcode 16.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
-------
-------
mpiexec has exited due to process rank 8 with PID 27371 on
node osito exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpiexec (as reported here).
-------
[osito:27362] 15 more processes have sent help message help-mpi-api.txt / mpi-abort
[osito:27362] Set MCA parameter "orte_base_
If I add it to the job file, I think I need to use aprun instead of mpiexec? I assume I would then start it with
aprun -n 16 -N 1
but I am not sure how I add the flredecomp part.
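Based on my reading of the Fluidity manual, I would guess the job script needs something like the fragment below, but I have not tested it; the flredecomp flags and the output basename are my assumptions:

```shell
# My guess (untested): run flredecomp under aprun with the target number of
# processes, then run fluidity on the redecomposed problem. The -i/-o flags
# (input/output partition counts) and the "_flredecomp" output basename are
# assumptions taken from the manual, not something I have verified on HECToR.
aprun -n 16 flredecomp -i 1 -o 16 plume_tank plume_tank_flredecomp
aprun -n 16 -N 1 fluidity -l -v2 plume_tank_flredecomp.flml
```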
Thanks
I am sure that I am doing something simple wrong.
I have a .flml file and the .msh, .geo and .geo~ files in my work directory on HECToR. I have written a job script based on the example I found on the AMCG website.
#!/bin/bash --login
#PBS -N fluidity_run
#PBS -l mppwidth=16
#PBS -l mppnppn=1
#PBS -l walltime=12:00:00
#PBS -A n03-lb
module swap PrgEnv-cray PrgEnv-fluidity
# Change to the directory that the job was submitted from
cd $PBS_O_WORKDIR
# The following takes a copy of the Fluidity Python directory and
# puts it in the current directory. If we don't do this, we get import errors.
export WORKING_DIR=$(pwd -P)
cp -r /usr/local/
export PYTHONPATH=
# Set the number of MPI tasks
export NPROC=`qstat -f $PBS_JOBID | awk '/mppwidth/ {print $3}'`
# Set the number of MPI tasks per node
export NTASK=`qstat -f $PBS_JOBID | awk '/mppnppn/ {print $3}'`
aprun -n $NPROC -N $NTASK fluidity -l -v2 plume_tank.flml
# clean up the python directory
rm -rf python
However, when I submit the job I get error files such as:
*** ERROR ***
Error message: gmsh file mesh/tank_0.msh not found
Rank 0 [Tue Dec 18 15:43:00 2012] [c5-1c0s1n2] application called MPI_Abort(
I ran the command make before I copied the files over to HECToR.
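In case it helps, here is a quick diagnostic I could run before submitting; the per-rank naming scheme (mesh/tank_0.msh up to mesh/tank_15.msh for 16 processes) is my inference from the error message above, not something I have confirmed:

```shell
# Diagnostic sketch (my own, based on the "mesh/tank_0.msh not found" error):
# a 16-process parallel run appears to expect one gmsh file per process.
# Count how many of those per-process files are missing.
NPARTS=16
missing=0
for i in $(seq 0 $((NPARTS - 1))); do
    [ -f "mesh/tank_${i}.msh" ] || missing=$((missing + 1))
done
echo "missing per-process mesh files: ${missing} of ${NPARTS}"
```

If all 16 are missing, the mesh was never decomposed, which would explain both this error and the earlier halo one.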
The .o file gives
-------
*** marie Job: 1028017.sdb starts: 18/12/12 15:42:16 host: phase3 ***
User may access requested budget
Application 3262579 exit codes: 134
Application 3262579 resources: utime ~40s, stime ~0s
-------
Resources requested: mpparch=
Resources allocated: cpupercent=
*** marie Job: 1028017.sdb ends: 18/12/12 15:43:06 queue: par:16n_12h ***
and the .e file gives
_pmiu_daemon(
(the truncated line above is repeated 16 times)
[NID 02466] 2012-12-18 15:43:00 Apid 3262579: initiated application termination
I am sure that I have done something simple wrong, but despite all the reading and searching I have done on HECToR, I cannot work out what I need to change to make it work.
Any help is much appreciated.
Thanks
Question information
- Language: English
- Status: Answered
- For: Fluidity
- Assignee: No assignee