Is matrix assembly really that slow?

Asked by Luis Linares

Hello,

I am quite impressed with FEniCS, but now I'm facing something that I'm not sure is my fault or something FEniCS might improve upon. My setup is as follows:

from dolfin import *

element_type = "Nedelec 1st kind H(curl)"
element_order = 2
V_c = FunctionSpace(mesh, element_type, element_order)  # mesh is 3D, read from an XML file
V = MixedFunctionSpace([V_c, V_c])  # real part, imaginary part
Er, Ei = TrialFunctions(V)
vr, vi = TestFunctions(V)

fr = Function(FunctionSpace(mesh, "DG", 0))  # constant value on each cell
fi = Function(FunctionSpace(mesh, "DG", 0))
# fr, fi filled somehow ...

wf = -fr*(dot(Er, vr)+dot(Ei, vi))*dx + fi*(dot(Ei, vr) - dot(Er, vi))*dx \
    + dot(curl(Er), curl(vr))*dx + dot(curl(Ei), curl(vi))*dx

n = FacetNormal(mesh)
# Add surface term at excitation boundary
d_exc = some_integer_index
r_v = Expression(['x[{0:d}]'.format(i) for i in range(3)])  # the position vector as an Expression, for use in the weak form
k_dir_c = Constant( k_dir.tolist() ) # k_dir is a 3-element numpy array ...
E0_c = Constant( E0.tolist() ) # ... and so is E0
# ... real part
#wf += dot(-k*cross(n, cross(Ei, n)) - k*sin(dot(k_dir_c, r_v))*cross(n, cross(E0_c, n-k_dir_c)), vr)*ds(d_exc)
# ... imaginary part
#wf += dot(k*cross(n, cross(Er, n)) - k*cos(dot(k_dir_c, r_v))*cross(n, cross(E0_c, n-k_dir_c)), vi)*ds(d_exc)

T = Function(V)
a, L = lhs(wf), rhs(wf)
problem = LinearVariationalProblem(a, L, T)

With a test mesh I get a system of about 460k DOFs, which takes around 6 minutes to assemble when running on 8 cores. When running single-threaded, it takes much longer. Is this expected, or can I do anything to improve the situation?
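For reference, the assembly step can be timed explicitly; a minimal sketch using DOLFIN's Timer, assuming the form a defined above:

t = Timer("matrix assembly")
A = assemble(a)
t.stop()
list_timings()  # print DOLFIN's internal timing breakdown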

Thanks,

Luis.

Question information

Language: English
Status: Solved
For: DOLFIN
Assignee: No assignee
Solved by: Garth Wells
Best Garth Wells (garth-wells) said :
#1

On 7 June 2012 17:41, Luis Linares <email address hidden> wrote:
> New question #199726 on DOLFIN:
> https://answers.launchpad.net/dolfin/+question/199726
>
> [original question quoted in full; trimmed]

It should be much faster. Can you attach the mesh and make sure that the code extract is complete (i.e., that I can copy-paste and run it)?

Garth


--
Garth N. Wells
Department of Engineering, University of Cambridge
http://www.eng.cam.ac.uk/~gnw20

Luis Linares (luis-linares) said :
#2

Garth,

I apologize for taking so long to reply. The notification somehow got lost in an avalanche of advisor e-mails.

My problem size has decreased to about 316k DOFs, and single-threaded assembly now takes around 30 seconds, so I think that's not too bad. It turns out the reported slowdown was caused by badly generated meshes that contained overlapping elements. Even though the timing is reasonable now, I would like to note that COMSOL reports a much faster assembly time, no more than ~4 seconds, for an identical problem (identical in structure and with approximately the same number of DOFs). Is there a bottleneck in the assembly process, or does COMSOL perhaps report as "assembly" only a subset of the assembly operations it performs?
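For anyone hitting the same issue: a quick sanity check along the following lines can catch degenerate cells (near-zero volumes); genuinely overlapping elements are harder to detect from within DOLFIN and are best caught in the mesh generator. A sketch, with a hypothetical file name:

from dolfin import *

mesh = Mesh("broken_mesh.xml")  # hypothetical file name
vols = [c.volume() for c in cells(mesh)]
print("min/max cell volume: %g / %g" % (min(vols), max(vols)))
print("hmin/hmax: %g / %g" % (mesh.hmin(), mesh.hmax()))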

Thanks!

Luis

Luis Linares (luis-linares) said :
#3

Thanks Garth Wells, that solved my question.

Johan Hake (johan-hake) said :
#4

Have you tried setting some optimization flags?

parameters["form_compiler"]["cpp_optimize"] = True

Also, you might want to change the quadrature degree:

parameters["form_compiler"]["quadrature_degree"] = 2

You might want to experiment with this value.

Johan

Luis Linares (luis-linares) said :
#5

Johan,

Yes, after posting about this issue I somehow came across the cpp_optimize option. I have also set ["cpp_optimize_flags"] = '-O2 -funroll-loops', and all of this makes things go much faster.

Regarding ["quadrature_degree"] = 2, I guess the experimentation consists of estimating the accuracy obtained for each value, right?
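Something like the following sketch is what I have in mind, with Eh standing in for a solution Function from an earlier solve: recompute a derived quantity at increasing degrees and pick the smallest degree after which it stops changing.

for q in range(1, 6):
    parameters["form_compiler"]["quadrature_degree"] = q
    energy = assemble(dot(curl(Eh), curl(Eh))*dx)  # Eh: hypothetical solution field
    print("degree %d: energy = %.10g" % (q, energy))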

Thanks for your suggestions.

Cheers,

Luis

Johan Hake (johan-hake) said :
#6

On 06/18/2012 08:31 AM, Luis Linares wrote:
> [quoted Luis's question about experimenting with the quadrature degree; trimmed]

Yes, and others will be able to tell you more about this.

Johan


Joachim Haga (jobh) said :
#7

If you haven't already,

  parameters["form_compiler"]["optimize"] = True

should also help (the reported instruction count for computing the element tensor is reduced from 18M to 1.2M).
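Putting the suggestions in this thread together, a typical preamble would look something like this (set these before the first assemble call, since the form compiler reads them at compile time):

parameters["form_compiler"]["cpp_optimize"] = True
parameters["form_compiler"]["cpp_optimize_flags"] = "-O2 -funroll-loops"
parameters["form_compiler"]["optimize"] = True
parameters["form_compiler"]["quadrature_degree"] = 2  # experiment with this value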

-j

Martin Sandve Alnæs (martinal) said :
#8

The automatically determined quadrature degree aims to integrate the form exactly, and it can become excessively high for complicated equations. We should probably do some tweaking there, or at least make it more visible and easier to control.
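Depending on the UFL version, the degree can also be attached to a single measure rather than set globally; a sketch (the exact syntax depends on the UFL version), reusing the forms from the question:

dx2 = dx(metadata={"quadrature_degree": 2})  # overrides the automatic choice for these terms only
wf = dot(curl(Er), curl(vr))*dx2 + dot(curl(Ei), curl(vi))*dx2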