Is matrix assembly really that slow?

Asked by Luis Linares

Hello,

I am quite impressed with FEniCS, but now I'm facing something that I'm not sure is my fault or something FEniCS might improve upon. My setup is as follows:

from dolfin import *

element_type = "Nedelec 1st kind H(curl)"
element_order = 2
V_c = FunctionSpace(mesh, element_type, element_order)  # mesh is 3D, read from an XML file
V = MixedFunctionSpace([V_c, V_c])  # real part, imaginary part
Er, Ei = TrialFunctions(V)
vr, vi = TestFunctions(V)

fr = Function(FunctionSpace(mesh, "DG", 0))  # constant value on each cell
fi = Function(FunctionSpace(mesh, "DG", 0))
# fr, fi filled somehow ...

wf = -fr*(dot(Er, vr)+dot(Ei, vi))*dx + fi*(dot(Ei, vr) - dot(Er, vi))*dx \
    + dot(curl(Er), curl(vr))*dx + dot(curl(Ei), curl(vi))*dx

n = FacetNormal(mesh)
# Add surface term at excitation boundary
d_exc = some_integer_index
r_v = Expression(['x[{0:d}]'.format(i) for i in range(3)])  # the position vector as an Expression, for use in the weak form
k_dir_c = Constant( k_dir.tolist() ) # k_dir is a 3-element numpy array ...
E0_c = Constant( E0.tolist() ) # ... and so is E0
# ... real part
#wf += dot(-k*cross(n, cross(Ei, n)) - k*sin(dot(k_dir_c, r_v))*cross(n, cross(E0_c, n-k_dir_c)), vr)*ds(d_exc)
# ... imaginary part
#wf += dot(k*cross(n, cross(Er, n)) - k*cos(dot(k_dir_c, r_v))*cross(n, cross(E0_c, n-k_dir_c)), vi)*ds(d_exc)

T = Function(V)
a, L = lhs(wf), rhs(wf)
problem = LinearVariationalProblem(a, L, T)

With a test mesh I get a system of about 460k DOFs, which takes around 6 minutes to assemble when running on 8 cores. When running single-threaded, it takes much longer. Is this expected, or can I do anything to improve the situation?
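For reference, the assembly step can be timed explicitly; a minimal sketch using DOLFIN's Timer, assuming the form a defined above:

t = Timer("matrix assembly")
A = assemble(a)
t.stop()
list_timings()  # print DOLFIN's internal timing breakdown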

Thanks,

Luis.

Question information

Language: English
Status: Solved
For: DOLFIN
Assignee: No assignee
Solved by: Garth Wells
Best Garth Wells (garth-wells) said :
#1

On 7 June 2012 17:41, Luis Linares <email address hidden> wrote:
> New question #199726 on DOLFIN:
> https://answers.launchpad.net/dolfin/+question/199726
>
> [original question quoted in full; trimmed]

It should be much faster. Can you attach the mesh and make sure that the code extract is complete (i.e., that I can copy-paste and run it)?

Garth


--
Garth N. Wells
Department of Engineering, University of Cambridge
http://www.eng.cam.ac.uk/~gnw20

Luis Linares (luis-linares) said :
#2

Garth,

I apologize for taking so long to reply. The notification somehow got lost in an avalanche of advisor e-mails.

My problem size has decreased to about 316k DOFs, and single-threaded assembly now takes around 30 seconds, so I think that's not too bad. It turns out the reported slowdown was caused by badly generated meshes that contained overlapping elements. Even though the timing is reasonable now, I would like to note that COMSOL reports a much faster assembly time, no more than ~4 seconds, for an identical problem (identical in structure and with approximately the same number of DOFs). Is there a bottleneck in the assembly process, or does COMSOL perhaps report as "assembly" only a subset of the assembly operations it performs?
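For anyone hitting the same issue: a quick sanity check along the following lines can catch degenerate cells (near-zero volumes); genuinely overlapping elements are harder to detect from within DOLFIN and are best caught in the mesh generator. A sketch, with a hypothetical file name:

from dolfin import *

mesh = Mesh("broken_mesh.xml")  # hypothetical file name
vols = [c.volume() for c in cells(mesh)]
print("min/max cell volume: %g / %g" % (min(vols), max(vols)))
print("hmin/hmax: %g / %g" % (mesh.hmin(), mesh.hmax()))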

Thanks!

Luis

Luis Linares (luis-linares) said :
#3

Thanks Garth Wells, that solved my question.

Johan Hake (johan-hake) said :
#4

Have you tried setting some optimization flags?

parameters["form_compiler"]["cpp_optimize"] = True

Also, you might want to change the quadrature degree:

parameters["form_compiler"]["quadrature_degree"] = 2

You might want to experiment with this value.

Johan

Luis Linares (luis-linares) said :
#5

Johan,

Yes, after posting about this issue I somehow came across the cpp_optimize option. I have also set ["cpp_optimize_flags"] = '-O2 -funroll-loops', and all of this makes things go much faster.

Regarding ["quadrature_degree"] = 2, I guess the experimentation consists of estimating the accuracy obtained for each value, right?
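Something like the following sketch is what I have in mind, with Eh standing in for a solution Function from an earlier solve: recompute a derived quantity at increasing degrees and pick the smallest degree after which it stops changing.

for q in range(1, 6):
    parameters["form_compiler"]["quadrature_degree"] = q
    energy = assemble(dot(curl(Eh), curl(Eh))*dx)  # Eh: hypothetical solution field
    print("degree %d: energy = %.10g" % (q, energy))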

Thanks for your suggestions.

Cheers,

Luis

Johan Hake (johan-hake) said :
#6

On 06/18/2012 08:31 AM, Luis Linares wrote:
> [quoted Luis's question about experimenting with the quadrature degree; trimmed]

Yes, and others will be able to tell you more about this.

Johan


Joachim Haga (jobh) said :
#7

If you haven't already,

  parameters["form_compiler"]["optimize"] = True

should also help (the reported instruction count for computing the element tensor is reduced from 18M to 1.2M).
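Putting the suggestions in this thread together, a typical preamble would look something like this (set these before the first assemble call, since the form compiler reads them at compile time):

parameters["form_compiler"]["cpp_optimize"] = True
parameters["form_compiler"]["cpp_optimize_flags"] = "-O2 -funroll-loops"
parameters["form_compiler"]["optimize"] = True
parameters["form_compiler"]["quadrature_degree"] = 2  # experiment with this value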

-j

Martin Sandve Alnæs (martinal) said :
#8

The automatically determined quadrature degree aims to integrate the form exactly, and it can become excessively high for complicated equations. We should probably do some tweaking there, or at least make it more visible and easier to control.
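Depending on the UFL version, the degree can also be attached to a single measure rather than set globally; a sketch (the exact syntax depends on the UFL version), reusing the forms from the question:

dx2 = dx(metadata={"quadrature_degree": 2})  # overrides the automatic choice for these terms only
wf = dot(curl(Er), curl(vr))*dx2 + dot(curl(Ei), curl(vi))*dx2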