How to create, load, manipulate and render HwU histograms?

Created by Valentin Hirschi
Keywords:
HwU
Last updated by:
Rikkert Frederix

HwU is a new standard for creating histograms in the fortran analysis, whether at fixed order or after showering. It is very similar to the 'hplot' standard used before, except that a single histogram can now store many different weights at once (and they can also be filled *at once*), so as to account for scale, PDF and MC uncertainties. The raw output data file is in the 'HwU' format, which is very basic and similar to the data section of the good old topdrawer.

There is also a generic, robust and flexible python module ('madgraph.various.histograms') to handle this HwU output. One can load any 'HwU' output file with 'histograms.HwUList(<HwU_data_file_path>)', which stores all the histograms read in the resulting HwUList. The HwU histograms it contains can then be manipulated (addition, subtraction, multiplication and division) as if they were numbers, and the statistical error is correctly propagated through these manipulations.
One can then output a HwUList simply with 'my_HwUList.output(<out_path_without_extension>, format='gnuplot')' to create the two files '<out_path_without_extension>.HwU' and '<out_path_without_extension>.gnuplot', which can be used to produce the plots directly by invoking gnuplot.
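
To illustrate what "manipulated as if they were numbers" with error propagation means, here is a minimal self-contained sketch for a single bin. The class 'ToyBin' is hypothetical and is not part of 'madgraph.various.histograms'; it assumes the two operands are statistically uncorrelated and uses the standard quadrature formulas, which may differ in detail from what the module does.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyBin:
    """One histogram bin: central value and statistical error (hypothetical class)."""
    value: float
    error: float

    def __add__(self, other):
        # Errors of uncorrelated quantities add in quadrature.
        return ToyBin(self.value + other.value,
                      math.hypot(self.error, other.error))

    def __sub__(self, other):
        # Same quadrature sum as for addition.
        return ToyBin(self.value - other.value,
                      math.hypot(self.error, other.error))

    def __mul__(self, other):
        # d(a*b)^2 = (b*da)^2 + (a*db)^2
        return ToyBin(self.value * other.value,
                      math.hypot(self.error * other.value,
                                 other.error * self.value))

    def __truediv__(self, other):
        # d(a/b)^2 = (da/b)^2 + (a*db/b^2)^2
        return ToyBin(self.value / other.value,
                      math.hypot(self.error / other.value,
                                 self.value * other.error / other.value ** 2))


# Made-up numbers for one bin of an NLO and a LO histogram.
nlo = ToyBin(12.0, 0.3)
lo = ToyBin(8.0, 0.4)
total = nlo + lo       # value 20.0, error sqrt(0.3^2 + 0.4^2)
k_factor = nlo / lo    # value 1.5, error from the division formula above
```

With real HwU objects the same operators would be applied to whole histograms (all bins and all weights) rather than to a single bin.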

Notice that the titles of the plots output by fortran can be semantic, i.e. options can be specified by appending to the title
'|<option_name>@<option_value>'
The possible option names and values are:
'Type' or its shortcut 'T' with any string value. Typically one should set it to things like 'LO', 'NLO' or 'CUT1', 'CUT2', etc.
'x_axis' or its shortcut 'X' with values in ['LIN','LOG']
'y_axis' or its shortcut 'Y' with values in ['LIN','LOG']

The last two of course specify the axis scales of the histogram to be plotted.
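
As an illustration of the title convention above, here is a small sketch that splits a semantic title into its name and options. The function 'parse_title' is hypothetical and is not the parser used by the histograms module; it also assumes that several options can be chained with '|', which the text above does not state explicitly.

```python
# Hypothetical helper, not part of madgraph.various.histograms.
SHORTCUTS = {"T": "Type", "X": "x_axis", "Y": "y_axis"}
AXIS_VALUES = {"LIN", "LOG"}


def parse_title(raw_title):
    """Split 'name|opt@value|...' into (name, {option_name: option_value})."""
    name, *chunks = raw_title.split("|")
    options = {}
    for chunk in chunks:
        key, sep, value = chunk.partition("@")
        if not sep:
            raise ValueError(f"option {chunk!r} has no '@' separator")
        # Expand the one-letter shortcuts to the full option names.
        key = SHORTCUTS.get(key.strip(), key.strip())
        value = value.strip()
        if key in ("x_axis", "y_axis") and value not in AXIS_VALUES:
            raise ValueError(f"{key} must be 'LIN' or 'LOG', got {value!r}")
        options[key] = value
    return name.strip(), options


# Example title (made up): an NLO histogram with a logarithmic y axis.
title, opts = parse_title("top pt |T@NLO|X@LIN|Y@LOG")
```

Here 'title' is the bare histogram name and 'opts' maps the full option names ('Type', 'x_axis', 'y_axis') to their values.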

Finally, the scale and PDF uncertainty envelopes are computed automatically by the module at output time. The optimal y-range is also computed automatically for each plot at that time.
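
For orientation, a scale uncertainty envelope is commonly taken as the bin-by-bin minimum and maximum over all scale-variation weights. The sketch below assumes that simple min/max convention; it is not the module's actual implementation, and PDF uncertainties usually follow the prescription of the PDF set instead, which is not shown. The function name 'envelope' and the numbers are made up.

```python
def envelope(weights_per_bin):
    """Return (lowest, highest) over all variation weights, for each bin."""
    return [(min(weights), max(weights)) for weights in weights_per_bin]


# Made-up example: two bins, each with a central weight and two scale variations.
scale_weights = [
    [10.0, 11.2, 9.1],
    [5.0, 5.6, 4.4],
]
bands = envelope(scale_weights)
```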

Whenever you have one or several .HwU data files somewhere, you can use the '__main__' of the python module 'madgraph/various/histograms.py' to plot their content, with

python histograms.py <(path_to)file1.HwU> <(path_to)file2.HwU> ... --out=<PlotOutputName>

This creates the "combined HwU" file <PlotOutputName>.HwU and the gnuplot card <PlotOutputName>.gnuplot that you can run with

gnuplot <PlotOutputName>.gnuplot

You'll notice that all plots with the same name are put together in a group, which is shown in the upper layout of a single histogram. There is then one page (with one such histogram) for each group identified by a given title (provided all histograms loaded with this title have identical x-bin boundaries).
For each of these "grouped histograms" there is a middle inset which shows the relative uncertainties of the *first* histogram in the group, and a lower inset which shows the ratio between the *first two* histograms in the group.

It is therefore important to know in which order the various histograms are put into groups. The order is first given by the order of the files specified, so ALL plots of file 1 will show up before those of file 2 in any group of histograms sharing the same name.
Then, within a given file, the order is given by the type attributed to the histograms. There is a default precedence setting, which you can alter with the option '--types=LO,<whatever_type2>,NLO,<whatever_type1>,etc..' when calling 'python histograms.py ...'.
Also, any type that isn't specified in the list when the '--types' option is present will be filtered out, hence rendering the plots less busy.
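
The ordering and filtering rules described above can be sketched as follows. This is only an illustration of the described behaviour, not the code used by 'histograms.py'; the names 'Histo' and 'order_group' are hypothetical, and the default type precedence is not modelled since it is not spelled out here.

```python
from collections import namedtuple

# Hypothetical record: which input file a histogram came from and its type.
Histo = namedtuple("Histo", ["file_index", "type"])


def order_group(histos, types):
    """Filter and sort one group of histograms sharing the same title.

    Types absent from `types` are dropped; the rest are sorted by input
    file first, then by their position in `types`.
    """
    rank = {t: i for i, t in enumerate(types)}
    kept = [h for h in histos if h.type in rank]
    return sorted(kept, key=lambda h: (h.file_index, rank[h.type]))


# Two input files, each providing a LO and an NLO histogram with the same title.
group = [Histo(1, "LO"), Histo(1, "NLO"), Histo(2, "LO"), Histo(2, "NLO")]
both = order_group(group, ["NLO", "LO"])   # file 1 NLO, file 1 LO, file 2 NLO, file 2 LO
nlo_only = order_group(group, ["NLO"])     # LO histograms are filtered out
```

Since the insets use the first (and second) histogram of the group, the first entries of the returned list are the ones that would drive the uncertainty and ratio insets.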

Here are two examples:

If you have two HwU files from a fixed-order computation, mt_170.HwU and mt_160.HwU, then

python histograms.py mt_170.HwU mt_160.HwU

will show all four curves (NLO and LO for mt=170 and mt=160) on the upper layout, the scale uncertainty of NLO_170 in the middle inset and the K-factor NLO/LO for mt=170 in the lower inset. Doing

python histograms.py mt_170.HwU mt_160.HwU --types=NLO

will restrict the output to the NLO plots only, and the lower inset will show the ratio mt_170/mt_160.

Now if you have a single file multiple_cuts.HwU with histograms for various sets of cuts (booked with titles carrying the type suffixes '|T@CUT1', '|T@CUT2', '|T@CUT3', etc.), then doing:

python histograms.py multiple_cuts.HwU --types=CUT4,CUT2

will show the ratio CUT4/CUT2 in the lower inset.

Notice that you can have more than one ratio in the lower inset. If you set the option '--n_ratios=<i>' when calling the __main__ of 'histograms.py', then the ratios for the second up to the '(i+1)-th' histogram in the group will all be shown.

Finally, you can disable the rendering of the scale, PDF or statistical uncertainties with the options '--no_scale', '--no_pdf' and/or '--no_stat'. You can find the full list of options and some details using:

python histograms.py --help