[ROOT-6977] Draw option "CONT4Z" not working properly for TGraph2D in logarithmic scale. Created: 17/Dec/14  Updated: 18/Dec/14  Resolved: 18/Dec/14

Status: Closed
Project: ROOT
Component/s: Graphics
Affects Version/s: 5.34/00
Fix Version/s: None

Type: Bug Priority: Low
Reporter: Dominik Martin Vilsmeier (Inactive) Assignee: Olivier Couet
Resolution: Clarified Votes: 0
Labels: None

Ubuntu 14.04 LTS

Attachments: PNG File TGraph2D_SCAT.png     PNG File TH2D_SCAT.png     PNG File graph2d.png     File graph2d_in_log_scale.c     PNG File hist2d.png     File hist2d_in_log_scale.c    


I'm trying to plot some data using TGraph2D with a logarithmic x scale and a linear y scale. However, the graph is only drawn for x-values above a certain threshold. Below this threshold the canvas is blank.

When one switches to a linear x scale, one can observe a blank padding between the drawn graph and the axes (on all sides). It looks like the padding value at minimum x corresponds to the threshold value in the log scale, i.e. the padding value is kept when changing to log scale. I tried changing to log scale both before and after the graph is drawn. I'm not sure if this is the reason for the malfunction; it is just an idea.
When I change the axis range (using SetRangeUser) to [xmin/10, xmax], the graph in linear x scale reaches even below 0 (although there are no points < 0). In log scale the threshold value also changes, but the log scale itself is changed in an odd way.

Trying the same with a TH2D works fine.

Please consider attachments for visualization.

Comment by Dominik Martin Vilsmeier (Inactive) [ 17/Dec/14 ]

Adding images of the scripts' output for visualization.

Comment by Olivier Couet [ 18/Dec/14 ]

Maybe the protection against 0 in log scale is done differently in the two cases.
I will check.
Note that when you use CONT4, which is a histogram plotting option, a TH2 is also plotted. This TH2 is built from the graph2d.

Comment by Olivier Couet [ 18/Dec/14 ]

I see the difference. In the case of the 2D histogram you book a histogram with non-equidistant bins, which is completely different from what TGraph2D does. When you plot a TGraph2D with option cont4z, a TH2D is created with equidistant bins and filled using a linear interpolation on the Delaunay triangles. Looking at how regular your distribution is, I think a TH2 is better in your case.

Comment by Dominik Martin Vilsmeier (Inactive) [ 18/Dec/14 ]

Thank you for your reply! I have a few remarks:

1. Even if a TH2D with equidistant bins is generated and an interpolation (for the z values) is performed, I still don't see how the left part of the drawing can be blank. The range of the histogram should still reach from xmin to xmax and every z-value should be linked to a (non-white) colour. So the left part should be filled with a colour, as is the case for the TH2D.

2. Why is a histogram with equidistant bins generated instead of a histogram whose bin centres represent the x- and y-values in the graph? Interpolation isn't needed then, as the z-values in the graph correspond to the ones in the histogram. Also, with equidistant bins the values are plotted at different x- and y-positions than they originally had in the graph (in the case of originally non-equidistant x-/y-values).

3. Using the draw option "SCAT" (which is a TH2D draw option like CONT4) this left part is drawn also for the TGraph2D. This is better visible by setting the z-values in the scripts to for example 1/(x+y) (instead of x*y). However as one can see from the formula and the comparison with the TH2D the previously blank part is drawn incorrectly for the TGraph2D (I uploaded another two images).

Comment by Olivier Couet [ 18/Dec/14 ]

1. I was just pointing out that the two cases are not the same. The difference is not TH2 vs TGraph2D but equidistant bins vs non-equidistant bins. The difference is clearly seen when you use the option BOX. In the case of equidistant bins the white part corresponds to the first bin, stretched by the log scale.
Your non-equidistant TH2 has many bins in that area where the equidistant one has only one.

2. Your case is special: you have a regular grid. But the general case for graph2d is totally random points in the X Y plane. One Delaunay triangle may cover several bins. You can set more bins on the X axis, e.g. g2d->SetNpx(500); but that will not be as good as your TH2 with non-equidistant bins.

3. You can also try COL.
3.1 Also try specific TGraph2D options like TRI1 or P0, or a combination of both.

Comment by Olivier Couet [ 18/Dec/14 ]

Also note that the contour algorithms use the bin centres to draw the contour. That is why you have a white margin around a contour drawn with filled color. In log scale with equidistant bins the first bin is very wide, so the margin is large.
Algorithms like COL, BOX and SCAT draw the complete bin.
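
To put a rough number on how large that margin becomes, here is a minimal sketch in plain C++ (no ROOT needed). The helper name `logFractionBelowFirstCentre` is hypothetical, and the bin counts used in the note below are assumptions for illustration, not values confirmed in this report.

```cpp
#include <cmath>

// Hypothetical helper: fraction of a logarithmic axis [xmin, xmax] that lies
// below the centre of the first of `nbins` equal-width (linear) bins.
// Contour options start drawing at that centre, so this fraction stays blank.
// Requires 0 < xmin < xmax and nbins > 0.
double logFractionBelowFirstCentre(double xmin, double xmax, int nbins) {
    const double width = (xmax - xmin) / nbins;      // linear bin width
    const double firstCentre = xmin + 0.5 * width;   // centre of the first bin
    return std::log10(firstCentre / xmin) / std::log10(xmax / xmin);
}
```

For the x range 0.1 to 1000 discussed in this report, an assumed 40 bins leaves roughly half of the log axis below the first bin centre, while 500 bins (as in `SetNpx(500)`) reduces that to roughly a quarter.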

Comment by Dominik Martin Vilsmeier (Inactive) [ 18/Dec/14 ]

1. I understand your point. Given that a histogram with equidistant bins is generated and the contour plot is drawn at the bin centres, this gives a bin size of (1000-0.1)/100 ~ 10 and therefore a first (non-underflow) bin centre at about 5. Thus I would expect the area for x < 5 to be blank, but in the plot it is the area corresponding to x < ~12. A scatter plot also shows a change in density along x within the area x < 12. If this is one bin, does that mean that SCAT interpolates values within the bins and draws them accordingly?

2. Given a set of N points (x, y, z), and assuming that none of the x and y values match (i.e. they are distributed randomly in the x-y-plane), one can generate a histogram with N*N bins such that each bin centre corresponds to an (x, y)-pair and the bin content is set to the corresponding z-value. One does not need Delaunay triangulation then. As the method of generating a histogram with equidistant bins does not work very well, I think something like the latter would be preferable.

3. I already resolved my own work by using a TH2D (the scripts are only for demonstration). My point was: why are the draw options of TH2D offered for TGraph2D (with non-equidistant data points) if they give wrong outputs? Because I used a contour plot I became aware of something going wrong, but if I had used a scatter plot instead I might not have realized that my data is plotted in a wrong way (and the difference can be quite drastic, as in the example images I attached before).

Comment by Dominik Martin Vilsmeier (Inactive) [ 18/Dec/14 ]

I have to add another remark for point 1:
I assumed that, when drawn, the graph knows how many different x-values were fed in, but as you said before, having such a grid is a special case and the graph does not take this (or any grid-like structure) into account. So when generating the TH2D from the graph, how does it know how many bins there are in the x- and y-directions? Is it calculated from the number of Delaunay triangles along the different directions? In any case the binning obviously changes, as one can observe on the y-axis: for the TH2D there is a margin of 5 (which corresponds to 10 bins over y = 0 .. 100 with the contour drawn at the bin centre, as you said), but in the TGraph2D plot the margin is a lot smaller (roughly 2), meaning there are actually a lot more y-bins than in the original data. Correspondingly, the first bin in x is wider than 10, apparently about 24 with the margin at 12, which is another source of errors.

Comment by Olivier Couet [ 18/Dec/14 ]

TGraph2D is based on Delaunay interpolation. We cannot bypass it as you suggested in point 2.
We will soon use a better implementation of it.

The CONT and SURF options work with bin centres. Therefore what you see with equidistant bins is correct. Options like SCAT, BOX and COL work with the complete bin. COL might be better in your case. The TRI options are specific TGraph2D options.

In your case the best solution is a TH2D with non-equidistant bins.

Seems to me there is no bug to fix.

Comment by Dominik Martin Vilsmeier (Inactive) [ 18/Dec/14 ]

Well you can call it a bug or something else but from my point of view one should either forbid the usage of TH2D draw options for TGraph2D if they are error prone in certain cases or one should fix the usage of them such that the output is correct. Otherwise users might run into trouble using those options.
I understand that TGraph2D is based on Delaunay triangulation, but the moment you draw it with a TH2D option a TH2D seems to be actually generated and drawn (with equidistant bins, as you said). So why not create a TH2D with non-equidistant bins instead, such that the bin centres correspond to the values in the x-y-plane? The output then corresponds to the actual values.
Implementing how to find the bin low edges corresponding to the given bin centres might of course be non-trivial, but it is definitely feasible.
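
One simple way to approach this, sketched below in plain C++, is to place each inner edge halfway between neighbouring data values and mirror the outermost half-widths. This is only an illustration under stated assumptions, not what ROOT does: the function name `edgesFromCentres` is hypothetical, and with this choice every data value lies inside its own bin but not necessarily at the exact arithmetic centre.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: derive n+1 bin edges from n strictly increasing values.
// Inner edges are midpoints between neighbours; the two outer edges mirror
// the adjacent half-width so the first and last values sit at their bin centres.
std::vector<double> edgesFromCentres(const std::vector<double>& centres) {
    const std::size_t n = centres.size();
    if (n < 2) throw std::invalid_argument("need at least two centres");
    for (std::size_t i = 1; i < n; ++i) {
        if (centres[i] <= centres[i - 1])
            throw std::invalid_argument("centres must be strictly increasing");
    }
    std::vector<double> edges(n + 1);
    for (std::size_t i = 1; i < n; ++i)
        edges[i] = 0.5 * (centres[i - 1] + centres[i]);        // inner midpoints
    edges[0] = centres[0] - (edges[1] - centres[0]);            // mirrored low edge
    edges[n] = centres[n - 1] + (centres[n - 1] - edges[n - 1]); // mirrored high edge
    return edges;
}
```

For the values 1, 2, 4, 8 this yields the edges 0.5, 1.5, 3, 6, 10. Note that for widely spaced values the mirrored low edge can become zero or negative, which would need separate handling before such edges are used on a logarithmic axis.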

Comment by Olivier Couet [ 18/Dec/14 ]

I have updated the TGraph2D help to clarify the difference between the TGraph2D drawing options and the TH2 ones. The contour you get is perfectly correct. And if you create a 2D histogram outside the TGraph2D context it will be the same result.
I am not sure it is easy to create a 2D histogram with non-equidistant bins such that the bin centres correspond to the values in the x-y-plane. In the general case of random points in the XY plane that's a bit tricky, I guess. I have to think about it.
Then, if we have that algorithm, we do not need a TGraph2D anymore. This algorithm will simply be a new TH2D constructor. Each bin will be filled by the Z value corresponding to its center. That could be an interesting add-on to TH2D...

Comment by Dominik Martin Vilsmeier (Inactive) [ 18/Dec/14 ]

Well it is correct except this blank part on the left. But if you use a draw option which draws the whole bin, like SCAT or COL, the results actually differ (and I think this is the more problematic case because it is not necessarily obvious that what you get in the plot might not correspond to the underlying data anymore).
And I don't think you can drop TGraph2D in favour of having everything in TH2, because the two classes just have different purposes. TH2 is meant to collect events and gather them into bins, while TGraph2D saves every event individually. The moment you want to draw a graph as a histogram the algorithm will come up with a certain binning, but this might change later in the code when you add new events to the graph. A histogram, on the other hand, has a fixed binning and will only change the number of collected events. Putting it all in one class is confusing in my opinion and actually impractical, as you might want to use the two classes for different situations (i.e. for different purposes).

Comment by Olivier Couet [ 18/Dec/14 ]

I think the fact that CONT4 fills the contours makes it confusing. In fact CONT4 is exactly like CONT1: it computes the contour lines going through the bin centres. CONT1 draws only the lines, and CONT4 in addition fills the space between the lines. So it looks a bit like the option COL, but it is totally different. In comparison, all the other options (COL, BOX, SCAT, etc.) always draw the entire bins. It is a totally different technique.

We thought a bit about your idea of automatically creating the bins of a non-equidistant histogram. That does not fly in the general case where you have random points, because most of the bins will be empty. In the general case you will need a TH2Poly, like in the example $ROOTSYS/tutorials/math/kdTreeBinning.C

Your case is special: you have a regular grid.
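
The difference between the two cases can be illustrated with a small counting sketch in plain C++. The names `Point` and `filledFraction` are hypothetical. The sketch assumes one bin column per distinct x value and one bin row per distinct y value, and reports which fraction of the resulting cells actually holds a data point.

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

struct Point { double x, y; };  // hypothetical minimal (x, y) data point

// Fraction of cells that contain a point when the grid has one column per
// distinct x value and one row per distinct y value.
double filledFraction(const std::vector<Point>& pts) {
    if (pts.empty()) return 0.0;
    std::set<double> xs, ys;
    std::set<std::pair<double, double>> cells;
    for (const Point& p : pts) {
        xs.insert(p.x);
        ys.insert(p.y);
        cells.insert({p.x, p.y});
    }
    return static_cast<double>(cells.size()) /
           static_cast<double>(xs.size() * ys.size());
}
```

A complete regular grid fills every cell, whereas N points with pairwise distinct x and y values fill only N of N*N cells, so the filled fraction drops as 1/N.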

Comment by Dominik Martin Vilsmeier (Inactive) [ 18/Dec/14 ]

I agree on that. Even when extrapolating the data to empty bins the whole plot would rather have an artificial character as most of the parts are post-generated data points.

But since drawing a TGraph2D with TH2 options for entire bin drawing might lead to distorted results in the plot I think it would be better to restrict usage of them. Users might be unaware of the effects of those methods on the plot of their data points.

And maybe it is worth implementing a class which accounts for regular data points arranged on a grid. The difference to TH2 would be that one can add grid lines (and points on those lines accordingly) "on the fly", i.e. having a flexible structure (in contrast to TH2 with a fixed binning). Drawing such a grid could then be realized by using a TH2 with non-equidistant bins. If some points on the grid lines have not been set, one could indicate this by drawing, for example, some symbol on top of them (while in the common case one should be able to provide a whole grid of measured data points).
I think having data points arranged on a grid is a quite frequent situation and thus the availability of an extra class could be handy (to avoid the workaround with TH2).

Comment by Olivier Couet [ 18/Dec/14 ]

I have better documented the difference between direct drawing of TGraph2D and TH2 drawing options. Restricting the usage is not a solution because one can always do g2d->GetHistogram()->Draw().

As the drawing part of this post has been clarified and as you are requesting a new class, I suggest that I close this report (because the title is misleading) and that you post a new one with the description of the new class you need.

Generated at Tue Aug 09 22:15:28 CEST 2022 using Jira 8.22.6#822006-sha1:a60819604027c401cc97bed69f4574413f3aa3b8.