Pixel Reduction in HCA
Posted: Mon Nov 30, 2020 12:54 pm
Today a student asked me how the hierarchical clustering of a Raman image can stop with the error message "dataset cannot be reduced because there are too many pixels exhibiting zero distance in the descriptor space".
At first I was puzzled, because a zero distance in the descriptor space is almost impossible for Raman spectra: they are normally quite noisy, and the noise leads to non-zero distances between the pixels in the descriptor space. However, a second thought revealed the solution.
The answer can be found by looking at the descriptor space. The student used a set of descriptors consisting exclusively of correlation descriptors (see the TC descriptor in the help file). This type of descriptor returns the correlation to a triangular template peak if the correlation is significant, or zero if it is not. As Raman spectra are quite noisy, chances are high that any correlation in the empty areas of an image is not significant, resulting in descriptor values of zero. And if there are only a few particles in the image, most of the pixels will return zero for all TC descriptors, which in turn leads to zero distances between these pixels.
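To make the effect tangible, here is a minimal sketch in Python/NumPy. It is not the actual TC implementation: the function names are made up for this post, and a fixed correlation threshold of 0.8 stands in for the significance test, whose details are not described here. The simulated "image" has 200 pixels, of which 20 contain a triangular peak; the rest is pure noise.

```python
import numpy as np

def triangle_template(width):
    """Symmetric triangular peak of odd width (hypothetical template shape)."""
    half = width // 2
    ramp = np.arange(half + 1, dtype=float)
    return np.concatenate([ramp, ramp[-2::-1]])

def tc_like_descriptor(window, template, threshold=0.8):
    """Pearson correlation with the template, or 0.0 if it is below the threshold.

    Assumption: the fixed threshold is only a stand-in for a real
    significance test.
    """
    if np.std(window) == 0.0:
        return 0.0  # correlation is undefined for a flat window
    r = np.corrcoef(window, template)[0, 1]
    return float(r) if r >= threshold else 0.0

rng = np.random.default_rng(0)
template = triangle_template(7)
n_pixels, n_particles = 200, 20

# Every pixel gets noise; only the first 20 pixels also contain a peak.
spectra = rng.normal(0.0, 1.0, size=(n_pixels, template.size))
spectra[:n_particles] += 5.0 * template

values = np.array([tc_like_descriptor(s, template) for s in spectra])
zero_fraction = np.mean(values == 0.0)
print(f"pixels with a descriptor value of exactly zero: {zero_fraction:.0%}")
```

All pixels that end up at exactly zero are identical in the descriptor space, so any two of them have a distance of zero, no matter how noisy the raw spectra were.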
The image below shows the situation: on the left you see the raw data at 1055 wavenumbers, on the right the TC data at the same position. One can clearly see that the noisy background is converted to large areas of zero values (encoded in blue) when the TC descriptor is applied.
Thus the share of all-zero pixels may easily reach 40 to 50 percent of all analyzed pixels, which means that you have to set the control "Percentile MinDist" in the HCA window to a level above this share (in this case, only values greater than 65 finally did it).
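If you have the descriptor values available as a matrix (pixels in rows, descriptors in columns), a rough lower bound for the setting can be estimated by counting the all-zero rows. The following Python/NumPy sketch uses a made-up toy matrix and a hypothetical helper function; it says nothing about how the "Percentile MinDist" control works internally, and pixels that coincide at non-zero values are not counted.

```python
import numpy as np

def all_zero_percentage(descriptors):
    """Percentage of pixels (rows) whose descriptor values are all exactly zero."""
    descriptors = np.asarray(descriptors, dtype=float)
    zero_rows = np.all(descriptors == 0.0, axis=1)
    return 100.0 * zero_rows.sum() / zero_rows.size

# Toy example: 10 pixels x 3 TC-like descriptors, 6 of them empty background.
descriptors = np.zeros((10, 3))
descriptors[:4] = [
    [0.90, 0.00, 0.85],
    [0.00, 0.92, 0.00],
    [0.88, 0.90, 0.00],
    [0.95, 0.00, 0.00],
]

pct = all_zero_percentage(descriptors)
print(f"all-zero pixels: {pct:.0f} percent")
```

The percentile has to be set above this value, and, as the example above shows, in practice possibly well above it.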