I have done a quick simulation using purely Gaussian statistics to derive the rough magnitude of the flux bias vs. sigma. I generated 11,000 sources from a population that had a log N / log S slope of -1.5, cutoff at SNRtrue = 0.5. This should give accurate results for observed SNR > ~3.5. I use SNR as a direct proxy for flux, assuming a constant noise in this simulation. If this bothers you, multiply all SNR values by your favorite noise to get the flux in flux units rather than SNR units.
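A minimal sketch of this kind of simulation follows. It is not the original code: it assumes the slope of -1.5 refers to the cumulative counts, N(>S) proportional to S^-1.5, and that the noise is unit Gaussian in SNR units. The function names, the seed, and the use of inverse-transform sampling are illustrative choices.

```python
import random

def draw_true_snr(rng, slope=1.5, cutoff=0.5):
    """Draw one true SNR from N(>S) proportional to S**-slope, S >= cutoff.

    Uses inverse-transform sampling (an assumption about how to sample).
    """
    u = 1.0 - rng.random()  # uniform in (0, 1], so the power below is finite
    return cutoff * u ** (-1.0 / slope)

def simulate(n_sources=11000, seed=0):
    """Return a list of (true_snr, observed_snr) pairs."""
    rng = random.Random(seed)
    sources = []
    for _ in range(n_sources):
        true_snr = draw_true_snr(rng)
        # Constant noise: one sigma equals one SNR unit.
        observed_snr = true_snr + rng.gauss(0.0, 1.0)
        sources.append((true_snr, observed_snr))
    return sources

sources = simulate()
```

Multiplying both columns by a constant noise value converts them from SNR units to flux units.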
Plots of the measured flux vs. the true flux for all sources and for sources with true SNR between 2 and 10 show:
The histogram of the ratio of observed flux to true flux shows a clear asymmetry even in the SNR 6-7 bin, with 22% of the sources having a flux ratio above 1.3 vs. none of the sources having a flux ratio below 0.7. Even if all the sources had the highest theoretical flux uncertainties at SNR = 6, the lower edge of that bin, only 3.6% of the sources should be in either tail.
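The 3.6% figure can be checked with a short calculation. At SNR = 6 the one-sigma error on the flux ratio is 1/6, so a ratio offset of 0.3 is a 1.8 sigma excursion. The sketch below assumes purely Gaussian errors; the function name is hypothetical.

```python
import math

def gaussian_tail_fraction(snr, ratio_offset):
    """One-sided fraction of sources whose observed/true flux ratio
    deviates from 1 by more than ratio_offset, for Gaussian noise."""
    n_sigma = ratio_offset * snr  # the ratio has a one-sigma error of 1/snr
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))

tail = gaussian_tail_fraction(snr=6.0, ratio_offset=0.3)
print(f"{100 * tail:.1f}% per tail")
```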
Taking the ratio of the median measured flux to the median true flux in flux bins, and subtracting 1, I computed the median flux overestimation as a function of SNR. There is a clear, large flux overestimation of well over 5% below SNR = 7. Further, a smooth fit to the data shows that the flux overestimation in the SNR 7-8 bin is around 5%.
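The binned statistic can be sketched as below. This is an illustration, not the original analysis: it assumes the bins are in observed SNR, regenerates a population with the same assumed parameters (cumulative slope -1.5, cutoff 0.5, unit Gaussian noise), and uses a hypothetical function name. With only 11,000 sources the higher bins hold few objects, so individual values are noisy.

```python
import random
import statistics

# Regenerate a population under the assumptions stated above.
rng = random.Random(1)
pairs = []
for _ in range(11000):
    true_snr = 0.5 * (1.0 - rng.random()) ** (-1.0 / 1.5)
    pairs.append((true_snr, true_snr + rng.gauss(0.0, 1.0)))

def median_overestimation(pairs, lo, hi):
    """median(observed) / median(true) - 1 for lo <= observed SNR < hi.

    Returns None when the bin is empty.
    """
    selected = [(t, o) for t, o in pairs if lo <= o < hi]
    if not selected:
        return None
    median_true = statistics.median(t for t, _ in selected)
    median_observed = statistics.median(o for _, o in selected)
    return median_observed / median_true - 1.0

for lo in range(4, 10):
    print(lo, lo + 1, median_overestimation(pairs, lo, lo + 1))
```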
Another way to look at the flux overestimation is to compute the mean flux overestimation as a fraction of the quoted error. The mean flux overestimation is a full 50% of the quoted flux error for SNR = 6-7, making the quoted flux errors a seriously deficient measure of the true flux accuracy.
The derived log dN / log S plot shows the expected excess of sources starting at SNR ~ 6.
Remember that this simulation gives minimum values for the flux overestimation. All the sources of non-Gaussian noise only increase our actual flux overestimation.
I have used the above simulation to produce results for sources with the colors of galaxies. I assumed every single source had the following colors: J-H = 0.7 and H-K = 0.4. This is the "best case" for adding extra sources to the catalog by a multiband rule. Normalizing to J, this translates into a typical SNR ratio of 0.69 for H/J and 0.55 for K/J.
The reason this is a "best case" can be seen by considering the other extreme of the bluest stellar colors: J-H = 0.2 and H-K = 0.05. Again normalizing to J, this translates into a typical SNR ratio of 0.43 for H/J and 0.29 for K/J. With such low SNR ratios at H and K, it is much less likely that one of those bands will exceed any given threshold.
I used the simulation above to create the J population of sources, and then "observed" the H and K fluxes for those sources.
Although this may sound like it creates a bias in the simulation, the simulation procedure is actually exactly symmetric between the bands since all sources have the same colors. For example, you can think of the process to generate a single source as simply generating a point in a mythical SNR space not connected to any band, and then scaling that mythical SNR space to the actual SNR of J, H and K separately using the fixed colors.
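The symmetric construction described above can be sketched as follows. The band scale factors are the galaxy-color SNR ratios quoted earlier; drawing independent unit Gaussian noise in each band is an assumption, and the names are illustrative.

```python
import random

# SNR ratios relative to J for the assumed galaxy colors (from the text).
BAND_SCALE = {"J": 1.0, "H": 0.69, "K": 0.55}

def make_source(rng, slope=1.5, cutoff=0.5):
    """Generate one source: a band-independent base SNR scaled into each
    band, then "observed" with independent unit Gaussian noise (assumed)."""
    base_snr = cutoff * (1.0 - rng.random()) ** (-1.0 / slope)
    true = {band: base_snr * scale for band, scale in BAND_SCALE.items()}
    observed = {band: snr + rng.gauss(0.0, 1.0) for band, snr in true.items()}
    return true, observed

rng = random.Random(2)
true, observed = make_source(rng)
print(true, observed)
```

Because every band is a fixed multiple of the same base value, no band is treated specially; J only sets the normalization.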
The derived log dN / log S plot shows the expected excess of sources starting at SNR ~ 6 in all bands. The number of sources is converging to almost the same level independent of band at low SNR, since the number of observed SNR = 1 sources is dominated by sources boosted in flux by noise. Note that half of the simulated sources have SNR 0.50 - 0.79 at J and lower SNR at H and K, resulting in a slight excess of J sources relative to H and K. If the simulation had gone down to SNR of 0.01 at J, the number of sources found at SNR = 1 would have been nearly identical in every band.
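The statement that half of the simulated sources lie between SNR 0.50 and 0.79 at J follows directly from the assumed population, if the slope of -1.5 is read as the cumulative slope. A short check, with a hypothetical function name:

```python
def fraction_between(lo, hi, slope=1.5, cutoff=0.5):
    """Fraction of a population with N(>S) proportional to S**-slope
    (S >= cutoff) whose true SNR lies in [lo, hi)."""
    def fraction_above(s):
        return (cutoff / max(s, cutoff)) ** slope
    return fraction_above(lo) - fraction_above(hi)

print(round(fraction_between(0.50, 0.79), 2))  # prints 0.5
```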
I selected sources for the "catalog" using two rules:
The 9 added sources represent an increase of (4 ± 1)% in the number of sources in the "catalog". The reason for such a small number of sources is that a source with SNR = 7 at J has SNR = 4.8 at H and 3.8 at K. A source with SNR = 6 at J has SNR = 4.1 at H and 3.3 at K.
One can immediately see the source of the flux overestimation problem detailed below at H and K if a multiband threshold is used. If one uses only a single band threshold, sources are selected primarily, if not entirely, at J, and the H and K measurements are simply "carried along" and are unbiased. However, the additional sources selected from the multiband threshold have a serious flux overestimation problem. Only sources which have fluxes boosted by noise above SNR = 6, from their true fluxes of 4-5 at H and 3-4 at K, pass this multiband threshold.
Further, note that the amount of flux overestimation using a multiband threshold depends on the intrinsic flux of sources at those bands, relative to a single band threshold. For example, for our sources with highest SNR at J, the J threshold implies H fluxes of 4-5 SNR. Imposing a lower threshold at H (which is what is effectively done by the multiband threshold), the flux overestimation must be ~ 6 / (4-5), or 20-50%. Most of the sources will be at the lower threshold of 20%, since it is harder for noise to boost a source from 4 to 6 sigma than from 5 to 6 sigma. In the same way at K, the flux overestimation must be ~ 6 / (3-4), or 50-100%, with most of the sources at 50%.
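The arithmetic in this argument can be laid out explicitly. The sketch below assumes unit Gaussian noise and a threshold of SNR = 6. For each true SNR it prints the minimum overestimation needed to reach the threshold and the probability that noise supplies that boost.

```python
import math

def upper_tail(n_sigma):
    """Probability that unit Gaussian noise exceeds n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))

THRESHOLD = 6.0
for true_snr in (5.0, 4.0, 3.0):
    min_overestimate = THRESHOLD / true_snr - 1.0
    p_boost = upper_tail(THRESHOLD - true_snr)
    print(f"true SNR {true_snr:.0f}: "
          f"overestimate >= {100 * min_overestimate:.0f}%, "
          f"P(boost) = {p_boost:.4f}")
```

A 1 sigma boost (5 to 6) happens for about 16% of sources, while a 2 sigma boost (4 to 6) happens for only about 2%, which is why most of the added sources sit near the lower end of each overestimation range.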
I have plotted the J flux bias vs. J true flux, H flux bias vs. H, and K flux bias vs. K, where the flux bias is defined as the ratio of the observed flux to the true flux. In these plots, the sources selected from the multi-band rule are shown separately, with only the H thresholded sources plotted on the H plot and similarly for K. (In other words, the J plot shows in a separate color only the 8 additional sources which passed the SNR = 6 threshold at J, the H plot shows the 7 additional sources which passed the SNR = 6 threshold at H, and the K plot shows the 4 additional sources above SNR = 6 at K.) The plots show:
The thresholding J bias is simple to understand. At Jtrue = 7, only sources with positive noise excursions are allowed into the catalog by the single band threshold. Hence there must be a ~1 sigma high bias in observed fluxes at whatever threshold is picked for the catalog. At about 2 sigma above the threshold, or SNR ~ 9, this thresholding bias disappears. Below the threshold, the bias gets more severe. The observed flux must be ~50% high at SNRtrue ~ 7/1.5 = 4.7, and a factor of 2 high at SNR = 7/2 = 3.5, as observed.
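These numbers can be compared with the mean of a truncated Gaussian. The sketch below is an analytic cross-check, not part of the original simulation: it assumes unit Gaussian noise and a sharp threshold at SNR = 7, and it considers sources of a single true SNR rather than averaging over the population. The function name is hypothetical.

```python
import math

def selected_flux_ratio(true_snr, threshold=7.0):
    """E[observed | observed > threshold] / true, for unit Gaussian noise."""
    a = threshold - true_snr  # excursion needed to pass, in sigma
    pdf = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    tail = 0.5 * math.erfc(a / math.sqrt(2.0))
    # Mean of a unit Gaussian truncated below at a is pdf / tail.
    return (true_snr + pdf / tail) / true_snr

for true_snr in (9.0, 7.0, 4.7, 3.5):
    print(true_snr, round(selected_flux_ratio(true_snr), 2))
```

Under these assumptions the ratio is roughly 1.1 at a true SNR of 7, 1.6 at 4.7, and 2.1 at 3.5, and it is within about 1% of unity at 9, in line with the rough estimates above.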
This bias is well known. See "Catalog Selection Should Be By SNR" for further discussion. If, and that is a big if, the noise distribution in a survey is understood, this bias can be statistically corrected. Furthermore, this bias is negligible above SNR = 10, so those sources can be used with confidence.
The single "outlier" point in all 3 plots is a source with true fluxes of (3.1, 2.2 and 1.7) at (J, H and K), and observed fluxes of (7.2, 0.8 and 2.0). It is actually only an "outlier" at J, having a 4.1 sigma fluctuation upward there. At H and K, the fluctuations are -1.4 and +0.3 sigma. It looks like an outlier in the H and K plots only because the source is much weaker than the others, and hence its flux ratios at H and K have large error bars.
The flip side of the flux bias is of course the "missing" sources which were observed to fall below the threshold. Anything that is done to put fainter sources into the catalog will partially fill in some of the missing sources, as observed here.
For sources selected by the single band rule, which essentially means a J selection for all sources outside highly-extincted areas, both H and K show an unbiased flux distribution down to the lowest fluxes for which there are a large number of sources, SNR ~ 4 at H and ~3 at K. (Recall that below those levels the uncertainty grows rapidly, and hence there is no real constraint on the flux bias below those levels in this simulation.)
It is another story altogether for sources selected by the multiband rule. The mean flux bias at H is almost 20% and at K is almost 50%, just as expected from the theoretical analysis above.
Hence for several reasons, I recommend that we do not use a multiband rule for catalog source selection:
Since only 4% more sources were added as a result of the multiband rule, it doesn't seem to me that the extra completeness is worth the additional biases in the catalog. Note the statement above that "sources above SNR = 10 can be used with confidence." A corollary is that the "carry along" bands can be used with confidence only as long as a multiband rule is not used.
Why muck up the catalog for 4% more sources that are all within 2 sigma of the catalog threshold? Let users who want to play in our trash box use the Reject File to obtain them, and hope that the users are sophisticated enough to properly deal with such sources.
http://spider.ipac.caltech.edu/staff/tchester/2mass/processing/flux_bias_by_snr.html
Comments and feedback: Tom Chester
Last update: 29 January 1999.