DEALING WITH CONFUSION EFFECTS IN 2MASS POINT SOURCE EXTRACTION
K. A. Marsh, IPAC
kam@ipac.caltech.edu
Oct 4, 1999
In regions of high source density, the noise model used in point-source extraction is significantly affected by the presence of unresolved confusing sources. Although such sources have approximately a power-law distribution of flux, the central limit theorem provides some assistance in treating their effects as an additional Gaussian noise term in the source extraction process.
Although confusion increases the photometric and position errors,
there are some advantages to ignoring its effects during the source
extraction process, and making a correction later.
The principal reason is that the confusion model can then take advantage
of statistics gathered from a region larger than the current scan-segment,
and will then be less susceptible to local anomalies.
INTRODUCTION
In regions of high source density, the 2MASS point-source extractor,
PROPHOT, often produces
sets of reduced chi squared values which are significantly
above unity. The reason is the neglect of confusion error in the noise
model. The question then arises: should we attempt to include confusion in
the noise model and thereby bring the reduced chi squared values back in line,
or should we leave the noise model alone and use the behavior of the reduced
chi squared
as an indicator of confusion? At some point, we do need to account for
the effect of confusion on photometric errors, and this memo discusses
the various considerations involved.
CONFUSION AS AN ADDITIONAL NOISE TERM
The variance of the background component in a single pixel of a 2MASS image
can be represented as:
where B is the background level [du], g is the pixel gain [counts/du],
N_R is the read noise, and sigma_c is the standard deviation
of confusion noise. The latter results from the presence of confusing
sources, most of which are faint unresolved Galactic stars. Although they
are mostly below the detection threshold, their collective effect is to produce
a noise-like term in the pixel values. This effect is unimportant at high
Galactic latitudes, but is significant in dense parts of the Galactic Plane.
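For reference, a form of Equation (1) consistent with the definitions above can be written as follows. It is a reconstruction rather than a quotation: it assumes that the read noise N_R is expressed in du, that the three contributions are independent and add in quadrature, and that sigma_nu denotes the total noise standard deviation per pixel.

```latex
% Assumed form of Equation (1): Poisson term + read noise + confusion noise
\sigma_\nu^2 \;=\; \frac{B}{g} \;+\; N_R^2 \;+\; \sigma_c^2
\tag{1}
```

The first term is the Poisson variance of the background expressed in du^2, which is why a larger measured variance at fixed B translates into a smaller apparent gain.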
As an illustration, Figure 1 shows a plot of the estimated pixel
gain, g^,
as a function of segment number in a standard 2MASS scan through a region
of particularly high source density. The gain estimates were based on
Equation (1), obtaining sigma_nu via a trimmed average, but
neglecting sigma_c.
The estimated gain should have been constant
at approximately 8.2, but instead it increased gradually as the scan progressed.
The problem was that although the detected sources were excluded via the
trimmed average, the low-level confusing sources were not, so their
variance was wrongly attributed to the Poisson term. Thus one would
expect the estimated gain to be most strongly reduced in the regions of
highest source density, and this is supported by comparison with the lower
plot which shows the corresponding number of extracted sources per segment.
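The bias described above can be illustrated with a minimal sketch. It assumes a noise model of the form sigma_nu^2 = B/g + N_R^2 + sigma_c^2 with the read noise in du; the function name and all numerical values other than the nominal gain of 8.2 are hypothetical and chosen only for illustration.

```python
import math

def estimate_gain(background_du, sigma_nu_du, read_noise_du):
    """Gain estimate obtained by inverting the assumed noise model
    sigma_nu^2 = B/g + N_R^2, i.e. with the confusion term neglected."""
    poisson_var = sigma_nu_du**2 - read_noise_du**2
    if poisson_var <= 0:
        raise ValueError("measured variance does not exceed the read-noise variance")
    return background_du / poisson_var

# Hypothetical illustration values (not taken from any 2MASS scan).
TRUE_GAIN = 8.2       # nominal gain quoted in the text [counts/du]
BACKGROUND = 400.0    # background level B [du]
READ_NOISE = 5.0      # read noise N_R [du]

for sigma_c in (0.0, 3.0, 6.0):
    # Total pixel variance including the confusion term.
    total_var = BACKGROUND / TRUE_GAIN + READ_NOISE**2 + sigma_c**2
    g_hat = estimate_gain(BACKGROUND, math.sqrt(total_var), READ_NOISE)
    print(f"sigma_c = {sigma_c:3.1f} du  ->  estimated gain = {g_hat:.2f}")
```

With sigma_c = 0 the estimate recovers the true gain; any nonzero confusion variance pulls the estimate below it, which is the sense of the trend seen in Figure 1.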
IMPLICATIONS FOR SOURCE EXTRACTION
In principle, we can allow for the effects of confusion by incorporating
the extra term sigma_c^2 into the
noise model used in source extraction, and thereby
obtain photometric errors which reflect the
additional noise, and a set of reduced chi squared values
which are clustered
around unity. However, the non-Gaussianity and nonstationarity of the
confusion noise demand some care, as will now be discussed.
NON-GAUSSIANITY
Confusion noise tends to be distributed statistically as a power law. However, if the source density is sufficiently large, the central limit theorem implies that the distribution of means (resulting from averaging by the point spread function) approximates a Gaussian. Confusion can then be legitimately included as an additional Gaussian noise term in the source extraction model.
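The approach to Gaussianity can be checked numerically. The sketch below is only illustrative: it assumes differential source counts dN/dS proportional to S^(-alpha) truncated at a hypothetical detection threshold, and it replaces the PSF-weighted average by an equally weighted sum. The slope, flux limits, and function names are assumptions, not values from PROPHOT.

```python
import numpy as np

def sample_power_law(rng, alpha, s_min, s_max, size):
    """Draw fluxes with dN/dS ~ S**(-alpha) on [s_min, s_max]
    by inverse-CDF sampling (valid for alpha != 1)."""
    u = rng.random(size)
    k = 1.0 - alpha
    return (s_min**k + u * (s_max**k - s_min**k)) ** (1.0 / k)

def skewness(x):
    """Sample skewness: third central moment over variance**1.5."""
    d = x - x.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

def summed_flux_skewness(n_sources, n_trials=20000, alpha=2.0, seed=0):
    """Skewness of the summed flux of n_sources faint sources per
    resolution element (equal weights stand in for the PSF)."""
    rng = np.random.default_rng(seed)
    fluxes = sample_power_law(rng, alpha, 1.0, 50.0, (n_trials, n_sources))
    return skewness(fluxes.sum(axis=1))

for n in (2, 20, 200):
    print(n, summed_flux_skewness(n))
```

The skewness of the summed flux falls as the number of sources per resolution element grows, which is the sense in which a sufficiently high source density justifies a Gaussian confusion term.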
One problem to be kept in mind is that, since a power law does not fall off as rapidly as a Gaussian, there is a significant probability of large positive deviations which are spatially too sparse to be rescued by the central limit theorem. In order to prevent these sources from biasing the estimated background level, the locations of known detections can be masked out, and the remaining effects (caused, for example, by diffraction spikes from nearby bright sources) can be suppressed by trimming off the outliers in the averaging process.
Although it is desirable to do this trimming for the background estimation,
it would be undesirable for the profile-fit data in the central "data
circle," since trimming the latter would destroy the value of the reduced
chi squared as a goodness-of-fit indicator. Cases in which the latter
is elevated by discrete point-source confusion can be dealt with via "active"
deblending.
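The masking and trimming described above for the background estimate can be sketched as follows. This is a minimal illustration, not the PROPHOT implementation: the function name, the symmetric trimming scheme, and the default trim fraction are all assumptions.

```python
import numpy as np

def trimmed_background(pixels, detection_mask, trim_fraction=0.1):
    """Background level from the unmasked pixels, after discarding the
    lowest and highest trim_fraction of the sorted values.

    pixels         : pixel values in the background region [du]
    detection_mask : boolean array, True where a known detection lies
    """
    pixels = np.asarray(pixels, dtype=float)
    mask = np.asarray(detection_mask, dtype=bool)
    values = np.sort(pixels[~mask])           # drop known detections
    n_trim = int(trim_fraction * values.size)
    kept = values[n_trim:values.size - n_trim]  # drop outliers at both ends
    return kept.mean()

# Hypothetical example: flat background with one masked source and one
# unmasked spike (e.g. from a diffraction spike).
pixels = np.full(100, 10.0)
mask = np.zeros(100, dtype=bool)
pixels[0], mask[0] = 5000.0, True   # known detection, masked out
pixels[1] = 800.0                   # artifact, removed by trimming
print(trimmed_background(pixels, mask))
```

As noted in the text, this kind of trimming is appropriate for the background region only; applying it inside the data circle would undermine the reduced chi squared as a goodness-of-fit indicator.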
NONSTATIONARITY
A remaining problem is to find a suitable value of sigma_c to use. A local estimate (derived, for example, from the background annulus surrounding each extraction) is susceptible to: (1) the statistical nonstationarity of the confusion, and (2) artifacts such as the diffraction spikes from strong nearby sources. The former effect refers to the fact that since source density is position-dependent, one cannot always transfer the confusion statistics from one region to another.
The solution is to use a regional estimate which takes account of the large-scale variation of the confusion properties. It is, for example, clear from Figure 1 that the variation is sufficiently slow that it can be modeled. A similar situation would apply with respect to the radial distance from the center of a globular cluster.
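One possible way to build such a regional estimate is to fit a smooth function to the local values along the scan. The sketch below uses an ordinary least-squares polynomial in scan position; the choice of a polynomial, its degree, and the function name are assumptions made for illustration, and any sufficiently smooth model could be substituted.

```python
import numpy as np

def fit_confusion_trend(position, sigma_c_local, degree=2):
    """Least-squares polynomial fit of the locally estimated confusion
    noise against position in the scan. Returns a callable model."""
    coeffs = np.polyfit(position, sigma_c_local, degree)
    return np.poly1d(coeffs)

# Hypothetical local estimates: a slow linear decline plus scatter.
rng = np.random.default_rng(0)
position = np.linspace(0.0, 1.0, 200)          # fractional scan position
sigma_c_local = 6.0 - 4.0 * position + rng.normal(0.0, 0.5, position.size)

model = fit_confusion_trend(position, sigma_c_local, degree=2)
print(model(0.5))   # regional sigma_c at mid-scan
```

Because the fit pools every extraction in the scan, a single local estimate contaminated by a diffraction spike has little influence on the regional value.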
In order to incorporate knowledge of larger-scale regional variations
into the local noise model, it would probably be easier to apply the
confusion corrections after the fact, rather than attempt to evaluate
the confusion noise model on the fly.
ERROR ADJUSTMENT
The only sources expected to be affected significantly by unresolved
background confusion are
the weak (background-limited) sources. For these sources, one can
express the photometric and position errors in the form:
where sigma_i represents the error in flux, x-location, and y-location,
for i = 1, 2, and 3, respectively, and A is a matrix whose elements
are obtained from values of the PSF and its derivatives, evaluated about
the photometric solution. The quantity sigma_nu represents the
measurement noise.
Equation (2) implies that the errors in photometry and position can be adjusted by simple scaling, where the scaling factor is the ratio of the measurement noise with the confusion term to that without it. This ratio can be calculated after the fact, provided we know the confusion error, sigma_c, as a function of position in the scan. A simple model for the variation of this error can be obtained by fitting a smooth curve to the locally estimated values, sigma_c^, for the complete set of extracted sources in a scan. This does, of course, require that we save the sigma_c^ values in the output PROPHOT source list. Alternatively, one can reconstruct the necessary information from the estimated pixel gain, g^. One advantage of saving g^ rather than sigma_c^ is that it would serve the dual purpose of enabling us to keep track of the true pixel gain of the electronics, using the values obtained in regions of low source density.
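The after-the-fact adjustment can be sketched as below. The scale factor follows from adding the confusion variance in quadrature to the original measurement variance. The recovery of sigma_c^2 from g^ additionally assumes a noise model of the form sigma_nu^2 = B/g + N_R^2 + sigma_c^2 and a known true gain; the function names and numerical values are hypothetical.

```python
import math

def confusion_variance_from_gain(background_du, g_hat, g_true):
    """Confusion variance [du^2] implied by a depressed gain estimate,
    assuming B/g_hat = B/g_true + sigma_c^2. Clipped at zero."""
    return max(background_du * (1.0 / g_hat - 1.0 / g_true), 0.0)

def error_scale_factor(sigma_nu_du, sigma_c_du):
    """Ratio of measurement noise with the confusion term to that without."""
    return math.sqrt(sigma_nu_du**2 + sigma_c_du**2) / sigma_nu_du

# Hypothetical example for one background-limited source.
B, g_true, g_hat = 400.0, 8.2, 6.0
sigma_nu = 8.6                       # noise used during extraction [du]
sigma_c = math.sqrt(confusion_variance_from_gain(B, g_hat, g_true))
scale = error_scale_factor(sigma_nu, sigma_c)

flux_err = 12.0                      # error reported by the extractor
print("adjusted flux error:", scale * flux_err)
```

The same factor would multiply the x and y position errors, since all three scale linearly with the measurement noise for background-limited sources.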