Presented at the 1995 DENIS Workshop on Detection of Galaxies and Other Extended Objects
Abstract
An algorithm has been developed and used to find galaxies in the 2MASS data. It uses the central surface brightness and parameterized radial profile in conjunction with the integrated magnitude to discriminate galaxies from the much more numerous stellar population. Additional tests reject contaminants such as double stars. Simulations have demonstrated that this galaxy algorithm performs with high completeness and reliability above integrated galaxy magnitudes of K = 13.5 mag. The reliability is limited currently by chance coincidences of three stars within a radius of 10 arcsec, which is important only below galactic latitudes of ~10 deg. Results for three surveyed fields at b=-26, 35 and 90 deg are presented, which are in agreement with these simulations.
Introduction
The Two Micron All Sky Survey (2MASS) is a ground-based, all-sky survey at the near-infrared wavelengths of J, H and Ks (1.25, 1.65 and 2.17 microns; see Kleinmann et al. 1994). The survey will reach a uniform limiting K magnitude of 14 (10 sigma) for point sources. We expect to detect over 100 million stars and 1 million galaxies.
Construction of the cameras and telescopes for 2MASS has already begun. A prototype single-band camera has been operational for several years, and has already acquired more data than the entire IRAS mission. 2MASS will begin survey operations in 1997 and begin releasing data in 1998. Operations will end in 2000, with all data products released, including one reprocessing, by 2002.
The galaxy catalog derived from 2MASS will provide the first all-sky photometric census of galaxies brighter than K = 13.5 mag, including galaxies in the optical ``zone of avoidance''. The catalog will have both a high completeness (> 95%) and a high reliability (> 95%) above b ~ 10 deg.
Achievement of these goals for the galaxy catalogs requires an algorithm that can effectively separate galaxies from the much more numerous stars, including double and triple chance coincidences of stars, observed by 2MASS.
This paper presents the algorithm that is currently used to find and characterize galaxies in the prototype camera data (section 2). Simulation has been used to produce artificial 2MASS images in order to evaluate the performance of the algorithm at various galactic latitudes, and results are presented in section 3. Three fields at galactic latitudes b=-26, 35 and 90 deg have been surveyed with the prototype camera and processed through the algorithm. The results are in agreement with the simulations, and are presented in section 4.
The Algorithm to Find and Characterize Galaxies
The galaxy processor is designed to find and characterize galaxies and other extended objects that are smaller than 0.8 arcmin, the size of the overlap region between two 2MASS scans. Only galaxies smaller than this size will be entirely contained on a single 2MASS scan. Larger galaxies will be flagged and their images clipped and saved, but parameters will not be derived for these galaxies.
Previously developed galaxy detection methods, such as FOCAS and second-moment tests, have been applied to the prototype camera data, but our own algorithms give significantly better results. The reasons for this have not been explored; the difference may be due to the variation in the PSF caused by the undersampling of our data, or to the difference in backgrounds between the optical and the infrared. The infrared background is much higher, and varies much more, than the optical background.
The input data for the galaxy algorithm are the coadded images resulting from combining the six dithered raw camera frames covering any given part of the sky. The coadd images have a pixel size of 1 arcsec, and are broken into individual units 512 pixels wide by 1024 pixels long. Coadd images overlap by 52 pixels, producing the same overlap in-scan as exists between scans. The galaxy algorithm works on one coadd image at a time.
The most critical operation for galaxy detection and parameterization is the correct determination of the coadd background. Individual sources with peaks above some threshold (e.g., 3 sigma) are detected by a separate processor and passed to the galaxy processor. Pixels affected by these sources are masked off. The coadd is then 4 x 4 block-averaged.
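The masking and block-averaging step can be illustrated with a minimal sketch. It assumes the coadd and the source mask are available as nested lists of equal shape; the function name `block_average` and the convention of returning `None` for a fully masked cell are illustrative choices, not part of the galaxy processor.

```python
def block_average(image, mask, block=4):
    """Average the unmasked pixels in each block x block cell.

    image: list of rows of floats (the coadd).
    mask:  same shape, True where a pixel is affected by a detected source.
    Returns a list of rows; a cell with no usable pixels becomes None.
    """
    n_rows, n_cols = len(image), len(image[0])
    averaged = []
    for r0 in range(0, n_rows - block + 1, block):
        out_row = []
        for c0 in range(0, n_cols - block + 1, block):
            total, count = 0.0, 0
            for r in range(r0, r0 + block):
                for c in range(c0, c0 + block):
                    if not mask[r][c]:  # skip pixels touched by sources
                        total += image[r][c]
                        count += 1
            out_row.append(total / count if count else None)
        averaged.append(out_row)
    return averaged
```

Averaging only the unmasked pixels keeps a bright source from raising the block value that later enters the background fit.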
The background is computed in two steps. In the first step, a one-dimensional cubic polynomial is fit to each column of the coadd. Residuals between the data and the fit are computed, and only those points whose residuals are less than 3 sigma are included in the next fit. This rejection procedure is iterated three times, so that stars and small galaxies are excluded from the background determination. In the second step, a cubic polynomial is fit to each line of the coadd, where the line consists of the solutions from the column fits of the first step. Coupling the column solution with the line solution in effect fits a 2-D surface to the coadd. The solution from the second step is the background, which is then subtracted from the coadd. The cubic polynomial tracks only variations in the background larger than about 100-300 arcsec.
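The column fit with iterative rejection can be sketched as follows. This is a minimal pure-Python illustration, not the processor's code: the helper names are hypothetical, sigma is taken as the rms residual of the points currently kept, and the coordinates are assumed to be scaled to roughly [-1, 1] so that the normal equations stay well conditioned.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def polyfit(xs, ys, degree=3):
    """Least-squares polynomial coefficients, lowest order first."""
    n = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    return solve(a, b)


def polyval(coeffs, x):
    """Evaluate a polynomial with coefficients given lowest order first."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def clipped_cubic_fit(xs, ys, n_sigma=3.0, n_iter=3):
    """Fit a cubic, then refit n_iter times, each time keeping only the
    points whose residual lies within n_sigma of the current fit."""
    keep = list(range(len(xs)))
    coeffs = polyfit(xs, ys)
    for _ in range(n_iter):
        resid = [ys[i] - polyval(coeffs, xs[i]) for i in keep]
        sigma = (sum(r * r for r in resid) / len(resid)) ** 0.5
        new_keep = [i for i, r in zip(keep, resid) if abs(r) <= n_sigma * sigma]
        if len(new_keep) < 4:  # a cubic needs at least four points
            break
        keep = new_keep
        coeffs = polyfit([xs[i] for i in keep], [ys[i] for i in keep])
    return coeffs
```

Under these assumptions, the second step would apply the same function along each line, using the column-fit values in place of the raw pixel values.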
Two different algorithms are used to find galaxies. The first algorithm operates directly on the coadd, and finds ``normal'' galaxies that have a central peak greater than 3 sigma for an individual pixel. The second algorithm operates on the coadd after all point sources found by the point source processor and galaxies found by the first algorithm are masked off. It then block averages the masked-off coadd to find low surface brightness galaxies or galaxies that do not have a classic radial profile (if they exist).
Algorithm 1
The purpose of algorithm 1 is to find ``normal'' galaxies that have a classic galaxy profile (as opposed to a general amorphous blob). A set of classification operations that will distinguish galaxies from stars is performed on each detection passed to the galaxy processor.
Galaxies are discriminated from stars in two ways: their central surface brightness is significantly lower than that of stars, and their radial profile is significantly more extended than that of stars.
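These discrimination parameters are computed by fitting the three-parameter function
f(r) = f0 * exp[-(r/alpha)^(1/beta)]
to the radial profile of each source, where r is the radius from the source peak and f0, alpha and beta are the fit parameters.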
A radius of 6 pixels is initially used for the fit. After the fit parameters are determined, an optimized aperture size is computed and the fit redone. An integrated magnitude for the source is obtained using the new aperture size.
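One simple way to carry out such a three-parameter fit is sketched below. The sketch assumes the profile has the generalized exponential form f(r) = f0 * exp[-(r/alpha)^(1/beta)], and it uses a grid search over beta combined with a straight-line fit in log flux. This fitting strategy, the beta grid, and the function names are illustrative assumptions, not the method used by the galaxy processor. Fitting in log space also gives faint outer pixels more weight than a flux-space fit would.

```python
import math


def profile(r, f0, alpha, beta):
    """Assumed radial profile: f0 * exp(-(r / alpha) ** (1 / beta))."""
    return f0 * math.exp(-((r / alpha) ** (1.0 / beta)))


def fit_profile(radii, fluxes, betas=None):
    """Return (f0, alpha, beta) minimizing the squared error in ln(flux).

    For a fixed beta, ln f = ln f0 - alpha**(-1/beta) * r**(1/beta) is a
    straight line in x = r**(1/beta), so f0 and alpha follow from a
    linear least-squares fit; beta is chosen by grid search.
    """
    if betas is None:
        betas = [0.5 + 0.05 * k for k in range(91)]  # hypothetical grid, 0.5 to 5.0
    pts = [(r, math.log(f)) for r, f in zip(radii, fluxes) if f > 0]
    n = len(pts)
    best = None
    for beta in betas:
        xs = [r ** (1.0 / beta) for r, _ in pts]
        ys = [y for _, y in pts]
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        slope = sxy / sxx
        if slope >= 0:
            continue  # the profile must fall with radius
        intercept = my - slope * mx
        err = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
        if best is None or err < best[0]:
            alpha = (-slope) ** (-beta)
            best = (err, math.exp(intercept), alpha, beta)
    if best is None:
        raise ValueError("no falling profile could be fit")
    return best[1:]
```

For example, `fit_profile(radii, fluxes)` with radii out to 6 pixels corresponds to the initial fit radius quoted above.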
The central surface brightness of the fit to each object, f0, is compared with the expected central surface brightness of a stellar source with the same total magnitude. The differential surface brightness, f0(star) - f0(object), is the first discriminant between galaxies and stars, abbreviated as delta(f0).
Analysis showed that alpha and beta were correlated and that the value of beta changed with source magnitude and the point spread function, rather than with galaxy type. However, the quantity alpha * beta, termed the ``radial shape discriminant'' (abbreviated as ``shape''), was found useful as the second discriminant between galaxies and stars. The dependence on the point spread function is removed by subtracting the value for point sources, which defines the ``differential shape''. This radial shape discriminant should not be confused with the two-dimensional contours on the sky, and it is not an intrinsic property of a galaxy, because it depends on seeing, signal-to-noise, etc.
The values of f0(star) and shape for point sources can be fully described in terms of the FWHM of point sources, using numerical simulations as well as the parameterization of real stars. The point source processor passes the FWHM of point sources to the galaxy processor, thus avoiding a second pass through the data for the galaxy algorithm.
Galaxies and other extended sources are selected from all detections by applying thresholds in delta (f0) and differential shape. Figure 1 shows the clear separation in these parameters between galaxies and stars for a field at the North Galactic Pole. The 2MASS galaxy list was verified using the POSS.
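The selection on the two discriminants can be sketched as below. The sketch assumes f0 is expressed in flux units, so that galaxies give a positive delta(f0), and that a candidate must pass both thresholds. The threshold values are left as arguments because the text does not quote them, and the function name is hypothetical.

```python
def is_extended_candidate(f0_object, f0_star, alpha, beta, shape_star,
                          min_delta_f0, min_diff_shape):
    """Apply thresholds in delta(f0) and differential shape.

    f0_star and shape_star are the values expected for a point source of
    the same total magnitude; both thresholds are caller-supplied.
    """
    delta_f0 = f0_star - f0_object      # first discriminant
    shape = alpha * beta                # radial shape discriminant
    diff_shape = shape - shape_star     # second discriminant
    return delta_f0 > min_delta_f0 and diff_shape > min_diff_shape
```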
Because far fewer sources remain after this selection, much more effort can then be spent on validating and characterizing each potential extended source.
In order to distinguish double stars from truly extended objects, the radial profile fit is performed again on each potential extended source after masking off a 60 deg wedge, with the vertex anchored to the source peak. Six different fits are performed, using wedges that in turn cover a full circle around the source. After determining where the minimum shape occurs, a finer grid of wedge angular placements is used to determine the actual minimum shape for each source. Since the ``other'' star will be masked by one of these wedges, the minimum shape will be that of a point source, whereas galaxies almost never attain such a low minimum shape value due to their extent in all directions.
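The wedge search can be sketched as follows. The radial profile fit is represented by a caller-supplied function `shape_fn`, since only the masking and the coarse-then-fine search are illustrated here. The 10 deg fine step, the choice to always keep the peak pixel, and all names are assumptions made for the sketch.

```python
import math


def outside_wedge(dx, dy, center_deg, width_deg=60.0):
    """True if the offset (dx, dy) from the peak lies outside the wedge."""
    if dx == 0 and dy == 0:
        return True  # always keep the peak pixel itself
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    diff = abs((angle - center_deg + 180.0) % 360.0 - 180.0)
    return diff > width_deg / 2.0


def min_wedged_shape(pixels, shape_fn, width_deg=60.0, fine_step=10.0):
    """Return (minimum shape, wedge center in degrees).

    pixels:   list of (dx, dy, flux) offsets from the source peak.
    shape_fn: callable giving a shape value for a pixel list; it stands
              in for the radial profile fit.
    """
    def shape_at(center):
        kept = [p for p in pixels if outside_wedge(p[0], p[1], center, width_deg)]
        return shape_fn(kept)

    # Coarse pass: wedges that together cover the full circle.
    n_wedges = int(round(360.0 / width_deg))
    coarse = [k * width_deg for k in range(n_wedges)]
    best = min(coarse, key=shape_at)
    # Fine pass: finer angular placements around the coarse minimum.
    n = int(width_deg / fine_step)
    fine = [(best + k * fine_step) % 360.0 for k in range(-n, n + 1)]
    best = min(fine, key=shape_at)
    return shape_at(best), best
```

For a double star, the wedge that covers the companion leaves only the primary, so the returned minimum approaches the point-source value.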
In order to attempt to distinguish triple stars from extended objects, a profile fit is performed for the two symmetric radial vectors extending along the elliptical major axis from the source peak, as well as 8 additional radial fits along the cardinal points and rays rotated 45 deg from the cardinal points.
Artifacts such as ``ghosts'' and persistence traces are rejected by algorithms that search for precise geometrical and intensity relationships to observed bright objects.
Double and some triple stars are rejected by applying thresholds in ``wedged'' shape versus shape and in the ``major-axis'' shape versus shape. A score is assigned to each remaining object that represents the probability of that object being a galaxy.
The key values computed for each object are the position, integrated magnitudes from a series of fixed-size apertures and an optimized elliptical aperture, central surface brightness, elliptical fit to the 3 sigma isophote, and the source extent: the differential shape, the wedge-masked shape, the major-axis shapes, and the elliptical shape. In addition, an image database will contain the coadd image of these extended sources as well as images of (pieces of) larger galaxies.
Algorithm 2
Algorithm 2 tries to recover low-surface brightness galaxies that might have been missed by algorithm 1. All stars found by the point source processor are masked off using a radius large enough to ensure that no flux remains in the coadd from them. All galaxies found by algorithm 1 are similarly masked off. The coadd is then block-averaged to several lower resolutions. A filter is then applied to the masked-off, block-averaged coadd that detects all sources with integrated flux above a 10 sigma threshold. Another filter will derive circular aperture magnitudes for various radii for all detections.
The sources found by algorithm 2 will be kept separately, as it is anticipated that the reliability of such detections will be low.
Both algorithms
After the source lists (from each algorithm) are band-merged, the refined position and source two-dimensional contours are used to consult the coadds to produce an integrated magnitude or upper limit in each band.
Completeness and Reliability of the algorithm from simulations
A galaxy simulator has been developed that uses the routines in IRAF to generate galaxies. The simulations include spirals and ellipticals with different central surface brightnesses, placed at random distances with random orientations. The parameters of the galaxies were selected to be representative of the range of surface brightnesses observed in the prototype camera fields analyzed to date. A stellar density is assumed, and stars of random brightness and positions are generated.
Fig. 2 shows that 2MASS only captures about 85% of the total flux from galaxies with an intrinsic beta of 4, virtually independent of total galaxy magnitude. This is a well-known effect that occurs because the flux from elliptical galaxies falls off very slowly with radius. However, it is exacerbated in the infrared due to the high sky brightness, compared to the optical. Thus in the following the magnitude of a galaxy with an intrinsic beta of 4 will be taken to be the total magnitude that can be observed, roughly 85% of its ``total'' integrated flux. Galaxies with an intrinsic beta of 1 have an observed integrated flux equal to their actual total integrated flux.
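The slow convergence of the flux for beta = 4 can be illustrated with a short calculation. It assumes a circular profile of the form f0 * exp[-(r/alpha)^(1/beta)], for which the flux inside radius R is a fraction P(2*beta, (R/alpha)^(1/beta)) of the total, where P is the regularized lower incomplete gamma function. The sketch uses the closed form that holds when 2*beta is an integer. It does not reproduce the 85% figure, which depends on the sky noise and on how the aperture is chosen, and the function names are illustrative.

```python
import math


def enclosed_fraction(radius, alpha, beta):
    """Fraction of the total flux inside a circular aperture, for the
    assumed profile; requires 2*beta to be a positive integer."""
    n = round(2 * beta)
    if n < 1 or abs(n - 2 * beta) > 1e-9:
        raise ValueError("this closed form needs 2*beta to be a positive integer")
    x = (radius / alpha) ** (1.0 / beta)
    partial = sum(x ** k / math.factorial(k) for k in range(n))
    return 1.0 - math.exp(-x) * partial


def radius_for_fraction(fraction, alpha, beta):
    """Radius enclosing the given fraction (0 < fraction < 1), by bisection."""
    lo, hi = 0.0, alpha
    while enclosed_fraction(hi, alpha, beta) < fraction:
        hi *= 2.0  # expand until the target fraction is bracketed
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if enclosed_fraction(mid, alpha, beta) < fraction:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Comparing `radius_for_fraction(0.85, 1.0, beta) / radius_for_fraction(0.5, 1.0, beta)` for beta = 1 and beta = 4 shows how much further out the aperture must extend, relative to the half-light radius, to capture the same fraction of a beta = 4 profile.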
The results for completeness and reliability (see Chapter VIII of the IRAS Explanatory Supplement 1985 for definitions) are displayed in figures 3 and 4 for four different stellar densities, representative of galactic latitudes 5, 10, 25 and 90 deg for longitudes near 53 deg.
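For a simulated field, where the true galaxy list is known, the two quantities can be computed as sketched below. Completeness is taken as the fraction of true galaxies that are recovered, and reliability as the fraction of reported sources that are true galaxies. The use of identifier sets for matching is an assumption of the sketch; a real comparison would match by position.

```python
def completeness_and_reliability(true_ids, detected_ids):
    """Return (completeness, reliability) for one simulated field.

    true_ids:     identifiers of the galaxies placed in the simulation.
    detected_ids: identifiers of the sources reported as galaxies.
    """
    true_set, detected_set = set(true_ids), set(detected_ids)
    matched = true_set & detected_set
    completeness = len(matched) / len(true_set)
    reliability = len(matched) / len(detected_set)
    return completeness, reliability
```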
Figure 5 shows the calculated integral reliability due to the effects of double and triple stars on galaxies with K=13 mag as a function of galactic latitude. Also shown are the results from simulation for the current galaxy processor. Note that the reliability of the current galaxy processor follows closely the triple star prediction, verifying that it does an excellent job in discriminating double stars within 5 arcsec from galaxies and implying that triple stars are the main source of unreliability at low galactic latitudes.
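The steep dependence of chance coincidences on stellar density can be illustrated with a simple Poisson estimate. The sketch assumes stars are scattered uniformly and independently, ignores their magnitude distribution, and is not the calculation behind Figure 5; the function name is hypothetical.

```python
import math


def prob_at_least_k_neighbors(density_per_sq_deg, radius_arcsec, k):
    """Poisson probability that a given star has at least k other stars
    within radius_arcsec, for a uniform surface density."""
    area_sq_deg = math.pi * (radius_arcsec / 3600.0) ** 2
    lam = density_per_sq_deg * area_sq_deg  # expected number of neighbors
    below = sum(lam ** j / math.factorial(j) for j in range(k))
    return 1.0 - math.exp(-lam) * below
```

With k = 1 this gives the chance of a double and with k = 2 the chance of a triple. For small expected counts the triple probability scales roughly as the square of the stellar density, which is why triples matter mainly at low galactic latitude.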
As mentioned above, the reliability can be significantly improved by selecting only galaxies with diameters greater than 10 arcsec, since beyond that diameter 2MASS can distinguish the components of a triple star system. It should also be possible to distinguish many triple stars from galaxies, since visual examination of simulated coadds can readily tell the difference between them. Work on such an algorithm is in progress. In any case, only 15% of the sky has source densities higher than that producing a reliability of 30% at K=13 mag, a vast improvement over optical results. Follow-up observations, or comparisons with higher resolution images, can be used to eliminate objects falsely identified as extended.
Simulations show that galaxy positions are accurate to 1-2 arcsec (1 sigma). We do not yet have enough processed data to derive positional accuracies from a comparison with high accuracy optical galaxy positions.
Results from three observed fields
We have analyzed in detail three fields observed with the prototype camera: 1) about 36 square degrees at the North Galactic Pole; 2) about 10 square degrees in the Perseus-Pisces region at b ~ -25 deg; and 3) about 1 square degree in a single scan 6 deg x 50 arcsec centered on the globular cluster M92 at b~35 deg that was repeated 10 times.
The galaxy candidate lists from each field were compared to the POSS plates to determine the reliability of the processor and to produce the final list of galaxies.
The results clearly show that the completeness of the 2MASS galaxy catalog will be a significant improvement over all previous galaxy catalogs. In the two large fields, we find 4-6 times more galaxies than in the Zwicky catalog. Figure 6 shows the differential log N / log flux histograms for these two fields. The total number of galaxies brighter than K = 13.5 mag for the Perseus-Pisces region is 209 galaxies (21 galaxies per square degree), and for the NGP is 918 galaxies (26 galaxies per square degree). The M92 scan contained 26 galaxies brighter than K = 13.5 mag in about one square degree.
At the NGP, we have compared the 2MASS galaxy list with infrared observations of 103 galaxies from Recillas-Cruz et al. (1990). The derived completeness from this comparison is in agreement with the expectations from Figure 3. We have also compared the 2MASS galaxy list with optical observations of 247 galaxies from Dressler (1980), and with 233 galaxies from the APS POSS E galaxy list. Within the uncertainty of the optical to infrared comparison, the derived completeness is also in agreement with Figure 3. More limited comparisons at b ~ -26 deg give similar results.
The reliability of the galaxy processor has been determined by careful examination of the POSS for all three fields. The results are consistent with figure 4.
Photometry
Comparisons of 2MASS galaxy photometry with K-band aperture photometry obtained with a ``traditional'' single-channel photometer (Recillas-Cruz et al. 1990) show that the 2MASS photometry is accurate to better than 0.1 mag (1 sigma) for galaxies brighter than K = 12 mag when fixed apertures are used.
For fainter galaxies, the photometric errors increase rapidly. Figure 7 shows the observed internal photometric error from the galaxies observed 10 times in the M92 field. The integrated magnitude plotted here is derived from an optimized aperture that is grown until the contribution of the next elliptical annulus becomes small.
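An internal error of this kind can be estimated from repeated measurements of the same galaxy, as in the sketch below. It assumes the internal error is the sample standard deviation of the repeated integrated magnitudes; the text does not state which estimator was used for Figure 7, and the function name is illustrative.

```python
import statistics


def repeat_statistics(magnitudes):
    """Return (mean magnitude, internal error) for one galaxy observed
    several times; the error is the sample standard deviation."""
    return statistics.mean(magnitudes), statistics.stdev(magnitudes)
```

Applying the function to each galaxy in a repeated scan and plotting the error against the mean magnitude would give a diagram of the same kind as Figure 7.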
The figure also shows the expected photometric error simply due to the aperture used to give the integrated magnitude. Although the results track the increased errors expected at faint levels, they are worse than the expected errors because galaxies, especially ellipticals, have significant flux at large distances from their center. Thus changes in the aperture used in each observation add a significant dispersion to the integrated magnitudes. This contribution can be decreased by using fixed apertures, but will never go below the expected photometric error due simply to the aperture size, unless one is content to capture less of the total flux.
References
Dressler, A. 1980, ApJS, 42, 565.
Explanatory Supplement to IRAS Catalogs and Atlases 1985, C.A. Beichman, G. Neugebauer, H.J. Habing, P.E. Clegg, and T.J. Chester, U.S. Government Printing Office.
Kleinmann, S.G. et al. 1994, in Infrared Astronomy With Arrays, I. McLean ed., Kluwer Academic Publ., p. 219.
Recillas-Cruz, E. et al. 1990, A&A, 229, 64.