Access Options for Low SNR Sources in 2MASS Working Databases




I. Introduction

Discussion has begun about access to the 2MASS extracted source data for objects that do not satisfy the requirements of the preliminary or final Catalog generation. These will typically be low SNR objects, and have often been referred to as the "reject" files, in analogy to the low SNR data release from IRAS.

In this memo, I provide a brief discussion of the low SNR contents of the 2MASS Working Source Databases and outline a few options describing how part or all of those data could be released to the community.

Detection Thresholds and the Constituents of the Database

The detection thresholds currently used in 2MAPPS are set quite low to ensure completeness. PIXPHOT/FIND detects maxima in the zero-sum 4" FWHM gaussian-convolved Atlas Images that are at least 3.0 times the estimated image noise level. PROPHOT then operates at the location of each of the detected maxima. The effective detection SNR threshold is ~3.5 in PROPHOT-space. The PROPHOT SNR is slightly higher than in the detection step for a number of reasons, including the fact that PROPHOT performs a more optimal source measurement and local noise estimation. The detection thresholds were set with the objective of detecting approximately a factor of two more "sources" than are real. Therefore, a very large fraction of the entries in the Working Point Source Databases are noise detections.

Figures 1-3 show the distribution of PROPHOT magnitudes and extraction uncertainties for 598,657 "sources" in the Working Point Source DB within 5 degrees of the north galactic pole. For reference, the SNR=3:1, 4:1 and 7:1 lines are indicated along the uncertainty axis. The peaks of the extraction uncertainty distributions are indicative of the mean extraction SNR limits. The peaks occur at approximately 0.31 mags in all three bands, or an SNR of ~3.5:1. The excess of "sources" seen in the peaks of the source count distribution plots is produced by the noise detections at low SNR.
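The correspondence between SNR and magnitude uncertainty quoted above is consistent with the standard first-order relation sigma_mag ~ (2.5 / ln 10) / SNR ~ 1.0857 / SNR. The memo does not state this formula, so the sketch below is an assumption that happens to reproduce the quoted values; the function names are illustrative only.

```python
import math

# 2.5 / ln(10) ~= 1.0857: magnitudes per unit fractional flux error
# (first-order, small-error approximation; assumed, not stated in the memo).
MAG_PER_FRACTIONAL_ERROR = 2.5 / math.log(10.0)


def snr_to_mag_uncertainty(snr):
    """Approximate 1-sigma magnitude uncertainty for a given flux SNR."""
    if snr <= 0:
        raise ValueError("snr must be positive")
    return MAG_PER_FRACTIONAL_ERROR / snr


def mag_uncertainty_to_snr(sigma_mag):
    """Approximate flux SNR for a given 1-sigma magnitude uncertainty."""
    if sigma_mag <= 0:
        raise ValueError("sigma_mag must be positive")
    return MAG_PER_FRACTIONAL_ERROR / sigma_mag


if __name__ == "__main__":
    # Reference lines from Figures 1-3 plus the quoted peak near SNR ~3.5.
    for snr in (3.0, 3.5, 4.0, 7.0):
        sigma = snr_to_mag_uncertainty(snr)
        print(f"SNR = {snr:.1f}:1  ->  sigma = {sigma:.3f} mag")
```

Under this approximation, SNR=3.5:1 corresponds to about 0.31 mags, in line with the peaks of the uncertainty distributions described above.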

Figure 1 - Distribution of J-band source magnitudes and photometric extraction (PROPHOT) uncertainties for all sources in the current 2MASS Point Source Working Database within 5 degrees of the north galactic pole.

Figure 2 - Distribution of H-band source magnitudes and photometric extraction (PROPHOT) uncertainties for all sources in the current 2MASS Point Source Working Database within 5 degrees of the north galactic pole.

Figure 3 - Distribution of Ks source magnitudes and photometric extraction (PROPHOT) uncertainties for all sources in the current 2MASS Point Source Working Database within 5 degrees of the north galactic pole.

The source count and uncertainty distributions are shown in black for all detections in each band, and in red for sources that have a detection in each respective band and at least one other band (multiband detections). Shown in blue are the distributions for all 2MASS detections that have an optical counterpart from USNO or ACT within 5 arcseconds. Detection in multiple 2MASS bands and association with optical counterparts are good source reliability indicators, particularly at high latitude. The largest number of single-band sources is expected in the J-band because that is the most sensitive 2MASS band when both system throughput and the energy distributions of most sources are taken into account. Single-band H and Ks detections should be relatively rare at high galactic latitudes, although not unphysical.

The current SNR limits set for the Catalogs (SNR>=7 in at least one band) were set to ensure high reliability and to avoid flux overestimation biases for low SNR sources. Extensive work with the source lists has shown that the reliability (defined as the existence of the source, and not necessarily flux accuracy) of fainter objects remains good down to SNR~5, and can be made even better by requiring multi-band detections, optical counterparts, etc.
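The selection rule described above (threshold met in at least one band, optionally tightened by a multiband requirement) can be sketched as follows. This is only an illustration of the SNR cut, not the actual Catalog generation logic, which applies further criteria. The record layout (a dict mapping the band names "J", "H", "Ks" to an SNR, or None for a non-detection) and all function names are hypothetical.

```python
def passes_snr_cut(snr_by_band, threshold=7.0):
    """True if the SNR threshold is met in at least one band."""
    return any(
        snr is not None and snr >= threshold
        for snr in snr_by_band.values()
    )


def is_multiband(snr_by_band):
    """True if the source is detected in two or more bands."""
    return sum(snr is not None for snr in snr_by_band.values()) >= 2


def select_sources(sources, threshold=7.0, require_multiband=False):
    """Keep sources that pass the SNR cut and, optionally, the multiband test."""
    selected = []
    for src in sources:
        if not passes_snr_cut(src, threshold):
            continue
        if require_multiband and not is_multiband(src):
            continue
        selected.append(src)
    return selected


if __name__ == "__main__":
    # Hypothetical records, not real Working Database entries.
    sources = [
        {"J": 12.0, "H": 9.0, "Ks": 6.0},     # passes at SNR>=7
        {"J": 5.5, "H": 4.0, "Ks": None},     # passes only at SNR>=5
        {"J": 5.2, "H": None, "Ks": None},    # single-band, SNR>=5 only
        {"J": 3.6, "H": None, "Ks": None},    # likely noise detection
    ]
    print(len(select_sources(sources, threshold=7.0)))
    print(len(select_sources(sources, threshold=5.0)))
    print(len(select_sources(sources, threshold=5.0, require_multiband=True)))
```

Lowering the threshold from 7 to 5 admits more sources, and the multiband requirement then removes the single-band entries among them.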

II. Options

There is currently no set plan for the release to the community of lower SNR sources from the Working Databases, nor are there resources identified to do so. If we choose to release these data at some point, it is necessary to identify the scope of that release, and the resources that will be required.

Below, I have generated a list of reject file release options that spans a range of sizes. Included with each release is an estimate of the associated additional cost in hardware and schedule for that option.

Schedule cost covers the additional time to prepare the data for release, including actual final product generation DB tasks. The schedule cost is quoted first as a factor relative to the time required for the production of the current Catalogs. A cost example is then given in parentheses for the reject file production of the Fall/Winter 1999 release. This schedule cost estimate includes the personnel costs only for the final product generation tasks, and does not include personnel time for final product analyses. In all options except number 1, I have assumed that the reject file release would be done in series with Catalog release, so that the Catalog releases would not be delayed. If the reject file releases were made in parallel to the Catalog releases, the schedule cost factors should be regarded as delay factors since the gross activity would be increased.

Hardware costs include primarily the cost of disk to manage the release preparation, and to serve the additional data volume to the community. Disk cost of the EMC2 system is high, currently about $0.38/MB. The EMC2 system currently serves both the internal working and production databases and the publicly served databases. IRSA is pursuing less expensive alternatives for the internal databases storage.

As a secondary release option, I also list the cost of tape if the release of the reject files were to take the form of a set of DLT tapes sent to users. Tape cost assumes 100 copies of a tape set at a cost of $100/tape with duplication. Raw data volume per tape is assumed to be 35GB, and we assume that ~4:1 compression is possible for the DB tables.
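The tape cost assumptions above reduce to simple arithmetic. The sketch below is one way to express it, assuming that only whole tapes can be shipped and that the ~4:1 compression applies uniformly to the full table volume; the function names and the 60 GB example volume are hypothetical, while the 600 GB case is the volume quoted for Option 3 in the Discussion.

```python
import math

COPIES = 100              # tape sets distributed (from the memo)
COST_PER_TAPE_USD = 100   # per tape, including duplication (from the memo)
RAW_GB_PER_TAPE = 35      # raw capacity per DLT tape (from the memo)
COMPRESSION_RATIO = 4     # ~4:1 compression for DB tables (from the memo)


def tapes_per_set(volume_gb):
    """Whole tapes needed to hold one copy of an uncompressed table volume."""
    if volume_gb <= 0:
        raise ValueError("volume_gb must be positive")
    effective_gb_per_tape = RAW_GB_PER_TAPE * COMPRESSION_RATIO  # 140 GB
    return math.ceil(volume_gb / effective_gb_per_tape)


def tape_release_cost_usd(volume_gb):
    """Total cost of duplicating the tape set for all recipients."""
    return tapes_per_set(volume_gb) * COPIES * COST_PER_TAPE_USD


if __name__ == "__main__":
    for volume_gb in (60, 600):  # 60 GB is a hypothetical small release
        print(volume_gb, "GB ->", tapes_per_set(volume_gb), "tape(s),",
              f"${tape_release_cost_usd(volume_gb):,}")
```

With these assumptions a release that fits on one tape costs $10k for 100 copies, and a 600GB release needs five tapes per set, or $50k.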

1. Lower Catalog SNR limit to SNR~5 (TBD)
   - All other catalog generation processing the same
   Schedule cost factor: 1.4 ($128k)
   Disk cost factor: 1.4 ($23k)
   Tape cost: $10k

2. Reject file with no SNR threshold and with no catalog generation processing
   - This essentially means opening the Working DB's to public access
   Schedule cost factor: 1.1 ($5k)
   Disk cost factor: 1.1 ($6k)
   Tape cost: $50k

3. Reject file with SNR<7, but with some or all of standard final product generation processing:
   + Overlap resolution
   + Position uncertainty updates
   + DB_MAPCOR
   + Extended Source Classification
   + Post-processing (Meteor streak finding, banding diagnostics, untracked seeing edits)
   + Catalog Reformatting
   Schedule cost factor: <4.0-5.0 ($480k-$600k)
   Disk cost factor: <4.0-5.0 ($228k)
   Tape cost: $50k

III. Discussion

Option 1

This first option involves lowering the SNR thresholds for Catalog Generation, keeping all other catalog generation preparatory tasks the same, and not having a separate "reject" file release. The cost of this option depends on the data volume increase, which in turn depends on the SNR level to which the threshold is dropped. For illustration purposes, I have shown the cost and schedule increase for an SNR=5:1 threshold.

Figure 4 shows the cumulative histogram of the fraction of all J-band detections for sources within 5 degrees of the north galactic pole, as a function of PROPHOT photometric uncertainty. This plot provides an approximation of the growth rate of the release catalog as a function of SNR since J-band is the most sensitive at high galactic latitude, and because the Catalog selection rules require the SNR threshold to be met in at least one band. The relative increase in the release volume from SNR=7:1 (sigma=0.155 mags) to SNR=6:1 (sigma=0.181 mags) is 1.17, and to SNR=5:1 (sigma=0.217 mags) is 1.40. Thus, dropping the threshold to SNR=5:1 would increase the data volume by 40%. Additional constraints, such as multibandedness, would decrease the net gains slightly. The final product generation schedule scales roughly linearly with the number of sources.

Figure 4 - Cumulative histogram of the fraction of sources with photometric extraction (PROPHOT) uncertainties less than a given value for all J-band detections within 5 degrees of the north galactic pole, in the current 2MASS Point Source Working DB.
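The growth estimate read from Figure 4 is a ratio of cumulative counts at two uncertainty thresholds. A minimal sketch of that calculation follows, assuming a plain list of J-band PROPHOT uncertainties as input. The function name and the toy values are hypothetical and are not taken from the Working Database; the default thresholds are the sigma values quoted above for SNR=5:1 and SNR=7:1.

```python
def volume_growth_factor(sigmas, sigma_new=0.217, sigma_ref=0.155):
    """Ratio of detections passing a looser uncertainty cut to those
    passing the reference cut (sigma_ref, i.e. SNR=7:1 by default)."""
    n_new = sum(1 for s in sigmas if s <= sigma_new)
    n_ref = sum(1 for s in sigmas if s <= sigma_ref)
    if n_ref == 0:
        raise ValueError("no detections pass the reference cut")
    return n_new / n_ref


if __name__ == "__main__":
    # Toy uncertainties in mags; hypothetical values for illustration only.
    toy_sigmas = [0.05, 0.08, 0.10, 0.12, 0.15,
                  0.17, 0.20, 0.25, 0.31, 0.33]
    print(volume_growth_factor(toy_sigmas))                   # SNR=5:1 vs 7:1
    print(volume_growth_factor(toy_sigmas, sigma_new=0.181))  # SNR=6:1 vs 7:1
```

Because the final product generation schedule scales roughly linearly with the number of sources, the same ratio serves as a first estimate of the schedule cost factor.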

Option 2

This option is technically the simplest to implement, but carries some baggage with it. In this case, we would release the full Working Database tables to the community, with no thresholding and no final product generation processing. Thus, all noise detections would be included, as would bright star artifacts, duplicate detections in scan overlap regions, etc.

There is no technical limitation that would prohibit pointing the public CATSCAN interface to the existing Working Point Source DB on the internal server. Thus, it would not be necessary to buy additional disk to carry the reject files. This is critical because the additional volume of disk that would be necessary is 3-4 times as large as needed for the Catalog release.

The hidden cost of this option is that if public access to the Working DB's on the internal server is allowed, performance will be poor. This will slow production of the Catalogs as well, since the final product generation tasks are run on the internal servers and will contend for disk access.

Option 3

This option is similar to option 1, but involves generating a separate reject file containing all sources that do not meet the Catalog thresholds, and performing some or all of the final product generation tasks for that file. Figure 4 shows that with no lower limit to the SNR for inclusion in this version of the reject file, the data volume would be 4-5 times the Catalog size. Final product processing scales roughly as the source number, so the preparation time follows accordingly. Most of the final product generation tasks run in parallel, so excluding one or more of them does not significantly impact the schedule. However, it does impact the personnel resources required to carry out the tasks, and analyze results.

This is the most expensive option in both schedule and resources. For the Fall/Winter release, nearly 600GB of disk would be needed to support the reject file, at a cost of $228k.
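The disk figure above follows directly from the EMC2 price of $0.38/MB quoted in the Options section. The short sketch below reproduces it, assuming decimal units (1 GB = 1000 MB), which is the convention that yields the quoted $228k; the function name is illustrative.

```python
DISK_COST_USD_PER_MB = 0.38  # EMC2 disk price quoted in the memo
MB_PER_GB = 1000             # assumption: decimal GB, matches the $228k figure


def disk_cost_usd(volume_gb):
    """Cost of EMC2 disk needed to hold a given data volume."""
    if volume_gb < 0:
        raise ValueError("volume_gb must be non-negative")
    return volume_gb * MB_PER_GB * DISK_COST_USD_PER_MB


if __name__ == "__main__":
    cost = disk_cost_usd(600)
    print(f"600 GB -> ${cost / 1000:.0f}k")
```

Any lower SNR threshold that reduces the reject file volume reduces this cost in direct proportion.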

The cost of this option could be reduced by setting some lower SNR threshold on the reject file.


R. Cutri - IPAC
Last Update - 12 October 1999