TO: 2MASS Science Team 31 October, 1997

FROM: C. Beichman

SUBJECT: Plans for Data Release from Now to our First Major Release

This note addresses our first major release of 2MASS data 18-24 months from now. I also discuss the Headquarters request for an accelerated release of small amounts of data for scientific and public outreach. In what follows I have highlighted recommendations or major points for further debate in italics.

I. First Major Release of 2MASS Data
A. When Should We Release Data?

Over the years, our various proposals, project implementation plans, ERB reports, etc., have committed us to releasing substantial amounts of 2MASS data 18-24 months after the start of survey operations, with comparable amounts of data to follow at roughly 6-12 month intervals. Different documents quote different timescales, data volumes, and starting points. One of the most recent documents, the minimum success criteria document, describes a data release no later than June 1, 1999. Interestingly, the ERB advocated the latest release date, suggesting 24 months after the start of survey observations.

The northern survey was formally initiated on June 6, 1997, within six weeks of turning on the hardware, leading to release dates ranging from January 6 to June 6, 1999. Choosing between those two dates is a delicate matter, with obvious political factors driving us to earlier release dates, perhaps as early as the January 1999 AAS meeting. However, I believe that valid technical considerations push us toward slightly later dates. First, a significant amount of data obtained between June and September 1997 will not be useful, owing both to poor weather and to a large variety of teething problems including focus drifts, hot pixels, jumping J-band bias and quadrants, scheduling issues, RA drifts, etc. Only with these problems behind us has it become possible to start the tuning and polishing of the software necessary to get Version 2.0 of the pipeline ready for its presently scheduled release date of February 15, 1998. Coupling our original 18 month estimate for a first release (in the 1993 proposal) with an effective October start date for the survey leads to a release date of April 1999, midway between the two bounds mentioned above.

Table 1 shows a proposed schedule in a bit more detail, starting with the endpoint of the release date and moving backwards to the present time. The timeline shows 4 months of analysis and final product preparation time that cannot safely be reduced. Any acceleration of the schedule would come at the expense of observation and processing of data from October and November of 1998. This cutback would eliminate the redundant coverage possible with observations obtained in October-November of 1997 and 1998. This redundancy will be a key analysis tool for characterizing and identifying problems with the data.

A release date of April 15, 1999, represents a reasonable compromise between political necessity, project commitments, and technical prudence.

Table 1. Key Dates in Proposed 2MASS Release Strategy
Date | Event
April 15, 1999 | Release data to the world.
March 15, 1999 | Final dataset available for final analysis, checkout, staging at IPAC and NSSDC.
December 15, 1998 | End of processing for data to be included in release. Begin 3 month checkout period. We need this period of analysis and checkout before finalizing the dataset. This analysis should include overlapping data from October-November 1997 and 1998.
October 15, 1998 | Last day for observations to be included in release.
June 15, 1998 | Release small amount of sample data at IR Surveys Conference. Fully functional archive hardware and software ready for community access.
June 15, 1998 | Begin processing data from the Southern telescope.
April 15, 1998 | Begin tuning process with data from Southern telescope. Date depends on actual schedule for Southern telescope.
February 15, 1998 | Begin processing sky with Version 2.0 software.
November 15, 1997 | Agree to release strategy and products.
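The date arithmetic behind this schedule can be checked with a short sketch. It assumes that the "effective October start date" falls on October 15, 1997, chosen only to line up with the mid-month dates in Table 1; the helper names are illustrative.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by a whole number of months, keeping the day of month."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, d.day)


def months_between(start: date, end: date) -> int:
    """Whole months from start to end (both on the same day of the month)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


# Assumption: the effective survey start is taken as October 15, 1997.
effective_start = date(1997, 10, 15)

# The original 18 month estimate applied to the effective start date.
release = add_months(effective_start, 18)

# Milestones copied from Table 1.
last_observation = date(1998, 10, 15)
end_of_processing = date(1998, 12, 15)

analysis_months = months_between(end_of_processing, release)
processing_months = months_between(last_observation, end_of_processing)

print("Release date:", release)
print("Analysis and product preparation (months):", analysis_months)
print("Last observation to end of processing (months):", processing_months)
```

Under that assumption the 18 month estimate lands on April 15, 1999, and the gap between the end of processing and the release is the 4 months of analysis and product preparation described in the text.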

B. What Do We Release?

The first release of the 2MASS survey should be of high quality, of a large amount of sky covering a variety of celestial environments, and should be as similar as possible in format and data selection criteria to all subsequent releases.

  1. Sky Coverage and Data Volume

I propose releasing about 15% of the Northern sky or roughly 5% of the total survey data volume. This corresponds to about 6 months of observations, or roughly 0.75-1.0 Tbyte of data, and will give the community an enormous amount of information to digest. It is probably a mistake to plan on releasing data from the Southern survey since the schedule for the entire Southern operation is highly uncertain. There are so many potential problems with a new camera, telescope and a very remote operations system, that it seems foolish to put the solution of those problems in series with our initial data release. If for political reasons it is deemed important to release a small amount of southern data, then we could choose regions that overlap with the Northern survey. These data would provide a valuable check on and serve as a concrete demonstration of the North-South uniformity of the survey. We might also consider releasing a small region that overlaps with any publicly available DENIS data.



Release about 6 months of Northern survey data, selected from the observing period from June 1997 to October 1998.

Release no data from the Southern Survey.

We must decide what subset of the data to release: large coherent blocks of sky (at high, medium, or low latitudes) or large numbers of scans chosen for temporal contiguity by the accident of the survey scheduling. While the first approach is more attractive scientifically, it will be easier to process, re-process, and assess the quality of data on a night-by-night basis. The fact that the survey scheduling favors observing contiguous blocks of scans goes some of the way toward the scientific desideratum of releasing large areas of sky.

  1. Release large temporal blocks of data (sequences of whole days scattered over the sky) or
  2. Release large, spatially coherent blocks of data (large blocks of sky scattered through many different days).

The Extended Source catalog is a special case. I suggest a point source density cutoff that ensures high galaxy reliability at meaningful flux levels. This will probably turn out to be something like N(K)<5,000 sources per sq. deg, or |b|>10 deg.

Release only reliable galaxies as determined by some threshold in source density or galactic latitude.
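A minimal sketch of how such a cut might be applied to candidate galaxies. The record layout and function names are hypothetical, and the thresholds are only the provisional "something like" values quoted above, not final numbers. Since the choice between a density threshold and a latitude threshold is still open, the two are kept as separate tests.

```python
from dataclasses import dataclass

# Provisional thresholds from the memo; the final values were still to be decided.
MAX_K_DENSITY_PER_SQ_DEG = 5000.0  # point source density N(K)
MIN_ABS_GAL_LAT_DEG = 10.0         # galactic latitude |b|


@dataclass
class ExtendedSource:
    """Hypothetical record for a candidate galaxy."""
    name: str
    k_density_per_sq_deg: float  # local K-band point source density
    gal_lat_deg: float           # galactic latitude b, in degrees


def passes_density_cut(src: ExtendedSource) -> bool:
    """Keep sources found in fields below the point source density limit."""
    return src.k_density_per_sq_deg < MAX_K_DENSITY_PER_SQ_DEG


def passes_latitude_cut(src: ExtendedSource) -> bool:
    """Keep sources far enough from the galactic plane."""
    return abs(src.gal_lat_deg) > MIN_ABS_GAL_LAT_DEG


# Made-up example fields, for illustration only.
candidates = [
    ExtendedSource("high-latitude field", 800.0, 45.0),
    ExtendedSource("galactic plane field", 20000.0, -2.0),
    ExtendedSource("intermediate field", 4000.0, 8.0),
]

by_density = [s.name for s in candidates if passes_density_cut(s)]
by_latitude = [s.name for s in candidates if passes_latitude_cut(s)]

print("Kept by density cut:", by_density)
print("Kept by latitude cut:", by_latitude)
```

The two tests need not agree: a field at low latitude can still have a modest source density, which is one reason the choice of criterion matters.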

  2. What Data Products do We Release?

Table 2 lists the major 2MASS data products. In trying to decide what to release, we must choose between two competing philosophies: 1) release everything since this is a release of all of our data products for a limited piece of sky; or 2) release only what we fully understand. These pressures relate primarily to whether we release only catalogs of reliable sources or full source databases. The latter will contain good sources that are considerably fainter than the catalog limits, but will also contain considerable amounts of garbage. For the sake of kicking off a warm debate, I suggest the following:

Release Point Source and Extended Source Catalogs.

Magnitude limits should be set to ensure high reliability, considered separately for single band and multi-band sources.

Release Compressed and Full Resolution Images

Image headers must contain artifact information

Table 2. 2MASS Data Sets
Product | Comment
Point Source Database (PSDB) | Fainter magnitudes but lower reliability. Lots of chaff that could confuse the unwary. Lots of interesting extragalactic objects!
Point Source Catalog (PSC) | Strict SNR/magnitude cutoff and quality criteria
Extended Source Database (ESDB) | Fainter magnitudes but lower reliability. Lots of chaff
Extended Source Catalog (ESC) | Strict SNR/magnitude cutoff and quality criteria
Extended Source Postage Stamps | Full resolution image atlas for catalogued galaxies
Full Resolution Atlas Images | Enormous data volume. Need header information to avoid reliability problems from persistence, etc.
Compressed Atlas Images | Tractable data volume, compromised photometry and astrometry. Need header information to avoid reliability problems from persistence, etc.
Explanatory Supplement | Web document (Ap.J. article?) describing 2MASS data processing, basic analysis, caveats, etc. Preliminary version must accompany initial release.

Release Galaxy Postage Stamps

Do not release Point Source and Extended Source Databases!

Protect the user from the chaff, recognizing that many good sources will be visible in the images. These will be present in the databases, but not in the catalogs.

C. How Do We Release The Data?

I suggest releasing about 10-15% of the survey data. Table 3 describes a possible release scenario that uses IPAC for on-line catalog and compressed image access and the NSSDC for on-line full resolution image access.

1) All data available only through on-line access? or

2) Make hard copy (tape, CD, DVD) of the entire dataset available to interested users?


D. What Compromises Do We Make in this Release?

There should be little need to compromise any of the Level One specifications of the survey. The major area of potential compromise stems from the lack of homogenization of the photometric and astrometric properties of the survey; this procedure will be possible only after a large fraction of the sky has been observed and maximum advantage can be taken of the overlapping coverage (scan-to-scan and North-South). However, our astrometry is already on the global Hipparcos reference frame and should be quite uniform. Similarly, the individual calibrators from Persson are probably good at the 1-2% level in terms of global uniformity. Point sources should satisfy or exceed the requirements in all other important areas --- astrometry, completeness, reliability. Our knowledge of extended sources will probably lag behind that of point sources. Thus we might consider scaling back slightly on our limiting magnitude or sky coverage, i.e. avoid confused regions, to ensure that we publish only highly reliable galaxies.

E. What Resources do We Need?

  1. People

Careful analysis of the data is the key to releasing a high quality product. Active participation by the 2MASS science team, particularly the Core Project Teams, working in conjunction with IPAC and UMASS personnel is essential to the timely release of the 2MASS data. Fall of 1998 will be a very active period of analysis and data checkout. We should also develop a group of outside community scientists to serve as beta testers to work with us on the final products before we release them.


Strong science team participation is essential in the months before data release.


2. Access Software

IPAC will need to have a relatively complete data archive system capable of accessing a few hundred GByte of source and image data. A preliminary version of the image server and database access is available now. A fully functional version is scheduled for release in January 1998. This is a critical milestone that must be adhered to, since it is likely that one or two more releases will be necessary before we can be sure that the access software and hardware are up to the task of supporting first the science team analysis and eventually outside users. The volume of data the science team will have to access will be considerably greater than that of the eventual release, since the released data will be a subset in terms of magnitude limit, spatial coverage, and temporal coverage.

3. Hardware

We will need adequate production capacity (2 and possibly 3 Sun Enterprise 450's) to handle both telescopes and the existing backlog. We have the budget required to buy these machines. The acquisition schedule will be dictated by the start of Southern survey operations.

Table 3. Initial Release Scenario for 2MASS Products
Product | Volume | Access
Full resolution image data | 1 Tbyte | On-line at NSSDC and available through IPAC Archive system
Galaxy Postage Stamps (50x50 pixels) | 2 GByte | On-line at IPAC and available through IPAC Archive system
Compressed images | 50 GByte | On-line at IPAC and available through IPAC Archive system
Point Source Catalog: 50 million objects
  Short Form (150 bytes per source) | 7.5 GByte | On-line at IPAC
  Full records (525 bytes per source) | 26 GByte | On-line at IPAC
Extended Source Catalog: 100,000 galaxies
  Short Form (200 bytes per source) | 0.02 GByte | On-line at IPAC
  Full records (2500 bytes per source) | 0.2 GByte | On-line at IPAC
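The catalog volumes in Table 3 follow from the source counts and record sizes, as this short sketch shows. It assumes a decimal GByte (10^9 bytes); the dictionary layout and function name are illustrative only.

```python
GBYTE = 1e9  # assumption: decimal gigabyte


def catalog_size_gbyte(n_sources: int, bytes_per_source: int) -> float:
    """Catalog volume in GByte for a given source count and record size."""
    return n_sources * bytes_per_source / GBYTE


# Source counts and record sizes copied from Table 3.
catalogs = {
    "Point Source Catalog, short form": (50_000_000, 150),
    "Point Source Catalog, full records": (50_000_000, 525),
    "Extended Source Catalog, short form": (100_000, 200),
    "Extended Source Catalog, full records": (100_000, 2500),
}

sizes = {name: catalog_size_gbyte(n, b) for name, (n, b) in catalogs.items()}

for name, size in sizes.items():
    print(f"{name}: {size:g} GByte")

total_catalog_gbyte = sum(sizes.values())
print(f"All catalog forms together: {total_catalog_gbyte:g} GByte")
```

The short forms come out at 7.5 and 0.02 GByte, as listed. The full records come out at 26.25 and 0.25 GByte, which the table quotes as 26 and 0.2 GByte.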

We will need adequate database access. We have identified resources from NASA HQ outside of the 2MASS budget to purchase a high performance disk system and a dedicated multi-processor server(s?) to handle the access requests through Informix and an IPAC designed Web interface. This system will be on-line early in 1998.

The NSSDC must be ready to receive and provide access to the 2MASS image archive to support the release. We are actively pursuing an MOU with the NSSDC in this area.

II. Early Release Data

We have been asked by NASA Headquarters to consider an early release of small amounts of 2MASS data to build up interest in 2MASS among the user community and among the general public. We should adopt two ground rules in acceding to this request: 1) release no data that would damage the reputation of the survey; 2) minimize extra work that would distract UMASS and IPAC personnel and the 2MASS science team from their primary responsibility of releasing the best possible survey in the shortest amount of time. In determining which data to release and when to release it, we must decide what are the reasons for early release of data:

  1. Headquarters wants us to.
  2. Introduce the scientific community to the 2MASS data products and get feedback about the specific data formats, user tools, etc. that would be useful in developing our plans for full data release. This needs to happen rapidly so that we can incorporate any comments into our long range release plans.

Table 4. Proposed Early Release for Different Products
Purpose | Volume | Proposed Timescale | Data Quality Issues | Comment
Popular interest | Growing atlas of pretty pictures (100+) | Available now. Just need to add a link to the public Web page. | No photometric data or catalog information provided | Web picture atlas using GIF/JPEG images.
Introduction for community, feedback on formats, tools | Very small volumes of Point and Extended Source catalogs, full resolution images | January AAS | Good. Cut at high SNR to ensure reliability. Hand checked fields | A few well chosen fields in variety of environments will suffice. ASCII text fields on Web!
Follow-up observations. Long term community support | ~1 night of high quality data including PSC, ESC, full (?) and compressed images | June IR Surveys conference | High reliability. Relaxed requirements on global photometry | Range of galactic latitudes. Data accessible via Web tools.

  3. Put 2MASS data into the hands of the science community to enable timely follow-up observations with other ground-based and NASA assets (AXAF, NICMOS on HST, etc). This needs a modest amount of data (1 to a few nights in a variety of celestial environments) so that people interested in the full scope of 2MASS data can test their ideas against our data. Since this is a real release of science data, the overall quality should be good. The one area we could relax without significant risk would be global photometric uniformity.
  4. Generate community interest in and advocacy for 2MASS data so that if and when funding problems arise, there will be strong advocacy for continuing 2MASS. To a great extent developing this community support will come from satisfying #3 with good quality data.
  5. Generate popular interest in 2MASS by releasing pretty pictures. This is easy to do in a timely manner using the Web to make GIF-type images of interesting fields available.

Table 4 outlines three different data releases and the relevant timescales consistent with these goals.

What resources are required to accomplish these releases?

Popular release of pretty pictures: Science liaison postdoc (funded by HQ; no candidate yet identified). In the interim, Roc is doing a great job of making pictures from fields suggested by various science team members. Point your browser at The 2MASS Image Gallery (http://spider.ipac.caltech.edu/staff/roc/2mass/images/images.html).

Small data release at January AAS: IPAC staff plus one or two members of science team to assist in hand-checking of the data. Need one science team member each for point sources and galaxies.

Modest data release at Surveys conference: IPAC staff plus one or two members of science team to assist in hand-checking of the data. Need one or more science team members each for point sources and galaxies. Full database access capability via IR archive system at IPAC must be ready for use by outside community. I envision some science team members resident at IPAC for a week or more during the spring to work in their area of responsibility.