T. Jarrett, IPAC
(980820)
The 2MASS project will release a small set of data to the public this fall (1998). The data consist of science (survey) scans from the RTB night of 971116n.
The following plots show the expected completeness and reliability for the "extended" source catalog corresponding to this data release. The section "Classification" below gives the gory/tedious details of what is done and how. To skip to the results, see Results.
DATA
The night of 971116n consists of 80 full 6 degree length scans (a.k.a. "science" or "survey" scans). Nearly all of the scans cross areas of relatively low stellar source density (the exception being scans 102 to 109, which have "moderate" stellar number density). A block of scans, 21 to 28 (27 is the worst), is affected to one degree or another by a VERY bright star, beta Peg. The "artifact" count is rather high for this block of scans.
The 971116n results presented here were run with the latest 2MAPPS pipeline software (version 2.1), processed Aug 19, 1998. Extended source candidates are identified and extracted by the 2MAPPS subsystem GALWORKS. An additional set of routines is used to "classify" the candidates as either "extended", "galaxy", or false extended sources (which is a vague classification at best). These routines are referred to as the 'automated' classification processor.
Note: I have pulled the data from the online (rodan:/o2/OPS) version instead of the database version (because the DB is not yet ready).
Classification
In order to properly assess the performance of GALWORKS and the automated classification routines on the 971116n data set, each extended source candidate was visually examined (along with the DSS) and classified accordingly (e.g., galaxy, star, double star, artifact, etc). The completeness and reliability (C & R) are measured by comparing the "eye" classification (which is presumably the best we can do) with the automated classification (discussed below). The completeness refers to the "internal" completeness (NOT the absolute completeness, which is difficult to measure), while the reliability is the differential comparison between "true" and "false" classifications.
The automated classification process is performed in two different ways. The first generates what we call the "extended" classification (or "flag") and the second generates the "galaxy" classification (or flag).
"Extended" Classification
"Extended" source classification is a broad catagory that may include both galaxies, galactic fuzzy objects (e.g., nebulae, HII regions, etc), multiple groupings of stars (e.g., triple stars, globular clusters, etc), or just about anything else that is not a solitary single. However, in general we would like to eliminate as many double and triple stars from this catagory since they are rarely considered "extended" by the astronomical community. Nevertheless, the "extended" catagory of sources should NOT be confused with the "galaxy" catagory of sources -- which by design (level-1 specs) are to represent real extragalactic objects and of whose catalog is to be as reliable as possible (>98%). More on "galaxy" classification is given below.
"Extended" source classification is performed by applying a threshold cut to the star-galaxy discrimination parameter "wsh" (see Star - Galaxy Discrimination Parameters for more information). We intentionally wanted to keep this operation simple so that the user can understand how the "extended class" catalog was generated from the extended source database. "Wsh" is an excellent parameter to cull out double stars (and to a lesser degree, triple stars).
The threshold is a simple linear function between JHK mag and "wsh". The mags refer to the fixed circular radius = 7" aperture photometry.
K_thresh = 4.0; Kmag > 14.0
H_thresh = 4.0; Hmag > 14.5
J_thresh = 4.0; Jmag > 15.2
e.g., if Jwsh >= J_thresh, the source is classified as "extended"
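The cut above can be sketched in a few lines of Python. This is only an illustration, not the GALWORKS code: the function and dictionary names are hypothetical, and since only the faint-end thresholds are quoted here, the sketch returns None for sources brighter than the quoted magnitude limits rather than guessing a threshold.

```python
# Faint-end "wsh" thresholds quoted above. Thresholds for brighter
# sources are not listed in this text, so they are left undefined.
FAINT_CUTS = {
    "J": {"mag_limit": 15.2, "wsh_thresh": 4.0},
    "H": {"mag_limit": 14.5, "wsh_thresh": 4.0},
    "K": {"mag_limit": 14.0, "wsh_thresh": 4.0},
}


def is_extended(band, mag, wsh):
    """Apply the faint-end wsh cut for one band.

    mag is the 7" fixed circular aperture magnitude. Returns True or
    False for faint sources, and None where no threshold is quoted.
    """
    cut = FAINT_CUTS[band]
    if mag <= cut["mag_limit"]:
        return None  # brighter than the quoted range: threshold not given
    return wsh >= cut["wsh_thresh"]
```

For example, `is_extended("J", 15.5, 4.2)` passes the cut, while `is_extended("K", 14.3, 3.1)` does not.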
We generate 1 flag for each band. The flag has the following meaning:
We then compute a band-merged flag by weight-averaging the three individual band flags:
It is the weight-averaged "extended" flag that will be used to compute the C & R of the 971116n "extended" catalog (the equivalent "galaxy" weight-averaged flag will be used to compute C&R for the "galaxy" catalog; see below).
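As a rough sketch of the band-merging step, the Python below takes a weighted mean of the per-band flags and compares it to a threshold. Several things here are assumptions rather than the pipeline's actual definitions: the per-band flag values of 1 (extended) and 2 (not extended) are inferred from the note in Results that merged values near 1.0 indicate highly probable galaxies and values near 2.0 indicate "false" ones; the weights are purely hypothetical; and accepting sources at or below the threshold is an assumed convention.

```python
def band_merged_flag(flags, weights):
    """Weighted mean of per-band flags; bands with no flag (None) are skipped."""
    num = 0.0
    den = 0.0
    for band, flag in flags.items():
        if flag is None:
            continue
        w = weights[band]
        num += w * flag
        den += w
    return num / den if den > 0 else None


# Assumed flag convention: 1 = extended, 2 = not extended.
flags = {"J": 1, "H": 1, "K": 2}
# Hypothetical weights (e.g., something that tracks signal-to-noise).
weights = {"J": 1.0, "H": 1.0, "K": 0.5}

merged = band_merged_flag(flags, weights)  # (1 + 1 + 0.5*2) / 2.5 = 1.2
is_ext = merged <= 1.3                     # threshold value from Results
```

With these made-up numbers the source is kept, since two of the three bands call it extended and the dissenting band carries the lowest weight.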
"Galaxy" Classification
Now we come to "galaxy" source classification. Here we really mean extragalactic fuzzy objects (i.e., galaxies). The level-1 specs require that we find >90% of the extended "galaxies" with >98% reliability (Note: by "extended" we mean that the measured "sh" score is >= 10.0). This spec is quite challenging and thus requires that we use all of the information that GALWORKS kindly provides (nearly a dozen star-galaxy parameters and such). One way to combine all of this information is to solve a decision tree (or neural net) for all parameters simultaneously, finding the optimum combination of the said parameters in the N-space that they span. This operation is very complicated and, unfortunately, rather unintuitive -- it is very much a "blackbox" operation. Nevertheless, the techniques do yield rather promising results. For now, we will generate the "galaxy" classification by employing an oblique decision tree method (OBDT).
A discussion of applying OBDT to 2MASS extended source data is given in Application of Oblique Decision Trees to the GALWORKS Extended Source Candidates.
What we have done here is to use effectively 10 pieces of information per band per source:
The decision trees are generated by using data from all of the 2MASS RTB sets (comprising some 20000 objects, covering >200 sq. deg, spanning low, moderate, and high source densities). We have generated separate trees for each band and for "low", "moderate" and "high" source densities, defined as:
Each tree has in fact two solutions, "pruned" and "unpruned". The latter is the full glory OBDT, including all sub-branches and hyper-planes. "Unpruned" trees do have the danger of over-fitting the data, so both sets should be tested appropriately.
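To make the "hyper-planes" less of a black box: in an oblique decision tree, each internal node tests a linear combination of the input parameters against zero, instead of a single parameter against a cut. The Python sketch below shows how one source would be passed down such a tree. The tree itself, its coefficients, and the two-parameter input are entirely hypothetical and are not taken from the trained 2MASS trees; the leaf labels of 1 (galaxy) and 2 (non-galaxy) are likewise an assumed convention.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    # Leaf: label is set. Internal node: coeffs and bias define the hyperplane.
    label: Optional[int] = None
    coeffs: Sequence[float] = ()
    bias: float = 0.0
    left: Optional["Node"] = None   # branch taken when the test is <= 0
    right: Optional["Node"] = None  # branch taken when the test is > 0


def classify(node, x):
    """Walk the tree from the root, testing x against each hyperplane."""
    while node.label is None:
        s = sum(a * v for a, v in zip(node.coeffs, x)) + node.bias
        node = node.right if s > 0 else node.left
    return node.label


# Hypothetical two-parameter tree (assumed labels: 1 = galaxy, 2 = non-galaxy).
tree = Node(
    coeffs=(1.0, -0.5), bias=-2.0,
    left=Node(label=2),
    right=Node(coeffs=(0.0, 1.0), bias=-1.0,
               left=Node(label=1), right=Node(label=2)),
)
```

Pruning, in this picture, amounts to replacing a sub-tree (such as the right-hand branch above) with a single leaf, which trades some fit to the training set for less risk of over-fitting.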
So given a tree (which is a function of the density, band, and whether it is pruned or not), we generate 1 flag for each band. The flag has the following meaning:
We then compute a band-merged flag by weight-averaging the three individual band flags:
It is the weight-averaged "galaxy" flag that will be used to compute the C & R of the 971116n "galaxy" catalog.
Results
There are some 4000 extended source candidates from 971116n. About 300 of these are in the "unknown" category, 54 are in the "artifact" category, and some small number are "not-verified". This leaves about 3700 sources for automated classification and comparison with the "eye" classification.
The following plots show the C & R results as a function of weight-averaged flag threshold for the following mag bins:
14.5 <= J < 15.0
14.0 <= H < 14.5
13.0 <= K < 13.5
The mag corresponds to the fixed circular radius = 7" aperture photometry. For the weight-averaged flag threshold, values near 1.0 correspond to highly probable galaxies, while values near 2.0 correspond to highly probable "false" galaxies (i.e., double stars, etc).
Plot: C & R corresponding to the "extended" flag (solid line == C; dashed line == R; class_lim == weight-averaged "extended" flag threshold)
Plot: C & R corresponding to the "galaxy" flag (solid line == C; dashed line == R; class_lim == weight-averaged "galaxy" flag threshold)
For the "extended" flag, the best threshold (here we want to optimize completeness) appears to be about 1.3, giving C>95% and R>90%.
For the "galaxy" flag, the best threshold (here we want to optimize both C & R) appears to be about 1.4, giving C>95% and R>97%.
So, given thresholds of 1.3 & 1.4, the following tables give the C & R results for all mag bins.
C & R using "extended" classification flag: threshold = 1.3
| mag- | mag+ | LIM | nTG (J) | ng (J) | Cj | nTb (J) | nb (J) | Rj | nTG (H) | ng (H) | Ch | nTb (H) | nb (H) | Rh | nTG (K) | ng (K) | Ck | nTb (K) | nb (K) | Rk |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5.00 | 9.00 | 1.30 | 0 | 0 | .00 | 0 | 0 | .00 | 0 | 0 | .00 | 0 | 0 | .00 | 0 | 0 | .00 | 0 | 0 | .00 |
| 9.00 | 10.00 | 1.30 | 0 | 0 | .00 | 1 | 0 | .00 | 0 | 0 | .00 | 1 | 0 | .00 | 1 | 1 | 1.00 | 2 | 0 | 1.00 |
| 10.00 | 11.00 | 1.30 | 0 | 0 | .00 | 6 | 0 | .00 | 3 | 3 | 1.00 | 11 | 0 | 1.00 | 7 | 7 | 1.00 | 11 | 0 | 1.00 |
| 11.00 | 12.00 | 1.30 | 6 | 6 | 1.00 | 13 | 0 | 1.00 | 30 | 30 | 1.00 | 20 | 0 | 1.00 | 49 | 49 | 1.00 | 23 | 0 | 1.00 |
| 12.00 | 13.00 | 1.30 | 44 | 44 | 1.00 | 31 | 1 | .98 | 191 | 186 | .97 | 50 | 1 | .99 | 348 | 340 | .98 | 66 | 1 | 1.00 |
| 13.00 | 13.50 | 1.30 | 100 | 99 | .99 | 33 | 0 | 1.00 | 254 | 251 | .99 | 89 | 0 | 1.00 | 620 | 615 | .99 | 106 | 2 | 1.00 |
| 13.50 | 14.00 | 1.30 | 193 | 187 | .97 | 97 | 2 | .99 | 666 | 656 | .98 | 154 | 17 | .97 | 1291 | 1240 | .96 | 248 | 48 | .96 |
| 14.00 | 14.50 | 1.30 | 419 | 415 | .99 | 131 | 16 | .96 | 1179 | 1130 | .96 | 406 | 82 | .93 | 382 | 363 | .95 | 322 | 66 | .85 |
| 14.50 | 15.00 | 1.30 | 895 | 880 | .98 | 380 | 84 | .91 | 368 | 355 | .96 | 192 | 49 | .88 | 17 | 15 | .88 | 170 | 34 | .31 |
| 15.00 | 15.50 | 1.30 | 934 | 893 | .96 | 250 | 48 | .95 | 24 | 19 | .79 | 32 | 2 | .90 | 0 | 0 | .00 | 7 | 0 | .00 |
| 15.50 | 16.00 | 1.30 | 124 | 106 | .85 | 13 | 0 | 1.00 | 0 | 0 | .00 | 0 | 0 | .00 | 0 | 0 | .00 | 0 | 0 | .00 |
C & R using "galaxy" classification flag: threshold = 1.4
| mag- | mag+ | LIM | nTG (J) | ng (J) | Cj | nTb (J) | nb (J) | Rj | nTG (H) | ng (H) | Ch | nTb (H) | nb (H) | Rh | nTG (K) | ng (K) | Ck | nTb (K) | nb (K) | Rk |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5.00 | 9.00 | 1.40 | 0 | 0 | .00 | 0 | 0 | .00 | 0 | 0 | .00 | 0 | 0 | .00 | 0 | 0 | .00 | 0 | 0 | .00 |
| 9.00 | 10.00 | 1.40 | 0 | 0 | .00 | 1 | 0 | .00 | 0 | 0 | .00 | 1 | 0 | .00 | 1 | 1 | 1.00 | 2 | 0 | 1.00 |
| 10.00 | 11.00 | 1.40 | 0 | 0 | .00 | 6 | 0 | .00 | 3 | 3 | 1.00 | 11 | 0 | 1.00 | 7 | 7 | 1.00 | 11 | 0 | 1.00 |
| 11.00 | 12.00 | 1.40 | 6 | 6 | 1.00 | 13 | 0 | 1.00 | 30 | 30 | 1.00 | 20 | 0 | 1.00 | 49 | 49 | 1.00 | 23 | 0 | 1.00 |
| 12.00 | 13.00 | 1.40 | 44 | 44 | 1.00 | 31 | 0 | 1.00 | 191 | 190 | .99 | 50 | 0 | 1.00 | 348 | 347 | 1.00 | 66 | 0 | 1.00 |
| 13.00 | 13.50 | 1.40 | 100 | 100 | 1.00 | 33 | 0 | 1.00 | 254 | 253 | 1.00 | 89 | 0 | 1.00 | 620 | 613 | .99 | 106 | 2 | 1.00 |
| 13.50 | 14.00 | 1.40 | 193 | 192 | .99 | 97 | 0 | 1.00 | 666 | 658 | .99 | 154 | 6 | .99 | 1291 | 1257 | .97 | 248 | 16 | .99 |
| 14.00 | 14.50 | 1.40 | 419 | 413 | .99 | 131 | 1 | 1.00 | 1179 | 1139 | .97 | 406 | 25 | .98 | 382 | 342 | .90 | 322 | 27 | .93 |
| 14.50 | 15.00 | 1.40 | 895 | 882 | .99 | 380 | 18 | .98 | 368 | 337 | .92 | 192 | 18 | .95 | 17 | 11 | .65 | 170 | 10 | .52 |
| 15.00 | 15.50 | 1.40 | 934 | 884 | .95 | 250 | 35 | .96 | 24 | 17 | .71 | 32 | 8 | .68 | 0 | 0 | .00 | 7 | 2 | .00 |
| 15.50 | 16.00 | 1.40 | 124 | 106 | .85 | 13 | 3 | .97 | 0 | 0 | .00 | 0 | 0 | .00 | 0 | 0 | .00 | 0 | 0 | .00 |
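The C and R columns in the tables above appear to follow directly from the counts. The column meanings are not spelled out here, so the following reading is an inference from the numbers: nTG is the number of "eye"-classified true galaxies in the bin, ng is the number of those that pass the automated flag, nTb is the number of "eye"-classified false sources, and nb is the number of those that (wrongly) pass. Under that reading, C = ng / nTG and R = ng / (ng + nb) reproduce the tabulated rows that were spot-checked. A minimal Python sketch, with hypothetical function names:

```python
def completeness(n_true_gal, n_gal_found):
    """C = recovered true galaxies / all true galaxies (0.0 for an empty bin)."""
    return n_gal_found / n_true_gal if n_true_gal > 0 else 0.0


def reliability(n_gal_found, n_false_found):
    """R = recovered true galaxies / everything that passed the flag."""
    total = n_gal_found + n_false_found
    return n_gal_found / total if total > 0 else 0.0


# J band, 14.5 <= J < 15.0, "extended" flag, threshold 1.3:
# nTG = 895, ng = 880, nb = 84
c = completeness(895, 880)
r = reliability(880, 84)
print(f"{c:.2f} {r:.2f}")  # prints "0.98 0.91", matching Cj and Rj in that row
```

Empty bins return 0.0 here to mirror the .00 entries in the tables; those entries mean "no sources", not zero completeness or reliability.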