Subject: Parameter Prioritization
Date: Mon, 06 Oct 97 17:01:07 +0100
From: schneide@wilt.phast.umass.edu
To: Tom Jarrett
CC: schneide@wilt.phast.umass.edu, tchester@ipac.caltech.edu,
    huchra@cfa.harvard.edu, skrutski@phast.umass.edu,
    roc@ipac.caltech.edu

This isn't quite what Mike asked for, but here is my take on the
parameters in the list that Roc prepared.

---Steve

----------------------------------------------------------------------

GALWORKS -- Parameter Prioritization

In reviewing the various parameters that need to be tuned for GALWORKS
operation, I think it is useful to sort them by function and their
basic order in the processing scheme. This is my first pass at doing
that, and I'm embedding a number of comments and questions about each.
The parameters listed by Roc are marked by ">" in the first column.
Issues I think are important are marked by "***". Many of the questions
are rhetorical in the sense that I don't disagree with the values
listed, but I just want to make sure that I understand how the
particular parameter values are being chosen. Please let me know if
there are misplaced or misinterpreted parameters!

I. INPUT DATA

The first question is, what parts of the image are we going to examine?
This is partly an issue related to the geometry of the frames, and
partly tied to confusion issues, mostly caused by bright stars. These
are mostly associated with tunable parameters, but there are some
additional issues that fall in this category that also need to be
addressed:

A. Frame Geometry

> dx,dy from edge to work: Edge_lim = 10

We are working within 10 pixels of the frame edge. Since the frames
nominally overlap by 10% (25 pixels), this should guarantee that an
object will always be picked up on a frame or its neighbor, but this
needs to be confirmed:

***Is the requirement on minimum frame overlap adequate to guarantee
that no sources will be missed?
***Is the "dy" requirement consistent with the plan to make the frames
handled by GALWORKS non-overlapping?

----- answer from t. jarrett:
To the best of my memory, I do not think we have ever run into a
situation where we have lost sources on the edge when they are
> dx = 10 pix or so. for dy = 10, this implies that we will "dupe"
galaxies along the top/bottom edge of the coadd. the db will have to
decide which apparition is better (i.e., use the one with dy
maximized).
--------

There is another postprocessing parameter that feeds back into the
geometry question, so I will raise it here:

> fraction area limit of bright galaxy processing
>   minimum ratio between gal size and gal appearance on coadd: BGALRATIO = 0.75
>   (i.e., at least 75% of gal must be on coadd)

***Are there pathological circumstances where this will force a galaxy
to be omitted? [For example, suppose a galaxy sits halfway between
frames, 12 pixels from the edge of each. If the galaxy is 50 pixels in
diameter, would it be dropped?]

-------- answer from t. jarrett:
you mean if the pointing were such that two adjacent scans are not
overlapping exactly by 10%? in which case, you could get situations
where a galaxy is too close to the edge of both scans; the ideal
situation is where the gal might be dx = 12 on one scan, but dx = 38
from the other.

Just to make sure we are on the same page, the BGALRATIO refers to
known galaxies in my input catalog. Even though the gal might be missed
for processing, its pieces are cut out and the object is noted in a
file; I can't wait to see what happens when we process the scans with
Maffei I and II (as well as M31, which I also requested). Will we
detect it? What if it is near the edge of the coadd -- will BGALRATIO
fail for its assumed size (which is huge; but not to the eye!)? I bet
we will have to tinker with the params once we try this experiment. So
I think this parameter is still subject to change or even modification.
---------

---------- comment from t. chester:
this relates to the max. size of gals that we were supposed originally
to process, set by the overlap region, of 0.8'. we will definitely be
incomplete for larger gals, unless you really want to report mags for
gals that are, in the limit, only 50% on a given coadd. i think we have
to be very careful here, since the potential for embarrassment is high.
i foresee a slew of papers pointing out the "bias" in the mags reported
by 2mass...... see:
http://spider.ipac.caltech.edu/staff/tchester/2mass/facts/survey_properties.html#cvssize

tj says this parameter is only for known biggals - what prevents
working on only half a galaxy that is not in the known biggal catalog?
--------------

> maximum size of known gal to process (diam in arcmin): BGALDIAM

***I would guess that this ought to depend on the previous value. If a
large galaxy falls entirely on a frame, it would be useful to run it
through the processor for comparison to other measurements. Partial
galaxies are not so useful.

---------- answer from t. jarrett:
this parameter is not needed if we use BGALRATIO, which automatically
eliminates BIG galaxies. BGALDIAM was the original cut param at the
known galaxy list; then m51 came along and we invented BGALRATIO so
that big gals like m51 (which, by chance, just happen to land in the
central region of the scan) can be processed
-----------

B. Bright Star Masking

>mag limit for bright star masking: BRIGHTLIM = 9.0 (func of stellar num dens)
>
>other bright star masking parameters:
>   mask radius (function of star brightness, and stellar num density)
>   npersistence masks (function of star brightness)
>   diffraction spike length/width (function of star brightness)
>   horizontal stripe masks (function of star brightness)

***Rather than making these all tunable parameters, I wonder if there
isn't a simpler approach: If the various features scale roughly
linearly with the source brightness, it should be possible to generate,
in effect, a high dynamic range PSF (or better, a "worst case scenario"
PSF). The pixels that are masked would then just be those that exceed a
fixed fraction of the local value of "Sigma" (based on both pixel and
confusion noise). This would then require only a single tunable
parameter.

--------- answer from t. jarrett:
having trouble visualizing this and extrapolating the implementation
thereof; how do we build this PSF? Does it resemble in any sense the
actual PSF (from prophot)? My guess: not in the least does it look like
the psf. Bright stars are truly diabolical. We are currently building a
"look up" table of relevant param values, such as blanking confusion
radius, spike length, and horizontal stripe blanking. Please see my
html doc for preliminary results on this analysis:
http://spider.ipac.caltech.edu/staff/jarrett/2mass/3chan/brightune/btune.html

Notice the scatter in the plots. This is due to the uncertainty in the
R1 mag -- saturation and such. Very tough problem. I have also fooled
around with a method by which we fit a radial profile to the object --
compute the median in annular rings, then determine the confusion
radius from the point at which the profile drops into the noise. But
this, as anything else, requires tuning. And it requires CPU -- lots of
it in the plane. I think I prefer look up tables. At least we will be
consistent.
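
[Editorial sketch] The annular-median method mentioned just above can be
illustrated roughly as follows. This is only a minimal sketch, not the
GALWORKS code: the function names, the 1-pixel ring width, and the
"first ring whose median falls below nsigma * noise" stopping rule are
all assumptions made for illustration, and the image is assumed to be
background-subtracted.

```python
import math
from statistics import median


def annular_median_profile(image, x0, y0, rmax, width=1.0):
    """Median pixel value in concentric rings about (x0, y0).

    image is a list of rows; ring k covers radii [k*width, (k+1)*width).
    Ring width and rmax are illustrative choices, not GALWORKS values.
    """
    nbins = int(math.ceil(rmax / width))
    rings = [[] for _ in range(nbins)]
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            k = int(math.hypot(x - x0, y - y0) / width)
            if k < nbins:
                rings[k].append(value)
    # Empty rings (e.g. off the image) are reported as None.
    return [median(ring) if ring else None for ring in rings]


def confusion_radius(profile, noise, nsigma=1.0, width=1.0):
    """Inner radius of the first ring whose median is below nsigma*noise.

    Returns None if the profile never drops into the noise within rmax.
    The nsigma threshold is an assumed tunable, as the text warns.
    """
    for k, level in enumerate(profile):
        if level is not None and level < nsigma * noise:
            return k * width
    return None
```

The double loop over every pixel makes the CPU cost mentioned above
visible: it scales with the subimage area for each bright star, which is
one reason a precomputed look-up table is attractive.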
--------

Thus I will also list the next parameter here as well, although it
feeds into many other aspects of GALWORKS more directly:

> stellar density limit at which confusion noise is used instead of poisson noise

***I would suggest making this a smooth function by simply adding the
two noise levels in quadrature.

--------- answer from t. jarrett:
sounds doable.
---------

C. Individual Frame/Pixel Rejection

***Some of the spurious sources with which GALWORKS has had problems
are associated with temporary features that I would have expected the
pre-GALWORKS processing to eliminate in the frame comparisons. For
example, meteor trails and the horizontal stripes associated with
bright stars should not appear in all six frames. Is there some way to
avoid these problems before they arise? This question is not associated
with a GALWORKS tunable parameter, but it is one I have been asking
about for a while, and have not yet gotten a clear answer. Perhaps it
is associated with tunable parameters in one of the other processors.

------------ answer from t. jarrett:
according to Gene, most meteor trails are not fixable. They live in a
param space (talking about frames now, 6 pixels per piece of sky) not
unlike bright galaxies; by fixing meteors, you also fix big gals (like
M51) -- very horrible mess it leaves. by carefully tuning, some trails
(the fainter ones) may be washed away -- still an open issue. This is
Gene stuff -- so you may want to direct the question to him. But my
take is, we are going to have to live with streaks ... not a good
thing. All is not lost however, since meteor streaks leave behind
plenty of clues that they are there -- something the DB should be able
to catch. Even Gene may be able to see them with some clever searching
of his "solos" file.
--------------

II. DETECTION

Masking should eliminate a small enough area that it is not a global
problem, just a local one, but if the detection thresholds are too
high, we will be irretrievably incomplete.
The desire here is to make the detection thresholds as liberal as
possible given data-storage constraints, so that subsequent
re-processing can recover and discard sources as our knowledge base
grows. As much as it is possible, we want to try to guarantee that
local effects like stellar number density do not change the character
of the sample being extracted, or at least that an extractable subset
exists that is uniform in nature.

The detection parameters are probably the most critical when it comes
to meeting the Level-1 specs. Fortunately, Tom Jarrett has done
extensive testing and evaluation of the various parameters here, so the
effects of different parameter choices are pretty well understood.

A. Sources Considered

> detection threshold (nsigma)

***Is nsigma based on confusion noise at low Galactic latitude?

---------- answer from t. jarrett:
yes, the noise is determined from the histogram of the coadd pix
values. the confusion noise (or lack thereof) is contained in this
distribution
----------

> initial score thresholds
>   P_scorelim = score limit for preliminary mxdn scores = 0.00
>   C_scorelim = score limit for mxdn, sh, & wsh tests = 2.00
>   T_scorelim = score limit for trip killers: vmean, vint, sctrip = 2.00
>   R_scorelim = score limit for r23 = 0.00

***Are we going to use a decision tree approach to integrate the use of
these parameters (after training it on a well-studied subset)? or are
we going to stick with a simple parameter cutoff approach?

------------ answer from t. jarrett:
both; for galworks, we apply the thresholds given above; they eliminate
most of the stars without loss of galaxies; later, in a post proc
phase, we will apply some kind of decision tree analysis to cull out
the remaining stars, double and trip stars. It still remains to be seen
if OBDT can do the job -- Chester thinks they are a bit dubious, since
the nature of our data seems to not match the power of DTs (but, this
is only a hunch .. not verified as of yet).
A lot of work remains in this arena. Chester will have a word or two to
say about reliability and construction of catalogs.
-------------

B. Sources Kept for Further Processing

>mag limits for gal candidacy (function of stellar num density)
>   Klim = 14.25
>   Hlim = 14.75
>   Jlim = 15.5

***Are these, in effect, point-source magnitude estimates that
determine the initial cut, or are they some type of aperture magnitude?

-------- answer from t. jarrett:
both; early in galworks, all we have are "stellar" mags -- obviously
not good for galaxies; at this point, I relax the thresholds above,
something like 0.5 mag. later in galworks, we have a suite of aper mags
to choose from for cuts in flux space -- we then strictly apply the
thresholds given above.
--------

***Why do these parameters grow closer to the level-1 sensitivity
limits as we go from K (0.75 mag difference) to J (0.5 mag difference)?

---------------- answer from t. jarrett:
no good reason; historical reasons mostly; K limit is really too high,
but we wanted to look at everything even if faint! we will end up
chucking most of the faint things anyway (certainly these will not end
up in the cat); this brings up a diff but related issue; GALWORKS
runtime is substantial -- some think it is too long! JLIM,HLIM,KLIM are
directly related to run time, so we may even need to lower the
thresholds ... not clear at this time. certainly for high source
density we want to lower the thresholds, which is what is done -- note
this does not affect the lev-1 spec
----------------

C. LCSB Processor

A similar set of parameters determine

> LSB detection thresholds
> LSB mag limits
> LSB blocked-snr limits for gal candidacy

***Not all of the individual parameters for algorithm 2 are listed
here, and they have not been as thoroughly tested as in algorithm 1. Is
a decision tree being seriously considered here? I would argue in favor
of a fixed set of cutoff criteria to keep the interpretation simpler.

---------- answer from t. jarrett:
You are correct, I have only mentioned a couple. The ones I think are
important surface in my html docs when I address algor 2 issues. I do
not plan on running a DT. I think these candidates should go to a
separate catalog where they can be looked at by courageous people :)
The cutoffs you mention have not been determined yet, other than the
mag limits and the BSNR limit = 3 or 4.
------------

III. BACKGROUND MANIPULATION

Removal of and substitution for individual pixels is needed to improve
the measurement accuracy in the presence of confusion. Unlike masking,
which simply eliminates portions of the sky, background manipulation
runs the risk of introducing subtle biases into the data. Again, this
has been quite thoroughly tested, but there are a few questions about
its operation.

> mag limits for star subtraction and blanking
> radius of star subtraction / blanking (function of star brightness)
> radius of isophotal substitution area

***A question of definitions: to me blanking refers to marking of an
area to be isophotally substituted, while subtracting refers to removal
of the portion of the flux attributed to the confusing source. It
sounds like you are defining these differently.

--------- answer from t. jarrett:
yes, I have done both and have finally settled on blanking with
isophotal substitution; why? subtraction has always been very difficult
to perform robustly; that is, under a diverse set of circumstances it
can give wacky and destructive results. isophotal substitution, on the
other hand, seems to work more robustly (I have tested this on a wide
range of stellar number density); the user can still recover whatever
flux I have blanked and substituted for -- this info is written to a
file. Also, the user will have access to the postage stamps, which can
be manipulated at will.
----------

***Are masked areas isophotally substituted for sources whose edges
reach into the mask?

----------- answer from t. jarrett:
not sure I understand this; can you rephrase the question?
----------

> stellar density limit at which to attempt galaxy deblends
>   deblend_density = 3.1 (default)
>   deblend_density = 0.1
> stellar density limit at which stars are blanked before ellipse fitting
>   ellfit_density = 3.5

***What are the units here?

----------- answer from t. jarrett:
log stars per sq. deg for K brighter than 14th

deblend_density = 0.1 means that we never attempt deblend; galaxy
deblending is tricky stuff and you can make things worse; so I am
hesitant to turn it on without plenty of analysis first;
------------

***Are the two values Roc quotes for deblend_density for different
parts of the processing?

--------- answer from t. jarrett:
ellfit_density refers to the operation of determining the ellip params
for a galaxy (using 3-sigma isophot); do we blank stars first, or
perform op first: turns out, for low stellar density (most of the sky),
you want to perform op first; why? because of the nature of stellar
detection (not galworks) plenty of galaxies have multiple detections
(i.e., pieces of the galaxy, actually associated with the gal); if we
blank, we screw up the profile of the gal -- not good for low source
density, where stars are rare.

for high stellar density, stars are everywhere and can significantly
affect the true gal profile -- better to blank them away
-----------

> LSB SNR limits for star blanking

***Why is this different than that for Algorithm 1?

-------- answer from t. jarrett:
got to set the lims higher because these guys are fainter, at least in
smallish apertures -- remember they integrate up over a bigger aperture
to something worthwhile
--------

> blanking radius after object processed (function of curve of growth)

***Is this based on some criterion that the surface brightness drops
below a particular fraction of the local noise?

--------- answer from t. jarrett:
it is now based on the Kron radius, which is related to the true size
of the galaxy; I blank some factor bigger than this radius; the curve
of growth did exactly what you note above; but it was never all that
robust.
-----------

IV. DATA PROCESSING

Last of all, the data are processed in various ways, and I suspect that
there is a lot of potential for debate here. The choices here are to
some degree arbitrary (in that we will effectively be defining our own
measurement system). The important thing is that, as much as possible,
we minimize biases as a function of magnitude and location on the sky.

>mag limits for ellipse fitting; otherwise use "super" coadd vals
>   Klim_ellipt = 13.5
>   Hlim_ellipt = 13.9
>   Jlim_ellipt = 14.2

***Are these pre-aperture fitting magnitudes? (I'm confused about when
the magnitude is being fit vs. when the aperture is being fit.)

---------- answer from t. jarrett:
radius = 10 aperture mag
----------

***Why are these changing so much relative to the different bands'
sensitivity limits?

------ answer from t. jarrett:
no good reason; again historical and based on results as I see them in
analysis; there is no firm foundation for these lims (but I do know
that, for example, at K > 13.5 the ellip fit will be terrible; similar
for the H and J lims)

perhaps SNR would be a better param to use ... but SNR for what
aperture, or for what annulus? any ideas here Steve? Maybe sims are the
way to go here ...
---------

> mag limits at which to jettison K as fiducial

***If the previous question is saying we're measuring the K ellipses
down to the catalog limit, then we would seem to be using K over the
entire range of K sensitivity.

--------- answer from t. jarrett:
not sure what you mean; K to 13.5 is not the entire range of
sensitivity, we go 0.5 mag fainter (but not reliably, of course);
can you rephrase the note?
------------

> n-sigma contour for ellipse fitting (default = 3)

***As I have said before, I think it would be more useful to have
isophotal values--hopefully at two levels. I know this runs into
problems at low Galactic latitude, but we need consistency more than
anything else for T-F applications.

-------- answer from t. jarrett:
absolutely; we have not written this business in stone; 3-sigma was
always a prototype limit; If I recall correctly, you were advocating
isophotal vals of 19.5 mag/asec**2 and 20 ?? Can you verify? If we have
more than one value, then we are talking about nearly doubling our
photometric measurements!! I'm sure you did not really mean this.
Steve, this is a great time to clear this stuff up and come to some
definite decision. I'll go with an isophotal value, just specify which
one. Does John have an opinion on this matter?
---------

--------- comment from t. chester:
because the background level varies by a factor of 2 or so, the noise
varies by sqrt(2) or so. hence we could live with using the isophotal
contour that corresponds roughly to on average snr=3, probably around
19th mag. tj has determined that you want to use something near that
value, since if you go to brighter mags, you are dominated by psf, and
lower mags will be dominated by noise.
---------

>minima/maxima limits for profile fitting and scale length measurement
>   radmax = 15.0
>   radmin = 4.0
>   radmax_elfit = 12.0
>   radmax_wedge = 8.0

***Can you give a little more detail about these parameters? Is this
saying that the same radial range is used regardless of the, e.g.,
isophotal size of the galaxy?

--------- answer from t. jarrett:
this is for the scoring business; the radial profiles are fit in many
different ways; the lims above set some boundaries; they are inspired
by star-gal discrimination (where the radial profiles are small); big
galaxies are not a prob for star-gal discrim, so we don't need big
radial ranges. does this answer your q?
these params do not require tuning (this was done years ago)
---------

> SNR limit at which to extract postage stamp
> LSB SNR limits for stamp extraction

***These parameters are basically driven by storage space, right? I
would say that they might as well be the same as the official
sensitivity limits.

---------- answer from t. jarrett:
yup; and you are probably right about setting the final thresh limits.
lsbs, if you are not careful, can totally dominate the counts and the
disk space! (and they are all false!! at least, relatively speaking)
-----------

> curve of growth convergence criteria
>   a) flux growth
>   b) stellar contamination

***These seem relatively arbitrary, so long as the resulting magnitudes
are flagged according to the stopping criterion.

---------- answer from t. jarrett:
no longer do curve of growth; not robust
----------

> petrosian ratio to compute petrosian radius and flux

***It's not clear that we have enough dynamic range to get a useful
Petrosian radius and flux. I'd like to hear John's comments on this
issue.

V. MISCELLANEOUS

> mag limit for bright star processing (algor 3)

***Can you explain what this parameter and algorithm do?

--------- answer from t. jarrett:
aha, you noticed! The algor is discussed in the sds; it is hidden in
the appendix. No work has been done since the CDR, so there is much to
do here. Below is lifted from the sds.

EXTENDED BRIGHT STAR DETECTION

It is possible to detect emission surrounding bright stars if the star
and its associated reflected light (e.g., diffraction spikes) are
properly subtracted from the coadd first. The algorithm described here
(a.k.a. algorithm 3) is designed to find bright emission around (but
well beyond the PSF of) bright stars.

 1. Sort bright list according to flux
 2. subimage: 160 X 160
 3. blank all nearby brighter objects; blank radius = 25
 4. blank all nearby fainter stars (but still bright stars);
    blank radius = 20
 5. blank area centered on bright source, radius = 20
 6. blank all sources with a local maximum and peak flux > 3*sigma;
    blank radius = 6
 7. blank diagonals (diffract spikes)
 8. compute stats for surviving pixels in subimage:
    mean and standard deviation of the mean, standard deviation of
    mean from zero (mean - 0)
    >> (submean, subsigma_1, subsigma_2)
 9. parameters:
    sub_ratio1 = 100 * [ (subsigma_1 / sigma) - 1]
    sub_ratio2 = 100 * [ (subsigma_2 / sigma) - 1]
10. Apply criteria
    sub_ratio1 > 25, or
    sub_ratio2 > 20
-------------
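
[Editorial sketch] Steps 8-10 of the algorithm above might look roughly
like this in code. It is a minimal sketch under stated assumptions, not
the sds implementation: the function name is hypothetical, the blanking
of steps 1-7 is assumed to have been done already, and subsigma_1 and
subsigma_2 are read here as the scatter of the surviving pixels about
their mean and about zero, respectively, which is only one
interpretation of the wording in step 8.

```python
import math


def extended_emission_stats(pixels, sigma,
                            ratio1_limit=25.0, ratio2_limit=20.0):
    """Steps 8-10 for one bright star.

    pixels: surviving (unblanked) pixel values of the 160 x 160 subimage,
            assumed background-subtracted.
    sigma:  the coadd noise used as the reference level.
    """
    n = len(pixels)
    submean = sum(pixels) / n
    # Assumed reading of step 8: scatter about the mean ...
    subsigma_1 = math.sqrt(sum((p - submean) ** 2 for p in pixels) / n)
    # ... and scatter about zero, which also picks up a mean offset.
    subsigma_2 = math.sqrt(sum(p * p for p in pixels) / n)

    # Step 9: percentage excess of each scatter over the coadd noise.
    sub_ratio1 = 100.0 * (subsigma_1 / sigma - 1.0)
    sub_ratio2 = 100.0 * (subsigma_2 / sigma - 1.0)

    # Step 10: flag extended emission if either criterion is met.
    detected = sub_ratio1 > ratio1_limit or sub_ratio2 > ratio2_limit
    return submean, sub_ratio1, sub_ratio2, detected
```

Under this reading, a smooth pedestal of emission raises sub_ratio2 but
not sub_ratio1, while patchy emission raises both, which may be why the
two criteria are applied with "or".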