International Federation of Digital Seismograph Networks

Thread: Proposal for FDSN Waveform Quality Metrics

Started: March 1, 2016, 2:17 p.m.
Last activity: March 23, 2016, 7:32 a.m.
Reinoud Sleeman
March 1, 2016, 2:17 p.m.
Dear WG-II members,

Based on the initial proposal to standardize FDSN waveform quality metrics and the feedback from Rick Benson, Florian Haslinger, Dan Auerbach, Doug Neuhauser and Lion Krischer (thanks all), please find this updated, extended document (including some basic definitions and illustrations) that covers a number of your suggestions:

* Added the data header/quality indicator as a field to properly and uniquely identify time series records.

* As indicated by Florian and Doug, a time tolerance epsilon of '0' would be overkill. It is now set to 50% of the sampling interval.

* Time window notation repaired, e.g. [t0, t1).

* Definition of the ms_timing_correction metric: here I followed Doug's suggestion to use the "time correction" (field 16 in the fixed header).

* All metrics that were defined as "the number of records" matching some criterion have been changed to the percentage of data matching that criterion.

* Renamed the metrics based on the mini-SEED record header.

* The discussion on using the data quality code Q to indicate that (agreed) quality metrics have been calculated is interesting, but seems IMHO neither very relevant nor practical. The current definitions of D, R, Q and M are not very clear and therefore do not provide useful information. The manual says that for Q some processes have been applied to the data, but (a) it is not clear which processes, and (b) it is not clear whether they altered the data or not. Therefore, I would prefer to identify QC parameters for each data stream that is uniquely defined by network, station, location, channel and quality code (R, D, Q, M), and NOT to use Q to indicate that QC parameters have been calculated and are available.
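
The 50% time tolerance mentioned above can be illustrated with a short sketch (my own illustration, not part of the proposal; the function and variable names are hypothetical):

```python
from datetime import datetime, timedelta

def find_gaps(record_times, sample_rate_hz):
    """Flag discontinuities between consecutive time series records.

    record_times: list of (start, end) datetime pairs, one per record,
    each covering the half-open window [start, end) as in the proposal.
    A gap (or overlap) is declared only when the jump between one
    record's end and the next record's start exceeds
    epsilon = 50% of the sampling interval.
    """
    epsilon = timedelta(seconds=0.5 / sample_rate_hz)
    gaps = []
    for (_, prev_end), (next_start, _) in zip(record_times, record_times[1:]):
        if abs(next_start - prev_end) > epsilon:
            gaps.append((prev_end, next_start))
    return gaps
```

For a 100 Hz channel, epsilon is 5 ms: a record starting 3 ms after the previous one ended is treated as continuous, while a 20 ms jump counts as a gap.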
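
The header fields these metrics draw on (the quality indicator, the data quality flags and the time correction) all sit in the 48-byte fixed section of the miniSEED data header; the byte offsets below follow the SEED v2.4 manual. A minimal parsing sketch, assuming big-endian records (the SEED default); the function name is my own:

```python
import struct

def parse_fixed_header(record):
    """Parse selected fields from the 48-byte miniSEED fixed data header.

    Extracts the fields referenced in the proposal: the data
    quality indicator (D/R/Q/M, byte 6), the data quality flags byte
    (field 14, byte 38; bit 0 = amplifier saturation detected) and the
    time correction (field 16, bytes 40-43, a signed long in units
    of 0.0001 s). Assumes big-endian byte order.
    """
    quality_code = chr(record[6])
    station = record[8:13].decode('ascii').strip()
    dq_flags = record[38]
    (time_correction,) = struct.unpack('>l', record[40:44])
    return {
        'quality_code': quality_code,
        'station': station,
        'amplifier_saturation': bool(dq_flags & 0x01),  # bit 0 of field 14
        'time_correction_s': time_correction * 1e-4,    # 0.0001 s units
    }
```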



Please provide me with your feedback within 3 weeks; otherwise I will assume you agree with these metric definitions.



Thank you and best regards,

Reinoud



  • Tim Ahern
    March 23, 2016, 7:32 a.m.
    Hello Reinoud

    IRIS agrees that WGII can play a valuable coordination role in this, but has one major concern with your proposal. I have communicated this concern to you earlier by email.

    This relates to the naming of the parameters. Some are simply too long. Rather than trying to completely identify what a metric is through a long name, IRIS thinks it is much preferable to have readily available documentation that covers the details. As an example, ms__data_quality_flags__bit_0__amplifier_saturation clearly illustrates my objection. IRIS is approaching this effort as something that both data centers and users will need easy access to. While computers don’t really care about the length of a field, users do, as they sometimes have to enter this information by hand. In this area I believe shorter is much preferred.

    Currently IRIS uses the nomenclature amplifier_saturation, whereas your recommendation uses ms__data_quality_flags__bit_0__amplifier_saturation. I think it would be much preferred to leave the details of where or how the metric is obtained to the documentation and not include everything in the name of the metric. Also, while the current implementation of IRIS in MUSTANG does pull the metric from the miniSEED FSDH, that is fully documented elsewhere and does not need to be over-specified in the metric’s name.

    IRIS has many terabytes of metrics already calculated, and we shared our design specification, including metric parameter names, with EIDA a long time ago. I was surprised that there wasn’t better communication of this issue well in advance of the WGII meeting in Prague and before we had invested so much effort in MUSTANG. I have checked with Rob Casey, the lead MUSTANG developer at the DMC: remapping metric names in all the places they are embedded in the MUSTANG system, services, and documentation is a non-trivial task and will have to be a fairly low priority for IRIS. I will not commit to adopting these lengthy names at IRIS if that is how the FDSN wishes to go. I am not at all clear how difficult changing metric names would be for EIDA. I am not even sure how many groups in EIDA have a metrics system up and running, since I have never seen any documentation related to metrics within EIDA.

    Cheers
    Tim Ahern

    Director of Data Services
    IRIS

    IRIS DMC
    1408 NE 45th Street #201
    Seattle, WA 98105

    (206)547-0393 x118
    (206) 547-1093 FAX




    On Mar 1, 2016, at 6:19 AM, Reinoud Sleeman <reinoud.sleeman<at>knmi.nl> wrote:

    <Proposal definition QC metrics - v1.4.pdf>