Participants in the discussion on the future of miniSEED
International data exchange in earthquake seismology has been effective over
decades because, among other reasons, SEED format has been mostly static.
There are places in SEED into which everything, albeit awkwardly in some
cases, has to fit. This creates a format that everybody may grouse about
equally, but lives within. As a result, there has been a remarkable level of
data sharing across networks. We were one of the early participants in the
design of miniSEED, and as a manufacturer, we have supplied equipment
embracing the advantages of a documented, common, and efficient format. It
has been gratifying to see seismology benefit so greatly over recent years,
helped along by the ability to share high-quality data. After such a long,
successful run, a few of the format's capabilities need refreshing, but the
design remains sound.
There appear to be two main independent objectives in the present drive to
update miniSEED:
1. Extend representations of certain format elements, such as network
and location codes, to accommodate growing needs.
2. Drop some information now in the archive as defined entities in
favor of sanitizing the information permanently retained to fit an idealized
rendition of the data recorded by field equipment. As a by-product, the
extensible, documented "blockette" system would be replaced by "opaque"
data.
It can be argued that Point 1 is clearly needed, although whether it is necessary to
do a wholesale rewrite of MSEED handling software worldwide to accomplish
this goal is a worthy topic of discussion. These goals could be accommodated
within the existing format, for example, by definition of new blockettes to
contain the extended identifiers. In particular, reserved values could be used
for the existing network and location codes to indicate the presence of
extended identifiers. Such an approach would be forward and backward
compatible, and impose minimal changes on existing global infrastructure. I
understand some FDSN members have voiced a similar opinion that minimally
invasive changes could be developed that would address the requirements.
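The compatibility argument rests on how SEED 2.4 chains blockettes: the fixed header records how many blockettes follow (byte 39) and where the first one starts (bytes 46-47), and every blockette begins with its type and the byte offset of the next one, so a reader can step over types it does not recognize. A minimal Python sketch of that idea, assuming big-endian word order; the reserved network code `b"99"` and the blockette number 1002 are hypothetical placeholders, not values defined by SEED 2.4:

```python
import struct

# SEED 2.4 fixed data header offsets (big-endian word order assumed).
NETWORK_SLICE = slice(18, 20)   # 2-character network code
NUM_BLOCKETTES_OFFSET = 39      # UBYTE: number of blockettes that follow
FIRST_BLOCKETTE_OFFSET = 46     # UWORD: byte offset of the first blockette

# Hypothetical placeholders: neither is defined by SEED 2.4.
RESERVED_NETWORK = b"99"
EXTENDED_ID_BLOCKETTE = 1002


def blockette_chain(record: bytes):
    """Yield (blockette_type, offset) by following next-blockette pointers."""
    count = record[NUM_BLOCKETTES_OFFSET]
    (offset,) = struct.unpack_from(">H", record, FIRST_BLOCKETTE_OFFSET)
    for _ in range(count):
        if offset == 0:  # a zero pointer ends the chain
            break
        btype, next_offset = struct.unpack_from(">HH", record, offset)
        yield btype, offset
        offset = next_offset


def extended_id_offset(record: bytes):
    """Return the offset of the (hypothetical) extended-ID blockette, or None.

    Only records whose fixed-header network code holds the reserved value
    are expected to carry one; readers that do not know the blockette
    type can still step over it using the next-blockette pointer.
    """
    if record[NETWORK_SLICE] != RESERVED_NETWORK:
        return None
    for btype, offset in blockette_chain(record):
        if btype == EXTENDED_ID_BLOCKETTE:
            return offset
    return None


# Synthetic record: fixed header, a blockette 1000 at byte 48, then the
# hypothetical extended-ID blockette at byte 56.
rec = bytearray(64)
rec[NETWORK_SLICE] = RESERVED_NETWORK
rec[NUM_BLOCKETTES_OFFSET] = 2
struct.pack_into(">H", rec, FIRST_BLOCKETTE_OFFSET, 48)
struct.pack_into(">HH", rec, 48, 1000, 56)
struct.pack_into(">HH", rec, 56, EXTENDED_ID_BLOCKETTE, 0)
print(list(blockette_chain(bytes(rec))))  # [(1000, 48), (1002, 56)]
print(extended_id_offset(bytes(rec)))     # 56
```

An older reader walking this chain would simply skip the unfamiliar blockette and still find blockette 1000, which is the forward/backward-compatibility property claimed for this approach.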
Point 2 is, as a matter of design philosophy, not a good idea. For an
archival format, as much information as possible about the recording
environment and the equipment should be maintained and documented, not
filtered out - for potential use decades from now. Some of the proposals, in
the spirit of extensibility, propose moving some information that is now
fully enumerated in the published SEED format specification into opaque
headers - what might be called the information gray market.
The objective of Point 2 is essentially to strip the published format down
to some clean bones, and neither mandate nor even define data structures
that may be pertinent to only one class of equipment in the format's
definition. This is a nice idea from a data center's view, since all the
burden of interpreting any information that might have its formal
specification decommissioned would be pushed onto the user. It's a bad idea
from the point of view of future integrity and maximum usefulness of the
archive, since "opaque" data is likely to be undocumented, poorly
documented, or even omitted altogether as data are passed from archive to
archive over time. A diversity of information should be supported, and
defined in the archival format. The solution to managing information that
may be important to interpretation or future harvesting is not to eliminate
the information, but to document it. As an analogy, imagine WWSSN
seismograms that have no writing on the back. Some of the comments in email
threads appear to agree with the point that more information pertinent to
the recording environment, not less, is better in an archival format.
Of course, changing the format in a non-backward-compatible way, as proposed
in the changes driven by Point 2, does risk blowing up a lot of things that
work now. Is it worth it? Ultimately all format definitions are arbitrary.
Much of what is being proposed is effectively an arbitrary rearrangement. If
this were 1988, the cost would be minimal. Now, frankly, to arbitrarily
change fundamental aspects of the design of what has been one of the most
successful collaborative undertakings in earthquake seismology seems at
least unnecessary, if not a wholly unproductive use of resources. Everyone's
infrastructure will not be simplified, but complicated by the major
bifurcation in the format used to exchange data worldwide. Every tool will
have to support not one, but both formats. This will not necessarily make
things better, but it will make work. A measured approach to solve the
actual problems, such as inadequate namespaces for certain format elements,
might address the task in a simple, direct, and efficient way that does not
create an enduring burden.
In a spirit of collaboration, we have responded to a number of the proposed
specific points in the relevant email threads. In general, however, we are
opposed to a redesign that would result in non-backward compatibility. We
would support a working group, and would be happy to serve, to develop an
approach to incorporate necessary changes, while retaining as much backward
compatibility as possible.
Best Regards
Dr. Joseph M. Steim
President
Quanterra, Inc.
steim<at>quanterra.com
Thank you for your thoughtful comments and for your participation in the straw-man-based process so far. The feedback from equipment manufacturers and other data producers is very valuable.
Your Point 2, about dropping information, reforming it, and relegating it to a so-called "gray market", is important. Swayed by your previous feedback and change proposals, we at the IRIS DMC created our own change proposal (#15) to transform the opaque data headers into optional data headers, some defined and some not. The intention is to provide a mechanism that carries FDSN-defined flags and information while also allowing undefined flags to be inserted efficiently; in the end it would be more capable than blockettes and would address the opaque/gray-market concern. This is how the straw man model is intended to work, and it demonstrates that the process is working. If an interested party would like to retain the blockette structure, that too could be proposed and discussed.
The straw man should not be judged as a completed definition; it has not yet undergone even its first revision. I agree that arbitrary changes that deviate from past practices should be scrutinized carefully and minimized unless there are compelling reasons to change. Perhaps, for some, the initial straw man included too many changes from current usage, but I am optimistic that it will move in the direction of consensus. Trying to answer the overarching question of whether the end result would be worth the cost of change is premature; we do not know what the end result is yet. We understand your position to be that any change that is not backward compatible is not worth it, which is perfectly valid and will continue to influence the process.
The discussion so far has already gathered more input and thoughts, across a broad audience, on the future of miniSEED than I have ever experienced in FDSN exchanges. This has been mostly constructive, valuable, and will inform any future discussions.
Best regards,
Chad
In reply to: Joseph Steim <steim<at>quanterra.com>, Aug 23, 2016, 10:06 AM
As we have mentioned and detailed in various emails to this list, we
also are deeply concerned about the potential disruption and
deterioration of services to users caused by a loss of backward
compatibility and by the large number of changes. Therefore, our
position remains that, in order to optimize the process and the use of
resources, before getting into the individual items of the proposal we
should get a better understanding of what we really need and want from
an extension to existing SEED, how this can be designed, what the
expected rollout plans are, and what the implications will be for all
users. Without having this clearly laid down, it is difficult to
understand and evaluate whether the changes we are proposing are worth
the effort they will imply throughout our community.
We appreciate the support by the FDSN Chair for a meeting in late 2016,
and the need for such a meeting is also evident from the ongoing
discussion. The aim should be not to discuss two alternative proposals
but rather to discuss how we can reach the goal of maintaining a widely
accepted format that addresses, as far as possible, the shortcomings of
the current mini-SEED format, is supported by a wide section of the
community, and is actively embraced by data centers and end users. The
meeting we proposed should include an extensive discussion of what we
really need from an extended or new format and how we get there with a
commonly agreed strategy.
As stated in the initial strawman, the main driving motivation behind
this effort is the need to expand the network code to satisfy the
ever-growing number of demands: “Many FDSN members recognize that the current
two-character network code needs to expand. The miniSEED format is a
fixed length format and expanding the network code would render the
format incompatible with the current release. Such a small, but
disruptive change affords the opportunity to consider other changes to
the format, allowing the FDSN to address historical issues and create a
new foundation for current and future use.”
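The quoted point about the fixed-length layout can be checked by tabulating the SEED 2.4 fixed data header: every field sits at a fixed byte offset, so widening the two-character network code shifts everything after it, and existing parsers would read the start time and all later fields from the wrong bytes. A minimal Python sketch; the 8-character network width is a hypothetical choice for illustration only:

```python
import struct

# SEED 2.4 fixed data header fields in order: (name, struct code).
# '>' selects big-endian with no alignment padding.
FIXED_HEADER = [
    ("sequence_number", "6s"),
    ("quality_indicator", "c"),
    ("reserved", "c"),
    ("station", "5s"),
    ("location", "2s"),
    ("channel", "3s"),
    ("network", "2s"),
    ("start_time", "HHBBBBH"),   # BTIME: year, day, h, m, s, unused, 0.0001 s
    ("num_samples", "H"),
    ("sample_rate_factor", "h"),
    ("sample_rate_multiplier", "h"),
    ("flags", "BBB"),            # activity, I/O, data quality
    ("num_blockettes", "B"),
    ("time_correction", "i"),
    ("data_offset", "H"),
    ("first_blockette_offset", "H"),
]


def field_offsets(fields):
    """Return ({name: byte offset}, total header size in bytes)."""
    offsets, pos = {}, 0
    for name, code in fields:
        offsets[name] = pos
        pos += struct.calcsize(">" + code)
    return offsets, pos


offsets, size = field_offsets(FIXED_HEADER)
print(size, offsets["network"], offsets["start_time"])  # 48 18 20

# Hypothetical widening of the network code to 8 characters.
wider = [(n, "8s" if n == "network" else c) for n, c in FIXED_HEADER]
w_offsets, w_size = field_offsets(wider)
print(w_size, w_offsets["start_time"])  # 54 26
```

Every field after the network code moves by the same six bytes, which is why no in-place widening can stay readable by existing software.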
Therefore, we proposed a pragmatic, cost-effective way to solve this
issue immediately. Our proposal can also accommodate a number of other
issues mentioned in the strawman, as listed at the bottom of this
e-mail [1].
Before moving forward with this process and iterations we would like to
invite everybody to carefully think about the general purpose of the
changes without being biased by the technical comments or change
proposals on the strawman. This can be done by setting up a dedicated
Working Group (as suggested by J. Steim) or in a dedicated meeting as we
proposed earlier. Indeed the dedicated meeting can be the fundamental
planning forum for this Working Group. In either case, the EIDA member
institutions are ready to actively contribute.
ORFEUS is ready to organize the meeting in Europe (possible location
and date will be communicated later), and travel costs for up to 5 or 6
participants from other continents can be covered/sponsored by ORFEUS or
by the hosting Institute in Europe. A tentative agenda can be posted
here and discussed within the next days. The intention is not to have
two competing proposals, but to discuss and jointly agree on the pathway to
the adoption and rollout of an extended or new standard that should not
be driven only by the urgent need for additional network codes.
[1]
1. Expand the network code.
MS 2.5: Include expanded network code in b1002. Replace network code
in fixed header by "99" or another reserved code.
2. Add a miniSEED version field.
MS 2.5: Probably not needed, but can be included in b1002.
3. Add a data version field.
MS 2.5: Include data version field in b1002.
4. Move important Blockette details into fixed section of the header.
MS 2.5: Not applicable, MS 2.4 blockettes will be kept.
5. Simplify & improve the record start time.
MS 2.5: Not applicable, MS 2.4 time structure will be kept (microsecond
resolution is already supported by blockette 1001).
6. Combine and drop bit flags.
MS 2.5: Not applicable, MS 2.4 bit flags will be kept.
7. Eliminate the time correction field.
MS 2.5: Not applicable, MS 2.4 time correction field will be kept.
8. Forward compatibility mapping.
MS 2.5: Trivial -- since MS 2.5 is a superset of MS 2.4, any MS 2.4 file
is also an MS 2.5 file.
9. General compression and opaque data encodings.
MS 2.5: In MS 2.4, encodings 1..5 (general), 10..18 (FDSN networks) and
30..33 (older networks) are defined. Proposed new encodings 50, 51, 52
and 100 can be added, but should be used only in special cases when
compatibility is not an issue.
10. Add CRC field for validating integrity.
MS 2.5: Include CRC field in b1002. CRC should be calculated over the
entire record, with the CRC bytes assumed to be zero for purposes of the
calculation.
11. Expand the channel codes.
MS 2.5: Include expanded channel code in b1002. Replace channel code in
fixed header by a reserved value.
12. Expand the location identifier.
MS 2.5: Include expanded location identifier in b1002. Replace location
identifier in fixed header by a reserved value.
13. Fixed-point data sample encoding.
MS 2.5: See 9.
14. No SEED 2.4 blockettes, include support for opaque headers.
MS 2.5: Not applicable, MS 2.4 blockettes will be kept. Opaque headers,
though already supported by b2000, could be added to b1002 as well.
15. Eliminate sequence numbers.
MS 2.5: Not applicable, sequence numbers will be kept.
16. Eliminate the timing quality field.
MS 2.5: Not applicable, the timing quality field will be kept.
17. Variable record lengths.
MS 2.5: Not applicable. This is the only addition of MS3 that cannot be
implemented in MS 2.5. On the other hand, the proposal for variable-length
records is rather controversial anyway, and there are voices against it.
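Item 10 pins down only one detail of the CRC: it covers the entire record, with the CRC bytes taken as zero while computing it. It names neither a polynomial nor a field position. The sketch below assumes Python's `zlib.crc32` (the common CRC-32) as a stand-in algorithm and a hypothetical offset for a 4-byte big-endian field inside b1002, to show the zero, compute, store, verify pattern:

```python
import struct
import zlib


def record_crc(record: bytes, crc_offset: int) -> int:
    """CRC over the whole record with the 4 CRC bytes treated as zero."""
    buf = bytearray(record)                 # work on a copy
    buf[crc_offset:crc_offset + 4] = b"\x00\x00\x00\x00"
    return zlib.crc32(buf) & 0xFFFFFFFF     # stand-in algorithm (assumption)


def store_crc(record: bytearray, crc_offset: int) -> None:
    """Write the CRC into the record at the (hypothetical) field offset."""
    struct.pack_into(">I", record, crc_offset, record_crc(record, crc_offset))


def verify_crc(record: bytes, crc_offset: int) -> bool:
    """Recompute with the field zeroed and compare with the stored value."""
    (stored,) = struct.unpack_from(">I", record, crc_offset)
    return stored == record_crc(record, crc_offset)


# Usage with a synthetic 512-byte record and a hypothetical CRC offset.
HYPOTHETICAL_CRC_OFFSET = 60
rec = bytearray(range(256)) * 2
store_crc(rec, HYPOTHETICAL_CRC_OFFSET)
print(verify_crc(bytes(rec), HYPOTHETICAL_CRC_OFFSET))  # True
rec[100] ^= 0x01  # flip one bit outside the CRC field
print(verify_crc(bytes(rec), HYPOTHETICAL_CRC_OFFSET))  # False
```

Zeroing the field before computing is what lets a reader recompute the same value from a record that already contains its stored CRC.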
Helmholtz Centre Potsdam
GFZ German Research Centre For Geosciences
Public Law Foundation State of Brandenburg
Telegrafenberg, 14473 Potsdam
House A3 Room 207 http://geofon.gfz-potsdam.de/
Thanks for including me on these e-mail exchanges. As a long-time producer
and intense user of SEED data, and as having participated in the
discussions that led to the birth of SEED/mini-SEED as a global data
exchange format for broadband seismology within the FDSN 30 years ago, I
recognize that SEED may seem very clumsy given the evolution of computer
languages and in particular object-oriented coding, as well as other
shortcomings such as the two letter network code limitation. Some
improvements/enhancements should certainly be considered.
When SEED was designed, the focus was to standardize a format that would
contain all the necessary and accurate information to fully understand the
data, for the benefit of high-quality science. It went along with an effort to
develop standards for the quality of the broadband instrumentation,
which are also still relevant.
Only if the format under discussion were for a completely new type of data,
acquired for purposes different from the original purpose of SEED, would it
be justified to "start from scratch". Why break it if it works so well?
I particularly wish to support two of the points made by Joe Steim.
1- Because it is now so widely and effectively used, and serves the purpose
of the users that depend on the data for their research, any changes going
forward MUST be backward compatible. Otherwise, this will create havoc in
the user community, which could halt progress in funded science by several
months if not a year for a large international community, for no compelling
reason. That translates into a huge amount of unnecessary frustration, as
well as substantial financial costs.
2- I am really alarmed at any suggestion of reducing the amount of
information to be included in the metadata. Time and again, someone
discovers a problem with some older data that can only be understood by
digging deep into the metadata, and if the information is there, the data
are still useful. Should we then be throwing away such data, which may have
great value because they uniquely correspond to some original
source-station path, or an event that was not previously considered
"interesting"?
Surely, there must be ways to address some of the shortcomings of SEED
without discarding information, and in a backward compatible way!
Regards
Barbara Romanowicz
Thanks to everyone who provided feedback regarding a new version of miniSeed. We think this was very valuable and will help inform any process moving forward. The feedback included both support for and resistance to the concept. Due to the lack of support for the current approach, we are not going to continue with it. We do believe strongly that the current miniSeed needs to be looked at closely so that it remains a viable format as we move forward. The current version has many weaknesses that need to be addressed directly, not through workarounds. We encourage WG II to consider an alternative to what we have been promoting, for discussion at the next FDSN meetings in Kobe.
Cheers and thanks
Tim Ahern
Director of Data Services
IRIS
IRIS DMC
1408 NE 45th Street #201
Seattle, WA 98105