5. Dealing with data-quality flags
2.1 CTD range checks
(This section is an expansion of Example 1 shown by ?"setFlags,ctd-method".)
The oce package provides a dataset, ctdRaw, that contains some anomalous values. It may be loaded with
library(oce)
data(ctdRaw)
and the anomalies are clearly revealed in a summary plot (Figure 1):
plot(ctdRaw)
Although ctdTrim removes most of the anomalous data by examining the variation of the pressure signal over time, it may also be of interest to see how well simple range checks perform in cleaning up the data. Salinity certainly cannot be negative, but in an oceanographic setting it is common to apply a stricter criterion, perhaps insisting that Absolute Salinity \(S_A\) exceed \(25\) g/kg. This value might work in other situations as well, and the same could be said of an upper limit of \(40\) g/kg. Similarly, it might make sense to bound temperature between, say, \(-2^\circ\)C and \(40^\circ\)C for application throughout much of the world ocean.
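The range criteria above amount to an elementwise test on each variable. The following base-R sketch applies them to a few made-up values; outOfRange is a hypothetical helper, not an oce function. One detail worth noting is that a comparison such as salinity < 25 returns NA wherever the data are NA. This sketch assumes such points should not be flagged by the range check, and so converts those NA results to FALSE.

```r
# Hypothetical helper (not part of oce): TRUE where x lies outside [lower, upper].
# NA comparisons are converted to FALSE, i.e. missing data are not flagged
# by this check (an assumption; other treatments are possible).
outOfRange <- function(x, lower, upper) {
    bad <- x < lower | upper < x
    bad[is.na(bad)] <- FALSE
    bad
}

# Made-up salinity and temperature values, with spikes and one NA
sal  <- c(31.2, 31.3, -5.1, 31.4, 55.0, NA)
temp <- c(10.1, 10.0, 9.9, -8.0, 9.8, 9.7)

badS <- outOfRange(sal, 25, 40)   # FALSE FALSE TRUE FALSE TRUE FALSE
badT <- outOfRange(temp, -2, 40)  # FALSE FALSE FALSE TRUE FALSE FALSE
print(badS)
print(badT)
```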
These criteria can be supplied to setFlags in various ways, but the simplest is to create logical vectors, e.g.
badS <- with(ctdRaw[["data"]], salinity < 25 | 40 < salinity)
badT <- with(ctdRaw[["data"]], temperature < -2 | 40 < temperature)
In the above, with has been used to avoid creating variables named salinity and temperature in the workspace, but it would also be common to write e.g.
S <- ctdRaw[["salinity"]]
T <- ctdRaw[["temperature"]]  # note: this masks R's T shorthand for TRUE
bad <- (S < 25 | 40 < S) | (T < -2 | 40 < T)
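Keeping badS and badT separate, rather than combining them into a single bad vector, preserves the information about which variable failed. A small sketch on made-up values (sal and temp are stand-ins for the ctdRaw columns) illustrates the difference:

```r
# Made-up values standing in for ctdRaw salinity and temperature
sal  <- c(31.2, 31.3, -5.1, 31.4, 55.0, 31.5)
temp <- c(10.1, 10.0, 9.9, -8.0, 9.8, 9.7)

badS <- sal < 25 | 40 < sal      # salinity outside [25, 40]
badT <- temp < -2 | 40 < temp    # temperature outside [-2, 40]
bad  <- badS | badT              # TRUE if either variable fails

# The combined vector says where something is wrong, but not what
print(which(bad))                # 3 4 5
# The per-variable vectors say which variable is responsible
print(data.frame(sal, temp, badS, badT)[bad, ])
```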
Since the goal here is to illustrate setting multiple flags, the badS and badT values will be used. The first step is to copy the original data, so that the flag operations will not alter ctdRaw:
qc <- ctdRaw
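The copy works because R uses copy-on-modify semantics for ordinary objects, including S4 objects such as those in oce, so changes made to qc do not reach ctdRaw. The sketch below illustrates this with a plain list standing in for an oce object, an assumption made so that the example runs without oce:

```r
# A plain list stands in for an oce object (assumption, so no oce is needed)
original <- list(data = list(salinity = c(31.2, -5.1, 31.4)))

working <- original                  # analogous to qc <- ctdRaw
working$data$salinity[2] <- NA       # modify the copy only

print(original$data$salinity)        # 31.2 -5.1 31.4 (unchanged)
print(working$data$salinity)         # 31.2   NA 31.4
```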
The workflow is best documented if a flag scheme is established, and the “WHP CTD exchange” scheme is a reasonable choice. Using
qc <- initializeFlagScheme(qc, "WHP CTD")
stores a note on the scheme in the metadata slot of qc. Importantly, however, it does not store any flag values. The next step is to initialize them; for example, to set up flag storage for salinity and temperature, use
qc <- initializeFlags(qc, "salinity", 2)
qc <- initializeFlags(qc, "temperature", 2)
to create storage and initialize all entries to the "acceptable" value (which, for this flag scheme, is the number 2). Once storage has been initialized for a given variable, further calls to initializeFlags for that variable will have no effect, apart from a warning.
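Conceptually, flag storage can be pictured as one integer vector per variable, each as long as the data it describes. The sketch below mimics the behaviour just described, including the warning on a repeated call; initFlag and the list layout are hypothetical, and the internal storage used by oce may differ in detail.

```r
# Hypothetical helper: create flag storage for `name`, filled with `value`,
# unless storage already exists, in which case warn and change nothing
initFlag <- function(flags, name, value, n) {
    if (!is.null(flags[[name]])) {
        warning("flags for '", name, "' are already initialized; leaving them unchanged")
        return(flags)
    }
    flags[[name]] <- rep(as.integer(value), n)
    flags
}

n <- 6                                       # number of data points (made up)
flags <- list()
flags <- initFlag(flags, "salinity", 2, n)   # 2 = "acceptable" in WHP CTD
flags <- initFlag(flags, "temperature", 2, n)
str(flags)

# A second call for the same variable only produces a warning
flags <- initFlag(flags, "salinity", 4, n)
```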
Note that initializeFlags works with a plain numerical value for its third argument, as above, so a flag scheme is not strictly required. One advantage of using initializeFlagScheme is that it clarifies code (e.g. permitting value="bad" in setFlags), but the bigger advantage is that it embeds the scheme within the object, so that a second analyst could examine it later and be clear on the meanings of the numerical codes. This matters because, for many data types, there is little agreement within the oceanographic community on which flag scheme to use (Argo being an exception).
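Part of the value of a stored scheme is that names such as "bad" can be translated into numerical codes consistently. A minimal sketch of such a lookup follows; scheme holds an illustrative subset of WHP CTD codes (2 for acceptable, 3 for questionable, 4 for bad), and flagCode is a hypothetical helper, not the oce implementation.

```r
# Illustrative subset of WHP CTD flag codes, as a named integer vector
scheme <- c(acceptable = 2L, questionable = 3L, bad = 4L)

# Hypothetical helper: translate a flag name to its code; pass numbers through
flagCode <- function(value, scheme) {
    if (is.character(value)) {
        if (!value %in% names(scheme))
            stop("unknown flag name '", value, "'")
        return(unname(scheme[[value]]))
    }
    as.integer(value)
}

print(flagCode("bad", scheme))        # 4
print(flagCode(2, scheme))            # 2
# The stored scheme also tells a later reader what a code means
print(names(scheme)[scheme == 4L])    # "bad"
```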
At this stage, individual data can be flagged with setFlags, which may be called any number of times. Continuing with our example, bad salinities may be marked with
qc <- setFlags(qc, "salinity", badS, value="bad")
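Picturing flag storage as a named list of integer vectors (a hypothetical layout), setting flags amounts to overwriting the entries selected by a logical vector with the code for "bad". A base-R sketch, using made-up values and assuming the WHP code 4 for bad data; setFlag is a hypothetical helper, not oce's setFlags:

```r
# Made-up salinities, and flag storage pictured as a list of integer vectors
sal <- c(31.2, 31.3, -5.1, 31.4, 55.0, NA)
flags <- list(salinity = rep(2L, 6), temperature = rep(2L, 6))

# Hypothetical helper: set the flag to `code` wherever `i` is TRUE.
# which() drops NA entries of i, so NA comparisons leave flags untouched.
setFlag <- function(flags, name, i, code) {
    flags[[name]][which(i)] <- as.integer(code)
    flags
}

badS <- sal < 25 | 40 < sal                     # NA where sal is NA
flags <- setFlag(flags, "salinity", badS, 4L)   # 4 = "bad" in WHP CTD
print(flags$salinity)                           # 2 2 4 2 4 2
```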
To confirm that the flags were stored, one could use summary(qc), but a briefer check is:
names(qc[["flags"]])
#> [1] "salinity" "temperature"
Now, temperature flags may be inserted with
qc <- setFlags(qc, "temperature", badT, value="bad")
Readers should run summary(qc) for details of how the flags were handled, including how many salinities and temperatures were flagged as bad.
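The counts of interest here are tallies of flag codes per variable. Picturing flag storage as a named list of integer vectors (a hypothetical layout, with made-up values), the number of bad entries could be counted as follows:

```r
# Flag storage pictured as a named list of integer vectors (made-up values)
flags <- list(salinity    = c(2L, 2L, 4L, 2L, 4L, 2L),
              temperature = c(2L, 2L, 2L, 4L, 2L, 2L))

# Number of entries flagged 4 ("bad" in WHP CTD) for each variable
nbad <- sapply(flags, function(f) sum(f == 4L))
print(nbad)

# Full tally of every code in use, per variable
print(lapply(flags, table))
```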
If qc is plotted with plot(qc), the result will match that of plot(ctdRaw), because setFlags alters the flags but not the data, and flags have no effect on plots. One more step is required to test whether this procedure has cleaned up the data significantly: we must “handle” the flags, using
# treat every code from 1 to 9 except 2 ("acceptable") as bad
qch <- handleFlags(qc, flags=list(c(1, 3:9)))
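The effect of handling flags can be sketched in base R: every value whose flag lies in the set c(1, 3:9) is replaced by NA. Replacement by NA is assumed here to match the default action of handleFlags; handle is a hypothetical helper and the values are made up.

```r
# Hypothetical helper: replace values whose flag is in badCodes with NA.
# badCodes defaults to c(1, 3:9), i.e. every code from 1 to 9 except 2.
handle <- function(x, flag, badCodes = c(1L, 3:9)) {
    x[flag %in% badCodes] <- NA
    x
}

sal   <- c(31.2, 31.3, -5.1, 31.4, 55.0, 31.5)   # made-up data
flagS <- c(2L, 2L, 4L, 2L, 4L, 2L)               # 4 marks the spikes as bad

res <- handle(sal, flagS)
print(res)   # 31.2 31.3 NA 31.4 NA 31.5
```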
Comparing Figure 1 with the summary plot for qch (Figure 2), constructed with
plot(qch)
shows significant improvements. The downcast and the upcast can now be seen quite clearly, although there appears to be a problem with low salinity at the turnaround point. Flagging data where pressure does not increase with time would isolate the downcast to some extent, although some smoothing would be required first. Another issue related to the path of the instrument is that it may have been held below the surface for a while to equilibrate, and again a flag could be set to remove such data. It should be noted, however, that ctdTrim can be used to address such issues of instrument movement.
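The pressure-based flag suggested above might be sketched as follows, on a made-up pressure record with a surface soak, a small reversal during the descent, and an ascent. A running median (stats::runmed, with an assumed window of 5 points) suppresses small reversals before the sign of the pressure change is examined. The window length, and the decision to treat only increasing smoothed pressure as downcast, are assumptions to be tuned for real data.

```r
# Made-up pressure record (dbar): surface soak (1-5), descent with a small
# reversal at index 8, bottom at index 13, then ascent
p <- c(1, 1.2, 0.8, 1.1, 0.9, 2, 3, 2.8, 4, 5, 6, 7, 8, 7, 6, 5, 4, 3, 2, 1)

# Running median over 5 points (k must be odd) to suppress small reversals
ps <- runmed(p, k = 5)

# Change in smoothed pressure from the previous sample (0 for the first)
dp <- c(0, diff(ps))

# Treat samples where smoothed pressure increases as part of the downcast
descending <- dp > 0
print(which(descending))
```

The resulting logical vector could then be turned into pressure flags in the same way as badS and badT. In this toy record the smoothed pressure still creeps up slightly during the soak (at index 4), which is one reason a threshold on the descent rate, or ctdTrim, may be preferable in practice.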