Title: | Diversity Dynamics using Fossil Sampling Data |
---|---|
Description: | Functions to describe sampling and diversity dynamics of fossil occurrence datasets (e.g. from the Paleobiology Database). The package includes methods to calculate range- and occurrence-based metrics of taxonomic richness, extinction and origination rates, along with traditional sampling measures. A powerful subsampling tool is also included that implements frequently used sampling standardization methods in a multiple-bin framework. Functions incorporated in the package also simplify the plotting of time series and occurrence data, as well as other calculations, such as environmental affinities and extinction selectivity testing. Details can be found in: Kocsis, A.T.; Reddin, C.J.; Alroy, J. and Kiessling, W. (2019) <doi:10.1101/423780>. |
Authors: | Adam T. Kocsis [cre, aut], John Alroy [aut], Carl J. Reddin [aut], Wolfgang Kiessling [aut], Deutsche Forschungsgemeinschaft [fnd], FAU GeoZentrum Nordbayern [fnd] |
Maintainer: | Adam T. Kocsis |
License: | CC BY 4.0 |
Version: | 0.8.3 |
Built: | 2024-11-22 10:14:45 UTC |
Source: | https://github.com/divdyn/r-package |
This function will return the preferred environment of the taxa, given the distribution of occurrences.
affinity( x, tax, bin, env, coll = NULL, method = "binom", alpha = 1, reldat = NULL, na.rm = FALSE, bycoll = FALSE, output = "levels" )
x | |
tax | |
bin | |
env | |
coll | |
method | |
alpha | |
reldat | |
na.rm | |
bycoll | |
output | |
Sampling patterns have an overprinting effect on the frequency of taxon occurrences in different environments. The environmental affinity (Foote, 2006; Kiessling and Aberhan, 2007; Kiessling and Kocsis, 2015) expresses whether the taxa are more likely to occur in an environment, given the sampling patterns of the dataset at hand. The function returns the likely preferred environment for each taxon as a vector. NA outputs indicate that the environmental affinity is equivocal based on the selected method.
The following methods are implemented:
'majority'
: Environmental affinity will be assigned based on the number of occurrences of the taxon in the different environments, without taking sampling of the entire dataset into account. If the taxon has more occurrences in environment 1, the function will return environment 1 as the preferred habitat.
'binom'
: The proportion of occurrences of a taxon in environment 1 and environment 2 will be compared to a null model, which is based on the distribution of all occurrences from the stratigraphic range of the taxon (in x or, if provided, in reldat). A binomial test is then run on the number of occurrences in the most likely preferred environment (against all others). The alpha value sets the significance level of the binomial tests; setting alpha to 1 effectively switches the testing off: if the ratio of occurrences for the taxon differs from the ratio observed in the dataset, an affinity will be assigned. This is the default method. If an environment is not sampled at all in the dataset to which the taxon's occurrences are compared, the binomial method returns NA for the taxon's affinity.
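The two methods can be illustrated with a minimal base-R sketch for a single taxon. The helpers majorityAffinity() and binomAffinity() are hypothetical and not part of divDyn. The sketch assumes that occurrence counts are already tabulated per environment (for the taxon, and for the reference occurrences from its stratigraphic range), and it takes the "most likely preference" to be the environment with the largest excess of observed over expected proportion, which may differ from the package's internal choice.

```r
# Hypothetical helpers, not part of divDyn: a sketch of the 'majority' and
# 'binom' affinity logic for one taxon, given per-environment occurrence counts.

# 'majority': the environment with the most occurrences; ties are equivocal (NA)
majorityAffinity <- function(taxonCounts) {
  top <- which(taxonCounts == max(taxonCounts))
  if (length(top) != 1) return(NA_character_)
  names(taxonCounts)[top]
}

# 'binom': compare the taxon's proportions with the reference proportions
# taxonCounts: named counts of the taxon's occurrences per environment
# refCounts:   named counts of all reference occurrences per environment
binomAffinity <- function(taxonCounts, refCounts, alpha = 1) {
  envs <- names(refCounts)
  # an environment that is not sampled at all in the reference gives NA
  if (any(refCounts == 0)) return(NA_character_)
  counts <- setNames(numeric(length(envs)), envs)
  shared <- intersect(names(taxonCounts), envs)
  counts[shared] <- taxonCounts[shared]
  if (sum(counts) == 0) return(NA_character_)
  expected <- refCounts / sum(refCounts)
  observed <- counts / sum(counts)
  # most likely preference: largest excess of observed over expected proportion
  best <- envs[which.max(observed - expected)]
  if (observed[best] <= expected[best]) return(NA_character_)
  # one-sided binomial test of the preferred environment against all others
  p <- binom.test(counts[best], sum(counts), p = expected[best],
                  alternative = "greater")$p.value
  if (p <= alpha) best else NA_character_
}

majorityAffinity(c(shal = 8, deep = 2))                                # "shal"
binomAffinity(c(shal = 8, deep = 2), c(shal = 50, deep = 50))          # "shal"
binomAffinity(c(shal = 8, deep = 2), c(shal = 50, deep = 50), 0.05)    # NA (p ~ 0.055)
```

With alpha = 1 any excess over the reference proportion is accepted, mirroring the "testing off" behaviour described above.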
References
Foote, M. (2006). Substrate affinity and diversity dynamics of Paleozoic marine animals. Paleobiology, 32(3), 345-366.
Kiessling, W., & Aberhan, M. (2007). Environmental determinants of marine benthic biodiversity dynamics through Triassic–Jurassic time. Paleobiology, 33(3), 414-434.
Kiessling, W., & Kocsis, Á. T. (2015). Biodiversity dynamics and environmental occupancy of fossil azooxanthellate and zooxanthellate scleractinian corals. Paleobiology, 41(3), 402-414.
If output="levels", a named vector, with values corresponding to affinities.
data(corals)
# omit values where no occurrence environment entry is present, or where unknown
fossils <- subset(corals, stg != 95)
fossilEnv <- subset(fossils, bath != "uk")
# calculate affinities
aff <- affinity(fossilEnv, env = "bath", tax = "genus", bin = "stg", alpha = 1, coll = "collection_no")
This function returns the basic sampling summaries of a dataset.
binstat( x, tax = "genus", bin = "stg", coll = NULL, ref = NULL, noNAStart = FALSE, duplicates = NULL, xexp = NULL, indices = FALSE )
x | |
tax | |
bin | |
coll | |
ref | |
noNAStart | (logical) Useful when the dataset does not start from bin no. 1, but positive integer bin numbers are provided. Then |
duplicates | |
xexp | ( |
indices | ( |
Secondary function of the package that calculates a number of sampling-related variables and diversity estimators for each bin.
In contrast to the divDyn function, this function treats the bins independently.
The function also returns the maximum subsampling quota for OxW subsampling (subtrialOXW) with a given xexp value.
The following results are output for each bin:
occs
: The number of occurrences in each time bin.
colls
: The number of collections in each time bin.
xQuota
: The maximum quota for OxW subsampling (subtrialOXW) with the given xexp value. The number of occurrences in each collection is tabulated and raised to the power of xexp. The xQuota value is the sum of these values across all collections in a time slice.
refs
: The number of references in each time bin.
SIBs
: The number of Sampled-In-Bin taxa in each time bin.
occ1
: The number of taxa in each time bin that occur in only 1 collection.
ref1
: The number of taxa in each time bin that occur in only 1 reference.
occ2
: The number of taxa in each time bin that occur in exactly 2 collections.
ref2
: The number of taxa in each time bin that occur in exactly 2 references.
u
: Good's u, a coverage estimator based on the number of single-collection taxa (occ1).
uPrime
: Good's u, a coverage estimator based on the number of single-reference taxa (ref1).
chao1occ
: Chao1 extrapolation estimator, based on the number of single-collection and two-collection taxa (occ1 and occ2).
chao1ref
: Chao1 extrapolation estimator, based on the number of single-reference and two-reference taxa (ref1 and ref2).
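As a rough illustration of how several of these per-bin quantities relate to each other, the base-R sketch below computes occs, colls, SIBs, occ1, occ2, Good's u, a Chao1-type estimate and xQuota for the occurrences of a single bin. The helper binSampling() is hypothetical, and the formulas it uses (u = 1 - occ1/occs; SIBs + occ1^2/(2 * occ2), with a bias-corrected fallback when occ2 is zero) are the textbook forms, assumed here rather than taken from the divDyn source.

```r
# Hypothetical helper, not part of divDyn: sampling summaries for ONE bin.
# taxa:  taxon name of each occurrence in the bin
# colls: collection identifier of each occurrence in the bin
binSampling <- function(taxa, colls, xexp = 1) {
  occs <- length(taxa)
  # number of distinct collections in which each taxon occurs
  perTaxon <- tapply(colls, taxa, function(cl) length(unique(cl)))
  SIBs <- length(perTaxon)
  occ1 <- sum(perTaxon == 1)
  occ2 <- sum(perTaxon == 2)
  # Good's u: share of occurrences not belonging to single-collection taxa
  u <- 1 - occ1 / occs
  # Chao1-type extrapolation; bias-corrected form used when occ2 is 0
  chao <- if (occ2 > 0) SIBs + occ1^2 / (2 * occ2) else SIBs + occ1 * (occ1 - 1) / 2
  # xQuota: occurrences per collection raised to xexp, summed over collections
  xQuota <- sum(as.vector(table(colls))^xexp)
  c(occs = occs, colls = length(unique(colls)), SIBs = SIBs, occ1 = occ1,
    occ2 = occ2, u = u, chao1occ = chao, xQuota = xQuota)
}

res <- binSampling(taxa = c("A", "A", "A", "B", "B", "C", "D"),
                   colls = c(1, 2, 3, 1, 2, 3, 3), xexp = 2)
res
```

In this toy bin, C and D are single-collection taxa and B is a two-collection taxon, so u = 1 - 2/7 and the Chao1-type estimate is 4 + 2^2/(2 * 1) = 6; the collections hold 2, 2 and 3 occurrences, giving xQuota = 4 + 4 + 9 = 17 with xexp = 2.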
A data.frame with rows corresponding to bin entries.
data(corals)
# slice-specific sampling
basic <- binstat(corals, tax = "genus", bin = "stg")
# subsampling diagnostic
subStats <- subsample(corals, method = "cr", tax = "genus", FUN = binstat, bin = "stg", q = 100, noNAStart = FALSE)
# maximum quota with xexp
more <- binstat(corals, tax = "genus", bin = "stg", coll = "collection_no", xexp = 1.4)
This basic function replaces groups of values in a vector with single values with the help of a key object.
categorize(x, key, incbound = "lower")
x | |
key | |
incbound | |
Online datasets usually contain overly detailed information, as data enterers aim to preserve as much information as possible during entry. In analyses, however, some values are treated as representing the same, less detailed information, which is then used in further procedures. This function allows users to perform this type of multiple replacement using a specific object called a 'key'.
A key is an informal class and is essentially a list of vectors. When x is a character vector, each vector element in the list corresponds to a set of entries in x. These entries will be replaced by the name of the vector in the list, to indicate their assumed identity.
When x is a numeric vector and the list elements of the key are numeric vectors with 2 values, each such vector is treated as an interval. The same value will be assigned to all entries that fall in this interval (Example 2). If a value of x falls on the boundary of an interval, only one of the two boundary values is considered to be inside the interval; the incbound argument sets which one.
The elements of key are looped through in sequence. If values of x occur in multiple elements of key, the last one will be used (Example 3).
Example keys for processing Paleobiology Database occurrences are included in the package (keys).
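The replacement rules described above can be sketched in a few lines of base R. categorizeSketch() is a hypothetical stand-in, not the package's code. It assumes, as the examples suggest, that a key element named default supplies the value for unmatched entries, that incbound = "lower" means intervals of the form [lower, upper), and that later key elements overwrite earlier matches.

```r
# Hypothetical sketch of key-based replacement; not divDyn's categorize().
categorizeSketch <- function(x, key, incbound = "lower") {
  # start from the 'default' element (NA if the key has none)
  fallback <- if ("default" %in% names(key)) key[["default"]] else NA
  out <- rep(as.character(fallback), length(x))
  for (nm in setdiff(names(key), "default")) {
    rule <- key[[nm]]
    if (is.numeric(x) && is.numeric(rule) && length(rule) == 2) {
      # two numbers on numeric input: an interval with one inclusive bound
      lo <- min(rule)
      hi <- max(rule)
      hit <- if (incbound == "lower") x >= lo & x < hi else x > lo & x <= hi
    } else {
      # otherwise: plain set membership
      hit <- x %in% rule
    }
    # later key elements overwrite earlier ones; which() drops NA matches
    out[which(hit)] <- nm
  }
  out
}

categorizeSketch(1:16, list(default = "other", more = c(5, 10), eleven = 11))
categorizeSketch(c("a", "b", "d", "e"),
                 list(first = c("a", "b"), second = c("a", "d"), default = NA))
```

The first call reproduces the grouping intended in Example 2 (5 to 9 become "more", 10 stays "other"), and the second shows the last-match rule of Example 3.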
A vector with replacements.
# Example 1
# x, as character
set.seed(1000)
toReplace <- sample(letters[1:6], 15, replace = TRUE)
# a and b should mean 'first', c and d 'second', others: NA
key <- list(first = c("a", "b"), second = c("c", "d"), default = NA)
# do the replacement
categorize(toReplace, key)
# Example 2 - numeric entries and mixed types
# basic vector to be grouped
toReplace2 <- 1:16
# replacement rules: 5, 6, 7, 8, 9 should be "more", 11 should be "eleven", the rest: "other"
key2 <- list(default = "other", more = c(5, 10), eleven = 11)
categorize(toReplace2, key2)
# Example 3 - multiple occurrences of same values
# a and b should mean 'first', a and d should mean 'second', others: NA
key3 <- list(first = c("a", "b"), second = c("a", "d"), default = NA)
# do the replacement (all "a" entries will be replaced with "second")
categorize(toReplace, key3)
This function takes a vector of binomial names with various qualifiers of open nomenclature and removes these qualifiers from the vector entries. Only the genus and species names will remain.
cleansp( x, debug = FALSE, collapse = "_", subgenera = TRUE, misspells = TRUE, stems = TRUE )
x | |
debug | |
collapse | |
subgenera | |
misspells | |
stems | |
This version will keep subgenera and will not assign species to the base genus. The following qualifiers will be omitted: "n.", "sp.", "?", "gen.", "aff.", "cf.", "ex gr.", "subgen.", "spp" and informal species designated with letters. Entries containing "informal" or "indet." will also be invalidated.
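To make the token-dropping idea concrete, here is a deliberately simplified base-R sketch. simpleCleansp() is hypothetical and much cruder than cleansp(): unlike the behaviour described above, it drops parenthesised subgenera instead of keeping them, treats any single capital letter as an informal species designation, and does not handle misspellings or word stems.

```r
# Hypothetical, simplified sketch; not divDyn's cleansp().
simpleCleansp <- function(x, collapse = "_") {
  qualifiers <- c("n.", "sp.", "?", "gen.", "aff.", "cf.", "subgen.", "spp", "spp.")
  vapply(x, function(name) {
    # entries flagged as informal or indeterminate are invalidated
    if (grepl("informal", name, fixed = TRUE) || grepl("indet.", name, fixed = TRUE)) {
      return(NA_character_)
    }
    name <- gsub("ex gr.", " ", name, fixed = TRUE)  # two-word qualifier
    name <- gsub("\\([^)]*\\)", " ", name)           # subgenus part: dropped in this sketch
    tokens <- strsplit(trimws(name), "\\s+")[[1]]
    tokens <- tokens[!tokens %in% qualifiers]
    tokens <- tokens[!grepl("^[[:upper:]]$", tokens)] # informal letter species, e.g. "sp. A"
    # keep only clean binomens: exactly a genus and a species name
    if (length(tokens) != 2) return(NA_character_)
    paste(tokens, collapse = collapse)
  }, character(1), USE.NAMES = FALSE)
}

examp <- c("Genus cf. species", "Genus spp.", "Family indet.", "Mygenus yourspecies",
           "Okgenus ? questionsp", "Genus (cf. Subgenus) aff. species")
simpleCleansp(examp)
```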
A data.frame or character vector.
Adam T. Kocsis, Gwen Antell. Adam T. Kocsis wrote the main body of the function; the subroutines called by the misspells and stems arguments are modified versions of work by Gwen Antell.
examp <- c("Genus cf. species", "Genus spp.", "Family indet.", "Mygenus yourspecies", "Okgenus ? questionsp", "Genus (cf. Subgenus) aff. species")
cleansp(examp)
Example dataset to illustrate the package's basic functionalities.
data(corals)
A data.frame with 29775 observations and 38 variables:
genus
Genus names of the occurrences, cross-referenced with a compiled table; a simplified version of this table can be found in the supplementary material of Kiessling and Kocsis (2015).
collection_no
The number of the collection of the occurrence in the PaleoDB.
family
Family name of the occurrence.
abund_value
Abundance value.
abund_unit
Unit of abundance values.
reference_no
The reference number of the occurrence in the PaleoDB.
life_habit
The lifestyle of the occurring taxon.
diet
The diet of the occurring taxon.
country
Country of occurrence.
geoplate
Plate id of the occurrence.
lat
Present day latitude of the occurrence.
lng
Present day longitude of the occurrence.
paleolat
Reconstructed paleolatitude of the occurrence.
paleolng
Reconstructed paleolongitude of the occurrence.
period
Period of origin.
epoch
Epoch of origin.
subepoch
Subepoch of origin.
stage
Geologic stage of the embedding rocks.
early_interval
Early interval name registered in the PaleoDB dynamic time scale.
late_interval
Late interval name registered in the PaleoDB dynamic time scale.
max_ma
Maximum estimated age based on the PaleoDB dynamic time scale.
min_ma
Minimum estimated age based on the PaleoDB dynamic time scale.
stg
Bin number in the stage-level timescale (stages).
ten
Bin number in the PaleoDB 10-million-year resolution timescale (tens).
env
Environment of the occurrence: reefal (r), non-reefal (nr) or unknown (uk), based on keys.
lith
Substrate of the occurrence: carbonate (c), siliciclastic (s) or unknown (uk), based on keys.
latgroup
Latitude of the occurrence: tropical (t) or non-tropical (nt).
bath
Inferred depth of the occurrence: deep (deep), shallow (shal) or unknown (uk), based on keys.
gensp
The binomen of the occurrence.
ecology
Symbiotic status of the occurring coral: zooxanthellate (z) or azooxanthellate (az, including apozooxanthellates).
ecologyMostZ
Symbiotic status of the occurring coral, incorporating the uncertainty of inferred symbiotic status. This variable includes assignment with the maximum likely number of zooxanthellate genera.
ecologyMostAZ
Symbiotic status of the occurring coral, incorporating the uncertainty of inferred symbiotic status. This variable includes assignment with the maximum likely number of azooxanthellate genera.
growth
Growth type of the coral: colonial or solitary.
integration
Integration of corallites on a scale from 0 to 4. Solitary corals are marked with 0s.
This particular dataset was used in a study by Kiessling and Kocsis (2015). All occurrences of Scleractinia were downloaded from the Paleobiology Database (PaleoDB, https://paleobiodb.org/) on 23 September 2014, originally comprising 32420 occurrences. They were then cross-checked with data from Corallosphere (which used to be accessible at http://corallosphere.org). See the article text for details.
References
Kiessling, W., & Kocsis, Á. T. (2015). Biodiversity dynamics and environmental occupancy of fossil azooxanthellate and zooxanthellate scleractinian corals. Paleobiology, 41(3), 402-414.
This function calculates various metrics from occurrence datasets in the form of time series.
divDyn( x, tax, bin = NULL, age = NULL, revtime = FALSE, breaks = NULL, coll = NULL, ref = NULL, om = NULL, noNAStart = FALSE, data.frame = TRUE, filterNA = FALSE )
x | |
tax | |
bin | |
age | |
revtime | |
breaks | |
coll | |
ref | |
om | |
noNAStart | (logical) Useful when the entries in the |
data.frame | |
filterNA | |
The following variables are produced:
bin
: Bin number, or the numeric identifier of the bin.
tThrough
: Number of through-ranging taxa, taxa that have first occurrences before, and last occurrences after the focal bin.
tOri
: Number of originating taxa, taxa that have first occurrences in the focal bin, and last occurrences after it.
tExt
: Number of taxa going extinct. These are taxa that have first occurrences before the focal bin, and last occurrences in it.
tSing
: Number of stratigraphic singleton (single-interval) taxa, taxa that only occur in the focal bin.
t2d
: Number of lower two-timers (Alroy, 2008; 2014), taxa that are present in bin i-1 and bin i (the focal bin).
t2u
: Number of upper two-timers (Alroy, 2008; 2014), taxa that are present in bin i (the focal bin) and bin i+1.
tGFu
: Number of upper gap-fillers (Alroy, 2014), taxa that occur in bins i+2 and i-1, but were not found in i+1.
tGFd
: Number of lower gap-fillers (Alroy, 2014), taxa that occur in bins i-2 and i+1, but were not found in i-1.
t3
: Number of three-timer taxa (Alroy, 2008; 2014), present in bins i-1, i and i+1.
tPart
: Part-timer taxa (Alroy, 2008; 2014), present in bins i-1 and i+1, but not in bin i.
extProp
: Proportional extinctions including single-interval taxa: (tExt + tSing) / (tThrough + tOri + tExt + tSing).
oriProp
: Proportional originations including single-interval taxa: (tOri + tSing) / (tThrough + tOri + tExt + tSing).
extPC
: Per capita extinction rates of Foote (1999). -log(tThrough/(tExt + tThrough)). Values are not normalized by bin lengths. Similar equations were used by Alroy (1996) but without taking the logarithm.
oriPC
: Per capita origination rates of Foote (1999). -log(tThrough/(tOri + tThrough)). Values are not normalized by bin lengths. Similar equations were used by Alroy (1996) but without taking the logarithm.
ext3t
: Three-timer extinction rates of Alroy (2008). log(t2d/t3).
ori3t
: Three-timer origination rates of Alroy (2008). log(t2u/t3).
extC3t
: Corrected three-timer extinction rates of Alroy (2008). ext3t[i] + log(samp3t[i+1]).
oriC3t
: Corrected three-timer origination rates of Alroy (2008). ori3t[i] + log(samp3t[i-1]).
divSIB
: Sampled-in-bin diversity (richness), the number of taxa sampled in the focal bin.
divCSIB
: Corrected sampled-in-bin diversity (richness). divSIB/samp3t*totSamp3t, where totSamp3t is the total three-timer sampling completeness of the dataset (Alroy, 2008).
divBC
: Boundary-crosser diversity (richness), the number of taxa with ranges crossing the boundaries of the interval. tExt + tOri + tThrough.
divRT
: Range-through diversity (richness), all taxa in the interval, based on the range-through assumption. (tSing + tOri + tExt + tThrough).
sampRange
: Range-based sampling probability, without observed range end-points (Foote), (divSIB - tExt - tOri - tSing)/tThrough
samp3t
: Three-timer sampling completeness of Alroy (2008). t3/(t3+tPart)
extGF
: Gap-filler extinction rates of Alroy (2014). log((t2d + tPart)/(t3+tPart+tGFu))
oriGF
: Gap-filler origination rates of Alroy (2014). log((t2u + tPart)/(t3+tPart+tGFd))
E2f3
: Second-for-third extinction proportions of Alroy (2015). As these metrics are based on an algorithmic approach, please refer to Alroy (2015, p. 634, right column, and Eq. 4) for the equations. See the source code (https://github.com/divDyn/r-package) for the exact implementation, found in the Metrics function in the diversityDynamics.R file.
O2f3
: Second-for-third origination proportions of Alroy (2015). Please see E2f3.
ext2f3
: Second-for-third extinction rates (based on Alroy, 2015). Transformed to the usual rate form with log(1/(1-E2f3)).
ori2f3
: Second-for-third origination rates (based on Alroy, 2015). Transformed to the usual rate form with log(1/(1-O2f3)).
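Several of the counts and rates above can be reproduced from a taxon-by-bin presence matrix. The sketch below is a hypothetical base-R helper (rateSketch(), not divDyn code). It assumes that bins are consecutive columns and that every taxon occurs at least once, and it follows the definitions listed above for tThrough, tOri, tExt, tSing, t2d, t2u, t3, tPart, extPC, oriPC, ext3t, ori3t and samp3t.

```r
# Hypothetical helper, not divDyn's implementation.
# pres: logical matrix, rows = taxa, columns = consecutive time bins
rateSketch <- function(pres) {
  nb <- ncol(pres)
  bins <- seq_len(nb)
  first <- apply(pres, 1, function(r) min(which(r)))  # first bin of each taxon
  last <- apply(pres, 1, function(r) max(which(r)))   # last bin of each taxon
  # range-based categories
  tThrough <- sapply(bins, function(i) sum(first < i & last > i))
  tOri <- sapply(bins, function(i) sum(first == i & last > i))
  tExt <- sapply(bins, function(i) sum(first < i & last == i))
  tSing <- sapply(bins, function(i) sum(first == i & last == i))
  # occurrence-based categories; NA where a neighbouring bin is missing
  t2d <- sapply(bins, function(i) if (i > 1) sum(pres[, i - 1] & pres[, i]) else NA)
  t2u <- sapply(bins, function(i) if (i < nb) sum(pres[, i] & pres[, i + 1]) else NA)
  inner <- function(f) sapply(bins, function(i) if (i > 1 && i < nb) f(i) else NA)
  t3 <- inner(function(i) sum(pres[, i - 1] & pres[, i] & pres[, i + 1]))
  tPart <- inner(function(i) sum(pres[, i - 1] & !pres[, i] & pres[, i + 1]))
  data.frame(bin = bins, tThrough, tOri, tExt, tSing, t2d, t2u, t3, tPart,
             extPC = -log(tThrough / (tExt + tThrough)),
             oriPC = -log(tThrough / (tOri + tThrough)),
             ext3t = log(t2d / t3),
             ori3t = log(t2u / t3),
             samp3t = t3 / (t3 + tPart))
}

pres <- matrix(c(TRUE,  TRUE,  TRUE,  TRUE,   # taxon A
                 TRUE,  TRUE,  FALSE, FALSE,  # taxon B
                 FALSE, TRUE,  TRUE,  FALSE,  # taxon C
                 FALSE, FALSE, TRUE,  TRUE),  # taxon D
               nrow = 4, byrow = TRUE)
dd <- rateSketch(pres)
dd[2, c("extPC", "ext3t", "samp3t")]  # log(2), log(2), 1
```

Edge bins yield NaN or NA rates in this sketch, because the corresponding counts are zero or undefined there; the package's own handling of edge bins may differ.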
References:
Foote, M. (1999) Morphological diversity in the evolutionary radiation of Paleozoic and post-Paleozoic crinoids. Paleobiology 25, 1–115. doi: 10.1017/S0094837300020236
Alroy, J. (2008) Dynamics of origination and extinction in the marine fossil record. Proceedings of the National Academy of Sciences 105, 11536-11542. doi: 10.1073/pnas.0802597105
Alroy, J. (2014) Accurate and precise estimates of origination and extinction rates. Paleobiology 40, 374-397. doi: 10.1666/13036
Alroy, J. (2015) A more precise speciation and extinction rate estimator. Paleobiology 41, 633-639. doi: 10.1017/pab.2015.26
A data.frame object, with every row corresponding to a time bin.
# import data
data(corals)
data(stages)
# calculate metrics of diversity dynamics
dd <- divDyn(corals, tax = "genus", bin = "stg")
# plotting
tsplot(stages, shading = "series", boxes = "sys", xlim = c(260, 0), ylab = "range-through diversity (genera)", ylim = c(0, 230))
lines(stages$mid, dd$divRT, lwd = 2)
# with omission of single reference taxa
ddNoSing <- divDyn(corals, tax = "genus", bin = "stg", om = "ref", ref = "reference_no")
lines(stages$mid, ddNoSing$divRT, lwd = 2, col = "red")
# using the estimated ages (less robust) - 10 million years
# mean ages
corals$me_ma <- apply(corals[, c("max_ma", "min_ma")], 1, mean)
# ages reverse the direction of time! set ages to TRUE in this case
ddRadio10 <- divDyn(corals, tax = "genus", age = "me_ma", breaks = seq(250, 0, -10))
lines(ddRadio10$me_ma, ddRadio10$divRT, lwd = 2, col = "green")
# legend
legend("topleft", legend = c("all", "no single-ref. taxa", "all, estimated ages"), col = c("black", "red", "green"), lwd = c(2, 2, 2), bg = "white")
Function to generate range data from an occurrence dataset.
fadlad( x, tax, bin = NULL, age = NULL, revtime = FALSE, na.rm = TRUE, diffbin = TRUE )
x | |
tax | |
bin | |
age | |
revtime | |
na.rm | |
diffbin | |
The function outputs the first and last appearance dates (FADs and LADs) of the taxa in the dataset. Keep in mind that incomplete sampling influences these data and makes the ranges appear shorter than they actually were.
The following variables are produced:
row.names attribute
: The names of the taxa.
FAD
: First appearance dates in time bin numbers or ages.
LAD
: Last appearance dates in time bin numbers or ages.
duration
: The durations of taxa in bin numbers or ages.
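For binned data, the core of this calculation is a per-taxon minimum and maximum. The hypothetical rangeSketch() below illustrates it in base R; it is not divDyn's fadlad() and omits the duration column and the two-column age input. It assumes that bin numbers increase forward in time, whereas for ages in Ma (ages = TRUE) the first appearance is the oldest, that is, the largest value.

```r
# Hypothetical helper, not divDyn's fadlad().
# taxa: taxon of each occurrence; times: bin number or age (Ma) of each occurrence
rangeSketch <- function(taxa, times, ages = FALSE) {
  lo <- tapply(times, taxa, min, na.rm = TRUE)
  hi <- tapply(times, taxa, max, na.rm = TRUE)
  # bins increase forward in time; ages (Ma) increase backward in time
  if (ages) {
    data.frame(FAD = as.vector(hi), LAD = as.vector(lo), row.names = names(hi))
  } else {
    data.frame(FAD = as.vector(lo), LAD = as.vector(hi), row.names = names(lo))
  }
}

rangeSketch(c("A", "B", "A", "B", "C"), c(3, 5, 7, 5, 1))
rangeSketch(c("A", "A"), c(100, 80), ages = TRUE)
```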
A data.frame, with rows corresponding to tax entries.
data(corals)
# binned data
flBinned <- fadlad(corals, tax = "genus", bin = "stg")
# using basic bin lengths
flDual <- fadlad(corals, tax = "genus", age = c("max_ma", "min_ma"))
# single age estimate
data(stages)
corals$mid <- stages$mid[corals$stg]
flSingle <- fadlad(corals, tax = "genus", age = "mid")
The function will loop through a vector and will substitute NA values with the value it last encountered or replaced.
fill(x, forward = TRUE, inc = 0)
x | |
forward | |
inc | |
NAs won't be substituted when they are the first values the loop encounters.
A vector of the same type as x, with the NA values substituted.
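The loop described above can be written as a short base-R function. fillSketch() is a hypothetical re-implementation for illustration, not the package's code. It assumes that a replaced value becomes the new reference for the next NA (so inc accumulates over runs of NAs) and that inc is only added when it is non-zero, which keeps logical input logical.

```r
# Hypothetical sketch, not divDyn's fill().
fillSketch <- function(x, forward = TRUE, inc = 0) {
  idx <- if (forward) seq_along(x) else rev(seq_along(x))
  last <- NA  # nothing encountered yet: leading NAs stay NA
  for (i in idx) {
    if (is.na(x[i])) {
      if (!is.na(last)) {
        x[i] <- if (inc == 0) last else last + inc
        last <- x[i]  # the replaced value is used for the next NA
      }
    } else {
      last <- x[i]
    }
  }
  x
}

fillSketch(c(TRUE, FALSE, NA, TRUE, FALSE, NA))               # TRUE FALSE FALSE TRUE FALSE FALSE
fillSketch(c(1, NA, 3, 1, 2, NA, NA, 9, NA, 3), inc = 1)      # 1 2 3 1 2 3 4 9 10 3
fillSketch(c(1, NA, 3, 1, 2, NA, NA, 9, NA, 3), forward = FALSE)  # 1 3 3 1 2 9 9 9 3 3
```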
# forward, replace with previous
dummy <- c(TRUE, FALSE, NA, TRUE, FALSE, NA)
fill(dummy)
# forward, replace with previous+1
dummy2 <- c(1, NA, 3, 1, 2, NA, NA, 9, NA, 3)
fill(dummy2, inc = 1)
# backward, replace with previous in loop direction
fill(dummy2, inc = 0, forward = FALSE)
Geographic range as a function of a set of coordinates or sample/site/cell memberships.
georange(x, lng = NULL, lat = NULL, loc = NULL, method = "co")
x | |
lng | ( |
lat | ( |
loc | ( |
method | ( |
Multiple estimators of geographic ranges are implemented based on coordinates or cell identifiers. The function outputs a vector of the results based on the calculation methods specified in method.
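One widely used coordinate-based measure of geographic range is the maximum great-circle distance between occurrence points. The sketch below computes it with the haversine formula in base R. maxGCD() is a hypothetical helper; it assumes coordinates in decimal degrees, at least one point, and a spherical Earth with a 6371 km radius, and it is not claimed to correspond to any particular value of method in georange().

```r
# Hypothetical helper, not part of divDyn: maximum pairwise great-circle
# distance (km) among points, using the haversine formula on a sphere.
maxGCD <- function(lng, lat, radius = 6371) {
  toRad <- pi / 180
  n <- length(lng)
  best <- 0
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      dlat <- (lat[j] - lat[i]) * toRad
      dlng <- (lng[j] - lng[i]) * toRad
      a <- sin(dlat / 2)^2 +
        cos(lat[i] * toRad) * cos(lat[j] * toRad) * sin(dlng / 2)^2
      # min() guards against rounding pushing the argument above 1
      best <- max(best, 2 * radius * asin(min(1, sqrt(a))))
    }
  }
  best
}

maxGCD(lng = c(0, 0), lat = c(0, 90))   # a quarter of the circumference, ~10008 km
maxGCD(lng = c(0, 180), lat = c(0, 0))  # antipodal points, ~20015 km
```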
A numeric vector with geographic ranges (multiple methods).
data(corals)
# select a taxon from a certain time slice
bitax <- corals[corals$stg == 69 & corals$genus == "Microsolena", ]
georange(bitax, lng = "paleolng", lat = "paleolat", method = "co")
This function calculates some indices that characterize a species-abundance/occurrence distribution.
indices(x, samp = NULL, method = NULL)
x | either a |
samp | ( |
method | ( |
This set is not complete and is not intended to supersede other R packages (e.g. vegan). However, some metrics are included here because they are not implemented elsewhere or because they are used frequently. The following entries can be passed to the method argument of the function; the output table/vector is named accordingly.
"richness"
: The number of sampled species.
"shannon"
: The Shannon entropy.
"dom"
: The Berger-Parker dominance index, the proportion of occurrences in the time bin that belong to the most frequent taxon.
"hill2"
: The second-order Hill number (Jost, 2006; q=2), which is calculated by default. Additional Hill numbers can be specified by adding "hillXX" to the method argument, such as "hill3" for q=3. The first Hill number is defined as the exponentiated version of Shannon entropy (Eq. 3 in Jost, 2006).
"squares"
: The 'squares' richness estimator of J. Alroy (2018).
"chao2"
: The Chao2 estimator for incidence-based data.
"SCOR"
: The Sum Common Species Occurrence rate of Hannisdal et al. (2012). This metric is only calculated if a collection vector (samp) is provided alongside the occurrence entries (see examples).
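The abundance-based indices listed above follow standard formulas, which the short base-R sketch below illustrates for a vector of occurrence counts per taxon. The helpers are hypothetical and not divDyn code. The Hill number uses Jost's (2006) form qD = (sum of p_i^q)^(1/(1 - q)), with the q = 1 case taken as the exponentiated Shannon entropy, as stated above.

```r
# Hypothetical helpers, not divDyn's indices().
# counts: number of occurrences of each taxon
propsOf <- function(counts) counts[counts > 0] / sum(counts)

# Shannon entropy
shannon <- function(counts) {
  p <- propsOf(counts)
  -sum(p * log(p))
}

# Berger-Parker dominance: share of the most frequent taxon
dominance <- function(counts) max(propsOf(counts))

# Hill number of order q (Jost, 2006)
hill <- function(counts, q) {
  p <- propsOf(counts)
  if (q == 1) exp(shannon(counts)) else sum(p^q)^(1 / (1 - q))
}

counts <- c(A = 2, B = 1, C = 1)
hill(counts, 0)    # richness: 3
hill(counts, 2)    # 1 / (0.25 + 0.0625 + 0.0625) = 2.666667
dominance(counts)  # 0.5
```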
A named numeric vector.
Alroy, J. 2018. Limits to species richness in terrestrial communities. Ecology Letters.
Hannisdal, B., Henderiks, J., & Liow, L. H. (2012). Long-term evolutionary and ecological responses of calcifying phytoplankton to changes in atmospheric CO2. Global Change Biology, 18(12), 3504–3516. https://doi.org/10.1111/gcb.12007
Jost, L. (2006). Entropy and diversity. Oikos, 113, 363–375. https://doi.org/10.1111/j.2006.0030-1299.14714.x
# the coral data
data(corals)
# Pleistocene subset
plei <- corals[corals$stg == 94, ]
# calculate everything
pleiIndex <- indices(plei$genus, plei$collection_no)
Lists of entries treated as indicators of similar characteristics.
data(keys)
A list of 7 lists:
tenInt
A list of vectors. Entries in the early_interval and late_interval variables of PaleoDB downloads indicate the collections' positions in the dynamic time scale. These entries were linked to the 10-million-year resolution time scale stored in tens. These links were compiled using a download from the FossilWorks website (which used to be at http://www.fossilworks.org/) on 08 June 2018. The lookup table is available as stratkeys. This is version 0.9.2.
stgInt
A list of vectors. Entries in the early_interval and late_interval variables of PaleoDB downloads indicate the collections' positions in the dynamic time scale. These entries were linked to the stage-resolution time scale stored in stages. See tenInt for version information.
These entries are reliable only in the post-Ordovician!
reefs
A list of vectors. Entries in the environment field of the PaleoDB download indicate information regarding the likely reefal origin of carbonate rocks. See the vignette ('PhaneroCurve') on the exact use of these data. v0.9.
lith
A list of vectors. Entries in the lithology1 field of the PaleoDB download indicate information regarding the substrate of the embedding rocks. This key maps the entries to "siliciclastic", "carbonate" or "unknown" substrates. v0.9.
lat
A list of vectors. Entries in the paleolat field of the PaleoDB download indicate information regarding the paleolatitude of the occurrences. This key maps the entries to "tropical" or "non-tropical" latitudes. v0.9.
grain
A list of vectors. Entries in the lithology1 field of the PaleoDB download indicate information regarding the grain sizes of the depositional environment. This key maps the entries to "coarse", "fine" or "unknown" grain sizes. v0.9.
depenv
A list of vectors. Entries in the environment field of the PaleoDB download indicate information regarding the onshore-offshore nature of the depositional environment. This key maps the entries to "onshore", "offshore" or "unknown" environment. v0.9.3
Entries in the stratigraphic, lithological and environment fields of current Paleobiology Database downloads are too numerous to form the basis of analyses without transformations.
This variable includes potential groupings of entries that represent similar characteristics. These objects can be used by the categorize function to create new variables of stratigraphic, environmental and lithological information.
Stratigraphic assignments are based on the download of collection data from Fossilworks (which used to be at http://www.fossilworks.org/) and the dynamic time scale of the Paleobiology Database, written by J. Alroy. The assignment of numeric values was done by A. Kocsis. Environmental variables were grouped by W. Kiessling.
The function takes a variable x (e.g. a vector or a list object) and reorders it to best match the dates provided in a vector y.
matchtime(x, y, ...)
## S4 method for signature 'numeric'
matchtime(x, y, index = FALSE, ...)
## S4 method for signature 'character'
matchtime(x, y, index = FALSE, ...)
## S4 method for signature 'list'
matchtime(x, y, index = FALSE, ...)
x | Object to be reordered to match |
y | ( |
... | Additional arguments passed to class-specific methods. |
index | ( |
An object of the same class as x, or a numeric vector.
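One simple way to pair a series with target dates is nearest-value matching: for each element of y, take the element of x that lies closest to it. The hypothetical nearestMatch() below sketches this in base R. It is an assumed stand-in, not necessarily the algorithm matchtime() uses, and its positions argument is an illustrative name, not the package's index argument.

```r
# Hypothetical sketch of nearest-value matching; not divDyn's matchtime().
nearestMatch <- function(x, y, positions = FALSE) {
  # for every target value, position of the closest element of x (first on ties)
  idx <- vapply(y, function(v) which.min(abs(x - v)), integer(1))
  if (positions) idx else x[idx]
}

nearestMatch(1:10, c(5.1, 4.2, 3.4, 2.7, 2.3))          # 5 4 3 3 2
nearestMatch(c(10, 20, 30), c(21, 9), positions = TRUE)  # 2 1
nearestMatch(c(10, 20, 30), c(21, 9))                    # 20 10
```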
# original vector
orig <- 1:10
# target values
targ <- c(5.1, 4.2, 3.4, 2.7, 2.3)
# how do the two series match the best?
matchtime(orig, targ)
This function takes an occurrence dataset and reformats it to a table that can be used as input for logistic models.
modeltab( x, tax, bin, taxvars = NULL, rt = FALSE, singletons = FALSE, probs = NULL )
x | |
tax | |
bin | |
taxvars | |
rt | |
singletons | |
probs | |
Every entry in the output table corresponds to one cell in the bin/tax matrix. This function omits duplicates and appends two logical vectors (response variables) to the occurrence dataset:
The ori vector is TRUE in the interval in which the taxon first appeared, and FALSE in all others. The ext vector is TRUE in the interval in which the taxon appeared for the last time, and FALSE in the rest.
The true dates of extinction and origination are unknown; therefore, these events can only be expressed as probabilities. The argument probs allows the replacement of a binary response with two probability values, which are based on the apparent sampling patterns. For extinctions, when probs is set to "samp3t", the extinction response in the last bin of appearance is set to the three-timer sampling completeness of the following bin. Assuming that the offset of the taxon's range is not larger than a whole bin, if the taxon did not go extinct in the bin in which it appeared for the last time, it is assumed to have gone extinct in the following bin, and the remainder (1 - sampling completeness) is assigned to that bin. The pattern is reversed for originations. For probs="sampRange", the range-based completeness measures are applied in a similar fashion. For Phanerozoic-scale analyses, a difference of one whole bin between the apparent and the actual event is reasonable. See more in Reddin et al. (2021). Note that the response probabilities are set to missing values (NAs) when the probabilities cannot be calculated. The variable ext is also set to NaN for the early virtual extension of the range, and ori is treated the same way for the late extension.
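The arithmetic of the "samp3t" extinction response can be written out as a short sketch. It assumes the usual definition of three-timer completeness (three-timers divided by three-timers plus part-timers); the helpers samp3t_completeness() and extinction_probs() and the counts are hypothetical and only illustrate how the response is split between the last bin of appearance and the following bin.

```r
# Three-timer sampling completeness of a bin: the proportion of three-timers
# among three-timers plus part-timers.
samp3t_completeness <- function(t3, part) t3 / (t3 + part)

# Hypothetical helper: split the extinction response of a taxon between its
# last bin of appearance and the following bin, given the completeness of
# the following bin. Returns NAs when the completeness cannot be calculated.
extinction_probs <- function(p_next) {
  if (is.na(p_next)) return(c(lastBin = NA_real_, nextBin = NA_real_))
  c(lastBin = p_next, nextBin = 1 - p_next)
}

# e.g. 40 three-timers and 10 part-timers in the following bin
p <- samp3t_completeness(t3 = 40, part = 10)
extinction_probs(p)
```

High completeness in the following bin makes it likely that a taxon missing from that bin was already extinct, so most of the response stays in the last bin of appearance (0.8 versus 0.2 here).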
References:
Reddin, C. J., Kocsis, Á. T., Aberhan, M., & Kiessling, W. (2021). Victims of ancient hyperthermal events herald the fates of marine clades and traits under global warming. Global Change Biology, 27(4), 868–878. https://doi.org/10.1111/gcb.15434
A data.frame with binary response variables.
# load necessary data
data(corals)
# simple table
modTab <- modeltab(corals, bin="stg", tax="genus", taxvars=c("ecology", "family"))
# probabilities for extinction modeling
modTab2 <- modeltab(corals, bin="stg", tax="genus", probs="samp3t")
# only extinction response (omit virtual origination extensions)
extTab <- modTab2[!is.nan(modTab2$ext), ]
# only origination response (omit virtual extinction extensions)
oriTab <- modTab2[!is.nan(modTab2$ori), ]
Function to quickly omit single-collection and single-reference taxa.
omit( x, om = "ref", tax = "genus", bin = "bin", coll = NULL, ref = NULL, filterNA = FALSE )
x |
|
om |
|
tax |
|
bin |
|
coll |
|
ref |
|
filterNA |
|
The function returns a logical vector with a value for each row. TRUE values indicate rows to be omitted, FALSE values indicate rows to be kept. The function is embedded in the divDyn function, but can be called independently.
A logical vector.
# omit single-reference taxa
data(corals)
data(stages)
toOmit <- omit(corals, bin="stg", tax="genus", om="ref", ref="reference_no")
x <- corals[!toOmit,]
# within divDyn
# plotting
tsplot(stages, shading="series", boxes="sys", xlim=c(260,0), ylab="range-through diversity (genera)", ylim=c(0,230))
# multiple ref/slice required
ddNoSing <- divDyn(corals, tax="genus", bin="stg", om="binref", ref="reference_no")
lines(stages$mid, ddNoSing$divRT, lwd=2, col="red")
# with the recent included (NA reference value)
ddNoSingRec <- divDyn(corals, tax="genus", bin="stg", om="binref", filterNA=TRUE, ref="reference_no")
lines(stages$mid, ddNoSingRec$divRT, lwd=2, col="blue")
# legend
legend("topleft", legend=c("no single-ref. taxa", "no single-ref. taxa,\n with recent"), col=c("red", "blue"), lwd=c(2,2))
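The row-flagging logic of om="ref" can be approximated in a few lines of base R: count the distinct references of each taxon and flag every row of a taxon that has only one. The helper omit_single_ref() is hypothetical; it treats an NA reference as one more distinct value and does not reproduce filterNA or the bin-level options such as "binref".

```r
# Hypothetical helper: TRUE for every row that belongs to a taxon described
# in only one reference (rows to be omitted), FALSE otherwise.
omit_single_ref <- function(x, tax, ref) {
  nref <- tapply(x[[ref]], x[[tax]], function(r) length(unique(r)))
  as.vector(nref[as.character(x[[tax]])] == 1)
}

occ <- data.frame(genus = c("A", "A", "B", "B", "C"),
                  reference_no = c(1, 2, 3, 3, 4), stringsAsFactors = FALSE)
toOmit <- omit_single_ref(occ, tax = "genus", ref = "reference_no")
occ[!toOmit, ]
```

Only genus A, which appears in two references, survives the filter.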
This function plots the changing shares of categories in association with an independent variable.
parts( x, b = NULL, ord = "up", prop = FALSE, plot = TRUE, col = NULL, xlim = NULL, border = NULL, ylim = c(0, 1), na.valid = FALSE, labs = TRUE, labs.args = NULL, vertical = FALSE )
x |
|
b |
( |
ord |
|
prop |
|
plot |
|
col |
|
xlim |
|
border |
|
ylim |
|
na.valid |
|
labs |
|
labs.args |
|
vertical |
|
This function is useful for displaying the changing proportions of categories as time progresses. See the examples for the most common use cases.
To be added: missing portions are omitted in this version, but should be represented as gaps in the polygons.
The function has no return value.
# dummy examples
# independent variable
slc <- c(rep(1, 5), rep(2, 7), rep(3, 6))
# the categories as they change
v1 <- c("a", "a", "b", "c", "c") # 1
v2 <- c("a", "b", "b", "b", "c", "d", "d") # 2
v3 <- c("a", "a", "a", "c", "c", "d") # 3
va <- c(v1, v2, v3)
# basic function
plot(NULL, NULL, ylim=c(0,1), xlim=c(0.5, 3.5))
parts(slc, va, prop=TRUE)
# vertical plot
plot(NULL, NULL, xlim=c(0,1), ylim=c(0.5, 3.5))
parts(slc, va, col=c("red", "blue", "green", "orange"), xlim=c(0.5,3.5), labs=TRUE, prop=TRUE, vertical=TRUE)
# intensive argumentation
plot(NULL, NULL, ylim=c(0,10), xlim=c(0.5, 3.5))
parts(slc, va, ord=c("b", "c", "d", "a"), col=c("red", "blue", "green", "orange"), xlim=c(0.5,3.5), labs=TRUE, prop=FALSE, labs.args=list(cex=1.3, col=c("black", "orange", "red", "blue")))
# just the values
parts(slc, va, prop=TRUE, plot=FALSE)
# real example
# the proportion of coral occurrences through time in terms of bathymetry
data(corals)
data(stages)
# time scale plot
tsplot(stages, shading="series", boxes="sys", xlim=c(250,0), ylab="proportion of occurrences", ylim=c(0,1))
# plot of proportions
cols <- c("#55555588", "#88888888", "#BBBBBB88")
types <- c("uk", "shal", "deep")
parts(x=stages$mid[corals$stg], b=corals$bath, ord=types, col=cols, prop=TRUE, border=NA, labs=FALSE)
# legend
legend("left", inset=c(0.1,0), legend=c("unknown", "shallow", "deep"), fill=cols, bg="white", cex=1.4)
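With prop=TRUE, the displayed quantity is presumably the share of each category within each value of the independent variable. A base-R sketch of that computation (not the package code) uses table() and prop.table(); the cumulative shares then give the upper edge of each stacked polygon. The objects shares and bounds are illustrative names only.

```r
# independent variable and categories of the dummy example
slc <- c(rep(1, 5), rep(2, 7), rep(3, 6))
va <- c("a", "a", "b", "c", "c",
        "a", "b", "b", "b", "c", "d", "d",
        "a", "a", "a", "c", "c", "d")

# share of each category (columns) within each value of slc (rows)
shares <- prop.table(table(slc, va), margin = 1)

# cumulative shares: the upper edge of each stacked band, ending at 1
bounds <- t(apply(shares, 1, cumsum))
round(bounds, 3)
```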
Visualization of occurrence data
ranges( dat, bin = NULL, tax = NULL, xlim = NULL, ylim = c(0, 1), total = "", filt = "include", occs = FALSE, labs = FALSE, decreasing = TRUE, group = NULL, gap = 0, labels.args = NULL, ranges.args = NULL, occs.args = NULL, total.args = NULL )
dat |
|
bin |
( |
tax |
( |
xlim |
( |
ylim |
( |
total |
( |
filt |
( |
occs |
( |
labs |
( |
decreasing |
( |
group |
( |
gap |
( |
labels.args |
( |
ranges.args |
( |
occs.args |
( |
total.args |
( |
This function draws a visual representation of the occurrence dataset: the interpolated ranges of the taxa and, optionally (occs=TRUE), the occurrence points.
The function has no return value.
# import
data(stages)
data(corals)
# all ranges - using the age uncertainties of the occurrences
tsplot(stages, boxes="sys", xlim=c(250,0))
ranges(corals, bin=c("max_ma", "min_ma"), tax="genus", occs=FALSE)
# or use single estimates: assign age estimates to the occurrences
corals$est <- stages$mid[corals$stg]
# all ranges (including the recent!)
tsplot(stages, boxes="sys", xlim=c(250,0))
ranges(corals, bin="est", tax="genus", occs=FALSE)
# closing in on the Cretaceous, with occurrences
tsplot(stages, boxes="series", xlim=c(145,65), shading="short")
ranges(corals, bin="est", tax="genus", occs=TRUE, ranges.args=list(lwd=0.1))
# z and az separately
tsplot(stages, boxes="series", xlim=c(145,65), shading="short")
ranges(corals, bin="est", tax="genus", occs=FALSE, group="ecology", ranges.args=list(lwd=0.1))
# same, show only taxa that originate within the interval
tsplot(stages, boxes="series", xlim=c(105,60), shading="short")
ranges(corals, bin="est", tax="genus", occs=TRUE, group="ecology", filt="orig", labs=TRUE, labels.args=list(cex=0.5))
# same using the age uncertainties of the occurrence age estimates
tsplot(stages, boxes="series", xlim=c(105,60), shading="short")
ranges(corals, bin=c("max_ma", "min_ma"), tax="genus", occs=TRUE, group="ecology", filt="orig", labs=TRUE, labels.args=list(cex=0.5))
# fully customized/annotated
tsplot(stages, boxes="series", xlim=c(105,60), shading="short")
ranges(
  corals, # dataset
  bin="est", # bin column
  tax="genus", # taxon column
  occs=TRUE, # occurrence points will be plotted
  group="growth", # separate ranges based on growth types
  filt="orig", # show only taxa that originate in the interval
  ranges.args=list(
    lwd=1, # set range width to 1
    col=c("darkgreen", "darkred") # set color of the ranges (by groups)
  ),
  total.args=list(
    cex=2, # set the size of the group identifier labels
    col=c("darkgreen", "darkred") # set the color of the group identifier labels
  ),
  occs.args=list(
    col=c("darkgreen", "darkred"),
    pch=3
  ),
  labs=TRUE, # taxon labels will be plotted
  labels.args=list(
    cex=0.4, # the sizes of the taxon labels
    col=c("darkgreen", "darkred") # set the color of the taxon labels by group
  )
)
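The interpolated range of a taxon spans its oldest and youngest occurrence. A minimal base-R sketch for single age estimates (larger values are older, as in Ma) computes these endpoints and draws each range as a horizontal segment. Both helpers are hypothetical; the argument name decreasing only mirrors the idea of the ranges() argument, and the sketch ignores grouping, filtering and age uncertainties.

```r
# Hypothetical helper: first (FAD) and last (LAD) appearance of each taxon
# from point age estimates in Ma, ordered by first appearance.
taxon_ranges <- function(dat, tax, age, decreasing = TRUE) {
  fad <- tapply(dat[[age]], dat[[tax]], max, na.rm = TRUE)
  lad <- tapply(dat[[age]], dat[[tax]], min, na.rm = TRUE)
  out <- data.frame(taxon = names(fad), FAD = as.vector(fad),
                    LAD = as.vector(lad[names(fad)]), stringsAsFactors = FALSE)
  out[order(out$FAD, decreasing = decreasing), ]
}

# Hypothetical helper: one horizontal segment per taxon, older ages on the left.
draw_ranges <- function(rng, ...) {
  y <- seq_len(nrow(rng))
  plot(NULL, xlim = rev(range(c(rng$FAD, rng$LAD))), ylim = c(0, nrow(rng) + 1),
       xlab = "age (Ma)", ylab = "", yaxt = "n")
  segments(x0 = rng$FAD, x1 = rng$LAD, y0 = y, y1 = y, ...)
}

occ <- data.frame(genus = c("A", "A", "A", "B"), est = c(100, 90, 95, 80),
                  stringsAsFactors = FALSE)
rng <- taxon_ranges(occ, tax = "genus", age = "est")
```

Calling draw_ranges(rng, lwd = 2) on an open graphics device plots the two ranges.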
This function will determine whether there are meaningful differences between the taxonomic rates in the individual time bins of two subsets of an occurrence database.
ratesplit( x, sel, tax = "genus", bin = "stg", rate = "pc", method = "AIC", AICc = TRUE, na.rm = TRUE, alpha = NULL, output = "simple" )
x |
|
sel |
|
tax |
|
bin |
|
rate |
|
method |
|
AICc |
|
na.rm |
|
alpha |
|
output |
|
Splitting an occurrence database into subsets decreases the amount of information passed to the rate calculations, and therefore the precision of the individual estimates. Consequently, our ability to tell apart two similar values diminishes as the number of sampled taxa decreases. In order to assess the subsets individually and compare them, it is advisable to test whether the split into two subsets is meaningful, given the total data. Examples of this use can be found in Kiessling and Simpson (2011) and Kiessling and Kocsis (2015).
The meaningfulness of the split depends on the accuracy of the estimates and the magnitude of the difference. Two different methods are implemented: binom and combine.
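The general idea of testing whether a split is worth it can be illustrated with a model comparison for a single bin: one shared binomial proportion for both subsets versus separate proportions, compared with AICc. This is a generic sketch, not the internals of ratesplit(); the helpers aicc() and split_is_supported() and the counts are hypothetical.

```r
# AICc of a model with k parameters, log-likelihood ll and sample size n
aicc <- function(ll, k, n) -2 * ll + 2 * k + (2 * k * (k + 1)) / (n - k - 1)

# Hypothetical helper: is a model with separate proportions for two subsets
# preferred over one pooled proportion? k1, k2: numbers of events
# (e.g. originating taxa); n1, n2: numbers of taxa in the two subsets.
split_is_supported <- function(k1, n1, k2, n2) {
  p <- (k1 + k2) / (n1 + n2)
  ll_pooled <- dbinom(k1, n1, p, log = TRUE) + dbinom(k2, n2, p, log = TRUE)
  ll_split <- dbinom(k1, n1, k1 / n1, log = TRUE) +
    dbinom(k2, n2, k2 / n2, log = TRUE)
  n <- n1 + n2
  aicc(ll_split, 2, n) < aicc(ll_pooled, 1, n)
}

split_is_supported(10, 50, 30, 50) # clearly different proportions
split_is_supported(20, 50, 20, 50) # identical proportions
```

The first call returns TRUE and the second FALSE: with identical proportions the extra parameter of the split model only adds a penalty.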
References
Foote, M. (1999) Morphological Diversity In The Evolutionary Radiation Of Paleozoic and Post-Paleozoic Crinoids. Paleobiology 25, 1–115. doi:10.1017/S0094837300020236.
Kiessling, W., & Simpson, C. (2011). On the potential for ocean acidification to be a general cause of ancient reef crises. Global Change Biology, 17(1), 56-67.
Kiessling, W., & Kocsis, A. T. (2015). Biodiversity dynamics and environmental occupancy of fossil azooxanthellate and zooxanthellate scleractinian corals. Paleobiology, 41(3), 402-414.
A list of two numeric vectors.
# example with the coral dataset of Kiessling and Kocsis (2015)
data(corals)
data(stages)
# split by ecology
z <- corals[corals$ecology=="z",]
az <- corals[corals$ecology=="az",]
# calculate diversity dynamics
ddZ <- divDyn(z, tax="genus", bin="stg")
ddAZ <- divDyn(az, tax="genus", bin="stg")
# origination rate plot
tsplot(stages, boxes="sys", shading="series", xlim=54:95, ylab="raw per capita originations")
lines(stages$mid, ddZ$oriPC, lwd=2, lty=1, col="blue")
lines(stages$mid, ddAZ$oriPC, lwd=2, lty=2, col="red")
legend("topright", inset=c(0.1,0.1), legend=c("z", "az"), lwd=2, lty=c(1,2), col=c("blue", "red"), bg="white")
# the ratesplit function
rs <- ratesplit(rbind(z, az), sel="ecology", tax="genus", bin="stg")
rs
# display selectivity with points
# select the higher rates
selIntervals <- cbind(ddZ$oriPC[rs$ori], ddAZ$oriPC[rs$ori])
groupSelector <- apply(selIntervals, 1, function(w) w[1]<w[2])
# draw the points
points(stages$mid[rs$ori[groupSelector]], ddAZ$oriPC[rs$ori[groupSelector]], pch=16, col="red", cex=2)
points(stages$mid[rs$ori[!groupSelector]], ddZ$oriPC[rs$ori[!groupSelector]], pch=16, col="blue", cex=2)
This pseudo-generic function iterates a function on the subelements of a list of objects that have the same class and matching dimensions/names and reorganizes the result to match the structure of the replicates or a prototype template.
repmatch(x, FUN = NULL, proto = NULL, direct = c("dim", "name"), ...)
x |
( |
FUN |
( |
proto |
( |
direct |
( |
... |
arguments passed to |
The function is designed to unify/merge objects that result from the same function applied to different source data (e.g. the results of subsample()). In its current form, the function supports vectors (including one-dimensional tables and arrays), matrix and data.frame objects.
If FUN is a function, the output is a vector for vector-like replicates, a matrix when x is a list of matrix objects, and a data.frame for data.frame replicates. In case FUN=NULL: if x is a list of vectors, the function returns a matrix; if x is a list of matrix class objects, an array is returned; if x is a list of data.frame objects, the function returns a data.frame.
# basic example vect <- rnorm(100) # make 50 replicates repl <- rep(list(vect), 50) repmatch(repl, FUN=mean, direct="dim") # named input # two vectors # a a<- 1:10 names(a) <- letters[1:length(a)] a[c(3,5,8)] <- NA a <- a[!is.na(a)] #b b<- 10:1 names(b) <- letters[length(b):1] b[c(1, 3,6, length(b))]<- NA b <- b[!is.na(b)] # list x2 <- rep(c(list(a),list(b)), 3) # simple match - falling through "dim" to "name" directive repmatch(x2, FUN=NULL) # prototyped prot <- 1:10 names(prot) <-letters[1:10] repmatch(x2, FUN=mean, proto=prot, na.rm=TRUE)
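Name-directed matching against a prototype can be sketched in base R for named vectors: align every replicate to the prototype's names (absent entries become NA), then summarise each name across replicates with FUN. The helper match_by_name() is hypothetical and covers only the vector case, not the matrix or data.frame handling of repmatch().

```r
# Hypothetical helper: align named vectors to proto_names and, if FUN is
# given, summarise every name across the replicates.
match_by_name <- function(x, proto_names, FUN = NULL, ...) {
  m <- sapply(x, function(v) unname(v[proto_names]))
  rownames(m) <- proto_names
  if (is.null(FUN)) m else apply(m, 1, FUN, ...)
}

# the two named vectors of the example, with some entries dropped
a <- 1:10
names(a) <- letters[1:10]
a <- a[-c(3, 5, 8)]
b <- 10:1
names(b) <- letters[10:1]
b <- b[-c(1, 3, 6, 10)]
x2 <- rep(c(list(a), list(b)), 3)

res <- match_by_name(x2, letters[1:10], FUN = mean, na.rm = TRUE)
res
```

Names missing from every replicate ("e" and "h") end up as NaN, because the mean of an empty set of values is undefined.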
seqduplicated()
The function determines which elements of a vector are duplicates in consecutive rows (similarly to duplicated).
collapse()
Omits duplicates similarly to unique, but only in consecutive rows, so the sequence of state changes remains, but without duplicates.
seqduplicated(x, na.rm = FALSE, na.breaks = TRUE)
collapse(x, na.rm = FALSE, na.breaks = TRUE)
x |
( |
na.rm |
( |
na.breaks |
( |
These functions essentially check whether the value at a given index of a vector is the same as the value at the previous index. This seemingly primitive task had to be rewritten with Rcpp for speed and for the appropriate handling of NA values.
A logical vector.
# example vector
examp <- c(4,3,3,3,2,2,1,NA,3,3,1,NA,NA,5,NA,5)
# seqduplicated()
seqduplicated(examp)
# contrast with
duplicated(examp)
# with NA removal
seqduplicated(examp, na.rm=TRUE)
# the same with collapse()
collapse(examp)
# contrast with
unique(examp)
# with NA removal
collapse(examp, na.rm=TRUE)
# with NA removal, no breaking
collapse(examp, na.rm=TRUE, na.breaks=FALSE)
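The core comparison (is the value at a given index equal to the one before it?) can be written in vectorised base R. This sketch is not the package's Rcpp code: it treats every comparison involving NA as "not a duplicate", so each NA is kept, and it does not reproduce the na.rm or na.breaks options. The helper names seq_dup_sketch() and collapse_sketch() are hypothetical.

```r
# Hypothetical helper: TRUE where an element equals the element before it
seq_dup_sketch <- function(x) {
  if (length(x) == 0) return(logical(0))
  same <- x[-1] == x[-length(x)]
  # comparisons involving NA are treated as "not a duplicate"
  same[is.na(same)] <- FALSE
  c(FALSE, same)
}

# Hypothetical helper: drop consecutive duplicates, keeping state changes
collapse_sketch <- function(x) x[!seq_dup_sketch(x)]

v <- c(4, 3, 3, 3, 2, 2, 1, NA, 3, 3, 1)
seq_dup_sketch(v)
collapse_sketch(v)
```

For a vector like this, base R's rle(v)$values returns the same collapsed sequence (4, 3, 2, 1, NA, 3, 1).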
This intermediate-level function plots a time series, showing the quantiles of the distribution at each point as shades of varying transparency.
shades( x, y, col = "black", res = 10, border = NA, interpolate = FALSE, method = "symmetric", na.rm = FALSE )
x |
|
y |
|
col |
|
res |
|
border |
|
interpolate |
|
method |
|
na.rm |
|
The function has no return value.
# some random values across the Phanerozoic
data(stages)
tsplot(stages, boxes="sys", shading="series", ylim=c(-5,5), ylab=c("normal distributions"))
randVar <- t(sapply(1:95, FUN=function(x){rnorm(150, 0, 1)}))
shades(stages$mid, randVar, col="blue", res=10, method="symmetric")
# a bottom-bounded distribution (log normal)
tsplot(stages, boxes="sys", shading="series", ylim=c(0,30), ylab="log-normal distributions")
randVar <- t(sapply(1:95, FUN=function(x){rlnorm(150, 0, 1)}))
shades(stages$mid, randVar, col="blue", res=c(0, 0.33, 0.66, 1), method="decrease")
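One way to obtain shaded quantile bands is to compute nested central intervals for every row of y (one row per x value) and overlay them as semi-transparent polygons, so that the colour accumulates towards the centre of the distribution. The sketch below assumes this reading of method="symmetric" with res=10; the helpers central_bands() and draw_bands() are hypothetical and not the package implementation.

```r
# Hypothetical helper: nested central quantile intervals of each row of y,
# from the narrowest (10% of the values) to the widest (the full range).
central_bands <- function(y, res = 10) {
  widths <- seq_len(res) / res
  lapply(widths, function(w) {
    t(apply(y, 1, quantile, probs = c(0.5 - w / 2, 0.5 + w / 2), names = FALSE))
  })
}

# Hypothetical helper: overlay the bands as semi-transparent polygons.
draw_bands <- function(x, bands, col = "blue") {
  fill <- adjustcolor(col, alpha.f = 1 / length(bands))
  for (b in bands) {
    polygon(c(x, rev(x)), c(b[, 1], rev(b[, 2])), col = fill, border = NA)
  }
}

y <- rbind(0:100, 0:100)
bands <- central_bands(y, res = 10)
```

After opening a plot, draw_bands(stages$mid, central_bands(randVar)) would overlay the bands for the random values of the example.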
The function returns lists of taxa that occur with only one particular entry in a given variable.
singletons( dat, tax = "clgen", var = NULL, bin = NULL, bybin = FALSE, na.rm = TRUE )
dat |
( |
tax |
( |
var |
( |
bin |
( |
bybin |
( |
na.rm |
( |
Singletons are defined in a number of ways in the literature. True singletons are species that are represented by only one specimen, but one can also talk about single-occurrence, single-interval, single-reference or single-collection taxa. These can be returned with this function.
As the time bin has particular importance, it is possible to filter singleton taxa in the context of a single bin. These can be returned with the bybin argument, which constrains the filtering to each bin and iterates it over every bin.
If this argument is set to TRUE and the variable in question is a reference, then single-reference taxa will be taxa that occurred in only one reference within a given bin. This does not necessarily mean that only one reference describes the taxon in the total database!
A vector of character entries in tax.
# load example dataset
data(corals)
# Example 1. single-occurrence taxa
singOcc <- singletons(corals, tax="genus", bin="stg")
# Example 2. output for every bin
singOccBin <- singletons(corals, tax="genus", bin="stg", bybin=TRUE)
# Example 3. single-interval taxa (all)
singInt <- singletons(corals, tax="genus", var="stg")
# Example 4. single-interval taxa (for every bin)
singIntBin <- singletons(corals, tax="genus", var="stg", bin="stg", bybin=TRUE)
# Example 5. single-reference taxa (total dataset)
singRef <- singletons(corals, tax="genus", var="reference_no")
# Example 6. single-reference taxa (see description for differences)
singRefBin <- singletons(corals, tax="genus", var="reference_no", bin="stg", bybin=TRUE)
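The difference between whole-dataset and per-bin single-reference taxa can be shown with a toy dataset and two hypothetical base-R helpers (not the package code): genus A appears in one reference per bin, so it counts as a single-reference taxon within each bin, but not in the total dataset.

```r
# Hypothetical helper: taxa that occur with only one distinct value of var
single_entry_taxa <- function(dat, tax, var) {
  n <- tapply(dat[[var]], dat[[tax]], function(v) length(unique(v)))
  names(n)[n == 1]
}

# Hypothetical helper: the same filter applied separately within every bin
single_entry_by_bin <- function(dat, tax, var, bin) {
  lapply(split(dat, dat[[bin]]), single_entry_taxa, tax = tax, var = var)
}

occ <- data.frame(genus = c("A", "A", "B", "B"), reference_no = c(1, 2, 3, 4),
                  stg = c(1, 2, 1, 1), stringsAsFactors = FALSE)
single_entry_taxa(occ, "genus", "reference_no")
single_entry_by_bin(occ, "genus", "reference_no", "stg")
```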
The function slices time with a given set of boundaries and, if desired, produces a time scale object.
slice(x, breaks, offset = 0, ts = TRUE, revtime = TRUE)
x |
( |
breaks |
( |
offset |
( |
ts |
( |
revtime |
( |
Due to stratigraphic constraints, deep-time data can only be processed after they have been sliced into discrete bins. It is suggested that you do this separately for most of your analyses. This function is also used by the divDyn function when age entries are provided.
Either the new entries and levels, or a time scale.
y <- runif(200, 0, 100)
au <- slice(y, breaks=seq(0, 100, 10))
withOut <- slice(y, breaks=seq(0, 100, 10), ts=FALSE)
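Slicing continuous ages into bins can be sketched with base R's findInterval(). The sketch assumes that revtime=TRUE means bin 1 is the oldest interval (the one with the largest values), that boundary values belong to the interval above them, and it ignores offset; the helper slice_sketch() and its returned table (with column names borrowed from stages) are hypothetical, not the output of slice().

```r
# Hypothetical helper: assign values (e.g. ages in Ma) to the intervals
# defined by breaks; values outside the breaks get NA.
slice_sketch <- function(x, breaks, revtime = TRUE) {
  breaks <- sort(breaks)
  nbins <- length(breaks) - 1L
  idx <- findInterval(x, breaks, rightmost.closed = TRUE)
  idx[which(idx == 0L | idx > nbins)] <- NA
  bins <- if (revtime) nbins + 1L - idx else idx
  # a minimal time-scale table: bottom is the older boundary of each bin
  bottom <- if (revtime) rev(breaks[-1]) else breaks[-1]
  top <- if (revtime) rev(breaks[-length(breaks)]) else breaks[-length(breaks)]
  ts <- data.frame(bin = seq_len(nbins), bottom = bottom, top = top,
                   mid = (bottom + top) / 2)
  list(bins = bins, ts = ts)
}

s <- slice_sketch(c(5, 95, 100, 0, 150), breaks = seq(0, 100, 10))
s$bins
head(s$ts, 3)
```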
Stage-level (age-level) timescale used in some analyses.
data(stages)
A data.frame
with 95 observations and 13 variables:
sys
Abbreviations of geologic systems.
system
Geologic periods.
series
Geologic series.
stage
Names of geologic stages.
short
Abbreviations of geologic stages.
bottom
Numeric ages of the bottom boundaries (earliest ages) of the bins.
mid
Numeric age midpoints of the bins, the averages of bottom and top.
top
Numeric ages of the tops (latest ages) of the bins.
dur
Numeric durations of the bins.
stg
Integer number identifiers of the bins.
systemCol
Hexadecimal color code of the systems.
seriesCol
Hexadecimal color code of the series.
col
Hexadecimal color code of the stages.
This is an example time scale object that can be used in Phanerozoic-scale analyses. Example occurrence datasets related to the package use the variable stg when referring to this timescale. This version uses the longer Rhaetian option.
Gradstein, F. M., Ogg, J. G., & Schmitz, M. D. (2020). The geologic time scale 2020. Elsevier.
Based on Gradstein et al. (2020).
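The relationships between the numeric columns described above (mid is the average of bottom and top; the duration is the difference between them) can be checked with a small hypothetical helper, shown here on a toy table with made-up ages in which bins are numbered from old to young. When applying it to the shipped data, a looser tolerance may be needed if the stored values are rounded.

```r
# Hypothetical helper: do mid and dur agree with bottom and top in every row?
check_timescale <- function(ts, tol = 1e-6) {
  mid_ok <- abs(ts$mid - (ts$bottom + ts$top) / 2) < tol
  dur_ok <- abs(ts$dur - (ts$bottom - ts$top)) < tol
  all(mid_ok & dur_ok, na.rm = TRUE)
}

# toy table with made-up ages (Ma)
toy <- data.frame(stg = 1:3, bottom = c(30, 20, 12), top = c(20, 12, 5))
toy$mid <- (toy$bottom + toy$top) / 2
toy$dur <- toy$bottom - toy$top
check_timescale(toy)
```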
Stage-level (age-level) timescale used in some analyses.
data(stages2018)
A data.frame
with 95 observations and 13 variables:
sys
Abbreviations of geologic systems.
system
Geologic periods.
series
Geologic series.
stage
Names of geologic stages.
short
Abbreviations of geologic stages.
bottom
Numeric ages of the bottom boundaries (earliest ages) of the bins.
mid
Numeric age midpoints of the bins, the averages of bottom and top.
top
Numeric ages of the tops (latest ages) of the bins.
dur
Numeric durations of the bins.
stg
Integer number identifiers of the bins.
systemCol
Hexadecimal color code of the systems.
seriesCol
Hexadecimal color code of the series.
col
Hexadecimal color code of the stages.
This is an example time scale object that can be used in Phanerozoic-scale analyses. Example occurrence datasets related to the package use the variable stg when referring to this timescale.
This is the stages object used until divDyn version 0.8.1.
Ogg, J. G., G. Ogg, and F. M. Gradstein. 2016. A concise geologic time scale: 2016. Elsevier.
Based on Ogg et al. (2016), compiled by Wolfgang Kiessling.
Table including the user-chosen interval data and the stratigraphic units of the dynamic timescale.
data(stratkeys)
A data.frame
with 761 observations of 8 variables:
interval
The names of the registered intervals in the early_interval/max_interval and late_interval/min_interval columns.
period
The period containing the interval.
epoch
The epoch containing the interval.
X10_my_bin
The 10 million year time scale interval containing the interval.
ten
Numeric identifier of the 10 million year interval in the tens object.
stage
The stage containing the interval.
stg
Numeric identifier of the interval in the stage-level time scale provided as the stages object.
Since the separation of the FossilWorks portal (formerly http://www.fossilworks.org/) from the Paleobiology Database (https://paleobiodb.org/), access to the stratigraphic information in the database has been problematic. This table includes groupings of early_interval/max_interval entries of the dynamic timescale that users can choose during collection entry. The table assigns these intervals to some corresponding stratigraphic units from different time scales.
These entries were distilled from those collections that only have a max_interval value. As there is a mismatch between the data in the Paleobiology Database and FossilWorks, this list is not comprehensive, and a few entries are probably missing. For this reason, this dataset is expected to be updated in the future.
This particular version (v0.9.2) is based on a download of all collections in FossilWorks between the Ediacaran and the Holocene. The download took place on 22 June, 2018. The entries were transformed to keys to be used with the categorize function. Some entries were corrected manually.
Formerly http://www.fossilworks.org/.
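A table that maps interval names to bin identifiers can also be applied with base R's match(), as an alternative to categorize. The sketch assumes one common convention: a collection keeps a bin only if its early and late intervals fall into the same bin, and a missing late interval means the early one alone is used. The helper assign_stg() and the toy key table (made-up interval names and numbers, using the interval and stg columns described above) are hypothetical.

```r
# Hypothetical helper: translate early/late interval names to bin numbers
# with a key table that has the columns 'interval' and 'stg'.
assign_stg <- function(early, late, keys) {
  e <- keys$stg[match(early, keys$interval)]
  l <- keys$stg[match(late, keys$interval)]
  # no late interval: the collection is constrained by the early one only
  l[is.na(late)] <- e[is.na(late)]
  ifelse(!is.na(e) & !is.na(l) & e == l, e, NA)
}

# toy key table with made-up interval names and bin numbers
keys <- data.frame(interval = c("I1", "I2", "I3"), stg = c(1, 2, NA),
                   stringsAsFactors = FALSE)
assign_stg(early = c("I1", "I1", "I2", "I3"), late = c(NA, "I2", "I2", NA),
           keys = keys)
```

The second collection spans two bins and therefore stays unassigned (NA), as does the one whose interval has no bin in the key table.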
The function returns where the continuous streaks start and how long they are, which can be used for efficient and flexible subsetting.
streaklog(x)
whichmaxstreak(x, which = -1)
x |
( |
which |
|
The output list of streaklog contains the following elements:
starts: the indices where the streaks start.
streaks: the lengths of the individual streaks (number of values).
runs: the number of streaks.
The function whichmaxstreak() returns the indices of those values that are in the longest continuous streak.
A list (streaklog) or a numeric vector (whichmaxstreak).
# generate a sequence of values
b <- 40:1
# add some gaps
b[c(1:4, 15, 19, 23:27)] <- NA
# the functions
streaklog(b)
whichmaxstreak(b)
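Assuming that a streak is an uninterrupted run of non-NA values, as the NA gaps of the example suggest, both outputs can be reproduced with base R's rle(). The helpers streaks_sketch() and longest_streak() are hypothetical; ties are resolved in favour of the first longest streak, and the which argument of whichmaxstreak() is not reproduced.

```r
# Hypothetical helper: starts, lengths and number of runs of non-NA values
streaks_sketch <- function(x) {
  r <- rle(!is.na(x))
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1L
  list(starts = starts[r$values], streaks = r$lengths[r$values],
       runs = sum(r$values))
}

# Hypothetical helper: indices of the values in the (first) longest streak
longest_streak <- function(x) {
  s <- streaks_sketch(x)
  if (s$runs == 0) return(integer(0))
  i <- which.max(s$streaks)
  s$starts[i] + seq_len(s$streaks[i]) - 1L
}

b <- 40:1
b[c(1:4, 15, 19, 23:27)] <- NA
streaks_sketch(b)
longest_streak(b)
```

For the example vector, the streaks start at 5, 16, 20 and 28 with lengths 10, 3, 3 and 13, so the longest streak covers indices 28 to 40.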
The function takes a function that has an occurrence dataset as an argument and reruns it iteratively on subsets of the dataset.
subsample( x, q, tax = NULL, bin = NULL, FUN = divDyn, coll = NULL, iter = 50, type = "cr", keep = NULL, rem = NULL, duplicates = TRUE, output = "arit", useFailed = FALSE, FUN.args = NULL, na.rm = FALSE, counter = TRUE, ... )
x |
( |
q |
( |
tax |
( |
bin |
( |
FUN |
( |
coll |
( |
iter |
( |
type |
( |
keep |
( |
rem |
( |
duplicates |
( |
output |
( |
useFailed |
( |
FUN.args |
( |
na.rm |
( |
counter |
( |
... |
arguments passed to |
The subsample function implements the iterative framework of the sampling standardization procedure. The function
1. takes the dataset x,
2. runs function FUN on the dataset and creates a container for the results of the trials,
3. runs one of the subsampling trial functions (e.g. subtrialCR) to get a subsampled 'trial dataset',
4. runs FUN on the trial dataset, and
5. after steps 3 and 4 have been repeated iter times, averages the results of the trials for simple outputs of step 4, such as vectors, matrices and data.frames.
For averaging, vectors and matrices must have the same output dimensions in the subsampling trials as in the original object. For data.frames, the bin-specific information has to be in rows, and the bin numbers have to be given in a variable bin in the output of FUN.
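The five steps can be sketched in base R as follows, assuming classical rarefaction as the trial function and arithmetic averaging as the output. `crTrialSketch()`, `subsampleSketch()` and the toy `richness()` function are hypothetical; they are not the package's subsample() or subtrialCR(), which handle many more options (keep, rem, duplicates, failed bins, data.frame outputs).

```r
# Hypothetical sketch of the iterative framework (not divDyn's subsample()).
# Classical rarefaction trial: keep q random rows in every bin with at least q rows.
crTrialSketch <- function(x, q, bin) {
  keep <- logical(nrow(x))
  for (b in unique(x[[bin]])) {
    rows <- which(x[[bin]] == b)
    if (length(rows) >= q) keep[rows[sample.int(length(rows), q)]] <- TRUE
  }
  keep
}

subsampleSketch <- function(x, q, bin, FUN, iter = 50, ...) {
  full <- FUN(x, ...)                                 # step 2: run once to size the container
  trials <- matrix(NA_real_, nrow = iter, ncol = length(full))
  for (i in seq_len(iter)) {
    rows <- crTrialSketch(x, q, bin)                  # step 3: one trial dataset
    trials[i, ] <- FUN(x[rows, , drop = FALSE], ...)  # step 4: rerun FUN on the trial
  }
  colMeans(trials)                                    # step 5: arithmetic mean over trials
}

# Toy FUN: number of distinct genera in each of the given bins (NA if empty)
richness <- function(x, allBins) {
  vapply(allBins, function(b) {
    g <- unique(x$genus[x$stg == b])
    if (length(g) == 0) NA_real_ else as.numeric(length(g))
  }, numeric(1))
}

toy <- data.frame(
  genus = c(rep("A", 10), "A", "B", "C", rep(c("B", "C"), each = 4)),
  stg = c(rep(1, 10), rep(2, 3), rep(3, 8))
)
set.seed(1)
res <- subsampleSketch(toy, q = 5, bin = "stg", FUN = richness, iter = 20, allBins = 1:3)
```

Because classical rarefaction keeps a bin only when it has at least q occurrences, a bin is either present in every trial or in none, so colMeans() never mixes missing and observed values here; other trial types would need explicit NA handling.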
For a detailed treatment of what the function does, please see the vignette ('Handout to the R package 'divDyn' v0.5.0 for diversity dynamics from fossil occurrence data'). Currently, Classical Rarefaction ("cr", Raup, 1975), occurrence-weighted by-list subsampling ("oxw", Alroy et al., 2001) and Shareholder Quorum Subsampling ("sqs", Alroy, 2010) are implemented.
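By-list subsampling draws whole collections until a quota is reached. A minimal single-bin sketch, assuming that each collection contributes its number of occurrences raised to xexp (so xexp = 0 counts collections, as in the "UW (3)" example, and xexp = 1 counts occurrences) and that drawing stops at the first collection that meets the quota. `oxwTrialSketch()` is hypothetical and not the package's subtrialOXW().

```r
# Hypothetical single-bin sketch of by-list (OxW) subsampling (not divDyn's subtrialOXW()).
# coll: collection identifier of every occurrence in the bin.
oxwTrialSketch <- function(coll, q, xexp = 1) {
  sizes <- table(coll)                                 # occurrences per collection
  ids <- names(sizes)[sample.int(length(sizes))]       # collections in random order
  total <- cumsum(as.vector(sizes[ids])^xexp)          # running weighted tally
  if (max(total) < q) return(logical(length(coll)))    # quota unreachable: bin fails
  chosen <- ids[seq_len(which(total >= q)[1])]         # draw until the quota is met
  as.character(coll) %in% chosen                       # logical row selector
}

coll <- c(1, 1, 2, 3, 3, 3, 4, 5)
set.seed(2)
uw <- oxwTrialSketch(coll, q = 3, xexp = 0)   # unweighted: exactly 3 collections
ow <- oxwTrialSketch(coll, q = 4, xexp = 1)   # occurrence weighted: at least 4 occurrences
```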
References:
Alroy, J., Marshall, C. R., Bambach, R. K., Bezusko, K., Foote, M., Fürsich, F. T., … Webber, A. (2001). Effects of sampling standardization on estimates of Phanerozoic marine diversification. Proceedings of the National Academy of Sciences, 98(11), 6261-6266.
Alroy, J. (2010). The Shifting Balance of Diversity Among Major Marine Animal Groups. Science, 329, 1191-1194. https://doi.org/10.1126/science.1189910
Raup, D. M. (1975). Taxonomic Diversity Estimation Using Rarefaction. Paleobiology, 1, 333-342. https://doi.org/10.2307/2400135
Either a list of replicates or an object matching the class of the output of FUN.
data(corals)
data(stages)

# Example 1 - calculate metrics of diversity dynamics
dd <- divDyn(corals, tax="genus", bin="stg")
rarefDD <- subsample(corals, iter=30, q=50, tax="genus", bin="stg", output="dist", keep=95)
# plotting
tsplot(stages, shading="series", boxes="sys", xlim=c(260,0),
  ylab="range-through diversity (genera)", ylim=c(0,230))
lines(stages$mid, dd$divRT, lwd=2)
shades(stages$mid, rarefDD$divRT, col="blue")
legend("topleft", legend=c("raw","rarefaction"), col=c("black", "blue"), lwd=c(2,2), bg="white")

# Example 2 - SIB diversity
# draft a simple function to calculate SIB diversity
sib <- function(x, bin, tax){
  calc <- tapply(INDEX=x[,bin], X=x[,tax], function(y){
    length(levels(factor(y)))
  })
  return(calc[as.character(stages$stg)])
}
sibDiv <- sib(corals, bin="stg", tax="genus")
# calculate it with subsampling
rarefSIB <- subsample(corals, iter=25, q=50, tax="genus", bin="stg", output="arit", keep=95, FUN=sib)
rarefDD <- subsample(corals, iter=25, q=50, tax="genus", bin="stg", output="arit", keep=95)
# plot
tsplot(stages, shading="series", boxes="sys", xlim=c(260,0),
  ylab="SIB diversity (genera)", ylim=c(0,230))
lines(stages$mid, rarefDD$divSIB, lwd=2, col="black")
lines(stages$mid, rarefSIB, lwd=2, col="blue")

# Example 3 - different subsampling types with the default function (divDyn)
# classical rarefaction
cr <- subsample(corals, iter=25, q=20, tax="genus", bin="stg", output="dist", keep=95)
# by-list subsampling (unweighted) - 3 collections
UW <- subsample(corals, iter=25, q=3, tax="genus", bin="stg", coll="collection_no",
  output="dist", keep=95, type="oxw", xexp=0)
# occurrence weighted by-list subsampling
OW <- subsample(corals, iter=25, q=20, tax="genus", bin="stg", coll="collection_no",
  output="dist", keep=95, type="oxw", xexp=1)
# shareholder quorum subsampling
SQS <- subsample(corals, iter=25, q=0.4, tax="genus", bin="stg", output="dist", keep=95, type="sqs")
# plot
tsplot(stages, shading="series", boxes="sys", xlim=c(260,0),
  ylab="range-through diversity (genera)", ylim=c(0,100))
shades(stages$mid, cr$divRT, col="red")
shades(stages$mid, UW$divRT, col="blue")
shades(stages$mid, OW$divRT, col="green")
shades(stages$mid, SQS$divRT, col="cyan")
legend("topleft", bg="white", legend=c("CR (20)", "UW (3)", "OW (20)", "SQS (0.4)"),
  col=c("red", "blue", "green", "cyan"), lty=c(1,1,1,1), lwd=c(2,2,2,2))
These functions create a single subsampling trial dataset using the desired subsampling method.
subtrialCR( x, q, bin = NULL, unit = NULL, keep = NULL, useFailed = FALSE, showFailed = FALSE )
subtrialOXW( x, q, bin = NULL, coll = NULL, xexp = 1, keep = NULL, useFailed = FALSE, showFailed = FALSE )
subtrialSQS( x, tax, q, bin = NULL, coll = NULL, ref = NULL, singleton = "occ", excludeDominant = FALSE, largestColl = FALSE, fcorr = "good", byList = FALSE, keep = NULL, useFailed = FALSE, showFailed = FALSE, appr = "under" )
x |
|
q |
|
bin |
|
unit |
|
keep |
|
useFailed |
|
showFailed |
|
coll |
|
xexp |
|
tax |
|
ref |
|
singleton |
|
excludeDominant |
|
largestColl |
|
fcorr |
|
byList |
|
appr |
|
The essence of these functions is present within the subsampling wrapper function subsample. Each function implements a certain subsampling type.
By default, the functions return a logical vector indicating which rows of the original dataset should be present in the subsample.
The inexact method for SQS is implemented here, as it is computationally less demanding.
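A simplified single-bin sketch of the inexact SQS idea: occurrences are drawn in random order, and the frequency of every newly drawn taxon is added to a running coverage tally until the next new taxon would push it past the quorum q (an undershooting rule, loosely mirroring appr = "under"). Scaling frequencies by Good's u (one minus the proportion of singletons) is an assumption meant to echo the default fcorr = "good"; `sqsTrialSketch()` is hypothetical and ignores the singleton, excludeDominant, largestColl and byList options of subtrialSQS().

```r
# Hypothetical, simplified single-bin sketch of inexact SQS (not divDyn's subtrialSQS()).
sqsTrialSketch <- function(tax, q) {
  n <- length(tax)
  counts <- table(tax)
  u <- 1 - sum(counts == 1) / n                # Good's u: 1 - singletons / occurrences
  freq <- u * as.vector(counts) / n            # coverage-adjusted taxon frequencies
  names(freq) <- names(counts)
  keep <- logical(n)
  seen <- character(0)
  coverage <- 0
  for (i in sample.int(n)) {                   # occurrences in random order
    taxon <- as.character(tax[i])
    if (!(taxon %in% seen)) {
      if (coverage + freq[[taxon]] > q) break  # undershoot: stop before exceeding q
      coverage <- coverage + freq[[taxon]]
      seen <- c(seen, taxon)
    }
    keep[i] <- TRUE
  }
  keep
}

tax <- rep(c("A", "B", "C", "D"), each = 5)    # 4 equally common taxa, no singletons
set.seed(3)
sel <- sqsTrialSketch(tax, q = 0.5)            # each taxon adds 0.25, so 2 taxa fit
```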
References:
Alroy, J., Marshall, C. R., Bambach, R. K., Bezusko, K., Foote, M., Fürsich, F. T., … Webber, A. (2001). Effects of sampling standardization on estimates of Phanerozoic marine diversification. Proceedings of the National Academy of Sciences, 98(11), 6261-6266.
Alroy, J. (2010). The Shifting Balance of Diversity Among Major Marine Animal Groups. Science, 329, 1191-1194. https://doi.org/10.1126/science.1189910
Raup, D. M. (1975). Taxonomic Diversity Estimation Using Rarefaction. Paleobiology, 1, 333-342. https://doi.org/10.2307/2400135
A logical vector.
# one classical rarefaction trial
data(corals)
# return 5 references for each stage
bRows <- subtrialCR(corals, bin="stg", unit="reference_no", q=5)
# control
unCor <- unique(corals[bRows, c("stg", "reference_no")])
table(unCor$stg)
The function calculates global statistics of the entire occurrence dataset.
sumstat( x, tax = "genus", bin = "stg", coll = NULL, ref = NULL, duplicates = NULL )
x |
|
tax |
|
bin |
|
coll |
|
ref |
|
duplicates |
|
The function returns the following values.
bins
: The total number of bins sampled.
occs
: The total number of sampled occurrences.
colls
: The total number of sampled collections.
refs
: The total number of sampled references.
taxa
: The total number of sampled taxa.
gappiness
: The proportion of sampling gaps in the ranges of the taxa (without the range-endpoints).
A named numeric vector.
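The counts and the gappiness measure can be sketched as follows, assuming integer, consecutive bin numbers and gaps pooled across all taxa (the package may aggregate differently). `sumstatSketch()` is hypothetical and omits the collection and reference counts.

```r
# Hypothetical sketch of the counts and gappiness (not divDyn's sumstat()).
sumstatSketch <- function(x, tax = "genus", bin = "stg") {
  inner <- 0
  gaps <- 0
  for (b in split(x[[bin]], x[[tax]])) {              # bins of one taxon at a time
    b <- unique(b)
    interior <- setdiff(seq(min(b), max(b)), range(b))  # range without endpoints
    inner <- inner + length(interior)
    gaps <- gaps + sum(!(interior %in% b))            # interior bins without occurrences
  }
  c(bins = length(unique(x[[bin]])),
    occs = nrow(x),
    taxa = length(unique(x[[tax]])),
    gappiness = if (inner > 0) gaps / inner else NA)
}

toy <- data.frame(genus = c("A", "A", "A", "B", "B"), stg = c(1, 2, 5, 1, 3))
st <- sumstatSketch(toy)   # A misses bins 3 and 4, B misses bin 2: 3 gaps in 4 interior bins
```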
data(corals)
sumstat(corals, tax="genus", bin="stg", coll="collection_no", ref="reference_no")
This function will calculate both forward and backward survivorship proportions from a given occurrence dataset or FAD-LAD matrix.
survivors( x, tax = "genus", bin = "stg", method = "forward", noNAStart = FALSE, fl = NULL )
x |
|
tax |
|
bin |
|
method |
|
noNAStart |
|
fl |
|
Proportions of survivorship are useful tools for visualizing changes in the composition of a group over time (Raup, 1978). The curves show how a once coexisting set of taxa, called a cohort, loses its members as time progresses (forward survivorship), or gains its members when time is analyzed backwards (backward survivorship). Each value corresponds to a cohort in one bin (a) and one other bin (b): it expresses what proportion of the cohort present together in bin a is also present in bin b.
References:
Raup, D. M. (1978). Cohort analysis of generic survivorship. Paleobiology, 4(1), 1-15.
A numeric matrix of survivorship probabilities.
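A minimal sketch of forward survivorship from first and last appearance bins, assuming bins numbered from oldest to youngest (as the stg numbers are) and range-through presence between FAD and LAD. Columns are cohorts (bin a) and rows are bins b, matching how the package example draws one line per column; `survivorsSketch()` is hypothetical and covers only the forward method.

```r
# Hypothetical sketch of forward survivorship (not divDyn's survivors()).
survivorsSketch <- function(fad, lad, bins) {
  out <- matrix(NA_real_, length(bins), length(bins),
                dimnames = list(bins, bins))      # rows: bin b, columns: cohort bin a
  for (a in seq_along(bins)) {
    cohort <- fad <= bins[a] & lad >= bins[a]     # taxa present together in bin a
    if (!any(cohort)) next
    for (b in a:length(bins)) {
      out[b, a] <- mean(lad[cohort] >= bins[b])   # share of the cohort still present in b
    }
  }
  out
}

surv <- survivorsSketch(fad = c(1, 1, 2), lad = c(1, 3, 3), bins = 1:3)
```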
data(corals)
surv <- survivors(corals, tax="genus", bin="stg", method="forward")
# plot
data(stages)
tsplot(stages, shading="series", boxes="sys", xlim=c(260,0),
  ylab="proportion of survivors present", ylim=c(0.01,1), plot.args=list(log="y"))
for(i in 1:ncol(surv)) lines(stages$mid, surv[,i])
The function takes another function and reruns it on every taxon- and/or bin-specific subset of an occurrence dataset.
tabinate(x, bin = NULL, tax = NULL, FUN = NULL, ...)
x |
|
bin |
|
tax |
|
FUN |
|
... |
arguments passed to |
The main tabinate
function acts as a wrapper for any type of function that requires a subset of the occurrence dataset that represents either one bin
or one tax
entry or both.
For example, the iterator can be used to calculate geographic ranges from occurrence coordinates (georange
).
The output structure of FUN should be independent of the input subset, otherwise the function will return an error.
Setting both bin and tax to NULL will run FUN on the entire dataset (no effect). Providing either bin or tax and keeping the other NULL will iterate FUN over every bin or tax entry (whichever is provided).
The function returns a vector of values if the return value of FUN is a single value. If FUN returns a vector, the final output will be a matrix.
When both bin and tax are provided, the function output will be a matrix (one output value for each taxon/bin subset) or a 3D array (when FUN returns a vector). Setting FUN to NULL will return the subsets of the occurrence dataset as lists.
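The single-bin case of this iterator is essentially split-apply-combine. A minimal base-R sketch, with the hypothetical names `tabinateSketch()` and `nCoords()` (the latter a stand-in for georange(..., method="co"), not its actual code):

```r
# Hypothetical sketch of the per-bin case of the iterator (not divDyn's tabinate()).
tabinateSketch <- function(x, bin, FUN, ...) {
  # split the occurrences by bin, run FUN on every subset, simplify the results
  sapply(split(x, x[[bin]]), FUN, ...)
}

# Toy FUN: number of distinct coordinate pairs in a subset
nCoords <- function(x, lat, lng) nrow(unique(x[, c(lat, lng)]))

occ <- data.frame(stg = c(1, 1, 1, 2),
                  paleolat = c(0, 0, 10, 5),
                  paleolng = c(0, 0, 20, 5))
tabRes <- tabinateSketch(occ, bin = "stg", FUN = nCoords,
                         lat = "paleolat", lng = "paleolng")
```

sapply() simplifies vector-valued FUN results into a matrix, in line with the behaviour described for tabinate; the two-way bin/taxon case would need a nested loop or tapply().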
The return object depends on the output of FUN
, as well as the bin
and tax
input.
data(corals)
# the number of different coordinate pairs in every time slice
tabinate(corals, bin="stg", FUN=georange, lat="paleolat", lng="paleolng", method="co")
# geographic range (site occupancy) of every taxon in every bin
tabinate(corals, bin="stg", tax="genus", FUN=georange, lat="paleolat", lng="paleolng", method="co")
Roughly 10-million-year time scale used in some analyses.
data(tens)
A data.frame
with 49 observations and 9 variables:
The name of the bin: Period and number.
The primary state of the oceans with respect to carbonate precipitation. ar
indicates aragonitic, cc
indicates calcitic conditions.
Primary climatic characteristic: w
denotes warm, c
denotes cold.
bottom
Numeric ages of the bottom boundaries (earliest ages) of the bins.
mid
Numeric ages of the midpoints of the bins, the averages of bottom and top.
top
Numeric ages of the tops (latest ages) of the bins.
dur
Numeric durations of the bins.
ten
Integer identifiers of the bins.
This is an example time scale object that can be used in Phanerozoic-scale analyses. The time scale comprises 49 bins of roughly 10 million years' duration each, formed by combining certain standard stages.
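The derived columns follow directly from the boundaries: mid is the average of bottom and top, and dur is their difference (ages decrease towards the present). A short sketch with toy ages in Ma, not the actual values of tens:

```r
# Derived columns of a time scale table (toy ages, not the real tens values)
tsToy <- data.frame(bottom = c(541, 530, 520), top = c(530, 520, 510))
tsToy$mid <- (tsToy$bottom + tsToy$top) / 2   # midpoint: average of bottom and top
tsToy$dur <- tsToy$bottom - tsToy$top         # duration: bottom minus top
print(tsToy)
```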
Executive committee meeting (2015) of the old Paleobiology Database. Additional variables were added by Wolfgang Kiessling.
Function to plot time series as bars.
tsbars(x, y, width = "max", yref = 0, gap = 0, vertical = TRUE, ...)
x |
|
y |
|
width |
|
yref |
|
gap |
|
vertical |
|
... |
Arguments passed to |
Time series are often presented as connected points, although this depiction implies a particular process that describes how the values change between the points.
Instead of simple scatter plots, bar plots can be used for series where a single value is the most descriptive of a discrete time bin. The tsbars() function draws rectangles of different widths with the rect function to plot series in this way.
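The drawing step can be sketched with base graphics' rect(): every value becomes a rectangle centred on x, spanning width, from yref to y. Splitting gap evenly between the two sides of a bar is an assumption; `barCoords()` and `tsbarsSketch()` are hypothetical and do not reproduce the width = "max" default.

```r
# Hypothetical sketch of time-series bars drawn with rect() (not divDyn's tsbars()).
# Assumption: `gap` is split evenly between the two sides of each bar.
barCoords <- function(x, y, width, yref = 0, gap = 0) {
  data.frame(xleft = x - width / 2 + gap / 2,
             xright = x + width / 2 - gap / 2,
             ybottom = yref,
             ytop = y)
}

tsbarsSketch <- function(x, y, width, yref = 0, gap = 0, ...) {
  co <- barCoords(x, y, width, yref, gap)
  rect(co$xleft, co$ybottom, co$xright, co$ytop, ...)  # one rectangle per bin
  invisible(co)
}

co <- barCoords(x = c(10, 20), y = c(5, 7), width = 10, gap = 2)
# Usage on an open plot, e.g.:
# plot(NA, xlim = c(25, 5), ylim = c(0, 8))
# tsbarsSketch(c(10, 20), c(5, 7), width = 10, gap = 2, col = "gray")
```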
The function has no return value.
# an occurrence-based example
# needed data
data(stages)
data(corals)
# calculate diversities
dd <- divDyn(corals, tax="genus", bin="stg")
# plot range-through diversities
tsplot(stages, xlim=51:94, ylim=c(0,250), boxes="sys")
tsbars(x=stages$mid, y=dd$divRT, width=stages$dur, gap=1, col=stages$col)
This function allows the user to quickly plot a time scale data table.
tsplot( tsdat, ylim = c(0, 1), xlim = NULL, prop = 0.05, gap = 0, bottom = "bottom", top = "top", xlab = "Age (Ma)", ylab = "", boxes = NULL, boxes.col = NULL, shading = NULL, shading.col = c("white", "gray80"), plot.args = NULL, boxes.args = NULL, labels = TRUE, labels.args = NULL, lplab = TRUE, rplab = TRUE )
tsdat |
|
ylim |
|
xlim |
|
prop |
|
gap |
|
bottom |
|
top |
|
xlab |
|
ylab |
|
boxes |
|
boxes.col |
|
shading |
|
shading.col |
|
plot.args |
|
boxes.args |
|
labels |
|
labels.args |
|
lplab |
|
rplab |
|
As most analyses use an individually compiled time scale object, the time scale table used for the analysis can be plotted, rather than a standardized table, to ensure compatibility between the analyzed and plotted values. Two example tables are included in the package (stages and tens) that can serve as templates.
The function has no return value.
data(stages)
tsplot(stages, boxes="sys", shading="series")
# same with colours
tsplot(stages, boxes="sys", shading="series", boxes.col="systemCol")
# only the Mesozoic, custom axes
tsplot(stages, boxes="system", shading="stage", xlim=52:81,
  plot.args=list(axes=FALSE, main="Mesozoic"))
axis(1, at=seq(250, 75, -25), labels=seq(250, 75, -25))
axis(2)
# only the Triassic, use the supplied abbreviations
tsplot(stages, boxes="short", shading="stage", xlim=c(250,199), ylab="variable",
  labels.args=list(cex=1.5, col="blue"), boxes.args=list(col="gray95"))
# colourful plot with two levels of hierarchy
tsplot(stages, boxes=c("short", "system"), shading="series",
  boxes.col=c("col", "systemCol"), xlim=c(52:69))