This repository provides a user-friendly interface for building a baseline to analyse Spygen eDNA data. It guides users through the workflow with one function per action, to be run in a specific order; each function runs automatically.
To clone the repository from GitHub, open R and go to its Terminal, or open a terminal directly, and run:
The required dependencies are listed in the DESCRIPTION file and can be installed with the following commands:
## Install devtools package ----
install.packages("devtools")
## Install required packages ----
devtools::install_deps(upgrade = "never")

Otherwise, you can install it as an R package:
## Install devtools package ----
install.packages("devtools")
devtools::install_github("CyrilHaute/setupeDNAbaseline")

You can then simply follow the scripts in analyses/ to use the workflow.
The repository is structured as follows:

- data/: contains raw Spygen eDNA and GPS data:
  - eDNA_raw_data/ contains all raw eDNA data;
  - trace_gps/ contains GPS data.
- R/: contains all functions.
- analyses/: contains scripts to load data and run the R/ functions.
- outputs/: contains all results:
  - 01_clean_eDNA/ contains all results from step I;
  - 02_eDNA_tracks/ contains all results from step II.

The scripts and functions are written in base R as far as possible. For instance, we did not use the tidyverse. However, we used the native R pipe operator |> instead of %>%, which comes from the magrittr package and requires loading a tidyverse package.
To use the R native pipe, follow the instructions:
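As a minimal check, assuming R 4.1.0 or later (the release that introduced `|>`), the native pipe works without loading any package. In RStudio, the pipe keyboard shortcut can usually be switched to `|>` under Tools > Global Options > Code ("Use native pipe operator"), although the exact menu wording may differ between versions.

```r
# The native pipe |> is available from R 4.1.0 onwards; no package is needed.
stopifnot(getRversion() >= "4.1.0")

# x |> f() is evaluated as f(x), so the calls read from left to right.
total <- c(1, 4, 9) |> sqrt() |> sum()
total
# 6
```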
The data can be accessed through marbec-data.
Details of all functions used in the repository are available at https://cyrilhaute.github.io/setupeDNAbaseline/reference/index.html
The workflow is divided into two steps:
Step I converts raw Spygen eDNA data into a format suitable for analysis.
The convert_to_matrix function converts raw Spygen eDNA data to a site X species matrix. Species identified multiple times are summed into a single column.

This function requires only one argument, the path to the raw Spygen data (in .xlsx format), and returns an uncleaned site X species matrix:

spygen_matrix <- convert_to_matrix_function(raw_spygen_path = "my/path/to/raw_eDNA_data.xlsx")

| spygen_code | nb | Dicentrarchus labrax | Chromis chromis | A_regius_U_cirrosa | Sciaena umbra |
|---|---|---|---|---|---|
| SPY180624 | nb_rep | 5 | 9 | 0 | 1 |
| SPY180624 | nb_seq | 7238 | 27013 | 0 | 220 |
| SPY181146 | nb_rep | 3 | 11 | 0 | 0 |
| SPY181146 | nb_seq | 3804 | 20232 | 0 | 0 |
As you can see, the function returns a dataframe that can contain species names not written in binomial format (e.g., A_regius_U_cirrosa).
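To illustrate what "not in binomial format" means in practice, here is a minimal base R sketch that flags such names with a regular expression. The pattern is an assumption made for illustration (capitalised genus, one space, lower-case epithet); it is not necessarily the rule applied inside the package.

```r
# Column names taken from the example table above.
species_names <- c("Dicentrarchus labrax", "Chromis chromis",
                   "A_regius_U_cirrosa", "Sciaena umbra")

# Assumed rule: capitalised genus, a single space, lower-case epithet.
binomial_pattern <- "^[A-Z][a-z]+ [a-z]+$"
is_binomial <- grepl(binomial_pattern, species_names)

# Names that do not follow the assumed binomial pattern.
not_binomial <- species_names[!is_binomial]
not_binomial
# "A_regius_U_cirrosa"
```

A pattern this simple would also reject valid names with a hyphenated epithet, so it should be read as an illustration rather than a complete check.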
The species_clean function cleans the site X species matrix by removing misnamed species (missing names, taxa identified only at the family level or as spp. or sp., and any name not written in binomial format) and corrects species names according to FishBase, or WoRMS when FishBase returns NA.

This function requires the uncleaned site X species matrix obtained with the convert_to_matrix function.
It returns a list containing three objects:

- A dataframe in site X species format with the new species names checked against FishBase;
- A dataframe in site X species format with the old species names, as they were before checking against FishBase;
- A character vector listing all removed species.

This allows users to follow the cleaning steps and check which species have been removed and which names have been corrected.
spygen_matrix_clean <- species_clean_function(spygen_matrix = spygen_matrix)

Here is the cleaned site X species dataframe with new species names (spygen_matrix_clean$spygen_matrix_clean):

| spygen_code | nb | Dicentrarchus labrax | Chromis chromis | Sciaena umbra | Sphyraena viridensis |
|---|---|---|---|---|---|
| SPY180624 | nb_rep | 5 | 9 | 1 | 1 |
| SPY180624 | nb_seq | 7238 | 27013 | 220 | 155 |
| SPY181146 | nb_rep | 3 | 11 | 0 | 2 |
| SPY181146 | nb_seq | 3804 | 20232 | 0 | 152 |
For the next step, save the dataframe with new species names (spygen_matrix_clean) as a .csv file:
write.csv(spygen_matrix_clean$spygen_matrix_clean, file = "my/path/to/outputs/spygen_matrix_clean.csv", row.names = FALSE)

The spygen_new_data function adds new eDNA data to the existing data. It checks for duplicates and, when differences are detected, lets you choose whether to replace the existing records with the new data.

> [!IMPORTANT]
> To work properly, the function must be given new Spygen data in the order in which Spygen sent them!

This function requires the path to the old eDNA data (in .csv format), the path to the new eDNA data (in .xlsx format), and the path where the result should be saved.
The function saves the new data at the indicated path.
spygen_new_data_function(old_spygen_data_path = "my/path/to/outputs/spygen_matrix_clean.csv",
                         new_spygen_data_path = "my/path/to/new/raw_eDNA_data.xlsx",
                         path_save = "my/path/to/outputs/new_spygen_data.csv")

In this way, the function creates successive eDNA files, which makes it possible to track both the data and the reference database version.
To load data created either with the species_clean or spygen_new_data functions, do:
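A minimal sketch using base R `read.csv`, assuming the file was written with `write.csv` as shown earlier. The toy dataframe and the temporary file are only there to make the example self-contained; in practice, `csv_path` would be the path to your own file, such as "my/path/to/outputs/spygen_matrix_clean.csv".

```r
# Toy matrix written to a temporary file so the example runs on its own.
toy <- data.frame(spygen_code = c("SPY180624", "SPY180624"),
                  nb = c("nb_rep", "nb_seq"),
                  `Dicentrarchus labrax` = c(5, 7238),
                  check.names = FALSE)
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, file = csv_path, row.names = FALSE)

# check.names = FALSE keeps "Dicentrarchus labrax" as it is, instead of
# converting it to the syntactic name "Dicentrarchus.labrax".
loaded_matrix <- read.csv(csv_path, check.names = FALSE)
colnames(loaded_matrix)
```

Without `check.names = FALSE`, the spaces in species names are replaced by dots when the file is read, which can break later matching on species names.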
The spygen_subset function is a user-friendly function that creates subsets of eDNA data.

This function requires the path to cleaned eDNA data (in .csv format) and either a character vector of Spygen codes or a dataframe containing a spygen_code column.

subset_eDNA <- spygen_subset_function(eDNA_species_data_path = "my/path/to/outputs/spygen_matrix_clean.csv",
                                      spygen_code_subset = c("SPY180624", "SPY181146", "SPY181147"))

The function returns a subset of the eDNA data that includes only the species present in the subset.
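The idea of keeping only the species present in the subset can be sketched in base R as follows. This is an illustration on toy data under the assumption that "present" means at least one non-zero value in the selected rows; it is not the package's actual implementation, and the object names are hypothetical.

```r
# Toy site X species matrix, with values taken from the example tables above.
eDNA <- data.frame(
  spygen_code = c("SPY180624", "SPY180624", "SPY181146", "SPY181146"),
  nb = c("nb_rep", "nb_seq", "nb_rep", "nb_seq"),
  `Sciaena umbra` = c(1, 220, 0, 0),
  `Sphyraena viridensis` = c(1, 155, 2, 152),
  check.names = FALSE
)
codes <- "SPY181146"

# Keep only the rows of the requested surveys.
subset_rows <- eDNA[eDNA$spygen_code %in% codes, , drop = FALSE]

# Keep the identifier columns plus the species detected at least once.
id_cols <- c("spygen_code", "nb")
species_cols <- setdiff(colnames(subset_rows), id_cols)
detected <- species_cols[colSums(subset_rows[, species_cols, drop = FALSE]) > 0]
toy_subset <- subset_rows[, c(id_cols, detected), drop = FALSE]
colnames(toy_subset)
# "spygen_code" "nb" "Sphyraena viridensis"
```

Here Sciaena umbra is dropped because it was not detected in survey SPY181146.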
> [!CAUTION]
> This step only converts the data to a format suitable for analysis and applies basic cleaning. It does not exempt users from checking the list of species returned by the functions (e.g., species detected outside their distribution range).
Step II associates a GPS track with each Spygen survey and converts it to a shapefile.
The load_waypoint and spygen_waypoint functions associate each Spygen survey with the closest GPS waypoint recorded on the survey date.

The load_waypoint function requires only the path to the GPS data.
The spygen_waypoint function requires the path to the Spygen metadata (containing start and end coordinates), the waypoint data obtained with the load_waypoint function, and a threshold giving the maximum distance in meters between the coordinates of a Spygen survey and those of the nearest waypoint.
It returns a list containing three objects:

- A dataframe named high distance containing the Spygen surveys whose distance to the closest waypoint is greater than the distance threshold;
- A dataframe named good distance containing the Spygen surveys whose distance to the closest waypoint is smaller than the distance threshold;
- A dataframe named na survey containing the Spygen surveys with no attributed waypoint.
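The role of the distance threshold can be illustrated with a short base R sketch. It assumes coordinates in decimal degrees and uses the haversine formula on a spherical Earth; the package may compute distances differently, and all names and coordinates below are hypothetical.

```r
# Great-circle (haversine) distance in metres between points in decimal degrees.
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical survey start point and candidate waypoints.
survey_lon <- 3.7000
survey_lat <- 43.4000
waypoints_toy <- data.frame(id = c("wp1", "wp2"),
                            lon = c(3.70005, 3.7100),
                            lat = c(43.40003, 43.4100))

# Distance from the survey to every waypoint, then the closest one.
d <- haversine_m(survey_lon, survey_lat, waypoints_toy$lon, waypoints_toy$lat)
nearest <- which.min(d)
distance_threshold <- 10

# TRUE corresponds to the "good distance" case, FALSE to "high distance".
within_threshold <- d[nearest] <= distance_threshold
within_threshold
```

In this toy case the first waypoint lies a few metres from the survey, so it falls within the 10 m threshold, while the second is more than a kilometre away.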
waypoints <- load_waypoint(path = "data/trace_gps")
spygen_waypoint_output <- spygen_waypoint(eDNA_metadata_path = "data/metadata.csv",
                                          waypoints = waypoints,
                                          distance_threshold = 10,
                                          path_save = "path/to/save/data")

The load_tracks and spygen_tracks functions associate each waypoint with the closest GPS track recorded on the survey date.

The load_tracks function requires only the path to the GPS data.
The spygen_tracks function requires the waypoint data obtained with the spygen_waypoint function (spygen_waypoint_output$good_distance), the GPS tracks obtained with the load_tracks function, and a threshold giving the maximum distance in meters between the coordinates of a Spygen survey and those of the nearest track.
It returns the same structure as the spygen_waypoint function, but for tracks.
gps_tracks <- load_tracks(path = "data/trace_gps")
spygen_tracks_output <- spygen_tracks(waypoints = spygen_waypoint_output$good_distance,
                                      gps_tracks = gps_tracks,
                                      distance_threshold = 10,
                                      path_save = "path/to/save/data")

The shapefile_tracks function converts a GPS track from points to a polygon, saved as a shapefile.

The function requires the track coordinates obtained with the spygen_tracks function (spygen_tracks_output$tracks_good_distance).
shapefile_tracks(eDNA_tracks = spygen_tracks_output$tracks_good_distance,
                 path_save = "path/to/save/data")

It returns a shapefile of Spygen tracks as follows (green and red dots represent the start and end waypoints, respectively):

