A new machine learning approach to seabed biotope classification
Summary
Files for use with the R script accompanying the paper Cooper (2020). Note
that this script also uses files from
https://doi.org/10.14466/CefasDataHub.34 (details provided in script).
Cooper, K.M. (2020). A new machine learning approach to seabed biotope
classification. Science Advances.
Categories
Keywords
N/A
Use limitation statement
There are no public access constraints to this data. Use of this data is subject to the licence identified.
Licence
Open Government Licence
Attribution statement
See Cefas Data Portal for details – link to dataset below.
Technical information
Update frequency
notPlanned
Lineage
Files include:
- BiotopePredictionScript.R (R script)
- EUROPE.shp (European Coastline)
- EuropeLiteScoWal.shp (European Coastline with UK boundaries)
- DEFRADEMKC8.shp (Seabed bathymetry)
- C5922DATASETFAM13022017.csv (Training dataset)
- PARTC16112018.csv (Test dataset)
- PARTCAGG16112018.csv (Aggregation data)
Description of C5922DATASETFAM13022017.csv: This file is based on the RSMP dataset (see https://www.cefas.co.uk/cefas-data-hub/dois/rsmp-baseline-dataset/), but with macrofaunal data output at the level of family or above. A variety of gear types were used for sample collection, including grabs (0.1 m² Hamon, 0.2 m² Hamon, 0.1 m² Day, 0.1 m² Van Veen and 0.1 m² Smith-McIntyre) and cores. Of these devices, 93% of samples were acquired using either a 0.1 m² Hamon grab or a 0.1 m² Day grab. Sieve sizes used in sample processing include 1 mm and 0.5 mm, reflecting the conventional preference for 1 mm offshore and 0.5 mm inshore. Of the samples collected using either a 0.1 m² Hamon grab or a 0.1 m² Day grab, 88% were processed using a 1 mm sieve.
Taxon names were standardised according to the WoRMS (World Register of Marine Species) list using the Taxon Match Tool (http://www.marinespecies.org/aphia.php?p=match). Of the initial 13,449 taxon names, only 774 remained after correction and aggregation to family level. The final dataset comprises a single-sheet comma-separated values (.csv) file. Colonials accounted for less than 20% of the total number of taxa and, where present, were given a value of 1 in the dataset. This component of the fauna was missing from 325 of the 777 surveys, reflecting either a true absence or simply that colonial taxa were ignored by the analyst.
Sediment particle size data were provided as percentage weight by sieve mesh size, with the dataset including 99 different sieve sizes. Sediment samples were processed either by sieving alone or by a combination of sieving and laser diffraction.
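The aggregation of taxon names to family level described above can be illustrated with a minimal Python sketch. It is not taken from BiotopePredictionScript.R. The lookup table and the function name `aggregate_to_family` are hypothetical, and in practice the taxon-to-family mapping would come from a WoRMS taxon match rather than a hand-written dictionary.

```python
from collections import defaultdict

# Hypothetical lookup from recorded taxon name to family.
# Assumption: a real workflow would build this from a WoRMS taxon match.
TAXON_TO_FAMILY = {
    "Sabellaria spinulosa": "Sabellariidae",
    "Sabellaria alveolata": "Sabellariidae",
    "Abra alba": "Semelidae",
}


def aggregate_to_family(sample_counts, taxon_to_family):
    """Sum per-taxon abundances into per-family abundances for one sample."""
    family_counts = defaultdict(int)
    for taxon, count in sample_counts.items():
        # Names with no match are kept under their original label.
        family = taxon_to_family.get(taxon, taxon)
        family_counts[family] += count
    return dict(family_counts)


# Illustrative abundances for a single sample (made-up values).
sample = {"Sabellaria spinulosa": 12, "Sabellaria alveolata": 3, "Abra alba": 5}
print(aggregate_to_family(sample, TAXON_TO_FAMILY))
```

Merging names in this way is why the number of distinct taxon names falls so sharply after aggregation.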
Key metadata fields include: Sample coordinates (Latitude and Longitude), Survey Name, Gear, Date, Grab Sample Volume (litres) and Water Depth (m). A number of additional explanatory variables are also provided (salinity, temperature, chlorophyll a, suspended particulate matter, water depth, wave orbital velocity, average current and bed stress). In total, the dataset dimensions are 33,198 rows (samples) × 900 columns (variables/factors), yielding a matrix of 29,878,200 individual data values.
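As a quick check on a downloaded copy, the stated dimensions can be compared with the file itself. The sketch below is a minimal Python example, not part of the accompanying R script. It assumes the CSV has a single header row, and the function name `csv_shape` and the column names in the demo data are hypothetical.

```python
import csv
import io


def csv_shape(handle):
    """Return (data_rows, columns) for a CSV with a single header row."""
    reader = csv.reader(handle)
    header = next(reader)              # assumption: first line holds column names
    n_rows = sum(1 for _ in reader)    # count the remaining lines as samples
    return n_rows, len(header)


# Tiny in-memory stand-in for the real file (hypothetical columns and values).
demo = io.StringIO("Sample,Latitude,Longitude\nS1,52.1,1.9\nS2,53.0,2.1\n")
rows, cols = csv_shape(demo)
print(rows, cols, rows * cols)

# Dimensions quoted in this record: rows x columns = individual data values.
expected_values = 33_198 * 900
print(expected_values)
```

For the real file, the same function could be called on `open("C5922DATASETFAM13022017.csv", newline="")`. A result of (33198, 900) would match the dimensions given above.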
Spatial information
Coordinate reference system
N/A
Geographic extent
- Latitude from: 52.4581 to 52.4595
- Longitude from: 1.73881 to 1.74086
Metadata information
Language
English
Metadata identifier
1b2c9bdf-a40a-44e1-9919-68ae9dcca74c
Published by
Centre for Environment, Fisheries & Aquaculture Science
Contact publisher
data.manager@cefas.gov.uk
Dataset reference dates
Creation date
05 July 2019
Revision date
31 May 2023
Publication date
05 July 2019
Period
- From: 30 March 1969
- To: 11 January 2018
Data and Supporting Information
Data services and download by area of interest | Link | Action |
---|---|---|
The Cefas Data Portal contains metadata records and data sets available to download and connect to in support of our commitment to open science. Data is available in the following formats: Binary download. | Open link |