A new machine learning approach to seabed biotope classification

Summary

Files for use with the R script accompanying the paper Cooper (2020). Note that this script also uses files from https://doi.org/10.14466/CefasDataHub.34 (details provided in script).

Cooper, K.M. (2020). A new machine learning approach to seabed biotope classification. Science Advances.

Keywords

N/A

Use limitation statement

There are no public access constraints to this data. Use of this data is subject to the licence identified.

Licence

Open Government Licence

Attribution statement

See Cefas Data Portal for details – link to dataset below.

Technical Information

Update frequency: notPlanned
Lineage: Files include: BiotopePredictionScript.R (R script), EUROPE.shp (European coastline), EuropeLiteScoWal.shp (European coastline with UK boundaries), DEFRADEMKC8.shp (seabed bathymetry), C5922DATASETFAM13022017.csv (training dataset), PARTC16112018.csv (test dataset) and PARTCAGG16112018.csv (aggregation data).
Description of C5922DATASETFAM13022017.csv: This file is based on the RSMP dataset (see https://www.cefas.co.uk/cefas-data-hub/dois/rsmp-baseline-dataset/), but with macrofaunal data output at the level of family or above.
Samples were collected using a variety of gear types, including grabs (0.1 m² Hamon, 0.2 m² Hamon, 0.1 m² Day, 0.1 m² Van Veen and 0.1 m² Smith-McIntyre) and cores. Of these devices, 93% of samples were acquired using either a 0.1 m² Hamon grab or a 0.1 m² Day grab. Sieve sizes used in sample processing include 1 mm and 0.5 mm, reflecting the conventional preference for 1 mm offshore and 0.5 mm inshore. Of the samples collected using either a 0.1 m² Hamon grab or a 0.1 m² Day grab, 88% were processed using a 1 mm sieve.
Taxon names were standardised according to the WoRMS (World Register of Marine Species) list using the Taxon Match Tool (http://www.marinespecies.org/aphia.php?p=match). Of the initial 13,449 taxon names, only 774 remained after correction and aggregation to family level. Colonials accounted for less than 20% of the total number of taxa and, where present, were given a value of 1 in the dataset. This component of the fauna was missing from 325 of the 777 surveys, reflecting either a true absence or colonial taxa having been ignored by the analyst.
Sediment particle size data were provided as percentage weight by sieve mesh size, with the dataset including 99 different sieve sizes. Sediment samples were processed either by sieving alone or by a combination of sieving and laser diffraction. The final dataset comprises a single-sheet comma-separated values (.csv) file.
Key metadata fields include: sample coordinates (latitude and longitude), survey name, gear, date, grab sample volume (litres) and water depth (m). A number of additional explanatory variables are also provided (salinity, temperature, chlorophyll a, suspended particulate matter, water depth, wave orbital velocity, average current and bed stress). In total, the dataset has 33,198 rows (samples) × 900 columns (variables/factors), giving a matrix of 29,878,200 individual data values.
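To illustrate the family-level aggregation and the scoring of colonial taxa described in the lineage above, the following Python sketch sums abundance counts across taxa that share a family and records colonial families as 1 where present. The taxon-to-family lookup, the set of colonial families, the sample values and the function name are all hypothetical and chosen only for illustration. The dataset itself was standardised using the WoRMS Taxon Match Tool, and this sketch is not the procedure used to build it.

```python
from collections import defaultdict

# Hypothetical lookup from recorded taxon name to family (illustrative only;
# the real dataset was standardised against WoRMS).
TAXON_TO_FAMILY = {
    "Sabellaria spinulosa": "Sabellariidae",
    "Sabellaria alveolata": "Sabellariidae",
    "Abra alba": "Semelidae",
    "Flustra foliacea": "Flustridae",
}

# Hypothetical set of colonial families, scored as presence (1) rather than counts.
COLONIAL_FAMILIES = {"Flustridae"}


def aggregate_to_family(sample_counts):
    """Sum taxon counts per family; colonial families are set to 1 where present."""
    totals = defaultdict(int)
    for taxon, count in sample_counts.items():
        family = TAXON_TO_FAMILY[taxon]
        if family in COLONIAL_FAMILIES:
            # Colonial taxa are not counted as individuals: record presence only.
            if count > 0:
                totals[family] = 1
        else:
            totals[family] += count
    return dict(totals)


# One made-up sample with species-level records.
sample = {
    "Sabellaria spinulosa": 12,
    "Sabellaria alveolata": 3,
    "Abra alba": 7,
    "Flustra foliacea": 1,
}
family_counts = aggregate_to_family(sample)
print(family_counts)
```

In this sketch the two Sabellaria records collapse into a single Sabellariidae count, while the colonial family is stored as presence only, mirroring the value of 1 used for colonials in the dataset.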

Spatial Information

Coordinate reference system

N/A

Geographic extent

Latitude from: 52.4581 to 52.4595
Longitude from: 1.73881 to 1.74086

Metadata Information

Language

English

Metadata identifier

1b2c9bdf-a40a-44e1-9919-68ae9dcca74c

Data and Supporting Information

Data services and download by area of interest: The Cefas Data Portal contains metadata records and data sets available to download and connect to in support of our commitment to open science. Data is available in the following formats: Binary download.
