Section 01
OceanPile: An Introduction to the Large-Scale Multimodal Corpus for Marine-Domain Foundation Models
OceanPile is a multimodal dataset for the marine domain, built by the OceanGPT team, that contains multiple data types, including text and images. It aims to fill the gap in large-scale training data for the marine domain by supplying a high-quality marine science corpus for training foundation models, laying the data groundwork for models specialized in marine science.