The A-Z of AI

Datasets

The information used to teach AI about the world.

Datasets are large collections of digital information that are used to train AI.

They might contain anything from weather data, such as air pressure and temperature, to photos, music, or indeed anything else that helps an AI system with the task it has been assigned.

Datasets are like textbooks for computers.

A pile of apples titled "My Green Apple Dataset" appears to have a stray orange nestled at its center. A little red error symbol, flagging its whereabouts, represents the careful process of refining datasets.

Just as a child learns through examples, so do machines. Datasets are the bedrock of this learning process.

AI design teams have to think carefully about which data they use to train their AI, and may build in parameters that help the system make sense of the information it's given.

Due to their scale and complexity, these collections can be very challenging to build and refine, whether they consist of a few hundred audio samples or extensive maps covering the whole of the known solar system.
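To make the idea of refining a dataset concrete, here is a minimal sketch based on the green apple illustration above. Everything in it is an assumption made for illustration: the tiny dataset, the item names, and the `split_by_label` helper are hypothetical, and real refinement usually involves far more than checking a label.

```python
# Hypothetical toy dataset: each record is (item_id, label).
my_green_apple_dataset = [
    ("item-001", "green apple"),
    ("item-002", "green apple"),
    ("item-003", "orange"),  # the stray item that does not belong
    ("item-004", "green apple"),
]

EXPECTED_LABEL = "green apple"


def split_by_label(records, expected_label):
    """Separate records that match the expected label from those that do not."""
    clean, flagged = [], []
    for item_id, label in records:
        if label == expected_label:
            clean.append((item_id, label))
        else:
            flagged.append((item_id, label))
    return clean, flagged


clean, flagged = split_by_label(my_green_apple_dataset, EXPECTED_LABEL)

print(f"kept {len(clean)} records, flagged {len(flagged)} for review")
for item_id, label in flagged:
    print(f"  {item_id}: labelled '{label}'")
```

The sketch sets the stray orange aside for a person to review instead of silently deleting it, which loosely mirrors the little red error symbol in the illustration.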

For this reason, AI design teams often share datasets for the benefit of the wider scientific community, making it easier to collaborate and build on each other's research.