Deep Learning

Deep learning is the technique behind most of the AI breakthroughs of the past decade — image recognition, speech synthesis, and large language models all run on it. Understanding it does not require a maths degree.

By Jonathan Harris, AI author and host of Turing’s Torch AI Weekly.

What this guide covers

Deep learning is a class of machine learning that uses artificial neural networks with many layers — the "deep" in the name refers to the number of these layers, not to profundity of insight. Each layer transforms its input into a more abstract representation, so by the final layer the network has learned features that are useful for the task at hand.
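Stripped to its bones, a "layer" is just a weighted sum per output unit followed by a non-linearity, and "depth" means stacking them so each layer's output feeds the next. The sketch below shows that idea in plain Python; the weights are invented for illustration — in a real network they are learned from data.

```python
def relu(x):
    # Non-linearity: pass positives through, clamp negatives to zero
    return [max(0.0, v) for v in x]

def layer(inputs, weights, biases):
    """One dense layer: a weighted sum per output unit, then ReLU."""
    outputs = []
    for w_row, b in zip(weights, biases):
        total = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(total)
    return relu(outputs)

# A tiny 3-input network with two stacked layers:
# raw input -> intermediate representation -> more abstract representation.
x = [0.5, -1.2, 3.0]
h1 = layer(x,  [[0.2, -0.1, 0.4], [0.7, 0.3, -0.5]], [0.1, 0.0])
h2 = layer(h1, [[0.6, -0.2]], [0.05])
print(h2)
```

A network with a hundred such layers works the same way; what changes is only how many transformations sit between raw input and final output.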

The breakthrough insight, established through the 2010s, was that with enough data and enough compute these networks could learn representations from raw, unstructured inputs. You do not need to hand-engineer features. You feed in pixels, audio waveforms, or token sequences and the network figures out what matters. That is fundamentally different from earlier machine learning, which required human experts to define the features before learning could begin.

The major architectures now in wide use include convolutional neural networks (CNNs) for images and video, recurrent networks (RNNs, LSTMs) for sequences, and the transformer architecture that underpins every large language model including GPT-4, Claude, and Gemini.

Where it works well

Deep learning dominates wherever the input is unstructured and the relevant features are not obvious in advance. Computer vision — reading scans, inspecting manufactured parts, monitoring CCTV — is the most commercially deployed example. Speech recognition and natural language processing are others. Any task that humans do well by looking at or listening to something is a reasonable candidate.

Transfer learning compounds the advantage. You can take a model pre-trained on hundreds of millions of examples at enormous cost, and fine-tune it on a small domain-specific dataset for a fraction of the resources. This has made high-quality NLP and vision accessible to organisations that could never have trained a model from scratch.
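The mechanics of fine-tuning can be sketched in miniature. Below, a stand-in function plays the role of the expensive pretrained model — it is frozen and never updated — while a small trainable "head" is fitted on top of its features with plain gradient descent. All names and numbers here are invented for illustration; real fine-tuning uses a deep-learning framework, but the division of labour is the same.

```python
def pretrained_features(x):
    """Stand-in for a frozen pretrained network: used, never updated."""
    return [x, x * x]  # pretend these are features learned at great cost

# Small domain-specific dataset: targets follow y = 2*x^2 + 1.
data = [(x, 2 * x * x + 1) for x in [-2.0, -1.0, 0.0, 1.0, 2.0]]

# Trainable head: one weight per feature plus a bias.
w, b, lr = [0.0, 0.0], 0.0, 0.01
for _ in range(2000):
    for x, y in data:
        f = pretrained_features(x)      # frozen forward pass
        pred = sum(wi * fi for wi, fi in zip(w, f)) + b
        err = pred - y
        # Gradient step on the head only; the extractor stays fixed.
        w = [wi - lr * err * fi for wi, fi in zip(w, f)]
        b -= lr * err

print(w, b)  # head should recover roughly w = [0, 2], b = 1
```

Because only the head's handful of parameters move, the expensive part of the computation is inherited rather than repeated — which is why fine-tuning costs a fraction of training from scratch.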

The performance ceiling is genuinely high. In controlled conditions on well-defined benchmarks, deep learning systems now match or exceed human performance on many vision and language tasks. That ceiling matters when the margin between good and very good is commercially significant.

Where it gets complicated

The black-box problem is real and structural, not cosmetic. Most deep learning models cannot explain why they produced a given output in terms a human would accept as a reason. This is tolerable when errors are cheap — a bad recommendation is annoying but recoverable. It is serious when errors carry legal, medical, or safety consequences.

Data and compute costs are high. Training a frontier model from scratch requires infrastructure that is out of reach for most organisations. Fine-tuning is cheaper, but even that requires careful data curation, GPU access, and engineering skill to do properly.

Deep networks are also brittle to distribution shift in ways that standard ML models sometimes are not. A computer vision model that performs flawlessly on factory images can fail unexpectedly when the lighting changes, the camera is swapped, or the product line is updated. Robustness testing against realistic variation is not optional.

FAQ

Is deep learning better than machine learning?

Not better in general — better for specific problems. If your input is unstructured (images, audio, text) and you have large amounts of data, deep learning usually wins. If your input is structured, tabular, and limited in volume, standard machine learning is often faster, cheaper, and more interpretable.

What is a neural network?

A network of mathematical functions loosely inspired by biological neurons. Each node receives inputs, applies a transformation, and passes the result on. Training adjusts the weights of those transformations so that the network produces correct outputs for known examples.
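That adjustment process can be seen end to end with a single neuron. The toy below uses the classic perceptron learning rule to teach one neuron to behave like a logical AND gate: each time an example comes out wrong, the weights are nudged in the direction that would have made it right.

```python
def neuron(inputs, weights, bias):
    # Weighted sum of inputs, then a step activation: fire or don't
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 if total > 0 else 0.0

# Known examples: inputs and the correct output (logical AND).
examples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias, lr = [0.0, 0.0], 0.0, 0.1

for _ in range(20):  # perceptron learning rule
    for x, target in examples:
        err = target - neuron(x, weights, bias)
        weights = [w + lr * err * xi for w, xi in zip(weights, x)]
        bias += lr * err

print([neuron(x, weights, bias) for x, _ in examples])  # → [0.0, 0.0, 0.0, 1.0]
```

A deep network is billions of these units trained the same way in principle, with calculus (backpropagation) rather than this simple rule doing the nudging.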

What is a large language model?

A deep learning model trained on text at very large scale. It learns statistical patterns in language and can generate coherent, contextually appropriate text. GPT-4, Claude, Gemini, and Llama are examples.
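"Statistical patterns in language" sounds abstract, but the idea scales down to something you can read in a few lines. The toy below counts which word follows which in a training text and predicts the most frequent continuation — a bigram model, which is to GPT-4 roughly what an abacus is to a supercomputer, but the principle of learning statistics over token sequences is shared.

```python
from collections import Counter, defaultdict

text = "the cat sat on the mat and the cat slept"
tokens = text.split()

# Count how often each token follows each other token.
follows = defaultdict(Counter)
for prev, nxt in zip(tokens, tokens[1:]):
    follows[prev][nxt] += 1

def predict_next(token):
    """Most frequent continuation seen in training."""
    return follows[token].most_common(1)[0][0]

print(predict_next("the"))  # → cat  ("cat" follows "the" twice, "mat" once)
```

An LLM differs in conditioning on long contexts rather than one word, and in learning compressed representations rather than raw counts — but generation is still, at bottom, repeated next-token prediction.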

Do I need deep learning for my business problem?

Probably not unless your problem involves unstructured data or requires human-level language or vision capability. Most business forecasting, classification, and optimisation problems are solved faster and more reliably with standard machine learning.
