Getting Started with PyTorch for Deep Learning
Getting your first deep learning model to actually run, train, and converge can feel more like a rite of passage than a programming task. You install libraries, copy examples from docs, tune a few mysterious parameters, then watch the loss explode into NaNs.
I have been there. The good news is that PyTorch makes the core ideas of deep learning much more approachable than they first appear. Once you understand a handful of concepts (tensors, computation graphs, automatic differentiation, and the training loop) the rest is mostly variations on a theme.
In this guide I will walk through a practical, beginner-friendly path to using PyTorch. The focus is on getting you to a working model quickly, not on covering every feature.
1. Why PyTorch for Deep Learning
PyTorch is one of the dominant deep learning frameworks in research and production. Some reasons I recommend it to beginners:
- Pythonic and intuitive - Tensors feel like NumPy arrays with superpowers.
- Dynamic computation graph - You build the model as regular Python code, which makes debugging easier.
- Huge ecosystem - Pretrained models, tutorials, and libraries like Hugging Face Transformers.
- Production ready - PyTorch works well with deployment stacks like FastAPI and Docker.
If your long-term goal is to work with modern architectures such as transformers, RAG systems, or multi-agent systems, PyTorch is a strong foundation.
2. Setting Up Your Environment
2.1 Installing PyTorch
The safest way to install PyTorch is to use the official selector on the PyTorch website. For a basic CPU install with pip you can usually run:
pip install torch torchvision torchaudio
If you have an NVIDIA GPU, you will want a CUDA-enabled build. The PyTorch site gives you the correct command for your OS, Python version, and CUDA version.
2.2 A Minimal Project Structure
Even for quick experiments, I encourage a small but clean project structure. It pays off as your work grows, and it builds the same discipline that real ML codebases demand.
pytorch-intro/
    data/
    models/
    notebooks/
    src/
        train.py
        dataset.py
        model.py
    requirements.txt
For this article we will keep everything in a single script, but keep this structure in mind for real projects.
3. Tensors: The Core Building Block
Tensors are to PyTorch what arrays are to NumPy. They are multi-dimensional arrays with additional features like GPU support and automatic differentiation.
3.1 Creating Tensors
Open a Python shell or Jupyter notebook and try:
import torch
# Basic tensor
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
print(x)
print(x.shape)
# Random tensor
rand_x = torch.randn(3, 4)
print(rand_x)
A few common creation functions:
zeros = torch.zeros(2, 3)
ones = torch.ones(2, 3)
arange = torch.arange(0, 10, step=2) # 0, 2, 4, 6, 8
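Beyond creating tensors, you will constantly combine them. Here is a short sketch of two operations worth internalizing early: matrix multiplication and broadcasting.

```python
import torch

a = torch.arange(6, dtype=torch.float32).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
b = torch.ones(3, 1)

# Matrix multiply: (2, 3) @ (3, 1) -> (2, 1); each row collapses to its sum
c = a @ b
print(c)  # tensor([[ 3.], [12.]])

# Broadcasting: a (3,) vector is added to every row of the (2, 3) matrix
d = a + torch.tensor([10.0, 20.0, 30.0])
print(d)  # tensor([[10., 21., 32.], [13., 24., 35.]])
```

Broadcasting follows the same rules as NumPy: trailing dimensions are matched, and size-1 dimensions are stretched as needed.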
3.2 Moving Tensors to GPU
If you have a GPU and installed a CUDA version of PyTorch, you can do:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.randn(3, 3)
x = x.to(device)
print(x.device)
This pattern of defining device once and calling .to(device) appears everywhere. In production code, I almost always centralize device handling to avoid subtle bugs.
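One way to centralize it is sketched below. The helper names get_device and to_device are my own choices for illustration, not part of PyTorch itself:

```python
import torch

def get_device() -> torch.device:
    """Pick the best available device once, at startup."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")

def to_device(batch, device):
    """Move a tensor, or a tuple/list of tensors, to the target device."""
    if isinstance(batch, (list, tuple)):
        return type(batch)(to_device(t, device) for t in batch)
    return batch.to(device)

DEVICE = get_device()

# Works for a single tensor or a whole (inputs, labels) pair
inputs, labels = to_device((torch.randn(4, 2), torch.zeros(4)), DEVICE)
print(inputs.device, labels.device)
```

With this in place, the rest of your code never hardcodes "cuda" or "cpu".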
4. Automatic Differentiation with autograd
Deep learning training is, at its core, repeated gradient-based optimization. PyTorch's autograd system computes the required gradients for you.
4.1 Basic Gradient Example
import torch
# Create a tensor with gradient tracking enabled
x = torch.tensor(2.0, requires_grad=True)
# Define a simple function: y = x^2
y = x ** 2
# Compute dy/dx
y.backward()
print(x.grad) # Should be 2 * x = 4.0
Any tensor with requires_grad=True will accumulate gradients when you call .backward() on a scalar result.
This is what allows complex models such as transformers to be trained just as easily as a tiny regression model.
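The same mechanics extend to multiple parameters and composed functions. A small sketch with a gradient you can verify by hand:

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)
x = torch.tensor([3.0, 4.0])  # fixed input, no gradient needed

# loss = (w . x)^2, a scalar
loss = (w * x).sum() ** 2

loss.backward()

# Chain rule: d(loss)/dw = 2 * (w . x) * x
# w . x = 1*3 + (-2)*4 = -5, so the gradient is 2 * (-5) * [3, 4] = [-30, -40]
print(w.grad)  # tensor([-30., -40.])
```

A real loss function is just a much deeper composition of the same kind, and backward() walks the whole graph for you.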
5. Defining a Simple Neural Network
The usual way to define models in PyTorch is to subclass torch.nn.Module.
Let us build a small fully connected network for a toy classification task.
import torch
import torch.nn as nn
class SimpleNN(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x
# Example usage
model = SimpleNN(input_dim=2, hidden_dim=16, output_dim=2)
print(model)
Key points:
- __init__ defines the layers (parameters).
- forward defines how data flows through the network.
- You do not manually wire up gradients. PyTorch tracks everything.
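A useful habit is to sanity-check a new model with a parameter count and a dummy batch before training. Using the SimpleNN class defined above:

```python
import torch
import torch.nn as nn

class SimpleNN(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

model = SimpleNN(input_dim=2, hidden_dim=16, output_dim=2)

# Count trainable parameters: fc1 has 2*16 + 16 = 48, fc2 has 16*2 + 2 = 34
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(n_params)  # 82

# Push a dummy batch through to confirm the output shape
logits = model(torch.randn(8, 2))
print(logits.shape)  # torch.Size([8, 2])
```

Catching a shape mismatch here is far cheaper than catching it mid-training.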
6. A Complete Training Loop from Scratch
Let us walk through a full example: classify points in 2D space into two classes.
6.1 Create a Synthetic Dataset
We will generate a simple dataset using PyTorch itself.
import torch
from torch.utils.data import Dataset, DataLoader
class ToyDataset(Dataset):
    def __init__(self, n_samples=1000):
        super().__init__()
        # Random points in 2D
        self.x = torch.randn(n_samples, 2)
        # Labels: 1 if x0 + x1 > 0, else 0
        self.y = (self.x.sum(dim=1) > 0).long()

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]
train_dataset = ToyDataset(n_samples=5000)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
Dataset and DataLoader are the workhorses of data handling in PyTorch. The same pattern scales from toy examples to real-time inference pipelines.
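Before training, it is worth pulling a single batch from the loader to confirm shapes and dtypes. Reusing the ToyDataset from above:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    def __init__(self, n_samples=1000):
        super().__init__()
        self.x = torch.randn(n_samples, 2)
        self.y = (self.x.sum(dim=1) > 0).long()

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

loader = DataLoader(ToyDataset(n_samples=256), batch_size=64, shuffle=True)

# Grab one batch and inspect it
xb, yb = next(iter(loader))
print(xb.shape, xb.dtype)  # torch.Size([64, 2]) torch.float32
print(yb.shape, yb.dtype)  # torch.Size([64]) torch.int64
```

Note that labels come out as int64: that is exactly what CrossEntropyLoss expects for class indices.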
6.2 Initialize Model, Loss, Optimizer
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = SimpleNN(input_dim=2, hidden_dim=16, output_dim=2).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
- CrossEntropyLoss is standard for multi-class classification with logits.
- Adam is a widely used optimizer; others like SGD and AdamW are also common.
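One detail that trips up many beginners: CrossEntropyLoss expects raw logits and integer class indices, not probabilities or one-hot vectors. A quick check:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

logits = torch.tensor([[2.0, 0.0]])  # raw scores; do NOT apply softmax yourself
target = torch.tensor([0])           # class index, not a one-hot vector

loss = criterion(logits, target)

# CrossEntropyLoss applies log-softmax internally:
# loss = -log(exp(2) / (exp(2) + exp(0))) ≈ 0.1269
print(loss.item())
```

If you apply softmax yourself before the loss, training will still run but the gradients will be wrong in a quiet, hard-to-debug way.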
6.3 The Training Loop
Here is the core training loop that you will write in some form for almost any PyTorch project.
def train(model, dataloader, criterion, optimizer, device, num_epochs=10):
    model.train()  # Set model to training mode
    for epoch in range(num_epochs):
        running_loss = 0.0
        correct = 0
        total = 0
        for inputs, labels in dataloader:
            inputs = inputs.to(device)
            labels = labels.to(device)

            # 1. Zero gradients
            optimizer.zero_grad()

            # 2. Forward pass
            outputs = model(inputs)  # shape: [batch_size, num_classes]

            # 3. Compute loss
            loss = criterion(outputs, labels)

            # 4. Backward pass
            loss.backward()

            # 5. Update parameters
            optimizer.step()

            # Stats
            running_loss += loss.item() * inputs.size(0)
            _, predicted = outputs.max(1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

        epoch_loss = running_loss / total
        epoch_acc = correct / total
        print(f"Epoch {epoch+1}/{num_epochs}, Loss: {epoch_loss:.4f}, Acc: {epoch_acc:.4f}")
train(model, train_loader, criterion, optimizer, device, num_epochs=10)
This structure is nearly identical whether you are training a small MLP or a large transformer. The complexity in real-world projects usually comes from data preprocessing, evaluation metrics, and infrastructure.
7. Evaluating and Using the Model
After training, you typically switch the model to evaluation mode and disable gradient tracking.
7.1 Evaluation Mode
def evaluate(model, dataloader, device):
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, labels in dataloader:
            inputs = inputs.to(device)
            labels = labels.to(device)
            outputs = model(inputs)
            _, predicted = outputs.max(1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    acc = correct / total
    print(f"Accuracy: {acc:.4f}")
# Reuse train_dataset for a quick test
test_loader = DataLoader(train_dataset, batch_size=128, shuffle=False)
evaluate(model, test_loader, device)
The torch.no_grad() context is important: it reduces memory usage and speeds up inference because gradients are not computed.
7.2 Making Single Predictions
model.eval()
with torch.no_grad():
    sample = torch.tensor([[0.5, -0.2]], device=device)
    output = model(sample)
    probs = torch.softmax(output, dim=1)
    predicted_class = probs.argmax(dim=1).item()
    print("Predicted class:", predicted_class)
8. From Toy Models to Real Projects
Once you are comfortable with the basic workflow, extending to more realistic problems is straightforward.
8.1 Image Classification with torchvision
For images, you typically use torchvision datasets and transforms:
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),
])

train_dataset = datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=transform,
)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
You then define a CNN or reuse a pretrained model, and the training loop looks almost identical.
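As a sketch of what such a CNN might look like for 28x28 MNIST digits (the layer sizes here are illustrative choices, not the only reasonable ones):

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # 28x28 -> 28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                             # -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = x.flatten(1)  # [batch, 32*7*7]
        return self.classifier(x)

model = SmallCNN()
dummy = torch.randn(4, 1, 28, 28)  # batch of 4 fake MNIST images
print(model(dummy).shape)  # torch.Size([4, 10])
```

Swap this model into the train function from section 6 and the rest of the code is unchanged.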
8.2 NLP and Transformers
For NLP tasks like text classification or question answering, modern workflows often build on pretrained transformer models. These models are usually trained in PyTorch and exposed via frameworks like Hugging Face Transformers.
When you work on applications such as RAG systems or multimodal pipelines that combine vision and language, you will often:
- Load a pretrained transformer encoder.
- Use it to compute embeddings for texts.
- Store those in a vector database for retrieval.
The PyTorch basics you learned here, especially tensors, models, and training loops, carry over directly.
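To make the embedding-and-retrieval idea concrete without downloading a real model, here is a toy sketch where an nn.Embedding table stands in for a pretrained encoder. In practice you would load a real encoder (for example via Hugging Face Transformers) and pool its hidden states the same way; everything here except the PyTorch API itself is an illustrative assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

class ToyEncoder(nn.Module):
    """Toy stand-in for a pretrained text encoder."""

    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)

    @torch.no_grad()
    def encode(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: [batch, seq_len] -> mean-pooled, L2-normalized [batch, dim]
        vecs = self.emb(token_ids).mean(dim=1)
        return F.normalize(vecs, dim=1)

enc = ToyEncoder()
docs = torch.randint(0, 1000, (3, 8))  # 3 "documents" of 8 token ids each
query = docs[0:1]                      # reuse the first document as the query

# Normalized dot product = cosine similarity; highest score wins retrieval
scores = enc.encode(query) @ enc.encode(docs).T  # shape [1, 3]
best = scores.argmax(dim=1).item()               # index of the best match
print("Best match:", best)
```

A vector database does the same nearest-neighbor search, just at scale and with approximate indexing.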
9. Practical Tips I Wish I Knew Earlier
A few pragmatic lessons from building real systems.
9.1 Set Random Seeds
For reproducibility, especially when debugging:
import random
import numpy as np
import torch
seed = 42
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)
9.2 Use Small Batches When Prototyping
If you run into out-of-memory errors on the GPU, lower batch_size. It is often the quickest fix when exploring.
9.3 Log More Than Just Loss
Even in small projects, track metrics like accuracy or F1. For more complex systems like RAG, evaluation is much more nuanced and worth studying separately.
9.4 Think About Privacy Early
If you work with real user data, start thinking about privacy from the beginning. Techniques like differential privacy become relevant quickly once you move beyond toy datasets.
Key Takeaways
- PyTorch centers around tensors, automatic differentiation, and the nn.Module abstraction for models.
- A standard training loop consists of a forward pass, loss computation, a backward pass, and an optimizer step, repeated over batches and epochs.
- Dataset and DataLoader give you a clean way to feed data into models, both for toy datasets and real-world tasks.
- Moving tensors and models to the correct device (CPU or GPU) is essential for both correctness and performance.
- Evaluation mode with model.eval() and torch.no_grad() is critical for correct, efficient inference.
- The same PyTorch fundamentals scale from simple MLPs to transformers, RAG systems, and production-grade ML services.
- Good engineering practices around structure, reproducibility, and privacy matter as soon as you leave toy examples behind.