Why YOLOv11 Is the 2026 Default

YOLO (You Only Look Once) has been the practical standard for real-time object detection since 2015. What started as a single-pass detector that traded some accuracy for dramatic speed has become a mature family of models, each generation narrowing the gap with slower two-stage detectors while keeping inference fast enough for live video.

YOLOv11, released by Ultralytics in late 2024, made a specific trade-off that matters for production: it achieves higher mAP than YOLOv8 while using fewer parameters. The nano variant (yolo11n) hits 39.5 mAP@50-95 on COCO with just 2.6M parameters — YOLOv8n needed 3.2M parameters to reach 37.3. That difference matters when you’re running 40 model copies across a Kubernetes cluster or shipping to a microcontroller.

Beyond detection, YOLOv11 handles five tasks from a single unified API: object detection, instance segmentation, pose estimation, image classification, and oriented bounding boxes (OBB) for satellite or aerial imagery. You pick the task; the architecture adapts.
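In practice you choose the task by loading task-specific pretrained weights. Ultralytics publishes them with a suffix on the model name: yolo11n-seg.pt, yolo11n-pose.pt, yolo11n-cls.pt, yolo11n-obb.pt, and plain yolo11n.pt for detection. The helper below is a minimal sketch of that naming convention. weights_name is a hypothetical helper, not part of the Ultralytics API.

```python
# Sketch: map a task and a size letter to the YOLO11 pretrained weight filename.
# Convention assumed here: yolo11{size}{suffix}.pt, e.g. yolo11m-seg.pt
TASK_SUFFIX = {
    "detect": "",        # object detection
    "segment": "-seg",   # instance segmentation
    "pose": "-pose",     # pose / keypoint estimation
    "classify": "-cls",  # image classification
    "obb": "-obb",       # oriented bounding boxes
}
SIZES = ("n", "s", "m", "l", "x")


def weights_name(task: str, size: str = "m") -> str:
    """Return the pretrained checkpoint filename for a task/size pair."""
    if task not in TASK_SUFFIX:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(TASK_SUFFIX)}")
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {SIZES}")
    return f"yolo11{size}{TASK_SUFFIX[task]}.pt"

# Usage with Ultralytics (requires the package; downloads weights on first use):
#   from ultralytics import YOLO
#   seg_model = YOLO(weights_name("segment", "n"))  # loads yolo11n-seg.pt
```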

Choosing Your Model Size

Every YOLOv11 variant is a trade-off between speed and accuracy. Pick based on where you deploy, not on which number looks best in a benchmark table.

| Size | Model | mAP@50-95 | Params | FPS (approx.) | Target |
|---|---|---|---|---|---|
| N | yolo11n | 39.5 | 2.6M | ~190 | Edge / MCU |
| S | yolo11s | 47.0 | 9.4M | ~140 | Jetson / CPU |
| M | yolo11m | 51.5 | 20.1M | ~90 | Recommended |
| L | yolo11l | 53.4 | 25.3M | ~65 | Server GPU |
| X | yolo11x | 54.7 | 56.9M | ~35 | Max accuracy |

The medium model is the right starting point for most production projects. It clears 50 mAP@50-95, which is enough for most industrial and commercial tasks, and runs at roughly 90 frames per second on a single T4 GPU. If your use case involves tiny objects (cell counting, PCB defects, satellite imagery), go one size larger. If you're targeting a Raspberry Pi or an MCU, start with nano.
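You can turn the table into a first-pass shortlist: set a frame-rate floor, then take the most accurate variant that clears it. This sketch hard-codes the approximate figures listed above. Those figures may not match your hardware, so treat the result as a starting point for benchmarking rather than a decision. most_accurate_within is a hypothetical helper.

```python
from typing import Optional

# Approximate figures from the size table above:
# (model, mAP@50-95 on COCO, params in millions, approx. FPS)
VARIANTS = [
    ("yolo11n", 39.5, 2.6, 190),
    ("yolo11s", 47.0, 9.4, 140),
    ("yolo11m", 51.5, 20.1, 90),
    ("yolo11l", 53.4, 25.3, 65),
    ("yolo11x", 54.7, 56.9, 35),
]


def most_accurate_within(min_fps: float) -> Optional[str]:
    """Return the highest-mAP variant whose listed FPS is at least min_fps, or None."""
    candidates = [v for v in VARIANTS if v[3] >= min_fps]
    if not candidates:
        return None
    return max(candidates, key=lambda v: v[1])[0]


print(most_accurate_within(60))  # yolo11l
```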

Practical Rule

Start with yolo11m.pt and benchmark against your actual deployment hardware before choosing a different size. Most teams go too small chasing speed and then wonder why accuracy is poor, or go too large and hit latency walls in production. Profile first, optimize second.
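Profiling can be as simple as timing the real inference call on images from your deployment environment. Below is a minimal, framework-agnostic sketch. benchmark is a hypothetical helper that accepts any callable, so you can pass it a lambda that wraps an Ultralytics model call. The warmup count and the choice of p95 are assumptions. Timings are wall-clock per call, so on a GPU make sure the callable only returns once the result is actually ready.

```python
import math
import statistics
import time
from typing import Any, Callable, Dict, Sequence


def benchmark(infer: Callable[[Any], Any], inputs: Sequence[Any],
              warmup: int = 10) -> Dict[str, float]:
    """Time `infer` once per input after `warmup` untimed calls.

    Returns mean and p95 latency in milliseconds plus throughput in calls/second.
    """
    if not inputs:
        raise ValueError("need at least one input")
    for i in range(warmup):
        infer(inputs[i % len(inputs)])  # untimed: lets caches and lazy init settle
    latencies_ms = []
    for x in inputs:
        start = time.perf_counter()
        infer(x)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    latencies_ms.sort()
    p95 = latencies_ms[math.ceil(0.95 * len(latencies_ms)) - 1]  # nearest-rank percentile
    mean_ms = statistics.fmean(latencies_ms)
    return {"mean_ms": mean_ms, "p95_ms": p95,
            "fps": 1000.0 / mean_ms if mean_ms > 0 else float("inf")}

# Usage with Ultralytics (not run here):
#   model = YOLO("yolo11m.pt")
#   stats = benchmark(lambda img: model(img, verbose=False), production_images)
```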

Installation and Environment Setup

Ultralytics packages everything you need in a single pip install. You don’t need to clone a repository or manage configuration files manually.

bash terminal
# Python 3.8+ required (3.10+ recommended). CUDA 12+ recommended for GPU training.
pip install ultralytics

# Verify installation: prints Ultralytics, Python, and PyTorch versions plus GPU/CUDA status
yolo checks

# Print the yolo11m model summary (layers, parameters, GFLOPs); downloads weights on first run
python -c "from ultralytics import YOLO; YOLO('yolo11m.pt').info()"

# Optional: for the FastAPI deployment endpoint later in this guide
pip install fastapi uvicorn python-multipart

The first time you load a model, Ultralytics downloads the weights automatically from its GitHub releases. If you’re in an air-gapped environment, download yolo11m.pt manually and pass the local file path instead of the size string.

Fig. 1 — yolo11m.pt model summary output showing 339 layers, 20.1M parameters, and 68.5 GFLOPs. The model loads in under two seconds on a modern GPU.

Running Inference on Images and Video

The Ultralytics API is intentionally minimal. A single method call handles images, video files, URLs, numpy arrays, and PIL Images. You don’t need to write preprocessing code for basic inference.

Single Image Inference

python infer_image.py
from ultralytics import YOLO

model = YOLO("yolo11m.pt")  # downloads ~40MB on first run
results = model("path/to/image.jpg", conf=0.4, iou=0.45)

for r in results:
    print(f"Detected {len(r.boxes)} objects")

    # Iterate boxes
    for box in r.boxes:
        cls = r.names[int(box.cls)]
        conf = float(box.conf)
        x1, y1, x2, y2 = [int(v) for v in box.xyxy[0]]
        print(f"  {cls:15s} conf={conf:.2f} bbox=[{x1},{y1},{x2},{y2}]")

    r.save("result.jpg")  # writes annotated image
    r.show()              # opens preview window
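The two thresholds in that call do different jobs. conf drops detections below a confidence floor. iou is the overlap threshold for non-maximum suppression (NMS): when two boxes of the same class overlap by more than that intersection-over-union, the lower-scoring one is discarded. Ultralytics runs NMS for you internally. The pure-Python sketch below only illustrates the greedy, class-aware logic on xyxy boxes and is not the library's implementation.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # x1, y1, x2, y2 in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two xyxy boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], classes: List[int],
        conf: float = 0.4, iou_thresh: float = 0.45) -> List[int]:
    """Greedy class-aware NMS; returns indices of kept boxes, highest score first."""
    # 1. Drop low-confidence boxes, then visit the rest from highest score down.
    order = sorted((i for i, s in enumerate(scores) if s >= conf),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # 2. Keep box i unless it overlaps an already-kept box of the same class too much.
        if all(classes[i] != classes[k] or iou(boxes[i], boxes[k]) <= iou_thresh
               for k in keep):
            keep.append(i)
    return keep
```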

Live Webcam Inference

python webcam.py
from ultralytics import YOLO
import cv2

model = YOLO("yolo11n.pt")  # nano for real-time webcam

# source=0 opens the default webcam; stream=True makes predict() a generator
# that yields one Results object per frame instead of collecting them all
for r in model.predict(source=0, stream=True, verbose=False):
    annotated = r.plot()  # BGR frame with boxes + labels drawn
    cv2.imshow("YOLOv11", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cv2.destroyAllWindows()

Pass stream=True for any video source. Without it, Ultralytics accumulates all results in memory before returning — fine for a single image, a bottleneck for a two-hour security recording.
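The memory difference comes down to a list versus a generator. The sketch below mimics the two modes with a stand-in detector, not the Ultralytics internals. The list version holds every frame's result until it returns. The generator produces one result at a time, only when the loop asks for it, so each result can be discarded before the next frame is processed. predict_all and predict_stream are illustrative names.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

F = TypeVar("F")  # frame type
R = TypeVar("R")  # result type


def predict_all(frames: Iterable[F], detect: Callable[[F], R]) -> List[R]:
    """Non-streaming: every result stays in memory until the full list is returned."""
    return [detect(f) for f in frames]


def predict_stream(frames: Iterable[F], detect: Callable[[F], R]) -> Iterator[R]:
    """Streaming: results are computed lazily, one per iteration step."""
    for f in frames:
        yield detect(f)
```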

Training on Your Own Data

Pretrained COCO weights cover 80 classes well, but any specialized domain — medical imaging, retail shelf analysis, agricultural defect detection — needs custom training. The process has four stages.

1. Collect and Annotate Images

500–1,000 labeled images per class is enough to fine-tune meaningfully from COCO weights. Use Roboflow, Label Studio, or CVAT for annotation. Export in YOLO format (one .txt file per image, each line: class cx cy w h in normalized coordinates).

2. Organize the Dataset Directory

Ultralytics expects a specific folder structure. Images and labels live in parallel directories, split into train/, val/, and optionally test/. A YAML file ties it together.

3. Write the data.yaml Config

This file tells the trainer where your data lives and what your classes are named. Get this file right before touching any Python code — a misconfigured YAML is the most common first-time training error.

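4. Run Training and Monitor

Call model.train() with your config. Ultralytics validates after every epoch, keeps last.pt (most recent epoch) and best.pt (best validation score) in the run's weights/ folder, and re-validates best.pt when training finishes. Results, confusion matrices, and PR curves go to the run directory: runs/detect/train by default, or runs/train/my_detector with the project and name settings in the training script below.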
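If your annotation tool exports pixel-space corner boxes, converting them to YOLO label lines takes only a little arithmetic. Compute the box center and size, then divide the x values by the image width and the y values by the image height. The sketch below shows both directions. The function names are illustrative.

```python
from typing import Tuple


def xyxy_to_yolo_line(cls_id: int, box: Tuple[float, float, float, float],
                      img_w: int, img_h: int) -> str:
    """Convert a pixel (x1, y1, x2, y2) box to a 'class cx cy w h' line normalized to 0-1."""
    x1, y1, x2, y2 = box
    if not (0 <= x1 < x2 <= img_w and 0 <= y1 < y2 <= img_h):
        raise ValueError(f"box {box} is empty or outside a {img_w}x{img_h} image")
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{cls_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def yolo_line_to_xyxy(line: str, img_w: int, img_h: int):
    """Inverse conversion: parse a label line back to (class_id, pixel xyxy box)."""
    cls_str, cx_s, cy_s, w_s, h_s = line.split()
    cx, w = float(cx_s) * img_w, float(w_s) * img_w
    cy, h = float(cy_s) * img_h, float(h_s) * img_h
    return int(cls_str), (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


print(xyxy_to_yolo_line(0, (100, 50, 300, 250), img_w=400, img_h=300))
# 0 0.500000 0.500000 0.500000 0.666667
```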

Dataset Directory Structure

text folder structure
my_dataset/
├── images/
│   ├── train/    # your training images (.jpg / .png)
│   ├── val/      # validation images (~15% of total)
│   └── test/     # held-out test images (optional)
└── labels/
    ├── train/    # one .txt per image, same filename
    ├── val/
    └── test/
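Filling this layout by hand gets tedious. The sketch below assumes you start from one flat folder that holds the images and their same-named .txt labels. It copies them into the train/val/test structure using a seeded random split. The ~15% validation share comes from the tree above. The 5% test share, the flat source layout, and the split_dataset name are assumptions.

```python
import random
import shutil
from pathlib import Path
from typing import Dict

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def split_dataset(src: Path, dst: Path, val_frac: float = 0.15,
                  test_frac: float = 0.05, seed: int = 0) -> Dict[str, int]:
    """Copy images (and same-named .txt labels, if present) from a flat src folder
    into dst/images/{train,val,test} and dst/labels/{train,val,test}."""
    images = sorted(p for p in src.iterdir() if p.suffix.lower() in IMAGE_EXTS)
    random.Random(seed).shuffle(images)  # deterministic shuffle for reproducible splits
    n_val = round(len(images) * val_frac)
    n_test = round(len(images) * test_frac)
    splits = {
        "val": images[:n_val],
        "test": images[n_val:n_val + n_test],
        "train": images[n_val + n_test:],
    }
    for split, files in splits.items():
        (dst / "images" / split).mkdir(parents=True, exist_ok=True)
        (dst / "labels" / split).mkdir(parents=True, exist_ok=True)
        for img in files:
            shutil.copy2(img, dst / "images" / split / img.name)
            label = img.with_suffix(".txt")
            if label.exists():  # images without a label file are treated as background
                shutil.copy2(label, dst / "labels" / split / label.name)
    return {split: len(files) for split, files in splits.items()}
```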

data.yaml Config File

yaml data.yaml
# Absolute path to your dataset root
path: /home/user/my_dataset
train: images/train
val: images/val
test: images/test  # optional

# Number of classes and their names (order matters: IDs must match the label files)
nc: 3
names:
  0: car
  1: person
  2: bicycle
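Class IDs in the label files index into names, so a quick consistency check before training catches most misconfigurations. Every ID must exist in names, and every coordinate must be normalized. The sketch below assumes detection labels with five fields per line and a names mapping you have already loaded, for example with PyYAML (not shown). check_labels is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict, List


def check_labels(label_dir: Path, names: Dict[int, str]) -> List[str]:
    """Return human-readable problems found in YOLO-format detection label files."""
    problems = []
    for txt in sorted(label_dir.glob("*.txt")):
        for lineno, line in enumerate(txt.read_text().splitlines(), start=1):
            parts = line.split()
            if not parts:
                continue  # blank lines are harmless
            if len(parts) != 5:
                problems.append(f"{txt.name}:{lineno}: expected 5 fields, got {len(parts)}")
                continue
            try:
                cls_id = int(parts[0])
                coords = [float(v) for v in parts[1:]]
            except ValueError:
                problems.append(f"{txt.name}:{lineno}: non-numeric value")
                continue
            if cls_id not in names:
                problems.append(f"{txt.name}:{lineno}: class {cls_id} not in names")
            if any(not 0.0 <= v <= 1.0 for v in coords):
                problems.append(f"{txt.name}:{lineno}: coordinates not normalized to 0-1")
    return problems
```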

Training Script

python train.py
from ultralytics import YOLO

# Start from COCO pretrained weights for faster convergence
model = YOLO("yolo11m.pt")

results = model.train(
    data="data.yaml",
    epochs=100,
    imgsz=640,            # input resolution; 640 is standard
    batch=16,             # reduce to 8 if GPU OOM
    device=0,             # GPU index; use "cpu" for CPU-only machines
    patience=20,          # early stopping: stop if no improvement for 20 epochs
    lr0=0.01,             # initial learning rate
    lrf=0.01,             # final lr = lr0 * lrf
    weight_decay=0.0005,
    # Mosaic, flips, and HSV jitter are on by default during training; tune them with
    # mosaic=, fliplr=, hsv_h= and similar (augment=True is a predict-time TTA flag)
    project="runs/train",
    name="my_detector",
    exist_ok=True,
)

print(f"Best mAP@50: {results.results_dict['metrics/mAP50(B)']:.4f}")
print(f"Best weights saved to: {model.trainer.save_dir}/weights/best.pt")
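Together, lr0 and lrf define the decay schedule. This sketch assumes the linear schedule Ultralytics uses when cos_lr=False: the learning rate ramps from lr0 at the first epoch down to lr0 * lrf at the last. It ignores the short warmup phase at the start of training. linear_lr is an illustrative helper, not an Ultralytics function.

```python
def linear_lr(epoch: int, epochs: int, lr0: float = 0.01, lrf: float = 0.01) -> float:
    """Learning rate at `epoch` for a linear decay from lr0 to lr0 * lrf (warmup ignored)."""
    frac_remaining = max(1 - epoch / epochs, 0.0)  # 1.0 at the start, 0.0 at the end
    return lr0 * (frac_remaining * (1.0 - lrf) + lrf)


for e in (0, 50, 100):
    print(e, f"{linear_lr(e, 100):.6f}")
# 0 0.010000
# 50 0.005050
# 100 0.000100
```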

Training 100 epochs on a T4 GPU with a 3,000-image dataset takes roughly 45–75 minutes. Ultralytics saves best.pt (highest validation mAP) and last.pt (final epoch) automatically. Always deploy best.pt, not last.pt.

“The difference between a model that reaches 85% mAP and one that stalls at 60% is almost always in the data, not the architecture. More diverse images, better labels, and correct augmentation configuration outperform any hyperparameter change.”
— Consistent finding across dozens of custom YOLOv11 fine-tuning projects

Validating Your Trained Model

Never trust training loss curves alone. Run a proper validation pass against your held-out test set and read each metric carefully before calling a model production-ready.

python validate.py
from ultralytics import YOLO

model = YOLO("runs/train/my_detector/weights/best.pt")
metrics = model.val(
    data="data.yaml",
    split="test",   # use the held-out test split
    iou=0.45,       # NMS IoU, matching the inference setting
    verbose=True,
    # conf stays at its low default: a high threshold such as 0.4 cuts off the
    # low-confidence end of the precision-recall curve and understates mAP
)

print(f"mAP@50:    {metrics.box.map50:.4f}")
print(f"mAP@50-95: {metrics.box.map:.4f}")
print(f"Precision: {metrics.box.mp:.4f}")
print(f"Recall:    {metrics.box.mr:.4f}")

# Per-class breakdown (box.maps holds per-class mAP@50-95)
for i, name in model.names.items():
    print(f"  {name}: mAP50-95={metrics.box.maps[i]:.4f}")

What to look for in the metrics:

mAP@50 (target ≥ 0.80): Mean Average Precision at 50% IoU overlap. The headline metric. Below 0.70 for a production use case usually means more training data is needed.

mAP@50-95 (target ≥ 0.55): Averaged across IoU thresholds 0.50–0.95. Penalizes loose bounding boxes. Important when precise localization matters (robotics, medical).

Precision (target ≥ 0.85): Of all predicted boxes, how many were correct. Low precision means false alarms: the model fires on things that aren't there.

Recall (target ≥ 0.80): Of all real objects, how many were found. Low recall means missed detections: objects that were there but the model didn't report.
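Precision and recall are simple ratios over matched detections. mAP@50-95 averages AP computed at ten IoU thresholds, 0.50 to 0.95 in steps of 0.05, and then across classes. The sketch below covers only that arithmetic. Computing AP itself from a precision-recall curve is left to the validator. The function names are illustrative.

```python
from typing import Sequence


def precision(tp: int, fp: int) -> float:
    """Share of predicted boxes that matched a real object."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of real objects that were detected."""
    return tp / (tp + fn) if tp + fn else 0.0


# The ten COCO-style IoU thresholds: 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def map50_95(ap_at_threshold: Sequence[float]) -> float:
    """Mean of per-threshold AP values for one class (one value per IoU threshold)."""
    if len(ap_at_threshold) != len(IOU_THRESHOLDS):
        raise ValueError("expected one AP value per IoU threshold")
    return sum(ap_at_threshold) / len(ap_at_threshold)
```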

Precision and recall sit in tension: lowering your confidence threshold finds more objects (better recall) but also more false positives (worse precision). Set the threshold for your use case — a security system tolerates false alarms better than a medical device does.
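A small sweep makes the tension concrete. The sketch assumes detections that have already been matched to ground truth as (confidence, is_true_positive) pairs. The five detections and four ground-truth objects are made-up illustration data. Lowering the threshold raises recall and drags precision down.

```python
from typing import List, Tuple


def pr_at_threshold(dets: List[Tuple[float, bool]], num_gt: int,
                    conf: float) -> Tuple[float, float]:
    """Precision and recall when keeping detections with confidence >= conf.

    dets: (confidence, is_true_positive) pairs, already matched to ground truth.
    num_gt: total number of ground-truth objects. Precision is 0 if nothing is kept.
    """
    kept = [tp for score, tp in dets if score >= conf]
    tp = sum(kept)  # True counts as 1
    precision = tp / len(kept) if kept else 0.0
    recall = tp / num_gt if num_gt else 0.0
    return precision, recall


# Illustrative data: five matched detections, four real objects in total
dets = [(0.9, True), (0.8, True), (0.6, False), (0.5, True), (0.3, False)]
for t in (0.7, 0.4, 0.2):
    p, r = pr_at_threshold(dets, num_gt=4, conf=t)
    print(f"conf>={t}: precision={p:.2f} recall={r:.2f}")
# conf>=0.7: precision=1.00 recall=0.50
# conf>=0.4: precision=0.75 recall=0.75
# conf>=0.2: precision=0.60 recall=0.75
```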

Exporting Your Model for Deployment

The .pt PyTorch weights file requires PyTorch at runtime. For production deployments, export to a format that removes that dependency and runs faster in inference-only mode.

python export.py
from ultralytics import YOLO

model = YOLO("runs/train/my_detector/weights/best.pt")

# ONNX: runs anywhere (servers, edge, mobile, browsers) via onnxruntime
model.export(
    format="onnx",
    imgsz=640,
    dynamic=True,    # dynamic input shapes (e.g. variable batch size) at runtime
    simplify=True,   # graph simplification for a smaller file
    opset=17,
)
# Creates: runs/train/my_detector/weights/best.onnx

# TensorRT: NVIDIA GPUs only, highest throughput, FP16 precision
model.export(
    format="engine",
    imgsz=640,
    half=True,       # FP16 halves memory, adds ~10–20% speed
    workspace=4,     # GB of GPU workspace for TensorRT optimization
)
# Creates: runs/train/my_detector/weights/best.engine

# CoreML: Apple Silicon and iOS deployment
model.export(format="coreml", imgsz=640, nms=True)

# TFLite: Android and microcontrollers; INT8 calibration uses images from the dataset
model.export(format="tflite", imgsz=640, int8=True, data="data.yaml")
Fig. 2 — Speed comparison of yolo11m across export formats on an NVIDIA T4 GPU: PyTorch at 90 FPS, ONNX at 115 FPS, and TensorRT FP16 at 210 FPS. TensorRT FP16 delivers 2.3× the throughput of native PyTorch with negligible accuracy loss under normal conditions.
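Dropping PyTorch also drops the Ultralytics preprocessing, so an ONNX or TensorRT deployment has to letterbox inputs itself and map the predicted boxes back. Letterboxing means scaling the image to fit the 640-pixel square and padding the rest. The sketch below computes the scale and padding for a centered letterbox and then undoes them on a box. It assumes a square imgsz. It leaves the actual pixel resizing to an image library, and the built-in LetterBox may split odd padding differently by a pixel. Both function names are illustrative.

```python
from typing import Tuple

BoxT = Tuple[float, float, float, float]


def letterbox_params(w: int, h: int, size: int = 640) -> Tuple[float, int, int]:
    """Scale factor and (left, top) padding that fit a w x h image into size x size."""
    scale = min(size / w, size / h)
    new_w, new_h = round(w * scale), round(h * scale)
    pad_left = (size - new_w) // 2
    pad_top = (size - new_h) // 2
    return scale, pad_left, pad_top


def unletterbox_box(box: BoxT, scale: float, pad_left: int, pad_top: int) -> BoxT:
    """Map an xyxy box from letterboxed model space back to original-image pixels."""
    x1, y1, x2, y2 = box
    return ((x1 - pad_left) / scale, (y1 - pad_top) / scale,
            (x2 - pad_left) / scale, (y2 - pad_top) / scale)


scale, left, top = letterbox_params(1280, 720)          # 0.5, 0, 140
print(unletterbox_box((100, 240, 300, 340), scale, left, top))
# (200.0, 200.0, 600.0, 400.0)
```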

Deploying a Detection API with FastAPI

For most web and microservice deployments, wrapping your model in a FastAPI endpoint is the fastest path to production. The endpoint accepts an uploaded image file and returns JSON detections.

python api.py
from io import BytesIO

from fastapi import FastAPI, File, HTTPException, Query, UploadFile
from fastapi.responses import JSONResponse
from PIL import Image
from ultralytics import YOLO

app = FastAPI(title="YOLOv11 Detection API", version="1.0")
model = YOLO("best.pt")  # loaded once at startup, reused per request


@app.get("/")
async def health():
    return {"status": "ok", "model": "yolo11m"}


@app.post("/detect")
async def detect(
    file: UploadFile = File(...),
    conf: float = Query(0.4, ge=0.1, le=1.0),
    iou: float = Query(0.45, ge=0.1, le=1.0),
):
    img_bytes = await file.read()
    try:
        img = Image.open(BytesIO(img_bytes)).convert("RGB")
    except OSError:  # covers unreadable and truncated images
        raise HTTPException(status_code=400, detail="Uploaded file is not a readable image")

    # Pass the PIL image directly: Ultralytics treats numpy arrays as BGR (OpenCV order),
    # so np.array() of an RGB PIL image would swap the red and blue channels
    results = model(img, conf=conf, iou=iou, verbose=False)

    detections = []
    for r in results:
        for box in r.boxes:
            detections.append({
                "class": r.names[int(box.cls)],
                "confidence": round(float(box.conf), 4),
                "bbox": [int(v) for v in box.xyxy[0].tolist()],
            })

    return JSONResponse({
        "detections": detections,
        "count": len(detections),
        "image_size": [img.width, img.height],
    })
bash run the server
# Start the API server on port 8000
uvicorn api:app --host 0.0.0.0 --port 8000

# Test with curl (conf is passed as a query parameter)
curl -X POST "http://localhost:8000/detect?conf=0.4" \
  -F "file=@test_image.jpg"

One important production detail: the YOLO("best.pt") call happens at module load time, not inside the endpoint function. This means the model initializes once when the server starts and is reused for every request. If you load the model inside the endpoint, every request pays a 2–5 second initialization penalty — an easy mistake that kills throughput.
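If you would rather not load weights at import time (to keep imports fast in tests, for example), a cached loader gives the same load-once behavior lazily. The sketch wraps a stand-in loader in functools.lru_cache. get_model and the call counter are illustrative. In api.py, the function body would return YOLO("best.pt") instead of a placeholder object.

```python
import functools

LOAD_CALLS = {"count": 0}  # demo instrumentation only


@functools.lru_cache(maxsize=1)
def get_model():
    """Build the model on the first call; every later call returns the same object."""
    LOAD_CALLS["count"] += 1
    # In api.py this would be: from ultralytics import YOLO; return YOLO("best.pt")
    return object()  # placeholder standing in for the loaded model


def handle_request(payload: bytes):
    model = get_model()  # cached after the first request
    # ... run inference on payload with model ...
    return model
```

Because lru_cache does not lock around the first call, two threads that arrive at the same moment could both trigger a load. Calling get_model() once at startup avoids that.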

Deployment Options by Use Case

REST API: FastAPI + Uvicorn

The setup above. Containerize with Docker and deploy to any cloud. Best for 10–500 requests per second with a GPU-backed server.

Edge: ONNX Runtime

Export to ONNX and run it with onnxruntime on devices such as a Raspberry Pi, a Jetson Nano, or an industrial PLC; no discrete GPU is required. The nano model reaches real-time speeds on a Jetson Orin.

Cloud GPU: TensorRT Engine

Export to TensorRT for NVIDIA server GPUs. FP16 mode nearly doubles throughput over ONNX with no meaningful accuracy loss. Requires CUDA-capable hardware at inference time.

Mobile: CoreML / TFLite

CoreML for iOS and Apple Silicon. TFLite with INT8 quantization for Android. The nano model runs at 25–40 FPS on a mid-range smartphone after quantization.

YOLOv11 vs Previous Versions

If you’re migrating from an older YOLO version, here’s where YOLOv11 stands relative to its predecessors and a transformer-based competitor:

| Model | mAP@50-95 (COCO) | Params | FPS (T4 GPU) | Best fit |
|---|---|---|---|---|
| YOLOv5n | 28.0 | 1.9M | ~230 | Legacy edge systems |
| YOLOv8n | 37.3 | 3.2M | ~180 | Previous edge standard |
| YOLOv11n | 39.5 | 2.6M | ~190 | Current edge standard |
| YOLOv8s | 44.9 | 11.2M | ~130 | Balanced (older) |
| YOLOv11s | 47.0 | 9.4M | ~140 | Balanced (current) |
| YOLOv11m | 51.5 | 20.1M | ~90 | Production default |
| YOLOv11l | 53.4 | 25.3M | ~65 | High-accuracy server |
| YOLOv11x | 54.7 | 56.9M | ~35 | Maximum accuracy |
| RT-DETR-L | 53.0 | 32.0M | ~55 | Transformer alternative |

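The pattern is consistent: every YOLOv11 variant achieves higher mAP than its YOLOv8 counterpart while using fewer parameters. If you're running YOLOv8 in production today, switching to the equivalent YOLOv11 size is close to a free accuracy upgrade. The Python API and export formats stay the same. You do need an ultralytics release that includes YOLO11 (8.3.0 or later), and custom models need a retraining run that starts from the YOLO11 weights.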
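The n and s rows of the table give the size of that upgrade directly. The sketch below does the arithmetic using only the figures from the table. upgrade_delta is an illustrative helper.

```python
from typing import Tuple


def upgrade_delta(old: Tuple[float, float], new: Tuple[float, float]) -> Tuple[float, float]:
    """(mAP gain in points, % parameter reduction) between two (mAP, params_M) pairs."""
    (map_old, params_old), (map_new, params_new) = old, new
    return map_new - map_old, (params_old - params_new) / params_old * 100


# (mAP@50-95, params in millions) from the comparison table above
for size, v8, v11 in [("n", (37.3, 3.2), (39.5, 2.6)), ("s", (44.9, 11.2), (47.0, 9.4))]:
    gain, cut = upgrade_delta(v8, v11)
    print(f"{size}: +{gain:.1f} mAP, {cut:.1f}% fewer parameters")
# n: +2.2 mAP, 18.8% fewer parameters
# s: +2.1 mAP, 16.1% fewer parameters
```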

Common Training Problems and Fixes

Three issues trip up most first-time YOLOv11 users:

  • mAP plateaus early and stops improving: Check your val split for label noise first. A contaminated validation set makes the model look like it stopped learning when it’s actually still improving against your test set. Also try reducing lr0 to 0.001 for small datasets under 1,000 images.
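  • CUDA out of memory during training: Halve your batch size before reducing image size. Batch 8 at 640px trains better than batch 16 at 416px. You can also set batch=-1 to let Ultralytics' AutoBatch pick a size that fits your GPU memory. Mixed-precision (FP16) training is already on by default through amp=True, so check that it hasn't been disabled; half=True is an inference and export option, not a training one.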
  • Lots of false positives at low confidence: Your negative examples (images with no labeled objects) may be missing from the dataset. YOLO learns from what’s absent as much as what’s present. Include dedicated background images — roughly one negative for every four positive images is a reasonable ratio.
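To see where your dataset stands against that rough 1:4 ratio, count label files. In the YOLO label format, a background image has an empty label file or none at all. The sketch assumes the images/ and labels/ layout from the dataset section. background_stats is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def background_stats(images_dir: Path, labels_dir: Path,
                     target_ratio: float = 0.25) -> Dict[str, int]:
    """Count positive vs background images and how many backgrounds to add."""
    positives = negatives = 0
    for img in images_dir.iterdir():
        if img.suffix.lower() not in IMAGE_EXTS:
            continue  # skip non-image files
        label = labels_dir / (img.stem + ".txt")
        has_objects = label.exists() and label.read_text().strip() != ""
        if has_objects:
            positives += 1
        else:
            negatives += 1
    wanted = round(positives * target_ratio)  # one background per four positives
    return {"positives": positives, "backgrounds": negatives,
            "backgrounds_to_add": max(0, wanted - negatives)}
```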
Before You Deploy

Run your best.pt model on at least 50 images from production conditions — not from your curated test split. Real deployment images often have different lighting, compression artifacts, partial occlusions, or aspect ratios that your training data didn’t cover. Catching distribution shift before deployment is an afternoon of work. Catching it after a week of bad predictions in production is considerably more painful.

What YOLOv11 Cannot Do

No model is universal. YOLOv11 struggles in three specific scenarios you should know before committing to it for your use case.

Dense small-object scenes, such as satellite imagery with hundreds of tiny vehicles or microscopy slides with thousands of cells, require specialized approaches. YOLOv11 works best on objects that occupy at least 1–2% of the image area. Below that, you need a tiling strategy such as SAHI (Slicing Aided Hyper Inference) or a specialized architecture.
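The tiling idea is straightforward to sketch. Cut the image into overlapping windows and run the detector on each one. Then shift each window's boxes back by the window's offset and run a final NMS over the merged set. The helper below computes only the window grid and the coordinate shift. The 640-pixel tile and 20% overlap are assumptions, and this is not SAHI's API.

```python
from typing import List, Tuple

Window = Tuple[int, int, int, int]  # x1, y1, x2, y2 in full-image pixels


def tile_windows(img_w: int, img_h: int, tile: int = 640,
                 overlap: float = 0.2) -> List[Window]:
    """Overlapping windows that together cover the whole image."""
    step = max(1, int(tile * (1 - overlap)))

    def starts(length: int) -> List[int]:
        if length <= tile:
            return [0]
        s = list(range(0, length - tile, step))
        s.append(length - tile)  # last window sits flush with the far edge
        return s

    return [(x, y, min(x + tile, img_w), min(y + tile, img_h))
            for y in starts(img_h) for x in starts(img_w)]


def to_full_image(box: Tuple[float, float, float, float], window: Window):
    """Shift a box detected inside `window` back to full-image coordinates."""
    x0, y0 = window[0], window[1]
    return (box[0] + x0, box[1] + y0, box[2] + x0, box[3] + y0)
```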

Class imbalance above 50:1 is problematic. If you have 10,000 images of class A and 200 of class B, the model will confidently detect A and poorly detect B regardless of what hyperparameters you set. Solve this with data — collect more examples of minority classes or use augmentation pipelines that oversample them.
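Measuring the imbalance comes first. The sketch below counts labeled instances per class across the label files and reports the majority-to-minority ratio. It also derives a capped repeat factor per class, meaning how many times you might repeat images of a rare class. The repeat-factor rule and the cap of 10 are illustrative assumptions, not an Ultralytics setting. Repeated images help most when combined with augmentation, and collecting more minority-class data remains the better fix.

```python
from collections import Counter
from pathlib import Path
from typing import Dict


def class_counts(label_dir: Path) -> Counter:
    """Number of labeled instances per class ID across YOLO .txt label files."""
    counts: Counter = Counter()
    for txt in label_dir.glob("*.txt"):
        for line in txt.read_text().splitlines():
            if line.strip():
                counts[int(line.split()[0])] += 1
    return counts


def imbalance_ratio(counts: Counter) -> float:
    """Most frequent class count divided by least frequent class count."""
    return max(counts.values()) / min(counts.values())


def repeat_factors(counts: Counter, cap: int = 10) -> Dict[int, int]:
    """Illustrative oversampling factor per class, capped to limit overfitting."""
    top = max(counts.values())
    return {c: min(cap, max(1, round(top / n))) for c, n in counts.items()}
```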

Finally, YOLO is a detection model, not a recognition model. It tells you there’s a car; it cannot tell you which specific car it is. For identity tasks (face recognition, vehicle re-identification, product SKU lookup), you need a separate embedding or classification head on top of the detector’s output.
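The two-stage pattern works like this. Crop each detected box, embed the crop with a separate model, and compare the result against a gallery of known identities. The sketch covers only the last step, nearest-neighbor matching by cosine similarity. The embeddings, the gallery, and the 0.7 threshold are hypothetical. Producing real embeddings requires a re-identification or face model that YOLO does not provide.

```python
import math
from typing import Dict, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def identify(embedding: Sequence[float], gallery: Dict[str, Sequence[float]],
             threshold: float = 0.7) -> Optional[str]:
    """Return the gallery ID most similar to `embedding`, or None if none reaches threshold."""
    best_id, best_sim = None, threshold
    for ident, ref in gallery.items():
        sim = cosine(embedding, ref)
        if sim >= best_sim:
            best_id, best_sim = ident, sim
    return best_id
```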

Build the Full Pipeline Around Your Detector

YOLOv11 is one component. See how to connect it to OCR, batch processors, and production APIs in our Computer Vision Pipelines guide — built with Claude Code.