Pydantic 101: Data Validation and Parsing Made Easy
python
pydantic
data-validation
type-hints
parsing
settings-management

Learn the basics of Pydantic, a powerful Python library for data validation, parsing, and settings management. Covers models, types, validation, and more.

March 22, 2025
3 minutes

Pydantic 101: Master Data Validation in Python ✨

In the world of Python development, dealing with data from various sources is a common task. This data might come from user input, API calls, configuration files, or databases. Ensuring this data is in the correct format and meets specific requirements is crucial. Enter Pydantic! ✅

Introduction: The Problem with Untyped Data

Traditional Python is dynamically typed, meaning you don't have to explicitly declare variable types. While this offers flexibility, it can lead to runtime errors if data doesn't match expected types. Consider this:

    def process_user_data(name, age, email):
        # ... do something with the data ...
        print(f"User: {name}, Age: {age}, Email: {email}")


    # This *might* work, but what if:
    process_user_data(123, "John", "invalid-email")  # Numbers and strings swapped, invalid email format

Without validation, the process_user_data function might crash, produce incorrect results, or even create security vulnerabilities.

What is Pydantic?

Pydantic is a Python library that provides data validation and parsing using Python type hints. It builds on the type hint syntax standardized in Python 3.5 (PEP 484) to define data structures, called models, and automatically enforces type constraints and validations at runtime.

At its core, Pydantic:

  1. Defines Data Schemas: You create models that describe the expected structure and types of your data.
  2. Validates Data: Pydantic automatically validates incoming data against these models.
  3. Parses Data: It converts input data (like dictionaries or JSON) into instances of your models, with type coercion where possible.
  4. Provides Clear Error Messages: If validation fails, Pydantic provides helpful, human-readable error messages indicating what went wrong.
  5. Manages Settings: Pydantic is also excellent for managing application settings, including environment variables.
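
The parsing and coercion step in the list above can be sketched as follows. This is a minimal illustration that assumes Pydantic v2 (where model_validate_json is available); the Reading model and its field names are hypothetical and not part of the original example.

```python
from pydantic import BaseModel


class Reading(BaseModel):  # hypothetical model, for illustration only
    sensor: str
    value: float
    count: int


# The numbers arrive as JSON strings; in Pydantic's default (lax) mode
# they are coerced to the declared field types.
raw_json = '{"sensor": "thermo-1", "value": "21.5", "count": "3"}'
reading = Reading.model_validate_json(raw_json)

print(type(reading.value).__name__, reading.value)
print(type(reading.count).__name__, reading.count)
```

If a value cannot be coerced (for example "count": "three"), the same call raises a ValidationError instead of returning a model.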

Basic Usage: Defining a Pydantic Model

Let's create a Pydantic model for our user data:

    # EmailStr needs the optional email-validator package: pip install "pydantic[email]"
    from pydantic import BaseModel, EmailStr, PositiveInt, ValidationError


    class User(BaseModel):
        name: str
        age: PositiveInt  # Ensure age is a positive integer
        email: EmailStr  # Use Pydantic's EmailStr for email validation


    # Valid data
    user_data = {
        "name": "Alice",
        "age": 30,
        "email": "alice@example.com",
    }

    user = User(**user_data)  # Create a User instance
    print(user)
    print(user.name)
    print(user.age)
    print(user.email)

    # Invalid data
    invalid_data = {
        "name": 123,  # Wrong type
        "age": -5,  # Not positive
        "email": "invalid",  # Not a valid email
    }

    try:
        invalid_user = User(**invalid_data)
    except ValidationError as e:
        print(e)  # Print detailed validation errors

Explanation:

  • class User(BaseModel):: We define a Pydantic model by inheriting from BaseModel.
  • name: str, age: PositiveInt, email: EmailStr: We use type hints to specify the expected data types. PositiveInt and EmailStr are special Pydantic types that provide built-in validation.
  • User(**user_data): We create an instance of the User model by passing a dictionary. Pydantic automatically validates the data.
  • ValidationError: If the data is invalid, Pydantic raises a ValidationError exception. It collects every failure rather than stopping at the first one, so print(e) in the except block lists each failing field together with the reason it was rejected.
  • Accessing Fields: After successful instantiation, accessing fields behaves like regular class attributes (user.name, user.age, etc.).
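
Beyond printing the exception, the errors can be inspected programmatically. The sketch below uses ValidationError.errors(), which returns one dictionary per failure with keys such as "loc" and "msg". The Signup model is a hypothetical stand-in that avoids EmailStr so no extra package is needed.

```python
from pydantic import BaseModel, PositiveInt, ValidationError


class Signup(BaseModel):  # hypothetical model, for illustration only
    username: str
    age: PositiveInt


problems = []
try:
    # username is missing and age is not positive: two separate failures
    Signup(age=-5)
except ValidationError as exc:
    # Each entry describes one failure: where it happened and why
    problems = [(err["loc"], err["msg"]) for err in exc.errors()]

for loc, msg in problems:
    print(".".join(str(part) for part in loc), "->", msg)
```

This structured form is handy when the errors need to be returned to a client or logged per field rather than shown as one block of text.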

Common Pydantic Field Types and Options

Pydantic supports a wide range of built-in field types and options:

  • Basic Types: str, int, float, bool, datetime, date, time, list, dict, set, etc.
  • Pydantic-Specific Types:
    • EmailStr: Validates email addresses.
    • HttpUrl: Validates URLs.
    • PositiveInt, NegativeInt, PositiveFloat, NegativeFloat: For numeric constraints.
    • SecretStr, SecretBytes: For sensitive data (hides values in representations).
    • FilePath, DirectoryPath: Validate that the path exists and is a file or a directory, respectively.
  • Field Customization:
    • default: Provides a default value if a field is missing.
    • alias: Allows you to use a different name for the field in the input data (useful for mapping keys from external sources).
    • gt, ge, lt, le: Greater than, greater than or equal to, less than, less than or equal to (for numeric fields).
    • min_length, max_length: For strings.

    from datetime import datetime
    from typing import Optional

    from pydantic import BaseModel, Field, HttpUrl


    class Product(BaseModel):
        name: str = Field(..., min_length=3, max_length=50)  # Required, string length constraints
        description: Optional[str] = None  # Optional field, defaults to None
        price: float = Field(..., gt=0)  # Required, must be greater than 0
        created_at: datetime = Field(default_factory=datetime.now)  # Defaults to current time
        website: HttpUrl = Field(alias="product_url")  # Use "product_url" as the input key, but store as "website"


    product_data = {
        "name": "Awesome Widget",
        "price": 19.99,
        "product_url": "https://example.com/widget",
    }

    product = Product(**product_data)
    print(product)
    print(product.created_at)
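
The SecretStr type from the list above deserves a quick illustration, since its masking behavior is easy to miss. This is a small sketch; the Credentials model and the sample password are made up for the example.

```python
from pydantic import BaseModel, SecretStr


class Credentials(BaseModel):  # hypothetical model, for illustration only
    username: str
    password: SecretStr


creds = Credentials(username="alice", password="hunter2")

# Printing the model (or the field) shows a masked placeholder,
# which keeps the secret out of logs and tracebacks.
print(creds)
print(creds.password)

# The real value is only available through an explicit call.
plain = creds.password.get_secret_value()
```

The explicit get_secret_value() call makes every place where the secret is actually read easy to find in a code review.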

Nested Models

You can nest Pydantic models to represent complex data structures:

    from typing import List

    from pydantic import BaseModel


    class Address(BaseModel):
        street: str
        city: str
        zip_code: str


    class UserWithAddress(BaseModel):
        name: str
        age: int
        address: Address  # Nested Address model
        friends: List['UserWithAddress'] = []  # List of self-referencing models


    user_data = {
        "name": "Bob",
        "age": 40,
        "address": {
            "street": "123 Main St",
            "city": "Anytown",
            "zip_code": "12345",
        },
        "friends": []
    }

    user = UserWithAddress(**user_data)
    print(user)
    print(user.address.city)  # Access nested fields
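
Nesting also affects serialization and error reporting. The sketch below assumes Pydantic v2 (model_dump); the Person model and its previous_addresses field are hypothetical names chosen for the illustration.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class Address(BaseModel):
    street: str
    city: str
    zip_code: str


class Person(BaseModel):  # hypothetical model, for illustration only
    name: str
    address: Address
    previous_addresses: List[Address] = []


person = Person(
    name="Bob",
    address={"street": "123 Main St", "city": "Anytown", "zip_code": "12345"},
)

# Nested models are converted back to plain nested dictionaries
as_dict = person.model_dump()

# A failure inside a nested model is reported with its full path
error_locations = []
try:
    Person(name="Bob", address={"street": "123 Main St", "city": "Anytown"})
except ValidationError as exc:
    error_locations = [err["loc"] for err in exc.errors()]

print(as_dict["address"]["city"])
print(error_locations)
```

The "loc" tuple names the outer field first and the inner field second, which makes it straightforward to point a user at the exact nested value that was rejected.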

Handling Optional and Default Values

Pydantic makes it easy to deal with optional fields and default values:

    from typing import Optional

    from pydantic import BaseModel


    class Item(BaseModel):
        name: str
        description: Optional[str] = None  # Optional field, can be None
        price: float = 0.0  # Has a default value


    # Missing 'description' is okay
    item1 = Item(name="Book")
    print(item1)  # description will be None

    # Providing a value overrides the default
    item2 = Item(name="Pen", description="A nice pen", price=2.5)
    print(item2)
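
Sometimes it matters whether a value was supplied by the caller or filled in from a default, for example when applying a partial update. A minimal sketch, assuming Pydantic v2 (model_fields_set and model_dump), using the same Item shape as above:

```python
from typing import Optional

from pydantic import BaseModel


class Item(BaseModel):
    name: str
    description: Optional[str] = None
    price: float = 0.0


implicit = Item(name="Book")                    # description left out
explicit = Item(name="Book", description=None)  # description passed as None

# Both instances hold the same values...
same_values = implicit == explicit

# ...but Pydantic remembers which fields were actually provided
print(implicit.model_fields_set)
print(explicit.model_fields_set)

# exclude_unset drops fields that were filled in from defaults
only_provided = implicit.model_dump(exclude_unset=True)
print(only_provided)
```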

Settings Management

Pydantic excels at managing application settings, especially when combined with environment variables. In Pydantic v2 this is done with BaseSettings from the separate pydantic-settings package (pip install pydantic-settings); the v1 import from pydantic import BaseSettings no longer works:

    from pydantic import Field
    from pydantic_settings import BaseSettings, SettingsConfigDict


    class Settings(BaseSettings):
        # Also load settings from a .env file
        model_config = SettingsConfigDict(env_file=".env")

        app_name: str = "My Awesome App"
        debug_mode: bool = False
        api_key: str = Field(validation_alias="MY_API_KEY")  # Get API key from env variable


    # Load settings from environment variables and .env file
    settings = Settings()

    print(settings)
    print(f"API Key: {settings.api_key}")

Key Points:

  • BaseSettings: Inherit from BaseSettings for settings management.
  • Environment Variables: Pydantic automatically loads values from environment variables whose names match the field names (case-insensitively by default). Field(validation_alias="MY_API_KEY") reads the value from the MY_API_KEY environment variable instead, and instantiation fails with a ValidationError if it is not set.
  • .env Files: model_config = SettingsConfigDict(env_file=".env") tells Pydantic to also load settings from a .env file in the current working directory (pydantic-settings reads it with python-dotenv, which is installed as a dependency). If a variable is set both in the environment and in the .env file, the environment variable takes precedence.

Using Pydantic with FastAPI

Pydantic is tightly integrated with FastAPI, a modern web framework for building APIs. FastAPI uses Pydantic models for:

  • Request Body Validation: Define the structure of your API requests.
  • Response Body Serialization: Automatically convert your Pydantic models into JSON responses.
  • Automatic Documentation: Generate interactive API documentation (Swagger/OpenAPI) based on your models.

This integration makes building robust and well-documented APIs incredibly easy.
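
The three roles listed above map onto three Pydantic operations. The sketch below does not use FastAPI at all; it only shows, assuming Pydantic v2, the kind of calls a framework can build on. The CreateUser model and the sample request body are hypothetical.

```python
from pydantic import BaseModel, ValidationError


class CreateUser(BaseModel):  # hypothetical request model, for illustration only
    name: str
    age: int


# 1. Request body validation: raw JSON in, typed object out
request_body = '{"name": "Alice", "age": 30}'
payload = CreateUser.model_validate_json(request_body)

# 2. Response serialization: typed object back to a JSON string
response_body = payload.model_dump_json()

# 3. Documentation: a JSON Schema description of the model
schema = CreateUser.model_json_schema()

# A malformed body is rejected before any handler logic would run
rejected = False
try:
    CreateUser.model_validate_json('{"name": "Alice", "age": "not a number"}')
except ValidationError:
    rejected = True

print(response_body)
print(schema["properties"])
```

In a FastAPI application these steps happen behind the scenes when a model is used as a parameter or return type of a route function.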

Custom Validators

While Pydantic provides many built-in validators, you can create your own custom validators using the @field_validator decorator (the Pydantic v2 replacement for the deprecated @validator):

    from pydantic import BaseModel, field_validator


    class User(BaseModel):
        name: str
        age: int

        @field_validator("age")
        @classmethod
        def age_must_be_positive(cls, value: int) -> int:
            # Runs after the value has been validated as an int
            if value <= 0:
                raise ValueError("Age must be positive")
            return value
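
A custom validator can also transform a value, not just reject it, because whatever it returns is what gets stored. The sketch below assumes Pydantic v2, where field_validator is the successor to @validator; the Member model and the name-cleaning rule are illustrative choices, not part of the original example.

```python
from pydantic import BaseModel, ValidationError, field_validator


class Member(BaseModel):  # hypothetical model, for illustration only
    name: str
    age: int

    @field_validator("name")
    @classmethod
    def tidy_name(cls, value: str) -> str:
        # The returned value replaces the input
        return value.strip().title()

    @field_validator("age")
    @classmethod
    def age_must_be_positive(cls, value: int) -> int:
        if value <= 0:
            raise ValueError("Age must be positive")
        return value


member = Member(name="  alice smith ", age=30)
print(member.name)

messages = []
try:
    Member(name="Bob", age=0)
except ValidationError as exc:
    # The ValueError text is carried into the reported error message
    messages = [err["msg"] for err in exc.errors()]

print(messages)
```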

Conclusion

Pydantic is a powerful and versatile library for data validation, parsing, and settings management in Python. It simplifies data handling, improves code reliability, and integrates seamlessly with other popular tools like FastAPI. By using Pydantic, you can write cleaner, more maintainable, and less error-prone code, making your Python projects more robust and efficient. 🔥
