Memray: Python Memory Profiler

Overview

Memray is a powerful memory profiler for Python applications developed by Bloomberg. It provides detailed insights into memory allocation patterns, helps identify memory leaks, and offers various visualization tools to analyze memory usage in Python applications.

Key Features

  • Comprehensive Memory Tracking: Tracks all memory allocations in Python applications
  • Native Code Support: Profiles both Python and C/C++ extensions
  • Multiple Output Formats: Flame graphs, tables, and HTML reports
  • Live Monitoring: Real-time memory usage monitoring
  • Pytest Integration: Built-in pytest plugin for test memory profiling
  • Low Overhead: Minimal performance impact during profiling
  • Linux and macOS Support: Runs on Linux and macOS (Windows is not supported)

Installation

Basic Installation

# Install via pip
pip install memray

# Upgrade an existing installation to the latest release
pip install --upgrade memray

Development Installation

# Clone the repository
git clone https://github.com/bloomberg/memray.git
cd memray

# Install in development mode
pip install -e .

Basic Usage

1. Command Line Profiling

Simple Script Profiling

# Profile a Python script
memray run my_script.py

# Profile with output file
memray run -o output.bin my_script.py

# Profile a Python module
memray run -m my_module

Advanced Options

# Profile with native code tracking
memray run --native my_script.py

# Profile with live monitoring
memray run --live my_script.py

# Also track allocations made through Python's pymalloc allocator
memray run --trace-python-allocators my_script.py

2. Programmatic API Usage

Basic Tracking

import memray

# Track memory allocations in a code block
with memray.Tracker("output.bin"):
    # Your code here
    data = [i for i in range(1000000)]
    result = process_data(data)

Advanced API Usage

import memray
import threading

# Wrap a specific function call in a Tracker; Memray has no function
# decorator, so tracking is scoped with the context manager.
def memory_intensive_function():
    large_list = [i**2 for i in range(100000)]
    return sum(large_list)

with memray.Tracker("function_output.bin"):
    memory_intensive_function()

# A single active Tracker records allocations from every thread in the
# process, so threads do not need (and cannot have) their own trackers.
def worker_thread():
    thread_data = [i * i for i in range(100000)]
    return thread_data

with memray.Tracker("threads_output.bin"):
    thread = threading.Thread(target=worker_thread)
    thread.start()
    thread.join()
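Note that Tracker refuses to overwrite an existing capture file. If the examples above are rerun, either delete the old files first or pass a destination object. A minimal sketch, assuming memray.FileDestination with an overwrite flag as documented in recent Memray releases:

import memray

# Overwrite the capture file on each run instead of failing because it exists
destination = memray.FileDestination(path="function_output.bin", overwrite=True)

with memray.Tracker(destination=destination):
    data = [i**2 for i in range(100_000)]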

Output Formats and Visualization

1. Flame Graph Reports

# Generate flame graph
memray flamegraph output.bin

# Generate flame graph with specific options
memray flamegraph --output=flamegraph.html --temporal output.bin

2. Table Reports

# Generate table report
memray table output.bin

# Write the table report to a specific file
memray table -o table_report.html output.bin

3. Summary and Stats Reports

# Print a high-level summary in the terminal
memray summary output.bin

# Show allocation statistics such as peak memory and largest allocation sites
memray stats output.bin

4. Live Monitoring

# Run with live terminal interface
memray run --live my_script.py
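When the target process runs on another machine or inside a container, live mode can also be served over a TCP port. A sketch assuming the --live-remote and --live-port options of memray run (the port number is just an example):

# In the profiled process: wait for a live viewer to connect on port 8765
memray run --live-remote --live-port 8765 my_script.py

# In another terminal (or via port forwarding): attach the live TUI
memray live 8765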

Pytest Integration

Installation

# Install pytest-memray plugin
pip install pytest-memray

Basic Usage

# Run tests with memory profiling
pytest --memray tests/

# Run specific test with memory profiling
pytest --memray tests/test_memory_intensive.py::test_large_allocation
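A couple of additional plugin flags are useful for triage; the examples below assume the --memray-bin-path and --most-allocations options provided by pytest-memray:

# Keep the per-test capture files for later inspection
pytest --memray --memray-bin-path=./memray_output tests/

# Only report the 5 tests that allocated the most memory
pytest --memray --most-allocations=5 tests/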

Memory Limit Testing

import pytest

# Test with memory limit
@pytest.mark.limit_memory("24 MB")
def test_memory_efficient_function():
    # This test will fail if it allocates more than 24 MB
    data = [i for i in range(100000)]
    result = process_data(data)
    assert result is not None

# Tighter limit for a function that should allocate very little
@pytest.mark.limit_memory("10 MB")
def test_small_allocation():
    small_data = [i for i in range(1000)]
    assert len(small_data) == 1000

Pytest Configuration

# pytest.ini
[pytest]
addopts = --memray --memray-bin-path=./memray_output

Advanced Features

1. Native Code Profiling

# Profile Python + C/C++ extensions
memray run --native my_script.py

# This is especially useful for:
# - NumPy operations
# - Pandas data processing
# - Custom C extensions
# - Scientific computing libraries
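The same native tracking is available from the API through the Tracker's native_traces flag; a minimal sketch using NumPy as the native workload:

import memray
import numpy as np

# Allocations made inside NumPy's C code are attributed to native frames
with memray.Tracker("native_numpy.bin", native_traces=True):
    a = np.random.rand(1000, 1000)
    b = a @ a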

2. Temporal Analysis

# Generate a temporal flame graph (memory usage over time)
memray flamegraph --temporal output.bin

# Write the temporal report to a specific file
memray flamegraph --temporal -o temporal_report.html output.bin

3. Memory Leak Detection

import memray

# Keep references alive so the allocations made in the loop are never freed
leaked = []

with memray.Tracker("leak_detection.bin"):
    for i in range(100):
        # Simulate a leak: each chunk stays reachable through `leaked`
        data = [j for j in range(10000)]
        leaked.append(data)
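Once the capture exists, the reporters' --leaks flag restricts the output to allocations that were never freed; for accurate results, record the capture with --trace-python-allocators (or PYTHONMALLOC=malloc) so pymalloc's pooling does not hide deallocations.

# Show only allocations that were still live when tracking stopped
memray flamegraph --leaks leak_detection.bin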

4. Custom Reporters

import memray

# The classes under memray.reporters are internal and may change between
# releases. For custom reports, read the capture file with the public
# FileReader API instead.
reader = memray.FileReader("output.bin")

total_bytes = 0
for record in reader.get_allocation_records():
    total_bytes += record.size

print(f"Total bytes allocated: {total_bytes}")
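For analysis in external tools, recent Memray releases can also convert a capture with the transform subcommand (formats include csv and gprof2dot):

# Export allocation records to CSV
memray transform csv output.bin

# Convert to gprof2dot format for graph visualization
memray transform gprof2dot output.bin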

Real-World Examples

1. Web Application Profiling

# app.py
from flask import Flask, request, jsonify
import memray
import uuid

app = Flask(__name__)

@app.route('/process', methods=['POST'])
def process_data():
    # Use a unique capture file per request: Tracker will not overwrite an
    # existing file
    capture_file = f"web_request_{uuid.uuid4().hex}.bin"
    with memray.Tracker(capture_file):
        data = request.get_json()
        result = heavy_computation(data)
    return jsonify(result)

def heavy_computation(data):
    # Memory-intensive operation
    large_list = [i**2 for i in range(1000000)]
    processed = [x * 2 for x in large_list]
    return sum(processed)

if __name__ == '__main__':
    app.run(debug=True)
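With the per-request naming used above (an assumption of this example, not something Flask or Memray provides), the captures can be turned into flame graphs in one pass:

# Generate a flame graph for every recorded request
for f in web_request_*.bin; do
    memray flamegraph "$f"
done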

2. Data Processing Pipeline

import pandas as pd
import numpy as np
import memray

def process_large_dataset():
    with memray.Tracker("data_processing.bin"):
        # Load large dataset
        df = pd.read_csv('large_dataset.csv')

        # Memory-intensive operations
        df['new_column'] = df['column1'] * df['column2']
        df = df.groupby('category').agg({'new_column': 'sum'})

        # NumPy operations
        array = df.values
        result = np.dot(array, array.T)

    return result

# Profile the entire pipeline
process_large_dataset()

3. Machine Learning Model Training

import memray
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
import numpy as np

def train_model():
    with memray.Tracker("ml_training.bin"):
        # Generate large dataset
        X, y = make_classification(n_samples=100000, n_features=100, random_state=42)

        # Train model
        model = RandomForestClassifier(n_estimators=100, random_state=42)
        model.fit(X, y)

        # Make predictions
        predictions = model.predict(X)

    return model, predictions

# Profile model training
model, predictions = train_model()
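After training, the stats reporter gives a quick text view of peak heap usage and the biggest allocation sites in the capture:

# Print allocation statistics for the training run
memray stats ml_training.bin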

Performance Optimization

1. Memory-Efficient Code Patterns

import memray

# Bad: Creating large intermediate lists
def inefficient_function():
    with memray.Tracker("inefficient.bin"):
        data = [i for i in range(1000000)]
        squared = [x**2 for x in data]
        filtered = [x for x in squared if x % 2 == 0]
        return sum(filtered)

# Good: Using generators and avoiding intermediate lists
def efficient_function():
    with memray.Tracker("efficient.bin"):
        data = (i for i in range(1000000))
        squared = (x**2 for x in data)
        filtered = (x for x in squared if x % 2 == 0)
        return sum(filtered)
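Comparing the two captures makes the difference concrete; the summary reporter prints a high-level overview of each run:

# Peak memory of the list-based version
memray summary inefficient.bin

# Peak memory of the generator-based version
memray summary efficient.bin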

2. Memory Pool Usage

import memray
from functools import lru_cache

# Use caching to avoid recomputation
@lru_cache(maxsize=128)
def expensive_computation(n):
    return sum(i**2 for i in range(n))

# Use object pooling for frequently created objects
class ObjectPool:
    def __init__(self, factory, max_size=100):
        self.factory = factory
        self.pool = []
        self.max_size = max_size

    def get(self):
        if self.pool:
            return self.pool.pop()
        return self.factory()

    def put(self, obj):
        if len(self.pool) < self.max_size:
            self.pool.append(obj)

# Track from outside the cached function so the capture file is created once
with memray.Tracker("cached_computation.bin"):
    expensive_computation(1_000_000)
    expensive_computation(1_000_000)  # second call is served from the cache

Integration with CI/CD

1. GitHub Actions Integration

# .github/workflows/memory-profiling.yml
name: Memory Profiling

on: [push, pull_request]

jobs:
  memory-profiling:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.9"

      - name: Install dependencies
        run: |
          pip install memray pytest-memray
          pip install -r requirements.txt

      - name: Run memory profiling tests
        run: |
          pytest --memray --memray-bin-path=memray_output tests/

      - name: Upload memory reports
        uses: actions/upload-artifact@v4
        with:
          name: memory-reports
          path: memray_output/

2. Docker Integration

# Dockerfile
FROM python:3.9-slim

# Install memray
RUN pip install memray

# Copy application code
COPY . /app
WORKDIR /app

# Install application dependencies
RUN pip install -r requirements.txt

# Run with memory profiling
CMD ["memray", "run", "--output", "/app/memray_output.bin", "app.py"]

Troubleshooting

1. Common Issues

Permission Errors

# Avoid sudo installs; use the user site-packages instead
pip install --user memray

# Or install inside a virtual environment
python -m venv .venv && .venv/bin/pip install memray

Large Output Files

# Compress finished capture files
gzip output.bin

# Record aggregated statistics instead of every allocation (much smaller files)
memray run --aggregate my_script.py

Memory Overhead

# Plain tracking has the lowest overhead; enable --native or
# --trace-python-allocators only when you need that extra detail
memray run my_script.py

# Aggregated mode reduces both overhead and capture size
memray run --aggregate my_script.py

2. Performance Tips

Minimize Overhead

# Profile only critical sections
import memray

def main():
    # Don't profile initialization
    setup_application()

    # Profile only the main logic
    with memray.Tracker("critical_section.bin"):
        main_application_logic()

    # Don't profile cleanup
    cleanup_application()

Use Appropriate Output Formats

# For a quick overview, use the summary or table reporters
memray summary output.bin

# For detailed analysis, generate an HTML flame graph
memray flamegraph output.bin

# For interactive analysis, use live mode
memray run --live my_script.py

Best Practices

1. Profiling Strategy

  • Start with table reports for quick overview
  • Use flame graphs for detailed analysis
  • Profile in production-like environments
  • Compare before/after optimizations

2. Memory Optimization

  • Identify hotspots in flame graphs
  • Look for memory leaks in temporal analysis
  • Optimize allocation patterns
  • Use appropriate data structures

3. Testing Integration

  • Set memory limits for critical functions
  • Profile test suites regularly
  • Monitor memory usage in CI/CD
  • Document memory requirements

4. Production Monitoring

  • Profile representative workloads
  • Monitor memory trends over time
  • Set up alerts for memory anomalies
  • Regular performance reviews

Comparison with Other Tools

Memray vs. Other Python Profilers

Feature               Memray   memory_profiler   py-spy   pympler
Memory Tracking       Yes      Yes               No       Yes
Native Code           Yes      No                No       No
Live Monitoring       Yes      No                Yes      No
Flame Graphs          Yes      No                Yes      No
Pytest Integration    Yes      No                No       No
Low Overhead          Yes      No                Yes      No

Conclusion

Memray is a powerful and comprehensive memory profiler for Python applications. Its ability to track both Python and native code, provide multiple visualization formats, and integrate with testing frameworks makes it an essential tool for Python developers working on memory-intensive applications.

Key advantages:

  • Comprehensive tracking of memory allocations
  • Native code support for C/C++ extensions
  • Multiple output formats for different analysis needs
  • Low overhead during profiling
  • Excellent integration with pytest and CI/CD
  • Active development and community support

Whether you're debugging memory leaks, optimizing performance, or ensuring memory efficiency in your Python applications, Memray provides the tools and insights needed to understand and improve memory usage patterns.