Memray: Python Memory Profiler

Overview

Memray is a powerful memory profiler for Python applications developed by Bloomberg. It provides detailed insights into memory allocation patterns, helps identify memory leaks, and offers various visualization tools to analyze memory usage in Python applications.

Key Features

  • Comprehensive Memory Tracking: Tracks all memory allocations in Python applications
  • Native Code Support: Profiles both Python and C/C++ extensions
  • Multiple Output Formats: Flame graphs, tables, and HTML reports
  • Live Monitoring: Real-time memory usage monitoring
  • Pytest Integration: Built-in pytest plugin for test memory profiling
  • Low Overhead: Minimal performance impact during profiling
  • Linux and macOS Support: Runs on Linux and macOS (Windows is not supported)

Installation

Basic Installation

# Install via pip
pip install memray

# Upgrade an existing installation to the latest release
pip install --upgrade memray

Development Installation

# Clone the repository
git clone https://github.com/bloomberg/memray.git
cd memray

# Install in development mode
pip install -e .

Basic Usage

1. Command Line Profiling

Simple Script Profiling

# Profile a Python script
memray run my_script.py

# Profile with output file
memray run -o output.bin my_script.py

# Profile a Python module
memray run -m my_module

Advanced Options

# Profile with native code tracking
memray run --native my_script.py

# Profile with live monitoring
memray run --live my_script.py

# Also track allocations made through Python's pymalloc allocator
memray run --trace-python-allocators my_script.py

2. Programmatic API Usage

Basic Tracking

import memray

# Track memory allocations in a code block
with memray.Tracker("output.bin"):
    # Your code here
    data = [i for i in range(1000000)]
    result = process_data(data)

Advanced API Usage

import memray
import threading

# Wrap a specific function call in a Tracker; Memray has no function
# decorator, so tracking is scoped with the context manager.
def memory_intensive_function():
    large_list = [i**2 for i in range(100000)]
    return sum(large_list)

with memray.Tracker("function_output.bin"):
    memory_intensive_function()

# A single active Tracker records allocations from every thread in the
# process, so threads do not need (and cannot have) their own trackers.
def worker_thread():
    thread_data = [i * i for i in range(100000)]
    return thread_data

with memray.Tracker("threads_output.bin"):
    thread = threading.Thread(target=worker_thread)
    thread.start()
    thread.join()
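Note that Tracker refuses to overwrite an existing capture file. If the examples above are rerun, either delete the old files first or pass a destination object. A minimal sketch, assuming memray.FileDestination with an overwrite flag as documented in recent Memray releases:

import memray

# Overwrite the capture file on each run instead of failing because it exists
destination = memray.FileDestination(path="function_output.bin", overwrite=True)

with memray.Tracker(destination=destination):
    data = [i**2 for i in range(100_000)]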

Output Formats and Visualization

1. Flame Graph Reports

# Generate flame graph
memray flamegraph output.bin

# Generate flame graph with specific options
memray flamegraph --output=flamegraph.html --temporal output.bin

2. Table Reports

# Generate table report
memray table output.bin

# Write the table report to a specific file
memray table -o table_report.html output.bin

3. Summary and Stats Reports

# Print a high-level summary in the terminal
memray summary output.bin

# Show allocation statistics such as peak memory and largest allocation sites
memray stats output.bin

4. Live Monitoring

# Run with live terminal interface
memray run --live my_script.py
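When the target process runs on another machine or inside a container, live mode can also be served over a TCP port. A sketch assuming the --live-remote and --live-port options of memray run (the port number is just an example):

# In the profiled process: wait for a live viewer to connect on port 8765
memray run --live-remote --live-port 8765 my_script.py

# In another terminal (or via port forwarding): attach the live TUI
memray live 8765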

Pytest Integration

Installation

# Install pytest-memray plugin
pip install pytest-memray

Basic Usage

# Run tests with memory profiling
pytest --memray tests/

# Run specific test with memory profiling
pytest --memray tests/test_memory_intensive.py::test_large_allocation
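A couple of additional plugin flags are useful for triage; the examples below assume the --memray-bin-path and --most-allocations options provided by pytest-memray:

# Keep the per-test capture files for later inspection
pytest --memray --memray-bin-path=./memray_output tests/

# Only report the 5 tests that allocated the most memory
pytest --memray --most-allocations=5 tests/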

Memory Limit Testing

import pytest

# Test with memory limit
@pytest.mark.limit_memory("24 MB")
def test_memory_efficient_function():
    # This test will fail if it allocates more than 24 MB
    data = [i for i in range(100000)]
    result = process_data(data)
    assert result is not None

# Tighter limit for a function that should allocate very little
@pytest.mark.limit_memory("10 MB")
def test_small_allocation():
    small_data = [i for i in range(1000)]
    assert len(small_data) == 1000

Pytest Configuration

# pytest.ini
[pytest]
addopts = --memray --memray-bin-path=./memray_output

Advanced Features

1. Native Code Profiling

# Profile Python + C/C++ extensions
memray run --native my_script.py

# This is especially useful for:
# - NumPy operations
# - Pandas data processing
# - Custom C extensions
# - Scientific computing libraries
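The same native tracking is available from the API through the Tracker's native_traces flag; a minimal sketch using NumPy as the native workload:

import memray
import numpy as np

# Allocations made inside NumPy's C code are attributed to native frames
with memray.Tracker("native_numpy.bin", native_traces=True):
    a = np.random.rand(1000, 1000)
    b = a @ a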

2. Temporal Analysis

# Generate a temporal flame graph (memory usage over time)
memray flamegraph --temporal output.bin

# Write the temporal report to a specific file
memray flamegraph --temporal -o temporal_report.html output.bin

3. Memory Leak Detection

import memray

# Keep references alive so the allocations made in the loop are never freed
leaked = []

with memray.Tracker("leak_detection.bin"):
    for i in range(100):
        # Simulate a leak: each chunk stays reachable through `leaked`
        data = [j for j in range(10000)]
        leaked.append(data)
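Once the capture exists, the reporters' --leaks flag restricts the output to allocations that were never freed; for accurate results, record the capture with --trace-python-allocators (or PYTHONMALLOC=malloc) so pymalloc's pooling does not hide deallocations.

# Show only allocations that were still live when tracking stopped
memray flamegraph --leaks leak_detection.bin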

4. Custom Reporters

import memray

# The classes under memray.reporters are internal and may change between
# releases. For custom reports, read the capture file with the public
# FileReader API instead.
reader = memray.FileReader("output.bin")

total_bytes = 0
for record in reader.get_allocation_records():
    total_bytes += record.size

print(f"Total bytes allocated: {total_bytes}")
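For analysis in external tools, recent Memray releases can also convert a capture with the transform subcommand (formats include csv and gprof2dot):

# Export allocation records to CSV
memray transform csv output.bin

# Convert to gprof2dot format for graph visualization
memray transform gprof2dot output.bin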

Real-World Examples

1. Web Application Profiling

# app.py
from flask import Flask, request, jsonify
import memray
import uuid

app = Flask(__name__)

@app.route('/process', methods=['POST'])
def process_data():
    # Use a unique capture file per request: Tracker will not overwrite an
    # existing file
    capture_file = f"web_request_{uuid.uuid4().hex}.bin"
    with memray.Tracker(capture_file):
        data = request.get_json()
        result = heavy_computation(data)
    return jsonify(result)

def heavy_computation(data):
    # Memory-intensive operation
    large_list = [i**2 for i in range(1000000)]
    processed = [x * 2 for x in large_list]
    return sum(processed)

if __name__ == '__main__':
    app.run(debug=True)
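With the per-request naming used above (an assumption of this example, not something Flask or Memray provides), the captures can be turned into flame graphs in one pass:

# Generate a flame graph for every recorded request
for f in web_request_*.bin; do
    memray flamegraph "$f"
done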

2. Data Processing Pipeline

import pandas as pd
import numpy as np
import memray

def process_large_dataset():
    with memray.Tracker("data_processing.bin"):
        # Load large dataset
        df = pd.read_csv('large_dataset.csv')

        # Memory-intensive operations
        df['new_column'] = df['column1'] * df['column2']
        df = df.groupby('category').agg({'new_column': 'sum'})

        # NumPy operations
        array = df.values
        result = np.dot(array, array.T)

    return result

# Profile the entire pipeline
process_large_dataset()

3. Machine Learning Model Training

import memray
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
import numpy as np

def train_model():
    with memray.Tracker("ml_training.bin"):
        # Generate large dataset
        X, y = make_classification(n_samples=100000, n_features=100, random_state=42)

        # Train model
        model = RandomForestClassifier(n_estimators=100, random_state=42)
        model.fit(X, y)

        # Make predictions
        predictions = model.predict(X)

    return model, predictions

# Profile model training
model, predictions = train_model()
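After training, the stats reporter gives a quick text view of peak heap usage and the biggest allocation sites in the capture:

# Print allocation statistics for the training run
memray stats ml_training.bin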

Performance Optimization

1. Memory-Efficient Code Patterns

import memray

# Bad: Creating large intermediate lists
def inefficient_function():
    with memray.Tracker("inefficient.bin"):
        data = [i for i in range(1000000)]
        squared = [x**2 for x in data]
        filtered = [x for x in squared if x % 2 == 0]
        return sum(filtered)

# Good: Using generators and avoiding intermediate lists
def efficient_function():
    with memray.Tracker("efficient.bin"):
        data = (i for i in range(1000000))
        squared = (x**2 for x in data)
        filtered = (x for x in squared if x % 2 == 0)
        return sum(filtered)
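Comparing the two captures makes the difference concrete; the summary reporter prints a high-level overview of each run:

# Peak memory of the list-based version
memray summary inefficient.bin

# Peak memory of the generator-based version
memray summary efficient.bin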

2. Memory Pool Usage

import memray
from functools import lru_cache

# Use caching to avoid recomputation
@lru_cache(maxsize=128)
def expensive_computation(n):
    return sum(i**2 for i in range(n))

# Use object pooling for frequently created objects
class ObjectPool:
    def __init__(self, factory, max_size=100):
        self.factory = factory
        self.pool = []
        self.max_size = max_size

    def get(self):
        if self.pool:
            return self.pool.pop()
        return self.factory()

    def put(self, obj):
        if len(self.pool) < self.max_size:
            self.pool.append(obj)

# Track from outside the cached function so the capture file is created once
with memray.Tracker("cached_computation.bin"):
    expensive_computation(1_000_000)
    expensive_computation(1_000_000)  # second call is served from the cache

Integration with CI/CD

1. GitHub Actions Integration

# .github/workflows/memory-profiling.yml
name: Memory Profiling

on: [push, pull_request]

jobs:
  memory-profiling:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.9"

      - name: Install dependencies
        run: |
          pip install memray pytest-memray
          pip install -r requirements.txt

      - name: Run memory profiling tests
        run: |
          pytest --memray --memray-bin-path=memray_output tests/

      - name: Upload memory reports
        uses: actions/upload-artifact@v4
        with:
          name: memory-reports
          path: memray_output/

2. Docker Integration

# Dockerfile
FROM python:3.9-slim

# Install memray
RUN pip install memray

# Copy application code
COPY . /app
WORKDIR /app

# Install application dependencies
RUN pip install -r requirements.txt

# Run with memory profiling
CMD ["memray", "run", "--output", "/app/memray_output.bin", "app.py"]

Troubleshooting

1. Common Issues

Permission Errors

# Avoid sudo installs; use the user site-packages instead
pip install --user memray

# Or install inside a virtual environment
python -m venv .venv && .venv/bin/pip install memray

Large Output Files

# Compress finished capture files
gzip output.bin

# Record aggregated statistics instead of every allocation (much smaller files)
memray run --aggregate my_script.py

Memory Overhead

# Plain tracking has the lowest overhead; enable --native or
# --trace-python-allocators only when you need that extra detail
memray run my_script.py

# Aggregated mode reduces both overhead and capture size
memray run --aggregate my_script.py

2. Performance Tips

Minimize Overhead

# Profile only critical sections
import memray

def main():
    # Don't profile initialization
    setup_application()

    # Profile only the main logic
    with memray.Tracker("critical_section.bin"):
        main_application_logic()

    # Don't profile cleanup
    cleanup_application()

Use Appropriate Output Formats

# For a quick overview, use the summary or table reporters
memray summary output.bin

# For detailed analysis, generate an HTML flame graph
memray flamegraph output.bin

# For interactive analysis, use live mode
memray run --live my_script.py

Best Practices

1. Profiling Strategy

  • Start with table reports for quick overview
  • Use flame graphs for detailed analysis
  • Profile in production-like environments
  • Compare before/after optimizations

2. Memory Optimization

  • Identify hotspots in flame graphs
  • Look for memory leaks in temporal analysis
  • Optimize allocation patterns
  • Use appropriate data structures

3. Testing Integration

  • Set memory limits for critical functions
  • Profile test suites regularly
  • Monitor memory usage in CI/CD
  • Document memory requirements

4. Production Monitoring

  • Profile representative workloads
  • Monitor memory trends over time
  • Set up alerts for memory anomalies
  • Regular performance reviews

Comparison with Other Tools

Memray vs. Other Python Profilers

Feature               Memray   memory_profiler   py-spy   pympler
Memory Tracking       Yes      Yes               No       Yes
Native Code           Yes      No                No       No
Live Monitoring       Yes      No                Yes      No
Flame Graphs          Yes      No                Yes      No
Pytest Integration    Yes      No                No       No
Low Overhead          Yes      No                Yes      No

Conclusion

Memray is a powerful and comprehensive memory profiler for Python applications. Its ability to track both Python and native code, provide multiple visualization formats, and integrate with testing frameworks makes it an essential tool for Python developers working on memory-intensive applications.

Key advantages:

  • Comprehensive tracking of memory allocations
  • Native code support for C/C++ extensions
  • Multiple output formats for different analysis needs
  • Low overhead during profiling
  • Excellent integration with pytest and CI/CD
  • Active development and community support

Whether you're debugging memory leaks, optimizing performance, or ensuring memory efficiency in your Python applications, Memray provides the tools and insights needed to understand and improve memory usage patterns.