Performance Optimization
Optimize Git performance for large repositories, improve speed, and handle scale efficiently
Git Performance Optimization
As repositories grow in size and complexity, everyday Git operations slow down. This guide covers strategies for optimizing Git on large codebases, speeding up common operations, and handling enterprise-scale development.
Understanding Performance Bottlenecks
Common Performance Issues
Git performance problems typically manifest in these areas:
| Operation | Symptoms | Common Causes |
|---|---|---|
| Clone/Fetch | Slow downloads | Large history, many refs |
| Status/Diff | Slow working tree scans | Many files, deep directories |
| Push/Pull | Network timeouts | Large pack files, bandwidth |
| Merge/Rebase | Long processing times | Complex history, large files |
| GC/Repack | High CPU/memory usage | Fragmented objects, deep deltas |
Performance Diagnosis
# Measure repository characteristics
git count-objects -v -H
# count: loose objects
# size: loose objects size
# in-pack: packed objects
# packs: number of pack files
# size-pack: packed size
# prune-packable: redundant loose objects
# Analyze repository size
du -sh .git/
ls -lah .git/objects/pack/
# Identify performance bottlenecks
time git status
time git log --oneline -100
time git diff HEAD~1
# Enable performance tracing
GIT_TRACE_PERFORMANCE=1 git status
GIT_TRACE_PACK_ACCESS=1 git log --oneline -10
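The `count` and `packs` fields reported by `git count-objects -v` can be turned into a quick "is a gc due?" check. The following is a minimal sketch, not a Git feature: `needs_gc` is a hypothetical helper name, and its default thresholds are assumptions that mirror Git's documented defaults for `gc.auto` (6700) and `gc.autoPackLimit` (50).

```shell
#!/usr/bin/env bash
# Read `git count-objects -v` output on stdin and report whether a gc looks due.
# Thresholds are assumptions mirroring the documented defaults of
# gc.auto (6700 loose objects) and gc.autoPackLimit (50 packs).
needs_gc() {
  local max_loose="${1:-6700}" max_packs="${2:-50}"
  awk -v max_loose="$max_loose" -v max_packs="$max_packs" '
    $1 == "count:" { loose = $2 }   # number of loose objects
    $1 == "packs:" { packs = $2 }   # number of pack files
    END {
      if (loose + 0 > max_loose + 0 || packs + 0 > max_packs + 0)
        printf "gc recommended (loose=%d, packs=%d)\n", loose, packs
      else
        printf "ok (loose=%d, packs=%d)\n", loose, packs
    }'
}
```

Inside a repository it would be used as `git count-objects -v | needs_gc`. Leave out `-H` here, since the human-readable units would break the numeric comparison.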
Large Repository Strategies
Shallow Clones
Reduce history to improve clone performance:
# Shallow clone with limited history
git clone --depth 1 https://github.com/user/large-repo.git
git clone --depth 50 https://github.com/user/large-repo.git
# Shallow clone with single branch
git clone --depth 1 --single-branch --branch main https://github.com/user/repo.git
# Convert existing clone to shallow
git fetch --depth 1
# Unshallow repository when full history needed
git fetch --unshallow
# Check if repository is shallow
git rev-parse --is-shallow-repository
Partial Clone
Clone only what you need:
# Clone without large files initially (Git 2.19+)
git clone --filter=blob:limit=1M https://github.com/user/repo.git
git clone --filter=blob:none https://github.com/user/repo.git
# Clone specific subdirectory (sparse-checkout)
git clone --filter=blob:none --sparse https://github.com/user/repo.git
cd repo
git sparse-checkout set path/to/subdirectory
# Treeless clone (commits only; trees and blobs are fetched on demand)
git clone --filter=tree:0 https://github.com/user/repo.git
# Fetch missing objects on demand
git cat-file -p <missing-blob-hash> # Automatically fetches
Sparse Checkout
Work with subset of files:
# Legacy approach (non-cone patterns, before Git 2.25)
git config core.sparseCheckout true
# Define included paths
echo "src/" > .git/info/sparse-checkout
echo "docs/" >> .git/info/sparse-checkout
echo "!src/legacy/" >> .git/info/sparse-checkout # Exclude pattern
git read-tree -mu HEAD # Apply the patterns to the working tree
# Modern cone pattern approach (Git 2.25+); do not hand-edit the pattern file in cone mode
git sparse-checkout init --cone
git sparse-checkout set src/ docs/
git sparse-checkout add tests/unit/
git sparse-checkout list
# Disable sparse checkout
git sparse-checkout disable
Pack File Optimization
Understanding Pack Files
# Analyze current pack files
git verify-pack -v .git/objects/pack/pack-*.idx | head -20
# Format: SHA-1 type size packed-size offset depth base-SHA-1
# Count how many deltas depend on each base object (delta lines have 7 fields)
git verify-pack -v .git/objects/pack/pack-*.idx |
awk 'NF == 7 { print $7 }' | sort | uniq -c | sort -nr | head -10
# Find largest objects (sorted by uncompressed size, column 3)
git verify-pack -v .git/objects/pack/pack-*.idx |
sort -k3,3nr | head -10
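The `verify-pack -v` format described above (five fields for a plain object, seven for a delta) is regular enough to summarize in one pass. The sketch below is illustrative only: `pack_summary` is a hypothetical helper, and it assumes the trailing summary lines of `verify-pack` never have exactly five or seven fields with a hex ID in the first column.

```shell
#!/usr/bin/env bash
# Summarize `git verify-pack -v` output read from stdin.
# Object lines have 5 fields (non-delta) or 7 fields (delta: depth and base added);
# the trailing summary lines have other shapes and are skipped.
pack_summary() {
  awk '
    (NF == 5 || NF == 7) && $1 ~ /^[0-9a-f]+$/ {
      objects++
      bytes += $4                       # size of the object inside the pack
      if (NF == 7) {
        deltas++
        if ($6 + 0 > maxdepth) maxdepth = $6 + 0   # field 6 is the delta depth
      }
    }
    END {
      printf "objects=%d deltas=%d max_depth=%d packed_bytes=%d\n",
             objects, deltas, maxdepth, bytes
    }'
}
```

Typical use would be `git verify-pack -v .git/objects/pack/pack-*.idx | pack_summary`. A large `max_depth` suggests that reads are paying for long delta chains.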
Optimal Packing Configuration
# Configure pack settings for better performance
git config pack.deltaCacheSize 512m # Delta cache size
git config pack.windowMemory 1g # Window memory limit
git config pack.threads 0 # Use all available cores
git config pack.depth 50 # Maximum delta depth
git config pack.window 250 # Delta search window
# For very large repositories (smaller packs, at the cost of much slower repacks and slower reads through deep delta chains)
git config pack.deltaCacheSize 2g
git config pack.windowMemory 2g
git config pack.depth 250
git config pack.window 1000
Advanced Repacking
# Aggressive repack for one-time optimization (-f recomputes deltas instead of reusing existing ones)
git repack -adf --depth=250 --window=1000
# Regular repack with moderate settings
git repack -ad --depth=50 --window=250
# Repack with pack size limits
git config pack.packSizeLimit 2g
git repack -ad
# Multi-pack-index for multiple pack files (Git 2.21+)
git multi-pack-index write
git multi-pack-index verify
# Geometric repacking (Git 2.32+; combine with -d, not with -a)
git repack -d --geometric=2
Index Performance
Index Optimization
# Enable index optimizations
git config core.preloadindex true # Parallel index loading
git config core.fscache true # File system cache (Windows)
git config core.untrackedCache true # Cache untracked files
# Split index for large working trees (experimental)
git config core.splitIndex true
git update-index --split-index
# Monitor index size and performance
ls -lah .git/index
time git status
Untracked File Performance
# Configure untracked file handling (pick one)
git config status.showUntrackedFiles no # Skip the untracked-file scan entirely
git config status.showUntrackedFiles normal # Restore the default behavior
# Use .gitignore effectively
echo "*.tmp" >> .gitignore
echo "node_modules/" >> .gitignore
echo "build/" >> .gitignore
# Exclude patterns locally
echo "*.local" >> .git/info/exclude
echo "temp/" >> .git/info/exclude
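Rerunning the `echo ... >>` lines above appends the same pattern again each time. A small wrapper can make the append idempotent. This is a sketch under the assumption that exact-line matching is good enough; `add_ignore_pattern` is a hypothetical helper name, not a Git command.

```shell
#!/usr/bin/env bash
# Append a pattern to an ignore file only if that exact line is not already there.
# add_ignore_pattern is a hypothetical helper, not part of Git.
add_ignore_pattern() {
  local pattern="$1" file="${2:-.gitignore}"
  touch "$file"
  # -x: match the whole line, -F: treat the pattern literally, -q: no output
  if ! grep -qxF -- "$pattern" "$file"; then
    printf '%s\n' "$pattern" >> "$file"
  fi
}
```

For example, `add_ignore_pattern 'node_modules/'` updates `.gitignore`, and `add_ignore_pattern 'temp/' .git/info/exclude` updates the local exclude file.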
File System Monitor
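# Enable the built-in file system monitor (Git 2.37+, Windows and macOS)
git config core.fsmonitor true
# For Watchman integration (hook-based): point core.fsmonitor at the hook script
cp .git/hooks/fsmonitor-watchman.sample .git/hooks/fsmonitor-watchman
git config core.fsmonitor .git/hooks/fsmonitor-watchman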
# Check if fsmonitor is working
git config --get core.fsmonitor
git status # Should be faster with fsmonitor
Network Performance
Protocol Optimization
# Use SSH with compression (often little gain: Git packs are already zlib-compressed)
git config core.sshCommand "ssh -o Compression=yes"
# Configure SSH multiplexing
# In ~/.ssh/config:
# Host github.com
# ControlMaster auto
# ControlPath ~/.ssh/control-%r@%h:%p
# ControlPersist 600
# Use Git protocol v2 (default in Git 2.26+)
git config --global protocol.version 2
# Check protocol version in use
GIT_TRACE_PACKET=1 git ls-remote origin 2>&1 | grep "git< version"
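The `grep` above prints nothing when the server falls back to the original protocol, which is easy to misread as a failed check. The sketch below makes that case explicit. `negotiated_protocol` is a hypothetical helper, and it assumes the trace contains a `git< version N` packet line whenever protocol v1 or v2 is in use.

```shell
#!/usr/bin/env bash
# Read GIT_TRACE_PACKET output on stdin and print the protocol version the
# server announced. Prints 0 when no "version N" packet is present, which is
# how the original (v0) protocol appears in a trace.
negotiated_protocol() {
  awk '
    /git< version [0-9]+/ { print $NF; found = 1; exit }
    END { if (!found) print 0 }'
}
```

Usage: `GIT_TRACE_PACKET=1 git ls-remote origin 2>&1 | negotiated_protocol`.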
Delta and Compression Settings
# Optimize push/fetch performance
git config core.deltaBaseCacheLimit 1g # Delta base cache
git config core.bigFileThreshold 100m # Large file threshold
git config core.compression 6 # Compression level (0-9)
# Transfer settings
git config pack.compression 6 # Pack compression level
git config http.postBuffer 524288000 # 500MB buffer for HTTP pushes (increases memory use; mainly a workaround for servers that reject chunked transfers)
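The value `524288000` is 500 × 1024 × 1024. Git's size-valued settings also accept `k`, `m`, and `g` suffixes (powers of 1024), as the other examples in this section do. The helper below is a sketch for checking such values by hand; `size_to_bytes` is a hypothetical name, and it handles only whole numbers with a single-letter suffix.

```shell
#!/usr/bin/env bash
# Convert a Git-style size (optional k, m, or g suffix, powers of 1024) to bytes.
# Only whole numbers are handled; this is a sketch, not Git's own parser.
size_to_bytes() {
  local value="$1" number suffix
  number="${value%[kKmMgG]}"       # strip one trailing unit letter, if present
  suffix="${value#"$number"}"      # whatever was stripped
  case "$suffix" in
    k|K) echo $(( number * 1024 )) ;;
    m|M) echo $(( number * 1024 * 1024 )) ;;
    g|G) echo $(( number * 1024 * 1024 * 1024 )) ;;
    "")  echo "$number" ;;
  esac
}
```

For instance, `size_to_bytes 500m` yields the byte count used for `http.postBuffer` above.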
Bandwidth Optimization
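# Abort transfers that stall (these settings do not cap bandwidth)
git config http.lowSpeedLimit 1000 # Minimum acceptable speed in bytes per second
git config http.lowSpeedTime 300 # Abort after this many seconds below the limit
# Follow HTTP redirects (Git cannot resume an interrupted transfer)
git config http.followRedirects true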
# Bundle repositories for offline transfer
git bundle create repo.bundle --all
# Transfer repo.bundle file
git clone repo.bundle repo-clone
Memory Optimization
Memory Configuration
# Configure memory usage limits
git config pack.windowMemory 512m # Pack window memory
git config pack.deltaCacheSize 256m # Delta cache size
git config core.deltaBaseCacheLimit 256m # Delta base cache
# For large repositories, increase limits
git config pack.windowMemory 2g
git config pack.deltaCacheSize 1g
git config core.deltaBaseCacheLimit 1g
# Monitor memory usage during operations
/usr/bin/time -v git repack -ad
# Shows peak memory usage
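`pack.windowMemory` is a per-thread limit, so the worst case during a repack is roughly the thread count multiplied by that value. The sketch below divides a total budget across threads. `window_memory_for_budget` is a hypothetical helper, and the budget in the usage note is an arbitrary example.

```shell
#!/usr/bin/env bash
# pack.windowMemory applies per thread, so the worst case during a repack is
# roughly threads * windowMemory. Split a total budget (in MiB) across threads
# and print a value suitable for `git config pack.windowMemory`.
window_memory_for_budget() {
  local budget_mib="$1" threads="$2"
  if [ "$threads" -lt 1 ]; then
    echo "threads must be >= 1" >&2
    return 1
  fi
  echo "$(( budget_mib / threads ))m"
}
```

With a 2 GiB budget and 8 packing threads, `git config pack.windowMemory "$(window_memory_for_budget 2048 8)"` would set a 256m per-thread limit.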
Memory-Efficient Operations
# Process large files efficiently
git config core.bigFileThreshold 512m # Files above this size skip delta compression and are streamed
git config core.packedGitLimit 256m # Mmap limit for pack files
git config core.packedGitWindowSize 16m # Pack window size
# Note: the two settings below do not reduce memory usage
git config diff.noprefix false # Only controls the a/ and b/ prefixes in diff output
git config core.precomposeunicode false # macOS only: disables Unicode filename precomposition
CPU Optimization
Parallel Processing
# Enable parallel operations
git config pack.threads 0 # Use all CPU cores for packing
git config index.threads 0 # Parallel index operations
git config checkout.workers 0 # Parallel checkout
# Submodule operations
git config submodule.fetchJobs 4 # Parallel submodule fetches
# Check CPU utilization during operations
top -p "$(pgrep -d, git)" # top expects a comma-separated PID list
htop # Better visualization
Algorithm Selection
# Choose a diff algorithm (pick one; the last value set wins)
git config diff.algorithm histogram # Usually fast, often cleaner output than the default myers
git config diff.algorithm patience # Slower, can be more readable for complex changes
# Merge strategy optimization
git config merge.renameLimit 1000 # Increase rename detection limit
git config diff.renameLimit 1000
# Configure for your specific use case
git config diff.renames true # Enable rename detection
git config merge.renames true
Large File Handling
Git LFS Integration
# Install and configure Git LFS
git lfs install
# Track large files
git lfs track "*.psd"
git lfs track "*.zip"
git lfs track "*.mp4"
git lfs track "docs/*.pdf"
# Configure LFS settings
git config lfs.fetchrecentcommitsdays 7 # Recent commits to fetch
git config lfs.fetchrecentrefsdays 7 # Recent refs to fetch
git config lfs.activitytimeout 30 # Transfer timeout
# LFS performance optimization
git config lfs.concurrenttransfers 8 # Parallel transfers
git config lfs.tustransfers true # Use TUS protocol
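Before choosing patterns for `git lfs track`, it helps to see which files in the working tree are actually large. The following sketch lists candidates by size. `find_large_files` is a hypothetical helper, and the 10 MiB default threshold is an arbitrary assumption.

```shell
#!/usr/bin/env bash
# List files larger than a threshold (in KiB) under a directory, skipping .git.
# Candidates only: deciding what to `git lfs track` is still a manual choice.
find_large_files() {
  local dir="${1:-.}" min_kib="${2:-10240}"   # default threshold: 10 MiB (an assumption)
  find "$dir" -type f -size +"${min_kib}"k -not -path '*/.git/*' | sort
}
```

Running `find_large_files . 51200` from the repository root lists working-tree files over 50 MiB.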
Alternative Large File Strategies
# Use external storage for large files
# Store large files outside Git, reference by path
# Example: Database of large files
echo "large-file.zip" > large-files.txt
git add large-files.txt
# Actual file stored in S3, CDN, etc.
# Clean/smudge filters for automatic handling
git config filter.large-files.clean 'store-large-file.sh'
git config filter.large-files.smudge 'retrieve-large-file.sh'
# In .gitattributes:
# *.zip filter=large-files
Garbage Collection Tuning
Automatic GC Configuration
# Configure automatic garbage collection
git config gc.auto 6700 # Objects threshold for auto-gc
git config gc.autoPackLimit 50 # Pack files threshold
git config gc.autoDetach true # Run gc in background
# Disable auto-gc if needed (manual gc only)
git config gc.auto 0
# More aggressive auto-gc for active repositories
git config gc.auto 1000
git config gc.autoPackLimit 10
Manual GC Optimization
# Regular maintenance gc
git gc
# Aggressive gc (slower but more thorough)
git gc --aggressive --prune=now
# Progressive gc strategy
git gc --auto # Let Git decide
git repack -ad # Repack all objects
git prune --expire=2.weeks.ago # Remove old objects
# Monitor gc effectiveness
git count-objects -v
Reflog Management
# Configure reflog retention
git config gc.reflogExpire 90.days # Keep reflog entries for 90 days
git config gc.reflogExpireUnreachable 30.days # Unreachable entries
# For high-activity repositories, reduce retention
git config gc.reflogExpire 30.days
git config gc.reflogExpireUnreachable 7.days
# Manual reflog cleanup
git reflog expire --expire=30.days --all
git gc --prune=now
Scaling Strategies
Monorepo Optimization
# Monorepo-specific optimizations
git config core.preloadindex true
git config core.fscache true
git config core.untrackedCache true
git config feature.manyFiles true # Enable many-files mode (Git 2.24+)
# Use partial clone with sparse checkout
git clone --filter=blob:none --sparse <url>
git sparse-checkout set project1/ shared/
Multi-Repository Strategies
# Git worktrees for multiple branches
git worktree add -b feature/new-ui ../repo-ui
git worktree add -b hotfix/security ../repo-hotfix
# List and manage worktrees
git worktree list
git worktree remove ../repo-hotfix
# Submodules for component management
git submodule add https://github.com/user/component.git components/ui
git submodule update --init --recursive
git submodule foreach git pull origin main
Federation and Subtrees
# Git subtree for including external repositories
git subtree add --prefix=vendor/lib https://github.com/user/lib.git main --squash
git subtree pull --prefix=vendor/lib https://github.com/user/lib.git main --squash
# Federation using multiple remotes
git remote add upstream-a https://github.com/org/repo-a.git
git remote add upstream-b https://github.com/org/repo-b.git
git fetch --all
Monitoring and Metrics
Performance Monitoring
# Built-in Git performance tracing
GIT_TRACE=1 git status # Command execution trace
GIT_TRACE_PERFORMANCE=1 git log # Performance timing
GIT_TRACE_PACK_ACCESS=1 git show HEAD # Pack file access
GIT_CURL_VERBOSE=1 git push # HTTP transfer details
# System resource monitoring
iostat -x 1 # I/O statistics
vmstat 1 # Virtual memory statistics
sar -u 1 # CPU utilization
# Git-specific monitoring
git config --get-regexp ".*" # All configuration
git remote show origin # Remote information
git log --oneline --graph --decorate --all # Repository structure
Custom Performance Scripts
#!/bin/bash
# Performance monitoring script
echo "=== Git Performance Report ==="
echo "Date: $(date)"
echo
echo "Repository size:"
du -sh .git/
echo "Object statistics:"
git count-objects -v
echo "Pack file information:"
ls -lah .git/objects/pack/
echo "Recent performance (status command):"
time git status > /dev/null
echo "Recent performance (log command):"
time git log --oneline -100 > /dev/null
echo "Configuration affecting performance:"
git config --get-regexp "pack\.|core\.|index\.|gc\."
Benchmarking Operations
# Benchmark common operations
hyperfine 'git status'
hyperfine 'git log --oneline -100'
hyperfine 'git diff HEAD~10'
# Compare configurations
hyperfine --prepare 'git config core.preloadindex false' 'git status'
hyperfine --prepare 'git config core.preloadindex true' 'git status'
# Memory profiling with valgrind (if available)
valgrind --tool=massif git log --oneline -1000
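Where `hyperfine` is not installed, bash's `time` keyword can provide a rough substitute. This is a minimal sketch with no warm-up runs or outlier handling. `bench` is a hypothetical helper name, and `TIMEFORMAT='%R'` is the bash setting that makes `time` print elapsed seconds only.

```shell
#!/usr/bin/env bash
# bench RUNS COMMAND [ARGS...]: run a command several times and print the mean
# wall-clock time. A rough fallback for when hyperfine is not available.
bench() {
  local runs="$1"; shift
  if [ "$runs" -lt 1 ]; then
    echo "runs must be >= 1" >&2
    return 1
  fi
  local TIMEFORMAT='%R' i   # %R: elapsed seconds only
  for (( i = 0; i < runs; i++ )); do
    # The time keyword reports on stderr; the braces let us redirect and capture it
    { time "$@" > /dev/null 2>&1; } 2>&1
  done | awk '{ sum += $1 } END { printf "mean=%.3fs over %d runs\n", sum / NR, NR }'
}
```

For example, `bench 10 git status` prints one summary line. The command's own output is discarded so that only the timing lines reach `awk`.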
Platform-Specific Optimizations
Windows Optimizations
# Windows-specific performance settings
git config core.fscache true # Essential for Windows
git config core.preloadindex true # Parallel processing
git config core.autocrlf false # Disable line ending conversion
# Antivirus exclusions (add these paths):
# - .git folders in your repositories
# - Git installation directory
# - Global git config location
# Use Git Credential Manager
git config --global credential.helper manager # "manager-core" on older Git Credential Manager versions
# PowerShell vs Command Prompt
# PowerShell often has better performance for Git operations
macOS Optimizations
# macOS-specific settings
git config core.trustctime false # File system timestamp issues
git config core.precomposeunicode false # Disable Unicode preprocessing
# Use Homebrew Git instead of system Git
brew install git
which git # Should show /opt/homebrew/bin/git or /usr/local/bin/git
# File system events (FSEvents)
# Git's built-in fsmonitor uses FSEvents on macOS but is off by default; enable it with core.fsmonitor true (Git 2.37+)
Linux Optimizations
# Linux-specific optimizations
git config core.checkstat minimal # Reduce stat() calls
git config core.trustctime false # Ignore ctime changes
# File system considerations
# ext4: Generally good performance
# btrfs: May need copy-on-write disabled for .git directories
# xfs: Excellent for large repositories
# Disable CoW on btrfs for .git directories
chattr +C .git/
Troubleshooting Performance Issues
Common Slow Operations
# Slow git status
git config status.showUntrackedFiles no # Disable untracked file scanning
git config core.untrackedCache true # Cache untracked file status
echo "temp/" >> .git/info/exclude # Exclude temporary directories
# Slow git log
git log --oneline --no-merges # Avoid merge commit processing
git log --first-parent # Follow only first parent
git config log.decorate false # Disable decoration
# Slow git diff
git config diff.algorithm histogram # Faster diff algorithm
git config core.bigFileThreshold 100m # Skip diffing large files
Network-Related Issues
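# Slow clone/fetch operations
git config http.lowSpeedLimit 0 # Disable the low-speed abort
# Slow pushes
git config http.postBuffer 524288000 # 500MB buffer (HTTP pushes only)
git config pack.compression 1 # Lower compression for packs created locally
# Connection timeout issues
git config http.lowSpeedTime 600 # With a nonzero http.lowSpeedLimit: tolerate 10 minutes of slow transfer
git config core.sshCommand "ssh -o ServerAliveInterval=60" # Keep idle SSH connections alive
# Proxy-related performance
git config http.proxy http://proxy:8080 # Applies to both http:// and https:// remotes
git config http.https://github.com/.proxy http://proxy:8080 # Per-URL override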
Memory and CPU Issues
# Out of memory during operations
git config pack.windowMemory 128m # Reduce memory usage
git config pack.deltaCacheSize 64m
git config core.deltaBaseCacheLimit 64m
# High CPU usage during gc
git config pack.threads 2 # Limit CPU cores
git config gc.auto 0 # Disable auto-gc
# Memory leak investigation
ps aux | grep git # Check for multiple processes
valgrind --leak-check=full git <command> # Memory leak detection
Performance Testing and Validation
Automated Performance Testing
#!/bin/bash
# Git performance test suite
echo "Starting Git performance tests..."
# Test 1: Status performance
echo "Testing git status performance..."
time_status=$( { time git status > /dev/null 2>&1; } 2>&1 | grep real | cut -f2)
echo "Status time: $time_status"
# Test 2: Log performance
echo "Testing git log performance..."
time_log=$( { time git log --oneline -1000 > /dev/null 2>&1; } 2>&1 | grep real | cut -f2)
echo "Log time: $time_log"
# Test 3: Diff performance
echo "Testing git diff performance..."
time_diff=$( { time git diff HEAD~10 > /dev/null 2>&1; } 2>&1 | grep real | cut -f2)
echo "Diff time: $time_diff"
# Test 4: Repository size
echo "Repository metrics:"
git count-objects -v -H
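The timings captured by this script are in bash's `XmY.YYYs` format, which is awkward to compare or log. The sketch below converts such a value to plain seconds, including the minutes part. `real_to_seconds` is a hypothetical helper, and it assumes the default bash `time` output format.

```shell
#!/usr/bin/env bash
# Convert bash `time` output such as "1m2.500s" into plain seconds.
real_to_seconds() {
  local real="$1" minutes seconds
  minutes="${real%%m*}"          # text before the "m"
  seconds="${real#*m}"           # text after the "m"
  seconds="${seconds%s}"         # drop the trailing "s"
  awk -v m="$minutes" -v s="$seconds" 'BEGIN { printf "%.3f\n", m * 60 + s }'
}
```

In the script above, `real_to_seconds "$time_status"` would give a number that can be compared against a baseline.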
Continuous Performance Monitoring
#!/bin/bash
# performance-monitor.sh: performance regression detection
BASELINE_STATUS_TIME=0.5 # seconds
BASELINE_LOG_TIME=1.0 # seconds
# TIMEFORMAT=%R makes bash's time keyword print elapsed seconds only
current_status_time=$( { TIMEFORMAT='%R'; time git status > /dev/null 2>&1; } 2>&1 )
if (( $(echo "$current_status_time > $BASELINE_STATUS_TIME" | bc -l) )); then
echo "ALERT: git status performance degraded: ${current_status_time}s > ${BASELINE_STATUS_TIME}s"
fi
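The comparison above depends on `bc`, which is not installed everywhere, and it alerts on any run even slightly over the baseline. A variant using `awk` with a tolerance factor might look like the sketch below. `is_regression` is a hypothetical helper, and the 1.2 default (20% slack) is an illustrative assumption.

```shell
#!/usr/bin/env bash
# Return success (exit 0) when the measured time exceeds baseline * tolerance.
# The 1.2 default tolerance (20% slack) is an illustrative assumption.
is_regression() {
  local current="$1" baseline="$2" tolerance="${3:-1.2}"
  awk -v c="$current" -v b="$baseline" -v t="$tolerance" \
    'BEGIN { exit !(c > b * t) }'
}
```

For example, `if is_regression "$current_status_time" "$BASELINE_STATUS_TIME"; then echo "ALERT"; fi` fires only when the measurement is more than 20% above the baseline.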
Best Practices Summary
Configuration Checklist
# Essential performance configurations
git config --global core.preloadindex true
git config --global core.fscache true
git config --global pack.threads 0
git config --global index.threads 0
git config --global checkout.workers 0
# Repository-specific tuning
git config core.untrackedCache true
git config gc.auto 6700
git config pack.deltaCacheSize 256m
git config pack.windowMemory 512m
# Network optimization
git config http.postBuffer 524288000
git config core.compression 6
git config protocol.version 2
Workflow Optimizations
✅ Do:
- Use shallow clones for CI/CD environments
- Implement sparse checkout for large monorepos
- Run garbage collection regularly
- Monitor repository growth and performance
- Use Git LFS for binary assets
- Configure appropriate .gitignore patterns
❌ Don't:
- Store large binary files directly in Git history
- Disable automatic garbage collection without manual maintenance
- Use overly aggressive compression settings
- Ignore performance degradation warnings
- Keep unlimited reflog history in high-activity repos
Advanced Performance Topics
- Implement custom Git protocols for specialized use cases
- Design repository federation strategies
- Create performance monitoring dashboards
- Optimize for specific CI/CD platforms
- Develop automated performance regression testing
- Design disaster recovery procedures for large repositories