satmachineadmin/misc-docs/EMERGENCY_PROTOCOLS.md
padreug cd0d958c2c
consolidate docs
2025-11-03 22:23:10 +01:00


Satoshi Machine Admin - Emergency Protocols

DCA System Failure Recovery Guide

Document Version: 1.0
Last Updated: 2025-10-19
Extension Version: v0.0.1
Status: Production


Table of Contents

  1. Critical Failure Scenarios
  2. Emergency Protocol Checklist
  3. Recovery Procedures
  4. Prevention Measures
  5. Monitoring & Alerts
  6. Contact Information

Critical Failure Scenarios

1. Duplicate Transaction Processing ⚠️ CRITICAL

Risk Level: 🔴 CRITICAL
Impact: Same Lamassu transaction processed twice → double distribution to clients → financial loss

Detection Methods

  1. Dashboard Monitoring:

    • Sudden large balance deductions from client accounts
    • Multiple distribution entries for same timestamp
    • Commission wallet receiving duplicate amounts
  2. Database Query:

-- Find duplicate transactions
SELECT transaction_id, COUNT(*) as count,
       STRING_AGG(id::text, ', ') as record_ids
FROM satoshimachine.lamassu_transactions
GROUP BY transaction_id
HAVING COUNT(*) > 1;
  3. Automated Alert Triggers:
    • Same txn_id appears in multiple processing cycles
    • Client balance drops more than expected based on deposit ratios
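The same duplicate check can be run in application code on a batch of fetched records before anything is distributed. A minimal sketch, assuming each record is a dict with a `transaction_id` key; the helper name `find_duplicate_transaction_ids` is hypothetical and not part of the extension:

```python
from collections import Counter
from typing import Iterable, Mapping


def find_duplicate_transaction_ids(records: Iterable[Mapping]) -> dict:
    """Return {transaction_id: count} for every ID seen more than once."""
    counts = Counter(record["transaction_id"] for record in records)
    return {txid: n for txid, n in counts.items() if n > 1}


# Illustrative data only
rows = [
    {"id": 1, "transaction_id": "tx-a"},
    {"id": 2, "transaction_id": "tx-b"},
    {"id": 3, "transaction_id": "tx-a"},
]
duplicates = find_duplicate_transaction_ids(rows)
```

An empty result means the batch is safe to process; any entry should trigger the immediate response that follows.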

Immediate Response

  1. STOP POLLING IMMEDIATELY - Disable automatic background task
  2. Document all duplicate entries with screenshots
  3. Identify affected clients and amounts
  4. Calculate total over-distribution amount

Recovery Steps

-- Step 1: Identify duplicate distributions
SELECT lt.transaction_id, lt.id, lt.created_at, lt.base_amount,
       COUNT(dp.id) as distribution_count
FROM satoshimachine.lamassu_transactions lt
LEFT JOIN satoshimachine.dca_payments dp ON dp.lamassu_transaction_id = lt.id
GROUP BY lt.id
HAVING COUNT(dp.id) > (SELECT COUNT(*) FROM satoshimachine.dca_clients WHERE remaining_balance > 0);

-- Step 2: Calculate over-distributed amounts per client
-- Replace <duplicate_transaction_ids> with the ids returned by Step 1.
-- The expected amount per client still has to be calculated manually.
SELECT client_id,
       SUM(amount_sats) as total_received
FROM satoshimachine.dca_payments
WHERE lamassu_transaction_id IN (<duplicate_transaction_ids>)
GROUP BY client_id;

Manual Correction:

  1. Calculate the correct distribution amounts
  2. Either create compensating negative adjustments (if supported) or deduct the excess from future distributions until balanced
  3. Document all corrections in the audit log
  4. Notify affected clients if the amount is material
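Deducting an over-distribution from future distributions means carrying an outstanding amount forward until it is recovered. A minimal sketch, assuming amounts are integer sats; `apply_clawback` is a hypothetical helper, not an existing function of the extension:

```python
from typing import List, Tuple


def apply_clawback(owed_sats: int, upcoming: List[int]) -> Tuple[List[int], int]:
    """Reduce upcoming distributions until the over-distributed amount is recovered.

    Returns the adjusted distribution amounts and whatever is still owed.
    """
    adjusted = []
    for amount in upcoming:
        deduction = min(owed_sats, amount)  # never push a distribution below zero
        adjusted.append(amount - deduction)
        owed_sats -= deduction
    return adjusted, owed_sats


# A client who received 1,500 sats too much, with three distributions pending
adjusted, still_owed = apply_clawback(1500, [1000, 1000, 1000])
```

Each deduction should still be recorded in the audit log, since the client's payment history will otherwise show unexplained reduced amounts.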

Prevention Measures

Required Code Changes:

# Add to transaction_processor.py BEFORE processing
existing = await get_lamassu_transaction_by_txid(txn_id)
if existing:
    logger.warning(f"⚠️ Transaction {txn_id} already processed, skipping")
    return None

Required Database Change:

ALTER TABLE satoshimachine.lamassu_transactions
ADD CONSTRAINT unique_transaction_id UNIQUE (transaction_id);
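The effect of the unique constraint can be shown in isolation. The sketch below uses an in-memory SQLite table as a stand-in for `lamassu_transactions`; the reduced table layout and the `record_transaction` helper are assumptions made for illustration. The second insert of the same `transaction_id` is rejected by the database itself, even if the application-level check is bypassed:

```python
import sqlite3


def record_transaction(conn: sqlite3.Connection, transaction_id: str) -> bool:
    """Insert a transaction row; return False if the ID was already recorded."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO lamassu_transactions (transaction_id) VALUES (?)",
                (transaction_id,),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lamassu_transactions ("
    "id INTEGER PRIMARY KEY, transaction_id TEXT NOT NULL UNIQUE)"
)
first = record_transaction(conn, "tx-a")   # accepted
second = record_transaction(conn, "tx-a")  # rejected by the UNIQUE constraint
```

The code check gives a clean log message, while the constraint acts as the last line of defence when two polling cycles race each other.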

2. SSH/Database Connection Loss

Risk Level: 🟡 MEDIUM
Impact: Polling stops → transactions not processed → clients not receiving Bitcoin on time

Detection Methods

  1. Dashboard Indicators:

    • No new records in transaction history for > 24 hours
    • "Test Connection" button fails in admin configuration
    • Background task logs show SSH connection errors
  2. Database Query:

-- Check last successful poll
SELECT MAX(created_at) as last_transaction,
       EXTRACT(EPOCH FROM (NOW() - MAX(created_at)))/3600 as hours_since_last
FROM satoshimachine.lamassu_transactions;

-- If hours_since_last > 24, investigate immediately
  3. Log File Check:
# Check LNBits logs for SSH errors
tail -n 100 /home/padreug/AioLabs/Git/lnbits-extensions/lnbits/data/logs/lnbits.log | grep -i "ssh\|connection"

Immediate Response

  1. Verify network connectivity to Lamassu server
  2. Test SSH credentials manually:
ssh postgres@<lamassu-host> -p <port> -i <private-key-path>
  3. Check firewall rules and network changes
  4. Verify Lamassu server is running and accessible

Recovery Steps

Option A: Credential Issues

  1. Regenerate SSH keys if compromised
  2. Update authorized_keys on Lamassu server
  3. Update configuration in admin dashboard
  4. Test connection before re-enabling polling

Option B: Network Issues

  1. Coordinate with network admin to restore connectivity
  2. Verify IP whitelisting if applicable
  3. Test connection stability before resuming

Option C: Lamassu Server Issues

  1. Contact Lamassu administrator
  2. Verify PostgreSQL service is running
  3. Check database is accessible

Post-Recovery:

# System automatically catches up using last_polled_at timestamp
# Run manual poll to process missed transactions
POST /api/v1/dca/manual-poll

Prevention Measures

  1. SSH Key Authentication (more reliable than password):
# Generate dedicated key for this service
ssh-keygen -t ed25519 -f ~/.ssh/satmachine_lamassu -C "satmachine-polling"
  2. Connection Monitoring: Implement a daily health check
  3. Retry Logic: Add exponential backoff to the polling code
  4. Alert System: Send an email/SMS alert when polling fails for > 2 hours
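The retry logic mentioned above could look roughly like the following. This is a sketch under assumptions: `backoff_delays` and `retry_with_backoff` are hypothetical helpers, the delay values are illustrative, and real polling code should catch only connection-related exceptions rather than every `Exception`:

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 1.0, cap: float = 60.0) -> List[float]:
    """Delays in seconds before each retry: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * (2 ** i)) for i in range(retries)]


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    attempts: int = 5,
    base: float = 1.0,
    cap: float = 60.0,
) -> T:
    """Run `operation` up to `attempts` times, sleeping longer after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delays = backoff_delays(attempts - 1, base, cap)
    for attempt in range(attempts):
        try:
            return await operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            await asyncio.sleep(delays[attempt])
    raise AssertionError("unreachable")
```

Capping the delay keeps a long outage from pushing the next attempt arbitrarily far into the future, so the alert threshold above remains meaningful.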

3. Payment Distribution Failures

Risk Level: 🔴 CRITICAL
Impact: Commission deducted and client balances reduced, but transfers fail → money stuck in limbo

Detection Methods

  1. Dashboard Monitoring:

    • Client balances decrease but payment status shows "failed"
    • Commission wallet balance doesn't increase as expected
    • Error notifications in payment processing
  2. Database Query:

-- Find stuck/failed payments (older than 1 hour, not completed)
SELECT dp.id, dp.client_id, dp.amount_sats, dp.status, dp.created_at,
       c.username, dp.payment_hash
FROM satoshimachine.dca_payments dp
JOIN satoshimachine.dca_clients c ON dp.client_id = c.id
WHERE dp.status != 'completed'
  AND dp.created_at < NOW() - INTERVAL '1 hour'
ORDER BY dp.created_at DESC;
  3. Wallet Balance Check:
-- Expected total commission; compare it manually against the actual
-- commission wallet balance shown in LNBits
SELECT
  SUM(commission_amount) as total_commission_expected
FROM satoshimachine.lamassu_transactions;

Immediate Response

  1. STOP POLLING - Prevent more transactions from failing

  2. Identify root cause:

    • Insufficient balance in source wallet?
    • Invalid wallet adminkey?
    • LNBits API issues?
    • Network connectivity to LNBits?
  3. Document all failed payments with IDs and amounts

Recovery Steps

Step 1: Verify Wallet Configuration

# Test that commission wallet is accessible
curl -X GET https://<lnbits-host>/api/v1/wallet \
  -H "X-Api-Key: <commission-wallet-adminkey>"

Step 2: Check Wallet Balance

  • Ensure commission wallet has sufficient balance
  • Verify no wallet locks or restrictions

Step 3: Manual Retry Process

Option A: Retry via API (if retry endpoint exists)

# Retry failed payment
POST /api/v1/dca/payments/{payment_id}/retry

Option B: Manual Recreation (if no retry available)

  1. Query failed payment details
  2. Mark original payment as "cancelled"
  3. Create new payment entry with same parameters
  4. Process through normal payment flow
  5. Update client records

Step 4: Verify Reconciliation

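-- After recovery, verify all clients balance correctly.
-- Deposits and payments are summed in separate subqueries: joining both
-- tables to dca_clients directly would multiply rows and inflate the totals.
SELECT
  c.id,
  c.username,
  c.remaining_balance,
  COALESCE(d.total_deposits, 0) as total_deposits,
  COALESCE(p.total_payments, 0) as total_payments
FROM satoshimachine.dca_clients c
LEFT JOIN (
  SELECT client_id, SUM(amount) as total_deposits
  FROM satoshimachine.dca_deposits
  WHERE status = 'confirmed'
  GROUP BY client_id
) d ON d.client_id = c.id
LEFT JOIN (
  SELECT client_id, SUM(amount_sats) as total_payments
  FROM satoshimachine.dca_payments
  WHERE status = 'completed'
  GROUP BY client_id
) p ON p.client_id = c.id;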

Prevention Measures

Required Code Changes:

# Add pre-flight check in transaction_processor.py
async def verify_commission_wallet_accessible() -> bool:
    """Verify commission wallet exists and is accessible before processing"""
    try:
        response = await wallet_api_call(commission_wallet_id)
        if not response.ok:
            raise Exception("Commission wallet not accessible")
        return True
    except Exception as e:
        logger.error(f"Pre-flight check failed: {e}")
        return False

# Add transaction rollback on distribution failure
async def process_transaction_with_rollback():
    transaction_record = None
    try:
        # Deduct balances
        # Create distributions
        # Transfer funds
        # Mark complete
        ...  # placeholder: the steps above still need to be implemented
    except Exception as e:
        # ROLLBACK: Restore client balances
        # Mark transaction as failed
        # Alert admin
        logger.error(f"Transaction processing failed, rolling back: {e}")
        raise

Monitoring:

  1. Alert if any payment remains in non-completed status > 15 minutes
  2. Daily reconciliation of commission wallet balance
  3. Automated testing of wallet accessibility
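The first monitoring rule can be expressed as a simple filter over payment records. A minimal sketch, assuming each payment is a dict with `status` and a timezone-aware `created_at`; `find_stuck_payments` is a hypothetical helper and the sample data is illustrative:

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Mapping, Optional


def find_stuck_payments(
    payments: Iterable[Mapping],
    max_age: timedelta = timedelta(minutes=15),
    now: Optional[datetime] = None,
) -> List[Mapping]:
    """Return payments that are not 'completed' and older than `max_age`."""
    now = now or datetime.now(timezone.utc)
    return [
        p for p in payments
        if p["status"] != "completed" and now - p["created_at"] > max_age
    ]


# Illustrative data with a fixed reference time
now = datetime(2025, 10, 19, 12, 0, tzinfo=timezone.utc)
payments = [
    {"id": 1, "status": "completed", "created_at": now - timedelta(hours=2)},
    {"id": 2, "status": "pending", "created_at": now - timedelta(minutes=5)},
    {"id": 3, "status": "failed", "created_at": now - timedelta(minutes=40)},
]
stuck = find_stuck_payments(payments, now=now)
```

Passing `now` explicitly keeps the check deterministic in tests; in a scheduled job it can be left out.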

4. Balance Discrepancies

Risk Level: 🟡 MEDIUM
Impact: Client balances don't match deposit/payment records → accounting errors, audit failures

Detection Methods

Full Reconciliation Query:

-- Complete balance reconciliation report.
-- Deposits and payments are pre-aggregated per client: joining both raw tables
-- to dca_clients at once would multiply rows and inflate the sums.
WITH d AS (
  SELECT client_id, SUM(amount) as total_deposits
  FROM satoshimachine.dca_deposits
  WHERE status = 'confirmed'
  GROUP BY client_id
), p AS (
  SELECT client_id, SUM(amount_sats) as total_distributed
  FROM satoshimachine.dca_payments
  WHERE status = 'completed'
  GROUP BY client_id
)
SELECT
  c.id,
  c.username,
  c.remaining_balance as current_balance,
  COALESCE(d.total_deposits, 0) as total_deposits,
  COALESCE(p.total_distributed, 0) as total_distributed,
  COALESCE(d.total_deposits, 0) - COALESCE(p.total_distributed, 0) as calculated_balance,
  c.remaining_balance - (COALESCE(d.total_deposits, 0) - COALESCE(p.total_distributed, 0)) as discrepancy
FROM satoshimachine.dca_clients c
LEFT JOIN d ON d.client_id = c.id
LEFT JOIN p ON p.client_id = c.id
WHERE ABS(c.remaining_balance - (COALESCE(d.total_deposits, 0) - COALESCE(p.total_distributed, 0))) > 1;
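The arithmetic behind the reconciliation report is small enough to sketch directly. The version below assumes deposits and payments are dicts using the same field names as the tables (`amount`, `amount_sats`, `status`); `balance_discrepancy` is a hypothetical helper:

```python
from typing import Iterable, Mapping


def balance_discrepancy(
    current_balance: int,
    deposits: Iterable[Mapping],
    payments: Iterable[Mapping],
) -> int:
    """current_balance - (confirmed deposits - completed payments)."""
    total_deposits = sum(d["amount"] for d in deposits if d["status"] == "confirmed")
    total_payments = sum(p["amount_sats"] for p in payments if p["status"] == "completed")
    return current_balance - (total_deposits - total_payments)


# Illustrative data: only confirmed deposits and completed payments count
deposits = [
    {"amount": 1000, "status": "confirmed"},
    {"amount": 500, "status": "pending"},
]
payments = [
    {"amount_sats": 300, "status": "completed"},
    {"amount_sats": 200, "status": "failed"},
]
matching = balance_discrepancy(700, deposits, payments)
mismatch = balance_discrepancy(650, deposits, payments)
```

A positive result means the stored balance is higher than the records justify; a negative one means the client's stored balance is too low.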

Immediate Response

  1. Run full reconciliation query above
  2. Export complete audit trail to CSV
  3. Document discrepancy amounts per client
  4. Identify pattern (all clients affected? specific time period?)

Recovery Steps

For Each Discrepancy:

  1. Trace Transaction History:
-- Get complete transaction history for client
SELECT 'DEPOSIT' as type, id, amount, status, created_at, confirmed_at
FROM satoshimachine.dca_deposits
WHERE client_id = <client_id>
UNION ALL
SELECT 'PAYMENT' as type, id, amount_sats, status, created_at, NULL
FROM satoshimachine.dca_payments
WHERE client_id = <client_id>
ORDER BY created_at;
  2. Manual Recalculation:

    • Sum all confirmed deposits
    • Subtract all completed payments
    • Compare to current balance
    • Identify missing/extra transactions
  3. Correction Methods:

Option A: Adjustment Entry (Recommended)

-- Create compensating deposit for positive discrepancy
INSERT INTO satoshimachine.dca_deposits (client_id, amount, status, note)
VALUES (<client_id>, <adjustment_amount>, 'confirmed', 'Balance correction - reconciliation 2025-10-19');

-- OR Create compensating payment for negative discrepancy
INSERT INTO satoshimachine.dca_payments (client_id, amount_sats, status, note)
VALUES (<client_id>, <adjustment_amount>, 'completed', 'Balance correction - reconciliation 2025-10-19');

Option B: Direct Balance Update (Use with extreme caution)

-- ONLY if audit trail is complete and discrepancy is unexplained
UPDATE satoshimachine.dca_clients
SET remaining_balance = <correct_balance>,
    updated_at = NOW()
WHERE id = <client_id>;

-- MUST document in separate audit log

Prevention Measures

Daily Automated Reconciliation:

# Add to tasks.py
async def daily_reconciliation_check():
    """Run daily at 00:00 UTC"""
    discrepancies = await find_balance_discrepancies()
    if discrepancies:
        await send_alert_to_admin(discrepancies)
        await log_reconciliation_report(discrepancies)

Database Constraints:

-- Prevent negative balances
ALTER TABLE satoshimachine.dca_clients
ADD CONSTRAINT positive_balance CHECK (remaining_balance >= 0);

-- Prevent confirmed deposits with zero amount
ALTER TABLE satoshimachine.dca_deposits
ADD CONSTRAINT positive_deposit CHECK (amount > 0);

Audit Enhancements:

  • Store before/after balance with each transaction
  • Implement change log table for all balance modifications
  • Automated snapshots of all balances daily

5. Commission Calculation Errors

Risk Level: 🟠 HIGH
Impact: Wrong commission rate applied → over/under collection → revenue loss or client overcharge

Detection Methods

Verification Query:

-- Verify all commission calculations are mathematically correct
SELECT
  id,
  transaction_id,
  crypto_atoms,
  commission_percentage,
  discount,
  base_amount,
  commission_amount,
  -- Recalculate expected values
  ROUND(crypto_atoms / (1 + (commission_percentage * (100 - discount) / 100))) as expected_base,
  ROUND(crypto_atoms - (crypto_atoms / (1 + (commission_percentage * (100 - discount) / 100)))) as expected_commission,
  -- Calculate differences
  base_amount - ROUND(crypto_atoms / (1 + (commission_percentage * (100 - discount) / 100))) as base_difference,
  commission_amount - ROUND(crypto_atoms - (crypto_atoms / (1 + (commission_percentage * (100 - discount) / 100)))) as commission_difference
FROM satoshimachine.lamassu_transactions
WHERE ABS(base_amount - ROUND(crypto_atoms / (1 + (commission_percentage * (100 - discount) / 100)))) > 1
   OR ABS(commission_amount - ROUND(crypto_atoms - (crypto_atoms / (1 + (commission_percentage * (100 - discount) / 100))))) > 1;

Manual Spot Check:

Example: 2000 GTQ → 266,800 sats (3% commission, 0% discount)

Expected calculation:
- Effective commission = 0.03 * (100 - 0) / 100 = 0.03
- Base amount = 266800 / (1 + 0.03) = 259,029 sats (rounded)
- Commission = 266800 - 259029 = 7,771 sats

Verify these match database values.
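For spot checks outside the database, the same formula can be reproduced in a few lines. This is a sketch, not the extension's implementation: the rates are passed as strings so `Decimal` can parse them exactly, rounding is half-up to mirror SQL `ROUND`, and the commission is taken as the remainder so that base plus commission always equals the input amount (a design choice of this sketch):

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Tuple


def calculate_commission(
    crypto_atoms: int, commission_percentage: str, discount: str
) -> Tuple[int, int]:
    """Split crypto_atoms into (base_amount, commission_amount).

    commission_percentage is a fraction (e.g. "0.03"), discount a percentage (e.g. "10").
    """
    effective = Decimal(commission_percentage) * (Decimal(100) - Decimal(discount)) / Decimal(100)
    base = (Decimal(crypto_atoms) / (Decimal(1) + effective)).quantize(
        Decimal(1), rounding=ROUND_HALF_UP
    )
    base_amount = int(base)
    return base_amount, crypto_atoms - base_amount


# 5% commission with a 10% discount gives an effective rate of 4.5%
with_discount = calculate_commission(100000, "0.05", "10")
# 3% commission, no discount: 103000 / 1.03 is exactly 100000
no_discount = calculate_commission(103000, "0.03", "0")
```

Running a handful of stored transactions through such a function and comparing against `base_amount` and `commission_amount` gives an independent check of the verification query.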

Immediate Response

  1. Run verification query to identify all affected transactions
  2. Calculate total under/over collection amount
  3. Determine if pattern (all transactions? specific time period? specific commission rate?)
  4. STOP POLLING if active miscalculation detected

Recovery Steps

Step 1: Quantify Impact

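-- Total revenue impact
WITH calc_check AS (
  SELECT
    created_at,
    commission_amount - ROUND(crypto_atoms - (crypto_atoms / (1 + (commission_percentage * (100 - discount) / 100)))) as commission_difference
  FROM satoshimachine.lamassu_transactions
)
SELECT
  COUNT(*) as affected_transactions,
  SUM(commission_difference) as total_revenue_impact,
  MIN(created_at) as first_occurrence,
  MAX(created_at) as last_occurrence
FROM calc_check
WHERE ABS(commission_difference) > 1;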

Step 2: Client Impact Assessment

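-- Which clients were affected and by how much
-- Replace <affected_transaction_ids> with the ids returned by the verification query.
-- base_amount and the recalculated base are per-transaction totals, so client_impact is
-- the shortfall of the transactions each client took part in, not that client's share.
SELECT
  c.id,
  c.username,
  COUNT(lt.id) as affected_transactions,
  SUM(lt.base_amount) as total_distributed,
  SUM(ROUND(lt.crypto_atoms / (1 + (lt.commission_percentage * (100 - lt.discount) / 100)))) as should_have_distributed,
  SUM(ROUND(lt.crypto_atoms / (1 + (lt.commission_percentage * (100 - lt.discount) / 100))) - lt.base_amount) as client_impact
FROM satoshimachine.lamassu_transactions lt
JOIN satoshimachine.dca_payments dp ON dp.lamassu_transaction_id = lt.id
JOIN satoshimachine.dca_clients c ON dp.client_id = c.id
WHERE lt.id IN (<affected_transaction_ids>)
GROUP BY c.id, c.username;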

Step 3: Correction

If Under-Collected Commission:

  • Revenue lost to business
  • Client received correct amount
  • No client-facing correction needed
  • Document for accounting

If Over-Collected Commission:

  • Client under-distributed
  • Create compensating payments to affected clients:
-- Add to client balances
UPDATE satoshimachine.dca_clients c
SET remaining_balance = remaining_balance + adjustment.amount
FROM (
  -- Calculate adjustment per client
  SELECT client_id, SUM(should_have_distributed - actual_distributed) as amount
  FROM affected_transactions_analysis
  GROUP BY client_id
) adjustment
WHERE c.id = adjustment.client_id;

Step 4: Fix Code Bug

  • Identify root cause in transaction_processor.py
  • Add unit test for failed scenario
  • Deploy fix
  • Verify with test transaction

Prevention Measures

Unit Tests (Add to test suite):

def test_commission_calculation_scenarios():
    """Test edge cases for commission calculation"""
    test_cases = [
        # (crypto_atoms, commission%, discount%, expected_base, expected_commission)
        (266800, 0.03, 0.0, 259029, 7771),
        (100000, 0.05, 10.0, 95694, 4306),  # With discount
        (1, 0.03, 0.0, 1, 0),  # Minimum amount: 1 / 1.03 rounds to 1
        # Add more edge cases
    ]
    for case in test_cases:
        result = calculate_commission(*case[:3])
        assert result.base_amount == case[3]
        assert result.commission_amount == case[4]

Calculation Verification (Add to processing):

# After calculation, verify math is correct
calculated_total = base_amount + commission_amount
assert abs(calculated_total - crypto_atoms) <= 1, "Commission calculation error detected"

Audit Trail:

  • Store all calculation parameters with each transaction
  • Log formula used and intermediate values
  • Enable post-processing verification

6. Wallet Key Rotation/Invalidation

Risk Level: 🟠 HIGH
Impact: Commission wallet adminkey changes → can't receive commission payments → processing halts

Detection Methods

  1. API Error Responses:

    • Payment API returns 401/403 authentication errors
    • "Invalid API key" messages in logs
  2. Wallet Balance Check:

-- Commission not accumulating despite transactions
SELECT
  SUM(commission_amount) as expected_total_commission
FROM satoshimachine.lamassu_transactions
WHERE created_at > '<date-of-last-known-good-balance>';
-- Compare the result to the actual wallet balance in the LNBits dashboard
  3. Manual Test:
# Test commission wallet adminkey
curl -X GET https://<lnbits-host>/api/v1/wallet \
  -H "X-Api-Key: <commission-wallet-adminkey>"

# Should return wallet details, not auth error

Immediate Response

  1. STOP POLLING - There is no point processing transactions if payments cannot be distributed
  2. Identify if key was intentionally rotated or compromised
  3. Obtain new valid adminkey for commission wallet
  4. Verify source wallet adminkey is also still valid

Recovery Steps

Step 1: Update Configuration

  1. Access admin dashboard
  2. Navigate to Configuration section
  3. Update commission wallet adminkey
  4. Update source wallet adminkey if also affected
  5. Test Connection before saving

Step 2: Verify Configuration Persisted

-- Check configuration was saved correctly
SELECT commission_wallet_id,
       LEFT(commission_wallet_adminkey, 10) || '...' as key_preview,
       updated_at
FROM satoshimachine.lamassu_config
ORDER BY updated_at DESC
LIMIT 1;

Step 3: Reprocess Failed Transactions

  • Identify transactions that failed due to auth errors
  • Mark for retry or manually reprocess
  • Verify commission payments complete successfully

Step 4: Resume Operations

  1. Test with manual poll first
  2. Verify single transaction processes completely
  3. Re-enable automatic polling
  4. Monitor for 24 hours

Prevention Measures

Key Management Documentation:

  • Document which LNBits wallets are used for commission/source
  • Store backup admin keys in secure location (password manager)
  • Define key rotation procedure with testing steps
  • Require testing in staging before production changes

Configuration Validation:

# Add to config save endpoint in views_api.py
async def validate_wallet_keys(commission_key, source_key):
    """Test wallet keys before saving configuration"""
    # Test commission wallet
    commission_valid = await test_wallet_access(commission_key)
    if not commission_valid:
        raise ValueError("Commission wallet key is invalid")

    # Test source wallet (if applicable)
    if source_key:
        source_valid = await test_wallet_access(source_key)
        if not source_valid:
            raise ValueError("Source wallet key is invalid")

    return True

Automated Monitoring:

  • Daily health check of wallet accessibility
  • Alert if wallet API calls start failing
  • Backup key verification in secure environment

Emergency Protocol Checklist

Use this checklist when ANY error is detected in the DCA system.

Phase 1: Immediate Actions (First 5 Minutes)

  • Stop Automatic Processing

    • Disable background polling task
    • Verify no transactions are currently processing
    • Document time polling was stopped
  • Assess Severity

    • Is money at risk? (duplicate processing, failed payments)
    • Are clients affected? (missing distributions, balance errors)
    • Is this blocking operations? (connection loss, wallet issues)
  • Initial Documentation

    • Take screenshots of error messages
    • Note exact timestamp of error detection
    • Record current system state (balances, last transaction, etc.)
  • Notify Stakeholders

    • Alert system administrator
    • Notify clients if distributions will be delayed > 24 hours
    • Escalate if financial impact > threshold

Phase 2: Investigation (15-30 Minutes)

  • Collect Diagnostic Information

    • Run relevant SQL queries from scenarios above
    • Check LNBits logs: /lnbits/data/logs/
    • Review recent configuration changes
    • Test external connections (SSH, wallets)
  • Identify Root Cause

    • Match symptoms to failure scenarios above
    • Determine if human error, system failure, or external issue
    • Estimate scope of impact (time range, # clients, # transactions)
  • Document Findings

    • Record root cause analysis
    • List all affected records (transaction IDs, client IDs)
    • Calculate financial impact (over/under distributed amounts)
    • Take database snapshots for audit trail

Phase 3: Recovery (30 Minutes - 2 Hours)

  • Fix Root Cause

    • Apply code fix if bug
    • Update configuration if settings issue
    • Restore connection if network issue
    • Refer to specific recovery steps in scenarios above
  • Data Correction

    • Run reconciliation queries
    • Execute correction SQL statements
    • Verify all client balances are accurate
    • Ensure audit trail is complete
  • Verification

    • Test fix with single transaction
    • Verify wallets are accessible
    • Confirm connections are stable
    • Run full reconciliation report

Phase 4: Resumption (After Verification)

  • Gradual Restart

    • Process one manual poll successfully
    • Monitor for errors during processing
    • Verify distributions complete end-to-end
    • Check commission payments arrive correctly
  • Re-enable Automation

    • Turn on background polling task
    • Set monitoring alerts
    • Document in system log
  • Enhanced Monitoring

    • Watch closely for 24 hours
    • Run reconciliation after next 3-5 transactions
    • Verify no recurrence of issue

Phase 5: Post-Incident (24-48 Hours After)

  • Complete Post-Mortem

    • Document full timeline of incident
    • Record exact root cause and fix applied
    • Calculate total impact (financial, time, clients affected)
    • Identify what went well and what could improve
  • Implement Safeguards

    • Add prevention measures from scenario sections
    • Implement new monitoring/alerts
    • Add automated tests for this failure mode
    • Update runbooks and documentation
  • Stakeholder Communication

    • Send incident report to management
    • Notify affected clients if applicable
    • Document lessons learned
    • Update emergency contact procedures if needed

Recovery Procedures

Full System Reconciliation

Run this complete reconciliation procedure weekly or after any incident:

-- ============================================
-- FULL SYSTEM RECONCILIATION REPORT
-- ============================================

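-- 1. Client Balance Reconciliation
-- Deposits and payments are aggregated separately before joining, so the
-- sums are not inflated by the deposit x payment row combinations.
WITH deposits AS (
  SELECT client_id, SUM(amount) as total_deposits, COUNT(*) as deposit_count
  FROM satoshimachine.dca_deposits
  WHERE status = 'confirmed'
  GROUP BY client_id
),
payments AS (
  SELECT client_id,
         SUM(CASE WHEN status = 'completed' THEN amount_sats ELSE 0 END) as total_payments,
         COUNT(*) as payment_count
  FROM satoshimachine.dca_payments
  GROUP BY client_id
),
client_financials AS (
  SELECT
    c.id,
    c.username,
    c.remaining_balance as current_balance,
    COALESCE(d.total_deposits, 0) as total_deposits,
    COALESCE(p.total_payments, 0) as total_payments,
    COALESCE(d.deposit_count, 0) as deposit_count,
    COALESCE(p.payment_count, 0) as payment_count
  FROM satoshimachine.dca_clients c
  LEFT JOIN deposits d ON d.client_id = c.id
  LEFT JOIN payments p ON p.client_id = c.id
)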
SELECT
  id,
  username,
  current_balance,
  total_deposits,
  total_payments,
  (total_deposits - total_payments) as calculated_balance,
  (current_balance - (total_deposits - total_payments)) as discrepancy,
  deposit_count,
  payment_count,
  CASE
    WHEN ABS(current_balance - (total_deposits - total_payments)) <= 1 THEN '✅ OK'
    ELSE '⚠️ MISMATCH'
  END as status
FROM client_financials
ORDER BY ABS(current_balance - (total_deposits - total_payments)) DESC;

-- 2. Transaction Processing Verification
SELECT
  COUNT(*) as total_transactions,
  SUM(crypto_atoms) as total_sats_processed,
  SUM(base_amount) as total_distributed,
  SUM(commission_amount) as total_commission,
  MIN(created_at) as first_transaction,
  MAX(created_at) as last_transaction
FROM satoshimachine.lamassu_transactions;

-- 3. Failed/Pending Payments Check
SELECT
  status,
  COUNT(*) as count,
  SUM(amount_sats) as total_amount,
  MIN(created_at) as oldest,
  MAX(created_at) as newest
FROM satoshimachine.dca_payments
GROUP BY status
ORDER BY
  CASE status
    WHEN 'failed' THEN 1
    WHEN 'pending' THEN 2
    WHEN 'completed' THEN 3
  END;

-- 4. Unconfirmed Deposits Check (sitting too long)
SELECT
  id,
  client_id,
  amount,
  status,
  created_at,
  EXTRACT(EPOCH FROM (NOW() - created_at))/3600 as hours_pending
FROM satoshimachine.dca_deposits
WHERE status = 'pending'
  AND created_at < NOW() - INTERVAL '48 hours'
ORDER BY created_at;

-- 5. Commission Calculation Verification (sample check)
SELECT
  id,
  transaction_id,
  crypto_atoms,
  commission_percentage,
  discount,
  base_amount,
  commission_amount,
  ROUND(crypto_atoms / (1 + (commission_percentage * (100 - discount) / 100))) as expected_base,
  base_amount - ROUND(crypto_atoms / (1 + (commission_percentage * (100 - discount) / 100))) as difference
FROM satoshimachine.lamassu_transactions
ORDER BY created_at DESC
LIMIT 20;

Emergency Access Procedures

If Admin Dashboard Inaccessible:

  1. Direct Database Access:
# Connect to LNBits database
sqlite3 /home/padreug/AioLabs/Git/lnbits-extensions/lnbits/data/database.sqlite

# Or PostgreSQL if used
psql -h localhost -U lnbits -d lnbits
  2. Direct Configuration Update:
-- Update Lamassu config directly
UPDATE satoshimachine.lamassu_config
SET polling_enabled = false
WHERE id = (SELECT MAX(id) FROM satoshimachine.lamassu_config);
  3. Manual Client Balance Update:
-- ONLY in emergency when dashboard unavailable
UPDATE satoshimachine.dca_clients
SET remaining_balance = <correct_amount>
WHERE id = <client_id>;
-- MUST document this action in incident log

If Background Task Won't Stop:

# Find LNBits process
ps aux | grep lnbits

# Restart LNBits service (will stop all background tasks)
systemctl restart lnbits
# OR if running manually:
pkill -f lnbits
uv run lnbits

Data Export for Audit

Complete Audit Trail Export:

# Export all DCA-related tables to CSV
sqlite3 -header -csv /path/to/lnbits/database.sqlite \
  "SELECT * FROM satoshimachine.lamassu_transactions;" \
  > lamassu_transactions_export_$(date +%Y%m%d_%H%M%S).csv

sqlite3 -header -csv /path/to/lnbits/database.sqlite \
  "SELECT * FROM satoshimachine.dca_payments;" \
  > dca_payments_export_$(date +%Y%m%d_%H%M%S).csv

sqlite3 -header -csv /path/to/lnbits/database.sqlite \
  "SELECT * FROM satoshimachine.dca_deposits;" \
  > dca_deposits_export_$(date +%Y%m%d_%H%M%S).csv

sqlite3 -header -csv /path/to/lnbits/database.sqlite \
  "SELECT * FROM satoshimachine.dca_clients;" \
  > dca_clients_export_$(date +%Y%m%d_%H%M%S).csv
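Where the `sqlite3` command-line tool is not available, the export can be done from Python with the standard library. The sketch below is an assumption-laden illustration: `export_table_to_csv` and `ALLOWED_TABLES` are hypothetical names, the table names are unqualified, and an in-memory table stands in for the real database:

```python
import csv
import io
import sqlite3
from typing import TextIO

ALLOWED_TABLES = {"lamassu_transactions", "dca_payments", "dca_deposits", "dca_clients"}


def export_table_to_csv(conn: sqlite3.Connection, table: str, out: TextIO) -> int:
    """Write every row of `table` to `out` as CSV with a header; return the row count."""
    if table not in ALLOWED_TABLES:  # table names cannot be bound as SQL parameters
        raise ValueError(f"unexpected table name: {table}")
    cursor = conn.execute(f"SELECT * FROM {table}")
    writer = csv.writer(out)
    writer.writerow([column[0] for column in cursor.description])
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count


# Demonstration against an in-memory stand-in table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dca_clients (id INTEGER PRIMARY KEY, username TEXT)")
conn.execute("INSERT INTO dca_clients (username) VALUES ('alice')")
buffer = io.StringIO()
exported = export_table_to_csv(conn, "dca_clients", buffer)
```

For a real export, `out` would be a file opened with `newline=""`, and the returned row count can be noted in the incident log next to the file name.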

Combined Audit Report:

-- Complete transaction-to-distribution audit trail
SELECT
  lt.id as transaction_id,
  lt.transaction_id as lamassu_txn_id,
  lt.created_at as transaction_time,
  lt.crypto_atoms as total_sats,
  lt.fiat_code,
  lt.fiat_amount,
  lt.commission_percentage,
  lt.discount,
  lt.base_amount as distributable_sats,
  lt.commission_amount,
  c.id as client_id,
  c.username as client_name,
  dp.amount_sats as client_received,
  dp.status as payment_status,
  dp.payment_hash
FROM satoshimachine.lamassu_transactions lt
LEFT JOIN satoshimachine.dca_payments dp ON dp.lamassu_transaction_id = lt.id
LEFT JOIN satoshimachine.dca_clients c ON dp.client_id = c.id
ORDER BY lt.created_at DESC, c.username;

Prevention Measures

Required Immediate Implementations

1. Idempotency Protection (CRITICAL)

File: transaction_processor.py

from typing import Optional

async def process_lamassu_transaction(txn_data: dict) -> Optional[LamassuTransaction]:
    """Process transaction with idempotency check"""

    # CRITICAL: Check if already processed
    existing = await get_lamassu_transaction_by_txid(txn_data['id'])
    if existing:
        logger.warning(f"⚠️ Transaction {txn_data['id']} already processed at {existing.created_at}, skipping")
        return None

    # Continue with processing...

2. Database Constraints

File: migrations.py

-- Add unique constraint on transaction_id
ALTER TABLE satoshimachine.lamassu_transactions
ADD CONSTRAINT unique_transaction_id UNIQUE (transaction_id);

-- Prevent negative balances
ALTER TABLE satoshimachine.dca_clients
ADD CONSTRAINT positive_balance CHECK (remaining_balance >= 0);

-- Ensure positive amounts
ALTER TABLE satoshimachine.dca_deposits
ADD CONSTRAINT positive_deposit CHECK (amount > 0);

ALTER TABLE satoshimachine.dca_payments
ADD CONSTRAINT positive_payment CHECK (amount_sats > 0);

3. Transaction Status Tracking

File: models.py

from datetime import datetime
from typing import Optional

from pydantic import BaseModel

class LamassuTransaction(BaseModel):
    # ... existing fields ...
    status: str = "pending"  # pending, processing, completed, failed
    error_message: Optional[str] = None
    processed_at: Optional[datetime] = None
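A status field is most useful when only sensible transitions are allowed. The transition table below is an assumption for illustration (the source only lists the four status values), and `can_transition` is a hypothetical helper:

```python
# Assumed lifecycle: pending -> processing -> completed | failed, with retry from failed
ALLOWED_TRANSITIONS = {
    "pending": {"processing"},
    "processing": {"completed", "failed"},
    "failed": {"processing"},  # a failed transaction may be retried
    "completed": set(),        # terminal: never reprocess a completed transaction
}


def can_transition(current: str, new: str) -> bool:
    """Return True if moving from `current` to `new` is an allowed status change."""
    return new in ALLOWED_TRANSITIONS.get(current, set())


retry_allowed = can_transition("failed", "processing")
reprocess_allowed = can_transition("completed", "processing")
```

Treating `completed` as terminal complements the idempotency check: even if a transaction is fetched again, it cannot be moved back into processing.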

4. Pre-flight Wallet Validation

File: transaction_processor.py

from typing import Tuple

async def validate_system_ready() -> Tuple[bool, str]:
    """Validate system is ready to process transactions"""

    # Check commission wallet accessible
    try:
        commission_wallet = await get_wallet(config.commission_wallet_id)
        if not commission_wallet:
            return False, "Commission wallet not accessible"
    except Exception as e:
        return False, f"Commission wallet error: {str(e)}"

    # Check for stuck payments
    stuck_payments = await get_stuck_payments(hours=2)
    if stuck_payments:
        return False, f"{len(stuck_payments)} payments stuck for >2 hours"

    # Check database connectivity
    try:
        await db_health_check()
    except Exception as e:
        return False, f"Database health check failed: {str(e)}"

    return True, "System ready"

# Call before processing
ready, message = await validate_system_ready()
if not ready:
    logger.error(f"System not ready: {message}")
    await send_alert_to_admin(message)
    return

Automated Monitoring Implementation

Daily Reconciliation Task

File: tasks.py

@scheduler.scheduled_job("cron", hour=0, minute=0)  # Daily at midnight UTC
async def daily_reconciliation():
    """Run daily balance reconciliation and report discrepancies"""

    logger.info("Starting daily reconciliation...")

    discrepancies = await find_balance_discrepancies()

    if discrepancies:
        report = generate_reconciliation_report(discrepancies)
        await send_alert_to_admin("⚠️ Balance Discrepancies Detected", report)
        await log_reconciliation_issue(discrepancies)
    else:
        logger.info("✅ Daily reconciliation passed - all balances match")

    # Check for stuck payments
    stuck_payments = await get_stuck_payments(hours=24)
    if stuck_payments:
        await send_alert_to_admin(
            f"⚠️ {len(stuck_payments)} Stuck Payments Detected",
            format_stuck_payments_report(stuck_payments)
        )

Connection Health Monitor

@scheduler.scheduled_job("interval", hours=2)  # Every 2 hours
async def check_system_health():
    """Monitor system health and alert on issues"""

    issues = []

    # Check last successful poll (alert after 24 hours without one)
    last_poll = await get_last_successful_poll_time()
    if last_poll:
        hours_since_poll = (datetime.utcnow() - last_poll).total_seconds() / 3600
        if hours_since_poll > 24:
            issues.append(f"No successful poll in {hours_since_poll:.1f} hours")

    # Check wallet accessibility
    try:
        await test_wallet_access(config.commission_wallet_adminkey)
    except Exception as e:
        issues.append(f"Commission wallet inaccessible: {str(e)}")

    # Check database connectivity
    try:
        await test_lamassu_connection()
    except Exception as e:
        issues.append(f"Lamassu database connection failed: {str(e)}")

    if issues:
        await send_alert_to_admin("⚠️ System Health Check Failed", "\n".join(issues))

Alert Configuration

File: config.py or environment variables

# Alert settings
ALERT_EMAIL = "admin@yourdomain.com"
ALERT_WEBHOOK = "https://hooks.slack.com/..."  # Slack/Discord webhook
ALERT_PHONE = "+1234567890"  # For critical alerts (optional)

# Alert thresholds
MAX_STUCK_PAYMENT_HOURS = 2
MAX_POLL_DELAY_HOURS = 24
MAX_BALANCE_DISCREPANCY_SATS = 100
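
As an illustration of how one of these thresholds could be applied, the sketch below filters payments that have stayed incomplete longer than `MAX_STUCK_PAYMENT_HOURS`. The payment dictionaries and their `status` and `created_at` keys are assumptions made for the example; `get_stuck_payments()` in the real code may work differently.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

MAX_STUCK_PAYMENT_HOURS = 2  # same value as the threshold above


def stuck_payments(
    payments: List[Dict],
    now: datetime,
    max_hours: float = MAX_STUCK_PAYMENT_HOURS,
) -> List[Dict]:
    """Return payments that are not completed and older than max_hours."""
    cutoff = now - timedelta(hours=max_hours)
    return [
        p for p in payments
        if p["status"] != "completed" and p["created_at"] < cutoff
    ]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [{"id": 1, "status": "pending", "created_at": now - timedelta(hours=3)}]
    print(stuck_payments(sample, now))
```

Passing `now` explicitly keeps the function deterministic and easy to test.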

Monitoring & Alerts

Dashboard Indicators to Watch

Critical Indicators (Check Daily)

  • Last Successful Poll Time - Should be within last 2-4 hours (based on polling interval)
  • Failed Payment Count - Should be 0; investigate immediately if > 0
  • Commission Wallet Balance - Should increase proportionally with transactions
  • Active Clients with Balance - Cross-reference with expected DCA participants

Warning Indicators (Check Weekly)

  • ⚠️ Pending Deposits > 48 hours - May indicate confirmation workflow issue
  • ⚠️ Client Balance Reconciliation - Run full reconciliation report
  • ⚠️ Average Commission % - Verify matches configured rates
  • ⚠️ Transaction Processing Time - Should complete within minutes, not hours

Automated Alert Triggers

Implement these alerts in production:

| Alert | Severity | Trigger Condition | Response Time |
|-------|----------|-------------------|---------------|
| Duplicate Transaction Detected | 🔴 CRITICAL | Same transaction_id inserted twice | Immediate |
| Payment Stuck > 15 minutes | 🔴 CRITICAL | Payment status not "completed" after 15min | < 30 minutes |
| Polling Failed > 24 hours | 🟠 HIGH | No new transactions in 24 hours | < 2 hours |
| Balance Discrepancy > 100 sats | 🟠 HIGH | Reconciliation finds error > threshold | < 4 hours |
| Wallet Inaccessible | 🟠 HIGH | Commission wallet returns auth error | < 1 hour |
| Database Connection Failed | 🟡 MEDIUM | Cannot connect to Lamassu DB | < 4 hours |
| Commission Calculation Anomaly | 🟡 MEDIUM | Calculated amount differs from formula | < 24 hours |
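
The response times in the table above could be encoded as data so that an alerting task can tell when an alert is overdue. This is only a sketch: the alert keys and both function names are hypothetical, and "Immediate" is modeled as a zero-length window.

```python
from datetime import datetime, timedelta, timezone

# Response windows copied from the table above; the keys are hypothetical.
RESPONSE_WINDOWS = {
    "duplicate_transaction": timedelta(0),  # "Immediate"
    "payment_stuck": timedelta(minutes=30),
    "polling_failed": timedelta(hours=2),
    "balance_discrepancy": timedelta(hours=4),
    "wallet_inaccessible": timedelta(hours=1),
    "database_connection_failed": timedelta(hours=4),
    "commission_anomaly": timedelta(hours=24),
}


def response_deadline(alert_key: str, detected_at: datetime) -> datetime:
    """Return the latest time by which the alert should be handled."""
    return detected_at + RESPONSE_WINDOWS[alert_key]


def is_overdue(alert_key: str, detected_at: datetime, now: datetime) -> bool:
    """True if the response window for this alert has already passed."""
    return now > response_deadline(alert_key, detected_at)
```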

Log Files to Monitor

Location: /home/padreug/AioLabs/Git/lnbits-extensions/lnbits/data/logs/

Key Log Patterns:

# Critical errors
grep -i "error\|exception\|failed" lnbits.log | tail -100

# Transaction processing
grep "LamassuTransactionProcessor" lnbits.log | tail -50

# Payment distribution
grep "dca_payment\|distribution" lnbits.log | tail -50

# SSH connection issues
grep -i "ssh\|connection\|timeout" lnbits.log | tail -50

# Wallet API calls
grep "wallet.*api\|payment_hash" lnbits.log | tail -50
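
The same patterns can be counted in one pass when a summary is more useful than raw log lines. The sketch below reuses the `grep` expressions above as Python regular expressions; the category names and `count_matches` are assumptions made for this example.

```python
import re
from typing import Dict, Iterable

# Regex equivalents of the grep patterns above; category names are hypothetical.
PATTERNS = {
    "errors": re.compile(r"error|exception|failed", re.IGNORECASE),
    "processor": re.compile(r"LamassuTransactionProcessor"),
    "distribution": re.compile(r"dca_payment|distribution"),
    "connection": re.compile(r"ssh|connection|timeout", re.IGNORECASE),
    "wallet": re.compile(r"wallet.*api|payment_hash"),
}


def count_matches(lines: Iterable[str]) -> Dict[str, int]:
    """Count how many lines match each pattern category."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

To summarize a log file, open it and pass the file object directly, for example `with open("lnbits.log") as f: print(count_matches(f))`.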

Manual Checks (Weekly)

Sunday 00:00 UTC - Weekly Audit:

  1. Run full reconciliation SQL report
  2. Export all tables to CSV for backup
  3. Verify commission wallet balance matches sum of commission_amount
  4. Check for any pending deposits > 7 days old
  5. Review last 20 transactions for calculation correctness
  6. Test database connection from admin dashboard
  7. Test manual poll to verify end-to-end flow
  8. Review error logs for any concerning patterns
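
Audit step 3 can be partly scripted. The sketch below sums `commission_amount` and compares it with a wallet balance supplied by the operator. It assumes the commission wallet only ever receives commissions, that nothing has been withdrawn, and that both values use the same unit. With SQLite, the `satoshimachine.` prefix resolves only if a database is attached under that name. Both function names are hypothetical.

```python
import sqlite3


def commission_total(conn: sqlite3.Connection) -> int:
    """Sum commission_amount across all recorded Lamassu transactions."""
    row = conn.execute(
        "SELECT COALESCE(SUM(commission_amount), 0) "
        "FROM satoshimachine.lamassu_transactions"
    ).fetchone()
    return int(row[0])


def commission_matches(
    conn: sqlite3.Connection, wallet_balance: int, tolerance: int = 0
) -> bool:
    """True if the wallet balance is within tolerance of the recorded total."""
    return abs(wallet_balance - commission_total(conn)) <= tolerance
```

A mismatch is not proof of an error, since withdrawals or fees would also change the wallet balance; treat it as a prompt to investigate.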

Contact Information

System Access

LNBits Admin Dashboard:

  • URL: https://<your-lnbits-host>/satoshimachine
  • Requires superuser authentication

Database Access:

# LNBits database
sqlite3 /home/padreug/AioLabs/Git/lnbits-extensions/lnbits/data/database.sqlite

# Direct table access
sqlite3 /path/to/db "SELECT * FROM satoshimachine.<table_name>;"

Log Files:

# Main logs
tail -f /home/padreug/AioLabs/Git/lnbits-extensions/lnbits/data/logs/lnbits.log

# Error logs only
tail -f /home/padreug/AioLabs/Git/lnbits-extensions/lnbits/data/logs/lnbits.log | grep -i error

Emergency Escalation

Level 1 - System Administrator (First Contact):

  • Name: _______________________
  • Email: _______________________
  • Phone: _______________________
  • Availability: _______________________

Level 2 - Technical Lead (If L1 unavailable):

  • Name: _______________________
  • Email: _______________________
  • Phone: _______________________
  • Availability: _______________________

Level 3 - Business Owner (Financial impact > $X):

  • Name: _______________________
  • Email: _______________________
  • Phone: _______________________
  • Availability: _______________________

External Contacts

Lamassu Administrator:

  • Name: _______________________
  • Email: _______________________
  • Phone: _______________________
  • SSH access issues, database access, ATM questions

LNBits Infrastructure:

  • Name: _______________________
  • Email: _______________________
  • Phone: _______________________
  • Wallet issues, API problems, system downtime

Accountant/Auditor:

  • Name: _______________________
  • Email: _______________________
  • For balance discrepancies requiring financial reconciliation

Appendix: Quick Reference Commands

Emergency Stop

# Stop LNBits service (stops all background tasks)
systemctl stop lnbits

# Or kill process
pkill -f lnbits

Emergency: Disable Polling via Database

-- Disable automatic polling
UPDATE satoshimachine.lamassu_config
SET polling_enabled = false;

Quick Balance Check

-- All client balances summary
SELECT id, username, remaining_balance, created_at
FROM satoshimachine.dca_clients
ORDER BY remaining_balance DESC;

Last 10 Transactions

SELECT id, transaction_id, created_at, crypto_atoms, base_amount, commission_amount
FROM satoshimachine.lamassu_transactions
ORDER BY created_at DESC
LIMIT 10;

Failed Payments

SELECT * FROM satoshimachine.dca_payments
WHERE status != 'completed'
ORDER BY created_at DESC;

Export Everything (Backup)

#!/bin/bash
# Emergency full backup: export each extension table to CSV
set -euo pipefail  # Abort on the first failed export instead of continuing silently

DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="emergency_backup_${DATE}"
DB_PATH="/path/to/database.sqlite"
mkdir -p "$BACKUP_DIR"

# One CSV per table, named after the table
for table in lamassu_transactions dca_payments dca_deposits dca_clients; do
  sqlite3 -header -csv "$DB_PATH" \
    "SELECT * FROM satoshimachine.${table};" \
    > "${BACKUP_DIR}/${table}.csv"
done

echo "Backup complete in ${BACKUP_DIR}/"

Document Change Log

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 1.0 | 2025-10-19 | Claude Code | Initial emergency protocols document |

Sign-Off

This document has been reviewed and approved for use in production emergency response:

System Administrator: _____________________ Date: _______

Technical Lead: _____________________ Date: _______

Business Owner: _____________________ Date: _______


END OF DOCUMENT

Keep this document accessible at all times. Print it and store it in the emergency response binder. Review and update it quarterly or after any major incident.