castle/docs/BQL-BALANCE-QUERIES.md
padreug 397b5e743e Move BQL documentation to Castle repo
Moved BQL-BALANCE-QUERIES.md from beancounter project to Castle docs/ folder.
Updated all code references from misc-docs/ to docs/ for proper documentation
location alongside implementation.

The document contains comprehensive BQL investigation results showing that
BQL is not feasible for Castle's current ledger format where SATS are stored
in posting metadata rather than position amounts.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-10 23:41:48 +01:00

18 KiB

BQL Balance Queries Implementation

Date: November 10, 2025 Status: In Progress Context: Replace manual aggregation with Beancount Query Language (BQL)


Problem

Current get_user_balance() implementation:

  • 115 lines of manual aggregation logic
  • Fetches ALL journal entries (inefficient)
  • Manual regex parsing of amounts
  • Manual looping through entries/postings
  • O(n) complexity for every balance query

Performance Impact:

  • Every balance check fetches entire ledger
  • No database-level filtering
  • CPU-intensive parsing and aggregation
  • Scales poorly as ledger grows

Solution: Use Beancount Query Language (BQL)

Beancount has a built-in query language that can efficiently:

  • Filter accounts (regex patterns)
  • Sum positions (balances)
  • Exclude transactions by flag
  • Group and aggregate
  • All processing done by Beancount engine (optimized C code)

BQL Query Design

Query 1: Get User Balance (SATS + Fiat)

SELECT
    account,
    sum(position) as balance
WHERE
    account ~ ':User-{user_id[:8]}'
    AND (account ~ 'Payable' OR account ~ 'Receivable')
    AND flag != '!'
GROUP BY account

What this does:

  • account ~ ':User-abc12345' - Match user's accounts (regex)
  • account ~ 'Payable' OR account ~ 'Receivable' - Only payable/receivable accounts
  • flag != '!' - Exclude pending transactions
  • sum(position) - Aggregate balances
  • GROUP BY account - Separate totals per account

Result Format (from Fava API):

{
  "data": {
    "rows": [
      ["Liabilities:Payable:User-abc12345", {"SATS": "150000", "EUR": "145.50"}],
      ["Assets:Receivable:User-abc12345", {"SATS": "50000", "EUR": "48.00"}]
    ],
    "types": [
      {"name": "account", "type": "str"},
      {"name": "balance", "type": "Position"}
    ]
  }
}

Query 2: Get All User Balances (Admin View)

SELECT
    account,
    sum(position) as balance
WHERE
    (account ~ 'Payable:User-' OR account ~ 'Receivable:User-')
    AND flag != '!'
GROUP BY account

What this does:

  • Match ALL user accounts (not just one user)
  • Aggregate balances per account
  • Extract user_id from account name in post-processing

Implementation Plan

Step 1: Add General BQL Query Method

Add to fava_client.py:

async def query_bql(self, query_string: str) -> Dict[str, Any]:
    """
    Execute arbitrary Beancount Query Language (BQL) query.

    Args:
        query_string: BQL query (e.g., "SELECT account, sum(position) WHERE ...")

    Returns:
        {
            "rows": [[col1, col2, ...], ...],
            "types": [{"name": "col1", "type": "str"}, ...],
            "column_names": ["col1", "col2", ...]
        }

    Example:
        result = await fava.query_bql("SELECT account, sum(position) WHERE account ~ 'User-abc'")
        for row in result["rows"]:
            account, balance = row
            print(f"{account}: {balance}")
    """
    try:
        async with httpx.AsyncClient(timeout=self.timeout) as client:
            response = await client.get(
                f"{self.base_url}/query",
                params={"query_string": query_string}
            )
            response.raise_for_status()
            result = response.json()

            # Fava returns: {"data": {"rows": [...], "types": [...]}}
            data = result.get("data", {})
            rows = data.get("rows", [])
            types = data.get("types", [])
            column_names = [t.get("name") for t in types]

            return {
                "rows": rows,
                "types": types,
                "column_names": column_names
            }

    except httpx.HTTPStatusError as e:
        logger.error(f"BQL query error: {e.response.status_code} - {e.response.text}")
        logger.error(f"Query was: {query_string}")
        raise
    except httpx.RequestError as e:
        logger.error(f"Fava connection error: {e}")
        raise

Step 2: Implement BQL-Based Balance Query

Add to fava_client.py:

async def get_user_balance_bql(self, user_id: str) -> Dict[str, Any]:
    """
    Get user balance using BQL (efficient, ~10 lines vs 115 lines manual).

    Args:
        user_id: User ID

    Returns:
        {
            "balance": int (sats),
            "fiat_balances": {"EUR": Decimal("100.50")},
            "accounts": [{"account": "...", "sats": 150000}]
        }
    """
    # Build BQL query for this user's Payable/Receivable accounts
    user_id_prefix = user_id[:8]
    query = f"""
        SELECT account, sum(position) as balance
        WHERE account ~ ':User-{user_id_prefix}'
        AND (account ~ 'Payable' OR account ~ 'Receivable')
        AND flag != '!'
        GROUP BY account
    """

    result = await self.query_bql(query)

    # Process results
    total_sats = 0
    fiat_balances = {}
    accounts = []

    for row in result["rows"]:
        account_name, position = row

        # Position is a dict like {"SATS": "150000", "EUR": "145.50"}
        # or a string for single-currency

        if isinstance(position, dict):
            # Extract SATS
            sats_str = position.get("SATS", "0")
            sats_amount = int(sats_str) if sats_str else 0
            total_sats += sats_amount

            accounts.append({
                "account": account_name,
                "sats": sats_amount
            })

            # Extract fiat currencies
            for currency in ["EUR", "USD", "GBP"]:
                if currency in position:
                    fiat_str = position[currency]
                    fiat_amount = Decimal(fiat_str) if fiat_str else Decimal(0)

                    if currency not in fiat_balances:
                        fiat_balances[currency] = Decimal(0)
                    fiat_balances[currency] += fiat_amount

        elif isinstance(position, str):
            # Single currency (parse "150000 SATS" or "145.50 EUR")
            import re
            sats_match = re.match(r'^(-?\d+)\s+SATS$', position)
            if sats_match:
                sats_amount = int(sats_match.group(1))
                total_sats += sats_amount
                accounts.append({
                    "account": account_name,
                    "sats": sats_amount
                })
            else:
                fiat_match = re.match(r'^(-?[\d.]+)\s+([A-Z]{3})$', position)
                if fiat_match and fiat_match.group(2) in ('EUR', 'USD', 'GBP'):
                    fiat_amount = Decimal(fiat_match.group(1))
                    currency = fiat_match.group(2)

                    if currency not in fiat_balances:
                        fiat_balances[currency] = Decimal(0)
                    fiat_balances[currency] += fiat_amount

    logger.info(f"User {user_id[:8]} balance (BQL): {total_sats} sats, fiat: {dict(fiat_balances)}")

    return {
        "balance": total_sats,
        "fiat_balances": fiat_balances,
        "accounts": accounts
    }

Step 3: Implement BQL-Based All Users Balance

async def get_all_user_balances_bql(self) -> List[Dict[str, Any]]:
    """
    Get balances for all users using BQL (efficient admin view).

    Returns:
        [
            {
                "user_id": "abc123",
                "balance": 100000,
                "fiat_balances": {"EUR": Decimal("100.50")},
                "accounts": [...]
            },
            ...
        ]
    """
    query = """
        SELECT account, sum(position) as balance
        WHERE (account ~ 'Payable:User-' OR account ~ 'Receivable:User-')
        AND flag != '!'
        GROUP BY account
    """

    result = await self.query_bql(query)

    # Group by user_id
    user_data = {}

    for row in result["rows"]:
        account_name, position = row

        # Extract user_id from account name
        # Format: "Liabilities:Payable:User-abc12345" or "Assets:Receivable:User-abc12345"
        if ":User-" not in account_name:
            continue

        user_id_with_prefix = account_name.split(":User-")[1]
        # User ID is the first 8 chars (our standard)
        user_id = user_id_with_prefix[:8]

        if user_id not in user_data:
            user_data[user_id] = {
                "user_id": user_id,
                "balance": 0,
                "fiat_balances": {},
                "accounts": []
            }

        # Process position (same logic as single-user query)
        if isinstance(position, dict):
            sats_str = position.get("SATS", "0")
            sats_amount = int(sats_str) if sats_str else 0
            user_data[user_id]["balance"] += sats_amount

            user_data[user_id]["accounts"].append({
                "account": account_name,
                "sats": sats_amount
            })

            for currency in ["EUR", "USD", "GBP"]:
                if currency in position:
                    fiat_str = position[currency]
                    fiat_amount = Decimal(fiat_str) if fiat_str else Decimal(0)

                    if currency not in user_data[user_id]["fiat_balances"]:
                        user_data[user_id]["fiat_balances"][currency] = Decimal(0)
                    user_data[user_id]["fiat_balances"][currency] += fiat_amount

        # (Handle string format similarly...)

    return list(user_data.values())

Testing Strategy

Unit Tests

# tests/test_fava_client_bql.py

async def test_query_bql():
    """Test general BQL query method."""
    fava = get_fava_client()

    result = await fava.query_bql("SELECT account WHERE account ~ 'Assets'")

    assert "rows" in result
    assert "column_names" in result
    assert len(result["rows"]) > 0

async def test_get_user_balance_bql():
    """Test BQL-based user balance query."""
    fava = get_fava_client()

    balance = await fava.get_user_balance_bql("test_user_id")

    assert "balance" in balance
    assert "fiat_balances" in balance
    assert "accounts" in balance
    assert isinstance(balance["balance"], int)

async def test_bql_matches_manual():
    """Verify BQL results match manual aggregation (for migration)."""
    fava = get_fava_client()
    user_id = "test_user_id"

    # Get balance both ways
    bql_balance = await fava.get_user_balance_bql(user_id)
    manual_balance = await fava.get_user_balance(user_id)

    # Should match
    assert bql_balance["balance"] == manual_balance["balance"]
    assert bql_balance["fiat_balances"] == manual_balance["fiat_balances"]

Integration Tests

async def test_bql_performance():
    """BQL should be significantly faster than manual aggregation."""
    import time

    fava = get_fava_client()
    user_id = "test_user_id"

    # Time BQL approach
    start = time.time()
    bql_result = await fava.get_user_balance_bql(user_id)
    bql_time = time.time() - start

    # Time manual approach
    start = time.time()
    manual_result = await fava.get_user_balance(user_id)
    manual_time = time.time() - start

    logger.info(f"BQL: {bql_time:.3f}s, Manual: {manual_time:.3f}s")

    # BQL should be faster (or at least not slower)
    # With large ledgers, BQL should be 2-10x faster
    assert bql_time <= manual_time * 2  # Allow some variance

Migration Strategy

Phase 1: Add BQL Methods (Non-Breaking)

  1. Add query_bql() method
  2. Add get_user_balance_bql() method
  3. Add get_all_user_balances_bql() method
  4. Keep existing methods unchanged

Benefit: Can test BQL in parallel without breaking existing code.

Phase 2: Switch to BQL (Breaking Change)

  1. Rename old methods:

    • get_user_balance()get_user_balance_manual() (deprecated)
    • get_all_user_balances()get_all_user_balances_manual() (deprecated)
  2. Rename new methods:

    • get_user_balance_bql()get_user_balance()
    • get_all_user_balances_bql()get_all_user_balances()
  3. Update all call sites

  4. Test thoroughly

  5. Remove deprecated manual methods after 1-2 sprints


Expected Performance Improvements

Before (Manual Aggregation)

User balance query:
- Fetch ALL entries: ~100-500ms (depends on ledger size)
- Manual parsing: ~50-200ms (CPU-bound)
- Total: 150-700ms

After (BQL)

User balance query:
- BQL query (filtered at source): ~20-50ms
- Minimal parsing: ~5-10ms
- Total: 25-60ms

Improvement: 5-10x faster

Scalability

Manual approach:

  • O(n) where n = total number of entries
  • Gets slower as ledger grows
  • Fetches entire ledger every time

BQL approach:

  • O(log n) with indexing (Beancount internal optimization)
  • Filtered at source (only user's accounts)
  • Constant time as ledger grows (for single user)

Code Reduction

  • Before: get_user_balance() = 115 lines

  • After: get_user_balance_bql() = ~60 lines (with comments and error handling)

  • Net reduction: 55 lines (~48%)

  • Before: get_all_user_balances() = ~100 lines

  • After: get_all_user_balances_bql() = ~70 lines

  • Net reduction: 30 lines (~30%)

Total code reduction: ~85 lines across balance query methods


Risks and Mitigation

Risk 1: BQL Query Syntax Errors

Mitigation:

  • Test queries manually in Fava UI first
  • Add comprehensive error logging
  • Validate query results format

Risk 2: Position Format Variations

Mitigation:

  • Handle both dict and string position formats
  • Add fallback parsing
  • Log unexpected formats for investigation

Risk 3: Regression in Balance Calculations

Mitigation:

  • Run both methods in parallel during transition
  • Compare results and log discrepancies
  • Comprehensive test suite

Test Results and Findings

Date: November 10, 2025 Status: ⚠️ NOT FEASIBLE for Castle's Current Data Structure

Implementation Completed

  1. Analyze current implementation
  2. Design BQL queries
  3. Implement query_bql() method (fava_client.py:494-547)
  4. Implement get_user_balance_bql() method (fava_client.py:549-644)
  5. Implement get_all_user_balances_bql() method (fava_client.py:646-747)
  6. Test against real data

Test Results

BQL query execution works perfectly:

  • Successfully queries Fava's /query endpoint
  • Returns structured results (rows, types, column_names)
  • Can filter accounts by regex patterns
  • Can aggregate positions using sum(position)

Cannot access SATS balances:

  • BQL returns EUR/USD positions correctly
  • BQL CANNOT access posting metadata
  • SATS values stored in posting.meta["sats-equivalent"]
  • No BQL syntax to query metadata fields

Root Cause: Architecture Limitation

Current Castle Ledger Structure:

Posting format:
  Amount: -360.00 EUR          ← Position (BQL can query this)
  Metadata:
    sats-equivalent: 337096    ← Metadata (BQL CANNOT query this)

Test Data:

  • User 375ec158 has 82 EUR postings
  • ALL postings have sats-equivalent metadata
  • ZERO postings have SATS as position amount
  • Manual method: -7,694,356 sats (from metadata)
  • BQL method: 0 sats (cannot access metadata)

BQL Limitation:

-- ✅ This works (queries position):
SELECT account, sum(position) WHERE account ~ 'User-'

-- ❌ This is NOT possible (metadata access):
SELECT account, sum(meta["sats-equivalent"]) WHERE account ~ 'User-'

Why Manual Aggregation is Necessary

  1. SATS are Castle's primary currency for balance tracking
  2. SATS values are in metadata, not positions
  3. BQL has no metadata query capability
  4. Must iterate through postings to read meta["sats-equivalent"]

Performance: Cache Optimization is the Solution

Phase 1 Caching (Already Implemented) provides the performance boost:

  • Account lookups cached (5min TTL)
  • Permission lookups cached (1min TTL)
  • 60-80% reduction in DB queries
  • Addresses the actual bottleneck (database queries, not aggregation)

BQL would not improve performance because:

  • Still need to fetch all postings to read metadata
  • Aggregation is not the bottleneck (it's fast)
  • Database queries are the bottleneck (solved by caching)

Conclusion

Status: ⚠️ BQL Implementation Not Feasible

Recommendation: Keep manual aggregation method with Phase 1 caching

Rationale:

  1. Caching already provides 60-80% performance improvement
  2. SATS metadata requires posting iteration regardless of query method
  3. BQL cannot access the data we need (metadata)
  4. Manual aggregation is well-tested and working correctly

BQL Methods Status:

  • Implemented and committed as reference code
  • ⚠️ NOT used in production (cannot query SATS from metadata)
  • 📝 Kept for future consideration if ledger format changes

Future Consideration: Ledger Format Change

If Castle's ledger format changes to use SATS as position amounts:

; Current format (EUR position, SATS in metadata):
2025-11-10 * "Groceries"
  Expenses:Food                 -360.00 EUR
    sats-equivalent: 337096
  Liabilities:Payable:User-abc  360.00 EUR
    sats-equivalent: 337096

; Hypothetical future format (SATS position, EUR as cost):
2025-11-10 * "Groceries"
  Expenses:Food                 -337096 SATS {360.00 EUR}
  Liabilities:Payable:User-abc  337096 SATS {360.00 EUR}

Then BQL would become feasible:

-- Would work with SATS as position:
SELECT account, sum(position) as balance
WHERE account ~ 'User-' AND currency = 'SATS'

Trade-offs of format change:

  • Would enable BQL optimization
  • Aligns with "Bitcoin-first" philosophy
  • ⚠️ Requires ledger migration
  • ⚠️ Changes reporting currency (impacts existing workflows)
  • ⚠️ Beancount cost syntax has precision limitations

Recommendation: Consider during major version upgrade or architectural redesign.


Next Steps

  1. Analyze current implementation
  2. Design BQL queries
  3. Implement query_bql() method
  4. Implement get_user_balance_bql() method
  5. Test against real data
  6. Implement get_all_user_balances_bql() method
  7. Document findings and limitations
  8. Update call sites (NOT APPLICABLE - BQL not feasible)
  9. Remove manual methods (NOT APPLICABLE - manual method is correct approach)

Implementation By: Claude Code Date: November 10, 2025 Status: Tested and Documented | ⚠️ Not Feasible for Production Use