# BQL Balance Queries Implementation

**Date**: November 10, 2025
**Status**: In Progress
**Context**: Replace manual aggregation with Beancount Query Language (BQL)

---

## Problem

Current `get_user_balance()` implementation:
- **115 lines** of manual aggregation logic
- Fetches **ALL** journal entries (inefficient)
- Manual regex parsing of amounts
- Manual looping through entries/postings
- O(n) complexity for every balance query

**Performance Impact**:
- Every balance check fetches the entire ledger
- No database-level filtering
- CPU-intensive parsing and aggregation
- Scales poorly as the ledger grows

---

## Solution: Use Beancount Query Language (BQL)

Beancount has a built-in query language that can efficiently:
- Filter accounts (regex patterns)
- Sum positions (balances)
- Exclude transactions by flag
- Group and aggregate
- Do all of this inside the Beancount query engine (via Fava) instead of in application code

---

## BQL Query Design

### Query 1: Get User Balance (SATS + Fiat)

```sql
SELECT account, sum(position) as balance
WHERE account ~ ':User-{user_id[:8]}'
  AND (account ~ 'Payable' OR account ~ 'Receivable')
  AND flag != '!'
GROUP BY account
```

**What this does**:
- `account ~ ':User-abc12345'` - Match user's accounts (regex)
- `account ~ 'Payable' OR account ~ 'Receivable'` - Only payable/receivable accounts
- `flag != '!'` - Exclude pending transactions
- `sum(position)` - Aggregate balances
- `GROUP BY account` - Separate totals per account

**Result Format** (from Fava API):

```json
{
  "data": {
    "rows": [
      ["Liabilities:Payable:User-abc12345", {"SATS": "150000", "EUR": "145.50"}],
      ["Assets:Receivable:User-abc12345", {"SATS": "50000", "EUR": "48.00"}]
    ],
    "types": [
      {"name": "account", "type": "str"},
      {"name": "balance", "type": "Position"}
    ]
  }
}
```

### Query 2: Get All User Balances (Admin View)

```sql
SELECT account, sum(position) as balance
WHERE (account ~ 'Payable:User-' OR account ~ 'Receivable:User-')
  AND flag != '!'
GROUP BY account
```

**What this does**:
- Match ALL user accounts (not just one user)
- Aggregate balances per account
- Extract user_id from account name in post-processing

---

## Implementation Plan

### Step 1: Add General BQL Query Method

Add to `fava_client.py`:

```python
# Assumes fava_client.py already imports httpx and typing's Dict/Any,
# and defines a module-level `logger`.
async def query_bql(self, query_string: str) -> Dict[str, Any]:
    """
    Execute arbitrary Beancount Query Language (BQL) query.

    Args:
        query_string: BQL query (e.g., "SELECT account, sum(position) WHERE ...")

    Returns:
        {
            "rows": [[col1, col2, ...], ...],
            "types": [{"name": "col1", "type": "str"}, ...],
            "column_names": ["col1", "col2", ...]
        }

    Example:
        result = await fava.query_bql("SELECT account, sum(position) WHERE account ~ 'User-abc'")
        for row in result["rows"]:
            account, balance = row
            print(f"{account}: {balance}")
    """
    try:
        async with httpx.AsyncClient(timeout=self.timeout) as client:
            response = await client.get(
                f"{self.base_url}/query",
                params={"query_string": query_string}
            )
            response.raise_for_status()
            result = response.json()

            # Fava returns: {"data": {"rows": [...], "types": [...]}}
            data = result.get("data", {})
            rows = data.get("rows", [])
            types = data.get("types", [])
            column_names = [t.get("name") for t in types]

            return {
                "rows": rows,
                "types": types,
                "column_names": column_names
            }

    except httpx.HTTPStatusError as e:
        logger.error(f"BQL query error: {e.response.status_code} - {e.response.text}")
        logger.error(f"Query was: {query_string}")
        raise
    except httpx.RequestError as e:
        logger.error(f"Fava connection error: {e}")
        raise
```

### Step 2: Implement BQL-Based Balance Query

Add to `fava_client.py`:

```python
async def get_user_balance_bql(self, user_id: str) -> Dict[str, Any]:
    """
    Get user balance using BQL (efficient, ~60 lines vs 115 lines manual).

    Args:
        user_id: User ID

    Returns:
        {
            "balance": int (sats),
            "fiat_balances": {"EUR": Decimal("100.50")},
            "accounts": [{"account": "...", "sats": 150000}]
        }
    """
    import re  # needed only for the string-format fallback; can be hoisted to module level

    # Build BQL query for this user's Payable/Receivable accounts
    user_id_prefix = user_id[:8]
    query = f"""
    SELECT account, sum(position) as balance
    WHERE account ~ ':User-{user_id_prefix}'
      AND (account ~ 'Payable' OR account ~ 'Receivable')
      AND flag != '!'
    GROUP BY account
    """

    result = await self.query_bql(query)

    # Process results
    total_sats = 0
    fiat_balances = {}
    accounts = []

    for row in result["rows"]:
        account_name, position = row

        # Position is a dict like {"SATS": "150000", "EUR": "145.50"}
        # or a string for single-currency
        if isinstance(position, dict):
            # Extract SATS
            sats_str = position.get("SATS", "0")
            sats_amount = int(sats_str) if sats_str else 0
            total_sats += sats_amount

            accounts.append({
                "account": account_name,
                "sats": sats_amount
            })

            # Extract fiat currencies
            for currency in ["EUR", "USD", "GBP"]:
                if currency in position:
                    fiat_str = position[currency]
                    fiat_amount = Decimal(fiat_str) if fiat_str else Decimal(0)

                    if currency not in fiat_balances:
                        fiat_balances[currency] = Decimal(0)
                    fiat_balances[currency] += fiat_amount

        elif isinstance(position, str):
            # Single currency (parse "150000 SATS" or "145.50 EUR")
            sats_match = re.match(r'^(-?\d+)\s+SATS$', position)
            if sats_match:
                sats_amount = int(sats_match.group(1))
                total_sats += sats_amount
                accounts.append({
                    "account": account_name,
                    "sats": sats_amount
                })
            else:
                fiat_match = re.match(r'^(-?[\d.]+)\s+([A-Z]{3})$', position)
                if fiat_match and fiat_match.group(2) in ('EUR', 'USD', 'GBP'):
                    fiat_amount = Decimal(fiat_match.group(1))
                    currency = fiat_match.group(2)

                    if currency not in fiat_balances:
                        fiat_balances[currency] = Decimal(0)
                    fiat_balances[currency] += fiat_amount

    logger.info(f"User {user_id[:8]} balance (BQL): {total_sats} sats, fiat: {dict(fiat_balances)}")

    return {
        "balance": total_sats,
        "fiat_balances": fiat_balances,
        "accounts": accounts
    }
```
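The dict/string position handling in Step 2 is needed again in Step 3, so it may be worth factoring into one shared helper. The following is a minimal sketch, assuming positions arrive either as a currency-to-number-string mapping or as a single `"<number> <CURRENCY>"` string, which are the two shapes handled in Step 2. The names `parse_position` and `add_position` are hypothetical and not part of `fava_client.py`.

```python
import re
from decimal import Decimal
from typing import Dict, Union

# Hypothetical helpers (not in fava_client.py). The accepted shapes are
# assumptions based on the formats handled in Step 2, not a guarantee about
# what Fava actually returns.
_AMOUNT_RE = re.compile(r"^(-?\d+(?:\.\d+)?)\s+([A-Z]+)$")


def parse_position(position: Union[Dict[str, str], str, None]) -> Dict[str, Decimal]:
    """Normalize a position value into {currency: Decimal}.

    Unrecognized shapes yield an empty dict so callers can log and move on.
    """
    if isinstance(position, dict):
        # Dict shape, e.g. {"SATS": "150000", "EUR": "145.50"}; skip empty values.
        return {
            currency: Decimal(str(number))
            for currency, number in position.items()
            if number not in ("", None)
        }
    if isinstance(position, str):
        # String shape, e.g. "150000 SATS" or "-145.50 EUR".
        match = _AMOUNT_RE.match(position.strip())
        if match:
            return {match.group(2): Decimal(match.group(1))}
    return {}


def add_position(totals: Dict[str, Decimal], position) -> None:
    """Accumulate one parsed position into running per-currency totals."""
    for currency, amount in parse_position(position).items():
        totals[currency] = totals.get(currency, Decimal(0)) + amount


# Usage: accumulate two rows' positions into one per-currency total.
totals: Dict[str, Decimal] = {}
add_position(totals, {"SATS": "150000", "EUR": "145.50"})
add_position(totals, "50000 SATS")
```

With a helper like this, both balance methods could read `totals.get("SATS")` for the sats figure and treat the remaining keys as fiat balances, instead of duplicating the branching logic.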

### Step 3: Implement BQL-Based All Users Balance

```python
async def get_all_user_balances_bql(self) -> List[Dict[str, Any]]:
    """
    Get balances for all users using BQL (efficient admin view).

    Returns:
        [
            {
                "user_id": "abc123",
                "balance": 100000,
                "fiat_balances": {"EUR": Decimal("100.50")},
                "accounts": [...]
            },
            ...
        ]
    """
    query = """
    SELECT account, sum(position) as balance
    WHERE (account ~ 'Payable:User-' OR account ~ 'Receivable:User-')
      AND flag != '!'
    GROUP BY account
    """

    result = await self.query_bql(query)

    # Group by user_id
    user_data = {}

    for row in result["rows"]:
        account_name, position = row

        # Extract user_id from account name
        # Format: "Liabilities:Payable:User-abc12345" or "Assets:Receivable:User-abc12345"
        if ":User-" not in account_name:
            continue

        user_id_with_prefix = account_name.split(":User-")[1]
        # User ID is the first 8 chars (our standard)
        user_id = user_id_with_prefix[:8]

        if user_id not in user_data:
            user_data[user_id] = {
                "user_id": user_id,
                "balance": 0,
                "fiat_balances": {},
                "accounts": []
            }

        # Process position (same logic as single-user query)
        if isinstance(position, dict):
            sats_str = position.get("SATS", "0")
            sats_amount = int(sats_str) if sats_str else 0
            user_data[user_id]["balance"] += sats_amount

            user_data[user_id]["accounts"].append({
                "account": account_name,
                "sats": sats_amount
            })

            for currency in ["EUR", "USD", "GBP"]:
                if currency in position:
                    fiat_str = position[currency]
                    fiat_amount = Decimal(fiat_str) if fiat_str else Decimal(0)

                    if currency not in user_data[user_id]["fiat_balances"]:
                        user_data[user_id]["fiat_balances"][currency] = Decimal(0)
                    user_data[user_id]["fiat_balances"][currency] += fiat_amount

        # (Handle string format similarly...)

    return list(user_data.values())
```

---

## Testing Strategy

### Unit Tests

```python
# tests/test_fava_client_bql.py

async def test_query_bql():
    """Test general BQL query method."""
    fava = get_fava_client()

    result = await fava.query_bql("SELECT account WHERE account ~ 'Assets'")

    assert "rows" in result
    assert "column_names" in result
    assert len(result["rows"]) > 0


async def test_get_user_balance_bql():
    """Test BQL-based user balance query."""
    fava = get_fava_client()

    balance = await fava.get_user_balance_bql("test_user_id")

    assert "balance" in balance
    assert "fiat_balances" in balance
    assert "accounts" in balance
    assert isinstance(balance["balance"], int)


async def test_bql_matches_manual():
    """Verify BQL results match manual aggregation (for migration)."""
    fava = get_fava_client()
    user_id = "test_user_id"

    # Get balance both ways
    bql_balance = await fava.get_user_balance_bql(user_id)
    manual_balance = await fava.get_user_balance(user_id)

    # Should match
    assert bql_balance["balance"] == manual_balance["balance"]
    assert bql_balance["fiat_balances"] == manual_balance["fiat_balances"]
```

### Integration Tests

```python
async def test_bql_performance():
    """BQL should be significantly faster than manual aggregation."""
    import time

    fava = get_fava_client()
    user_id = "test_user_id"

    # Time BQL approach (perf_counter is monotonic and suited to short intervals)
    start = time.perf_counter()
    bql_result = await fava.get_user_balance_bql(user_id)
    bql_time = time.perf_counter() - start

    # Time manual approach
    start = time.perf_counter()
    manual_result = await fava.get_user_balance(user_id)
    manual_time = time.perf_counter() - start

    logger.info(f"BQL: {bql_time:.3f}s, Manual: {manual_time:.3f}s")

    # With large ledgers, BQL is expected to be 2-10x faster.
    # The bound is deliberately loose (up to 2x slower passes) to absorb timing noise.
    assert bql_time <= manual_time * 2
```

---

## Migration Strategy

### Phase 1: Add BQL Methods (Non-Breaking)

1. Add `query_bql()` method
2. Add `get_user_balance_bql()` method
3. Add `get_all_user_balances_bql()` method
4. Keep existing methods unchanged

**Benefit**: Can test BQL in parallel without breaking existing code.

### Phase 2: Switch to BQL (Breaking Change)

1. Rename old methods:
   - `get_user_balance()` → `get_user_balance_manual()` (deprecated)
   - `get_all_user_balances()` → `get_all_user_balances_manual()` (deprecated)
2. Rename new methods:
   - `get_user_balance_bql()` → `get_user_balance()`
   - `get_all_user_balances_bql()` → `get_all_user_balances()`
3. Update all call sites
4. Test thoroughly
5. Remove deprecated manual methods after 1-2 sprints

---

## Expected Performance Improvements

### Before (Manual Aggregation)

```
User balance query:
- Fetch ALL entries: ~100-500ms (depends on ledger size)
- Manual parsing: ~50-200ms (CPU-bound)
- Total: 150-700ms
```

### After (BQL)

```
User balance query:
- BQL query (filtered at source): ~20-50ms
- Minimal parsing: ~5-10ms
- Total: 25-60ms

Improvement: 5-10x faster
```

### Scalability

**Manual approach**:
- O(n) where n = total number of entries
- Gets slower as ledger grows
- Fetches entire ledger every time

**BQL approach**:
- Still scans postings inside the query engine (BQL does not maintain an index), but that work stays in Fava's process
- Filtered at source (only the user's accounts are returned)
- Response size and client-side parsing stay roughly constant as the ledger grows (for a single user)

---

## Code Reduction

- **Before**: `get_user_balance()` = 115 lines
- **After**: `get_user_balance_bql()` = ~60 lines (with comments and error handling)
- **Net reduction**: 55 lines (~48%)

- **Before**: `get_all_user_balances()` = ~100 lines
- **After**: `get_all_user_balances_bql()` = ~70 lines
- **Net reduction**: 30 lines (~30%)

**Total code reduction**: ~85 lines across balance query methods

---

## Risks and Mitigation

### Risk 1: BQL Query Syntax Errors

**Mitigation**:
- Test queries manually in Fava UI first
- Add comprehensive error logging
- Validate query results format

### Risk 2: Position Format Variations

**Mitigation**:
- Handle both dict and string position formats
- Add fallback parsing
- Log unexpected formats for investigation

### Risk 3: Regression in Balance Calculations

**Mitigation**:
- Run both methods in parallel during transition
- Compare results and log discrepancies
- Comprehensive test suite

---

## Test Results and Findings

**Date**: November 10, 2025
**Status**: ⚠️ **NOT FEASIBLE for Castle's Current Data Structure**

### Implementation Completed

1. ✅ Analyze current implementation
2. ✅ Design BQL queries
3. ✅ Implement `query_bql()` method (fava_client.py:494-547)
4. ✅ Implement `get_user_balance_bql()` method (fava_client.py:549-644)
5. ✅ Implement `get_all_user_balances_bql()` method (fava_client.py:646-747)
6. ✅ Test against real data

### Test Results

**✅ BQL query execution works perfectly:**
- Successfully queries Fava's `/query` endpoint
- Returns structured results (rows, types, column_names)
- Can filter accounts by regex patterns
- Can aggregate positions using `sum(position)`

**❌ Cannot access SATS balances:**
- BQL returns EUR/USD positions correctly
- BQL **CANNOT** access posting metadata
- SATS values stored in `posting.meta["sats-equivalent"]`
- No BQL syntax to query metadata fields

### Root Cause: Architecture Limitation

**Current Castle Ledger Structure:**

```
Posting format:
  Amount: -360.00 EUR           ← Position (BQL can query this)
  Metadata:
    sats-equivalent: 337096     ← Metadata (BQL CANNOT query this)
```

**Test Data:**
- User 375ec158 has 82 EUR postings
- ALL postings have `sats-equivalent` metadata
- ZERO postings have SATS as position amount
- Manual method: -7,694,356 sats (from metadata)
- BQL method: 0 sats (cannot access metadata)

**BQL Limitation:**

```sql
-- ✅ This works (queries position):
SELECT account, sum(position) WHERE account ~ 'User-'

-- ❌ This is NOT possible (metadata access):
SELECT account, sum(meta["sats-equivalent"]) WHERE account ~ 'User-'
```

### Why Manual Aggregation is Necessary

1. **SATS are Castle's primary currency** for balance tracking
2. **SATS values are in metadata**, not positions
3. **BQL has no metadata query capability**
4. **Must iterate through postings** to read `meta["sats-equivalent"]`

### Performance: Cache Optimization is the Solution

**Phase 1 Caching (Already Implemented)** provides the performance boost:
- ✅ Account lookups cached (5min TTL)
- ✅ Permission lookups cached (1min TTL)
- ✅ 60-80% reduction in DB queries
- ✅ Addresses the actual bottleneck (database queries, not aggregation)

**BQL would not improve performance** because:
- Still need to fetch all postings to read metadata
- Aggregation is not the bottleneck (it's fast)
- Database queries are the bottleneck (solved by caching)

---

## Conclusion

**Status**: ⚠️ **BQL Implementation Not Feasible**

**Recommendation**: **Keep manual aggregation method with Phase 1 caching**

**Rationale:**
1. ✅ Caching already cuts DB queries by 60-80%
2. ✅ SATS metadata requires posting iteration regardless of query method
3. ✅ BQL cannot access the data we need (metadata)
4. ✅ Manual aggregation is well-tested and working correctly

**BQL Methods Status**:
- ✅ Implemented and committed as reference code
- ⚠️ NOT used in production (cannot query SATS from metadata)
- 📝 Kept for future consideration if ledger format changes

---

## Future Consideration: Ledger Format Change

**If** Castle's ledger format changes to use SATS as position amounts:

```beancount
; Current format (EUR position, SATS in metadata):
2025-11-10 * "Groceries"
  Expenses:Food                  -360.00 EUR
    sats-equivalent: 337096
  Liabilities:Payable:User-abc    360.00 EUR
    sats-equivalent: 337096

; Hypothetical future format (SATS position, EUR as total cost):
; {{...}} is Beancount's total-cost syntax; {...} would mean a per-unit cost.
2025-11-10 * "Groceries"
  Expenses:Food                  -337096 SATS {{360.00 EUR}}
  Liabilities:Payable:User-abc    337096 SATS {{360.00 EUR}}
```

**Then** BQL would become feasible:

```sql
-- Would work with SATS as position:
SELECT account, sum(position) as balance
WHERE account ~ 'User-'
  AND currency = 'SATS'
```

**Trade-offs of format change:**
- ✅ Would enable BQL optimization
- ✅ Aligns with "Bitcoin-first" philosophy
- ⚠️ Requires ledger migration
- ⚠️ Changes reporting currency (impacts existing workflows)
- ⚠️ Beancount cost syntax has precision limitations

**Recommendation**: Consider during major version upgrade or architectural redesign.

---

## Next Steps

1. ✅ Analyze current implementation
2. ✅ Design BQL queries
3. ✅ Implement `query_bql()` method
4. ✅ Implement `get_user_balance_bql()` method
5. ✅ Test against real data
6. ✅ Implement `get_all_user_balances_bql()` method
7. ✅ Document findings and limitations
8. ❌ Update call sites (NOT APPLICABLE - BQL not feasible)
9. ❌ Remove manual methods (NOT APPLICABLE - manual method is correct approach)

---

**Implementation By**: Claude Code
**Date**: November 10, 2025
**Status**: ✅ **Tested and Documented** | ⚠️ **Not Feasible for Production Use**