web-app/docs/WEBSOCKET-TROUBLESHOOTING.md
2025-10-20 06:48:21 +02:00

# WebSocket Connection Issues - Troubleshooting Report
## Executive Summary
The wallet module's WebSocket connection for real-time balance updates fails to establish when connecting through certain network configurations. While a polling-based fallback was successfully implemented, the root cause of the WebSocket failure remains unresolved.
## Problem Description
### Symptoms
- WebSocket connection to `wss://lnbits.ario.pm/api/v1/ws/<wallet-id>` fails immediately
- Error message: `WebSocket connection failed`
- Connection attempts result in immediate closure
- Issue appears related to network path through WireGuard VPN and/or nginx proxy
### Current Configuration
#### Network Path
```
Client Browser → Internet → nginx (reverse proxy) → WireGuard VPN → LNbits Server
```
#### nginx Configuration
- Reverse proxy at `lnbits.ario.pm`
- Standard WebSocket proxy headers configured
- HTTPS/WSS termination at nginx level
#### LNbits Server
- Running behind WireGuard VPN
- WebSocket endpoint: `/api/v1/ws/<wallet-id>`
- Requires `X-Api-Key` header for authentication
## Root Cause Analysis
### Confirmed Working
- ✅ Standard HTTPS API calls work perfectly
- ✅ Authentication headers are properly passed
- ✅ LNbits server WebSocket endpoint is functional (works in direct connections)
- ✅ Polling fallback successfully retrieves balance updates
### Potential Causes
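#### 1. **nginx WebSocket Proxy Configuration**
**Likelihood: HIGH**
`Upgrade` and `Connection` are hop-by-hop headers, so nginx does not pass them to the upstream unless configured explicitly. A proxy block that works for ordinary HTTPS API calls can therefore still break the WebSocket handshake. The `location` that serves the WebSocket endpoint should include:
```nginx
# Required for the WebSocket upgrade handshake
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
# Standard forwarding headers
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Timeouts: nginx closes a proxied connection when the upstream sends no data
# for proxy_read_timeout (default 60s), so raise it for idle WebSockets.
# 3600s is an example value, not a measured requirement.
proxy_connect_timeout 60s;
proxy_send_timeout 3600s;
proxy_read_timeout 3600s;
```
**Solution**: Verify that the nginx configuration sets the WebSocket upgrade headers in the block that actually matches `/api/v1/ws/`, and that the read and send timeouts exceed the longest expected idle period.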
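#### 2. **WireGuard MTU Issues**
**Likelihood: MEDIUM**
WebSocket frames travel over TCP, so the MTU matters at the packet level rather than the frame level. WireGuard's default MTU (1420) can cause trouble when the underlying link's MTU is below 1500 or when ICMP "fragmentation needed" messages are filtered, which breaks path MTU discovery:
- Full-size TCP segments might exceed the path MTU after VPN encapsulation
- Oversized packets may be dropped silently or delayed, stalling the connection
- Regular HTTPS API calls cross the same tunnel successfully, which argues against MTU being the sole cause of an immediate handshake failure
**Solution**:
```ini
# In WireGuard config
[Interface]
# Reduced MTU to leave headroom for encapsulation overhead
MTU = 1380
```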
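
The overhead arithmetic behind these MTU values can be sketched as follows. This is a minimal illustration, not part of the application code: the function and constant names are made up for this example, and it assumes the standard header sizes (20-byte IPv4, 40-byte IPv6, 8-byte UDP, 32 bytes of WireGuard data-message header and authentication tag) with no IP or TCP options.

```typescript
type OuterIp = "ipv4" | "ipv6";

// Header sizes in bytes for the packet that carries the tunnel traffic.
const OUTER_IP_HEADER: Record<OuterIp, number> = { ipv4: 20, ipv6: 40 };
const UDP_HEADER = 8;
// WireGuard data message: 16-byte header plus 16-byte authentication tag.
const WIREGUARD_OVERHEAD = 32;

/** Largest tunnel MTU that still fits in one packet on a link with `linkMtu`. */
function maxTunnelMtu(linkMtu: number, outer: OuterIp): number {
  return linkMtu - OUTER_IP_HEADER[outer] - UDP_HEADER - WIREGUARD_OVERHEAD;
}

/** TCP MSS for IPv4 traffic inside the tunnel (20-byte IP + 20-byte TCP header). */
function innerTcpMss(tunnelMtu: number): number {
  return tunnelMtu - 20 - 20;
}

console.log(maxTunnelMtu(1500, "ipv6")); // worst case on a 1500-byte link
console.log(innerTcpMss(1420));          // MSS to expect in SYN packets on wg0
```

On a 1500-byte link the IPv6 worst case gives the 1420 default. An MTU of 1380 only becomes necessary when the underlying link is smaller than 1500 bytes, for example behind PPPoE or another tunnel.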
#### 3. **NAT/Connection Tracking**
**Likelihood: MEDIUM**
Long-lived WebSocket connections can be terminated by:
- NAT timeout settings
- Connection tracking table exhaustion
- Firewall state timeout
**Solution**:
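- Increase NAT and connection-tracking timeouts for established TCP connections
- Enable WebSocket keepalive/ping frames at an interval shorter than the lowest idle timeout on the path
- Raise the firewall's idle timeout for established TCP flows (stateful firewalls track the TCP connection, not the WebSocket protocol)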
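
A keepalive along these lines could look like the sketch below. The browser `WebSocket` API does not expose protocol-level ping frames, so the sketch sends an application-level message instead. The `Heartbeat` class, the `SocketLike` interface, the `{"type":"ping"}` payload, and the 4000 close code are all assumptions made for illustration; whether the LNbits endpoint tolerates or answers such a message has not been verified.

```typescript
/** Minimal surface of a WebSocket that the heartbeat needs. */
interface SocketLike {
  send(data: string): void;
  close(code?: number, reason?: string): void;
}

/** True when nothing has been received for longer than `timeoutMs`. */
function isStale(lastSeenAt: number, now: number, timeoutMs: number): boolean {
  return now - lastSeenAt > timeoutMs;
}

class Heartbeat {
  private readonly socket: SocketLike;
  private readonly timeoutMs: number;
  private lastSeenAt: number;

  constructor(socket: SocketLike, timeoutMs: number, now: number = Date.now()) {
    this.socket = socket;
    this.timeoutMs = timeoutMs;
    this.lastSeenAt = now;
  }

  /** Call from the socket's onmessage handler: any inbound traffic proves liveness. */
  markAlive(now: number = Date.now()): void {
    this.lastSeenAt = now;
  }

  /**
   * Call on a timer. Sends a ping while the peer is responsive; closes the
   * socket and returns false once the peer has been silent for too long.
   */
  tick(now: number = Date.now()): boolean {
    if (isStale(this.lastSeenAt, now, this.timeoutMs)) {
      // 4000-4999 is the range reserved for application-defined close codes.
      this.socket.close(4000, "heartbeat timeout");
      return false;
    }
    this.socket.send(JSON.stringify({ type: "ping" })); // hypothetical payload
    return true;
  }
}
```

In a client, `tick()` would be driven by `setInterval` with a period shorter than the shortest idle timeout on the path (for example, under nginx's `proxy_read_timeout`), and `markAlive()` would be called from `onmessage`.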
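#### 4. **HTTP/2 Incompatibility**
**Likelihood: LOW**
The classic WebSocket handshake is an HTTP/1.1 `Upgrade` request, which HTTP/2 does not support (RFC 8441 defines a separate bootstrapping mechanism that a server must opt in to):
- Enabling HTTP/2 on the nginx listener should not by itself break WebSockets, because browsers open `wss://` connections over HTTP/1.1 unless the server advertises RFC 8441 support
- The setting that matters is the protocol nginx speaks to the upstream: the `Upgrade` handshake cannot be proxied over HTTP/1.0, so the WebSocket location needs `proxy_http_version 1.1`
**Solution**:
```nginx
location /api/v1/ws {
    proxy_http_version 1.1;  # Use HTTP/1.1 towards the upstream
    # ... other WebSocket headers
}
```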
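#### 5. **Header Size/Authentication Issues**
**Likelihood: LOW**
Custom headers might be stripped, modified, or never sent:
- `X-Api-Key` header might not survive the proxy chain
- Header size limits in the proxy configuration might reject the handshake request
- The browser `WebSocket` constructor cannot set custom request headers, so a browser client cannot send `X-Api-Key` during the handshake at all; if the endpoint enforces that header, only non-browser clients such as `wscat -H` can satisfy it
**Solution**: Verify that headers are forwarded through the entire chain, and confirm how the browser client is expected to authenticate.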
## Diagnostic Steps
### 1. Browser-Level Debugging
```javascript
// Test WebSocket connection directly (replace wallet-id with a real wallet ID)
const ws = new WebSocket('wss://lnbits.ario.pm/api/v1/ws/wallet-id');
ws.onopen = () => console.log('Connected');
// The error event carries no failure details; the close event is more useful
ws.onerror = (error) => console.error('Error:', error);
ws.onclose = (event) => {
  console.log('Close code:', event.code);
  console.log('Close reason:', event.reason);
  console.log('Was clean:', event.wasClean);
};
```
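
To make the logged close code easier to interpret, a small lookup such as the one below may help. The code meanings come from RFC 6455; the function name `describeCloseCode` and the wording of the descriptions are illustrative only. A handshake that fails before the connection opens is typically reported to the page as 1006 with an empty reason, so the HTTP status of the failed upgrade has to be read from the browser's Network tab or the nginx logs.

```typescript
/** Map a WebSocket close code to a short human-readable description. */
function describeCloseCode(code: number): string {
  switch (code) {
    case 1000:
      return "normal closure";
    case 1001:
      return "endpoint going away (page navigation or server shutdown)";
    case 1006:
      return "abnormal closure: no close frame received (failed handshake or dropped connection)";
    case 1008:
      return "policy violation";
    case 1009:
      return "message too big";
    case 1011:
      return "server encountered an unexpected condition";
    case 1015:
      return "TLS handshake failure";
    default:
      if (code >= 4000 && code <= 4999) return "application-defined code";
      return "unrecognized close code";
  }
}

console.log(describeCloseCode(1006));
```

It can be called from the `onclose` handler above, for example `console.log(describeCloseCode(event.code))`.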
### 2. Network Path Testing
```bash
# Test from different network locations
# 1. Direct to LNbits (bypassing nginx)
wscat -c ws://lnbits-server:5000/api/v1/ws/wallet-id -H "X-Api-Key: key"
# 2. Through nginx (bypassing WireGuard)
wscat -c wss://nginx-server/api/v1/ws/wallet-id -H "X-Api-Key: key"
# 3. Full path (through nginx and WireGuard)
wscat -c wss://lnbits.ario.pm/api/v1/ws/wallet-id -H "X-Api-Key: key"
```
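
The three results can be read as a simple elimination, sketched below. The `HopResults` shape, the `locateFailure` name, and the returned labels are hypothetical and only encode the reasoning: the first hop that fails, moving outward from LNbits, is the one to investigate.

```typescript
interface HopResults {
  direct: boolean;   // test 1: straight to LNbits, bypassing nginx
  viaNginx: boolean; // test 2: through nginx, bypassing WireGuard
  fullPath: boolean; // test 3: through nginx and WireGuard
}

type Suspect = "lnbits" | "nginx" | "wireguard-or-network-path" | "none";

/** Return the innermost component whose test failed. */
function locateFailure(results: HopResults): Suspect {
  if (!results.direct) return "lnbits";
  if (!results.viaNginx) return "nginx";
  if (!results.fullPath) return "wireguard-or-network-path";
  return "none";
}

// Example: direct and nginx-only tests pass, the full path fails.
console.log(locateFailure({ direct: true, viaNginx: true, fullPath: false }));
```

If all three `wscat` tests pass while the browser still fails, the difference is more likely on the client side, such as the missing `X-Api-Key` header or the page's origin.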
### 3. nginx Logs Analysis
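```bash
# Watch the nginx error log for failed upgrade attempts on the WebSocket path
tail -f /var/log/nginx/error.log | grep -i -E 'upgrade|/api/v1/ws'
# For more detail, set the directive below in nginx.conf (it is not a shell
# command) and reload nginx. The debug level requires a binary built with
# --with-debug:
#   error_log /var/log/nginx/error.log debug;
```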
### 4. WireGuard Diagnostics
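```bash
# Check latest handshake time and transfer counters for each peer
wg show
# Check RX/TX error and drop counters on the tunnel interface
ip -s link show wg0
# Capture TCP SYN packets to inspect the MSS each side advertises
tcpdump -i wg0 -n 'tcp[tcpflags] & (tcp-syn) != 0'
# Probe the tunnel MTU from Linux: 1392 bytes of payload + 28 bytes of
# ICMP/IP headers = 1420, sent with "don't fragment" set
# (<peer-tunnel-ip> is a placeholder)
ping -M do -s 1392 <peer-tunnel-ip>
```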
## Implemented Workaround
### Polling Fallback Mechanism
```typescript
// WalletWebSocketService.ts
class WalletWebSocketService extends BaseService {
  private async startPolling() {
    // Clear any existing timer so only one polling loop runs
    this.stopPolling()
    const pollBalance = async () => {
      if (!this.isActive) return
      try {
        const walletDetails = await this.walletAPI.getWalletDetails()
        // Only propagate when the balance actually changed
        if (walletDetails && walletDetails.balance !== this.lastBalance) {
          this.lastBalance = walletDetails.balance
          // Divide by 1000 to convert the API value (assumed to be millisatoshis) to satoshis
          this.store.updateBalance(walletDetails.balance / 1000)
          this.emit('balance-updated', walletDetails.balance / 1000)
        }
      } catch (error) {
        // Log and keep polling; a transient failure should not stop the loop
        console.error('[WalletWebSocketService] Polling error:', error)
      }
    }
    // Initial poll
    await pollBalance()
    // Set up recurring polls
    this.pollInterval = setInterval(pollBalance, 5000) // Poll every 5 seconds
  }
}
```
### Fallback Behavior
- Automatically activates when WebSocket connection fails
- Polls `/api/v1/wallets` endpoint every 5 seconds
- Updates balance only when changes detected
- Maintains same event emission pattern as WebSocket
## Recommended Solutions
### Priority 1: nginx Configuration Audit
1. Review nginx WebSocket proxy configuration
2. Add missing WebSocket headers
3. Ensure proper timeout settings
4. Test with HTTP/1.1 forced for WebSocket endpoints
### Priority 2: Network Path Optimization
1. Test WebSocket connection at each network hop
2. Adjust WireGuard MTU if fragmentation detected
3. Review firewall/NAT rules for long-lived connections
### Priority 3: Enhanced Diagnostics
1. Add WebSocket connection diagnostics endpoint
2. Implement client-side connection state reporting
3. Add server-side WebSocket connection logging
### Priority 4: Alternative Approaches
1. Consider Server-Sent Events (SSE) as alternative to WebSockets
2. Implement WebSocket connection through separate subdomain
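3. Route WebSocket traffic through a different reverse proxy to rule out nginx (websockify is not a fit here: it bridges WebSocket clients to raw TCP services rather than proxying WebSocket to WebSocket)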
## Testing Checklist
- [ ] Verify nginx configuration includes all WebSocket headers
- [ ] Test WebSocket connection from different network locations
- [ ] Check nginx error logs for WebSocket-specific errors
- [ ] Monitor WireGuard interface for packet drops
- [ ] Test with reduced MTU settings
- [ ] Verify authentication headers are properly forwarded
- [ ] Test with HTTP/1.1 forced for WebSocket location
- [ ] Check firewall/NAT timeout settings
- [ ] Test with browser developer tools WebSocket inspector
- [ ] Verify LNbits server WebSocket endpoint directly
## Future Improvements
### Short-term
1. Add connection retry logic with exponential backoff
2. Implement WebSocket heartbeat/ping mechanism
3. Add detailed connection state logging
4. Create health check endpoint for WebSocket connectivity
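
For the retry logic in item 1, one common approach is exponential backoff with full jitter, sketched below. The function name, the default base and cap values, and the injectable `random` parameter are assumptions for illustration and are not taken from the existing service.

```typescript
/**
 * Delay before reconnect attempt number `attempt` (0-based).
 * The ceiling doubles with each attempt up to `maxMs`; the actual delay is a
 * random value below that ceiling so that many clients do not retry in lockstep.
 */
function backoffDelayMs(
  attempt: number,
  baseMs: number = 1000,
  maxMs: number = 30000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(maxMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}

// Ceilings for the first attempts with the defaults: 1s, 2s, 4s, 8s, 16s, 30s, 30s, ...
for (let attempt = 0; attempt < 7; attempt++) {
  console.log(attempt, backoffDelayMs(attempt));
}
```

A reconnect loop would call this after each failed attempt, wait for the returned delay, and reset the attempt counter once a connection stays open. The polling fallback can keep running in the meantime.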
### Long-term
1. Implement connection quality monitoring
2. Add automatic fallback selection based on network conditions
3. Consider implementing WebRTC DataChannel as alternative
4. Evaluate HTTP/3 WebTransport when available
## References
- [nginx WebSocket Proxy Documentation](https://nginx.org/en/docs/http/websocket.html)
- [WireGuard MTU Considerations](https://www.wireguard.com/netns/#mtu-considerations)
- [WebSocket Protocol RFC 6455](https://datatracker.ietf.org/doc/html/rfc6455)
- [LNbits WebSocket API Documentation](https://github.com/lnbits/lnbits/blob/main/docs/guide/websockets.md)
## Status
- **Current State**: Polling fallback operational, WebSocket root cause unresolved
- **Last Updated**: 2025-09-20
- **Next Steps**: nginx configuration audit planned