Foundations
Core architectural concepts and best practices for building maintainable dashboards.
Dashboard Architecture
Pipeline + Dashboard Pattern
Every dashboard follows this two-component architecture:
┌──────────────┐    ┌──────────────┐
│   Pipeline   │    │  Dashboard   │
│              │    │              │
│ 1. Extract   │    │ 1. Load      │
│ 2. Clean     │───▶│ 2. Filter    │
│ 3. Store     │    │ 3. Display   │
│              │    │ 4. Interact  │
└──────────────┘    └──────────────┘
  Batch (daily)        Real-time
Pipeline (scheduled):
- Runs daily via Cloud Scheduler
- Queries BigQuery / APIs
- Cleans and structures data
- Stores in production BigQuery dataset
Dashboard (on-demand):
- User opens dashboard
- Fetches pre-processed data
- Applies filters and creates visualizations
- Responds to user interactions
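
The split between the two components can be sketched in plain Python. This is only an illustration under stated assumptions: the `STORE` dict stands in for the production BigQuery dataset, the sample records are made up, and the function names (`extract`, `clean`, `run_pipeline`, `load_dashboard_data`) are hypothetical rather than part of our tooling.

```python
from typing import Any

# Hypothetical stand-in for the production BigQuery dataset.
STORE: dict[str, list[dict[str, Any]]] = {}


def extract() -> list[dict[str, Any]]:
    """Pipeline step 1: pull raw records (hard-coded here for illustration)."""
    return [
        {"title": " Paper A ", "year": "2023", "citations": "12"},
        {"title": "Paper B", "year": "2024", "citations": None},
        {"title": "", "year": "2024", "citations": "3"},  # invalid: no title
    ]


def clean(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Pipeline step 2: drop rows without a title and normalize types."""
    cleaned = []
    for row in rows:
        title = row["title"].strip()
        if not title:
            continue
        cleaned.append(
            {
                "title": title,
                "year": int(row["year"]),
                "citations": int(row["citations"] or 0),
            }
        )
    return cleaned


def run_pipeline() -> None:
    """Scheduled batch job: extract, clean, then store."""
    STORE["publications"] = clean(extract())


def load_dashboard_data(year: int) -> list[dict[str, Any]]:
    """On-demand dashboard step: load pre-processed rows and filter them."""
    return [row for row in STORE.get("publications", []) if row["year"] == year]


run_pipeline()
rows_2024 = load_dashboard_data(2024)
```

In this sketch the dashboard side never calls `extract` or `clean`; it only reads what the pipeline has already stored, so the expensive work stays out of the request path.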
Three-Layer Pattern
Inside the dashboard, code is organized into three layers:
┌──────────────────────────────────────────────┐
│ 3. Pages (UI Layer)                          │
│   • Visual Layout                            │
│   • Interactive Components                   │
│   • User Callbacks                           │
└──────────────────────────────────────────────┘
                  ↕ (callbacks)
┌──────────────────────────────────────────────┐
│ 2. Data Layer (Business Logic)               │
│   • DataManager                              │
│   • Datasets (SQL queries)                   │
│   • Pydantic Models (validation)             │
│   • Caching                                  │
└──────────────────────────────────────────────┘
                  ↕ (queries)
┌──────────────────────────────────────────────┐
│ 1. Data Sources (BigQuery)                   │
│   • Pre-processed tables                     │
│   • Production datasets                      │
└──────────────────────────────────────────────┘
Benefits
✅ Separation of Concerns:
- Pages handle UI and user interactions
- Data layer handles business logic and validation
- Sources handle persistence
✅ Reusability:
- One dataset used by multiple pages
- One component used across dashboards
✅ Testability:
- Test data logic separately from UI
- Mock data layer for page testing
✅ Maintainability:
- Change SQL without touching UI
- Update UI without touching data
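
The separation and testability benefits can be illustrated with a small plain-Python sketch. The names below (`Source`, `FakeSource`, `PublicationData`, `render_summary`) are hypothetical and are not part of `ds_dash_support`; they only show how each layer depends on the one beneath it, and how a fake source can replace BigQuery in a test.

```python
from typing import Protocol


class Source(Protocol):
    """Layer 1 contract: anything that can return rows for a customer."""

    def fetch(self, customer_id: str) -> list[dict]: ...


class FakeSource:
    """Test double standing in for a BigQuery-backed source."""

    def __init__(self) -> None:
        self.calls = 0

    def fetch(self, customer_id: str) -> list[dict]:
        self.calls += 1
        return [
            {"customer_id": customer_id, "citations": 4},
            {"customer_id": customer_id, "citations": 6},
        ]


class PublicationData:
    """Layer 2: business logic, with no knowledge of the UI."""

    def __init__(self, source: Source) -> None:
        self._source = source

    def total_citations(self, customer_id: str) -> int:
        return sum(row["citations"] for row in self._source.fetch(customer_id))


def render_summary(data: PublicationData, customer_id: str) -> str:
    """Layer 3: turns data-layer results into display text only."""
    return f"Total citations: {data.total_citations(customer_id)}"


source = FakeSource()
summary = render_summary(PublicationData(source), "acme")
```

Because `PublicationData` only needs an object with a `fetch` method, the page-level function can be exercised without any network access.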
Core Concepts
Technology Stack
Our team uses these technologies:
- Plotly Dash: Python framework for building dashboards
- BigQuery: Data warehouse for analytics
- Google Cloud Run: Serverless hosting platform
- Redis: Distributed caching layer
- ds-deploy: Our internal tool for project setup and deployment
What is Dash?
Dash is a Python framework that lets you build interactive web apps using only Python:
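from dash import Dash, html, Input, Output, callback

app = Dash(__name__)

# Layout: a heading, a button, and a placeholder for the callback output
app.layout = html.Div([
    html.H1("Hello World"),
    html.Button("Click Me", id="btn"),
    html.Div(id="output"),
])

@callback(Output("output", "children"), Input("btn", "n_clicks"))
def update(n_clicks):
    # n_clicks is None until the button is clicked for the first time
    return f"Clicked {n_clicks or 0} times"

if __name__ == "__main__":
    app.run(debug=True)  # app.run replaces the older app.run_server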
Key Benefits:
- ✅ Pure Python (no JavaScript needed)
- ✅ Reactive (auto-updates when data changes)
- ✅ Batteries included (components, layouts, callbacks)
What are Callbacks?
Callbacks are functions that automatically run when user input changes:
@callback(
    Output("output-div", "children"),  # What to update
    Input("input-box", "value"),       # What triggers it
)
def update_output(input_value):
    return f"You typed: {input_value}"
Callback Flow:
User types in text box
↓
Callback triggered
↓
Function runs with new value
↓
Output component updated
↓
UI refreshes automatically
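
To make the flow concrete, here is a toy model of it in plain Python. It is not how Dash is implemented, and `toy_callback`, `set_property`, `state`, and `registry` are hypothetical names invented for this sketch; it only mimics the sequence "input changes, callback runs, output is updated".

```python
from typing import Callable

# Toy model of the callback flow; this is NOT Dash's implementation.
Prop = tuple[str, str]  # (component id, property name)
state: dict[Prop, object] = {}
registry: dict[Prop, list[tuple[Prop, Callable]]] = {}


def toy_callback(output: Prop, trigger: Prop) -> Callable:
    """Register a function to run whenever `trigger` changes."""

    def decorator(func: Callable) -> Callable:
        registry.setdefault(trigger, []).append((output, func))
        return func

    return decorator


def set_property(component_id: str, prop: str, value: object) -> None:
    """Simulate a user interaction: store the value, then fire callbacks."""
    key = (component_id, prop)
    state[key] = value
    for output, func in registry.get(key, []):
        state[output] = func(value)  # write the result to the output property


@toy_callback(output=("output-div", "children"), trigger=("input-box", "value"))
def update_output(input_value: str) -> str:
    return f"You typed: {input_value}"


set_property("input-box", "value", "hello")  # "user types in text box"
result = state[("output-div", "children")]
```

In real Dash the browser sends the changed property to the server and re-renders the output component; the sketch collapses that round trip into a dictionary update.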
What is DataManager?
DataManager is our internal pattern for managing dashboard data:
from ds_dash_support import DashDataManager

ddm = DashDataManager(
    datasets=[publications, authors, journals],
    cache=get_cache(),
    pydantic_model=DashboardParamsModel,
)
DataManager handles:
- ✅ Dataset registration
- ✅ Parameter validation (via Pydantic)
- ✅ Automatic caching
- ✅ Filter coordination
- ✅ Query optimization
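
As a rough illustration of three of these responsibilities (registration, validation, caching), here is a standard-library sketch. It is an assumption-laden toy, not the `DashDataManager` API: `ToyDataManager`, `Params`, `register`, and `get` are hypothetical names, and a frozen dataclass stands in for the Pydantic model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Params:
    """Hypothetical stand-in for a Pydantic parameter model."""

    customer_id: str
    year: int

    def __post_init__(self) -> None:
        if not self.customer_id:
            raise ValueError("customer_id must not be empty")
        if not 1900 <= self.year <= 2100:
            raise ValueError(f"year out of range: {self.year}")


class ToyDataManager:
    """Illustrative only: registers datasets, validates params, caches results."""

    def __init__(self) -> None:
        self._datasets: dict[str, Callable[[Params], list]] = {}
        self._cache: dict[tuple[str, Params], list] = {}
        self.queries_run = 0

    def register(self, name: str, loader: Callable[[Params], list]) -> None:
        self._datasets[name] = loader

    def get(self, name: str, **raw_params) -> list:
        params = Params(**raw_params)  # validation happens before any query
        key = (name, params)  # a frozen dataclass is hashable
        if key not in self._cache:
            self.queries_run += 1
            self._cache[key] = self._datasets[name](params)
        return self._cache[key]


manager = ToyDataManager()
manager.register("publications", lambda p: [f"{p.customer_id}-{p.year}"])

rows = manager.get("publications", customer_id="acme", year=2024)
rows_again = manager.get("publications", customer_id="acme", year=2024)
```

The second `get` call is answered from the cache, so the loader runs only once for identical validated parameters.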
Code Quality Standards
Type Hints
Always use type hints for function parameters and return types:
import pandas as pd

# ✅ Good
def get_publications(customer_id: str, year: int) -> pd.DataFrame:
    """Fetch publications for a customer in a given year."""
    return query_bigquery(customer_id, year)

# ❌ Bad
def get_publications(customer_id, year):
    return query_bigquery(customer_id, year)
Docstrings
Every function needs a docstring explaining:
- What it does
- Parameters (type and description)
- Return value (type and description)
def calculate_impact(citations: int, years_since_pub: int) -> float:
    """
    Calculate publication impact score.

    Args:
        citations: Number of citations
        years_since_pub: Years since publication date

    Returns:
        Impact score: citations per year multiplied by 10
        (0.0 for publications less than a year old)
    """
    if years_since_pub == 0:
        return 0.0
    return (citations / years_since_pub) * 10
Code Formatting
Use Black and Ruff for consistent formatting.
Our team standards:
- Line length: 100 characters
- Indentation: 4 spaces (no tabs)
- Imports: Sorted alphabetically, grouped by type
- Quotes: Double quotes for strings
Naming Conventions
Variables and Functions: snake_case
Classes: PascalCase
Constants: UPPER_SNAKE_CASE
Private: _leading_underscore
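
A short snippet showing the four conventions side by side. The names (`MAX_RETRIES`, `CitationCounter`, `add_citation`) are made up for illustration.

```python
MAX_RETRIES = 3  # constant: UPPER_SNAKE_CASE


class CitationCounter:  # class: PascalCase
    def __init__(self) -> None:
        self._counts: dict[str, int] = {}  # private attribute: leading underscore

    def add_citation(self, paper_id: str) -> int:  # method: snake_case
        """Record one citation and return the running total for the paper."""
        self._counts[paper_id] = self._counts.get(paper_id, 0) + 1
        return self._counts[paper_id]


citation_counter = CitationCounter()  # variable: snake_case
citation_counter.add_citation("paper-1")
latest_count = citation_counter.add_citation("paper-1")
```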
Error Handling
Always handle errors gracefully:
# ✅ Good: Specific error handling
try:
    data = query_bigquery(customer_id)
except TimeoutError:
    logger.warning("Query timed out, using cached data")
    data = get_cached_data()
except ValueError as e:
    logger.error(f"Invalid customer_id: {e}")
    raise
except Exception as e:
    logger.error(f"Unexpected error: {e}")
    data = pd.DataFrame()  # Fall back to empty data, don't crash

# ❌ Bad: Bare except
try:
    data = query_bigquery(customer_id)
except:
    data = None
Best Practices:
- ✅ Catch specific exceptions
- ✅ Log errors with context
- ✅ Provide fallback behavior
- ✅ Re-raise if you can't handle the error
- ❌ Never use bare except:
- ❌ Don't silence errors
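
The practices above can be combined into one small, testable function. This is a sketch under assumptions: `load_rows`, `slow_fetch`, and `strict_fetch` are hypothetical helpers, and plain lists stand in for query results so the example runs without BigQuery.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def load_rows(fetch: Callable[[str], list], customer_id: str, cached_rows: list) -> list:
    """Fetch rows, falling back to cached rows if the query times out."""
    try:
        return fetch(customer_id)
    except TimeoutError:
        # Recoverable: log with context and serve the fallback.
        logger.warning("Query timed out for %s, using cached data", customer_id)
        return cached_rows
    except ValueError:
        # Not recoverable here: log with context and let the caller decide.
        logger.error("Invalid customer_id: %r", customer_id)
        raise


def slow_fetch(customer_id: str) -> list:
    """Simulates a query that always times out."""
    raise TimeoutError("simulated timeout")


def strict_fetch(customer_id: str) -> list:
    """Simulates a query that rejects empty customer IDs."""
    if not customer_id:
        raise ValueError("customer_id is empty")
    return [customer_id]


fallback = load_rows(slow_fetch, "acme", cached_rows=["cached"])
fresh = load_rows(strict_fetch, "acme", cached_rows=[])
```

Passing `fetch` in as an argument keeps the error-handling logic separate from the query itself, which makes each failure path easy to trigger in a test.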