Foundations

Core architectural concepts and best practices for building maintainable dashboards.

Dashboard Architecture

Pipeline + Dashboard Pattern

Every dashboard follows this two-component architecture:

┌──────────────┐    ┌──────────────┐
│   Pipeline   │    │  Dashboard   │
│              │    │              │
│  1. Extract  │    │  1. Load     │
│  2. Clean    │───▶│  2. Filter   │
│  3. Store    │    │  3. Display  │
│              │    │  4. Interact │
└──────────────┘    └──────────────┘
  Batch (daily)      Real-time

Pipeline (scheduled):
- Runs daily via Cloud Scheduler
- Queries BigQuery / APIs
- Cleans and structures data
- Stores results in the production BigQuery dataset

Dashboard (on-demand):
- User opens dashboard
- Fetches pre-processed data
- Applies filters and creates visualizations
- Responds to user interactions
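The split between the two components can be sketched in plain Python. This is a minimal illustration, not our production pipeline: the in-memory `STORE` dict stands in for the production BigQuery dataset, and `extract`, `clean`, `run_pipeline`, and `load_for_dashboard` are hypothetical names chosen for this example.

```python
from typing import Any, Optional

# Hypothetical stand-in for the production BigQuery dataset.
STORE: dict[str, list[dict[str, Any]]] = {}


def extract() -> list[dict[str, Any]]:
    """Pretend to query a raw source; a real pipeline would call BigQuery or an API."""
    return [
        {"title": " Paper A ", "year": "2023", "citations": "12"},
        {"title": "Paper B", "year": "2024", "citations": None},
        {"title": "", "year": "2024", "citations": "3"},  # invalid: no title
    ]


def clean(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop rows without a title and coerce fields to proper types."""
    cleaned = []
    for row in rows:
        title = row["title"].strip()
        if not title:
            continue
        cleaned.append(
            {
                "title": title,
                "year": int(row["year"]),
                "citations": int(row["citations"] or 0),
            }
        )
    return cleaned


def run_pipeline() -> None:
    """Batch side: extract, clean, store. Runs on a schedule, not per user."""
    STORE["publications"] = clean(extract())


def load_for_dashboard(year: Optional[int] = None) -> list[dict[str, Any]]:
    """Dashboard side: read pre-processed rows and apply a cheap filter."""
    rows = STORE.get("publications", [])
    if year is None:
        return rows
    return [row for row in rows if row["year"] == year]
```

The expensive work (extraction and cleaning) happens once in `run_pipeline`, while `load_for_dashboard` only reads and filters what is already stored.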

Three-Layer Pattern

Inside the dashboard, code is organized into three layers:

┌──────────────────────────────────────────────┐
│          3. Pages (UI Layer)                 │
│  • Visual Layout                             │
│  • Interactive Components                    │
│  • User Callbacks                            │
└──────────────────────────────────────────────┘
                    ↕ (callbacks)
┌──────────────────────────────────────────────┐
│       2. Data Layer (Business Logic)         │
│  • DataManager                               │
│  • Datasets (SQL queries)                    │
│  • Pydantic Models (validation)              │
│  • Caching                                   │
└──────────────────────────────────────────────┘
                    ↕ (queries)
┌──────────────────────────────────────────────┐
│         1. Data Sources (BigQuery)           │
│  • Pre-processed tables                      │
│  • Production datasets                       │
└──────────────────────────────────────────────┘

Benefits

✅ Separation of Concerns:
- Pages handle UI and user interactions
- Data layer handles business logic and validation
- Sources handle persistence

✅ Reusability:
- One dataset used by multiple pages
- One component used across dashboards

✅ Testability:
- Test data logic separately from UI
- Mock data layer for page testing

✅ Maintainability:
- Change SQL without touching UI
- Update UI without touching data
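The testability benefit is easiest to see with a small sketch. The names here (`PublicationSource`, `PublicationData`, `render_summary`, `FakeSource`) are hypothetical and only illustrate the layering; they are not part of `ds_dash_support`. Because the page function depends only on the data layer's interface, a test can swap in a fake source instead of BigQuery.

```python
from typing import Protocol


class PublicationSource(Protocol):
    """Layer 1: anything that can return raw publication rows."""

    def fetch(self, customer_id: str) -> list[dict]: ...


class PublicationData:
    """Layer 2: business logic on top of a source."""

    def __init__(self, source: PublicationSource) -> None:
        self._source = source

    def count_by_year(self, customer_id: str) -> dict[int, int]:
        counts: dict[int, int] = {}
        for row in self._source.fetch(customer_id):
            counts[row["year"]] = counts.get(row["year"], 0) + 1
        return counts


def render_summary(data: PublicationData, customer_id: str) -> str:
    """Layer 3: turn data-layer output into something displayable."""
    counts = data.count_by_year(customer_id)
    return ", ".join(f"{year}: {n}" for year, n in sorted(counts.items()))


class FakeSource:
    """Test double that replaces BigQuery in unit tests."""

    def fetch(self, customer_id: str) -> list[dict]:
        return [{"year": 2023}, {"year": 2024}, {"year": 2024}]


summary = render_summary(PublicationData(FakeSource()), "ABC123")
```

In a real page, `render_summary` would return Dash components rather than a string, but the dependency direction stays the same.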

Core Concepts

Technology Stack

Our team uses these technologies:

Plotly Dash: Python framework for building dashboards
BigQuery: Data warehouse for analytics
Google Cloud Run: Serverless hosting platform
Redis: Distributed caching layer
ds-deploy: Our internal tool for project setup and deployment

What is Dash?

Dash is a Python framework that lets you build interactive web apps using only Python:

from dash import Dash, html, Input, Output, callback

app = Dash(__name__)

app.layout = html.Div([
    html.H1("Hello World"),
    html.Button("Click Me", id="btn"),
    html.Div(id="output")
])

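@callback(Output("output", "children"), Input("btn", "n_clicks"))
def update(n_clicks):
    # n_clicks is None until the button is clicked for the first time
    return f"Clicked {n_clicks or 0} times"

if __name__ == "__main__":
    app.run(debug=True)  # app.run replaces the older app.run_server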

Key Benefits:
- ✅ Pure Python (no JavaScript needed)
- ✅ Reactive (auto-updates when data changes)
- ✅ Batteries included (components, layouts, callbacks)

What are Callbacks?

Callbacks are functions that Dash runs automatically whenever one of their Input properties changes, for example when a user types, clicks, or selects a value:

from dash import Input, Output, callback

@callback(
    Output("output-div", "children"),  # What to update
    Input("input-box", "value"),       # What triggers it
)
def update_output(input_value):
    return f"You typed: {input_value}"

Callback Flow:

User types in text box
      ↓
Callback triggered
      ↓
Function runs with new value
      ↓
Output component updated
      ↓
UI refreshes automatically
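One detail the flow above hides: by default Dash also fires callbacks once when the page loads, before any interaction, and at that point properties such as `value` or `n_clicks` may still be `None`. A sketch of one way to handle this, keeping the logic in a plain function so it can be tested without a running app (the name `describe_input` and the placeholder message are assumptions for this example):

```python
from typing import Optional


def describe_input(input_value: Optional[str]) -> str:
    """Callback body as a plain function: handles the initial None value."""
    if not input_value:
        return "Type something to get started"
    return f"You typed: {input_value}"


# In the app, the decorated callback would simply delegate, e.g.:
#
# @callback(Output("output-div", "children"), Input("input-box", "value"))
# def update_output(input_value):
#     return describe_input(input_value)
```

Keeping the decorated function thin means the interesting logic can be covered by ordinary unit tests.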

What is DataManager?

DataManager is our internal pattern for managing dashboard data:

from ds_dash_support import DashDataManager

ddm = DashDataManager(
    datasets=[publications, authors, journals],
    cache=get_cache(),
    pydantic_model=DashboardParamsModel,
)

DataManager handles:
- ✅ Dataset registration
- ✅ Parameter validation (via Pydantic)
- ✅ Automatic caching
- ✅ Filter coordination
- ✅ Query optimization
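To make the first three responsibilities concrete, here is a toy stand-in written with the standard library only. It is not the `DashDataManager` implementation and does not mirror its API: `ToyDataManager`, `Params`, and `load_publications` are hypothetical, a frozen dataclass is swapped in for the Pydantic model, and a plain dict stands in for the cache.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Params:
    """Hypothetical dashboard parameters, validated on construction."""

    customer_id: str
    year: int

    def __post_init__(self) -> None:
        if not self.customer_id:
            raise ValueError("customer_id must not be empty")
        if not 1900 <= self.year <= 2100:
            raise ValueError(f"year out of range: {self.year}")


class ToyDataManager:
    """Registers named datasets and caches results per (dataset, params)."""

    def __init__(self) -> None:
        self._loaders: dict[str, Callable[[Params], list]] = {}
        self._cache: dict[tuple[str, Params], list] = {}

    def register(self, name: str, loader: Callable[[Params], list]) -> None:
        self._loaders[name] = loader

    def get(self, name: str, params: Params) -> list:
        key = (name, params)  # a frozen dataclass is hashable
        if key not in self._cache:
            self._cache[key] = self._loaders[name](params)
        return self._cache[key]


calls = []


def load_publications(params: Params) -> list:
    calls.append(params)  # track how often the "query" actually runs
    return [f"{params.customer_id}-{params.year}-pub"]


manager = ToyDataManager()
manager.register("publications", load_publications)
params = Params(customer_id="ABC123", year=2024)
first = manager.get("publications", params)
second = manager.get("publications", params)  # served from the cache
```

The second `get` call returns the cached list without invoking the loader again, which is the behavior "automatic caching" refers to. Invalid parameters fail at construction time, before any query runs.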

Code Quality Standards

Type Hints

Always use type hints for function parameters and return types:

# ✅ Good
import pandas as pd

def get_publications(customer_id: str, year: int) -> pd.DataFrame:
    """Fetch publications for a customer in the given year."""
    return query_bigquery(customer_id, year)

# ❌ Bad
def get_publications(customer_id, year):
    return query_bigquery(customer_id, year)

Docstrings

Every function needs a docstring explaining:
- What it does
- Parameters (type and description)
- Return value (type and description)

def calculate_impact(citations: int, years_since_pub: int) -> float:
    """
    Calculate publication impact score.

    Args:
        citations: Number of citations
        years_since_pub: Years since publication date

    Returns:
        Impact score: citations per year scaled by 10, capped at 100
    """
    if years_since_pub <= 0:
        return 0.0
    return min((citations / years_since_pub) * 10, 100.0)

Code Formatting

Use Black for formatting and Ruff for linting:

# Format code
black .

# Check for issues
ruff check .

# Fix automatically
ruff check --fix .

Our team standards:
- Line length: 100 characters
- Indentation: 4 spaces (no tabs)
- Imports: Sorted alphabetically, grouped by type
- Quotes: Double quotes for strings
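These standards can be pinned in `pyproject.toml` so that everyone's tools agree. A possible configuration, assuming the project keeps its tool settings in that file (the selected Ruff rule sets are a suggestion, not a mandated list):

```toml
[tool.black]
line-length = 100

[tool.ruff]
line-length = 100

[tool.ruff.lint]
# E/F: pycodestyle and Pyflakes checks; I: isort-style import sorting
select = ["E", "F", "I"]
```

Black already uses 4-space indentation and normalizes strings to double quotes by default, so those two standards need no extra settings.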

Naming Conventions

Variables and Functions: snake_case

customer_id = "ABC123"
def get_publication_count(year: int) -> int:
    ...

Classes: PascalCase

class DataManager:
    ...

class DashboardParamsModel:
    ...

Constants: UPPER_SNAKE_CASE

MAX_RESULTS = 10000
DEFAULT_YEAR = 2024

Private: _leading_underscore

def _internal_helper():
    """Not part of public API"""
    ...

Error Handling

Always handle errors gracefully:

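# ✅ Good: Specific error handling
def load_customer_data(customer_id: str) -> pd.DataFrame:
    """Load data for a customer, falling back where that is safe."""
    try:
        return query_bigquery(customer_id)
    except TimeoutError:
        logger.warning("Query timed out, using cached data")
        return get_cached_data()
    except ValueError as e:
        logger.error(f"Invalid customer_id: {e}")
        raise
    except Exception:
        # logger.exception records the full traceback
        logger.exception("Unexpected error loading data for %s", customer_id)
        return pd.DataFrame()  # Return empty data, don't crash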

# ❌ Bad: Bare except
try:
    data = query_bigquery(customer_id)
except:
    data = None

Best Practices:
- ✅ Catch specific exceptions
- ✅ Log errors with context
- ✅ Provide fallback behavior
- ✅ Re-raise if you can't handle the error
- ❌ Never use bare except:
- ❌ Don't silence errors
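When a low-level error cannot be handled locally, re-raising it with added context keeps the original traceback while telling the caller what was being attempted. A minimal sketch, where `DataLoadError` and `parse_year` are hypothetical names for this example:

```python
import logging

logger = logging.getLogger(__name__)


class DataLoadError(Exception):
    """Hypothetical domain error raised when input data cannot be used."""


def parse_year(raw: str, customer_id: str) -> int:
    """Convert a raw year string, adding context if it is malformed."""
    try:
        return int(raw)
    except ValueError as exc:
        logger.error("Bad year %r for customer %s", raw, customer_id)
        # "from exc" chains the original error instead of hiding it.
        raise DataLoadError(
            f"Invalid year {raw!r} for customer {customer_id}"
        ) from exc
```

The caller sees a `DataLoadError` that names the customer, and the underlying `ValueError` remains available as its `__cause__`.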