Data Layer

Managing data flow from BigQuery to your dashboard with DataManager, datasets, caching, and filtering.

DataManager Concepts

What is DataManager?

DataManager is the central hub for all data in your dashboard. It manages datasets, applies filters, validates parameters, and coordinates caching.

The DataManager Pattern:

Without DataManager:
  Callback 1 → BigQuery
  Callback 2 → BigQuery
  Callback 3 → BigQuery
  ❌ Duplicate queries, no caching, hard to maintain

With DataManager:
  Callback 1 ↘
  Callback 2 → DataManager → BigQuery ✓ (cached)
  Callback 3 ↗
  ✓ Single source of truth
  ✓ Automatic caching
  ✓ Easy to modify

Core Components

1. Pydantic Models (Parameter Validation)

Pydantic validates and converts dashboard parameters before they hit SQL:

from pydantic import computed_field
from ds_dash_support.utils.datamanager.pydantic import DashboardParamsBaseModel

class DashboardParamsModel(DashboardParamsBaseModel):
    # Type validation happens automatically
    min_year: int = 2010       # Must be integer
    customer_id: str | None = None  # String or None
    is_active: bool = True     # Must be boolean

    # Derived parameters
    @computed_field
    @property
    def year_range(self) -> str:
        """Combine min_year and max_year"""
        return f"{self.min_year}-2024"

2. Datasets

A dataset is a named definition of a data source with Pydantic placeholders:

from ds_dash_support.utils.datamanager.datasets import GbqPydanticDataset

annual_pubs = GbqPydanticDataset(
    name="annual_pubs",
    query=f"""
SELECT year, COUNT(*) as count
FROM `project.dataset.publications`
WHERE year >= {{min_year}}
    AND customer_id = '{{customer_id}}'
"""
)

3. DataManager

Central registry that orchestrates everything:

from ds_dash_support import DashDataManager
from data.datasets import annual_pubs, scale_activity
from data.models import DashboardParamsModel

ddm = DashDataManager(
    datasets=[annual_pubs, scale_activity],
    cache=get_cache(name="redis-dev"),
    pydantic_model=DashboardParamsModel,
)

Creating Datasets

Basic Dataset Creation

Step 1: Define Parameters in Pydantic Model

# data/models.py
from pydantic import computed_field
from ds_dash_support.utils.datamanager.pydantic import DashboardParamsBaseModel

class DashboardParamsModel(DashboardParamsBaseModel):
    """Parameters for your dashboard - validated by Pydantic"""
    min_year: int = 2010
    max_year: int = 2024
    customer_id: str | None = None

    @computed_field
    @property
    def year_range(self) -> str:
        """Derived parameter from base fields"""
        return f"{self.min_year}-{self.max_year}"

Step 2: Define the Dataset with SQL

# data/datasets.py
from ds_dash_support.utils.datamanager.datasets import GbqPydanticDataset

annual_pubs = GbqPydanticDataset(
    name="annual_pubs",
    query=f"""
SELECT
    year,
    COUNT(*) as publication_count,
    AVG(impact_factor) as avg_impact
FROM `{{{{ project }}}}.{{{{ dataset }}}}.publications`
WHERE 1=1
    {{"AND customer_id = '{customer_id}'" if customer_id else ""}}
    AND year >= {{min_year}}
    AND year <= {{max_year}}
GROUP BY year
ORDER BY year DESC
""",
    extra_cache_params=["customer_id"],
)

Step 3: Register with DataManager

# data/datamanager.py
from ds_dash_support import DashDataManager
from data.datasets import annual_pubs
from data.models import DashboardParamsModel

ddm = DashDataManager(
    datasets=[annual_pubs],
    cache=get_cache(name="redis-dev"),
    pydantic_model=DashboardParamsModel,
)

Step 4: Use in a Callback

# pages/my_page.py
from data.datamanager import ddm

data_callback = ddm.data_callback(
    flatten_kwargs=True,
    manager_filters=Input("dashboard-manager", "filters"),
)

@data_callback(
    Output("chart", "figure"),
)
def update_chart(data):
    df = data.datasets.annual_pubs
    return px.line(df, x='year', y='publication_count')

Caching Guide

Why Caching Matters

Every time a user clicks a filter, your dashboard queries BigQuery. Without caching, this means slow response times and high costs.

With caching:

First query: Query BigQuery (30 sec) → Store in cache → Return to user
User changes filter: Check cache → Maybe we have it cached!
  - If cached: Return instantly (< 1 sec) ✅
  - If not cached: Query BigQuery (30 sec) → Cache it → Return

Automatic Caching with DataManager

Our team uses cust-tools-dash-support, which includes a DataManager that automatically handles caching for all your datasets:

# In your globals.py or data/__init__.py
from ds_dash_support.utils import DataManager

# DataManager automatically handles caching with sensible defaults
# - Local development: In-memory cache (no setup needed)
# - Cloud Run (dev/staging/prod): Redis cache
# Default TTL: 24 hours (configurable if needed)
dataset_manager = DataManager(datasets=[...])

Caching Standalone Functions

For functions outside of DataManager, use the @cache.memoize() decorator:

from ds_app_cache import get_cache

cache = get_cache(name="my-dashboard", expire=86400)

@cache.memoize(expire=86400)  # Cache for 24 hours
def calculate_impact_factor(customer_id: str, year: int):
    """Custom calculation that takes time."""
    # ... complex calculation ...
    return result

Cache Best Practices

✅ DO: - Use 24-hour cache by default (data refreshes daily) - Clear cache after data updates - Let DataManager handle caching automatically

❌ DON'T: - Cache longer than your data refresh cycle - Manually manage cache unless absolutely necessary - Forget to clear cache after updates

Jinja2 SQL Templating

What is Jinja2?

Jinja2 is a template engine that lets you write dynamic SQL:

{# Static SQL (not useful) #}
SELECT * FROM publications WHERE year = 2024

{# Dynamic SQL with Jinja2 (powerful!) #}
SELECT * FROM publications WHERE year = {{ year }}

Basic Jinja2 Syntax

Variables: {{ variable_name }}

SELECT * FROM `{{ project }}.{{ dataset }}.table`

-- With: project="my-proj", dataset="data"
-- Becomes: SELECT * FROM `my-proj.data.table`

Conditionals: {% if condition %}

WHERE 1=1
    {% if customer %}
        AND customer_id = '{{ customer }}'
    {% endif %}
    {% if min_year %}
        AND year >= {{ min_year }}
    {% endif %}

Loops: {% for item in list %}

SELECT * FROM publications
WHERE category IN (
    {% for cat in categories %}
        '{{ cat }}'
        {%- if not loop.last %},{% endif %}
    {% endfor %}
)

-- With categories=['Oncology', 'Cardiology']
-- Becomes: WHERE category IN ('Oncology', 'Cardiology')

Common Patterns

Pattern 1: Optional WHERE Clauses

SELECT *
FROM publications
WHERE 1=1
    {% if customer_id %}
        AND customer_id = '{{ customer_id }}'
    {% endif %}
    {% if min_year %}
        AND year >= {{ min_year }}
    {% endif %}

Pattern 2: Dynamic Column Selection

SELECT
    id,
    title,
    {% if include_author %}
        author,
    {% endif %}
    {% if include_stats %}
        citation_count,
        impact_factor,
    {% endif %}
    created_date
FROM publications

Filtering System

DashboardFilter

The DashboardFilter class holds all filter values and applies them to datasets:

# Create a filter for customer selection
customer_filter = DashboardFilter(
    filter_id='customer',
    display_name='Customer',
    type='select',
    get_options_callback=lambda: dataset_manager.get_customers(),
    apply_to=['publications', 'authors']
)

Filter Types

Single Select (Dropdown)

filter_element = DsFilter(
    id='category-filter',
    label='Select Category',
    options=[
        {'label': 'Oncology', 'value': 'oncology'},
        {'label': 'Cardiology', 'value': 'cardiology'}
    ],
    value='oncology',
    multi=False
)

Multi-Select

category_filter = DashboardFilter(
    id='categories',
    display_name='Select Categories',
    type='multi_select',
    options=[...],
    default_value=['oncology']
)

Date Range

date_filter = DashboardFilter(
    id='date_range',
    display_name='Date Range',
    type='date_range',
    apply_to=['publications']
)

Optional Helper Modules

The project generated by ds-deploy includes optional helper modules in data/ for managing complex parameter validation and formatting. These are optional - simple dashboards may not need them.

When to Use Helper Modules

Simple Dashboard (No Helpers Needed) - 1-2 filters - Simple parameter → SQL mapping - No computed fields needed

Complex Dashboard (Use Helpers) - Multiple filters with interdependencies - Complex parameter transformations - Custom business logic - Data formatting specific to domain

Helper Module 1: param_parsers.py

Purpose: Convert raw input to structured format

# data/param_parsers.py
from datetime import datetime
from typing import Tuple

def parse_quarter(quarter_str: str) -> Tuple[str, str]:
    """
    Parse quarter string like 'Q1 2024' into date range.

    Args:
        quarter_str: Format "Q1 2024", "Q2 2024", etc

    Returns:
        (start_date, end_date) as "YYYY-MM-DD"
    """
    if not quarter_str:
        return None, None

    parts = quarter_str.split()
    quarter = int(parts[0][1])  # Q1 → 1
    year = int(parts[1])

    start_month = (quarter - 1) * 3 + 1
    end_month = start_month + 2

    start_date = f"{year}-{start_month:02d}-01"
    end_date = f"{year}-{end_month:02d}-28"

    return start_date, end_date

Helper Module 2: param_validators.py

Purpose: Custom validation logic beyond Pydantic

# data/param_validators.py
from pydantic import field_validator, ValidationError

class DashboardParamsModel(DashboardParamsBaseModel):
    min_year: int = 2010
    max_year: int = 2024

    @field_validator('max_year')
    @classmethod
    def max_year_after_min(cls, v, info):
        """Ensure max_year >= min_year"""
        if 'min_year' in info.data and v < info.data['min_year']:
            raise ValueError('max_year must be >= min_year')
        return v

Helper Module 3: query_formatters.py

Purpose: Format parameters for SQL

# data/query_formatters.py
from typing import List

def format_ids_for_sql(ids: List[str]) -> str:
    """
    Format list of IDs for BigQuery IN clause.

    Args:
        ids: ["id1", "id2", "id3"]

    Returns:
        SQL-safe string: "'id1','id2','id3'"
    """
    if not ids:
        return "''"

    # Escape quotes
    escaped = [id.replace("'", "''") for id in ids]
    return "'" + "','".join(escaped) + "'"