Data Layer
Managing data flow from BigQuery to your dashboard with DataManager, datasets, caching, and filtering.
DataManager Concepts
What is DataManager?
DataManager is the central hub for all data in your dashboard. It manages datasets, applies filters, validates parameters, and coordinates caching.
The DataManager Pattern:
Without DataManager:
Callback 1 → BigQuery
Callback 2 → BigQuery
Callback 3 → BigQuery
❌ Duplicate queries, no caching, hard to maintain
With DataManager:
Callback 1 ↘
Callback 2 → DataManager → BigQuery ✓ (cached)
Callback 3 ↗
✓ Single source of truth
✓ Automatic caching
✓ Easy to modify
Core Components
1. Pydantic Models (Parameter Validation)
Pydantic validates and converts dashboard parameters before they hit SQL:
from pydantic import computed_field
from ds_dash_support.utils.datamanager.pydantic import DashboardParamsBaseModel
class DashboardParamsModel(DashboardParamsBaseModel):
# Type validation happens automatically
min_year: int = 2010 # Must be integer
customer_id: str | None = None # String or None
is_active: bool = True # Must be boolean
# Derived parameters
@computed_field
@property
def year_range(self) -> str:
"""Combine min_year and max_year"""
return f"{self.min_year}-2024"
2. Datasets
A dataset is a named definition of a data source with Pydantic placeholders:
from ds_dash_support.utils.datamanager.datasets import GbqPydanticDataset
annual_pubs = GbqPydanticDataset(
name="annual_pubs",
query=f"""
SELECT year, COUNT(*) as count
FROM `project.dataset.publications`
WHERE year >= {{min_year}}
AND customer_id = '{{customer_id}}'
"""
)
3. DataManager
Central registry that orchestrates everything:
from ds_dash_support import DashDataManager
from data.datasets import annual_pubs, scale_activity
from data.models import DashboardParamsModel
ddm = DashDataManager(
datasets=[annual_pubs, scale_activity],
cache=get_cache(name="redis-dev"),
pydantic_model=DashboardParamsModel,
)
Creating Datasets
Basic Dataset Creation
Step 1: Define Parameters in Pydantic Model
# data/models.py
from pydantic import computed_field
from ds_dash_support.utils.datamanager.pydantic import DashboardParamsBaseModel
class DashboardParamsModel(DashboardParamsBaseModel):
"""Parameters for your dashboard - validated by Pydantic"""
min_year: int = 2010
max_year: int = 2024
customer_id: str | None = None
@computed_field
@property
def year_range(self) -> str:
"""Derived parameter from base fields"""
return f"{self.min_year}-{self.max_year}"
Step 2: Define the Dataset with SQL
# data/datasets.py
from ds_dash_support.utils.datamanager.datasets import GbqPydanticDataset
annual_pubs = GbqPydanticDataset(
name="annual_pubs",
query=f"""
SELECT
year,
COUNT(*) as publication_count,
AVG(impact_factor) as avg_impact
FROM `{{{{ project }}}}.{{{{ dataset }}}}.publications`
WHERE 1=1
{{"AND customer_id = '{customer_id}'" if customer_id else ""}}
AND year >= {{min_year}}
AND year <= {{max_year}}
GROUP BY year
ORDER BY year DESC
""",
extra_cache_params=["customer_id"],
)
Step 3: Register with DataManager
# data/datamanager.py
from ds_dash_support import DashDataManager
from data.datasets import annual_pubs
from data.models import DashboardParamsModel
ddm = DashDataManager(
datasets=[annual_pubs],
cache=get_cache(name="redis-dev"),
pydantic_model=DashboardParamsModel,
)
Step 4: Use in a Callback
# pages/my_page.py
from data.datamanager import ddm
data_callback = ddm.data_callback(
flatten_kwargs=True,
manager_filters=Input("dashboard-manager", "filters"),
)
@data_callback(
Output("chart", "figure"),
)
def update_chart(data):
df = data.datasets.annual_pubs
return px.line(df, x='year', y='publication_count')
Caching Guide
Why Caching Matters
Every time a user clicks a filter, your dashboard queries BigQuery. Without caching, this means slow response times and high costs.
With caching:
First query: Query BigQuery (30 sec) → Store in cache → Return to user
User changes filter: Check cache → Maybe we have it cached!
- If cached: Return instantly (< 1 sec) ✅
- If not cached: Query BigQuery (30 sec) → Cache it → Return
Automatic Caching with DataManager
Our team uses cust-tools-dash-support, which includes a DataManager that automatically handles caching for all your datasets:
# In your globals.py or data/__init__.py
from ds_dash_support.utils import DataManager
# DataManager automatically handles caching with sensible defaults
# - Local development: In-memory cache (no setup needed)
# - Cloud Run (dev/staging/prod): Redis cache
# Default TTL: 24 hours (configurable if needed)
dataset_manager = DataManager(datasets=[...])
Caching Standalone Functions
For functions outside of DataManager, use the @cache.memoize() decorator:
from ds_app_cache import get_cache
cache = get_cache(name="my-dashboard", expire=86400)
@cache.memoize(expire=86400) # Cache for 24 hours
def calculate_impact_factor(customer_id: str, year: int):
"""Custom calculation that takes time."""
# ... complex calculation ...
return result
Cache Best Practices
✅ DO: - Use 24-hour cache by default (data refreshes daily) - Clear cache after data updates - Let DataManager handle caching automatically
❌ DON'T: - Cache longer than your data refresh cycle - Manually manage cache unless absolutely necessary - Forget to clear cache after updates
Jinja2 SQL Templating
What is Jinja2?
Jinja2 is a template engine that lets you write dynamic SQL:
{# Static SQL (not useful) #}
SELECT * FROM publications WHERE year = 2024
{# Dynamic SQL with Jinja2 (powerful!) #}
SELECT * FROM publications WHERE year = {{ year }}
Basic Jinja2 Syntax
Variables: {{ variable_name }}
SELECT * FROM `{{ project }}.{{ dataset }}.table`
-- With: project="my-proj", dataset="data"
-- Becomes: SELECT * FROM `my-proj.data.table`
Conditionals: {% if condition %}
WHERE 1=1
{% if customer %}
AND customer_id = '{{ customer }}'
{% endif %}
{% if min_year %}
AND year >= {{ min_year }}
{% endif %}
Loops: {% for item in list %}
SELECT * FROM publications
WHERE category IN (
{% for cat in categories %}
'{{ cat }}'
{%- if not loop.last %},{% endif %}
{% endfor %}
)
-- With categories=['Oncology', 'Cardiology']
-- Becomes: WHERE category IN ('Oncology', 'Cardiology')
Common Patterns
Pattern 1: Optional WHERE Clauses
SELECT *
FROM publications
WHERE 1=1
{% if customer_id %}
AND customer_id = '{{ customer_id }}'
{% endif %}
{% if min_year %}
AND year >= {{ min_year }}
{% endif %}
Pattern 2: Dynamic Column Selection
SELECT
id,
title,
{% if include_author %}
author,
{% endif %}
{% if include_stats %}
citation_count,
impact_factor,
{% endif %}
created_date
FROM publications
Filtering System
DashboardFilter
The DashboardFilter class holds all filter values and applies them to datasets:
# Create a filter for customer selection
customer_filter = DashboardFilter(
filter_id='customer',
display_name='Customer',
type='select',
get_options_callback=lambda: dataset_manager.get_customers(),
apply_to=['publications', 'authors']
)
Filter Types
Single Select (Dropdown)
filter_element = DsFilter(
id='category-filter',
label='Select Category',
options=[
{'label': 'Oncology', 'value': 'oncology'},
{'label': 'Cardiology', 'value': 'cardiology'}
],
value='oncology',
multi=False
)
Multi-Select
category_filter = DashboardFilter(
id='categories',
display_name='Select Categories',
type='multi_select',
options=[...],
default_value=['oncology']
)
Date Range
date_filter = DashboardFilter(
id='date_range',
display_name='Date Range',
type='date_range',
apply_to=['publications']
)
Optional Helper Modules
The project generated by ds-deploy includes optional helper modules in data/ for managing complex parameter validation and formatting. These are optional - simple dashboards may not need them.
When to Use Helper Modules
Simple Dashboard (No Helpers Needed) - 1-2 filters - Simple parameter → SQL mapping - No computed fields needed
Complex Dashboard (Use Helpers) - Multiple filters with interdependencies - Complex parameter transformations - Custom business logic - Data formatting specific to domain
Helper Module 1: param_parsers.py
Purpose: Convert raw input to structured format
# data/param_parsers.py
from datetime import datetime
from typing import Tuple
def parse_quarter(quarter_str: str) -> Tuple[str, str]:
"""
Parse quarter string like 'Q1 2024' into date range.
Args:
quarter_str: Format "Q1 2024", "Q2 2024", etc
Returns:
(start_date, end_date) as "YYYY-MM-DD"
"""
if not quarter_str:
return None, None
parts = quarter_str.split()
quarter = int(parts[0][1]) # Q1 → 1
year = int(parts[1])
start_month = (quarter - 1) * 3 + 1
end_month = start_month + 2
start_date = f"{year}-{start_month:02d}-01"
end_date = f"{year}-{end_month:02d}-28"
return start_date, end_date
Helper Module 2: param_validators.py
Purpose: Custom validation logic beyond Pydantic
# data/param_validators.py
from pydantic import field_validator, ValidationError
class DashboardParamsModel(DashboardParamsBaseModel):
min_year: int = 2010
max_year: int = 2024
@field_validator('max_year')
@classmethod
def max_year_after_min(cls, v, info):
"""Ensure max_year >= min_year"""
if 'min_year' in info.data and v < info.data['min_year']:
raise ValueError('max_year must be >= min_year')
return v
Helper Module 3: query_formatters.py
Purpose: Format parameters for SQL
# data/query_formatters.py
from typing import List
def format_ids_for_sql(ids: List[str]) -> str:
"""
Format list of IDs for BigQuery IN clause.
Args:
ids: ["id1", "id2", "id3"]
Returns:
SQL-safe string: "'id1','id2','id3'"
"""
if not ids:
return "''"
# Escape quotes
escaped = [id.replace("'", "''") for id in ids]
return "'" + "','".join(escaped) + "'"