
ESD Developer Guide

This documentation covers the essential, and some good-to-have, setup instructions for developing tools for the ESD team and its projects.

The keywords "must" and "should" are to be interpreted as described in RFC 2119.

Python and Environments

We mainly use Python 3.11+, and the tooling is continuously updated to take advantage of the newest releases. Choose pyenv as your primary environment management tool; we do not support Anaconda. Install pyenv on your machine and create your project's virtual environment with:

pyenv install <python_version>
# to set 3.11 globally: pyenv global 3.11
pyenv virtualenv <python_version> <environment_name>
pyenv activate <environment_name>
pyenv deactivate
Feel free to refer to this guide for the extended setup, local vs. global interpreters, and managing multiple environments.
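Since the guide assumes Python 3.11+, scripts that should not run on older interpreters can guard for it at startup. A minimal sketch (the helper name is our own, not a team convention):

```python
import sys


def supports_baseline(version_info=sys.version_info) -> bool:
    """Return True when the interpreter meets the Python 3.11 baseline."""
    return version_info >= (3, 11)


# Typical use at the top of a CLI entry point:
# if not supports_baseline():
#     raise RuntimeError("Python 3.11+ is required for ESD tooling")
```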

Tooling and Development

Most of the team works in the Visual Studio Code editor. If that's your choice as well, consider adding the following extensions:

- autoDocstring for easier documentation writing.
- black as a code formatter and isort for sorting imports.

Typing

Do use type hints and the following styling:

- Use normal rules for colons, that is, no space before and one space after a colon: text: str.
- Use spaces around the = sign when combining an argument annotation with a default value: align: bool = True.
- Use spaces around the -> arrow: def headline(...) -> str.

When it comes to annotations, e.g. Tuple[str, str], feel free to refer to the official documentation and this guide for all the variations.
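A short illustration of those spacing rules in one place (the function is just an example):

```python
def headline(text: str, align: bool = True) -> str:
    """Return ``text`` as a title-cased headline, centered when ``align`` is true."""
    if align:
        return f" {text.title()} ".center(30, "=")
    return f"{text.title()}: {'-' * 20}"
```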

UV (Primary Dependency Manager)

We use uv as our primary dependency management tool. UV is an extremely fast Python package installer and resolver, written in Rust. Install it following the official installation guide.

# On macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# On Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or with pip
pip install uv

Setting up access to the team Python registry

The custom solutions team manages a private Python registry of packages. See the Team Registry documentation for complete setup instructions.

Quick setup:

# Install keyring for authentication
uv tool install keyring --with keyrings.google-artifactregistry-auth

# Create uv config file
mkdir -p ~/.config/uv
cat > ~/.config/uv/uv.toml << 'EOF'
keyring-provider = "subprocess"

[[index]]
name = "cust"
url = "https://oauth2accesstoken@us-central1-python.pkg.dev/ds-esd-shared/python/simple"
EOF

Developing Python projects with UV

For existing projects with a pyproject.toml file:

# Sync dependencies (creates virtual environment automatically)
uv sync

# Add a new dependency
uv add <package>

# Add a dev dependency
uv add --dev <package>

# Run a command in the virtual environment
uv run python script.py
uv run pytest

For new projects:

# Initialize a new project
uv init my-project
cd my-project

# Add dependencies
uv add requests pandas

UV automatically creates and manages virtual environments in .venv within your project directory.
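For reference, a freshly initialized project's pyproject.toml might look roughly like this (the package name, versions, and group layout are illustrative, not a team standard; newer uv versions put dev dependencies under [dependency-groups]):

```toml
[project]
name = "my-project"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
    "requests>=2.31",
]

[dependency-groups]
dev = [
    "pytest>=8.0",
]
```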

For more information, refer to the official uv documentation.

Poetry (Legacy)

Some existing projects still use poetry for dependency management. For these projects:

# Install dependencies
poetry install

# Add a dependency
poetry add <package>

New projects should use UV instead of Poetry.

Docker

The team uses Docker for building application images for deployment. Install Docker following the directions for your operating system on the Docker website.

Many of the images we utilize are stored in private Google Cloud Platform Artifact Registry repositories. To configure Docker to authenticate with our private registry, run the following. The gcloud command is available from Google and is also used for Google BigQuery authentication.

gcloud auth configure-docker us-central1-docker.pkg.dev

GitHub Access

When calling ds-deploy commands or using any other tools, you need to have your personal GITHUB_TOKEN set. Navigate to GitHub Settings -> "Personal access tokens" -> "Generate new token" and select the appropriate scopes. Make sure to export this variable in your environment .env file or in your shell profile, e.g. .zshrc. It should look like:

export GITHUB_TOKEN="ghp_xxxxxxxxxxxxxxxxxxxx"
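In Python, tools can then read the token from the environment. Failing fast with a clear message is friendlier than a later HTTP 401; a sketch (the helper name is made up for illustration):

```python
import os


def github_token() -> str:
    """Read GITHUB_TOKEN from the environment, failing fast when it is missing."""
    token = os.environ.get("GITHUB_TOKEN", "")
    if not token:
        raise RuntimeError(
            "GITHUB_TOKEN is not set; export it in your shell profile (e.g. ~/.zshrc)"
        )
    return token
```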

Testing

We use the unittest module from the Python standard library.

To run tests with UV:

uv run python -m unittest discover
uv run python -m unittest -k Test{package_name}

See more in the unittest documentation.
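A minimal test module in the expected shape (the function and class names are illustrative); discovery picks it up as long as the filename matches test*.py:

```python
import unittest


def slugify(title: str) -> str:
    """Lowercase a title and join its words with hyphens."""
    return "-".join(title.lower().split())


class TestSlugify(unittest.TestCase):
    def test_joins_words_with_hyphens(self):
        self.assertEqual(slugify("ESD Developer Guide"), "esd-developer-guide")

    def test_handles_extra_whitespace(self):
        self.assertEqual(slugify("  hello   world "), "hello-world")
```

Run it with `uv run python -m unittest` rather than executing the file directly.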

Linting

In case you choose to use a linter for your project, we recommend Ruff, which supports pyproject.toml configuration and is integrated with Visual Studio Code.

Documentation

We love documentation and try our best to include it as much as we can.

Docstrings

Your code must include docstrings, be it for a class or a function, formatted in the Google style. The autoDocstring VSCode extension mentioned above generates these stubs for you from the type annotations you provide.
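For reference, a function documented in the Google style looks like this (the function itself is just an example):

```python
def normalize_score(value: float, maximum: float = 100.0) -> float:
    """Scale a raw score into the 0.0-1.0 range.

    Args:
        value: The raw score to normalize.
        maximum: The highest possible raw score.

    Returns:
        The score divided by ``maximum``, clamped to at most 1.0.

    Raises:
        ValueError: If ``maximum`` is not positive.
    """
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    return min(value / maximum, 1.0)
```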

MkDocs

We use MkDocs with the Material theme for writing our documentation in Markdown format (including the one you are reading right now). It is good practice to run the MkDocs site locally while making changes:

# Serve documentation locally with live reload
uv run mkdocs serve

Whenever the documentation is ready, build it with:

uv run mkdocs build
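MkDocs is configured via an mkdocs.yml file at the repository root. A minimal sketch for the Material theme (the site name and nav entries are placeholders, not our actual configuration):

```yaml
site_name: My Project Docs
theme:
  name: material
nav:
  - Home: index.md
  - Developer Guide: developer-guide.md
```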

We use Google Cloud Storage for storing and publishing our documentation. Request access from the ESD Infrastructure team. Documentation is typically published automatically via GitHub Actions.

If your project/package documentation needs to be updated, consider adding the build-and-publish sequence to the GitHub Actions workflow.

GitHub, Versioning and Commits

We often work together and use GitHub as our primary tool for the full development lifecycle.

Development convention

As a standard rule we follow: feature branch -> develop branch -> main. Make sure to check out your feature branch from develop if one exists. When you are done with your work, open a Pull Request using the GitHub UI. Provide all the necessary details to ease the review process, and mention reviewers if a review is needed. After the feature branch is merged, the updated develop branch can be merged into main.

Commit Messages

Do use the conventional commits prefixes on top of the commit messages. We mostly use these prefixes:

ci: changes to our CI configuration files and scripts.
docs: documentation only changes.
feat: a new feature.
chore: routine tasks and maintenance.
fix: a bug fix.
refactor: code refactoring.
style: formatting (changes that do not affect the meaning of the code).
test: adding or modifying tests.
Each commit message must be prefixed with a type (e.g., feat, fix), optionally followed by a scope (recommended for better organization), and then a description. An example: fix(api): resolve issue with incorrect response format.
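A rough sketch of a validator for this format (the regex covers only the prefixes listed above and is a simplification, not the exact rule set Commitizen applies):

```python
import re

# type, an optional (scope), then ": description" -- only the prefixes this guide lists.
COMMIT_RE = re.compile(
    r"^(ci|docs|feat|chore|fix|refactor|style|test)(\([a-z0-9_-]+\))?: .+"
)


def is_conventional(message: str) -> bool:
    """Return True when the first line of ``message`` follows the convention."""
    first_line = message.splitlines()[0] if message.strip() else ""
    return COMMIT_RE.match(first_line) is not None
```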

Prefixing commit messages also allows the CI/CD pipeline to automatically publish new versions when commits are merged to main. Read more below.

Versioning

We use Commitizen as a release management tool. By default, commitizen uses conventional commits mentioned above and the semantic versioning major.minor.patch.

PATCH

You should increment the PATCH version when you make backward-compatible bug fixes. A commit of type fix patches a bug in your codebase (this correlates with PATCH in Semantic Versioning).

The PATCH number is always reset to 0 after a minor version bump. If you fix a bug but in doing so have to rewrite other parts of the code or add additional features, this instead might warrant a bump to the MINOR version number.

MINOR

You should increment the MINOR version when you add functionality in a backward-compatible manner. A commit of type feat introduces new functionality to the codebase (this correlates with MINOR in Semantic Versioning).

The MINOR number is reset to 0 with a major version bump.

MAJOR

You must increment the MAJOR version if you introduce any breaking changes that are not backward compatible. Major version zero (0.y.z) is for initial development. Anything MAY change at any time. The public API SHOULD NOT be considered stable. Version 1.0.0 defines the public API.
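The bump rules above can be sketched as a small function (a simplification of what Commitizen actually does; it ignores pre-release tags and breaking-change footers, for example):

```python
def bump(version: str, change: str) -> str:
    """Apply a semantic-version bump; change is 'major', 'minor' or 'patch'."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        return f"{major + 1}.0.0"      # MINOR and PATCH reset to 0
    if change == "minor":
        return f"{major}.{minor + 1}.0"  # PATCH resets to 0
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")
```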

The best way to start working with Commitizen is by checking their official documentation and the ds-analyst-tools CI/CD pipeline (.github/workflows/) and configuration files.

Continuous Integration / Continuous Deployment

We love CI/CD. The best way to start with it is to copy the existing workflow files into your project or package. If your project is part of the digital-science organization, the GitHub Actions runners have already been configured for you. Refer to the official documentation for any additional information.

When developing on ds-analyst-tools, check whether any of your code, including the tests, uses environment variables. Make sure to add any such variables to the variables list by navigating to the repository Settings -> Secrets and variables -> Actions, and to the workflow files themselves:

...
test:
    runs-on: ubuntu-latest
    # env entries are visible to every step in the job; an `export` inside a
    # single `run` step would not persist to later steps.
    env:
        NEW_VARIABLE: ${{ secrets.NEW_VARIABLE }}
    steps:
        ...
...

In code it can look like:

import os
# API credentials are saved as a single string of the form login:password
login, password = os.environ["NEW_VARIABLE"].split(":")

Other tools

Docker

Docker installation and registry authentication are covered in the Docker section above. Occasionally, you might need to build Docker images and containers locally; feel free to refer to the Docker documentation for the general setup.

If you have a macOS machine, make sure to build with --platform=linux/amd64 for our Cloud Run services.

Add-ons

Install pre-commit and set up the project's hooks with pre-commit install. Pre-commit will run isort and black by default.
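Pre-commit reads its hooks from a .pre-commit-config.yaml file in the repository root. A minimal configuration running the two default hooks might look like this (the pinned revisions are examples; pin whatever is current):

```yaml
repos:
  - repo: https://github.com/psf/black
    rev: 24.4.2
    hooks:
      - id: black
  - repo: https://github.com/PyCQA/isort
    rev: 5.13.2
    hooks:
      - id: isort
```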