ESD Developer Guide
This documentation covers the essential (and some good-to-have) setup instructions for developing tools for the ESD team and its projects.
Keywords such as "must" and "should" are to be interpreted as described in RFC 2119.
Python and Environments
We mainly use Python 3.11+, and the tooling is continuously updated to support the newest releases. Do use pyenv as your environment management tool; we do not support Anaconda. Install pyenv on your machine and create your project's virtual environment with:
pyenv install <python_version>
# to set 3.11 globally: pyenv global 3.11
pyenv virtualenv <python_version> <environment_name>
pyenv activate <environment_name>
pyenv deactivate
Tooling and Development
Most of the team works in the Visual Studio Code editor. If that's your choice as well, consider adding the following extensions:
- autoDocstring for easier documentation writing.
- black as a code formatter and isort for sorting imports.
Typing
Do use type hints and the following styling:
- Use normal rules for colons, that is, no space before and one space after a colon: text: str.
- Use spaces around the = sign when combining an argument annotation with a default value: align: bool = True.
- Use spaces around the -> arrow: def headline(...) -> str.
When it comes to annotations, e.g. tuple[str, str], feel free to refer to the official typing documentation and this guide for all the variations.
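A minimal sketch illustrating the three spacing rules above (the function itself is invented for illustration):

```python
def headline(text: str, align: bool = True) -> str:
    """Return the text as a headline, optionally centered with dashes."""
    # Note the colon spacing (text: str), the spaces around = for the
    # annotated default (align: bool = True), and the spaced -> arrow.
    return f" {text.title()} ".center(40, "-") if align else text.title()
```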
UV (Primary Dependency Manager)
We use uv as our primary dependency management tool. UV is an extremely fast Python package installer and resolver, written in Rust. Install it following the official installation guide.
# On macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# On Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
# Or with pip
pip install uv
Setting up access to the team Python registry
The custom solutions team manages a private Python registry of packages. See the Team Registry documentation for complete setup instructions.
Quick setup:
# Install keyring for authentication
uv tool install keyring --with keyrings.google-artifactregistry-auth
# Create uv config file
mkdir -p ~/.config/uv
cat > ~/.config/uv/uv.toml << 'EOF'
keyring-provider = "subprocess"
[[index]]
name = "cust"
url = "https://oauth2accesstoken@us-central1-python.pkg.dev/ds-esd-shared/python/simple"
EOF
Developing Python projects with UV
For existing projects with a pyproject.toml file:
# Sync dependencies (creates virtual environment automatically)
uv sync
# Add a new dependency
uv add <package>
# Add a dev dependency
uv add --dev <package>
# Run a command in the virtual environment
uv run python script.py
uv run pytest
For new projects:
# Initialize a new project
uv init my-project
cd my-project
# Add dependencies
uv add requests pandas
UV automatically creates and manages virtual environments in .venv within your project directory.
For more information, refer to the official uv documentation.
Poetry (Legacy)
Some existing projects still use Poetry for dependency management; keep using it for those. New projects should use UV instead of Poetry.
Docker
The team uses Docker for building application images for deployment. Install Docker following the directions for your operating system on the Docker website.
Many of the images we use are stored in private Google Cloud Platform Artifact Registry repositories. To configure Docker to authenticate with our private registry, use the gcloud CLI, which is available from Google and is also used for Google BigQuery authentication.
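Assuming the registry lives under Artifact Registry's us-central1 host (adjust the host to match where our images are actually stored), the authentication step typically looks like:

```shell
gcloud auth configure-docker us-central1-docker.pkg.dev
```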
GitHub Access
While calling ds-deploy commands or using any other tools, you need your personal GITHUB_TOKEN set. Navigate to GitHub Settings -> "Personal access tokens" -> "Generate new token" and select the appropriate scopes. Make sure to export this variable in your environment (.env) or in your shell profile, e.g. .zshrc.
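A hypothetical example of the export line (the token value is a placeholder; use your own token):

```shell
export GITHUB_TOKEN="ghp_exampletoken1234567890"
```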
Testing
We use the Python unittest standard library for testing.
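A minimal sketch of a test module (slugify is an invented example function); place modules like this under a tests/ directory so the default discovery pattern finds them:

```python
import unittest


def slugify(text: str) -> str:
    """Convert a title into a lowercase, hyphen-separated slug."""
    return "-".join(text.lower().split())


class TestSlugify(unittest.TestCase):
    def test_basic(self) -> None:
        self.assertEqual(slugify("ESD Developer Guide"), "esd-developer-guide")

    def test_empty(self) -> None:
        self.assertEqual(slugify(""), "")
```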
To run tests with UV, use uv run python -m unittest discover.
See more in the unittest documentation.
Linting
If you choose to use a linter for your project, we recommend Ruff, which supports pyproject.toml configuration and integrates with Visual Studio Code.
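If you adopt Ruff, a minimal pyproject.toml configuration might look like this (the rule selection below is an assumption for illustration, not a team standard):

```toml
[tool.ruff]
line-length = 88
target-version = "py311"

[tool.ruff.lint]
# E = pycodestyle errors, F = pyflakes, I = isort-style import sorting
select = ["E", "F", "I"]
```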
Documentation
We love documentation and try our best to include it as much as we can.
Docstrings
Your code must include docstrings, be it for a class or a function, formatted in the Google style. The autoDocstring VSCode extension mentioned above scaffolds these for you from the type annotations you provide.
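A short sketch of a Google-style docstring (the function and its parameters are invented for illustration):

```python
def fetch_rows(table: str, limit: int = 100) -> list[dict]:
    """Fetch rows from a table.

    Args:
        table: Fully qualified table name.
        limit: Maximum number of rows to return.

    Returns:
        A list of rows, each represented as a dictionary.
    """
    # Illustrative stub; a real implementation would query the database.
    return [{"table": table, "row": i} for i in range(min(limit, 2))]
```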
MkDocs
We use MkDocs with the Material theme for writing our documentation in Markdown format (including the one you are reading right now). It is good practice to run the MkDocs site locally while making changes:
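The standard command for a live-reloading local preview:

```shell
mkdocs serve  # serves the docs at http://127.0.0.1:8000 by default
```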
Whenever the documentation is ready, build it with:
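The standard build command, which writes the static site to the site/ directory:

```shell
mkdocs build
```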
We use Google Cloud Storage for storing and publishing our documentation. Request access from the ESD Infrastructure team. Documentation is typically published automatically via GitHub Actions.
If your project/package documentation needs to be updated, consider adding the build-and-publish sequence to the GitHub Actions workflow.
GitHub, Versioning and Commits
We often work together and use GitHub as our primary tool for the full development lifecycle.
Development convention
As a standard rule we follow: feature branch -> develop branch -> main.
Make sure to check out your feature branch from develop if one exists. When your work is done, open a pull request using the GitHub UI and provide all the details needed for an easy review; if a review is needed, mention reviewers too. After the feature branch is merged, the updated develop branch can be merged into main.
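The flow above can be sketched with git commands (the branch name is a placeholder):

```shell
git checkout develop && git pull        # start from the latest develop
git checkout -b feature/my-change      # feature branch; name is a placeholder
# ...commit your work...
git push -u origin feature/my-change   # then open a pull request on GitHub
```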
Commit Messages
Do use the conventional-commit prefixes at the start of your commit messages. We mostly use these prefixes:
ci: changes to our CI configuration files and scripts.
docs: documentation only changes.
feat: a new feature.
chore: routine tasks and maintenance.
fix: a bug fix.
refactor: code refactoring.
style: formatting (changes that do not affect the meaning of the code).
test: adding or modifying tests.
A prefix can also carry a scope, for example: fix(api): resolve issue with incorrect response format.
Prefixing commit messages also allows the CI/CD pipeline to automatically publish new versions when commits are merged to main. Read more below.
Versioning
We use Commitizen as a release management tool. By default, Commitizen uses the conventional commits mentioned above and semantic versioning (major.minor.patch).
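A sketch of a Commitizen setup in pyproject.toml (the version and tag format are placeholders; check the ds-analyst-tools configuration files for the team's actual settings):

```toml
[tool.commitizen]
name = "cz_conventional_commits"
version = "0.1.0"                       # placeholder starting version
tag_format = "v$version"
version_files = ["pyproject.toml:version"]
```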
PATCH
You should increment the PATCH version when you make backward compatible bug fixes.
fix: a commit of the type fix patches a bug in your codebase (this correlates with PATCH in Semantic Versioning)
The PATCH number is always reset to 0 after a minor version bump. If you fix a bug but in doing so have to rewrite other parts of the code or add additional features, this instead might warrant a bump to the MINOR version number.
MINOR
You should increment the MINOR version when you add functionality in a way that is backward compatible.
feat: a commit of the type feat introduces a new functionality to the codebase (this correlates with MINOR in Semantic Versioning)
The MINOR number is reset to 0 with a major version bump.
MAJOR
You must increment the MAJOR version if you introduce any breaking changes that are not backward compatible. Major version zero (0.y.z) is for initial development. Anything MAY change at any time. The public API SHOULD NOT be considered stable. Version 1.0.0 defines the public API.
The best way to start working with Commitizen is by checking their official documentation and the ds-analyst-tools CI/CD pipeline (.github/workflows/) and configuration files.
Continuous Integration / Continuous Deployment
We love CI/CD. The best way to start with it is to copy the existing workflow files into your project or package. If your project is part of the digital-science organization the GitHub Actions runners have been already configured for you. Refer to the official documentation for any additional information.
When developing on ds-analyst-tools, check whether any of your code, including the tests, uses environment variables. Make sure to add them to the variables list by navigating to the repository Settings -> Secrets and variables -> Actions, and to reference them in the workflow files themselves:
...
test:
  runs-on: ubuntu-latest
  env:
    NEW_VARIABLE: ${{ secrets.NEW_VARIABLE }}
  steps:
    - name: Run tests
      run: uv run python -m unittest
...
In code it can look like:
import os

# API credentials are saved as a string of login:password
login, password = os.environ["NEW_VARIABLE"].split(":")
Other tools
Docker
First get Docker.
Occasionally, you might need to build Docker images and containers. Feel free to refer to the documentation for the general setup.
If you have a macOS machine, make sure to build with --platform=linux/amd64 for our Cloud Run services.
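A sketch of such a build (the image name and tag are placeholders):

```shell
docker build --platform=linux/amd64 -t my-service:latest .
```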
Add-ons
Install pre-commit and set up the project's pre-commit configuration with pre-commit install. Pre-commit will run isort and black by default.
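A typical setup, assuming the repository ships a .pre-commit-config.yaml:

```shell
uv tool install pre-commit   # or: pip install pre-commit
pre-commit install           # installs the git hooks for this repository
```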