buttermilk: opinionated data tools for HASS scholars

AI and data tools for HASS researchers, putting culture first.

Developed by and for @QUT-DMRC scholars, this repo aims to provide standard flows that help scholars make their own pipelines to collect data and use machine learning, generative AI, and computational techniques as part of rich analysis and experimentation that is driven by theory and deep understanding of cultural context. We try to:

Provide a set of research-backed analysis tools that help scholars bring cultural expertise to computational methods.
Help HASS scholars with easy, well-documented, and proven tools for data collection and analysis.
Make ScholarOps™ easier with opinionated defaults that take care of logging and archiving in standard formats.
Create a space for collaboration, experimentation, and evaluation of computational methods for HASS scholars.

Q: Why 'buttermilk'??
A: It's cultured and flows...

Q: What's MLOps?
A: A general term for standardised approaches to machine learning workflows that help you organize your project, collaborate, iteratively improve your analysis, track versioned changes, monitor ongoing performance, reproduce experiments, and verify and compare results.

The "pipeline" we are building is documented and versioned. We're aiming to make it easy for HASS scholars to use AI tools in a way that is understandable, traceable, and reproducible.

📚 Documentation

→ Complete Documentation

Quick Start

Installation Guide - Set up your environment
Quick Start - Get running in 5 minutes
Your First Flow - Build a custom flow

User Guide

Running Flows - Complete flow execution guide
Configuration - Hydra configuration management
API Reference - REST API documentation
CLI Reference - Command-line interface

Core Concepts

Buttermilk is built around a few core concepts that help structure your research and data processing:

Flows: Complete research or data processing pipelines
Jobs: Basic units of work for processing individual records
Records: Immutable data structures with rich metadata
Agents: Specialized components for specific tasks (AI models, data collection)
Orchestrators: Coordinate and manage flow execution
Configuration (Hydra): Flexible, hierarchical configuration management

For detailed explanations, see Core Concepts.
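
To make the vocabulary above concrete, here is a minimal sketch of how records, jobs, agents, and an orchestrator could relate to each other. Every class and function name below is hypothetical and chosen for illustration; this is not Buttermilk's actual API, so see the Core Concepts documentation for the real interfaces.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Callable, Iterable, Mapping


@dataclass(frozen=True)
class Record:
    """An immutable piece of data plus the metadata that describes it."""

    record_id: str
    content: str
    metadata: Mapping[str, str] = field(default_factory=lambda: MappingProxyType({}))


@dataclass(frozen=True)
class Job:
    """One unit of work: a single record to be handled by a named agent."""

    record: Record
    agent_name: str


# In this sketch an "agent" is simply a callable that turns a record into a result.
Agent = Callable[[Record], str]


def word_count_agent(record: Record) -> str:
    """A stand-in agent; a real one might call a model or a data-collection API."""
    return f"{record.record_id}: {len(record.content.split())} words"


def run_flow(records: Iterable[Record], agents: Mapping[str, Agent]) -> list[str]:
    """A toy orchestrator: build one job per (record, agent) pair and run each job."""
    jobs = [Job(record=r, agent_name=name) for r in records for name in agents]
    return [agents[job.agent_name](job.record) for job in jobs]


if __name__ == "__main__":
    example = Record("r1", "culture comes first", MappingProxyType({"source": "example"}))
    print(run_flow([example], {"word_count": word_count_agent}))
```

Freezing the dataclasses means a record cannot be altered once it enters a flow, which is one simple way to keep results traceable back to their inputs.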

Usage

Buttermilk provides several components and features to facilitate HASS research:

Currently available:

Multimodal support for current-generation foundation models (Gemini, Claude, Llama, GPT) and plug-in support for other analysis tool APIs.
A prompt templating system for evaluating, improving, and reusing prompt components.
Standard cloud logging, flexible data storage options, secure credential management (e.g., Azure Key Vault, Google Cloud Secret Manager), built-in database storage (e.g., BigQuery), and tracing capabilities (e.g., Promptflow, LangChain).
An API and CLI for integrating components and orchestrating complex workflows.
Support for running code locally, on remote GPUs, or in cloud compute environments (Azure/Google Compute, with AWS Lambda planned).
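
As an illustration of what reusing prompt components can look like, the sketch below assembles a prompt from named, interchangeable parts using only Python's standard library. The component names, placeholder names, and the build_prompt helper are assumptions made for this example; Buttermilk's own templating system may work quite differently.

```python
from string import Template

# Hypothetical reusable prompt components, each with its own placeholders.
COMPONENTS = {
    "role": "You are a careful research assistant for $discipline scholars.",
    "task": "Assess the following text against this criterion: $criterion",
    "format": "Answer in at most $max_words words and quote your evidence.",
}


def build_prompt(parts: list[str], **values: str) -> str:
    """Join the named components in order and fill in their placeholders."""
    text = "\n".join(COMPONENTS[name] for name in parts)
    # substitute() raises KeyError if any placeholder is left without a value.
    return Template(text).substitute(**values)


if __name__ == "__main__":
    print(build_prompt(["role", "format"], discipline="HASS", max_words="50"))
```

Keeping components separate makes it possible to swap one part (for example, the task wording) and compare results while everything else stays fixed.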

Future Development:

Tutorial workbooks demonstrating complete research pipeline examples.
A distributed queue system (e.g., pub/sub) for managing batch runs.
A web interface and example notebooks for assessing, tracking, and comparing performance.

Contributing and Current Status

Buttermilk is actively under development. We welcome contributions and feedback! If you're interested in getting involved, please contact nic to discuss ideas, planning, or how to contribute.

For Contributors

Contributing Guide - Development process and standards
Architecture Guide - System architecture and design
Creating Agents - Build custom agents
Testing Guide - Testing best practices

Contributing to Documentation

We warmly welcome contributions to improve Buttermilk's documentation! Clear, concise, and up-to-date documentation is crucial for helping HASS scholars and developers effectively use and contribute to the project.

Documentation Style

Docstrings (Python Code): Please follow the Google Python Style Guide for all docstrings within the Python code. This includes clear descriptions of modules, classes, functions, methods, arguments, and return values.
Markdown Files (e.g., README.md, docs/*.md): Aim for clarity, conciseness, and accuracy. Use standard Markdown formatting. Ensure that examples are easy to follow and reproduce.
General Principles:
- Write for the target audience (HASS scholars, developers).
- Be explicit and avoid jargon where possible, or explain it clearly.
- Keep documentation consistent with the current state of the codebase.
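
For reference, a Google-style docstring looks roughly like the following. The function itself is a made-up example rather than code from this repository; what matters is the layout of the summary line and the Args, Returns, and Raises sections.

```python
from typing import Optional


def normalise_whitespace(text: str, max_length: Optional[int] = None) -> str:
    """Collapse runs of whitespace and optionally truncate the result.

    Args:
        text: The raw text to clean.
        max_length: Maximum number of characters to keep. If None, the
            full cleaned text is returned.

    Returns:
        The cleaned text, with leading and trailing whitespace removed.

    Raises:
        ValueError: If max_length is negative.
    """
    if max_length is not None and max_length < 0:
        raise ValueError("max_length must be non-negative")
    cleaned = " ".join(text.split())
    return cleaned if max_length is None else cleaned[:max_length]
```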

Keeping Documentation Up-to-Date

As features are added or modified, please ensure that corresponding documentation is also updated. This includes:

Updating module, class, and function docstrings.
Revising relevant sections in README.md or other documentation files in the docs/ directory.
Ensuring examples and command-line usage instructions are still accurate.

Process for Documentation Changes

Identify Areas for Improvement: This could be missing information, unclear explanations, outdated instructions, or typos.
Make Your Changes: Edit the relevant files. For new concepts or substantial additions, consider discussing them in an issue first.
Submit Changes: Documentation changes should be submitted via Pull Requests (PRs) to the main repository. Please clearly describe the documentation changes made in your PR description.

We appreciate your help in making Buttermilk more accessible and understandable!

Installation

Create a new environment and install using uv:

pip install uv
uv sync

Authenticate to the cloud provider where your secrets are stored.

GOOGLE_CLOUD_PROJECT=<project>
gcloud auth login --update-adc --enable-gdrive-access --project ${GOOGLE_CLOUD_PROJECT} --billing-project ${GOOGLE_CLOUD_PROJECT}
gcloud auth application-default set-quota-project ${GOOGLE_CLOUD_PROJECT}
gcloud config set project ${GOOGLE_CLOUD_PROJECT}

Configurations are stored as YAML files in conf/. You can select options at runtime using Hydra.
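
Hydra lets you override configuration values on the command line with dotted key=value arguments. The sketch below is not Hydra's implementation and does not use Hydra at all; it only illustrates, with the standard library, how an override such as flow.model=model_b changes one entry in a hierarchical configuration. The configuration keys shown are hypothetical and do not come from this repository's conf/ directory.

```python
from copy import deepcopy
from typing import Any


def apply_overrides(config: dict[str, Any], overrides: list[str]) -> dict[str, Any]:
    """Return a copy of config with dotted key=value overrides applied."""
    result = deepcopy(config)
    for item in overrides:
        dotted_key, _, value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            # Walk down the hierarchy, creating intermediate levels if needed.
            node = node.setdefault(part, {})
        node[leaf] = value  # values stay plain strings in this sketch
    return result


if __name__ == "__main__":
    # Hypothetical base configuration, as it might be loaded from YAML.
    base = {"flow": {"name": "example", "model": "model_a"}, "logging": {"level": "INFO"}}
    print(apply_overrides(base, ["flow.model=model_b", "logging.level=DEBUG"]))
```

The base configuration is left untouched, so the defaults and the overridden run can both be logged for later comparison.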

Deployment

Production Deployment

For production deployments, Buttermilk supports automatic credential loading to avoid interactive login flows:

Weave/W&B: Set the WANDB_API_KEY, WANDB_PROJECT, and WANDB_ENTITY environment variables
Cloud Services: Use service account keys or workload identity
Secrets: Store credentials in Google Cloud Secret Manager or Azure Key Vault

See External Configuration Guide and Weave Credentials Guide for complete deployment documentation.
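
When credentials come from environment variables, it can help to check for them at startup so a misconfigured deployment fails immediately rather than partway through a run. The following is a minimal sketch of such a check; the helper names are hypothetical, and Buttermilk's own credential loading may already handle this differently.

```python
import os
from typing import Mapping

# Variable names taken from the list above.
REQUIRED_VARS = ("WANDB_API_KEY", "WANDB_PROJECT", "WANDB_ENTITY")


def missing_credentials(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def require_credentials(env: Mapping[str, str] = os.environ) -> None:
    """Raise a clear error if any required variable is missing."""
    missing = missing_credentials(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling require_credentials() at the top of an entry point gives one readable error message that names every missing variable.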

Container Example

docker run -e WANDB_API_KEY=$WANDB_API_KEY \
           -e WANDB_PROJECT=my-research \
           buttermilk:latest

Instructions for bots

Read CLAUDE.md
