
Commit 1cb77f0 (parent 05670b7): Updated README with sample code

1 file changed: README.md (+81 additions, −55 deletions)
<!-- <p align="center">
  <img src='logo.png' width='200'>
</p> -->

# SPARE: Single-Pass Annotation with Reference-Guided Evaluation for Automatic Process Supervision and Reward Modelling
[![Arxiv](https://img.shields.io/badge/Arxiv-YYMM.NNNNN-red?style=flat-square&logo=arxiv&logoColor=white)](https://put-here-your-paper.com)
[![Hugging Face Model](https://img.shields.io/badge/HuggingFace-Model-yellow)](https://huggingface.co/UKPLab/Llama-3-8b-spare-prm-math)
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://www.apache.org/licenses/LICENSE-2.0)
[![Python Versions](https://img.shields.io/badge/Python-3.10-blue.svg?style=flat&logo=python&logoColor=white)](https://www.python.org/)
[![CI](https://github.com/UKPLab/arxiv2025-repa-prm/actions/workflows/main.yml/badge.svg)](https://github.com/UKPLab/arxiv2025-repa-prm/actions/workflows/main.yml)

## Description

This repository includes the training, inference and evaluation code used in our arXiv 2025 paper - [SPARE: Single-Pass Annotation with Reference-Guided Evaluation for Automatic Process Supervision and Reward Modelling]().

<!-- This is the official template for new Python projects at UKP Lab. It was adapted for the needs of UKP Lab from the excellent [python-project-template](https://github.com/rochacbruno/python-project-template/) by [rochacbruno](https://github.com/rochacbruno).

It should help you start your project and give you continuous status updates on the development through [GitHub Actions](https://docs.github.com/en/actions). -->

> **Abstract:** Process or step-wise supervision has played a crucial role in advancing complex multi-step reasoning capabilities of Large Language Models (LLMs). However, efficient, high-quality automated process annotation remains a significant challenge. To address this, we introduce **S**ingle-**P**ass **A**nnotation with **R**eference-Guided **E**valuation (**SPARE**), a novel structured framework that enables single-pass, per-step annotation by aligning each solution step to one or multiple steps in a reference solution, accompanied by explicit reasoning for evaluation. We show that reference-guided step-level evaluation effectively facilitates process supervision on four datasets spanning three domains: mathematical reasoning, multi-hop compositional question answering, and spatial reasoning. We demonstrate that *SPARE*, when compared to baselines, improves reasoning performance when used for: (1) fine-tuning models in an offline RL setup for inference-time greedy-decoding, and (2) training reward models for ranking/aggregating multiple LLM-generated outputs. Additionally, *SPARE* achieves competitive performance on challenging mathematical datasets while offering 2.6 times greater efficiency, requiring only 38% of the runtime, compared to tree search-based automatic annotation.

## Installation

Create a `conda` / `mamba` / `venv` virtual environment and install the dependencies from `requirements.txt`, e.g.:

```bash
mamba create -n spare python=3.10  # specify python (version per the badge above) so that pip is available in the new env
mamba activate spare
pip install -r requirements.txt
```

## Running the experiments

The parameters of the experiments are specified in their respective `config` files:

```bash
config/
├── eval-config.yaml
├── infer-config.yaml
├── infer-rm-config.yaml
├── private-config.yaml
├── train-po-config.yaml
├── train-sft-config.yaml
└── train-tc-rm-config.yaml
```

Private API keys, e.g. for using OpenAI models or for logging through the Neptune API, can be provided in the `private-config.yaml` file.
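
For orientation, such a file might look like the sketch below. The key names here are purely illustrative assumptions, not the actual schema — check the shipped `private-config.yaml` for the fields this repository expects:

```yaml
# Illustrative sketch only; the real key names are defined by the repo's private-config.yaml
openai_api_key: "sk-..."            # used when calling OpenAI models
neptune_api_token: "eyJhcGlf..."    # used for Neptune experiment logging
neptune_project: "my-workspace/spare"
```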

To run a desired task, e.g. training the token-classification based reward model (`tc-rm`), execute:

```bash
python train_rm.py  # uses the default location of the train-tc-rm-config
# OR alternatively
python train_rm.py --config my-train-tc-rm-config.yaml
```

A trained SPARE-PRM model based on Llama-3-8b is provided for direct use at [![Hugging Face Model](https://img.shields.io/badge/HuggingFace-Model-yellow)](https://huggingface.co/UKPLab/Llama-3-8b-spare-prm-math). Sample code to use it is given below:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

incorrect_token = "-"
correct_token = "+"
step_tag = " ки"  # leading space required for correct Llama tokenization

tokenizer = AutoTokenizer.from_pretrained("UKPLab/Llama-3-8b-spare-prm-math")

step_target_ids = tokenizer.convert_tokens_to_ids([incorrect_token, correct_token])
step_tag_id = tokenizer.encode(step_tag)[-1]

device = "cuda:0"
model = AutoModelForCausalLM.from_pretrained("UKPLab/Llama-3-8b-spare-prm-math").to(device).eval()

# include this instruction, as it was used as-is during PRM training
instruction = "You are an expert at solving challenging math problems spanning across various categories and difficulties such as Algebra, Number Theory, Geometry, Counting and Probability, Precalculus etc. For a given math problem, your task is to generate a step-by-step reasoning-based solution providing an answer to the question. Identify the correct concepts, formulas and heuristics that needs to be applied and then derive the contents of the reasoning steps from the given contexts and accurate calculations from the previous reasoning steps."
question = "Yann and Camille go to a restaurant. </S>\nIf there are 10 items on the menu, and each orders one dish, how many different combinations of meals can Yann and Camille order if they refuse to order the same dish? (It does matter who orders what---Yann ordering chicken and Camille ordering fish is different from Yann ordering fish and Camille ordering chicken.)"
correct_generation = "Let's think step by step.\nYann can order 1 of the 10 dishes. ки\nWhen he picks a dish, there are 9 left for Camille to choose from. ки\nThus, there are $10\\cdot 9=\\boxed{90}$ possible combinations.\nHence, the answer is 90. ки\n"
incorrect_generation = "Let's think step by step.\nWithout any restrictions, Yann and Camille could both order the same dish out of the 10 options, for a total of $10 \\cdot 9$ dishes. ки\nHowever, since Yann orders one of the 9 dishes that Camille didn't order (and vice versa), the number of possible combinations becomes $10 \\cdot 9 - 8 = \\boxed{72}$.\nHence, the answer is 72. ки\n"

for generation in (correct_generation, incorrect_generation):
    message = [
        dict(role="system", content=instruction),
        dict(role="user", content=question),
        dict(role="user", content=generation),
    ]

    input_ids = tokenizer.apply_chat_template(message, tokenize=True, return_tensors="pt").to(device)

    with torch.no_grad():
        logits = model(input_ids).logits[:, :, step_target_ids]
        scores = logits.softmax(dim=-1)[:, :, 1]  # correct_token is at index 1 in step_target_ids
        step_scores = scores[input_ids == step_tag_id]
        print(step_scores)

# tensor([0.9561, 0.9496, 0.9527]) - correct_generation
# tensor([0.6638, 0.6755]) - incorrect_generation
```
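As the abstract notes, such step scores can also be used for ranking/aggregating multiple LLM-generated outputs. A minimal sketch of that idea is below; the `aggregate_step_scores` helper and the choice of `min` aggregation are illustrative assumptions (common in the PRM literature), not necessarily this paper's exact method:

```python
import math

# Hypothetical helper: collapse per-step PRM scores into one solution-level
# score for ranking candidate generations.
def aggregate_step_scores(step_scores, how="min"):
    if how == "min":
        return min(step_scores)           # score of the weakest step
    if how == "product":
        return math.prod(step_scores)     # joint step correctness
    if how == "mean":
        return sum(step_scores) / len(step_scores)
    raise ValueError(f"unknown aggregation: {how}")

# Step scores printed by the loop above for the two sample generations
candidates = {
    "correct_generation": [0.9561, 0.9496, 0.9527],
    "incorrect_generation": [0.6638, 0.6755],
}
best = max(candidates, key=lambda k: aggregate_step_scores(candidates[k], "min"))
print(best)  # -> correct_generation
```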

Contact person: [Md Imbesat Hassan Rizvi](mailto:[email protected])

[UKP Lab](https://www.ukp.tu-darmstadt.de/) | [TU Darmstadt](https://www.tu-darmstadt.de/)

Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.

<!-- ## Getting Started

> **DO NOT CLONE OR FORK**

If you want to set up this template:

1. Request a repository on UKP Lab's GitHub by following the standard procedure on the wiki. It will install the template directly. Alternatively, set it up in your personal GitHub account by clicking **[Use this template](https://github.com/rochacbruno/python-project-template/generate)**.
2. Wait until the first run of CI finishes. Github Actions will commit to your new repo with a "✅ Ready to clone and code" message.
3. Delete optional files:
   - If you don't need automatic documentation generation, you can delete folder `docs`, file `.github\workflows\docs.yml` and `mkdocs.yml`
   - If you don't want automatic testing, you can delete folder `tests` and file `.github\workflows\tests.yml`
   - If you do not wish to have a project page, delete folder `static` and files `.nojekyll`, `index.html`
4. Read the file [ABOUT_THIS_TEMPLATE.md](ABOUT_THIS_TEMPLATE.md) for more information about development. -->

## Cite

If you use this repository, our trained SPARE-PRM model or our work, please cite:

```
@misc{rizvi2024spare,
```
