Inference Foundations: Batching, Modes, and Fast Runs#
This tutorial shows how inference speed changes across QSTN questionnaire presentations using a real local vLLM server.
Main idea in plain words: QSTN batches multiple llm_prompts, not multiple questions inside one prompt.
Speed is mostly about how much work you can do in parallel per round, so more llm_prompts usually helps more than adding more questions to each prompt.
Everything in this notebook uses OPEN responses (no RGM). We also include a reasoning parsing example to show why token budget matters for thinking models.
1. Imports#
We use the same core imports as the other tutorials:
AsyncOpenAIfor the local OpenAI-compatible vLLM serverLLMPromptfor questionnaire/prompt constructionconduct_survey_*for inference modesraw_responsesfor parsed dataframe output
import statistics
import time
import pandas as pd
from openai import AsyncOpenAI
from qstn.parser import raw_responses
from qstn.prompt_builder import LLMPrompt, generate_likert_options
from qstn.survey_manager import (
conduct_survey_battery,
conduct_survey_sequential,
conduct_survey_single_item,
)
from qstn.utilities import placeholder
/home/maxi/anaconda3/envs/qstn_dev/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
from .autonotebook import tqdm as notebook_tqdm
2. Inference Setup (Local vLLM Server)#
This notebook assumes a normal local server setup.
vllm serve Qwen/Qwen3-VL-2B-Thinking --max-model-len 8192 --port 8000
We connect through AsyncOpenAI to http://localhost:8000/v1.
SEED = 42
MODEL_ID = "Qwen/Qwen3-VL-2B-Thinking"
OPENAI_API_KEY = "EMPTY"
OPENAI_API_BASE = "http://localhost:8000/v1"
generator = AsyncOpenAI(
api_key=OPENAI_API_KEY,
base_url=OPENAI_API_BASE,
timeout=120.0,
max_retries=4,
)
print(f"Model: {MODEL_ID}")
print(f"Base URL: {OPENAI_API_BASE}")
Model: Qwen/Qwen3-VL-2B-Thinking
Base URL: http://localhost:8000/v1
4. First Live Inference (Single-Item)#
We start with conduct_survey_single_item so you can see clean, per-question responses for each persona.
This gives a straightforward baseline before timing comparisons.
single_item_results = conduct_survey_single_item(
generator,
llm_prompts=demo_llm_prompts,
client_model_name=MODEL_ID,
api_concurrency=3,
seed=SEED,
print_progress=False,
temperature=0.0,
top_p=1.0,
max_tokens=120,
)
single_item_df = raw_responses(single_item_results)[demo_llm_prompts[0]][
["questionnaire_item_id", "question", "llm_response", "reasoning"]
]
display(single_item_df)
| questionnaire_item_id | question | llm_response | reasoning | |
|---|---|---|---|---|
| 0 | q1 | The government should invest more in public tr... | Okay, the user wants me to respond as a 23-yea... | None |
| 1 | q2 | Remote work should remain a legal right for of... | Okay, the user wants me to respond as a 23-yea... | None |
| 2 | q3 | Nuclear power should be part of the country's ... | Okay, the user wants me to respond as a 23-yea... | None |
5. Reasoning Parsing: Low vs High max_tokens#
This comparison uses the same normal survey prompt and only changes max_tokens.
For thinking models, larger token budgets can make the difference between truncated output and fully parseable reasoning + final answer.
reasoning_questionnaire = make_questionnaire(n_questions=1)
reasoning_llm_prompt = make_llm_prompts(PERSONAS[:1], reasoning_questionnaire)[0]
reasoning_prompts = [reasoning_llm_prompt]
low_token_results = conduct_survey_single_item(
generator,
llm_prompts=reasoning_prompts,
client_model_name=MODEL_ID,
api_concurrency=3,
seed=SEED,
print_progress=False,
temperature=0.0,
top_p=1.0,
max_tokens=32,
)
high_token_results = conduct_survey_single_item(
generator,
llm_prompts=reasoning_prompts,
client_model_name=MODEL_ID,
api_concurrency=3,
seed=SEED,
print_progress=False,
temperature=0.0,
top_p=1.0,
max_tokens=6000,
)
low_df = raw_responses(low_token_results)[reasoning_llm_prompt].copy()
high_df = raw_responses(high_token_results)[reasoning_llm_prompt].copy()
comparison_df = pd.DataFrame(
[
{
"run": "low_max_tokens",
"max_tokens": 32,
"answer_chars": len(str(low_df.loc[0, "llm_response"])),
"reasoning_chars": (
len(str(low_df.loc[0, "reasoning"])) if low_df.loc[0, "reasoning"] else 0
),
"answer_preview": str(low_df.loc[0, "llm_response"])[:140],
},
{
"run": "high_max_tokens",
"max_tokens": 6000,
"answer_chars": len(str(high_df.loc[0, "llm_response"])),
"reasoning_chars": (
len(str(high_df.loc[0, "reasoning"])) if high_df.loc[0, "reasoning"] else 0
),
"answer_preview": str(high_df.loc[0, "llm_response"])[:140],
},
]
)
display(comparison_df)
| run | max_tokens | answer_chars | reasoning_chars | answer_preview | |
|---|---|---|---|---|---|
| 0 | low_max_tokens | 32 | 139 | 0 | Okay, the user wants me to respond as a 23-yea... |
| 1 | high_max_tokens | 6000 | 21534 | 2049 | **Answer: 4 - Agree** \n\n*(As a 23-year-old ... |
print("High max_tokens parsed reasoning (preview):")
print(str(high_df.loc[0, "reasoning"])[:600])
print("\nHigh max_tokens final answer:")
print(high_df.loc[0, "llm_response"])
High max_tokens parsed reasoning (preview):
Okay, the user wants me to respond as a 23-year-old university student in a large city with a limited budget. They're asking for an opinion survey answer about government investing in public transport despite slight tax increases.
Hmm, I need to embody this persona authentically. The student is young, urban, and budget-conscious - that means I should avoid sounding like a policy wonk. Real talk: they'd care about practical impacts, not abstract ideals.
*checks mental notes* Key angles to hit:
- Must show genuine frustration with current transport options (that "boring" bus detail feels re
High max_tokens final answer:
**Answer: 4 - Agree**
*(As a 23-year-old student in a big city with a tight budget, I’m tired of the endless traffic and the way public transport is so unreliable. I’ve seen how it’s a huge pain point for everyone—especially when you’re juggling classes and a job. But I’m not going to let the government ignore it. If they invest more in buses and trains, even if it means a tiny bit more taxes, it’s worth it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’d rather pay for this than let the city get worse. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay
6. Batching Mental Model#
Two terms make this easy:
batch width: how manyllm_promptsare processed in parallelrounds: how many question steps are needed
Wider batches usually improve throughput. More rounds usually increase total wall-clock time.
Important: QSTN batches across prompts (llm_prompts). It does not batch multiple question steps from the same prompt as one parallel unit.
scenario_a = {"label": "A: wide batch", "n_llm_prompts": 6, "questions_per_prompt": 2}
scenario_b = {"label": "B: deep questionnaire", "n_llm_prompts": 2, "questions_per_prompt": 6}
geometry_df = pd.DataFrame(
[
{
"scenario": s["label"],
"batch_width": s["n_llm_prompts"],
"rounds_single_item_or_sequential": s["questions_per_prompt"],
"total_answers": s["n_llm_prompts"] * s["questions_per_prompt"],
}
for s in [scenario_a, scenario_b]
]
)
display(geometry_df)
| scenario | batch_width | rounds_single_item_or_sequential | total_answers | |
|---|---|---|---|---|
| 0 | A: wide batch | 6 | 2 | 12 |
| 1 | B: deep questionnaire | 2 | 6 | 12 |
Useful intuition:
Single-item: high batching clarity, no cross-question context.
Sequential: keeps conversation context across questions, but each next step depends on previous answers.
Battery: asks all questions in one prompt, so it often needs fewer rounds.
QSTN parallelism mainly comes from the number of
llm_promptsyou run together.
7. Mini Benchmark: Wider vs Deeper#
This section compares equal total work with two shapes:
A: many prompts, fewer questions each
B: fewer prompts, more questions each
Same total answers, different batching geometry.
def build_prompts_for_shape(n_llm_prompts: int, questions_per_prompt: int) -> list[LLMPrompt]:
if n_llm_prompts > len(PERSONAS):
raise ValueError(
f"Requested {n_llm_prompts} prompts but only {len(PERSONAS)} personas are defined."
)
questionnaire = make_questionnaire(questions_per_prompt)
personas = PERSONAS[:n_llm_prompts]
return make_llm_prompts(personas, questionnaire)
def time_mode(
mode_fn, llm_prompts: list[LLMPrompt], repeats: int = 2, **generation_kwargs
) -> float:
durations = []
for _ in range(repeats):
start = time.perf_counter()
_ = mode_fn(
generator,
llm_prompts=llm_prompts,
client_model_name=MODEL_ID,
api_concurrency=3,
seed=SEED,
print_progress=False,
**generation_kwargs,
)
durations.append(time.perf_counter() - start)
return statistics.mean(durations)
benchmark_kwargs = {
"temperature": 0.0,
"top_p": 1.0,
"max_tokens": 120,
}
benchmark_rows = []
for shape in [scenario_a, scenario_b]:
prompts = build_prompts_for_shape(
n_llm_prompts=shape["n_llm_prompts"],
questions_per_prompt=shape["questions_per_prompt"],
)
elapsed = time_mode(
conduct_survey_single_item,
prompts,
repeats=2,
**benchmark_kwargs,
)
benchmark_rows.append(
{
"scenario": shape["label"],
"n_llm_prompts": shape["n_llm_prompts"],
"questions_per_prompt": shape["questions_per_prompt"],
"estimated_rounds": shape["questions_per_prompt"],
"elapsed_seconds": elapsed,
}
)
benchmark_df = pd.DataFrame(benchmark_rows)
display(benchmark_df)
| scenario | n_llm_prompts | questions_per_prompt | estimated_rounds | elapsed_seconds | |
|---|---|---|---|---|---|
| 0 | A: wide batch | 6 | 2 | 2 | 5.994777 |
| 1 | B: deep questionnaire | 2 | 6 | 6 | 10.931068 |
Interpretation:
In most setups, shape A (wider) beats shape B (deeper).
Exact numbers change with hardware, model, and server load.
8. All Three Presentations: Time Comparison#
Here we benchmark single_item, sequential, and battery on the exact same prompts and questions.
This is the cleanest direct comparison of practical speed.
comparison_prompts = build_prompts_for_shape(n_llm_prompts=2, questions_per_prompt=3)
shared_kwargs = {
"temperature": 0.0,
"top_p": 1.0,
"max_tokens": 120,
}
records = []
start = time.perf_counter()
_ = conduct_survey_single_item(
generator,
llm_prompts=comparison_prompts,
client_model_name=MODEL_ID,
api_concurrency=3,
seed=SEED,
print_progress=False,
**shared_kwargs,
)
records.append({"mode": "single_item", "elapsed_seconds": time.perf_counter() - start})
start = time.perf_counter()
_ = conduct_survey_sequential(
generator,
llm_prompts=comparison_prompts,
client_model_name=MODEL_ID,
api_concurrency=3,
seed=SEED,
print_progress=False,
**shared_kwargs,
)
records.append({"mode": "sequential", "elapsed_seconds": time.perf_counter() - start})
start = time.perf_counter()
_ = conduct_survey_battery(
generator,
llm_prompts=comparison_prompts,
client_model_name=MODEL_ID,
api_concurrency=3,
seed=SEED,
print_progress=False,
item_separator="\n",
**shared_kwargs,
)
records.append({"mode": "battery", "elapsed_seconds": time.perf_counter() - start})
mode_compare_df = (
pd.DataFrame(records).sort_values("elapsed_seconds", ascending=True).reset_index(drop=True)
)
mode_compare_df["rank"] = mode_compare_df.index + 1
mode_compare_df["is_fastest"] = mode_compare_df["rank"] == 1
display(mode_compare_df[["rank", "mode", "elapsed_seconds", "is_fastest"]])
| rank | mode | elapsed_seconds | is_fastest | |
|---|---|---|---|---|
| 0 | 1 | battery | 1.754532 | True |
| 1 | 2 | single_item | 5.252201 | False |
| 2 | 3 | sequential | 5.299915 | False |
Battery mode will be faster, as long you don’t have very long questionnaires. If you have very long questionnaires, the attention cost of the transformer algorithm increases, and single_item and sequential with prefix caching might outperform battery again.
9. Inference Options Tour#
These are the options most people use first:
seedfor reproducibilityprint_progressandprint_conversationfor observabilityn_save_step+intermediate_save_filefor checkpointingitem_separatorfor battery formattingclient_model_name+api_concurrencyfor API routing and parallelism
options_reference = pd.DataFrame(
[
{"option": "seed", "why": "Reproducible runs"},
{"option": "print_progress", "why": "Progress bar visibility"},
{"option": "print_conversation", "why": "Inspect prompt and answer flow"},
{"option": "n_save_step + intermediate_save_file", "why": "Periodic CSV checkpoints"},
{"option": "item_separator", "why": "Question separator for battery mode"},
{
"option": "client_model_name + api_concurrency",
"why": "Backend model routing and API parallelism",
},
]
)
display(options_reference)
| option | why | |
|---|---|---|
| 0 | seed | Reproducible runs |
| 1 | print_progress | Progress bar visibility |
| 2 | print_conversation | Inspect prompt and answer flow |
| 3 | n_save_step + intermediate_save_file | Periodic CSV checkpoints |
| 4 | item_separator | Question separator for battery mode |
| 5 | client_model_name + api_concurrency | Backend model routing and API parallelism |
checkpoint_path = "/tmp/qstn_inference_foundations_checkpoint.csv"
_ = conduct_survey_battery(
generator,
llm_prompts=comparison_prompts,
client_model_name=MODEL_ID,
api_concurrency=4,
seed=SEED,
print_progress=False,
print_conversation=False,
n_save_step=1,
intermediate_save_file=checkpoint_path,
item_separator="\n---\n",
temperature=0.0,
top_p=1.0,
max_tokens=80,
)
print(f"Checkpoint file written: {checkpoint_path}")
Checkpoint file written: /tmp/qstn_inference_foundations_checkpoint.csv
10. Generation Kwargs Passthrough#
You can pass generation kwargs directly in conduct_survey_* calls.
This keeps tuning simple and close to backend-native parameters.
Rule: If your OpenAI-compatible vLLM server supports a generation kwarg, pass it directly to
conduct_survey_*.
_ = conduct_survey_single_item(
generator,
llm_prompts=demo_llm_prompts,
client_model_name=MODEL_ID,
api_concurrency=4,
seed=SEED,
print_progress=False,
temperature=0.0,
top_p=0.95,
max_tokens=96,
)
print("Ran kwargs passthrough demo.")
Ran kwargs passthrough demo.
Backend mapping overview:
Backend path |
Where kwargs go |
|---|---|
vLLM via |
QSTN forwards generation kwargs to backend generation calls |
OpenAI-compatible API via |
QSTN forwards kwargs to |
11. Recap and Mode Selection Cheat Sheet#
Use this section as your practical default guide.
Key takeaways:
batteryis usually the fastest mode.Important exception: if the shared prefix is very short but each question is very long, attention cost can dominate and reduce the battery advantage.
Only
batteryandsequentialkeep full questionnaire context across questions.single_itemresets context for each question.sequentialcan be close tosingle_itemin speed when prefix/state caching is effective and fits reliably in available RAM.For thinking models, increase
max_tokensif reasoning or final answers get truncated.
Quick decision table:
Goal |
Recommended mode |
Why |
|---|---|---|
Fastest default for most workloads |
|
Often the fewest rounds and highest throughput |
Preserve context across questions |
|
Later answers can depend on earlier turns |
Isolated per-question answers |
|
Fresh context each question, easier per-item analysis |
Large long-question batteries where attention cost spikes |
Compare |
Very long combined prompts can reduce battery’s speed edge |
Sequential near single-item speed |
Depends on cache fit |
Works best when prefix/state caching is stable in RAM |