Inference Foundations: Batching, Modes, and Fast Runs

This tutorial shows how inference speed changes across QSTN questionnaire presentations using a real local vLLM server.

Main idea in plain words: QSTN batches multiple llm_prompts, not multiple questions inside one prompt. Speed is mostly about how much work you can do in parallel per round, so more llm_prompts usually helps more than adding more questions to each prompt.

Everything in this notebook uses OPEN responses (no RGM). We also include a reasoning parsing example to show why token budget matters for thinking models.

1. Imports

We use the same core imports as the other tutorials:

  • AsyncOpenAI for the local OpenAI-compatible vLLM server

  • LLMPrompt for questionnaire/prompt construction

  • conduct_survey_* for inference modes

  • raw_responses for parsed dataframe output

import statistics
import time

import pandas as pd
from openai import AsyncOpenAI

from qstn.parser import raw_responses
from qstn.prompt_builder import LLMPrompt, generate_likert_options
from qstn.survey_manager import (
    conduct_survey_battery,
    conduct_survey_sequential,
    conduct_survey_single_item,
)
from qstn.utilities import placeholder

2. Inference Setup (Local vLLM Server)

This notebook assumes a vLLM server is already running locally, started with a command like this:

vllm serve Qwen/Qwen3-VL-2B-Thinking --max-model-len 8192 --port 8000

We connect through AsyncOpenAI to http://localhost:8000/v1.
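Before running the cells that follow, it can help to confirm that the server is reachable. The block here is a minimal sketch using only the standard library. It relies on the `/v1/models` route that OpenAI-compatible servers such as vLLM expose. The helper name `list_served_models` is hypothetical and not part of QSTN.

```python
import json
import urllib.error
import urllib.request


def list_served_models(base_url: str, timeout: float = 5.0) -> list[str]:
    """Return model ids reported by an OpenAI-compatible server, or [] if unreachable."""
    url = base_url.rstrip("/") + "/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            payload = json.load(response)
    except (urllib.error.URLError, OSError, ValueError):
        # Server not started yet, wrong port, or a reply that is not JSON.
        return []
    if not isinstance(payload, dict):
        return []
    # Each entry in "data" describes one served model; keep only its id.
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


# Prints an empty list if nothing is listening on the assumed address.
print(list_served_models("http://localhost:8000/v1"))
```

If the printed list is empty, the server is probably not up yet or is listening on a different port than the one assumed here.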

SEED = 42
MODEL_ID = "Qwen/Qwen3-VL-2B-Thinking"
OPENAI_API_KEY = "EMPTY"
OPENAI_API_BASE = "http://localhost:8000/v1"

generator = AsyncOpenAI(
    api_key=OPENAI_API_KEY,
    base_url=OPENAI_API_BASE,
    timeout=120.0,
    max_retries=4,
)

print(f"Model: {MODEL_ID}")
print(f"Base URL: {OPENAI_API_BASE}")
Model: Qwen/Qwen3-VL-2B-Thinking
Base URL: http://localhost:8000/v1

3. Shared Prompt Setup

This notebook uses a practical setup: several realistic personas and public-opinion survey questions.

That makes the generated answers more meaningful and easier to interpret than synthetic placeholder questions.

PERSONAS = [
    "A 23-year-old university student in a large city with limited budget.",
    "A 41-year-old parent commuting daily by car from the suburbs.",
    "A 36-year-old small business owner focused on taxes and stability.",
    "A 68-year-old retired engineer who values long-term infrastructure planning.",
    "A 29-year-old nurse working shifts and relying on public transport.",
    "A 52-year-old factory worker concerned about local jobs and energy prices.",
]

SURVEY_QUESTIONS = [
    "The government should invest more in public transport, even if taxes increase slightly.",
    "Remote work should remain a legal right for office jobs when possible.",
    "Nuclear power should be part of the country's long-term energy strategy.",
    "Housing policy should prioritize affordable rentals over market-rate development.",
    "Social media platforms should be required to verify user identities.",
    "Universities should be tuition-free for all domestic students.",
    "The voting age should be lowered to 16 for national elections.",
    "Healthcare spending should increase, even if it requires cuts in other budgets.",
]

LIKERT_TEXTS = [
    "Strongly disagree",
    "Disagree",
    "Neutral",
    "Agree",
    "Strongly agree",
]

system_prompt_template = (
    "Take the perspective of this persona and answer like they would in a public opinion survey: "
    "{persona}"
)
prompt_template = (
    # Real newlines separate the instruction, the question, and the options.
    "Please answer the following survey item:\n"
    f"{placeholder.PROMPT_QUESTIONS}\n"
    f"{placeholder.PROMPT_OPTIONS}"
)


def open_answer_options():
    return generate_likert_options(
        n=5,
        answer_texts=LIKERT_TEXTS,
        list_prompt_template="Answer options: {options}",
    )


def make_questionnaire(n_questions: int) -> pd.DataFrame:
    rows = []
    for i in range(n_questions):
        base_question = SURVEY_QUESTIONS[i % len(SURVEY_QUESTIONS)]
        rows.append(
            {
                "questionnaire_item_id": f"q{i+1}",
                "question_content": base_question,
            }
        )
    return pd.DataFrame(rows)


def make_llm_prompts(personas: list[str], questionnaire: pd.DataFrame) -> list[LLMPrompt]:
    prompts = []
    for idx, persona in enumerate(personas):
        llm_prompt = LLMPrompt(
            questionnaire_source=questionnaire,
            questionnaire_name=f"persona_{idx+1}",
            system_prompt=system_prompt_template.format(persona=persona),
            prompt=prompt_template,
            seed=SEED,
        )
        llm_prompt.prepare_prompt(answer_options=open_answer_options())
        prompts.append(llm_prompt)
    return prompts


demo_questionnaire = make_questionnaire(n_questions=3)
demo_llm_prompts = make_llm_prompts(PERSONAS[:3], demo_questionnaire)

display(demo_questionnaire)
print(f"llm_prompts: {len(demo_llm_prompts)}")
print(f"questions per prompt: {len(demo_llm_prompts[0])}")
   questionnaire_item_id  question_content
0  q1                     The government should invest more in public tr...
1  q2                     Remote work should remain a legal right for of...
2  q3                     Nuclear power should be part of the country's ...
llm_prompts: 3
questions per prompt: 3

4. First Live Inference (Single-Item)

We start with conduct_survey_single_item so you can see clean, per-question responses for each persona.

This gives a straightforward baseline before timing comparisons.

single_item_results = conduct_survey_single_item(
    generator,
    llm_prompts=demo_llm_prompts,
    client_model_name=MODEL_ID,
    api_concurrency=3,
    seed=SEED,
    print_progress=False,
    temperature=0.0,
    top_p=1.0,
    max_tokens=120,
)

single_item_df = raw_responses(single_item_results)[demo_llm_prompts[0]][
    ["questionnaire_item_id", "question", "llm_response", "reasoning"]
]
display(single_item_df)
   questionnaire_item_id  question                                           llm_response                                       reasoning
0  q1                     The government should invest more in public tr...  Okay, the user wants me to respond as a 23-yea...  None
1  q2                     Remote work should remain a legal right for of...  Okay, the user wants me to respond as a 23-yea...  None
2  q3                     Nuclear power should be part of the country's ...  Okay, the user wants me to respond as a 23-yea...  None

5. Reasoning Parsing: Low vs High max_tokens

This comparison uses the same normal survey prompt and only changes max_tokens.

For thinking models, larger token budgets can make the difference between truncated output and fully parseable reasoning + final answer.
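Why a small budget can leave the reasoning column empty can be illustrated with a toy splitter. This sketch assumes the model closes its thinking with a `</think>` marker, as Qwen3-style thinking models do. `split_reasoning` is a hypothetical helper written for this illustration and is not QSTN's parser.

```python
from typing import Optional

THINK_END = "</think>"  # assumed end-of-thinking marker


def split_reasoning(raw_output: str) -> tuple[Optional[str], str]:
    """Split raw model text into (reasoning, answer).

    If the end marker never appears (for example because generation was cut
    off by max_tokens), no reasoning can be separated and the whole text is
    returned as the answer.
    """
    head, marker, tail = raw_output.partition(THINK_END)
    if not marker:
        return None, raw_output.strip()
    # Drop an opening tag if the model emitted one, then trim whitespace.
    reasoning = head.replace("<think>", "").strip()
    return reasoning, tail.strip()


truncated = "Okay, the user wants me to respond as a 23-year-old"
complete = "Okay, the user wants me to respond...</think>Answer: 4 - Agree"

print(split_reasoning(truncated))
print(split_reasoning(complete))
```

In the truncated case the thinking text ends up in the answer slot and the reasoning is `None`, which is the same pattern the low `max_tokens` run shows in this section.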

reasoning_questionnaire = make_questionnaire(n_questions=1)
reasoning_llm_prompt = make_llm_prompts(PERSONAS[:1], reasoning_questionnaire)[0]
reasoning_prompts = [reasoning_llm_prompt]

low_token_results = conduct_survey_single_item(
    generator,
    llm_prompts=reasoning_prompts,
    client_model_name=MODEL_ID,
    api_concurrency=3,
    seed=SEED,
    print_progress=False,
    temperature=0.0,
    top_p=1.0,
    max_tokens=32,
)

high_token_results = conduct_survey_single_item(
    generator,
    llm_prompts=reasoning_prompts,
    client_model_name=MODEL_ID,
    api_concurrency=3,
    seed=SEED,
    print_progress=False,
    temperature=0.0,
    top_p=1.0,
    max_tokens=6000,
)

low_df = raw_responses(low_token_results)[reasoning_llm_prompt].copy()
high_df = raw_responses(high_token_results)[reasoning_llm_prompt].copy()

comparison_df = pd.DataFrame(
    [
        {
            "run": "low_max_tokens",
            "max_tokens": 32,
            "answer_chars": len(str(low_df.loc[0, "llm_response"])),
            "reasoning_chars": (
                len(str(low_df.loc[0, "reasoning"])) if low_df.loc[0, "reasoning"] else 0
            ),
            "answer_preview": str(low_df.loc[0, "llm_response"])[:140],
        },
        {
            "run": "high_max_tokens",
            "max_tokens": 6000,
            "answer_chars": len(str(high_df.loc[0, "llm_response"])),
            "reasoning_chars": (
                len(str(high_df.loc[0, "reasoning"])) if high_df.loc[0, "reasoning"] else 0
            ),
            "answer_preview": str(high_df.loc[0, "llm_response"])[:140],
        },
    ]
)

display(comparison_df)
   run              max_tokens  answer_chars  reasoning_chars  answer_preview
0  low_max_tokens   32          139           0                Okay, the user wants me to respond as a 23-yea...
1  high_max_tokens  6000        21534         2049             **Answer: 4 - Agree** \n\n*(As a 23-year-old ...

print("High max_tokens parsed reasoning (preview):")
print(str(high_df.loc[0, "reasoning"])[:600])
print("\nHigh max_tokens final answer:")
print(high_df.loc[0, "llm_response"])
High max_tokens parsed reasoning (preview):
Okay, the user wants me to respond as a 23-year-old university student in a large city with a limited budget. They're asking for an opinion survey answer about government investing in public transport despite slight tax increases. 

Hmm, I need to embody this persona authentically. The student is young, urban, and budget-conscious - that means I should avoid sounding like a policy wonk. Real talk: they'd care about practical impacts, not abstract ideals. 

*checks mental notes* Key angles to hit: 
- Must show genuine frustration with current transport options (that "boring" bus detail feels re

High max_tokens final answer:
**Answer: 4 - Agree**  

*(As a 23-year-old student in a big city with a tight budget, I’m tired of the endless traffic and the way public transport is so unreliable. I’ve seen how it’s a huge pain point for everyone—especially when you’re juggling classes and a job. But I’m not going to let the government ignore it. If they invest more in buses and trains, even if it means a tiny bit more taxes, it’s worth it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’d rather pay for this than let the city get worse. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. I’m not sure if it’s going to be a big deal, but I think it’s important. I’m not going to let the government ignore it. I’d rather pay for this than keep paying for the current system, which is so slow and crowded. I know it’s not easy, but I think it’s the right thing to do. 
[The same sentences repeat for the rest of the output, which ends mid-sentence with "I’d rather pay".]

6. Batching Mental Model

Two terms make this easy:

  • batch width: how many llm_prompts are processed in parallel

  • rounds: how many question steps are needed

Wider batches usually improve throughput. More rounds usually increase total wall-clock time.

Important: QSTN batches across prompts (llm_prompts). It does not batch multiple question steps from the same prompt as one parallel unit.

scenario_a = {"label": "A: wide batch", "n_llm_prompts": 6, "questions_per_prompt": 2}
scenario_b = {"label": "B: deep questionnaire", "n_llm_prompts": 2, "questions_per_prompt": 6}

geometry_df = pd.DataFrame(
    [
        {
            "scenario": s["label"],
            "batch_width": s["n_llm_prompts"],
            "rounds_single_item_or_sequential": s["questions_per_prompt"],
            "total_answers": s["n_llm_prompts"] * s["questions_per_prompt"],
        }
        for s in [scenario_a, scenario_b]
    ]
)

display(geometry_df)
   scenario               batch_width  rounds_single_item_or_sequential  total_answers
0  A: wide batch          6            2                                 12
1  B: deep questionnaire  2            6                                 12

Useful intuition:

  • Single-item: each question is sent as its own independent request, so batching is easy to reason about, but there is no cross-question context.

  • Sequential: keeps conversation context across questions, but each next step depends on previous answers.

  • Battery: asks all questions in one prompt, so it often needs fewer rounds.

  • QSTN parallelism mainly comes from the number of llm_prompts you run together.
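The intuition above can be written down as a small back-of-the-envelope helper. This is only a sketch: `rounds_needed` and `batch_geometry` are hypothetical names, not part of QSTN, and the battery case assumes that one request per llm_prompt covers the whole questionnaire.

```python
def rounds_needed(mode: str, questions_per_prompt: int) -> int:
    """Estimate how many dependent question steps one llm_prompt needs."""
    if questions_per_prompt < 1:
        raise ValueError("questions_per_prompt must be at least 1")
    if mode in ("single_item", "sequential"):
        # One step per question.
        return questions_per_prompt
    if mode == "battery":
        # Assumption: all questions are asked in one prompt, so one step.
        return 1
    raise ValueError(f"Unknown mode: {mode!r}")


def batch_geometry(mode: str, n_llm_prompts: int, questions_per_prompt: int) -> dict:
    """Combine batch width, estimated rounds, and total answers for one shape."""
    return {
        "mode": mode,
        "batch_width": n_llm_prompts,
        "rounds": rounds_needed(mode, questions_per_prompt),
        "total_answers": n_llm_prompts * questions_per_prompt,
    }


for mode in ("single_item", "sequential", "battery"):
    print(batch_geometry(mode, n_llm_prompts=2, questions_per_prompt=6))
```

The estimate says nothing about how long each round takes; it only makes the width and depth of a run explicit before you benchmark it.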

7. Mini Benchmark: Wider vs Deeper#

This section compares equal total work with two shapes:

  • A: many prompts, fewer questions each

  • B: fewer prompts, more questions each

Same total answers, different batching geometry.

def build_prompts_for_shape(n_llm_prompts: int, questions_per_prompt: int) -> list[LLMPrompt]:
    if n_llm_prompts > len(PERSONAS):
        raise ValueError(
            f"Requested {n_llm_prompts} prompts but only {len(PERSONAS)} personas are defined."
        )

    questionnaire = make_questionnaire(questions_per_prompt)
    personas = PERSONAS[:n_llm_prompts]
    return make_llm_prompts(personas, questionnaire)


def time_mode(
    mode_fn, llm_prompts: list[LLMPrompt], repeats: int = 2, **generation_kwargs
) -> float:
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        _ = mode_fn(
            generator,
            llm_prompts=llm_prompts,
            client_model_name=MODEL_ID,
            api_concurrency=3,
            seed=SEED,
            print_progress=False,
            **generation_kwargs,
        )
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)

benchmark_kwargs = {
    "temperature": 0.0,
    "top_p": 1.0,
    "max_tokens": 120,
}

benchmark_rows = []
for shape in [scenario_a, scenario_b]:
    prompts = build_prompts_for_shape(
        n_llm_prompts=shape["n_llm_prompts"],
        questions_per_prompt=shape["questions_per_prompt"],
    )

    elapsed = time_mode(
        conduct_survey_single_item,
        prompts,
        repeats=2,
        **benchmark_kwargs,
    )

    benchmark_rows.append(
        {
            "scenario": shape["label"],
            "n_llm_prompts": shape["n_llm_prompts"],
            "questions_per_prompt": shape["questions_per_prompt"],
            "estimated_rounds": shape["questions_per_prompt"],
            "elapsed_seconds": elapsed,
        }
    )

benchmark_df = pd.DataFrame(benchmark_rows)
display(benchmark_df)

                scenario  n_llm_prompts  questions_per_prompt  estimated_rounds  elapsed_seconds
0          A: wide batch              6                     2                 2         5.994777
1  B: deep questionnaire              2                     6                 6        10.931068

Interpretation:

  • In most setups, shape A (wider) beats shape B (deeper).

  • Exact numbers change with hardware, model, and server load.
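To put the two timings on a common scale, the short sketch below converts them into answers per second and a speedup factor. The elapsed values are copied from the run shown above; the helper name `answers_per_second` is illustrative, and the figures will differ on other hardware.

```python
TOTAL_ANSWERS = 12  # 6 x 2 for shape A, 2 x 6 for shape B

# Elapsed seconds copied from the benchmark output above.
elapsed_seconds = {
    "A: wide batch": 5.994777,
    "B: deep questionnaire": 10.931068,
}


def answers_per_second(total_answers: int, seconds: float) -> float:
    """Throughput for a run that produced total_answers in the given time."""
    return total_answers / seconds


throughput = {
    label: answers_per_second(TOTAL_ANSWERS, seconds)
    for label, seconds in elapsed_seconds.items()
}
speedup = elapsed_seconds["B: deep questionnaire"] / elapsed_seconds["A: wide batch"]

for label, value in throughput.items():
    print(f"{label}: {value:.2f} answers/s")
print(f"A is {speedup:.2f}x faster than B")
```

In this run the wide shape comes out roughly 1.8x faster for the same number of answers.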

8. All Three Presentations: Time Comparison#

Here we benchmark single_item, sequential, and battery on the exact same prompts and questions.

This is the cleanest direct comparison of practical speed.

comparison_prompts = build_prompts_for_shape(n_llm_prompts=2, questions_per_prompt=3)

shared_kwargs = {
    "temperature": 0.0,
    "top_p": 1.0,
    "max_tokens": 120,
}

records = []

start = time.perf_counter()
_ = conduct_survey_single_item(
    generator,
    llm_prompts=comparison_prompts,
    client_model_name=MODEL_ID,
    api_concurrency=3,
    seed=SEED,
    print_progress=False,
    **shared_kwargs,
)
records.append({"mode": "single_item", "elapsed_seconds": time.perf_counter() - start})

start = time.perf_counter()
_ = conduct_survey_sequential(
    generator,
    llm_prompts=comparison_prompts,
    client_model_name=MODEL_ID,
    api_concurrency=3,
    seed=SEED,
    print_progress=False,
    **shared_kwargs,
)
records.append({"mode": "sequential", "elapsed_seconds": time.perf_counter() - start})

start = time.perf_counter()
_ = conduct_survey_battery(
    generator,
    llm_prompts=comparison_prompts,
    client_model_name=MODEL_ID,
    api_concurrency=3,
    seed=SEED,
    print_progress=False,
    item_separator="\n",
    **shared_kwargs,
)
records.append({"mode": "battery", "elapsed_seconds": time.perf_counter() - start})

mode_compare_df = (
    pd.DataFrame(records).sort_values("elapsed_seconds", ascending=True).reset_index(drop=True)
)
mode_compare_df["rank"] = mode_compare_df.index + 1
mode_compare_df["is_fastest"] = mode_compare_df["rank"] == 1
display(mode_compare_df[["rank", "mode", "elapsed_seconds", "is_fastest"]])

   rank         mode  elapsed_seconds  is_fastest
0     1      battery         1.754532        True
1     2  single_item         5.252201       False
2     3   sequential         5.299915       False

Battery mode will usually be faster, as long as you don’t have very long questionnaires. With very long questionnaires, the attention cost of the transformer grows with the length of the combined prompt, and single_item or sequential with prefix caching might outperform battery again.
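A toy calculation can make the attention argument concrete. The sketch below only counts query/key pairs under causal attention while the prompt is processed. It ignores answer tokens, decoding, rounds, batching, and prefix caching, so it is not a wall-clock predictor. All function names and token counts are made up for illustration.

```python
def causal_attention_pairs(seq_len: int) -> int:
    """Number of (query, key) pairs under causal attention for one sequence."""
    return seq_len * (seq_len + 1) // 2


def battery_cost(prefix_tokens: int, question_tokens: int, n_questions: int) -> int:
    # One prompt that holds the shared prefix plus every question.
    return causal_attention_pairs(prefix_tokens + n_questions * question_tokens)


def single_item_cost(prefix_tokens: int, question_tokens: int, n_questions: int) -> int:
    # n separate prompts, each with the prefix plus one question (no caching assumed).
    return n_questions * causal_attention_pairs(prefix_tokens + question_tokens)


for n in (3, 10, 50):
    b = battery_cost(prefix_tokens=50, question_tokens=20, n_questions=n)
    s = single_item_cost(prefix_tokens=50, question_tokens=20, n_questions=n)
    print(f"questions={n:>2}  battery={b:>7}  single_item={s:>7}")
```

With these made-up numbers the combined prompt needs fewer pairs for 3 questions but far more for 50, because its cost grows quadratically with questionnaire length while the single-item total grows linearly. In practice battery still tends to win on short questionnaires because it needs fewer rounds, which this model does not capture.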

9. Inference Options Tour#

These are the options most people use first:

  • seed for reproducibility

  • print_progress and print_conversation for observability

  • n_save_step + intermediate_save_file for checkpointing

  • item_separator for battery formatting

  • client_model_name + api_concurrency for API routing and parallelism

options_reference = pd.DataFrame(
    [
        {"option": "seed", "why": "Reproducible runs"},
        {"option": "print_progress", "why": "Progress bar visibility"},
        {"option": "print_conversation", "why": "Inspect prompt and answer flow"},
        {"option": "n_save_step + intermediate_save_file", "why": "Periodic CSV checkpoints"},
        {"option": "item_separator", "why": "Question separator for battery mode"},
        {
            "option": "client_model_name + api_concurrency",
            "why": "Backend model routing and API parallelism",
        },
    ]
)

display(options_reference)

   option                                why
0  seed                                  Reproducible runs
1  print_progress                        Progress bar visibility
2  print_conversation                    Inspect prompt and answer flow
3  n_save_step + intermediate_save_file  Periodic CSV checkpoints
4  item_separator                        Question separator for battery mode
5  client_model_name + api_concurrency   Backend model routing and API parallelism

checkpoint_path = "/tmp/qstn_inference_foundations_checkpoint.csv"

_ = conduct_survey_battery(
    generator,
    llm_prompts=comparison_prompts,
    client_model_name=MODEL_ID,
    api_concurrency=4,
    seed=SEED,
    print_progress=False,
    print_conversation=False,
    n_save_step=1,
    intermediate_save_file=checkpoint_path,
    item_separator="\n---\n",
    temperature=0.0,
    top_p=1.0,
    max_tokens=80,
)

print(f"Checkpoint file written: {checkpoint_path}")
Checkpoint file written: /tmp/qstn_inference_foundations_checkpoint.csv
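After a run like this, it can be useful to check what the checkpoint actually contains. The sketch below makes no assumption about the column layout QSTN writes; it only reports the header and row count of whatever CSV it finds. `summarize_checkpoint` is a hypothetical helper, not a QSTN function.

```python
import csv
from pathlib import Path
from typing import Optional


def summarize_checkpoint(path: str) -> Optional[dict]:
    """Return column names and row count of a CSV checkpoint, or None if missing."""
    file = Path(path)
    if not file.exists():
        return None
    with file.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        # fieldnames is None for an empty file, so fall back to an empty list.
        return {"columns": list(reader.fieldnames or []), "n_rows": len(rows)}


# Same path as the checkpoint cell above; prints None if no file was written.
print(summarize_checkpoint("/tmp/qstn_inference_foundations_checkpoint.csv"))
```

A quick check like this helps confirm that the row count grows as expected when n_save_step is small.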

10. Generation Kwargs Passthrough#

You can pass generation kwargs directly in conduct_survey_* calls.

This keeps tuning simple and close to backend-native parameters.

Rule: If your OpenAI-compatible vLLM server supports a generation kwarg, pass it directly to conduct_survey_*.

_ = conduct_survey_single_item(
    generator,
    llm_prompts=demo_llm_prompts,
    client_model_name=MODEL_ID,
    api_concurrency=4,
    seed=SEED,
    print_progress=False,
    temperature=0.0,
    top_p=0.95,
    max_tokens=96,
)

print("Ran kwargs passthrough demo.")
Ran kwargs passthrough demo.

Backend mapping overview:

| Backend path | Where kwargs go |
| --- | --- |
| vLLM via conduct_survey_* | QSTN forwards generation kwargs to backend generation calls |
| OpenAI-compatible API via conduct_survey_* | QSTN forwards kwargs to client.chat.completions.create(...) |
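Because generation kwargs are plain keyword arguments, one convenient pattern is to keep a shared default dictionary and override single values per run. The sketch below shows one way to do that; `merge_generation_kwargs` is a hypothetical helper, not part of QSTN, and the override value is arbitrary.

```python
def merge_generation_kwargs(defaults: dict, **overrides) -> dict:
    """Return defaults updated by overrides, dropping keys explicitly set to None."""
    merged = {**defaults, **overrides}
    return {key: value for key, value in merged.items() if value is not None}


shared = {"temperature": 0.0, "top_p": 1.0, "max_tokens": 120}

# Larger token budget for a thinking model; the defaults stay untouched.
thinking = merge_generation_kwargs(shared, max_tokens=1024)
print(thinking)

# The result would then be unpacked into a survey call, for example:
# conduct_survey_single_item(generator, llm_prompts=..., **thinking)
```

Keeping the defaults in one place makes it easier to compare modes with identical sampling settings, as the benchmark cells above do with shared_kwargs.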

11. Recap and Mode Selection Cheat Sheet#

Use this section as your practical default guide.

Key takeaways:

  • battery is usually the fastest mode.

  • Important exception: if the shared prefix is very short but each question is very long, attention cost can dominate and reduce the battery advantage.

  • Only battery and sequential keep full questionnaire context across questions.

  • single_item resets context for each question.

  • sequential can be close to single_item in speed when prefix/state caching is effective and fits reliably in available RAM.

  • For thinking models, increase max_tokens if reasoning or final answers get truncated.
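One way to notice truncation is to look at the finish reason of a raw chat completion: OpenAI-compatible servers report "length" when generation stopped at the token limit. The sketch below works on any object shaped like such a response. Whether QSTN's parsed output exposes this field is not shown in this tutorial, so treat the helper name and the stand-in objects as assumptions.

```python
from types import SimpleNamespace


def hit_token_limit(response) -> bool:
    """True if any choice stopped because it ran out of tokens."""
    return any(choice.finish_reason == "length" for choice in response.choices)


# Stand-in objects shaped like chat completion responses (illustration only).
truncated = SimpleNamespace(choices=[SimpleNamespace(finish_reason="length")])
complete = SimpleNamespace(choices=[SimpleNamespace(finish_reason="stop")])

print(hit_token_limit(truncated), hit_token_limit(complete))
```

If many responses hit the limit, raising max_tokens is usually the first thing to try for thinking models, since the reasoning and the final answer draw on the same token budget.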

Quick decision table:

| Goal | Recommended mode | Why |
| --- | --- | --- |
| Fastest default for most workloads | conduct_survey_battery | Often the fewest rounds and highest throughput |
| Preserve context across questions | conduct_survey_sequential | Later answers can depend on earlier turns |
| Isolated per-question answers | conduct_survey_single_item | Fresh context each question, easier per-item analysis |
| Large long-question batteries where attention cost spikes | Compare battery vs single_item | Very long combined prompts can reduce battery’s speed edge |
| Sequential near single-item speed | Depends on cache fit | Works best when prefix/state caching is stable in RAM |
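The decision table can also be encoded as a tiny helper for scripts that pick a mode from configuration. This is a sketch of the table's defaults only. `recommend_mode` and its parameters are hypothetical, and the returned strings simply name the QSTN functions used in this tutorial.

```python
def recommend_mode(
    needs_turn_by_turn_context: bool = False,
    needs_isolated_answers: bool = False,
    very_long_questionnaire: bool = False,
) -> str:
    """Map the goals from the decision table to a suggested starting point."""
    if needs_turn_by_turn_context and needs_isolated_answers:
        raise ValueError("Context across questions and isolated answers are mutually exclusive.")
    if needs_isolated_answers:
        return "conduct_survey_single_item"
    if needs_turn_by_turn_context:
        return "conduct_survey_sequential"
    if very_long_questionnaire:
        # The table suggests measuring rather than assuming a winner here.
        return "benchmark conduct_survey_battery vs conduct_survey_single_item"
    return "conduct_survey_battery"


print(recommend_mode())
print(recommend_mode(needs_turn_by_turn_context=True))
print(recommend_mode(very_long_questionnaire=True))
```

The helper returns a suggestion, not a guarantee; the timings earlier in this tutorial remain the better guide for a specific model and server.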