Guide: Using Base Models

Base models are trained to continue text. Unlike chat models, they have not been taught to wait for a user instruction and then provide a polished assistant response. That makes them a little less predictable, but also useful when the continuation itself is what we want to study.

The paper "Post-training makes large language models less human-like" reports that base models were often better aligned with human responses than their post-trained counterparts across a large collection of behavioral tasks. For your own experiments, it may therefore be worth comparing base-model responses with those of instruction-tuned models.

This tutorial shows how to set up surveys with base models and presents some common tricks for getting valid responses from them.

1. Set Up a Small Survey

We start with one neutral question and four lettered options. Keeping the example small makes it easy to see what the base model receives and what it chooses to continue.

The QSTN prompt-building steps are the same as in the other guides. The important difference comes later, when we render and run the prompt in completion mode.

import pandas as pd

from qstn.logger import configure_logging
from qstn.prompt_builder import LLMPrompt, generate_likert_options
from qstn.survey_manager import conduct_survey_single_item
from qstn.utilities import placeholder
from qstn.utilities.constants import QuestionnairePresentation

configure_logging(level="WARNING", force=True)

We create a simple questionnaire with one question and four answer options.

questionnaire = pd.DataFrame(
    [
        {
            "questionnaire_item_id": 1,
            "question_content": "On a quiet afternoon, what would you most enjoy doing?",
        }
    ]
)

option_labels = [
    "Read a book",
    "Take a walk",
    "Cook a meal",
    "Listen to music",
]

answer_options = generate_likert_options(
    n=len(option_labels),
    answer_texts=option_labels,
    idx_type="char_upper",
    start_idx=0,
    list_prompt_template="Choose one option: {options}",
)
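To make the effect of these arguments concrete, the following plain-Python sketch rebuilds the option line that appears in the rendered prompts later in this guide. It is an illustration only and does not call QSTN: the helper name `format_lettered_options` is hypothetical, and the `label: text` joining rule is an assumption inferred from that rendered output.

```python
from string import ascii_uppercase


def format_lettered_options(answer_texts, template="Choose one option: {options}"):
    """Label each answer text with A, B, C, ... and fill the list template."""
    if len(answer_texts) > len(ascii_uppercase):
        raise ValueError("This sketch supports at most 26 options.")
    # Pair each answer text with the next uppercase letter.
    labelled = [
        f"{letter}: {text}" for letter, text in zip(ascii_uppercase, answer_texts)
    ]
    return template.format(options=", ".join(labelled))


options_line = format_lettered_options(
    ["Read a book", "Take a walk", "Cook a meal", "Listen to music"]
)
print(options_line)
```

Short single-letter labels matter later in the guide, because they let the model answer with a single short continuation.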

Next, we create a simple LLMPrompt and attach the question stem and the answer options to it.

survey = LLMPrompt(
    questionnaire_name="quiet_afternoon",
    questionnaire_source=questionnaire,
    system_prompt="This is a short preference survey.",
    prompt=f"{placeholder.PROMPT_QUESTIONS}",
)
_ = survey.prepare_prompt(
    question_stem=(
        f"{placeholder.QUESTION_CONTENT}\n"
        f"{placeholder.PROMPT_OPTIONS}"
    ),
    answer_options=answer_options,
)

2. Load a Small Base Model

We use Qwen/Qwen3-0.6B-Base, a compact pretrained model.

The first run may take a moment while the model is downloaded. Later runs reuse the same model object.

from vllm import LLM

model_id = "Qwen/Qwen3-0.6B-Base"
model = LLM(
    model_id,
    max_model_len=2048,
    gpu_memory_utilization=0.75,
    seed=42,
)

3. First Try: No Assistant Prefix

For a base model, the structure of the prompt matters a lot.

In completion mode there is no chat template and no separate system role, so the system prompt and the question are simply joined into plain text:

# Make sure to select the inference mode "completion" for base models
_, rendered_base_text = survey.get_prompt_for_questionnaire_type(
    questionnaire_type=QuestionnairePresentation.SINGLE_ITEM,
    item_position=0,
    inference_mode="completion",
)
print(rendered_base_text)
This is a short preference survey.
On a quiet afternoon, what would you most enjoy doing?
Choose one option: A: Read a book, B: Take a walk, C: Cook a meal, D: Listen to music

To guide the model toward the continuation we want, we can add our own structure: custom markers for the system prompt and the user turn, plus a separator between the blocks.

no_prefix = survey.duplicate().set_base_model_prompt_template(
    system_prefix="Study context:",
    user_prefix="Survey question:",
    assistant_prefix=None,
    separator="\n\n",
)

_, rendered_no_prefix = no_prefix.get_prompt_for_questionnaire_type(
    questionnaire_type=QuestionnairePresentation.SINGLE_ITEM,
    item_position=0,
    inference_mode="completion",
)
print(rendered_no_prefix)
Study context:
This is a short preference survey.

Survey question:
On a quiet afternoon, what would you most enjoy doing?
Choose one option: A: Read a book, B: Take a walk, C: Cook a meal, D: Listen to music
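The two rendered prompts above suggest a simple assembly rule: each prefix sits on its own line above its block, and the blocks are joined with the separator. The sketch below reproduces that rule in plain Python. It is not QSTN's implementation: `render_completion_prompt` is a hypothetical helper, and the default newline separator is an assumption inferred from the first printed prompt.

```python
def render_completion_prompt(
    system_text,
    user_text,
    system_prefix=None,
    user_prefix=None,
    assistant_prefix=None,
    separator="\n",
):
    """Join prefixed blocks into one plain-text prompt (hypothetical helper)."""
    blocks = []
    for prefix, text in ((system_prefix, system_text), (user_prefix, user_text)):
        # A prefix, when given, goes on its own line above the block it introduces.
        blocks.append(f"{prefix}\n{text}" if prefix else text)
    if assistant_prefix:
        # The assistant prefix comes last so the model continues right after it.
        blocks.append(assistant_prefix)
    return separator.join(blocks)


question = (
    "On a quiet afternoon, what would you most enjoy doing?\n"
    "Choose one option: A: Read a book, B: Take a walk, "
    "C: Cook a meal, D: Listen to music"
)
plain = render_completion_prompt("This is a short preference survey.", question)
structured = render_completion_prompt(
    "This is a short preference survey.",
    question,
    system_prefix="Study context:",
    user_prefix="Survey question:",
    separator="\n\n",
)
print(structured)
```

With `assistant_prefix=None` the prompt ends on the option list, so nothing in the text signals that an answer should come next.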

At this point the prompt looks very similar to what we would send to an instruction-tuned model. Let’s see what happens:

no_prefix_results = conduct_survey_single_item(
    model,
    no_prefix,
    inference_mode="completion",
    print_progress=False,
    max_tokens=16,
)

print(no_prefix_results[0].to_dataframe()["llm_response"].iloc[0])
. Each option is answered by a percentage, with the highest percentage responding to each

The model is simply continuing the text, so it may generate additional instructions or further answer options. This behavior is not an error: predicting the next tokens is exactly what base models are trained to do.

If we want to generate answers to the question, we will have to design the prompt in the correct way.

4. Cue the Shape of the Answer

Now we end the prompt with Answer: (. The opening parenthesis makes a letter such as A or B a natural next token. We also ask vLLM to stop when it reaches the closing parenthesis.

Notice that the generated response contains only the continuation. The prefix itself is already part of the prompt.

answer_prefix = survey.duplicate().set_base_model_prompt_template(
    system_prefix="Study context:",
    user_prefix="Survey question:",
    assistant_prefix="Answer: (",
    separator="\n\n",
)

_, rendered_answer_prefix = answer_prefix.get_prompt_for_questionnaire_type(
    questionnaire_type=QuestionnairePresentation.SINGLE_ITEM,
    item_position=0,
    inference_mode="completion",
)
print(rendered_answer_prefix)
Study context:
This is a short preference survey.

Survey question:
On a quiet afternoon, what would you most enjoy doing?
Choose one option: A: Read a book, B: Take a walk, C: Cook a meal, D: Listen to music

Answer: (

answer_prefix_results = conduct_survey_single_item(
    model,
    answer_prefix,
    inference_mode="completion",
    print_progress=False,
    temperature=0.0,
    max_tokens=8,
    stop=[")"],
)

print(answer_prefix_results[0].to_dataframe()["llm_response"].iloc[0])
A

And now the model actually generates one of the answer options.
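The effect of a stop string can be pictured as cutting the continuation at its first occurrence. The sketch below mimics that in plain Python, assuming the stop string itself is not returned, which is consistent with the bare A printed above. The function name and the example continuation are hypothetical; a real backend stops decoding at that point rather than trimming text afterwards.

```python
def truncate_at_stop(text, stop_strings):
    """Cut text before the earliest stop string; return it unchanged if none occurs."""
    cut = len(text)
    for stop in stop_strings:
        position = text.find(stop)
        if position != -1:
            cut = min(cut, position)
    return text[:cut]


# Hypothetical continuation a model might produce without a stop string.
raw_continuation = "A) Read a book\n\nSurvey question:"
print(truncate_at_stop(raw_continuation, [")"]))
```

If the stop string never appears, generation simply runs until `max_tokens` is reached, so a small `max_tokens` value remains a useful safety net.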

5. Compare Choices with Log Probabilities

Sometimes a single generated letter is not enough: we may also want to know how strongly the model preferred each option at the point where it answered. A LogprobResponseGenerationMethod asks the backend for those first-token probabilities.

The settings below do three useful jobs:

  • allowed_choices_template="{options}" takes the valid choices from the answer options.

  • output_index_only=True uses the short labels A through D rather than the full answer text.

  • token_limit=1 and token_position=0 focus on the model’s first generated token.

With these options, we will get the next-token probability for each of the labels.

from qstn.inference import LogprobResponseGenerationMethod
from qstn.parser import parse_logprobs

logprob_method = LogprobResponseGenerationMethod(
    allowed_choices_template="{options}",
    output_index_only=True,
    token_position=0,
    token_limit=1,
    top_logprobs=20,
)

We then attach this method to the answer options and prepare a copy of the survey with them.

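# Note: this binds a second name to the same object, so setting the
# response generation method below also changes answer_options.
logprob_options = answer_options
logprob_options.response_generation_method = logprob_method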

logprob_survey = survey.duplicate()
_ = logprob_survey.prepare_prompt(
    question_stem=(
        f"{placeholder.QUESTION_CONTENT}\n"
        f"{placeholder.PROMPT_OPTIONS}"
    ),
    answer_options=logprob_options,
)

And now we get the probabilities for each of these tokens.

logprob_results = conduct_survey_single_item(
    model,
    logprob_survey,
    inference_mode="completion",
    print_progress=False,
)

choice_labels = logprob_options.response_generation_method.allowed_choices
probability_frame = parse_logprobs(
    logprob_results,
    allowed_choices=choice_labels,
)[logprob_survey]

generated_choice = logprob_results[0].to_dataframe()["llm_response"].iloc[0]
print("Generated choice:", generated_choice)
display(
    probability_frame[["questionnaire_item_id", *choice_labels]].round(3)
)
Generated choice: A
   questionnaire_item_id      A      B      C      D
0                      1  0.658  0.227  0.054  0.061
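One way to picture how such a table can arise from raw log probabilities is to exponentiate the values of the allowed labels and renormalize them so they sum to one. The sketch below does this in plain Python. Whether `parse_logprobs` works exactly this way is an assumption (the four printed values do sum to 1.000, which is consistent with it); `choice_probabilities` is a hypothetical helper and the log probabilities are made-up example values, not taken from the run above.

```python
import math


def choice_probabilities(top_logprobs, allowed_choices):
    """Exponentiate the log probabilities of the allowed labels and renormalize."""
    raw = {
        # Labels missing from the returned top tokens get zero mass in this sketch.
        choice: math.exp(top_logprobs[choice]) if choice in top_logprobs else 0.0
        for choice in allowed_choices
    }
    total = sum(raw.values())
    if total == 0.0:
        raise ValueError("None of the allowed choices appear in the top logprobs.")
    return {choice: value / total for choice, value in raw.items()}


# Hypothetical first-token log probabilities, including one non-label token.
example_logprobs = {"A": -0.5, "B": -1.5, "C": -3.0, "D": -3.0, " the": -4.0}
probabilities = choice_probabilities(example_logprobs, ["A", "B", "C", "D"])
print({label: round(p, 3) for label, p in probabilities.items()})
```

Renormalizing discards the probability mass the model placed on tokens that are not valid labels, so it can be worth checking how large that discarded share is before interpreting the table.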

6. Use a Transcript-Style Delimiter

Delimiters are another useful pattern. They make the prompt look like a transcript with a marked response span. Here the model continues after Participant response: <<, and generation stops at >>.

This pattern is especially handy when your data already resembles interviews, experiments, or question-and-answer records.

transcript_prefix = survey.duplicate().set_base_model_prompt_template(
    system_prefix="Study context:",
    user_prefix="Survey question:",
    assistant_prefix="Participant response: <<",
    separator="\n\n",
)

_, rendered_transcript = transcript_prefix.get_prompt_for_questionnaire_type(
    questionnaire_type=QuestionnairePresentation.SINGLE_ITEM,
    item_position=0,
    inference_mode="completion",
)
print(rendered_transcript)
Study context:
This is a short preference survey.

Survey question:
On a quiet afternoon, what would you most enjoy doing?
Choose one option: A: Read a book, B: Take a walk, C: Cook a meal, D: Listen to music

Participant response: <<

transcript_results = conduct_survey_single_item(
    model,
    transcript_prefix,
    inference_mode="completion",
    print_progress=False,
    temperature=0.0,
    max_tokens=12,
    stop=[">>"],
)

print(transcript_results[0].to_dataframe()["llm_response"].iloc[0])
A

7. Compare the Continuations

Putting the three outputs next to each other makes the main lesson visible: a short ending cue can substantially change what a base model considers a natural continuation.

comparison = pd.DataFrame(
    {
        "prompt ending": [
            "No assistant prefix",
            "Answer: (",
            "Participant response: <<",
        ],
        "generated continuation": [
            no_prefix_results[0].to_dataframe()["llm_response"].iloc[0],
            answer_prefix_results[0].to_dataframe()["llm_response"].iloc[0],
            transcript_results[0].to_dataframe()["llm_response"].iloc[0],
        ],
    }
)

print("Comparison between different prompt methods:")
display(comparison)

print("Probabilities for the next token:")
display(
    probability_frame[["questionnaire_item_id", *choice_labels]].round(3)
)
Comparison between different prompt methods:
              prompt ending                             generated continuation
0       No assistant prefix  . Each option is answered by a percentage, wit...
1                 Answer: (                                                  A
2  Participant response: <<                                                  A
Probabilities for the next token:
   questionnaire_item_id      A      B      C      D
0                      1  0.658  0.227  0.054  0.061

8. What to Take Away

  • Use inference_mode="completion" in both places: when you render the prompt to inspect what the base model receives, and when you run the survey.

  • set_base_model_prompt_template() controls role-like prefixes and the separator between blocks.

  • A prefix such as Answer: ( or a marked response span can make the desired continuation much more likely.

  • LogprobResponseGenerationMethod can restrict the answer labels and expose their relative probabilities.

  • Prefixes are guidance, not a hard guarantee. Different models, questions, and sampling settings can still produce unexpected text.

When every response must be valid, use a suitable Response Generation Method rather than relying on prompting alone.
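Where prompting alone is used anyway, a simple post-hoc check can at least show how often the continuation is a valid label. The sketch below is plain Python and independent of QSTN; `extract_choice` is a hypothetical helper, and the list of continuations is illustrative rather than real model output.

```python
def extract_choice(continuation, allowed_choices):
    """Return the label if the stripped continuation is exactly one allowed label."""
    candidate = continuation.strip()
    return candidate if candidate in allowed_choices else None


labels = ["A", "B", "C", "D"]
# Illustrative continuations: two clean labels and one free-text continuation.
continuations = ["A", " B ", ". Each option is answered by a percentage"]
parsed = [extract_choice(text, labels) for text in continuations]
valid_share = sum(choice is not None for choice in parsed) / len(parsed)
print(parsed, valid_share)
```

A low share of valid labels is a signal to adjust the prefix or stop string, or to switch to a constrained response generation method.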