QSTN

QSTN#

Prompt Generation#

class qstn.prompt_builder.BaseModelPromptTemplate(user_prefix=None, assistant_prefix=None, separator='\n', system_prefix=None)[source]#

Bases: object

Template used to render chat-style turns for base-model prompts.

Parameters:

user_prefix (str | None)
assistant_prefix (str | None)
separator (str)
system_prefix (str | None)

assistant_prefix: str | None = None#

separator: str = '\n'#

system_prefix: str | None = None#

user_prefix: str | None = None#

class qstn.prompt_builder.LLMPrompt(questionnaire_source=None, questionnaire_name='Questionnaire', system_prompt='You will be given questions and possible answer options for each. Please reason about each question before answering.', prompt='{{QUESTION_PLACEHOLDER}}\n{{OPTIONS_PLACEHOLDER}}', verbose=False, seed=42)[source]#

Bases: object

Main class for setting up and managing the prompt in the LLM experiment.

This class handles loading questions: from a predefined questionnaire, preparing prompts, managing answer options, and generating prompt structures for different interview types.

Parameters:

questionnaire_source (str | DataFrame)
questionnaire_name (str)
system_prompt (str | None)
prompt (str)
verbose (bool)
seed (int)

DEFAULT_JSON_STRUCTURE: list[str] = ['reasoning', 'answer']#

DEFAULT_PROMPT_STRUCTURE: str = '{{QUESTION_PLACEHOLDER}}\n{{OPTIONS_PLACEHOLDER}}'#

DEFAULT_QUESTIONNAIRE_ID: str = 'Questionnaire'#

DEFAULT_SYSTEM_PROMPT: str = 'You will be given questions and possible answer options for each. Please reason about each question before answering.'#

DEFAULT_TASK_INSTRUCTION: str = ''#

add_image(image, *, item_id=None)[source]#

Add an image globally or to one questionnaire item.

Parameters:

image (ImageSource) – Image input, URL, data URL, or local image path.
item_id (Any) – Questionnaire item receiving the image. If omitted, the image applies to the full prompt.

Returns:

The current prompt object for fluent configuration.

Return type:

LLMPrompt

base_model_prompt_template: BaseModelPromptTemplate | None#

calculate_input_token_estimate(model_id, tokenizer_backend, questionnaire_type=QuestionnairePresentation.SINGLE_ITEM, inference_mode='chat', item_separator='\n', previous_response_token_estimate=100)[source]#

Estimate the largest input-token context for a questionnaire prompt.

Parameters:

model_id (str) – Model identifier for the selected tokenizer backend.
tokenizer_backend (str) – Tokenizer backend, either “tiktoken” or “transformers”.
questionnaire_type (QuestionnairePresentation) – Type of questionnaire prompt.
inference_mode (str) – If “chat”, count chat message inputs. If “generation”, count the rendered base-model prompt.
item_separator (str) – Separator used between items for battery prompts.
previous_response_token_estimate (int) – Estimated tokens per previous assistant answer in sequential presentation. Image tokens are not included in the estimate.

Returns:

Estimated largest input-token context for a single model request.

Return type:

int

duplicate()[source]#

Create a deep copy of the current interview instance.

Returns:: A deep copy of the current object.
Return type:: LLMQuestionnaire

generate_question_prompt(questionnaire_items)[source]#

Generate the prompt string for a single interview question.

Parameters:: questionnaire_items (InterviewItem) – The question to prompt.
Returns:: The formatted prompt for the question.
Return type:: str

get_images(*, item_id=None, include_global=True)[source]#

Return prompt-wide and optionally item-specific images.

Parameters:

item_id (Any) – Questionnaire item whose images should be included.
include_global (bool) – Whether prompt-wide images should be returned first.

Returns:

Immutable image collection.

Return type:

tuple[ImageInput, …]

get_prompt_for_questionnaire_type(questionnaire_type=QuestionnairePresentation.SINGLE_ITEM, item_id=None, item_position=0, item_separator='\n', inference_mode='chat')[source]#

Generate the full prompt for a questionnaire presentation.

Parameters:

questionnaire_type (QuestionnairePresentation) – Presentation mode used to render the questionnaire.
item_id (str | int | None) – Questionnaire item ID used for item-specific presentations. If supplied, it takes precedence over item_position.
item_position (int | None) – Questionnaire position used when item_id is omitted.
item_separator (str) – Text separating rendered questions in battery mode.
inference_mode (Literal['chat', 'completion']) – Return chat content or a rendered completion prompt.

Returns:

The system prompt and user content. Image-free user content remains a string; image-bearing chat content is returned as ordered text and ImageInput blocks.

Raises:

ValueError – If the requested item, presentation, or inference mode is invalid, or images are used with completion mode.

Return type:

tuple[str | None, PromptContent]

get_question(position)[source]#

Return a question by positional index.

Parameters:: position (int)
Return type:: QuestionnaireItem

get_question_item_id(position)[source]#

Return the questionnaire item id at a given index.

Parameters:: position (int)
Return type:: Any

get_questions()[source]#

Get an immutable snapshot of loaded interview questions.

Returns:: Loaded questions.
Return type:: Tuple[QuestionnaireItem, …]

insert_questions(items, position=None)[source]#

Inserts one or more questions into the questionnaire.

Parameters:

items (Union[QuestionnaireItem, List[QuestionnaireItem]]) – A single QuestionnaireItem or a list of items to insert.
position (int) – The index where the questions should be inserted. Default [None] adds them at the end.

Return type:

None

load_questionnaire_format(questionnaire_source)[source]#

Load questionnaire items from a CSV file or pandas DataFrame.

The source must include questionnaire_item_id. It may also include question text, stems, prefilled responses, answer option columns, Likert generation columns, and simple response-generation presets. List-like columns must contain Python lists or Python-list strings, for example [“No”, “Yes”].

Parameters:: questionnaire_source (str or pd.Dataframe) – Path to a CSV file or a DataFrame.
Returns:: The updated instance with loaded questions.
Return type:: Self

prepare_prompt(question_stem: str | None = None, answer_options: AnswerOptions | None = None, prefilled_responses: dict[int, str] | None = None, randomized_item_order: bool = False) → Self[source]#

prepare_prompt(question_stem: list[str] | None = None, answer_options: dict[str, AnswerOptions] | None = None, prefilled_responses: dict[int, str] | None = None, randomized_item_order: bool = False) → Self

Prepare the interview by assigning question stems, answer options, and prefilled responses.

Parameters:

question_stem (str or List[str], optional) – Single or list of question stems.
answer_options (AnswerOptions or Dict[int, AnswerOptions], optional) – Answer options for all or per question.
prefilled_responses (Dict[int, str], optional) – If you provide prefilled responses, they will be used
question. (to fill the answers instead of prompting the LLM for that)
randomized_item_order (bool) – If True, randomize the order of questions.

Returns:

The updated instance with prepared questions.

Return type:

Self

prompt: str#

questionnaire_name: str#

property questions: tuple[QuestionnaireItem, ...]#: Read-only view of questionnaire items.

remove_question(position)[source]#

Remove the question at a given index.

Parameters:: position (int)
Return type:: None

render_base_model_prompt(system_message, prompts, assistant_messages=None)[source]#

Render chat-style turns into the exact prompt used for base-model generation.

Parameters:

system_message (str | None) – Optional system text to place before the turns.
prompts (list[str]) – User turns to render.
assistant_messages (list[str] | None) – Assistant history between user turns.

Returns:

Rendered base-model prompt.

Return type:

str

replace_question(position, questionnaire_item)[source]#

Replace the question at a given index.

Parameters:

position (int)
questionnaire_item (QuestionnaireItem)

Return type:

None

set_base_model_prompt_template(template=None, user_prefix=None, assistant_prefix=None, separator='\n', system_prefix=None)[source]#

Set the template used when rendering prompts for base-model completion mode.

Parameters:

template (BaseModelPromptTemplate | None) – Existing template object to store.
user_prefix (str | None) – Prefix placed before each user turn.
assistant_prefix (str | None) – Prefix placed before assistant turns and final cue.
separator (str) – Text inserted between rendered conversation blocks.
system_prefix (str | None) – Optional prefix placed before the system prompt.

Returns:

The current prompt object for fluent configuration.

Return type:

LLMPrompt

set_images(images, *, item_id=None)[source]#

Replace global images or the images for one questionnaire item.

Parameters:

images (Sequence[ImageSource]) – Images, URLs, data URLs, or local image paths to store.
item_id (Any) – Questionnaire item receiving the images. If omitted, replaces prompt-wide images.

Returns:

The current prompt object for fluent configuration.

Return type:

LLMPrompt

system_prompt: str | None#

verbose: bool#

class qstn.prompt_builder.ResponseGenerationPreset(*values)[source]#

Bases: StrEnum

Named response-generation methods supported by questionnaire loading.

CHOICE = 'choice'#

JSON_DISTRIBUTION = 'json_distribution'#

JSON_REASONING = 'json_reasoning'#

JSON_SINGLE = 'json_single'#

LOGPROB = 'logprob'#

NONE = 'none'#

qstn.prompt_builder.generate_likert_options(n, answer_texts, only_from_to_scale=False, random_order=False, reversed_order=False, even_order=False, add_middle_category=False, str_middle_cat='Neutral', add_refusal=False, refusal_code='-99', start_idx=1, list_prompt_template='Options are: {options}', scale_prompt_template='Options range from {start} to {end}', index_answer_separator=': ', options_separator=', ', idx_type='integer', response_generation_method=None)[source]#

Generates a set of options and a prompt for a Likert-style scale.

This function creates a numeric or alphabetic scale of a specified size (n), optionally attaching textual labels to the scale. It provides extensive control over ordering, formatting, and the final prompt string.

Parameters:

n (int) – The number of options to generate (e.g., 5 for a 5-point scale).
answer_texts (Optional[List[str]]) – A list of text labels for each option. Its length must equal n if provided.
only_from_to_scale (bool, optional) – If True, the prompt will only show the min and max of the scale (e.g., “1 to 5”). Defaults to False.
random_order (bool, optional) – If True, the options are randomized. Defaults to False.
reversed_order (bool, optional) – If True, the options are in reversed input order. Defaults to False.
even_order (bool, optional) – If True, options the center option will be removed. E.g., for n=5: 1, 2, 4, 5
add_middle_category (bool, optional) – If True, a middle category will be added. The name can be specified, by default it is “Neutral”. E.g., for n=4: 1, 2, 3: Neutral, 4, 5
str_middle_cat (str, optional) – The label for the middle category if add_middle_category is True. Defaults to “Neutral”.
add_refusal (bool, optional) – If True, an additional option for “Don’t know / Refuse to answer” will be added. Defaults to False.
refusal_code (str, optional) – The code assigned to the refusal option if add_refusal is True. Defaults to “-99”.
start_idx (int, optional) – The starting index for the scale (usually 0 or 1). Defaults to 1.
list_prompt_template (str, optional) – The template for prompts that list all options.
scale_prompt_template (str, optional) – The template for prompts that only show the range.
index_answer_separator (str, optional) – The string used to separate an index from its text label (e.g., “1: Strongly Agree”). Defaults to “: “.
options_separator (str, optional) – The string used to separate options when listed in the prompt. Defaults to “, “.
idx_type (_IDX_TYPES, optional) – The type of index to use: “integer”, “upper” (A, B, C), or “lower” (a, b, c). Defaults to “integer”.
response_generation_method (Optional[ResponseGenerationMethod], optional) – An object controlling how the final response object is generated. Defaults to None.

Raises:

ValueError – If answer_texts is provided and its length does not match n.

Returns:

An object containing the generated list of option strings and the final formatted prompt ready for display.

Return type:

AnswerOptions

Example

# Generate a classic 5-point "Strongly Disagree" to "Strongly Agree" scale
labels = [
    "Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"
]
options = SurveyOptionGenerator.generate_likert_options(n=5, answer_texts=labels)

qstn.prompt_builder.messages_to_base_model_prompt(messages, prompt_template=None)[source]#

Render chat-style messages into a plain prompt for base models.

Parameters:

messages (Sequence[dict[str, str]])
prompt_template (BaseModelPromptTemplate | None)

Return type:

str

Survey Manager#

Module for managing and conducting surveys using LLM models.

This module provides functions to conduct surveys in different ways: - Single-item - battery - sequential

Usage example:#

from qstn import survey_manager
from qstn.prompt_builder import LLMPrompt
from qstn.utilities import placeholder
from vllm import LLM

import pandas as pd

questionnaire = [
    {"questionnaire_item_id": 1, "question_content": "The Democratic Party?"},
    {"questionnaire_item_id": 2, "question_content": "The Republican Party?"},
]
party_questionnaire = pd.DataFrame(questionnaire)


system_prompt = (
    "Act as if you were a black middle aged man from New York! "
    "Answer in a single short sentence!"
)
prompt = (
    "Please tell us how you feel about the following parties: "
    + placeholder.PROMPT_QUESTIONS
)

questionnaire = LLMPrompt(
    questionnaire_name="political_parties",
    questionnaire_source=party_questionnaire,
    system_prompt=system_prompt,
    prompt=prompt,
)

model_id = "meta-llama/Llama-3.2-3B-Instruct"
chat_generator = LLM(model_id, max_model_len=5000, seed=42)

results = survey_manager.conduct_survey_single_item(
    chat_generator,
    questionnaire,
    client_model_name=model_id,
    print_conversation=True,
    # We can use the same inference arguments for inference, as we would for vllm or OpenAI
    temperature=0.8,
    max_tokens=5000,
)

class qstn.survey_manager.SurveyCreator[source]#

Bases: object

Helper class to create LLM prompts from a population CSV/DataFrame and questionnaire.

classmethod from_dataframe(survey_dataframe, questionnaire_dataframe)[source]#

Generates LLMPrompt objects from two pandas DataFrames.

Parameters:

survey_dataframe (pandas.DataFrame) – A DataFrame containing survey data (questionnaire_name, system_prompt, and questionnaire_instruction).
questionnaire_dataframe (pandas.DataFrame) – A DataFrame containing the questions.

Returns:

A list of LLMQuestionnaire objects.

Return type:

list[LLMPrompt]

classmethod from_path(survey_path, questionnaire_path)[source]#

Generates LLMPrompt objects from two CSV files (population/survey and questionnaire). :param survey_path: The path to the survey CSV file. :type survey_path: str

Returns:

A list of LLMQuestionnaire objects.

Parameters:

survey_path (str)
questionnaire_path (str)

Return type:

list[LLMPrompt]

qstn.survey_manager.conduct_survey_battery(model, llm_prompts, client_model_name=None, api_concurrency=10, n_save_step=None, intermediate_save_file=None, print_conversation=False, print_progress=True, seed=42, item_separator='\n', inference_mode='chat', **generation_kwargs)[source]#

Conducts the entire survey in one single LLM prompt (battery presentation).

System Prompt -> User Prompt with all questions -> LLM Answers all questions

Parameters:

model (LLM or AsyncOpenAI) – vllm.LLM instance or AsyncOpenAI client.
llm_prompts (LLMPrompt or List(LLMPrompt)) – Single LLMPrompt or list of LLMPrompt objects to conduct as a survey.
client_model_name (str, optional) – Name of model when using OpenAI client.
api_concurrency (int) – Number of concurrent API requests. Defaults to 10.
print_conversation (bool) – If True, prints all conversations to stdout. Default False.
print_progress (bool) – If True, shows a tqdm progress bar. Default True.
n_save_step (int, optional) – Save intermediate results every n steps.
intermediate_save_file (str, optional) – Path to save intermediate results. Has to be provided if n_save_step.
seed (int) – Random seed for reproducibility. Defaults to 42.
item_separator (str) – The str that separates each question. Defaults to a newline.
inference_mode (str) – Use “chat” for message-based models or “completion” for base-model text generation. Defaults to “chat”.
generation_kwargs (Any) – Additional generation parameters that will be given to vllm, vllm.SamplingParams, or OpenAI-compatible generation calls.

Returns:

A list of results containing the survey data and: LLM responses for each provided prompt.

Return type:

List(InferenceResult)

qstn.survey_manager.conduct_survey_sequential(model, llm_prompts, client_model_name=None, api_concurrency=10, print_conversation=False, print_progress=True, n_save_step=None, intermediate_save_file=None, seed=42, inference_mode='chat', **generation_kwargs)[source]#

Conducts the survey in multiple chat calls, where all questions and answers are kept in context (sequential presentation).

System Prompt -> User Prompt with first question -> LLM Answer to first question -> User Prompt with second question -> ….

Parameters:

model (LLM or AsyncOpenAI) – vllm.LLM instance or AsyncOpenAI client.
llm_prompts (LLMPrompt or List(LLMPrompt)) – Single LLMPrompt or list of LLMPrompt objects to conduct as a survey.
client_model_name (str, optional) – Name of model when using OpenAI client.
api_concurrency (int) – Number of concurrent API requests. Defaults to 10.
print_conversation (bool) – If True, prints all conversations to stdout. Default False.
print_progress (bool) – If True, shows a tqdm progress bar. Default True.
n_save_step (int, optional) – Save intermediate results every n steps.
intermediate_save_file (str, optional) – Path to save intermediate results. Has to be provided if n_save_step.
seed (int) – Random seed for reproducibility. Defaults to 42.
inference_mode (str) – Use “chat” for message-based models or “completion” for base-model text generation. Defaults to “chat”.
generation_kwargs (Any) – Additional generation parameters that will be given to vllm, vllm.SamplingParams, or OpenAI-compatible generation calls.

Returns:

A list of results containing the survey data and: LLM responses for each provided prompt.

Return type:

List(InferenceResult)

qstn.survey_manager.conduct_survey_single_item(model, llm_prompts, client_model_name=None, api_concurrency=10, print_conversation=False, print_progress=True, n_save_step=None, intermediate_save_file=None, seed=42, inference_mode='chat', **generation_kwargs)[source]#

Conducts a survey by asking each question in a new context (single item presentation).

System Prompt -> User Prompt with one question -> LLM Answer for one question -> Reset Context -> New instance with System Prompt

Parameters:

model (LLM or AsyncOpenAI) – vllm.LLM instance or AsyncOpenAI client.
llm_prompts (LLMPrompt or List(LLMPrompt)) – Single LLMPrompt or list of LLMPrompt objects to conduct as a survey.
client_model_name (str, optional) – Name of model when using OpenAI client.
api_concurrency (int) – Number of concurrent API requests. Defaults to 10.
print_conversation (bool) – If True, prints all conversations to stdout. Default False.
print_progress (bool) – If True, shows a tqdm progress bar. Default True.
n_save_step (int, optional) – Save intermediate results every n steps.
intermediate_save_file (str, optional) – Path to save intermediate results. Has to be provided if n_save_step.
seed (int) – Random seed for reproducibility. Defaults to 42.
inference_mode (str) – Use “chat” for message-based models or “completion” for base-model text generation. Defaults to “chat”.
generation_kwargs (Any) – Additional generation parameters that will be given to vllm, vllm.SamplingParams, or OpenAI-compatible generation calls.

Returns:

A list of results containing the survey data and: LLM responses for each provided prompt.

Return type:

List(InferenceResult)

QSTN

Contents

QSTN#

Prompt Generation#

Survey Manager#

Usage example:#

Subpackages#