QSTN#
Prompt Generation#
- class qstn.prompt_builder.BaseModelPromptTemplate(user_prefix='User:', assistant_prefix='Assistant:', separator='\n', system_prefix=None)[source]#
Bases:
objectTemplate used to render chat-style turns for base-model prompts.
- Parameters:
user_prefix (str | None)
assistant_prefix (str | None)
separator (str)
system_prefix (str | None)
- assistant_prefix: str | None = 'Assistant:'#
- separator: str = '\n'#
- system_prefix: str | None = None#
- user_prefix: str | None = 'User:'#
- class qstn.prompt_builder.LLMPrompt(questionnaire_source=None, questionnaire_name='Questionnaire', system_prompt='You will be given questions and possible answer options for each. Please reason about each question before answering.', prompt='{{QUESTION_PLACEHOLDER}}\n{{OPTIONS_PLACEHOLDER}}', verbose=False, seed=42)[source]#
Bases:
objectMain class for setting up and managing the prompt in the LLM experiment.
- This class handles loading questions
from a predefined questionnaire, preparing prompts, managing answer options, and generating prompt structures for different interview types.
- Parameters:
questionnaire_source (str | DataFrame)
questionnaire_name (str)
system_prompt (str | None)
prompt (str)
verbose (bool)
seed (int)
- DEFAULT_JSON_STRUCTURE: list[str] = ['reasoning', 'answer']#
- DEFAULT_PROMPT_STRUCTURE: str = '{{QUESTION_PLACEHOLDER}}\n{{OPTIONS_PLACEHOLDER}}'#
- DEFAULT_QUESTIONNAIRE_ID: str = 'Questionnaire'#
- DEFAULT_SYSTEM_PROMPT: str = 'You will be given questions and possible answer options for each. Please reason about each question before answering.'#
- DEFAULT_TASK_INSTRUCTION: str = ''#
- base_model_prompt_template: BaseModelPromptTemplate | None#
- calculate_input_token_estimate(model_id, tokenizer_backend, questionnaire_type=QuestionnairePresentation.SINGLE_ITEM, inference_type='chat', item_separator='\n', previous_response_token_estimate=100)[source]#
Estimate the largest input-token context for a questionnaire prompt.
- Parameters:
model_id (str) – Model identifier for the selected tokenizer backend.
tokenizer_backend (str) – Tokenizer backend, either “tiktoken” or “transformers”.
questionnaire_type (QuestionnairePresentation) – Type of questionnaire prompt.
inference_type (str) – If “chat”, count chat message inputs. If “generation”, count the rendered base-model prompt.
item_separator (str) – Separator used between items for battery prompts.
previous_response_token_estimate (int) – Estimated tokens per previous assistant answer in sequential presentation.
- Returns:
Estimated largest input-token context for a single model request.
- Return type:
int
- duplicate()[source]#
Create a deep copy of the current interview instance.
- Returns:
A deep copy of the current object.
- Return type:
LLMQuestionnaire
- generate_question_prompt(questionnaire_items)[source]#
Generate the prompt string for a single interview question.
- Parameters:
questionnaire_items (InterviewItem) – The question to prompt.
- Returns:
The formatted prompt for the question.
- Return type:
str
- get_prompt_for_questionnaire_type(questionnaire_type=QuestionnairePresentation.SINGLE_ITEM, item_id=None, item_position=0, item_separator='\n', inference_type='chat')[source]#
Generate the full prompt for a given questionnaire presentation.
- Parameters:
quesitonnaire_type (QuestionnairePresentation) – The type of questionnaire prompt to generate.
item_id (str) – The id of the questionnaire_item that should be shown. If both item_id and item_position are provided, only item_id is considered.
item_position (int) – The question at that position will be shown. If both item_id and item_position are provided, only item_id is considered. Defaults to the first question.
item_separator (str) – For QuestionnairePresentation.BATTERY decides the str that seperates each question.
inference_type (str) – If “chat”, return system and user messages. If “generation”, return the exact rendered base-model prompt.
questionnaire_type (QuestionnairePresentation)
- Returns:
- The first element corresponds to the system_prompt,
the second element to the prompt.
- Return type:
Tuple(str | None, str)
- get_question(position)[source]#
Return a question by positional index.
- Parameters:
position (int)
- Return type:
- get_question_item_id(position)[source]#
Return the questionnaire item id at a given index.
- Parameters:
position (int)
- Return type:
Any
- get_questions()[source]#
Get an immutable snapshot of loaded interview questions.
- Returns:
Loaded questions.
- Return type:
Tuple[QuestionnaireItem, …]
- insert_questions(items, position=None)[source]#
Inserts one or more questions into the questionnaire.
- Parameters:
items (Union[QuestionnaireItem, List[QuestionnaireItem]]) – A single QuestionnaireItem or a list of items to insert.
position (int) – The index where the questions should be inserted. Default [None] adds them at the end.
- Return type:
None
- load_questionnaire_format(questionnaire_source)[source]#
Load questionnaire items from a CSV file or pandas DataFrame.
The source must include questionnaire_item_id. It may also include question text, stems, prefilled responses, answer option columns, Likert generation columns, and simple response-generation presets. List-like columns must contain Python lists or Python-list strings, for example [“No”, “Yes”].
- Parameters:
questionnaire_source (str or pd.Dataframe) – Path to a CSV file or a DataFrame.
- Returns:
The updated instance with loaded questions.
- Return type:
Self
- prepare_prompt(question_stem: str | None = None, answer_options: AnswerOptions | None = None, prefilled_responses: dict[int, str] | None = None, randomized_item_order: bool = False) Self[source]#
- prepare_prompt(question_stem: list[str] | None = None, answer_options: dict[str, AnswerOptions] | None = None, prefilled_responses: dict[int, str] | None = None, randomized_item_order: bool = False) Self
Prepare the interview by assigning question stems, answer options, and prefilled responses.
- Parameters:
question_stem (str or List[str], optional) – Single or list of question stems.
answer_options (AnswerOptions or Dict[int, AnswerOptions], optional) – Answer options for all or per question.
prefilled_responses (Dict[int, str], optional) – If you provide prefilled responses, they will be used
question. (to fill the answers instead of prompting the LLM for that)
randomized_item_order (bool) – If True, randomize the order of questions.
- Returns:
The updated instance with prepared questions.
- Return type:
Self
- prompt: str#
- questionnaire_name: str#
- property questions: tuple[QuestionnaireItem, ...]#
Read-only view of questionnaire items.
- remove_question(position)[source]#
Remove the question at a given index.
- Parameters:
position (int)
- Return type:
None
- render_base_model_prompt(system_message, prompts, assistant_messages=None)[source]#
Render chat-style turns into the exact prompt used for base-model generation.
- Parameters:
system_message (str | None) – Optional system text to place before the turns.
prompts (list[str]) – User turns to render.
assistant_messages (list[str] | None) – Assistant history between user turns.
- Returns:
Rendered base-model prompt.
- Return type:
str
- replace_question(position, questionnaire_item)[source]#
Replace the question at a given index.
- Parameters:
position (int)
questionnaire_item (QuestionnaireItem)
- Return type:
None
- set_base_model_prompt_template(template=None, user_prefix='User:', assistant_prefix='Assistant:', separator='\n', system_prefix=None)[source]#
Set the template used when rendering prompts for base-model completion mode.
- Parameters:
template (BaseModelPromptTemplate | None) – Existing template object to store.
user_prefix (str | None) – Prefix placed before each user turn.
assistant_prefix (str | None) – Prefix placed before assistant turns and final cue.
separator (str) – Text inserted between rendered conversation blocks.
system_prefix (str | None) – Optional prefix placed before the system prompt.
- Returns:
The current prompt object for fluent configuration.
- Return type:
- system_prompt: str | None#
- verbose: bool#
- class qstn.prompt_builder.ResponseGenerationPreset(*values)[source]#
Bases:
StrEnumNamed response-generation methods supported by questionnaire loading.
- CHOICE = 'choice'#
- JSON_DISTRIBUTION = 'json_distribution'#
- JSON_REASONING = 'json_reasoning'#
- JSON_SINGLE = 'json_single'#
- LOGPROB = 'logprob'#
- NONE = 'none'#
- qstn.prompt_builder.generate_likert_options(n, answer_texts, only_from_to_scale=False, random_order=False, reversed_order=False, even_order=False, add_middle_category=False, str_middle_cat='Neutral', add_refusal=False, refusal_code='-99', start_idx=1, list_prompt_template='Options are: {options}', scale_prompt_template='Options range from {start} to {end}', index_answer_separator=': ', options_separator=', ', idx_type='integer', response_generation_method=None)[source]#
Generates a set of options and a prompt for a Likert-style scale.
This function creates a numeric or alphabetic scale of a specified size (n), optionally attaching textual labels to the scale. It provides extensive control over ordering, formatting, and the final prompt string.
- Parameters:
n (int) – The number of options to generate (e.g., 5 for a 5-point scale).
answer_texts (Optional[List[str]]) – A list of text labels for each option. Its length must equal n if provided.
only_from_to_scale (bool, optional) – If True, the prompt will only show the min and max of the scale (e.g., “1 to 5”). Defaults to False.
random_order (bool, optional) – If True, the options are randomized. Defaults to False.
reversed_order (bool, optional) – If True, the options are in reversed input order. Defaults to False.
even_order (bool, optional) – If True, options the center option will be removed. E.g., for n=5: 1, 2, 4, 5
add_middle_category (bool, optional) – If True, a middle category will be added. The name can be specified, by default it is “Neutral”. E.g., for n=4: 1, 2, 3: Neutral, 4, 5
str_middle_cat (str, optional) – The label for the middle category if add_middle_category is True. Defaults to “Neutral”.
add_refusal (bool, optional) – If True, an additional option for “Don’t know / Refuse to answer” will be added. Defaults to False.
refusal_code (str, optional) – The code assigned to the refusal option if add_refusal is True. Defaults to “-99”.
start_idx (int, optional) – The starting index for the scale (usually 0 or 1). Defaults to 1.
list_prompt_template (str, optional) – The template for prompts that list all options.
scale_prompt_template (str, optional) – The template for prompts that only show the range.
index_answer_separator (str, optional) – The string used to separate an index from its text label (e.g., “1: Strongly Agree”). Defaults to “: “.
options_separator (str, optional) – The string used to separate options when listed in the prompt. Defaults to “, “.
idx_type (_IDX_TYPES, optional) – The type of index to use: “integer”, “upper” (A, B, C), or “lower” (a, b, c). Defaults to “integer”.
response_generation_method (Optional[ResponseGenerationMethod], optional) – An object controlling how the final response object is generated. Defaults to None.
- Raises:
ValueError – If answer_texts is provided and its length does not match n.
- Returns:
An object containing the generated list of option strings and the final formatted prompt ready for display.
- Return type:
Example
# Generate a classic 5-point "Strongly Disagree" to "Strongly Agree" scale labels = [ "Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree" ] options = SurveyOptionGenerator.generate_likert_options(n=5, answer_texts=labels)
- qstn.prompt_builder.messages_to_base_model_prompt(messages, prompt_template=None)[source]#
Render chat-style messages into a plain prompt for base models.
- Parameters:
messages (Sequence[dict[str, str]])
prompt_template (BaseModelPromptTemplate | None)
- Return type:
str
Survey Manager#
Module for managing and conducting surveys using LLM models.
This module provides functions to conduct surveys in different ways: - Single-item - battery - sequential
Usage example:#
from qstn import survey_manager
from qstn.prompt_builder import LLMPrompt
from qstn.utilities import placeholder
from vllm import LLM
import pandas as pd
questionnaire = [
{"questionnaire_item_id": 1, "question_content": "The Democratic Party?"},
{"questionnaire_item_id": 2, "question_content": "The Republican Party?"},
]
party_questionnaire = pd.DataFrame(questionnaire)
system_prompt = (
"Act as if you were a black middle aged man from New York! "
"Answer in a single short sentence!"
)
prompt = (
"Please tell us how you feel about the following parties: "
+ placeholder.PROMPT_QUESTIONS
)
questionnaire = LLMPrompt(
questionnaire_name="political_parties",
questionnaire_source=party_questionnaire,
system_prompt=system_prompt,
prompt=prompt,
)
model_id = "meta-llama/Llama-3.2-3B-Instruct"
chat_generator = LLM(model_id, max_model_len=5000, seed=42)
results = survey_manager.conduct_survey_single_item(
chat_generator,
questionnaire,
client_model_name=model_id,
print_conversation=True,
# We can use the same inference arguments for inference, as we would for vllm or OpenAI
temperature=0.8,
max_tokens=5000,
)
- class qstn.survey_manager.SurveyCreator[source]#
Bases:
objectHelper class to create LLM prompts from a population CSV/DataFrame and questionnaire.
- classmethod from_dataframe(survey_dataframe, questionnaire_dataframe)[source]#
Generates LLMPrompt objects from two pandas DataFrames.
- Parameters:
survey_dataframe (pandas.DataFrame) – A DataFrame containing survey data (questionnaire_name, system_prompt, and questionnaire_instruction).
questionnaire_dataframe (pandas.DataFrame) – A DataFrame containing the questions.
- Returns:
A list of LLMQuestionnaire objects.
- Return type:
list[LLMPrompt]
- classmethod from_path(survey_path, questionnaire_path)[source]#
Generates LLMPrompt objects from two CSV files (population/survey and questionnaire). :param survey_path: The path to the survey CSV file. :type survey_path: str
- Returns:
A list of LLMQuestionnaire objects.
- Parameters:
survey_path (str)
questionnaire_path (str)
- Return type:
list[LLMPrompt]
- qstn.survey_manager.conduct_survey_battery(model, llm_prompts, client_model_name=None, api_concurrency=10, n_save_step=None, intermediate_save_file=None, print_conversation=False, print_progress=True, seed=42, item_separator='\n', inference_mode='chat', **generation_kwargs)[source]#
Conducts the entire survey in one single LLM prompt (battery presentation).
System Prompt -> User Prompt with all questions -> LLM Answers all questions
- Parameters:
model (LLM or AsyncOpenAI) – vllm.LLM instance or AsyncOpenAI client.
llm_prompts (LLMPrompt or List(LLMPrompt)) – Single LLMPrompt or list of LLMPrompt objects to conduct as a survey.
client_model_name (str, optional) – Name of model when using OpenAI client.
api_concurrency (int) – Number of concurrent API requests. Defaults to 10.
print_conversation (bool) – If True, prints all conversations to stdout. Default False.
print_progress (bool) – If True, shows a tqdm progress bar. Default True.
n_save_step (int, optional) – Save intermediate results every n steps.
intermediate_save_file (str, optional) – Path to save intermediate results. Has to be provided if n_save_step.
seed (int) – Random seed for reproducibility. Defaults to 42.
item_separator (str) – The str that separates each question. Defaults to a newline.
inference_mode (str) – Use “chat” for message-based models or “completion” for base-model text generation. Defaults to “chat”.
generation_kwargs (Any) – Additional generation parameters that will be given to vllm, vllm.SamplingParams, or OpenAI-compatible generation calls.
- Returns:
- A list of results containing the survey data and
LLM responses for each provided prompt.
- Return type:
List(InferenceResult)
- qstn.survey_manager.conduct_survey_sequential(model, llm_prompts, client_model_name=None, api_concurrency=10, print_conversation=False, print_progress=True, n_save_step=None, intermediate_save_file=None, seed=42, inference_mode='chat', **generation_kwargs)[source]#
Conducts the survey in multiple chat calls, where all questions and answers are kept in context (sequential presentation).
System Prompt -> User Prompt with first question -> LLM Answer to first question -> User Prompt with second question -> ….
- Parameters:
model (LLM or AsyncOpenAI) – vllm.LLM instance or AsyncOpenAI client.
llm_prompts (LLMPrompt or List(LLMPrompt)) – Single LLMPrompt or list of LLMPrompt objects to conduct as a survey.
client_model_name (str, optional) – Name of model when using OpenAI client.
api_concurrency (int) – Number of concurrent API requests. Defaults to 10.
print_conversation (bool) – If True, prints all conversations to stdout. Default False.
print_progress (bool) – If True, shows a tqdm progress bar. Default True.
n_save_step (int, optional) – Save intermediate results every n steps.
intermediate_save_file (str, optional) – Path to save intermediate results. Has to be provided if n_save_step.
seed (int) – Random seed for reproducibility. Defaults to 42.
inference_mode (str) – Use “chat” for message-based models or “completion” for base-model text generation. Defaults to “chat”.
generation_kwargs (Any) – Additional generation parameters that will be given to vllm, vllm.SamplingParams, or OpenAI-compatible generation calls.
- Returns:
- A list of results containing the survey data and
LLM responses for each provided prompt.
- Return type:
List(InferenceResult)
- qstn.survey_manager.conduct_survey_single_item(model, llm_prompts, client_model_name=None, api_concurrency=10, print_conversation=False, print_progress=True, n_save_step=None, intermediate_save_file=None, seed=42, inference_mode='chat', **generation_kwargs)[source]#
Conducts a survey by asking each question in a new context (single item presentation).
System Prompt -> User Prompt with one question -> LLM Answer for one question -> Reset Context -> New instance with System Prompt
- Parameters:
model (LLM or AsyncOpenAI) – vllm.LLM instance or AsyncOpenAI client.
llm_prompts (LLMPrompt or List(LLMPrompt)) – Single LLMPrompt or list of LLMPrompt objects to conduct as a survey.
client_model_name (str, optional) – Name of model when using OpenAI client.
api_concurrency (int) – Number of concurrent API requests. Defaults to 10.
print_conversation (bool) – If True, prints all conversations to stdout. Default False.
print_progress (bool) – If True, shows a tqdm progress bar. Default True.
n_save_step (int, optional) – Save intermediate results every n steps.
intermediate_save_file (str, optional) – Path to save intermediate results. Has to be provided if n_save_step.
seed (int) – Random seed for reproducibility. Defaults to 42.
inference_mode (str) – Use “chat” for message-based models or “completion” for base-model text generation. Defaults to “chat”.
generation_kwargs (Any) – Additional generation parameters that will be given to vllm, vllm.SamplingParams, or OpenAI-compatible generation calls.
- Returns:
- A list of results containing the survey data and
LLM responses for each provided prompt.
- Return type:
List(InferenceResult)
Subpackages#
- Inference & Guided Decoding
- Dynamic Pydantic
- Response Generation Methods
ChoiceResponseGenerationMethodConstraintsJSONItemJSONObjectJSONReasoningResponseGenerationMethodJSONResponseGenerationMethodJSONSingleResponseGenerationMethodJSONVerbalizedDistributionLogprobResponseGenerationMethodLogprobResponseGenerationMethod.token_positionLogprobResponseGenerationMethod.token_limitLogprobResponseGenerationMethod.top_logprobsLogprobResponseGenerationMethod.allowed_choicesLogprobResponseGenerationMethod.ignore_reasoningLogprobResponseGenerationMethod.system_prompt_templateLogprobResponseGenerationMethod.output_index_onlyLogprobResponseGenerationMethod.get_automatic_prompt()
ResponseGenerationMethodconstrain_json_response_options()copy_json_response_generation_method()resolve_battery_response_generation_method()
- Inference
- Parser
- Utilities
- Constants
- Placeholder
- Prompt Perturbation
- Prompt Templates
- Survey Objects
AnswerOptionsAnswerTextsAnswerTexts.full_answersAnswerTexts.answer_textsAnswerTexts.indicesAnswerTexts.index_answer_seperatorAnswerTexts.option_seperatorsAnswerTexts.only_scaleAnswerTexts.answer_textsAnswerTexts.full_answersAnswerTexts.get_list_answer_texts()AnswerTexts.get_scale_answer_texts()AnswerTexts.index_answer_seperatorAnswerTexts.indicesAnswerTexts.only_scaleAnswerTexts.option_seperators
InferenceResultQuestionLLMResponseTupleQuestionnaireItem
- Util Functions