QSTN#

Prompt Generation#

class qstn.prompt_builder.LLMPrompt(questionnaire_source=None, questionnaire_name='Questionnaire', system_prompt='You will be given questions and possible answer options for each. Please reason about each question before answering.', prompt='{{QUESTION_PLACEHOLDER}}\n{{OPTIONS_PLACEHOLDER}}', verbose=False, seed=42)[source]#

Bases: object

Main class for setting up and managing the prompt in the LLM experiment.

This class handles loading questions

from a predefined questionnaire, preparing prompts, managing answer options, and generating prompt structures for different interview types.

Parameters:
  • questionnaire_source (str | DataFrame)

  • questionnaire_name (str)

  • system_prompt (str | None)

  • prompt (str)

  • verbose (bool)

  • seed (int)

DEFAULT_JSON_STRUCTURE: list[str] = ['reasoning', 'answer']#
DEFAULT_PROMPT_STRUCTURE: str = '{{QUESTION_PLACEHOLDER}}\n{{OPTIONS_PLACEHOLDER}}'#
DEFAULT_QUESTIONNAIRE_ID: str = 'Questionnaire'#
DEFAULT_SYSTEM_PROMPT: str = 'You will be given questions and possible answer options for each. Please reason about each question before answering.'#
DEFAULT_TASK_INSTRUCTION: str = ''#
duplicate()[source]#

Create a deep copy of the current interview instance.

Returns:

A deep copy of the current object.

Return type:

LLMQuestionnaire

generate_question_prompt(questionnaire_items)[source]#

Generate the prompt string for a single interview question.

Parameters:

questionnaire_items (InterviewItem) – The question to prompt.

Returns:

The formatted prompt for the question.

Return type:

str

get_prompt_for_questionnaire_type(questionnaire_type=QuestionnairePresentation.SINGLE_ITEM, item_id=None, item_position=0, item_separator='\n')[source]#

Generate the full prompt for a given questionnaire presentation.

Parameters:
  • quesitonnaire_type (QuestionnairePresentation) – The type of questionnaire prompt to generate.

  • item_id (str) – The id of the questionnaire_item that should be shown. If both item_id and item_position are provided, only item_id is considered.

  • item_position (int) – The question at that position will be shown. If both item_id and item_position are provided, only item_id is considered. Defaults to the first question.

  • item_separator (str) – For QuestionnairePresentation.BATTERY decides the str that seperates each question.

  • questionnaire_type (QuestionnairePresentation)

Returns:

The first element corresponds to the system_prompt,

the second element to the prompt.

Return type:

Tuple(str | None, str)

get_question(position)[source]#

Return a question by positional index.

Parameters:

position (int)

Return type:

QuestionnaireItem

get_question_item_id(position)[source]#

Return the questionnaire item id at a given index.

Parameters:

position (int)

Return type:

Any

get_questions()[source]#

Get an immutable snapshot of loaded interview questions.

Returns:

Loaded questions.

Return type:

Tuple[QuestionnaireItem, …]

insert_questions(items, position=None)[source]#

Inserts one or more questions into the questionnaire.

Parameters:
  • items (Union[QuestionnaireItem, List[QuestionnaireItem]]) – A single QuestionnaireItem or a list of items to insert.

  • position (int) – The index where the questions should be inserted. Default [None] adds them at the end.

Return type:

None

load_questionnaire_format(questionnaire_source)[source]#

Load the questionnaire format from a CSV file or a pandas DataFrame.

The CSV or pd.Dataframe must have the columns: questionnaire_item_id, question_content Optionally it can also have a question_stem.

Parameters:

questionnaire_source (str or pd.Dataframe) – Path to a valid CSV file or pd.Dataframe.

Returns:

The updated instance with loaded questions.

Return type:

Self

prepare_prompt(question_stem: str | None = None, answer_options: AnswerOptions | None = None, prefilled_responses: dict[int, str] | None = None, randomized_item_order: bool = False) Self[source]#
prepare_prompt(question_stem: list[str] | None = None, answer_options: dict[str, AnswerOptions] | None = None, prefilled_responses: dict[int, str] | None = None, randomized_item_order: bool = False) Self

Prepare the interview by assigning question stems, answer options, and prefilled responses.

Parameters:
  • question_stem (str or List[str], optional) – Single or list of question stems.

  • answer_options (AnswerOptions or Dict[int, AnswerOptions], optional) – Answer options for all or per question.

  • prefilled_responses (Dict[int, str], optional) – If you provide prefilled responses, they will be used

  • question. (to fill the answers instead of prompting the LLM for that)

  • randomized_item_order (bool) – If True, randomize the order of questions.

Returns:

The updated instance with prepared questions.

Return type:

Self

property questions: tuple[QuestionnaireItem, ...]#

Read-only view of questionnaire items.

remove_question(position)[source]#

Remove the question at a given index.

Parameters:

position (int)

Return type:

None

replace_question(position, questionnaire_item)[source]#

Replace the question at a given index.

Parameters:
Return type:

None

qstn.prompt_builder.generate_likert_options(n, answer_texts, only_from_to_scale=False, random_order=False, reversed_order=False, even_order=False, add_middle_category=False, str_middle_cat='Neutral', add_refusal=False, refusal_code='-99', start_idx=1, list_prompt_template='Options are: {options}', scale_prompt_template='Options range from {start} to {end}', index_answer_separator=': ', options_separator=', ', idx_type='integer', response_generation_method=None)[source]#

Generates a set of options and a prompt for a Likert-style scale.

This function creates a numeric or alphabetic scale of a specified size (n), optionally attaching textual labels to the scale. It provides extensive control over ordering, formatting, and the final prompt string.

Parameters:
  • n (int) – The number of options to generate (e.g., 5 for a 5-point scale).

  • answer_texts (Optional[List[str]]) – A list of text labels for each option. Its length must equal n if provided.

  • only_from_to_scale (bool, optional) – If True, the prompt will only show the min and max of the scale (e.g., “1 to 5”). Defaults to False.

  • random_order (bool, optional) – If True, the options are randomized. Defaults to False.

  • reversed_order (bool, optional) – If True, the options are in reversed input order. Defaults to False.

  • even_order (bool, optional) – If True, options the center option will be removed. E.g., for n=5: 1, 2, 4, 5

  • add_middle_category (bool, optional) – If True, a middle category will be added. The name can be specified, by default it is “Neutral”. E.g., for n=4: 1, 2, 3: Neutral, 4, 5

  • str_middle_cat (str, optional) – The label for the middle category if add_middle_category is True. Defaults to “Neutral”.

  • add_refusal (bool, optional) – If True, an additional option for “Don’t know / Refuse to answer” will be added. Defaults to False.

  • refusal_code (str, optional) – The code assigned to the refusal option if add_refusal is True. Defaults to “-99”.

  • start_idx (int, optional) – The starting index for the scale (usually 0 or 1). Defaults to 1.

  • list_prompt_template (str, optional) – The template for prompts that list all options.

  • scale_prompt_template (str, optional) – The template for prompts that only show the range.

  • index_answer_separator (str, optional) – The string used to separate an index from its text label (e.g., “1: Strongly Agree”). Defaults to “: “.

  • options_separator (str, optional) – The string used to separate options when listed in the prompt. Defaults to “, “.

  • idx_type (_IDX_TYPES, optional) – The type of index to use: “integer”, “upper” (A, B, C), or “lower” (a, b, c). Defaults to “integer”.

  • response_generation_method (Optional[ResponseGenerationMethod], optional) – An object controlling how the final response object is generated. Defaults to None.

Raises:

ValueError – If answer_texts is provided and its length does not match n.

Returns:

An object containing the generated list of option strings and the final formatted prompt ready for display.

Return type:

AnswerOptions

Example

# Generate a classic 5-point "Strongly Disagree" to "Strongly Agree" scale
labels = [
    "Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"
]
options = SurveyOptionGenerator.generate_likert_options(n=5, answer_texts=labels)

Survey Manager#

Module for managing and conducting surveys using LLM models.

This module provides functions to conduct surveys in different ways: - Single-item - battery - sequential

Usage example:#

from qstn import survey_manager
from qstn.prompt_builder import LLMPrompt
from qstn.utilities import placeholder
from vllm import LLM

import pandas as pd

questionnaire = [
    {"questionnaire_item_id": 1, "question_content": "The Democratic Party?"},
    {"questionnaire_item_id": 2, "question_content": "The Republican Party?"},
]
party_questionnaire = pd.DataFrame(questionnaire)


system_prompt = (
    "Act as if you were a black middle aged man from New York! "
    "Answer in a single short sentence!"
)
prompt = (
    "Please tell us how you feel about the following parties: "
    + placeholder.PROMPT_QUESTIONS
)

questionnaire = LLMPrompt(
    questionnaire_name="political_parties",
    questionnaire_source=party_questionnaire,
    system_prompt=system_prompt,
    prompt=prompt,
)

model_id = "meta-llama/Llama-3.2-3B-Instruct"
chat_generator = LLM(model_id, max_model_len=5000, seed=42)

results = survey_manager.conduct_survey_single_item(
    chat_generator,
    questionnaire,
    client_model_name=model_id,
    print_conversation=True,
    # We can use the same inference arguments for inference, as we would for vllm or OpenAI
    temperature=0.8,
    max_tokens=5000,
)
class qstn.survey_manager.SurveyCreator[source]#

Bases: object

Helper class to create LLM prompts from a population CSV/DataFrame and questionnaire.

classmethod from_dataframe(survey_dataframe, questionnaire_dataframe)[source]#

Generates LLMPrompt objects from two pandas DataFrames.

Parameters:
  • survey_dataframe (pandas.DataFrame) – A DataFrame containing survey data (questionnaire_name, system_prompt, and questionnaire_instruction).

  • questionnaire_dataframe (pandas.DataFrame) – A DataFrame containing the questions.

Returns:

A list of LLMQuestionnaire objects.

Return type:

list[LLMPrompt]

classmethod from_path(survey_path, questionnaire_path)[source]#

Generates LLMPrompt objects from two CSV files (population/survey and questionnaire). :param survey_path: The path to the survey CSV file. :type survey_path: str

Returns:

A list of LLMQuestionnaire objects.

Parameters:
  • survey_path (str)

  • questionnaire_path (str)

Return type:

list[LLMPrompt]

qstn.survey_manager.conduct_survey_battery(model, llm_prompts, client_model_name=None, api_concurrency=10, n_save_step=None, intermediate_save_file=None, print_conversation=False, print_progress=True, seed=42, item_separator='\n', **generation_kwargs)[source]#

Conducts the entire survey in one single LLM prompt (battery presentation).

System Prompt -> User Prompt with all questions -> LLM Answers all questions

Parameters:
  • model (LLM or AsyncOpenAI) – vllm.LLM instance or AsyncOpenAI client.

  • llm_prompts (LLMPrompt or List(LLMPrompt)) – Single LLMPrompt or list of LLMPrompt objects to conduct as a survey.

  • client_model_name (str, optional) – Name of model when using OpenAI client.

  • api_concurrency (int) – Number of concurrent API requests. Defaults to 10.

  • print_conversation (bool) – If True, prints all conversations to stdout. Default False.

  • print_progress (bool) – If True, shows a tqdm progress bar. Default True.

  • n_save_step (int, optional) – Save intermediate results every n steps.

  • intermediate_save_file (str, optional) – Path to save intermediate results. Has to be provided if n_save_step.

  • seed (int) – Random seed for reproducibility. Defaults to 42.

  • item_separator (str) – The str that separates each question. Defaults to a newline.

  • generation_kwargs (Any) – Additional generation parameters that will be given to vllm.chat(), vllm.SamplingParams, or client.chat.completions.create().

Returns:

A list of results containing the survey data and

LLM responses for each provided prompt.

Return type:

List(InferenceResult)

qstn.survey_manager.conduct_survey_sequential(model, llm_prompts, client_model_name=None, api_concurrency=10, print_conversation=False, print_progress=True, n_save_step=None, intermediate_save_file=None, seed=42, **generation_kwargs)[source]#

Conducts the survey in multiple chat calls, where all questions and answers are kept in context (sequential presentation).

System Prompt -> User Prompt with first question -> LLM Answer to first question -> User Prompt with second question -> ….

Parameters:
  • model (LLM or AsyncOpenAI) – vllm.LLM instance or AsyncOpenAI client.

  • llm_prompts (LLMPrompt or List(LLMPrompt)) – Single LLMPrompt or list of LLMPrompt objects to conduct as a survey.

  • client_model_name (str, optional) – Name of model when using OpenAI client.

  • api_concurrency (int) – Number of concurrent API requests. Defaults to 10.

  • print_conversation (bool) – If True, prints all conversations to stdout. Default False.

  • print_progress (bool) – If True, shows a tqdm progress bar. Default True.

  • n_save_step (int, optional) – Save intermediate results every n steps.

  • intermediate_save_file (str, optional) – Path to save intermediate results. Has to be provided if n_save_step.

  • seed (int) – Random seed for reproducibility. Defaults to 42.

  • generation_kwargs (Any) – Additional generation parameters that will be given to vllm.chat(), vllm.SamplingParams, or client.chat.completions.create().

Returns:

A list of results containing the survey data and

LLM responses for each provided prompt.

Return type:

List(InferenceResult)

qstn.survey_manager.conduct_survey_single_item(model, llm_prompts, client_model_name=None, api_concurrency=10, print_conversation=False, print_progress=True, n_save_step=None, intermediate_save_file=None, seed=42, **generation_kwargs)[source]#

Conducts a survey by asking each question in a new context (single item presentation).

System Prompt -> User Prompt with one question -> LLM Answer for one question -> Reset Context -> New instance with System Prompt

Parameters:
  • model (LLM or AsyncOpenAI) – vllm.LLM instance or AsyncOpenAI client.

  • llm_prompts (LLMPrompt or List(LLMPrompt)) – Single LLMPrompt or list of LLMPrompt objects to conduct as a survey.

  • client_model_name (str, optional) – Name of model when using OpenAI client.

  • api_concurrency (int) – Number of concurrent API requests. Defaults to 10.

  • print_conversation (bool) – If True, prints all conversations to stdout. Default False.

  • print_progress (bool) – If True, shows a tqdm progress bar. Default True.

  • n_save_step (int, optional) – Save intermediate results every n steps.

  • intermediate_save_file (str, optional) – Path to save intermediate results. Has to be provided if n_save_step.

  • seed (int) – Random seed for reproducibility. Defaults to 42.

  • generation_kwargs (Any) – Additional generation parameters that will be given to vllm.chat(), vllm.SamplingParams, or client.chat.completions.create().

Returns:

A list of results containing the survey data and

LLM responses for each provided prompt.

Return type:

List(InferenceResult)

Subpackages#