Parser

qstn.parser.llm_answer_parser.parse_json(survey_results)

Parse JSON outputs of survey results with automatic battery routing.

Battery-style results (a single aggregated row with questionnaire_item_id == -1) are routed to parse_json_battery; all other results use standard JSON parsing.

Parameters:

survey_results (List[InferenceResult]) – Survey results returned by survey conduction methods.

Returns:

Mapping from questionnaire to parsed dataframe. Battery-style inputs are returned in expanded per-question row format.

Return type:

Dict[LLMPrompt, pd.DataFrame]

qstn.parser.llm_answer_parser.parse_json_battery(survey_results)

Parse JSON outputs of battery-style survey results.

Expects one aggregated response row per questionnaire (item_id == -1) and expands this into one row per questionnaire item by matching top-level JSON keys to questionnaire questions and flattening the nested object for each question into that row.

Parameters:

survey_results (List[InferenceResult]) – Battery-style survey results with one aggregated response (item_id == -1) per questionnaire.

Returns:

Mapping from questionnaire to expanded per-question dataframe.

Return type:

Dict[LLMPrompt, pd.DataFrame]
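
The expansion described above can be illustrated with plain dictionaries. This is a minimal sketch and not the library's implementation: expand_battery, the question labels, and the nested keys (answer, reasoning) are hypothetical, and the real function works on InferenceResult objects and returns dataframes.

```python
import json


def expand_battery(raw_response, questions):
    """Expand one aggregated JSON response into one row per question.

    `questions` maps item ids to question labels (a hypothetical stand-in
    for the questionnaire). Top-level JSON keys are matched to the labels.
    """
    parsed = json.loads(raw_response)
    rows = []
    for item_id, question in questions.items():
        row = {"questionnaire_item_id": item_id, "question": question}
        nested = parsed.get(question, {})
        if isinstance(nested, dict):
            # Flatten the nested object for this question into the row.
            row.update(nested)
        else:
            # Assumption: a bare value is stored under an "answer" column.
            row["answer"] = nested
        rows.append(row)
    return rows


questions = {0: "Q1", 1: "Q2"}
raw = (
    '{"Q1": {"answer": "Yes", "reasoning": "a"}, '
    '"Q2": {"answer": "No", "reasoning": "b"}}'
)
rows = expand_battery(raw, questions)
```

Each entry in rows corresponds to one questionnaire item and carries the keys of that item's nested object as columns.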

qstn.parser.llm_answer_parser.parse_json_str(answer)

Parameters:

answer (str)

Return type:

dict[str, str] | None

qstn.parser.llm_answer_parser.parse_logprobs(survey_results, choice_aliases=None, filter_to_answer_options=False)

Parse returned token logprobs, optionally aggregating them into answer choices.

Parameters:

survey_results (list[InferenceResult]) – Survey inference results containing token logprobs.
choice_aliases (dict[str, list[str]] | None) – Optional complete mapping from output labels to token spellings. Each key becomes one dataframe column. The probabilities of all matching spellings are added together for that column. When provided, this mapping overrides filter_to_answer_options and questionnaire answer options.
filter_to_answer_options (bool) – If True, retain only choices from each item’s attached answer options. Defaults to False, which returns every token.

Returns:

Mapping from questionnaires to per-item probability dataframes. Rows share the union of returned tokens or configured output labels.

Raises:

ValueError – If answer-option filtering is requested for an item without answer options, or if an empty alias mapping is provided.

Return type:

dict[LLMPrompt, DataFrame]

Examples

Aggregate several possible model tokens into one canonical answer:

>>> aliases = {
...     "Yes": ["Y", "Yes", "Yeah"],
...     "No": ["N", "No", "Nope"],
... }
>>> parsed = parse_logprobs(results, choice_aliases=aliases)

For the Yes column, probabilities found for Y, Yes, and Yeah are summed. The alias columns are then normalized to sum to one. This normalization only covers tokens present in the returned top-logprob payload, not the model’s complete vocabulary distribution.
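
The arithmetic behind this aggregation can be worked through by hand. The sketch below is an illustration under stated assumptions, not the library's code: aggregate_aliases and the token logprobs are made up, and the handling of an all-zero total is an assumption.

```python
import math


def aggregate_aliases(token_logprobs, aliases):
    """Sum alias probabilities per label, then normalize across labels."""
    sums = {}
    for label, spellings in aliases.items():
        # Convert each matching logprob to a probability and add them up.
        sums[label] = sum(
            math.exp(token_logprobs[token])
            for token in spellings
            if token in token_logprobs
        )
    total = sum(sums.values())
    # Normalize over the alias columns only (assumption: 0.0 if nothing matched).
    return {label: (p / total if total > 0 else 0.0) for label, p in sums.items()}


# Hypothetical top-logprob payload for one item.
token_logprobs = {
    "Yes": math.log(0.5),
    "Y": math.log(0.1),
    "No": math.log(0.2),
    "Maybe": math.log(0.05),
}
aliases = {"Yes": ["Y", "Yes", "Yeah"], "No": ["N", "No", "Nope"]}
probs = aggregate_aliases(token_logprobs, aliases)
```

Here Yes collects 0.5 + 0.1 = 0.6 and No collects 0.2, so after normalization the columns are approximately 0.75 and 0.25. The Maybe token is ignored because it appears in no alias list.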

qstn.parser.llm_answer_parser.parse_with_llm(model, survey_results, system_prompt='You are a helpful assistant.', prompt='Your task is to parse the correct answer option from an open text answer a LLM has given to survey questions. You will be provided with the survey question, possible answer options and the LLM answer.\n{automatic_output_instructions}\n{no_answer_option_instruction}Question: {question}\nPossible answer options: {answer_options}\nResponse by LLM: {llm_response}', battery_prompt='Your task is to parse answer options for multiple survey questions from one aggregated LLM response. {automatic_output_instructions}\n{no_answer_option_instruction}Questions: {question}\nPossible answer options per question:\n{answer_options}\nAggregated response by LLM: {llm_response}', response_generation_method=<qstn.inference.response_generation.JSONSingleResponseGenerationMethod object>, no_answer_option=None, answer_options=None, no_answer_option_instruction='If no valid answer can be extracted, return exactly "{no_answer_option}" for that question.\n', generation_fn=<function batch_generation>, client_model_name=None, api_concurrency=10, print_conversation=False, print_progress=True, seed=42, **generation_kwargs)

Parse free-text survey answers using LLM-as-a-judge with automatic battery routing.

Battery-style results are routed to parse_with_llm_battery. Non-battery results use the regular single-item/sequential parser flow.

Parameters:

model (LLM or AsyncOpenAI) – vLLM model or AsyncOpenAI client used for parser inference.
survey_results (List[InferenceResult]) – Survey results to parse.
system_prompt (str) – System prompt passed to parser inference.
prompt (str) – Prompt template for parser inference. Supports {question}, {llm_response}, {answer_options}, {automatic_output_instructions}, and {no_answer_option_instruction} placeholders.
battery_prompt (str) – Prompt template used for battery-style parser routing. Supports {question}, {llm_response}, {answer_options}, {automatic_output_instructions}, and {no_answer_option_instruction} placeholders.
response_generation_method (ResponseGenerationMethod | List[ResponseGenerationMethod], optional) – Constraint for parser output. One method is applied to all questions; alternatively, provide one method per question. The default is JSONSingleResponseGenerationMethod(). Passing None disables structured parsing and returns raw judge-model outputs. If no answer options are available for a question, the parser falls back to a free-text JSON value for that field instead of enforcing an enum.
generation_fn (Callable) – Generation function following the batch_generation output contract.
client_model_name (str, optional) – Model name for OpenAI client calls.
api_concurrency (int) – Max concurrent API requests for OpenAI calls.
print_conversation (bool) – Whether parser conversations are printed.
print_progress (bool) – Whether parser progress bars are shown.
no_answer_option (str, optional) – Optional additional answer label that allows parser output to mark unanswered/unparseable cases.
answer_options (AnswerOptions | Dict[LLMPrompt, AnswerOptions | Dict[int, AnswerOptions]], optional) – Optional override for answer options used by parser prompts and parser JSON schemas. This is useful when the original survey questions were run without embedded answer options.
no_answer_option_instruction (str) – Instruction inserted through the {no_answer_option_instruction} prompt placeholder when no_answer_option is set. It may contain {no_answer_option}.
seed (int) – Random seed for parser inference.
generation_kwargs (Any) – Additional generation kwargs passed to generation_fn.
response_generation_method (ResponseGenerationMethod | list[ResponseGenerationMethod] | None)
answer_options (AnswerOptions | dict[Any, AnswerOptions | dict[Any, AnswerOptions] | None] | None)

Returns:

Mapping from questionnaire to parsed (or raw) dataframe. Includes source_llm_response for traceability.

Return type:

Dict[LLMPrompt, pd.DataFrame]
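
To make the placeholder mechanics concrete, the sketch below fills the default prompt and no_answer_option_instruction templates from the signature using str.format. How the library performs the substitution internally is an assumption; fill_prompt, the sample question, and the comma-joined rendering of answer options are hypothetical.

```python
# Default templates copied from the parse_with_llm signature.
PROMPT = (
    "Your task is to parse the correct answer option from an open text answer "
    "a LLM has given to survey questions. You will be provided with the survey "
    "question, possible answer options and the LLM answer.\n"
    "{automatic_output_instructions}\n"
    "{no_answer_option_instruction}"
    "Question: {question}\n"
    "Possible answer options: {answer_options}\n"
    "Response by LLM: {llm_response}"
)
NO_ANSWER_INSTRUCTION = (
    'If no valid answer can be extracted, return exactly "{no_answer_option}" '
    "for that question.\n"
)


def fill_prompt(question, answer_options, llm_response,
                no_answer_option=None, output_instructions=""):
    """Hypothetical helper: substitute all supported placeholders."""
    # The instruction is only inserted when no_answer_option is set.
    instruction = ""
    if no_answer_option is not None:
        instruction = NO_ANSWER_INSTRUCTION.format(no_answer_option=no_answer_option)
    return PROMPT.format(
        automatic_output_instructions=output_instructions,
        no_answer_option_instruction=instruction,
        question=question,
        answer_options=", ".join(answer_options),
        llm_response=llm_response,
    )


filled = fill_prompt(
    "Do you trust the news?",
    ["Yes", "No"],
    "I mostly do, yes.",
    no_answer_option="NA",
)
```

A custom prompt passed to parse_with_llm would presumably need to keep the same placeholder names for the substitution to work.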

qstn.parser.llm_answer_parser.parse_with_llm_battery(model, survey_results, system_prompt='You are a helpful assistant.', prompt='Your task is to parse answer options for multiple survey questions from one aggregated LLM response. {automatic_output_instructions}\n{no_answer_option_instruction}Questions: {question}\nPossible answer options per question:\n{answer_options}\nAggregated response by LLM: {llm_response}', response_generation_method=<qstn.inference.response_generation.JSONSingleResponseGenerationMethod object>, no_answer_option=None, answer_options=None, no_answer_option_instruction='If no valid answer can be extracted, return exactly "{no_answer_option}" for that question.\n', generation_fn=<function batch_generation>, client_model_name=None, api_concurrency=10, print_conversation=False, print_progress=True, seed=42, **generation_kwargs)

Parse battery-style aggregated survey answers using LLM-as-a-judge.

Structured outputs are expanded to per-question rows via parse_json_battery. Passing response_generation_method=None returns one aggregated raw-output row.

Parameters:

model (LLM or AsyncOpenAI) – vLLM model or AsyncOpenAI client used for parser inference.
survey_results (List[InferenceResult]) – Battery-style survey results with one aggregated response (item_id == -1) per questionnaire.
system_prompt (str) – System prompt passed to parser inference.
prompt (str) – Prompt template for parser inference. Supports {question}, {llm_response}, {answer_options}, {automatic_output_instructions}, and {no_answer_option_instruction} placeholders.
response_generation_method (ResponseGenerationMethod | List[ResponseGenerationMethod], optional) – Constraint for parser output. The default single-answer JSON method is expanded into a battery-aware schema. Passing None returns raw judge-model output. If no answer options are available for a question, that question falls back to a free-text JSON value instead of enforcing an enum.
generation_fn (Callable) – Generation function following the batch_generation output contract.
client_model_name (str, optional) – Model name for OpenAI client calls.
api_concurrency (int) – Max concurrent API requests for OpenAI calls.
print_conversation (bool) – Whether parser conversations are printed.
print_progress (bool) – Whether parser progress bars are shown.
no_answer_option (str, optional) – Optional additional answer label that allows parser output to mark unanswered/unparseable cases.
seed (int) – Random seed for parser inference.
answer_options (AnswerOptions | Dict[LLMPrompt, AnswerOptions | Dict[int, AnswerOptions]], optional) – Optional override for answer options used by parser prompts and parser JSON schemas. This is useful when the original survey questions were run without embedded answer options.
no_answer_option_instruction (str) – Instruction inserted through the {no_answer_option_instruction} prompt placeholder when no_answer_option is set. It may contain {no_answer_option}.
generation_kwargs (Any) – Additional generation kwargs passed to generation_fn.
response_generation_method (ResponseGenerationMethod | list[ResponseGenerationMethod] | None)
answer_options (AnswerOptions | dict[Any, AnswerOptions | dict[Any, AnswerOptions] | None] | None)

Returns:

Mapping from questionnaire to parsed (or raw) dataframe. Includes source_llm_response.

Return type:

Dict[LLMPrompt, pd.DataFrame]

Raises:

ValueError – If any input result is not battery-style.
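
The enum fallback described for response_generation_method can be pictured as a JSON schema with one property per question. The sketch below is only an illustration: the actual schema produced by the library is not documented on this page, and battery_schema, the flat string-per-question layout, and appending no_answer_option to the enum are all assumptions.

```python
def battery_schema(answer_options, no_answer_option=None):
    """Build a JSON schema with one string property per question.

    `answer_options` maps a question label to its allowed answers, or to
    None when no options are known (hypothetical input shape).
    """
    properties = {}
    for question, options in answer_options.items():
        if options:
            allowed = list(options)
            if no_answer_option is not None:
                # Assumption: the extra label is simply one more enum value.
                allowed.append(no_answer_option)
            # Known options: constrain the value to an enum.
            properties[question] = {"type": "string", "enum": allowed}
        else:
            # No options available: fall back to a free-text string.
            properties[question] = {"type": "string"}
    return {
        "type": "object",
        "properties": properties,
        "required": list(properties),
    }


schema = battery_schema({"Q1": ["Yes", "No"], "Q2": None}, no_answer_option="NA")
```

In this sketch Q1 is restricted to Yes, No, or NA, while Q2 accepts any string because it has no answer options attached.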

qstn.parser.llm_answer_parser.raw_responses(survey_results)

Organize the questions and answers of a survey in a pandas DataFrame.

Parameters:

survey_results (list[InferenceResult]) – All results for all interviews.

Returns:

A dictionary whose keys are the LLMInterviews and whose values are DataFrames with questions and answers.

Return type:

Dict[LLMInterview, pd.DataFrame]

qstn.parser.llm_answer_parser.to_dataframe(survey_results, *, choice_aliases=None, filter_to_answer_options=False)

Convert survey results to one dataframe using response-generation metadata.

Each result item is routed through the parser implied by its questionnaire metadata: JSON response methods use parse_json, logprob response methods use parse_logprobs, and all other responses use raw_responses. The parsed per-questionnaire frames are then combined with qstn.utilities.create_one_dataframe.

Parameters:

survey_results (list[InferenceResult]) – Survey results returned by survey conduction methods.
choice_aliases (dict[str, list[str]] | None) – Optional alias mapping forwarded to parse_logprobs.
filter_to_answer_options (bool) – If True, logprob rows are filtered to each item’s attached answer options.

Returns:

One dataframe containing all parsed survey rows. An empty dataframe is returned when no results are provided.

Return type:

DataFrame
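
The routing rule can be sketched as a small dispatch on response-method metadata. This is a simplified stand-in, not the library's code: the dictionary items, the response_method key, and its string values are hypothetical, whereas the real function inspects questionnaire metadata on InferenceResult objects.

```python
def pick_parser(response_method):
    """Return the name of the parser implied by a response method."""
    if response_method == "json":
        return "parse_json"
    if response_method == "logprobs":
        return "parse_logprobs"
    # Everything else falls through to the raw-response parser.
    return "raw_responses"


def group_by_parser(items):
    """Group result items by the parser they would be routed to."""
    groups = {}
    for item in items:
        parser = pick_parser(item.get("response_method"))
        groups.setdefault(parser, []).append(item)
    return groups


# Hypothetical result items with made-up metadata.
items = [
    {"id": 0, "response_method": "json"},
    {"id": 1, "response_method": "logprobs"},
    {"id": 2, "response_method": None},
    {"id": 3, "response_method": "json"},
]
groups = group_by_parser(items)
```

Each group would then be parsed by its own function and the resulting frames combined into one dataframe.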
