Parser

qstn.parser.llm_answer_parser.parse_json(survey_results)

Parse JSON outputs of survey results with automatic battery routing.

Battery-style results (a single aggregated row with questionnaire_item_id == -1) are routed to parse_json_battery; all other results use standard JSON parsing.

Parameters:

survey_results (List[InferenceResult]) – Survey results returned by survey conduction methods.

Returns:

Mapping from questionnaire to parsed dataframe. Battery-style inputs are returned in expanded per-question row format.

Return type:

Dict[LLMPrompt, pd.DataFrame]
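The routing rule can be pictured with a small stand-alone sketch. It does not use qstn types: `FakeResult` and its `item_ids` field are hypothetical stand-ins for an InferenceResult, and the check only mirrors the description above (a single aggregated row whose item id is -1). The actual detection logic inside parse_json may differ.

```python
from dataclasses import dataclass


@dataclass
class FakeResult:
    """Hypothetical stand-in for an InferenceResult (not the qstn class)."""
    name: str
    item_ids: list  # questionnaire_item_id of each response row


def is_battery(result: FakeResult) -> bool:
    # Battery-style: exactly one aggregated row, marked with item id -1.
    return result.item_ids == [-1]


def route(results):
    """Split results into (battery, standard) lists, preserving order."""
    battery = [r for r in results if is_battery(r)]
    standard = [r for r in results if not is_battery(r)]
    return battery, standard


results = [FakeResult("q_battery", [-1]), FakeResult("q_sequential", [0, 1, 2])]
battery, standard = route(results)
print([r.name for r in battery])
print([r.name for r in standard])
```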

qstn.parser.llm_answer_parser.parse_json_battery(survey_results)

Parse JSON outputs of battery-style survey results.

Expects one aggregated response row per questionnaire (item_id == -1) and expands this into one row per questionnaire item by matching top-level JSON keys to questionnaire questions and flattening the nested object for each question into that row.

Parameters:

survey_results (List[InferenceResult]) – Battery-style survey results with one aggregated response (item_id == -1) per questionnaire.

Returns:

Mapping from questionnaire to expanded per-question dataframe.

Return type:

Dict[LLMPrompt, pd.DataFrame]
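The expansion step described above (match top-level JSON keys to questions, then flatten each nested object into its own row) can be sketched with the standard library alone. The function name `expand_battery`, the `questions` mapping, and the column names are assumptions made for this illustration and are not taken from qstn. The resulting list of dicts is the kind of structure that could be passed to `pd.DataFrame`.

```python
import json


def expand_battery(raw_json: str, questions: dict) -> list:
    """Expand one aggregated JSON answer into one row per questionnaire item.

    `questions` maps an item id to the top-level JSON key used for that item
    (a hypothetical layout chosen for this sketch).
    """
    parsed = json.loads(raw_json)
    rows = []
    for item_id, key in questions.items():
        nested = parsed.get(key)
        row = {"questionnaire_item_id": item_id, "question": key}
        if isinstance(nested, dict):
            # Flatten the nested object for this question into the row.
            row.update(nested)
        else:
            # Scalar value or missing key: keep it in a single column.
            row["answer"] = nested
        rows.append(row)
    return rows


aggregated = '{"Q1": {"answer": "Agree", "reasoning": "short"}, "Q2": {"answer": "Disagree"}}'
rows = expand_battery(aggregated, {0: "Q1", 1: "Q2"})
for row in rows:
    print(row)
```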

qstn.parser.llm_answer_parser.parse_json_str(answer)

Parameters:

answer (str)

Return type:

dict[str, str] | None
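No behaviour is documented for parse_json_str, but the return type `dict[str, str] | None` suggests that `None` signals an answer that could not be parsed. The following is a minimal sketch under that assumption; `parse_json_str_sketch` is a hypothetical name, and the real function may handle more cases (for example, JSON embedded in surrounding text).

```python
import json
from typing import Optional


def parse_json_str_sketch(answer: str) -> Optional[dict]:
    """Return the JSON object in `answer` as a dict, or None if parsing fails."""
    try:
        parsed = json.loads(answer)
    except (json.JSONDecodeError, TypeError):
        return None
    # Only a JSON object maps naturally to a dict; reject lists, numbers, etc.
    return parsed if isinstance(parsed, dict) else None


print(parse_json_str_sketch('{"answer": "Yes"}'))
print(parse_json_str_sketch("not json"))
```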

qstn.parser.llm_answer_parser.parse_logprobs(survey_results, allowed_choices)

Filter and aggregate logprobs returned by Logprob_AnswerProductionMethod.

Parameters:
  • survey_results (list[InferenceResult]) – List of InferenceResult objects returned from running a survey.

  • allowed_choices (list[str] | dict[str, list[str]]) – Either a list of possible answer options, or a dictionary mapping each option to the tokens that encode it.

Returns:

A dictionary whose keys are the LLMInterviews and whose values are DataFrames with questions/answers.

Return type:

Dict[LLMInterview, pd.DataFrame]
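To make the two accepted shapes of allowed_choices concrete, here is one plausible way to filter and aggregate token logprobs. It is a sketch, not the qstn implementation: the function name `aggregate_logprobs`, the flat `token_logprobs` input, and the choice to renormalize over the allowed options are all assumptions.

```python
import math


def aggregate_logprobs(token_logprobs: dict, allowed_choices) -> dict:
    """Sum token probabilities per answer option and renormalize."""
    if isinstance(allowed_choices, dict):
        option_tokens = allowed_choices
    else:
        # A plain list means each option is encoded by exactly one token.
        option_tokens = {choice: [choice] for choice in allowed_choices}

    mass = {}
    for option, tokens in option_tokens.items():
        # Tokens outside the allowed set are ignored (the "filter" step).
        mass[option] = sum(
            math.exp(token_logprobs[t]) for t in tokens if t in token_logprobs
        )
    total = sum(mass.values())
    if total == 0:
        return {option: 0.0 for option in mass}
    return {option: p / total for option, p in mass.items()}


logprobs = {
    "Yes": math.log(0.5),
    " yes": math.log(0.25),
    "No": math.log(0.125),
    "Maybe": math.log(0.125),
}
probs = aggregate_logprobs(logprobs, {"Yes": ["Yes", " yes"], "No": ["No"]})
print(probs)
```

With the dictionary form, the probability mass of "Yes" and " yes" is pooled into a single option, while "Maybe" is dropped because it is not an allowed choice.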

qstn.parser.llm_answer_parser.parse_with_llm(model, survey_results, system_prompt='You are a helpful assistant.', prompt='Your task is to parse the correct answer option from an open text answer a LLM has given to survey questions. You will be provided with the survey question, possible answer options and the LLM answer.\n{automatic_output_instructions}\n{no_answer_option_instruction}Question: {question}\nPossible answer options: {answer_options}\nResponse by LLM: {llm_response}', battery_prompt='Your task is to parse answer options for multiple survey questions from one aggregated LLM response. {automatic_output_instructions}\n{no_answer_option_instruction}Questions: {question}\nPossible answer options per question:\n{answer_options}\nAggregated response by LLM: {llm_response}', response_generation_method=None, generation_fn=<function batch_generation>, client_model_name=None, api_concurrency=10, print_conversation=False, print_progress=True, use_parser=True, no_answer_option=None, seed=42, answer_options=None, **generation_kwargs)

Parse free-text survey answers using LLM-as-a-judge with automatic battery routing.

Battery-style results are routed to parse_with_llm_battery. Non-battery results use the regular single-item/sequential parser flow.

Parameters:
  • model (LLM or AsyncOpenAI) – vLLM model or AsyncOpenAI client used for parser inference.

  • survey_results (List[InferenceResult]) – Survey results to parse.

  • system_prompt (str) – System prompt passed to parser inference.

  • prompt (str) – Prompt template for parser inference. Supports {question}, {llm_response}, {answer_options}, {automatic_output_instructions}, and {no_answer_option_instruction} placeholders.

  • battery_prompt (str) – Prompt template used for battery-style parser routing. Supports {question}, {llm_response}, {answer_options}, {automatic_output_instructions}, and {no_answer_option_instruction} placeholders.

  • response_generation_method (ResponseGenerationMethod | List[ResponseGenerationMethod], optional) – Constraint for parser output. If use_parser=True and this is None, default JSON parser schemas are applied. If no answer options are available for a question, the parser falls back to a free-text JSON value for that field instead of enforcing an enum.

  • generation_fn (Callable) – Generation function following the batch_generation output contract.

  • client_model_name (str, optional) – Model name for OpenAI client calls.

  • api_concurrency (int) – Max concurrent API requests for OpenAI calls.

  • print_conversation (bool) – Whether parser conversations are printed.

  • print_progress (bool) – Whether parser progress bars are shown.

  • use_parser (bool) – If True, parser outputs are post-processed into structured dataframes (parse_json / parse_json_battery). If False, raw parser model outputs are returned.

  • no_answer_option (str, optional) – Optional additional answer label that allows parser output to mark unanswered/unparseable cases.

  • seed (int) – Random seed for parser inference.

  • answer_options (AnswerOptions | Dict[LLMPrompt, AnswerOptions | Dict[int, AnswerOptions]], optional) – Optional override for answer options used by parser prompts and parser JSON schemas. This is useful when original survey questions were run without embedded answer options.

  • generation_kwargs (Any) – Additional generation kwargs passed to generation_fn.

Returns:

Mapping from questionnaire to parsed (or raw) dataframe. Includes source_llm_response for traceability.

Return type:

Dict[LLMPrompt, pd.DataFrame]
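The placeholders supported by prompt can be seen in action by rendering the default template by hand. The template below is copied from the signature above; the filled-in values are invented for illustration, and the use of `str.format` is an assumption about how the placeholders are substituted, not a statement about qstn internals.

```python
# Default template copied from the parse_with_llm signature.
PROMPT = (
    "Your task is to parse the correct answer option from an open text answer "
    "a LLM has given to survey questions. You will be provided with the survey "
    "question, possible answer options and the LLM answer.\n"
    "{automatic_output_instructions}\n"
    "{no_answer_option_instruction}Question: {question}\n"
    "Possible answer options: {answer_options}\n"
    "Response by LLM: {llm_response}"
)

rendered = PROMPT.format(
    automatic_output_instructions="Respond in JSON.",  # illustrative text only
    no_answer_option_instruction="",  # assumed empty when no_answer_option is None
    question="How satisfied are you with your job?",
    answer_options="Satisfied, Neutral, Dissatisfied",
    llm_response="I would say I am fairly happy with it.",
)
print(rendered)
```

A custom prompt presumably needs to keep the placeholders it relies on, since a placeholder missing from the template means that piece of information never reaches the parser model.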

qstn.parser.llm_answer_parser.parse_with_llm_battery(model, survey_results, system_prompt='You are a helpful assistant.', prompt='Your task is to parse answer options for multiple survey questions from one aggregated LLM response. {automatic_output_instructions}\n{no_answer_option_instruction}Questions: {question}\nPossible answer options per question:\n{answer_options}\nAggregated response by LLM: {llm_response}', response_generation_method=None, generation_fn=<function batch_generation>, client_model_name=None, api_concurrency=10, print_conversation=False, print_progress=True, use_parser=True, no_answer_option=None, seed=42, answer_options=None, **generation_kwargs)

Parse battery-style aggregated survey answers using LLM-as-a-judge.

If use_parser=True, outputs are expanded to per-question rows via parse_json_battery. If use_parser=False, the single aggregated row is returned (raw parser output).

Parameters:
  • model (LLM or AsyncOpenAI) – vLLM model or AsyncOpenAI client used for parser inference.

  • survey_results (List[InferenceResult]) – Battery-style survey results with one aggregated response (item_id == -1) per questionnaire.

  • system_prompt (str) – System prompt passed to parser inference.

  • prompt (str) – Prompt template for parser inference. Supports {question}, {llm_response}, {answer_options}, {automatic_output_instructions}, and {no_answer_option_instruction} placeholders.

  • response_generation_method (ResponseGenerationMethod | List[ResponseGenerationMethod], optional) – Constraint for parser output. If use_parser=True and this is None, a default battery-aware JSON schema is created. If no answer options are available for a question, that question falls back to a free-text JSON value instead of enforcing an enum.

  • generation_fn (Callable) – Generation function following the batch_generation output contract.

  • client_model_name (str, optional) – Model name for OpenAI client calls.

  • api_concurrency (int) – Max concurrent API requests for OpenAI calls.

  • print_conversation (bool) – Whether parser conversations are printed.

  • print_progress (bool) – Whether parser progress bars are shown.

  • use_parser (bool) – If True, parser outputs are expanded with parse_json_battery. If False, raw aggregated parser output is returned.

  • no_answer_option (str, optional) – Optional additional answer label that allows parser output to mark unanswered/unparseable cases.

  • seed (int) – Random seed for parser inference.

  • answer_options (AnswerOptions | Dict[LLMPrompt, AnswerOptions | Dict[int, AnswerOptions]], optional) – Optional override for answer options used by parser prompts and parser JSON schemas. This is useful when original survey questions were run without embedded answer options.

  • generation_kwargs (Any) – Additional generation kwargs passed to generation_fn.

Returns:

Mapping from questionnaire to parsed (or raw) dataframe. Includes source_llm_response.

Return type:

Dict[LLMPrompt, pd.DataFrame]

Raises:

ValueError – If any input result is not battery-style.
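The enum-versus-free-text fallback described for response_generation_method can be illustrated with a hand-built JSON schema. This is only a sketch of the idea: `build_battery_schema` is a hypothetical helper, the schema layout (one string property per question) is an assumption, and so is the choice to append no_answer_option to each enum. The schema qstn actually generates may be nested differently.

```python
def build_battery_schema(questions, options_by_question, no_answer_option=None):
    """Build a JSON schema with one property per question."""
    properties = {}
    for question in questions:
        options = options_by_question.get(question)
        if options:
            enum = list(options)
            if no_answer_option is not None and no_answer_option not in enum:
                # Assumption: the extra label simply joins the allowed values.
                enum.append(no_answer_option)
            properties[question] = {"type": "string", "enum": enum}
        else:
            # No answer options known: accept any string (free-text fallback).
            properties[question] = {"type": "string"}
    return {
        "type": "object",
        "properties": properties,
        "required": list(questions),
        "additionalProperties": False,
    }


schema = build_battery_schema(
    ["Q1", "Q2"], {"Q1": ["Yes", "No"]}, no_answer_option="N/A"
)
print(schema)
```

Here Q1 is constrained to its known options plus the extra label, while Q2, which has no known options, accepts any string.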

qstn.parser.llm_answer_parser.raw_responses(survey_results)

Organize the questions and answers of a survey in a pandas DataFrame.

Parameters:

survey_results (list[InferenceResult]) – All results for all interviews.

Returns:

A dictionary whose keys are the LLMInterviews and whose values are DataFrames with questions/answers.

Return type:

Dict[LLMInterview, pd.DataFrame]