Parser
- qstn.parser.llm_answer_parser.parse_json(survey_results)
Parse JSON outputs of survey results with automatic battery routing.
Battery-style results (a single aggregated row with questionnaire_item_id == -1) are routed to parse_json_battery; all other results use standard JSON parsing.
- Parameters:
survey_results (List[InferenceResult]) – Survey results returned by survey conduction methods.
- Returns:
Mapping from questionnaire to parsed dataframe. Battery-style inputs are returned in expanded per-question row format.
- Return type:
Dict[LLMPrompt, pd.DataFrame]
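The routing rule above can be illustrated with a minimal, self-contained sketch. It assumes a hypothetical `FakeResult` container (not qstn's `InferenceResult`) whose rows carry an `item_id`, plus hypothetical helpers `is_battery` and `route`; the real implementation may split and dispatch results differently.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


# Hypothetical stand-in for a survey result; qstn's InferenceResult may differ.
@dataclass
class FakeResult:
    questionnaire: str
    rows: list[dict[str, Any]] = field(default_factory=list)


def is_battery(result: FakeResult) -> bool:
    """Battery-style: exactly one aggregated row whose item_id is -1."""
    return len(result.rows) == 1 and result.rows[0].get("item_id") == -1


def route(results, parse_battery: Callable, parse_standard: Callable) -> dict:
    """Send battery-style results to one parser and all others to another."""
    battery = [r for r in results if is_battery(r)]
    standard = [r for r in results if not is_battery(r)]
    parsed = {}
    if battery:
        parsed.update(parse_battery(battery))
    if standard:
        parsed.update(parse_standard(standard))
    return parsed


results = [
    FakeResult("q_battery", [{"item_id": -1, "answer": '{"Q1": {"answer": "Yes"}}'}]),
    FakeResult("q_single", [{"item_id": 0, "answer": '{"answer": "No"}'}]),
]
out = route(
    results,
    parse_battery=lambda rs: {r.questionnaire: "battery" for r in rs},
    parse_standard=lambda rs: {r.questionnaire: "standard" for r in rs},
)
```

Each questionnaire ends up in exactly one branch, so the merged mapping still has one entry per questionnaire.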
- qstn.parser.llm_answer_parser.parse_json_battery(survey_results)
Parse JSON outputs of battery-style survey results.
Expects one aggregated response row per questionnaire (item_id == -1) and expands it into one row per questionnaire item: top-level JSON keys are matched to questionnaire questions, and the nested object for each question is flattened into that question's row.
- Parameters:
survey_results (List[InferenceResult]) – Battery-style survey results with one aggregated response (item_id == -1) per questionnaire.
- Returns:
Mapping from questionnaire to expanded per-question dataframe.
- Return type:
Dict[LLMPrompt, pd.DataFrame]
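The expansion step can be sketched in plain Python. This assumes the top-level JSON keys are the question texts and that each value is a nested object such as `{"answer": ...}`; `expand_battery` and its `questions` mapping are hypothetical and only mirror the behaviour described above.

```python
import json
from typing import Any


def expand_battery(aggregated_answer: str, questions: dict[int, str]) -> list[dict[str, Any]]:
    """Turn one aggregated JSON answer into one row per questionnaire item.

    `questions` maps a hypothetical item id to the question text that is
    expected to appear as a top-level key of the JSON answer.
    """
    payload = json.loads(aggregated_answer)
    rows = []
    for item_id, question in questions.items():
        nested = payload.get(question, {})
        row: dict[str, Any] = {"item_id": item_id, "question": question}
        if isinstance(nested, dict):
            # Flatten the nested object for this question into its row.
            row.update(nested)
        else:
            # A bare value is stored under a generic "answer" column.
            row["answer"] = nested
        rows.append(row)
    return rows


answer = (
    '{"Do you vote?": {"answer": "Yes"}, '
    '"Do you read news?": {"answer": "No", "reason": "time"}}'
)
rows = expand_battery(answer, {0: "Do you vote?", 1: "Do you read news?"})
```

Keys that are missing from the JSON simply produce a row with only `item_id` and `question`, which keeps the per-question row count stable.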
- qstn.parser.llm_answer_parser.parse_json_str(answer)
- Parameters:
answer (str)
- Return type:
dict[str, str] | None
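parse_json_str is documented only by its signature. One plausible reading, stated here as an assumption, is that it extracts a JSON object from a free-text LLM answer and returns None when that fails. The hypothetical `parse_json_str_sketch` below shows such a best-effort approach; it is not qstn's code.

```python
import json
import re
from typing import Optional


def parse_json_str_sketch(answer: str) -> Optional[dict]:
    """Best-effort extraction of a JSON object from an LLM answer; None on failure."""
    candidates = [answer]
    # LLMs often wrap JSON in prose or code fences, so also try the
    # span from the first "{" to the last "}".
    match = re.search(r"\{.*\}", answer, re.DOTALL)
    if match:
        candidates.append(match.group(0))
    for text in candidates:
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return None
```

Returning None instead of raising lets a caller record an unparseable answer and keep processing the rest of the survey.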
- qstn.parser.llm_answer_parser.parse_logprobs(survey_results, allowed_choices)
Filter and aggregate logprobs returned by Logprob_AnswerProductionMethod.
- Parameters:
survey_results (list[InferenceResult]) – List of InferenceResult objects returned from running a survey.
allowed_choices (list[str] | dict[str, list[str]]) – List of possible answer options, or a dictionary mapping each option to the multiple tokens that encode it.
- Returns:
A dictionary whose keys are the LLMInterviews and whose values are DataFrames of questions and answers.
- Return type:
Dict[LLMInterview, pd.DataFrame]
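The filtering and aggregation idea can be sketched as follows, assuming the top logprobs for the answer position are available as a token-to-logprob dictionary. `aggregate_logprobs` is hypothetical, and summing token probabilities per option before renormalizing over the allowed options is an assumption about the aggregation, not a description of qstn's internals.

```python
import math
from typing import Union


def aggregate_logprobs(
    top_logprobs: dict[str, float],
    allowed_choices: Union[list[str], dict[str, list[str]]],
) -> dict[str, float]:
    """Keep only allowed tokens and return renormalized option probabilities."""
    # Normalize both accepted input shapes to option -> list of tokens.
    if isinstance(allowed_choices, dict):
        option_tokens = allowed_choices
    else:
        option_tokens = {choice: [choice] for choice in allowed_choices}
    # Sum the probability mass of every token that encodes an option;
    # tokens outside the allowed set are dropped.
    mass = {
        option: sum(math.exp(top_logprobs[t]) for t in tokens if t in top_logprobs)
        for option, tokens in option_tokens.items()
    }
    total = sum(mass.values())
    if total == 0:
        return {option: 0.0 for option in mass}
    return {option: p / total for option, p in mass.items()}


top = {
    "Yes": math.log(0.5),
    " yes": math.log(0.1),
    "No": math.log(0.3),
    "Maybe": math.log(0.1),
}
by_tokens = aggregate_logprobs(top, {"Yes": ["Yes", " yes"], "No": ["No"]})
by_list = aggregate_logprobs(top, ["Yes", "No"])
```

The dictionary form matters when a tokenizer encodes the same option in several ways (here "Yes" and " yes"); with a plain list, only the exact token string counts.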
- qstn.parser.llm_answer_parser.parse_with_llm(model, survey_results, system_prompt='You are a helpful assistant.', prompt='Your task is to parse the correct answer option from an open text answer a LLM has given to survey questions. You will be provided with the survey question, possible answer options and the LLM answer.\n{automatic_output_instructions}\n{no_answer_option_instruction}Question: {question}\nPossible answer options: {answer_options}\nResponse by LLM: {llm_response}', battery_prompt='Your task is to parse answer options for multiple survey questions from one aggregated LLM response. {automatic_output_instructions}\n{no_answer_option_instruction}Questions: {question}\nPossible answer options per question:\n{answer_options}\nAggregated response by LLM: {llm_response}', response_generation_method=None, generation_fn=<function batch_generation>, client_model_name=None, api_concurrency=10, print_conversation=False, print_progress=True, use_parser=True, no_answer_option=None, seed=42, answer_options=None, **generation_kwargs)
Parse free-text survey answers using LLM-as-a-judge with automatic battery routing.
Battery-style results are routed to parse_with_llm_battery. Non-battery results use the regular single-item/sequential parser flow.
- Parameters:
model (LLM or AsyncOpenAI) – vLLM model or AsyncOpenAI client used for parser inference.
survey_results (List[InferenceResult]) – Survey results to parse.
system_prompt (str) – System prompt passed to parser inference.
prompt (str) – Prompt template for parser inference. Supports {question}, {llm_response}, {answer_options}, {automatic_output_instructions}, and {no_answer_option_instruction} placeholders.
battery_prompt (str) – Prompt template used for battery-style parser routing. Supports {question}, {llm_response}, {answer_options}, {automatic_output_instructions}, and {no_answer_option_instruction} placeholders.
response_generation_method (ResponseGenerationMethod | List[ResponseGenerationMethod], optional) – Constraint for parser output. If use_parser=True and this is None, default JSON parser schemas are applied. If no answer options are available for a question, the parser falls back to a free-text JSON value for that field instead of enforcing an enum.
generation_fn (Callable) – Generation function following the batch_generation output contract.
client_model_name (str, optional) – Model name for OpenAI client calls.
api_concurrency (int) – Max concurrent API requests for OpenAI calls.
print_conversation (bool) – Whether parser conversations are printed.
print_progress (bool) – Whether parser progress bars are shown.
use_parser (bool) – If True, parser outputs are post-processed into structured dataframes (parse_json / parse_json_battery). If False, raw parser model outputs are returned.
no_answer_option (str, optional) – Optional additional answer label that allows parser output to mark unanswered/unparseable cases.
seed (int) – Random seed for parser inference.
answer_options (AnswerOptions | Dict[LLMPrompt, AnswerOptions | Dict[int, AnswerOptions]], optional) – Optional override for the answer options used by parser prompts and parser JSON schemas. This is useful when the original survey questions were run without embedded answer options.
generation_kwargs (Any) – Additional generation kwargs passed to generation_fn.
- Returns:
Mapping from questionnaire to parsed (or raw) dataframe. Includes source_llm_response for traceability.
- Return type:
Dict[LLMPrompt, pd.DataFrame]
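The placeholders in the default `prompt` can be filled with Python's `str.format`, as in the sketch below, which reuses the default template from the signature above. The values for `automatic_output_instructions`, `no_answer_option_instruction`, and the answer-option formatting are made-up examples; how qstn actually renders these fields is not specified here.

```python
# Default `prompt` template copied from the parse_with_llm signature.
DEFAULT_PROMPT = (
    "Your task is to parse the correct answer option from an open text answer "
    "a LLM has given to survey questions. You will be provided with the survey "
    "question, possible answer options and the LLM answer.\n"
    "{automatic_output_instructions}\n"
    "{no_answer_option_instruction}Question: {question}\n"
    "Possible answer options: {answer_options}\n"
    "Response by LLM: {llm_response}"
)

# Hypothetical field values, chosen only to show where each placeholder lands.
filled = DEFAULT_PROMPT.format(
    automatic_output_instructions='Respond only with a JSON object that has the key "answer".',
    # Ends with a newline so that "Question:" starts on its own line.
    no_answer_option_instruction='If no option fits, answer "No answer".\n',
    question="Do you usually vote in national elections?",
    answer_options="1: Yes, 2: No",
    llm_response="I go to the polls almost every time.",
)
print(filled)
```

Because the default template places {no_answer_option_instruction} directly before "Question:", substituted instruction text needs its own trailing newline to start on a separate line; an empty string leaves the layout unchanged.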
- qstn.parser.llm_answer_parser.parse_with_llm_battery(model, survey_results, system_prompt='You are a helpful assistant.', prompt='Your task is to parse answer options for multiple survey questions from one aggregated LLM response. {automatic_output_instructions}\n{no_answer_option_instruction}Questions: {question}\nPossible answer options per question:\n{answer_options}\nAggregated response by LLM: {llm_response}', response_generation_method=None, generation_fn=<function batch_generation>, client_model_name=None, api_concurrency=10, print_conversation=False, print_progress=True, use_parser=True, no_answer_option=None, seed=42, answer_options=None, **generation_kwargs)
Parse battery-style aggregated survey answers using LLM-as-a-judge.
If use_parser=True, outputs are expanded to per-question rows via parse_json_battery. If use_parser=False, the single aggregated row is returned (raw parser output).
- Parameters:
model (LLM or AsyncOpenAI) – vLLM model or AsyncOpenAI client used for parser inference.
survey_results (List[InferenceResult]) – Battery-style survey results with one aggregated response (item_id == -1) per questionnaire.
system_prompt (str) – System prompt passed to parser inference.
prompt (str) – Prompt template for parser inference. Supports {question}, {llm_response}, {answer_options}, {automatic_output_instructions}, and {no_answer_option_instruction} placeholders.
response_generation_method (ResponseGenerationMethod | List[ResponseGenerationMethod], optional) – Constraint for parser output. If use_parser=True and this is None, a default battery-aware JSON schema is created. If no answer options are available for a question, that question falls back to a free-text JSON value instead of enforcing an enum.
generation_fn (Callable) – Generation function following the batch_generation output contract.
client_model_name (str, optional) – Model name for OpenAI client calls.
api_concurrency (int) – Max concurrent API requests for OpenAI calls.
print_conversation (bool) – Whether parser conversations are printed.
print_progress (bool) – Whether parser progress bars are shown.
use_parser (bool) – If True, parser outputs are expanded with parse_json_battery. If False, raw aggregated parser output is returned.
no_answer_option (str, optional) – Optional additional answer label that allows parser output to mark unanswered/unparseable cases.
seed (int) – Random seed for parser inference.
answer_options (AnswerOptions | Dict[LLMPrompt, AnswerOptions | Dict[int, AnswerOptions]], optional) – Optional override for the answer options used by parser prompts and parser JSON schemas. This is useful when the original survey questions were run without embedded answer options.
generation_kwargs (Any) – Additional generation kwargs passed to generation_fn.
- Returns:
Mapping from questionnaire to parsed (or raw) dataframe. Includes source_llm_response.
- Return type:
Dict[LLMPrompt, pd.DataFrame]
- Raises:
ValueError – If any input result is not battery-style.
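A battery-aware schema with the free-text fallback described above might look roughly like the JSON Schema produced by the hypothetical `build_battery_schema` below. The nested `{"answer": ...}` object per question and the way `no_answer_option` is appended to the enum are assumptions chosen for illustration, not qstn's actual schema.

```python
from typing import Optional


def build_battery_schema(
    options_per_question: dict[str, Optional[list[str]]],
    no_answer_option: Optional[str] = None,
) -> dict:
    """Build a JSON Schema with one required property per question."""
    properties = {}
    for question, options in options_per_question.items():
        if options:
            allowed = list(options)
            if no_answer_option is not None:
                # Let the parser mark unanswered or unparseable cases.
                allowed.append(no_answer_option)
            answer_schema = {"type": "string", "enum": allowed}
        else:
            # No answer options known: accept free text instead of an enum.
            answer_schema = {"type": "string"}
        properties[question] = {
            "type": "object",
            "properties": {"answer": answer_schema},
            "required": ["answer"],
        }
    return {
        "type": "object",
        "properties": properties,
        "required": list(options_per_question),
    }


schema = build_battery_schema(
    {"Do you vote?": ["Yes", "No"], "Why?": None},
    no_answer_option="No answer",
)
```

Keying the top-level properties by question keeps the output in the shape that the per-question expansion in parse_json_battery expects: one top-level key per question with a nested object underneath.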
- qstn.parser.llm_answer_parser.raw_responses(survey_results)
Organizes the questions and answers of a survey into pandas DataFrames, one per interview.
- Parameters:
survey_results (list[InferenceResult]) – All results for all interviews.
- Returns:
A dictionary whose keys are the LLMInterviews and whose values are DataFrames of questions and answers.
- Return type:
Dict[LLMInterview, pd.DataFrame]
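The grouping that raw_responses performs can be sketched with the standard library, assuming each record is a hypothetical (interview, item_id, question, answer) tuple. `organize_responses` returns plain lists of row dictionaries per interview; passing each list to `pandas.DataFrame` would yield one DataFrame per interview.

```python
from collections import defaultdict


def organize_responses(records):
    """Group (interview, item_id, question, answer) tuples into per-interview rows."""
    tables = defaultdict(list)
    for interview, item_id, question, answer in records:
        tables[interview].append(
            {"item_id": item_id, "question": question, "answer": answer}
        )
    # Keep each interview's rows in questionnaire order.
    return {key: sorted(rows, key=lambda row: row["item_id"]) for key, rows in tables.items()}


records = [
    ("persona_a", 1, "Q2", "No"),
    ("persona_a", 0, "Q1", "Yes"),
    ("persona_b", 0, "Q1", "Maybe"),
]
tables = organize_responses(records)
```

Sorting by item_id means the table order does not depend on the order in which answers came back, which matters when requests run concurrently.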