LRMs Reasoning Steps
Description
This dataset provides an object-centric event log (OCEL) detailing the reasoning processes of various Large Reasoning Models (LRMs) when tackling tasks from the PMLRM-Bench benchmark. The PMLRM-Bench is an extension of the PM-LLM-Benchmark, designed to evaluate both the correctness of LRM outputs and the robustness of their reasoning processes in the domain of process mining.
Link to the benchmark’s repository
Link to the benchmark’s (pre-print) paper
The OCEL is generated from the textual “chain-of-thought” outputs of LRMs. Each reasoning step within these traces has been extracted and classified by its type (e.g., Deductive Reasoning, Hypothesis Generation) and its effect on the overall reasoning correctness (Positive, Indifferent, or Negative). This classification was performed using a judge LLM (Gemini-2.5-Pro-Preview-03-25), as detailed in the source paper.
Structure of the OCEL
The event log is structured with the following object types and event attributes:
- Objects:
  - MOD: Represents a specific Large Reasoning Model evaluated in the benchmark.
  - QUE: Represents a unique question or prompt from the PM-LLM-Benchmark dataset that the LRM responded to.
  - MODQUE: Represents a unique instance of a specific model (MOD) answering a specific question (QUE).
- Events: Each event corresponds to a single reasoning step identified in the LRM’s output.
  - ocel:activity: Stores the classified reasoning step, combining its type (e.g., PR, DR, HG) and its effect (PE, IND, NE). For example, “Deductive Reasoning - PE”.
  - ocel:timestamp: A synthetically generated timestamp to preserve the order of reasoning steps within a trace.
  - text: Contains the actual text snippet from the LRM’s reasoning trace that corresponds to this specific step.
  - ocel:eid: A unique identifier for the event.
- Relations: Each event is linked to:
  - The MOD object that produced the reasoning step.
  - The QUE object that the reasoning step is addressing.
  - The MODQUE object representing the specific answer instance.
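To make this layout concrete, the sketch below shows how a single reasoning-step event and its object references could look once the log is loaded into Python; all identifiers and the text snippet are invented placeholders rather than values taken from the actual file.

```python
# Hypothetical illustration of one event and its related objects.
# Identifiers and the text snippet are invented for demonstration only.
event = {
    "ocel:eid": "e_000001",                       # unique event identifier
    "ocel:activity": "Deductive Reasoning - PE",  # step type combined with its effect
    "ocel:timestamp": "2025-01-01T00:00:03",      # synthetic timestamp preserving step order
    "text": "Since every trace starts with 'Register', the first activity must be ...",
}

related_objects = [
    {"ocel:oid": "model_A",             "ocel:type": "MOD"},     # the LRM that produced the step
    {"ocel:oid": "question_07",         "ocel:type": "QUE"},     # the benchmark question addressed
    {"ocel:oid": "model_A#question_07", "ocel:type": "MODQUE"},  # the specific answer instance
]

print(event["ocel:activity"], "->", [obj["ocel:oid"] for obj in related_objects])
```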
Purpose and Potential Use
This OCEL allows for in-depth analysis of LRM reasoning behaviors using process mining techniques. Researchers can explore:
- Common reasoning patterns across different models or question types.
- The sequence and frequency of various reasoning steps (e.g., how often Hypothesis Generation is followed by Validation).
- The impact of different reasoning strategies on task performance and correctness.
- Differences in reasoning approaches between high-performing and lower-performing LRMs.
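As a starting point for such analyses, the sketch below flattens the OCEL onto the MODQUE object type (so that each model-question instance becomes one case) and inspects step frequencies and the directly-follows relation with pm4py. It assumes a pm4py version with OCEL 2.0 JSON support and the file name given under File Information below; it is an illustrative sketch, not part of the dataset.

```python
import pm4py

# Load the OCEL 2.0 JSON log (file described in the File Information section).
ocel = pm4py.read_ocel2_json("reasoning_benchmark.jsonocel")

# Flatten onto the MODQUE object type: every model-question instance becomes a case,
# and its reasoning steps become the events of that case.
flat_log = pm4py.ocel_flattening(ocel, "MODQUE")

# Frequency of each classified reasoning step (type + effect).
print(flat_log["concept:name"].value_counts().head(10))

# Directly-follows relation over reasoning steps, e.g. to check how often a
# hypothesis-generation step is immediately followed by a validation step.
dfg, start_activities, end_activities = pm4py.discover_dfg(flat_log)
for (a, b), freq in sorted(dfg.items(), key=lambda kv: -kv[1])[:10]:
    print(f"{a} -> {b}: {freq}")
```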
The dataset is intended to complement the research paper “Configuring Large Reasoning Models using Process Mining: A Benchmark and a Case Study” by Berti et al., providing the structured data used to analyze and benchmark LRM reasoning capabilities.
File Information
The dataset contains one file: reasoning_benchmark.jsonocel. This file is an object-centric event log formatted according to the OCEL 2.0 standard.
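Assuming a pm4py version with OCEL 2.0 JSON support, the file can be read and inspected roughly as follows (a minimal sketch, not an official loader):

```python
import pm4py

# Read the OCEL 2.0 JSON file into pm4py's object-centric event log structure.
ocel = pm4py.read_ocel2_json("reasoning_benchmark.jsonocel")

# The resulting OCEL object exposes events, objects, and event-to-object relations
# as pandas DataFrames.
print(ocel.events.head())                        # ocel:eid, ocel:activity, ocel:timestamp, text, ...
print(ocel.objects["ocel:type"].value_counts())  # number of MOD, QUE, and MODQUE objects
print(len(ocel.relations), "event-to-object relations")
```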
This dataset was generated using a Python script that parses the JSON files containing the classified reasoning steps (from the prel/final_abstract_steps folder mentioned in the script, which corresponds to the outputs of the reasoning trace extraction and classification pipeline described in Section 3.1 of the paper).
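The generation script itself is not included in this dataset. Purely to illustrate the kind of conversion involved, the sketch below reads hypothetical per-answer JSON files of classified steps and assigns synthetic, strictly increasing timestamps so that the original step order is preserved; the glob pattern and all field names (steps, type_effect, text, model, question) are assumptions, not the actual schema used by the authors.

```python
import glob
import json
from datetime import datetime, timedelta

import pandas as pd

# Hypothetical input: one JSON file per model-question instance, each holding a list of
# classified reasoning steps. All field names below are placeholders, not the real schema.
rows = []
base_time = datetime(2025, 1, 1)

for path in sorted(glob.glob("prel/final_abstract_steps/*.json")):
    with open(path, encoding="utf-8") as f:
        answer = json.load(f)
    for i, step in enumerate(answer["steps"]):
        rows.append({
            "ocel:eid": f"{answer['model']}|{answer['question']}|{i}",
            "ocel:activity": step["type_effect"],                # e.g. "Deductive Reasoning - PE"
            "ocel:timestamp": base_time + timedelta(seconds=i),  # synthetic, preserves step order
            "text": step["text"],
        })

events = pd.DataFrame(rows)
print(events.head())
```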