Developing reliable quantum code remains a significant challenge despite advances in quantum computing, and current large language models often produce flawed quantum programs. Kiana Kheiri, Aamna Aamir, and Andriy Miranskyy from Toronto Metropolitan University, along with Chen Ding, address this problem by fine-tuning a 32-billion-parameter model to generate more accurate Qiskit code. Their best model achieves a Pass@1 of over 56% on the Qiskit HumanEval benchmark, outperforming the general-purpose language models it was compared against. The research is a step towards reliable AI-assisted quantum programming, although challenges remain in tackling the most advanced programming tasks.
Quantum Software Development Faces Significant Challenges
Quantum computing promises revolutionary advances in fields like medicine and materials science, but realizing this potential depends on overcoming a significant hurdle: the complexity of programming quantum computers. Writing correct and efficient quantum code remains error-prone even for experts, because quantum computation differs fundamentally from classical computing and demands new approaches to software development. Researchers are now exploring how artificial intelligence, particularly large language models (LLMs), can bridge this gap and make quantum programming more accessible. Adapting these models to the quantum realm presents its own challenges: quantum code relies on distinct languages, libraries, and programming idioms, and the available training data is limited.
To address these challenges, Kiana Kheiri, Aamna Aamir, Andriy Miranskyy, and Chen Ding have developed a Qiskit-based quantum computing coding assistant, an AI-driven tool designed to help developers write and refine quantum programs. The assistant focuses specifically on Qiskit, a widely used quantum SDK, and supports tasks such as circuit construction, optimization, and debugging. By training a large language model on a richly annotated dataset of quantum programming examples, the team aims to create a system that understands high-level intentions and provides context-sensitive suggestions or code snippets. The team fine-tuned a 32-billion-parameter model using two reinforcement learning methods and reports significant improvements in code generation accuracy. Tested on the Qiskit HumanEval benchmark, the assistant surpasses all general-purpose baselines. The work is a step towards lowering the barrier to entry for quantum programming and accelerating the development of quantum software for both experts and beginners.
Optimising Code Generation with Preference Policies
The team trained two variants of the model, one with Group Relative Policy Optimization (GRPO) and one with Odds-Ratio Preference Optimization (ORPO), both on a richly annotated synthetic dataset. On the Qiskit HumanEval benchmark, ORPO achieves 56.29% Pass@1, roughly ten percentage points above Granite-8B-QK, while GRPO attains 49%; both exceed all general-purpose baseline models. On the original HumanEval benchmark, ORPO scores 65.90% and GRPO 63.00%. GRPO is particularly strong on basic tasks, while ORPO excels at intermediate tasks.
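For readers unfamiliar with the Pass@1 metric quoted above, the following is a minimal sketch of the widely used unbiased pass@k estimator. The article does not say which estimator or how many samples per task the authors used, so the function names and the per-task `(n, c)` input format here are illustrative assumptions.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task.

    n: number of code samples generated for the task
    c: number of those samples that passed all unit tests
    k: number of attempts the metric allows
    """
    if n - c < k:
        # Every possible draw of k samples contains at least one passing sample.
        return 1.0
    # 1 minus the probability that k samples drawn without replacement all fail.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average the per-task estimates; `results` holds (n, c) pairs (hypothetical format)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With a single sample per task, the benchmark score reduces to the fraction of tasks whose generated code passes its tests.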
Qiskit Coding Assistant Speeds Quantum Development
The research presents a Qiskit-based coding assistant designed to help developers write and refine quantum programs. Specialized tooling is needed because classical software development techniques do not carry over directly: quantum programs operate in fundamentally different ways. The assistant targets Qiskit, IBM’s widely used quantum SDK, and supports tasks such as circuit construction, optimization, and debugging. The goal is to lower the barrier to entry for quantum programming and accelerate development for both experts and beginners. In reviewing existing quantum programming environments and AI-assisted coding tools, the team notes that quantum software development still needs higher-level abstractions and more systematic design.
Several efforts aim to address these challenges by providing more robust tools and processes tailored for quantum software development. Recent research has also explored the use of large language models (LLMs) for quantum code generation, including the Qiskit HumanEval benchmark for evaluating LLM performance. The team built their Qiskit-based assistant on the Qwen2.5-Coder-32B model, a 32-billion-parameter LLM specialized for code generation, and fine-tuned it using a curated dataset of Qiskit programs. This dataset was created through an automated pipeline involving code retrieval, function extraction, annotation, validation, deduplication, and formatting, resulting in approximately 10,000 Qiskit-related source code samples.
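To make the pipeline stages more concrete, here is a minimal sketch of three of them (function extraction, validation, and deduplication) using only the Python standard library. It is not the authors' pipeline: the function names, the "drop files that fail to parse" validation rule, and the AST-hash deduplication strategy are all assumptions chosen for illustration.

```python
import ast
import hashlib


def extract_functions(source: str) -> list[str]:
    """Return the source text of every top-level function in a Python file."""
    tree = ast.parse(source)
    return [
        ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    ]


def normalized_hash(func_source: str) -> str:
    """Hash the AST dump, so comments and whitespace do not defeat deduplication."""
    dump = ast.dump(ast.parse(func_source))
    return hashlib.sha256(dump.encode()).hexdigest()


def build_samples(files: list[str]) -> list[dict]:
    """Turn raw file contents into deduplicated function-level samples."""
    seen, samples = set(), []
    for source in files:
        try:
            functions = extract_functions(source)
        except SyntaxError:
            continue  # assumed validation rule: skip files that do not parse
        for func in functions:
            digest = normalized_hash(func)
            if digest in seen:
                continue  # structurally identical function already kept
            seen.add(digest)
            samples.append({"code": func, "sha256": digest})
    return samples
```

Hashing the parsed tree rather than the raw text is one simple way to treat reformatted copies of the same function as duplicates; a real pipeline might use fuzzier near-duplicate detection.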
The dataset was categorized into basic, intermediate, and advanced levels based on circuit complexity and algorithmic structure, and validated through simulation-based unit tests. Two specialized training subsets were derived for reinforcement learning: one for Odds-Ratio Preference Optimization (ORPO) and another for Group Relative Policy Optimization (GRPO). To refine the model’s behavior, the team employed two independent reinforcement learning strategies. ORPO aligns the model with human-like coding preferences, focusing on readability and maintainability, while GRPO reinforces code quality by optimizing for group-level performance differentials.
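As an illustration of what a simulation-based unit test can look like, the sketch below simulates a two-qubit Bell-state circuit and asserts on its measurement probabilities. To stay dependency-free it uses a tiny hand-written statevector simulator as a stand-in for Qiskit's simulators; the helper names and the choice of circuit are assumptions, not the authors' test suite.

```python
import math


def apply_h(state: list[complex], qubit: int) -> list[complex]:
    """Apply a Hadamard gate to `qubit` (little-endian bit order, as in Qiskit)."""
    s = 1 / math.sqrt(2)
    new = list(state)
    for i in range(len(state)):
        if not (i >> qubit) & 1:
            j = i | (1 << qubit)  # partner index with the target bit set
            a, b = state[i], state[j]
            new[i] = s * (a + b)
            new[j] = s * (a - b)
    return new


def apply_cx(state: list[complex], control: int, target: int) -> list[complex]:
    """Apply a CNOT: swap amplitudes that differ in `target` where `control` is 1."""
    new = list(state)
    for i in range(len(state)):
        if (i >> control) & 1 and not (i >> target) & 1:
            j = i | (1 << target)
            new[i], new[j] = state[j], state[i]
    return new


def bell_state() -> list[complex]:
    """Candidate 'generated' circuit under test: H on qubit 0, then CX(0, 1)."""
    state = [1.0, 0.0, 0.0, 0.0]  # |00>
    state = apply_h(state, 0)
    return apply_cx(state, 0, 1)


def test_bell_state() -> None:
    """Unit test: only |00> and |11> should be observed, each with probability 1/2."""
    probs = [abs(amplitude) ** 2 for amplitude in bell_state()]
    expected = [0.5, 0.0, 0.0, 0.5]
    assert all(abs(p - e) < 1e-9 for p, e in zip(probs, expected))
```

Checking the simulated output distribution, rather than comparing source text, lets differently written circuits pass as long as they implement the same behavior.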
The ORPO objective increases the likelihood of preferred outputs while regularizing the divergence from the original pre-trained policy. The GRPO strategy uses a clipped objective to keep training stable and to guide the model toward producing more executable and resource-efficient quantum circuits. The team anticipates that this work will contribute to the broader goal of quantum software engineering: bringing the productivity and reliability benefits of modern software development to quantum computing, thereby accelerating innovation and adoption.
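The two objectives can be sketched numerically as follows. This is a simplified illustration based on the commonly published forms of ORPO and GRPO, not the authors' training code: the weight `lam`, the clipping range `clip_eps`, the use of the population standard deviation, and all function names are assumptions, and the KL penalty that usually accompanies GRPO is omitted.

```python
import math
from statistics import mean, pstdev


def log_odds(avg_logprob: float) -> float:
    """log(p / (1 - p)), where p = exp(avg_logprob) is a length-normalized likelihood."""
    p = math.exp(avg_logprob)  # requires avg_logprob < 0 so that p < 1
    return avg_logprob - math.log1p(-p)


def orpo_loss(logp_chosen: float, logp_rejected: float, lam: float = 0.1) -> float:
    """Negative log-likelihood of the preferred answer plus an odds-ratio penalty."""
    log_odds_ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    # -log(sigmoid(x)) written as log(1 + exp(-x))
    penalty = math.log1p(math.exp(-log_odds_ratio))
    return -logp_chosen + lam * penalty


def group_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantage: each reward standardized against its own sample group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_term(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """Clipped surrogate for one sample; `ratio` is new-policy / old-policy probability."""
    clipped = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
    return min(ratio * advantage, clipped * advantage)
```

For example, if four generated circuits for one prompt earn rewards `[1, 0, 0, 1]` from unit tests, `group_advantages` rewards the two passing samples and penalizes the two failing ones by equal amounts, and `clipped_term` limits how far a single update can push the policy.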
Preference Optimisation Boosts Quantum Code Generation
The work presents a Qiskit-based quantum code assistant built on the Qwen2.5-Coder-32B model and fine-tuned using preference-based reinforcement learning. By applying Group Relative Policy Optimization (GRPO) and Odds-Ratio Preference Optimization (ORPO), the authors explore how domain-aligned feedback can improve quantum code generation beyond conventional supervised fine-tuning. The models perform competitively on the Qiskit HumanEval benchmark, particularly on Basic and Intermediate tasks, where they outperform several general-purpose LLMs. These results underscore the promise of preference-based optimization for aligning large language models with quantum programming best practices. The authors also report practical obstacles: inconsistencies between benchmark releases and missing evaluation scripts forced them to run and validate test cases manually, which affects reproducibility.
👉 More information
🗞 QSpark: Towards Reliable Qiskit Code Generation
🧠 DOI: https://doi.org/10.48550/arXiv.2507.12642