Verdict: Microsoft's open-source Qlib is the most complete free quant-research platform you can run on your own machine. It covers the whole pipeline — market data, feature engineering, model training, backtesting, and trade execution — under the permissive MIT license. Pair it with the newer RD-Agent project and you get LLM-driven agents that propose and test trading signals on their own. The catch? You bring your own data, you must avoid overfitting, and this is a workbench, not a money-printing machine.
Last verified: 2026-06-17 · MIT license · 44,588 GitHub stars · 7,062 forks · latest release v0.9.7 (Aug 15, 2025)
TL;DR
- What it is: End-to-end AI quant platform in Python.
- License/cost: MIT, free; `pip install pyqlib`.
- Best for: Quants, ML engineers, and small funds who want a readable, forkable stack instead of a rented black box.
- Headline feature: `qrun` runs a YAML workflow from raw data to backtest report in one command.
- Biggest caveat: The official dataset is disabled for licensing reasons; the community supplies a backup, and quality of data decides everything.
What Qlib actually is
Most open-source quant tools do one thing well: a backtester like Zipline or Backtrader replays a strategy against history, while standalone notebooks train models but leave you to wire in data and execution. Qlib's pitch is the opposite — it ships the entire assembly line.
Microsoft Research released Qlib in 2020 as an "AI-oriented quantitative investment platform" [1]. It is designed around three problems that appear when AI is applied to quant work: the research workflow needs a new shape, the data layer must be fast, and finance has its own pitfalls, such as non-stationary markets and the need for point-in-time correctness [2].
The full chain it covers:
| Stage | What Qlib does |
|---|---|
| Alpha seeking | Build features ("factors") from price, volume, and alternative data. |
| Risk modeling | Estimate volatility, drawdown, and factor exposure. |
| Portfolio optimization | Turn forecasts into position sizes. |
| Order execution | Model how to enter and exit positions without moving the market against you. |
You can use each module alone or run the whole thing from a single YAML config.
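The portfolio-optimization stage can be pictured with a small example. The sketch below turns per-stock forecast scores into equal-weight positions for the k highest-scored names.

It rests on a few assumptions:
- Forecasts arrive as a plain dict.
- The helper `topk_equal_weight` and the tickers are hypothetical.
- Qlib's own strategies (for example `TopkDropoutStrategy`) also manage turnover and trading costs. This sketch omits both.

```python
from typing import Dict


def topk_equal_weight(scores: Dict[str, float], k: int) -> Dict[str, float]:
    """Hypothetical helper: hold the k highest-scored stocks at equal weight."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Rank tickers by forecast score, highest first; ties break alphabetically.
    ranked = sorted(scores, key=lambda ticker: (-scores[ticker], ticker))
    chosen = ranked[:k]
    if not chosen:
        return {}
    weight = 1.0 / len(chosen)
    return {ticker: weight for ticker in chosen}


# Hypothetical next-day return forecasts for five tickers.
forecasts = {"AAA": 0.012, "BBB": -0.004, "CCC": 0.020, "DDD": 0.007, "EEE": 0.001}
print(topk_equal_weight(forecasts, k=2))  # {'CCC': 0.5, 'AAA': 0.5}
```

In a full pipeline, these weights are what the execution stage would then try to reach at the lowest cost.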
How it compares to the alternatives
| Tool | Scope | License/cost | Best for |
|---|---|---|---|
| Qlib | Full data → model → backtest → execution pipeline | MIT, free | Builders who want a real quant stack they can read and change. |
| Zipline / Backtrader | Backtesting only | Apache 2.0 (Zipline) / GPL-3.0 (Backtrader), free | Strategy replay and quick historical tests. |
| BlackRock Aladdin | Full institutional stack | Closed, enterprise-priced | Large funds with compliance and scale needs. |
| RD-Agent(Q) | Auto factor + model R&D loop on top of Qlib | MIT, free | Researchers who want LLM agents to propose and test signals. |
The fair summary: Qlib is not as polished as a commercial fortress, but it is far deeper than a backtester-in-a-notebook. For a solo quant or a small team, that depth matters more than polish.
The stack in plain English
1. The data layer is the secret sauce
Qlib stores market data in a compact, column-shaped format built to be loaded straight into NumPy/Pandas arrays. It then layers on an expression cache and a dataset cache, so repeating a feature computation is almost instant [3].
Microsoft's own benchmark, creating a 14-factor dataset from 800 stocks across 2007-2020, shows the difference [3]:
| Storage | Time (1 CPU) |
|---|---|
| HDF5 | 184.4 s |
| MySQL | 365.3 s |
| MongoDB | 253.6 s |
| InfluxDB | 368.2 s |
| Qlib (+Expression + Dataset cache) | 7.4 s |
That speed is what makes a deep model zoo usable. Without it, you'd spend most of your time waiting for data, not experimenting.
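Put as ratios, the table's own numbers show the size of the gap. This short calculation only divides the reported single-CPU times by Qlib's cached time. It adds no new measurements, and it puts the other backends at roughly 25× to 50× slower.

```python
# Single-CPU times in seconds, copied from the benchmark table above.
BACKEND_SECONDS = {"HDF5": 184.4, "MySQL": 365.3, "MongoDB": 253.6, "InfluxDB": 368.2}
QLIB_CACHED_SECONDS = 7.4  # Qlib with expression + dataset cache

# How many times longer each backend takes than cached Qlib.
slowdown = {
    name: round(seconds / QLIB_CACHED_SECONDS, 1)
    for name, seconds in BACKEND_SECONDS.items()
}

for name, factor in slowdown.items():
    print(f"{name}: {factor}x slower than cached Qlib")
```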
2. The model zoo is research-grade out of the box
Qlib ships more than 20 named models drawn from published papers, all wired to run as one-liners [1]. The list spans:
- Gradient boosting: LightGBM, XGBoost.
- Sequence models: LSTM, GRU, ALSTM, TCN.
- Graph / attention: GATs, Transformer, HIST, TFT.
- Meta / market dynamics: DDG-DA, ADARNN.
To feed them, Qlib includes two ready-made feature libraries: Alpha158 and Alpha360 — 158 and 360 pre-built signals from raw price and volume. That lets you skip months of "factor plumbing" and move straight to model experiments [1].
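To give a feel for those pre-built signals, here is a sketch of four price-based factors in the style of Alpha158, computed in plain Python. The four factors are:
- a candle-body ratio;
- a 5-day rate of change;
- a 5-day moving average;
- a 5-day volatility.

The formulas are modeled on common Alpha158 entries but are illustrative only. Check the `Alpha158` handler in Qlib's source for the exact definitions. The bar data and the `alpha158_style` name are hypothetical.

```python
import statistics


def alpha158_style(opens, closes, window=5):
    """Hypothetical sketch: a few Alpha158-like factors for the latest bar."""
    if len(opens) != len(closes) or len(closes) < window + 1:
        raise ValueError("need matching series with at least window + 1 bars")
    close_now = closes[-1]
    recent = closes[-window:]  # trailing window including today
    return {
        # Candle body relative to the open: (close - open) / open.
        "KMID": (close_now - opens[-1]) / opens[-1],
        # Close `window` days ago relative to today's close.
        f"ROC{window}": closes[-1 - window] / close_now,
        # Trailing mean of closes relative to today's close.
        f"MA{window}": statistics.mean(recent) / close_now,
        # Trailing sample standard deviation of closes relative to today's close.
        f"STD{window}": statistics.stdev(recent) / close_now,
    }


# Hypothetical six days of open/close prices for one stock.
opens = [10.0, 10.1, 10.4, 10.3, 10.6, 10.9]
closes = [10.1, 10.4, 10.3, 10.6, 10.9, 11.0]
factors = alpha158_style(opens, closes)
for name, value in factors.items():
    print(f"{name}: {value:.4f}")
```

Dividing by today's close keeps every factor unit-free, so the same model can compare a 10-yuan stock with a 100-yuan one.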
3. One command runs the whole research loop
The qrun CLI takes a YAML workflow config, builds the dataset, trains the model, runs the backtest, and returns a full report: predictive metrics, cumulative return curve, max drawdown, and more [4].
Example workflow files are in the repo under examples/benchmarks/, such as the LightGBM + Alpha158 config [4].
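The report's headline numbers are simple enough to recompute by hand when sanity-checking a backtest. The sketch below is a plain-Python illustration, not Qlib's code, and it runs on made-up inputs:
- a short series of hypothetical daily strategy returns;
- one hypothetical day of predictions versus realized returns.

From those it computes three numbers:
- cumulative return;
- maximum drawdown;
- a Pearson information coefficient (IC), the correlation between predicted and realized returns that Qlib's signal analysis reports.

```python
import math


def cumulative_return(daily_returns):
    """Compound daily simple returns into one total return."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth - 1.0


def max_drawdown(daily_returns):
    """Largest peak-to-trough fall of the equity curve, as a negative fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst


def pearson_ic(predicted, realized):
    """Pearson correlation between one day's predictions and realized returns."""
    n = len(predicted)
    if n != len(realized) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mp, mr = sum(predicted) / n, sum(realized) / n
    cov = sum((p - mp) * (r - mr) for p, r in zip(predicted, realized))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    sr = math.sqrt(sum((r - mr) ** 2 for r in realized))
    return cov / (sp * sr)


# Hypothetical daily strategy returns and one cross-section of forecasts.
returns = [0.01, 0.02, -0.03, -0.01, 0.015]
preds = [0.02, -0.01, 0.005, 0.01]
actual = [0.015, -0.02, 0.0, 0.012]

print(f"cumulative return: {cumulative_return(returns):.4f}")
print(f"max drawdown: {max_drawdown(returns):.4f}")
print(f"IC: {pearson_ic(preds, actual):.3f}")
```

In practice the IC is computed for every trading day and then averaged, which is why a single day's value says little on its own.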
4. It tries to stop you from fooling yourself
The most common quant mistake is peeking at future data during a backtest. Qlib includes a point-in-time database design whose only job is to make sure your model cannot accidentally see tomorrow's prices today [1]. That is the single biggest reason naive backtests look amazing and then collapse with real money.
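Here is a minimal sketch of the point-in-time rule. It assumes hypothetical quarterly earnings records stored as `(publish_date, period, value)`, including a later revision. A lookup dated before an announcement gets nothing, and a lookup after a revision gets the revised figure. Qlib's own point-in-time database is more elaborate; this shows only the core "as known on that date" rule. All names and numbers are hypothetical.

```python
from datetime import date

# Hypothetical earnings-per-share records: (publish_date, fiscal_period, value).
RECORDS = [
    (date(2020, 4, 28), "2020Q1", 0.50),
    (date(2020, 8, 25), "2020Q2", 0.55),
    (date(2021, 3, 30), "2020Q4", 0.70),
    (date(2021, 5, 10), "2020Q4", 0.66),  # later revision of 2020Q4
]


def value_as_of(period, query_date):
    """Latest value for `period` published on or before query_date, else None."""
    known = [(pub, val) for pub, per, val in RECORDS if per == period and pub <= query_date]
    if not known:
        return None  # not yet public: a backtest on this date must not see it
    return max(known)[1]  # most recent publication wins, so revisions apply


print(value_as_of("2020Q4", date(2021, 1, 15)))  # None
print(value_as_of("2020Q4", date(2021, 4, 1)))   # 0.7
print(value_as_of("2020Q4", date(2021, 6, 1)))   # 0.66
```

A naive join on fiscal period alone would hand the 0.66 figure to a backtest running in early 2021, months before anyone could have known it.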
5. It goes beyond prediction into execution
Qlib also includes reinforcement-learning tooling for order execution. This is the unglamorous but expensive problem of buying a large position without your own orders pushing the price up. It also has meta-learning and drift-adaptation methods so the model can be retrained as markets shift [1].
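For contrast with learned policies, the simplest execution schedule is TWAP (time-weighted average price). It splits a parent order into equal child orders across the trading window. The sketch below is only that common baseline, not Qlib's RL tooling. An RL execution policy tries, roughly speaking, to beat schedules like this by adapting child-order sizes to market conditions. The function name and share counts are hypothetical.

```python
from typing import List


def twap_schedule(total_shares: int, num_slices: int) -> List[int]:
    """Split a parent order into near-equal integer child orders (TWAP baseline)."""
    if total_shares < 0 or num_slices <= 0:
        raise ValueError("need non-negative shares and at least one slice")
    base, remainder = divmod(total_shares, num_slices)
    # Spread the leftover shares one each over the first `remainder` slices.
    return [base + (1 if i < remainder else 0) for i in range(num_slices)]


schedule = twap_schedule(10_000, 6)
print(schedule)  # [1667, 1667, 1667, 1667, 1666, 1666]
```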
The new wild card: RD-Agent
In 2024, Microsoft Research released RD-Agent, a general-purpose multi-agent framework for data-driven R&D. Its quant variant, RD-Agent(Q), sits on top of Qlib [5].
The agents run a loop:
- Research: Propose hypotheses for new trading signals.
- Development: Write code to compute the signals and run real-market backtests.
- Feedback: Read the results and propose sharper signals.
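The research-development-feedback loop can be sketched as a plain function. This is a toy, not RD-Agent's API; every name and price is hypothetical. The three steps map as follows:
- Research: the "hypothesis" is just a moving-average window.
- Development: compute that signal's hit rate on a made-up price series.
- Feedback: keep the best candidate and propose neighboring windows.

In RD-Agent, an LLM proposes and codes the hypotheses, and the evaluation is a Qlib backtest.

```python
def signal_hit_rate(prices, window):
    """'Development': how often 'price above its trailing mean' matches the next move."""
    hits = trials = 0
    for t in range(window - 1, len(prices) - 1):
        mean = sum(prices[t - window + 1 : t + 1]) / window
        predicted_up = prices[t] > mean
        actually_up = prices[t + 1] > prices[t]
        hits += predicted_up == actually_up
        trials += 1
    return hits / trials if trials else 0.0


def research_loop(prices, start_window=5, rounds=4):
    """Toy R&D loop: propose windows, evaluate them, keep improvements."""
    best_window = start_window
    best_score = signal_hit_rate(prices, best_window)
    tried = {best_window}
    for _ in range(rounds):
        # 'Research': propose neighboring hypotheses around the current best.
        candidates = [
            w for w in (best_window - 1, best_window + 1) if w >= 2 and w not in tried
        ]
        if not candidates:
            break
        for window in candidates:
            tried.add(window)
            score = signal_hit_rate(prices, window)  # 'Development'
            if score > best_score:  # 'Feedback': keep only improvements
                best_window, best_score = window, score
    return best_window, best_score


# Hypothetical daily closes.
prices = [10.0, 10.2, 10.1, 10.4, 10.6, 10.5, 10.8, 11.0, 10.9, 11.2, 11.4, 11.3]
print(research_loop(prices))
```

Searching many variants on one short series is exactly the overfitting trap this article warns about. A real loop would score candidates on held-out data.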
Microsoft's paper on the system, accepted at NeurIPS 2025, reports roughly 2× higher annualized returns than classical factor libraries while using 70% fewer factors, at under $10 per optimization cycle [6]. Those numbers come from the authors' experiments, not independent verification, so treat them as a promising research claim, not a guarantee.
RD-Agent has its own repo (microsoft/RD-Agent, 13,486 stars, 1,682 forks as of June 2026) and is now the active experimental layer, while core Qlib itself has slowed to stable "housekeeping" releases [5].
How to get started
- Install: `pip install pyqlib` (Python 3.8–3.12 supported; Conda recommended) [1].
- Get data: The official dataset is temporarily disabled, so use the community source Microsoft documents in the README [1]:

```bash
wget https://github.com/chenditc/investment_data/releases/latest/download/qlib_bin.tar.gz
mkdir -p ~/.qlib/qlib_data/cn_data
tar -zxvf qlib_bin.tar.gz -C ~/.qlib/qlib_data/cn_data --strip-components=1
rm -f qlib_bin.tar.gz
```

- Run a benchmark: Point `qrun` at one of the bundled workflow YAMLs and you will have a trained model and backtest report without writing glue code [1].
- Read the paper: The original Qlib paper (arXiv:2009.11189) explains the design intent [2]. The RD-Agent paper (arXiv:2505.15155) covers the agent loop [6].
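Before running a benchmark, it helps to confirm the archive extracted where Qlib expects it. This small check assumes the standard Qlib binary-data layout, with `calendars`, `instruments`, and `features` subfolders under the data directory. The function name is hypothetical, and an empty result only means the folders exist, not that the data inside is complete.

```python
from pathlib import Path

# Subfolders assumed to exist in a Qlib binary dataset directory.
EXPECTED_SUBDIRS = ("calendars", "instruments", "features")


def missing_qlib_parts(data_dir: str) -> list:
    """Return the expected subfolders absent from data_dir (empty list = looks OK)."""
    root = Path(data_dir).expanduser()
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_qlib_parts("~/.qlib/qlib_data/cn_data")
    if missing:
        print("Missing:", ", ".join(missing), "- check the tar extraction step.")
    else:
        print("Dataset layout looks complete.")
```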
What this means for you
- If you are a developer curious about quant: Qlib is the best free sandbox. Start with `pip install pyqlib`, download the community dataset, and run the LightGBM/Alpha158 benchmark.
- If you run a small fund or prop desk: Qlib is a credible read-the-code alternative to rented black boxes, but budget engineering time for data integration and careful out-of-sample testing.
- If you are a researcher: The model zoo and RD-Agent integration give you a reproducible baseline for new ideas, with the point-in-time guardrail already built in.
- For everyone: The hard part is not installing the tool; it is avoiding overfit backtests and sourcing clean data. Treat every great-looking historical result as guilty until proven out-of-sample.
FAQ
Q: Is Qlib really free for commercial use?
A: Yes. It is under the MIT license, which permits commercial use, modification, and distribution with the license text included [1].
Q: Can I trade real money with Qlib?
A: You can build strategies with it, but production execution requires your own broker integration, risk controls, and compliance. The platform gives the research engine; you supply the live wiring.
Q: What data does Qlib include?
A: The official dataset is currently disabled for licensing reasons. A community-maintained China-market dataset is documented as the interim source. US-market users will need to bring or convert their own data [1].
Q: Is the 2× return claim from RD-Agent verified?
A: That figure comes from the authors' paper. It has not been independently audited, so label it as a reported research result, not a promise [6].
Q: Do I need a GPU?
A: No. The core models run on CPU with Pandas, NumPy, and LightGBM. Deep-learning models benefit from a GPU if you train at scale, but the quick-start path does not require one.
Q: What's the difference between Qlib and RD-Agent?
A: Qlib is the quant platform engine. RD-Agent is a multi-agent R&D layer that uses LLMs to automate parts of the research loop on top of Qlib.