Visual-driven AI Operator for Web, Android, iOS, Automation & Testing. Open-source and MIT licensed.
Instruction | Video |
---|---|
Use JS code to drive task orchestration, collect information about Jay Chou's concert, and write it into Google Docs (By UI-TARS model) | google-doc-1080p.mp4 |
Control Maps App on Android (By Qwen-2.5-VL model) | control-maps-app-on-android.mp4 |
Use Midscene MCP to browse https://www.saucedemo.com/, log in, add products, place an order, and finally generate test cases from the MCP execution steps and a Playwright example | showcase-3-mcp.mp4 |
- Describe your goals and steps, and Midscene will plan and operate the user interface for you.
- Use the JavaScript SDK or YAML to write your automation script.
- Web Automation 🖥️: Integrate with Puppeteer or Playwright, or use Bridge Mode to control your desktop browser.
- Android Automation 📱: Use the JavaScript SDK with adb to control your local Android device.
- iOS Automation 🍎: Use the JavaScript SDK with iOS Simulator to control your local iOS devices and simulators.
- Any Interface Automation 🌐: Use the JavaScript SDK to control your own interface.
- Visual Reports for Debugging 🎞️: Through our test reports and Playground, you can easily understand, replay and debug the entire process.
- Caching for Efficiency 🔄: Replay your script with cache and get the result faster.
- MCP: Allows other MCP clients to use Midscene's capabilities directly. See Web MCP and Android MCP.
- Interaction API: interact with the user interface.
- Data Extraction API: extract data from the user interface and DOM.
- Utility API: utility functions like `aiAssert()`, `aiLocate()`, and `aiWaitFor()`.
- Chrome Extension: Start in-browser experience immediately through the Chrome Extension, without writing any code.
- Android Playground: There is also a built-in Android playground to control your local Android device.
- iOS Playground: There is also a built-in iOS playground to control your local iOS device.
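As a rough illustration of how the Interaction, Data Extraction, and Utility APIs listed above might be combined in one script, here is a minimal sketch. The function `addFirstProductToCart`, the stand-in `createRecordingAgent`, and all prompts are hypothetical; only the method names `aiWaitFor`, `aiTap`, `aiQuery`, and `aiAssert` come from Midscene, and the stand-in agent replaces a real one so the flow can run without a browser or model.

```javascript
// Hypothetical flow for illustration. `agent` is assumed to expose aiWaitFor,
// aiTap, aiQuery and aiAssert; the prompts describe an imaginary shop page.
async function addFirstProductToCart(agent) {
  // Utility API: block until the page reaches the described state.
  await agent.aiWaitFor('the product list is visible');
  // Interaction API: act on an element described in natural language.
  await agent.aiTap('the "Add to cart" button of the first product');
  // Data Extraction API: the prompt states the expected shape of the result.
  const cartItems = await agent.aiQuery('string[], names of the items in the cart');
  // Utility API: fail the run if the described condition does not hold.
  await agent.aiAssert('the cart badge shows 1 item');
  return cartItems;
}

// Stand-in agent (NOT part of Midscene): records which method was called and
// returns canned data, so the flow can be exercised offline.
function createRecordingAgent(cannedQueryResult) {
  const calls = [];
  const stub = (name, result) => async (prompt) => {
    calls.push({ name, prompt });
    return result;
  };
  return {
    calls,
    aiWaitFor: stub('aiWaitFor'),
    aiTap: stub('aiTap'),
    aiQuery: stub('aiQuery', cannedQueryResult),
    aiAssert: stub('aiAssert'),
  };
}

// Dry run against the stand-in agent.
const demoAgent = createRecordingAgent(['Backpack']);
addFirstProductToCart(demoAgent).then((items) => {
  console.log(items, demoAgent.calls.map((c) => c.name));
});
```

In a real script, the stand-in would be replaced by an agent created through one of the integrations (Puppeteer, Playwright, Bridge Mode, Android, or iOS); the flow function itself would not need to change.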
Midscene.js supports visual-language models like Qwen3-VL, Doubao-1.6-vision, gemini-2.5-pro, and UI-TARS.
- Can find and understand target elements on the page from a screenshot alone.
- No DOM or semantic markup is required.
- Fewer tokens and lower cost compared to general-purpose LLMs.
- Supports open-source models.
Read more in Choose a model.
Midscene automatically plans the steps and executes them. This style may be slower and relies heavily on the quality of the AI model.
await aiAction('click all the records one by one. If one record contains the text "completed", skip it');
Split complex logic into multiple steps to improve the stability of the automation code.
const recordList = await agent.aiQuery('string[], the record list');
for (const record of recordList) {
  // Ask the model a yes/no question about each record before acting on it.
  const hasCompleted = await agent.aiBoolean(`check if the record "${record}" contains the text "completed"`);
  if (!hasCompleted) {
    await agent.aiTap(record);
  }
}
For more details about the workflow style, please refer to Blog - Use JavaScript to Optimize the AI Automation Code
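One benefit of the workflow style is that the control flow is ordinary JavaScript, so it can be checked without calling a model. The sketch below wraps the loop above in a hypothetical helper, `tapIncompleteRecords`, and runs it against a hypothetical stand-in agent, `createFakeAgent`, which answers from a fixed list. Neither name is part of Midscene; the stand-in only mimics the `aiQuery`, `aiBoolean`, and `aiTap` calls used above.

```javascript
// Hypothetical helper: the workflow from the snippet above, wrapped in a
// function that also reports which records were tapped.
async function tapIncompleteRecords(agent) {
  const recordList = await agent.aiQuery('string[], the record list');
  const tapped = [];
  for (const record of recordList) {
    const hasCompleted = await agent.aiBoolean(
      `check if the record "${record}" contains the text "completed"`
    );
    if (!hasCompleted) {
      await agent.aiTap(record);
      tapped.push(record);
    }
  }
  return tapped;
}

// Stand-in agent (NOT a Midscene class): answers from a fixed list of
// record strings instead of looking at a screenshot.
function createFakeAgent(records) {
  return {
    async aiQuery() {
      return records;
    },
    async aiBoolean(prompt) {
      // Work out which record the prompt refers to, then inspect its text.
      const record = records.find((r) => prompt.includes(`"${r}"`));
      return record !== undefined && record.includes('completed');
    },
    async aiTap() {
      // A real agent would tap the element; the stand-in does nothing.
    },
  };
}

// Dry run: only the records without "completed" are tapped.
const fakeAgent = createFakeAgent(['Invoice A (completed)', 'Invoice B', 'Invoice C']);
tapIncompleteRecords(fakeAgent).then((tapped) => {
  console.log(tapped); // logs [ 'Invoice B', 'Invoice C' ]
});
```

Because each decision is a separate, small model call, a failure can be traced to one step instead of to a single long instruction.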
There are so many UI automation tools out there, and each one seems to be all-powerful. What's special about Midscene.js?
- Visual-driven approach brings reliability and efficiency: By using visual-language models, Midscene.js is suitable for both web and mobile app automation, regardless of the technology stack the interface is built with.
- Debugging Experience: You will soon realize that debugging and maintaining automation scripts is the real challenge. Midscene.js offers a visualized report file, a built-in playground, and a Chrome Extension to simplify the debugging process. These are the tools most developers truly need.
- Open Source, Free, Deploy as you want: Midscene.js is an open-source project, and it supports self-hosted models.
- Integrate with JavaScript: You can always bet on JavaScript 😎
- Home Page and Documentation: https://midscenejs.com
- Sample Projects: https://github.com/web-infra-dev/midscene-example
- API Reference: https://midscenejs.com/api.html
- GitHub: https://github.com/web-infra-dev/midscene
Community projects that extend Midscene.js capabilities:
- midscene-ios - iOS automation support for Midscene
- Midscene-Python - Python SDK for Midscene automation
We would like to thank the following projects:
- Rsbuild and Rslib for the build tool.
- UI-TARS for the open-source agent model UI-TARS.
- Qwen-VL for the open-source VL model Qwen-VL.
- scrcpy and yume-chan for allowing us to control Android devices from the browser.
- appium-adb for the JavaScript bridge to adb.
- appium-webdriveragent for operating XCTest from JavaScript.
- YADB for the yadb tool which improves the performance of text input.
- Puppeteer for browser automation and control.
- Playwright for browser automation, control, and testing.
If you use Midscene.js in your research or project, please cite:
@software{Midscene.js,
  author = {Xiao Zhou and Tao Yu and YiBing Lin},
  title = {Midscene.js: Your AI Operator for Web, Android, iOS, Automation \& Testing.},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/web-infra-dev/midscene}
}
Midscene.js is MIT licensed.