Windows 365 for Agents is an MCP server that gives you full operational control of a Windows 365 cloud PC. Use this MCP server to drive a real Windows environment through desktop interaction (mouse, keyboard, screen capture, command execution), browser automation via Microsoft Edge, and semantic UI inspection via Windows UI Automation.
Note
Browser automation works on Microsoft Edge. Edge launches automatically on the first browser tool call. focus_browser can also target Chrome or Firefox, but DOM-level browser tools only work on the Edge instance.
To learn more about Windows 365 for Agents, see Windows 365 for Agents documentation.
Overview
| Server ID | Tenant-level URL | Display name | Description |
|---|---|---|---|
| mcp_W365ComputerUse | https://agent365.svc.cloud.microsoft/agents/tenants/{tenantId}/servers/mcp_W365ComputerUse | Windows 365 for Agents MCP server | Full operational control of a Windows 365 cloud PC, including desktop interaction, browser automation, and UI inspection. |
Available tools
mcp_desktop_move_mouse
Moves the cursor to a screen position. Use mcp_desktop_click instead if you intend to click at the destination.
Required parameters:
- x: X coordinate in screen pixels
- y: Y coordinate in screen pixels
mcp_desktop_click
Clicks at a position, or at the current cursor location if coordinates are omitted. Supports single-click, double-click, and all five mouse buttons.
Optional parameters:
- x: X coordinate in screen pixels (omit for current position)
- y: Y coordinate in screen pixels (omit for current position)
- button: Left, Right, Middle, Forward, or Backward (default Left)
- clickCount: 1 = single click, 2 = double click (default 1)
mcp_desktop_get_cursor_position
Returns the current cursor coordinates. No parameters. Returns {cursorX, cursorY}.
mcp_desktop_drag_mouse
Drags from one position to another. Useful for moving objects, resizing windows, or pixel-precise scrolling.
Required parameters:
- startX: Start X coordinate.
- startY: Start Y coordinate.
- endX: End X coordinate.
- endY: End Y coordinate.
Optional parameters:
- button: Left, Right, or Middle (default is Left)
mcp_desktop_scroll
Scrolls at a position by using notch units, not pixels. Three notches are approximately one page.
Required parameters:
- x: Scroll position X
- y: Scroll position Y
Optional parameters:
- deltaX: Horizontal notches, positive = right (default 0)
- deltaY: Vertical notches, positive = down (default 0)
Note
Values are clamped to the range [-20, 20].
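As a minimal sketch of the notch arithmetic described above, the following Python helper converts a page count into notches by using the approximate three-notches-per-page ratio and clamps the result to [-20, 20] before building the tool arguments. The helper names (clamp_notches, scroll_arguments) are hypothetical and not part of the server; the server applies its own clamping regardless.

```python
NOTCHES_PER_PAGE = 3  # approximate ratio stated in the tool description
NOTCH_LIMIT = 20      # documented clamp range is [-20, 20]


def clamp_notches(notches: int) -> int:
    """Clamp a notch count to the documented [-20, 20] range."""
    return max(-NOTCH_LIMIT, min(NOTCH_LIMIT, notches))


def scroll_arguments(x: int, y: int, pages_down: float = 0.0) -> dict:
    """Build mcp_desktop_scroll arguments from a page count (hypothetical helper)."""
    delta_y = clamp_notches(round(pages_down * NOTCHES_PER_PAGE))
    return {"x": x, "y": y, "deltaX": 0, "deltaY": delta_y}


print(scroll_arguments(640, 400, pages_down=2))    # deltaY is 6
print(scroll_arguments(640, 400, pages_down=-10))  # deltaY is clamped to -20
```

Clamping on the client side mainly makes the effective scroll distance predictable, because a request for more than 20 notches would otherwise be shortened silently.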
mcp_desktop_type_text
Types text by simulating keyboard input. For keyboard shortcuts, use mcp_desktop_press_keys. For web form fields, use mcp_browser_type.
Required parameters:
- text: Text to type.
mcp_desktop_press_keys
Presses a key combination simultaneously. Supports modifier keys, function keys, and standard keys.
Required parameters:
- keys: Array of key names to press together (for example, ["ctrl","c"], ["alt","tab"], or ["ctrl","shift","s"])
mcp_desktop_take_screenshot
Captures the full screen or a cropped region as a PNG image (base64-encoded).
Optional parameters:
- x: Crop region left edge
- y: Crop region top edge
- width: Crop region width
- height: Crop region height
Note
Provide all four crop parameters together, or omit all four for a full-screen capture.
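The all-or-nothing rule for crop parameters can be checked before a call is sent. The sketch below is a hypothetical client-side helper (screenshot_arguments is not a server API) that returns an empty argument object for a full-screen capture and rejects partial crop rectangles.

```python
from typing import Optional


def screenshot_arguments(x: Optional[int] = None, y: Optional[int] = None,
                         width: Optional[int] = None,
                         height: Optional[int] = None) -> dict:
    """Return arguments for mcp_desktop_take_screenshot, enforcing all-or-none cropping."""
    crop = {"x": x, "y": y, "width": width, "height": height}
    provided = [name for name, value in crop.items() if value is not None]
    if not provided:
        return {}  # no crop parameters: full-screen capture
    if len(provided) != len(crop):
        missing = sorted(set(crop) - set(provided))
        raise ValueError(f"crop needs all four parameters; missing: {missing}")
    return crop


print(screenshot_arguments())                   # {}
print(screenshot_arguments(0, 0, 800, 600))     # full crop rectangle
```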
mcp_desktop_zoom_region
Captures a screen region at native resolution as a PNG image (base64-encoded). Use this tool to inspect small text or dense UI elements that are hard to read in a downscaled full-screen screenshot.
Required parameters:
- x: Left edge X coordinate in screen pixels
- y: Top edge Y coordinate in screen pixels
- width: Region width in pixels
- height: Region height in pixels
Note
The maximum region size is 1920x1080 pixels.
mcp_desktop_analyze_screen
Performs OCR on the entire screen. No parameters. Returns {fullText, averageConfidence, boxes[{text, confidence, x, y, width, height}], width, height}.
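Because the OCR boxes share the screen coordinate space, a box can be turned into a click target by taking its center. The following sketch assumes the result has already been parsed into a Python dict with the documented fields, and that confidence is reported on a 0 to 1 scale (an assumption; the scale isn't stated here). The sample data is illustrative, not real tool output.

```python
from typing import Optional, Tuple


def find_text_center(ocr_result: dict, needle: str,
                     min_confidence: float = 0.8) -> Optional[Tuple[int, int]]:
    """Return the center of the first sufficiently confident box containing needle."""
    for box in ocr_result.get("boxes", []):
        matches = needle.lower() in box["text"].lower()
        if matches and box["confidence"] >= min_confidence:
            return (box["x"] + box["width"] // 2, box["y"] + box["height"] // 2)
    return None


# Illustrative result shaped like {fullText, averageConfidence, boxes[...], width, height}.
sample = {
    "fullText": "Save as Save",
    "averageConfidence": 0.69,
    "boxes": [
        {"text": "Save as", "confidence": 0.41, "x": 10, "y": 10, "width": 80, "height": 20},
        {"text": "Save", "confidence": 0.97, "x": 100, "y": 200, "width": 60, "height": 20},
    ],
    "width": 1920,
    "height": 1080,
}

print(find_text_center(sample, "save"))  # the low-confidence box is skipped
```

The returned pair can be passed as x and y to mcp_desktop_click.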
mcp_desktop_get_screen_size
Returns the screen resolution. No parameters. Returns {width, height}.
mcp_desktop_list_windows
Lists all visible windows with their titles, positions, and dimensions. No parameters. Returns an array of {title, processName, handle, x, y, width, height}.
mcp_desktop_activate_window
Brings a window to the foreground by using a fuzzy title match.
Required parameters:
- titlePattern: Partial window title (case-insensitive substring)
mcp_desktop_focus_browser
Focuses a browser window (Edge, Chrome, or Firefox), optionally filtered by URL or title.
Optional parameters:
- pattern: URL or title substring to match (omit for any browser window)
mcp_desktop_close_window
Closes a window gracefully by using a fuzzy title match. Critical system processes are protected and can't be closed.
Required parameters:
- titlePattern: Partial window title (80% match threshold)
Returns {matchedTitle, processName, closed}.
mcp_desktop_resize_window
Resizes, moves, maximizes, minimizes, or restores a window by using a fuzzy title match.
Required parameters:
- title: Window title to match (case-insensitive fuzzy match)
- action: Action to perform: Resize, Move, Maximize, Minimize, or Restore
Optional parameters:
- x: Left edge X coordinate (used with Resize or Move)
- y: Top edge Y coordinate (used with Resize or Move)
- width: Width in pixels (used with Resize)
- height: Height in pixels (used with Resize)
mcp_desktop_execute_shell_command
Runs a shell command in a sandboxed environment. The command is checked against an allow list, and dangerous patterns are blocked.
Required parameters:
- command: Command to run
Optional parameters:
- cwd: Working directory. Use forward slashes (for example, C:/Users/me/project).
- timeoutMs: Timeout in milliseconds (default 30000, max 30000)
Note
- Allowed commands: git, npm, dotnet, python, cargo, node, pip, dir, mkdir, del, copy, move, robocopy, findstr, where, type, and notepad.
- Blocked patterns include shell metacharacters (|, ;, &, <, >), environment variable expansion (%VAR%), interpreter eval flags (python -c or node -e), git config --global, npm -g, path-prefixed executables, rm -rf, sudo, and disk or system commands.
- The command's stdout and stderr are each truncated at 32 KB. For arbitrary computation, use mcp_desktop_execute_python_code.
Returns {stdout, stderr, exitCode, success, timedOut, resourceLimitsApplied}.
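A caller can catch many rejections early by screening a command against the allow list and blocked patterns listed above. The sketch below is a rough, conservative client-side approximation under stated assumptions: precheck_command is a hypothetical helper, it covers only some of the documented patterns, and it doesn't reproduce the server's actual checks (for example, it can flag a harmless argument that happens to contain two percent signs).

```python
import re

ALLOWED_COMMANDS = {
    "git", "npm", "dotnet", "python", "cargo", "node", "pip", "dir", "mkdir",
    "del", "copy", "move", "robocopy", "findstr", "where", "type", "notepad",
}
BLOCKED_CHARS = set("|;&<>")
ENV_EXPANSION = re.compile(r"%[^%\s]+%")       # matches %VAR% style expansion
EVAL_FLAGS = {"python": "-c", "node": "-e"}    # documented interpreter eval flags


def precheck_command(command: str) -> list:
    """Return a list of likely rejection reasons (empty if none were found)."""
    parts = command.split()
    if not parts:
        return ["empty command"]
    problems = []
    program = parts[0].lower()
    if program not in ALLOWED_COMMANDS:
        problems.append(f"'{parts[0]}' is not on the documented allow list")
    if BLOCKED_CHARS & set(command):
        problems.append("contains a blocked shell metacharacter")
    if ENV_EXPANSION.search(command):
        problems.append("contains %VAR% environment variable expansion")
    if EVAL_FLAGS.get(program) in parts[1:]:
        problems.append("uses an interpreter eval flag")
    return problems


print(precheck_command("git log --oneline -20"))   # []
print(precheck_command("git log | findstr fix"))   # metacharacter problem
```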
mcp_desktop_execute_python_code
Executes Python code in a sandboxed environment with resource limits. This function is ideal for data processing, calculations, file I/O, and any computation that goes beyond simple shell commands.
Required parameters:
- code: Python code (max 262,144 characters).
Optional parameters:
- cwd: Working directory. Use forward slashes.
- timeoutMs: Timeout in milliseconds (default 30000, max 30000).
Returns the same schema as mcp_desktop_execute_shell_command.
Note
The sandbox enforces a 512 MB memory limit and a 30-second timeout.
mcp_desktop_wait_milliseconds
Pauses execution to allow animations or transitions to complete. Don't use this function in polling loops. Instead, use mcp_browser_wait_for for DOM polling.
Required parameters:
- ms: Wait duration in milliseconds (clamped to [0, 5000])
mcp_desktop_clipboard_read
Reads the current content of the system clipboard. This command doesn't require any parameters. It returns a JSON object that describes the clipboard format and payload, which can be either a text string or a base64-encoded image.
mcp_desktop_clipboard_write
Writes text to the system clipboard, replacing the current content.
Required parameters:
- text: Text to write to the clipboard
Returns a confirmation that includes the character count.
mcp_desktop_list_processes
Lists running processes in the current session. Each entry includes the PID, process name, memory usage, window title (if any), and startTimeTicks. Pair startTimeTicks with mcp_desktop_kill_process to prevent killing a recycled PID.
Optional parameters:
- maxCount: Maximum number of processes to return (default 200)
Returns a JSON array of process info objects.
mcp_desktop_kill_process
Terminates a process by PID. Supply the startTimeTicks value from mcp_desktop_list_processes as startTime to protect against PID recycling.
Required parameters:
- pid: Process ID returned by mcp_desktop_list_processes
- startTime: Process start time ticks returned by mcp_desktop_list_processes
Optional parameters:
- force: Force-kill without a graceful shutdown (default false)
Returns a JSON result describing the outcome.
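The pairing of a PID with its start time can be sketched as follows. The entry keys pid and name are assumptions about the shape of the process list (only startTimeTicks is named in the tool description), kill_arguments is a hypothetical helper, and the sample list is illustrative.

```python
from typing import Optional


def kill_arguments(processes: list, name: str, force: bool = False) -> Optional[dict]:
    """Build mcp_desktop_kill_process arguments for the first process with this name."""
    for proc in processes:
        if proc["name"].lower() == name.lower():  # "name" and "pid" keys are assumed
            return {
                "pid": proc["pid"],
                "startTime": proc["startTimeTicks"],  # guards against a recycled PID
                "force": force,
            }
    return None


# Illustrative entries, not real output from mcp_desktop_list_processes.
processes = [
    {"pid": 4120, "name": "notepad", "startTimeTicks": 638500000000000000},
    {"pid": 5332, "name": "msedge", "startTimeTicks": 638500000123456789},
]

print(kill_arguments(processes, "Notepad"))
```

Taking both values from the same list entry matters: if the original process exits and its PID is reused, the start time no longer matches.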
mcp_desktop_launch_application
Launches a GUI application from an allowed directory. Use mcp_desktop_execute_shell_command for CLI commands instead.
Required parameters:
- path: Absolute path to the executable. Use forward slashes (for example, C:/Program Files/app.exe).
Optional parameters:
- args: Array of command-line arguments
Returns {path, pid}.
mcp_desktop_get_system_info
Returns the OS version, CPU, RAM, available disk space, and display resolution. No parameters. Returns a JSON object containing the system information.
mcp_browser_navigate
Navigates to a URL and waits for the page to load.
Required parameters:
- url: Full URL including protocol (for example, https://example.com)
mcp_browser_back
Navigates back in browser history. No parameters.
mcp_browser_forward
Navigates forward in browser history. No parameters.
mcp_browser_reload
Reloads the current page. No parameters.
mcp_browser_get_url
Returns the current page URL as a plain string. No parameters.
mcp_browser_get_title
Returns the current page title as a plain string. No parameters.
mcp_browser_get_text
Returns the visible page text content as a plain string. No parameters. Truncated at 512 KB.
mcp_browser_get_html
Returns the full page HTML source as a plain string. No parameters. Truncated at 512 KB.
mcp_browser_get_page_state
Retrieves multiple page state fields in a single call. Useful for capturing several signals at once without issuing separate tool calls.
Required parameters:
- fields: Array of fields to return. Allowed values: url, title, dom, screenshot, tabs
Returns a JSON object containing only the requested fields.
mcp_browser_click
Clicks a DOM element by CSS selector. More reliable than coordinate-based clicking for web content.
Required parameters:
- selector: CSS selector (for example, #submit-btn or a.nav-link)
mcp_browser_type
Types text into a form element by using a CSS selector.
Required parameters:
- selector: CSS selector of the input element.
- text: Text to type.
mcp_browser_query_text
Gets the text content of the first element that matches a CSS selector.
Required parameters:
- selector: CSS selector.
mcp_browser_wait_for
Waits for a DOM element to appear. This function is useful for dynamic content that loads asynchronously.
Required parameters:
- selector: CSS selector to wait for.
Optional parameters:
- timeoutMs: Timeout in milliseconds. The default is 5,000 and the maximum is 30,000.
mcp_browser_eval_js
Evaluates a JavaScript expression in the page context and returns the result as a string.
Required parameters:
- expression: JavaScript expression that returns a string
Note
If your expression returns an object or number, convert it to a string explicitly (for example, JSON.stringify(obj) or .toString()).
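One way to work with the string-only return value is to serialize on the page side and parse on the caller side. The sketch below assumes the tool result text has already been extracted into a Python string; the expression and the simulated result are illustrative.

```python
import json

# JavaScript expression that returns a string, as the tool requires.
EXPRESSION = "JSON.stringify({title: document.title, links: document.links.length})"


def eval_js_arguments(expression: str) -> dict:
    """Build the arguments object for mcp_browser_eval_js."""
    return {"expression": expression}


def parse_eval_result(result_text: str) -> dict:
    """Decode a JSON string produced by JSON.stringify in the page."""
    return json.loads(result_text)


# Simulated tool result for illustration only.
simulated_result = '{"title": "Example Domain", "links": 1}'
page_info = parse_eval_result(simulated_result)
print(eval_js_arguments(EXPRESSION))
print(page_info["title"], page_info["links"])
```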
mcp_browser_list_tabs
Lists all open tabs with their index, title, and URL. No parameters. Returns an array of {index, title, url}.
mcp_browser_switch_tab
Switches to a tab by index.
Required parameters:
- tabIndex: 0-based tab index
mcp_browser_new_tab
Opens a new tab, optionally navigating to a URL.
Optional parameters:
- url: URL to open (blank tab if omitted)
Returns {index, title, url}.
mcp_browser_create_tabs
Opens multiple tabs at once. Optionally bring one of them to the foreground.
Required parameters:
- urls: Array of URLs to open, one tab per URL
Optional parameters:
- foregroundIndex: Index of the tab to bring to the foreground after creation (omit to keep the current tab focused)
Returns a text confirmation.
mcp_browser_close_tab
Closes a tab by index.
Required parameters:
- tabIndex: 0-based tab index
mcp_browser_screenshot
Captures a PNG screenshot of the browser viewport only (not the full screen). No parameters. Returns a base64-encoded PNG.
mcp_browser_select_option
Selects one or more options in a <select> element by their value attribute.
Required parameters:
- selector: CSS selector for the <select> element
- values: Array of option values to select
Returns a confirmation with the count of selected options.
mcp_browser_fill_form
Fills multiple form fields in a single call. Each entry is a {selector, value} pair. The operation stops on the first failure and reports which fields succeeded.
Required parameters:
- fields: Array of {selector, value} pairs
Returns a confirmation with the count of filled fields.
mcp_browser_drag
Drags a source element onto a target element. Both elements are identified by CSS selector.
Required parameters:
- sourceSelector: CSS selector of the drag source
- targetSelector: CSS selector of the drop target
mcp_browser_pdf_save
Saves the current page as a PDF file. Destination paths are restricted to %USERPROFILE% or %TEMP%.
Required parameters:
- filePath: Destination file path under %USERPROFILE% or %TEMP%. Use forward slashes.
Returns a confirmation including the saved file path.
mcp_browser_handle_dialog
Accepts or dismisses a pending browser dialog (alert, confirm, prompt, or beforeunload). Returns "No dialog pending" if no dialog is active.
Required parameters:
- action: accept or dismiss
Optional parameters:
- promptText: Text to supply to a prompt dialog (ignored for alert and confirm)
mcp_browser_get_cookies
Gets cookies for the current page, or for a specified set of URLs. Cookie values are always redacted for security; names, domains, paths, and flags are returned.
Optional parameters:
- urls: Array of URLs to get cookies for (omit for the current page)
Returns an array of cookie objects with redacted values.
mcp_browser_set_cookies
Sets cookies on the current page's domain. This action adds or overwrites cookies but doesn't clear existing cookies.
Required parameters:
- cookies: Array of cookie objects. Each entry requires name and value. Optional fields: domain, path, secure, httpOnly, sameSite.
Returns a text confirmation.
mcp_browser_execute_batch
Executes multiple browser actions sequentially in a single call. This action stops on the first failure and returns the results collected up to that point.
Required parameters:
- actions: Array of {action, params} objects. Allowed actions: navigate, snapshot, click_ref, type_ref, hover_ref, scroll_ref, keypress_ref, wait_for, eval_js.
Returns an array of results, one per executed action.
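A batch can be assembled and screened against the allowed action names before it is sent. In this sketch, build_batch is a hypothetical helper, and the params keys for each action (url, selector, timeoutMs, expression) are an assumption that mirrors the parameters of the corresponding standalone tools; the batch schema for params isn't spelled out here.

```python
ALLOWED_BATCH_ACTIONS = {
    "navigate", "snapshot", "click_ref", "type_ref", "hover_ref",
    "scroll_ref", "keypress_ref", "wait_for", "eval_js",
}


def build_batch(steps: list) -> dict:
    """Turn (action, params) pairs into arguments for mcp_browser_execute_batch."""
    actions = []
    for action, params in steps:
        if action not in ALLOWED_BATCH_ACTIONS:
            raise ValueError(f"'{action}' is not an allowed batch action")
        actions.append({"action": action, "params": params})
    return {"actions": actions}


batch = build_batch([
    ("navigate", {"url": "https://example.com"}),            # assumed params shape
    ("wait_for", {"selector": "h1", "timeoutMs": 5000}),     # assumed params shape
    ("eval_js", {"expression": "document.title"}),           # assumed params shape
])
print(len(batch["actions"]))
```

Because execution stops on the first failure, comparing the number of returned results with the number of submitted actions shows whether the batch ran to completion.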
mcp_browser_snapshot
Captures the page's accessibility tree with stable ref IDs (for example, e5) that map to DOM nodes. Use the refs with mcp_browser_click_ref, mcp_browser_type_ref, and mcp_browser_hover_ref. Refs expire when the page navigates—retake a snapshot after navigation.
Optional parameters:
- maxDepth: Maximum tree depth, 1-10 (default 5)
- includeIframes: Include cross-origin iframes (default true)
Returns a JSON object containing the accessibility snapshot and ref IDs.
mcp_browser_click_ref
Clicks an element by ref ID from mcp_browser_snapshot. A hit-test verifies that no other element overlays the target. Fails if the snapshot expires—retake the snapshot in that case.
Required parameters:
- snapshotId: Snapshot ID returned by mcp_browser_snapshot
- ref: Element ref (for example, e5) from the snapshot nodes
Optional parameters:
- button: Left, Right, or Middle (default Left)
- clickCount: 1 = single click, 2 = double click (default 1)
Returns a confirmation including the clicked coordinates.
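The retake-and-retry pattern for expired snapshots can be sketched with a stand-in client. Everything outside the tool and parameter names is an assumption made for illustration: the call_tool callable, the StaleSnapshotError exception, the snapshotId and nodes keys in the snapshot result, and the FakeClient that simulates one stale failure.

```python
class StaleSnapshotError(Exception):
    """Hypothetical error raised by a client wrapper when a snapshot has expired."""


def click_ref_with_retry(call_tool, pick_ref, max_attempts=2):
    """Snapshot, click the chosen ref, and retake the snapshot if it went stale."""
    last_error = None
    for _ in range(max_attempts):
        snapshot = call_tool("mcp_browser_snapshot", {})
        arguments = {"snapshotId": snapshot["snapshotId"], "ref": pick_ref(snapshot)}
        try:
            return call_tool("mcp_browser_click_ref", arguments)
        except StaleSnapshotError as error:
            last_error = error  # refs expired; loop to retake the snapshot
    raise last_error


class FakeClient:
    """Stand-in for a real MCP client: the first click fails as stale."""

    def __init__(self):
        self.snapshots_taken = 0

    def call_tool(self, name, arguments):
        if name == "mcp_browser_snapshot":
            self.snapshots_taken += 1
            return {"snapshotId": f"s{self.snapshots_taken}",
                    "nodes": [{"ref": "e5", "name": "Submit"}]}
        if arguments["snapshotId"] == "s1":
            raise StaleSnapshotError("snapshot s1 expired")
        return {"clicked": arguments["ref"], "snapshotId": arguments["snapshotId"]}


client = FakeClient()
result = click_ref_with_retry(client.call_tool, lambda snap: snap["nodes"][0]["ref"])
print(result, client.snapshots_taken)
```

The ref is looked up again from each fresh snapshot rather than reused, since a ref from an expired snapshot isn't guaranteed to point at the same node.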
mcp_browser_type_ref
Types text into an element by using the ref ID from mcp_browser_snapshot. The element is focused first, and existing text is cleared by default. The operation fails if the snapshot expires.
Required parameters:
- snapshotId: Snapshot ID returned by mcp_browser_snapshot
- ref: Element ref (for example, e5) from the snapshot nodes
- text: Text to type
Optional parameters:
- clear: Clear existing text first (default true)
Returns a confirmation that includes the character count.
mcp_browser_hover_ref
Hovers over an element by using the ref ID from mcp_browser_snapshot. Returns immediately. The operation fails if the snapshot expires - retake the snapshot in that case.
Required parameters:
- snapshotId: Snapshot ID returned by mcp_browser_snapshot
- ref: Element ref (for example, e5) from the snapshot nodes
Returns a confirmation including the hover coordinates.
mcp_accessibility_get_accessibility_tree
Retrieves the UI element tree for the foreground window. Each element includes its role, name, value, and screen coordinates.
Optional parameters:
- maxDepth: Maximum tree traversal depth, 1-10 (default 3)
- maxElements: Maximum elements to return, 1-2000 (default 500)
Returns a hierarchical tree of {role, name, value, x, y, width, height, children[...]}.
mcp_browser_keypress_ref
Presses a single key on an element by ref ID from mcp_browser_snapshot. The element is focused first. Supports modifier keys. Fails if the snapshot has expired — retake the snapshot in that case.
Required parameters:
- snapshotId: Snapshot ID returned by mcp_browser_snapshot
- ref: Element ref (for example, e5) from the snapshot nodes
- key: Key name (for example, Enter, Escape, Tab, ArrowUp, ArrowDown, or F1–F12)
Optional parameters:
- modifiers: Array of modifier keys to hold during the press: Ctrl, Shift, Alt, or Meta
Returns a text confirmation.
mcp_browser_scroll_ref
Scrolls an element into view by ref ID from mcp_browser_snapshot. Optionally, scrolls by a pixel delta within the element. Fails if the snapshot expires.
Required parameters:
- snapshotId: Snapshot ID returned by mcp_browser_snapshot
- ref: Element ref (for example, e5) from the snapshot nodes
Optional parameters:
- deltaX: Horizontal scroll delta in pixels (default 0)
- deltaY: Vertical scroll delta in pixels (default 0)
Returns a text confirmation.
mcp_browser_set_file_input_ref
Sets files on a file input element by ref ID from mcp_browser_snapshot. File paths are restricted to the user's Documents, Downloads, Desktop, or %TEMP% directories.
Required parameters:
- snapshotId: Snapshot ID returned by mcp_browser_snapshot
- ref: Element ref for the file input
- filePaths: Array of file paths to upload
Returns a text confirmation.
mcp_accessibility_find_ui_element
Searches for UI elements by text content, accessibility role, or name (case-insensitive substring). Returns matching elements with their clickable screen coordinates.
Optional parameters:
- text: Text to search for (used as name if name omitted)
- role: UI role filter (Button, TextBox, CheckBox, MenuItem, ComboBox, and more)
- name: Accessible name (takes precedence over text if both are provided)
- windowHandle: Target window handle (null = foreground window)
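The interplay of text, role, and name can be sketched as a small argument builder. The helpers below are hypothetical; they encode the precedence described in the parameter list and the requirement, stated in the notes on this page, that at least one of text, role, or name is supplied.

```python
def find_element_arguments(text=None, role=None, name=None, window_handle=None) -> dict:
    """Build arguments for mcp_accessibility_find_ui_element, omitting unset fields."""
    if text is None and role is None and name is None:
        raise ValueError("provide at least one of text, role, or name")
    candidates = (("text", text), ("role", role), ("name", name),
                  ("windowHandle", window_handle))
    return {key: value for key, value in candidates if value is not None}


def effective_name(text=None, name=None):
    """Name wins when both are given; text is used as the name otherwise."""
    return name if name is not None else text


print(find_element_arguments(role="MenuItem", name="Settings"))
print(effective_name(text="OK", name="Confirm"))  # Confirm
```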
Key features
Desktop interaction
- Click, double-click, right-click, and five-button mouse control.
- Pixel-precise drag and drop.
- Notch-based scrolling (three notches ≈ one page).
- Keyboard typing and multi-key shortcut combos.
- Cursor position tracking.
- Screen resolution detection.
Screen capture and analysis
- Full-screen or cropped PNG screenshots.
- OCR of the full screen with per-region confidence scores and bounding boxes.
- Browser-viewport-only screenshots for web content.
Window management
- Enumerate all visible windows with positions and dimensions.
- Activate windows by fuzzy title match.
- Focus browser windows (Edge, Chrome, Firefox) optionally filtered by URL or title.
- Graceful window close with protection for system-critical processes.
Command execution
- Sandboxed shell commands with an allow list (git, npm, dotnet, python, cargo, node, pip, dir, mkdir, del, copy, move, robocopy, findstr, where, type).
- Sandboxed Python execution up to 262,144 characters of code.
- Working-directory and per-call timeout control (max 30 seconds).
- Resource limits and hardened block list against shell metacharacters, eval flags, privilege escalation, and destructive operations.
Browser automation
- Navigate, back, forward, reload, and configurable wait conditions on navigation (load, networkidle0, networkidle2).
- Read page URL, title, visible text (512 KB cap), and full HTML (512 KB cap).
- Consolidated page state retrieval — URL, title, DOM, screenshot, and tab list in a single call.
- DOM-level click, type, form fill, drag, and <select> option selection by CSS selector.
- Accessibility-snapshot-based interaction by ref ID: click, type, hover, keypress with modifiers, scroll, and file-input upload.
- Wait for dynamic elements with configurable timeout, optionally requiring visibility.
- Evaluate JavaScript expressions in the page context.
- Multi-tab management: list, switch, open one or many at once, and close.
- Cookie inspection (values redacted) and assignment on the current domain.
- Batched action execution — sequence multiple browser steps in one call, stopping on first failure.
- Save the current page as a PDF under %USERPROFILE% or %TEMP%.
- Dialog handling for alert, confirm, prompt, and beforeunload.
- Runs on Microsoft Edge, launched automatically on first use.
UI accessibility
- Retrieve the Windows UI Automation tree for the foreground window with configurable depth and element count.
- Find UI elements by text, role, or accessible name.
- Returns clickable screen coordinates for precise targeting of buttons, text boxes, checkboxes, menu items, and combo boxes.
Timing and synchronization
- Use mcp_desktop_wait_milliseconds for short one-shot pauses (up to five seconds).
- Use mcp_browser_wait_for for DOM-level polling (up to 30 seconds).
Notes
- All coordinates are in screen pixels with (0,0) at the top-left corner. Coordinates from mcp_desktop_take_screenshot, mcp_desktop_analyze_screen, mcp_accessibility_find_ui_element, and mcp_desktop_list_windows all share the same coordinate space.
- A cursor failsafe is active: if the cursor moves within five pixels of any screen corner, mouse operations are canceled. Avoid targeting the extreme edges of the screen.
- Shell pipe operators (|), semicolons (;), ampersands (&), and output redirection (>, <) are blocked. To transform command output, capture it and process it with mcp_desktop_execute_python_code.
- Interpreter eval flags are blocked, so python -c "..." and node -e "..." are rejected. Use mcp_desktop_execute_python_code for Python code, or write the code to a file first.
- Command stdout and stderr are truncated at 32 KB each. Use flags to limit verbose output (for example, git log --oneline -20), or have the command write its output to a file (shell redirection is blocked) and read the file separately.
- The maximum timeout for mcp_desktop_execute_shell_command and mcp_desktop_execute_python_code is 30 seconds. For longer work, break it into smaller steps or launch a background process from Python and poll.
- There's no dedicated file read/write tool. Read files with mcp_desktop_execute_shell_command using the type command. Write files with mcp_desktop_execute_python_code using Python's built-in file I/O. Shell output redirection (>, >>) is blocked.
- mcp_browser_eval_js always returns a string. Convert objects or numbers explicitly before returning.
- Browser DOM tools (mcp_browser_click, mcp_browser_type, mcp_browser_eval_js, and others) operate only on the Microsoft Edge instance. mcp_desktop_focus_browser can focus Chrome or Firefox windows, but DOM tools don't target them.
- mcp_desktop_take_screenshot requires all four crop parameters (x, y, width, height) together, or none for a full-screen capture.
- mcp_desktop_scroll uses notch units (clamped to [-20, 20]), not pixels. Three notches are approximately one page.
- mcp_accessibility_find_ui_element requires at least one of text, role, or name. When both text and name are provided, name takes precedence.
- mcp_browser_snapshot refs expire on navigation. If a _ref tool (click, type, hover, keypress, scroll, or set file input) fails because the snapshot is stale, retake the snapshot and retry.
- mcp_browser_set_file_input_ref only accepts file paths under the user's Documents, Downloads, Desktop, or %TEMP% directories. Files outside those locations are rejected.
- mcp_browser_get_cookies always returns redacted cookie values. Use it for inspection: names, domains, paths, and flags are returned in full, but values aren't exposed.
- mcp_browser_set_cookies only adds or overwrites cookies. It doesn't clear existing cookies. To remove a cookie, overwrite it with an expired expires value via this tool, or clear it through the page itself.
- mcp_browser_execute_batch stops on the first failed action and returns only the results collected up to that point. Subsequent actions in the array aren't attempted. Allowed batch actions are limited to navigate, snapshot, click_ref, type_ref, hover_ref, scroll_ref, keypress_ref, wait_for, and eval_js.
- mcp_browser_create_tabs opens tabs in the order provided. If foregroundIndex is omitted, focus stays on the currently active tab.
- mcp_browser_get_page_state only returns the fields listed in the fields array. Request only what you need, because including dom or screenshot can produce large payloads.
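The cursor failsafe described in these notes can be anticipated with a simple geometric check. The sketch below interprets "within five pixels of any screen corner" as being within five pixels of the corner on both axes, which is an assumption about how the server measures distance; near_corner is a hypothetical helper.

```python
FAILSAFE_MARGIN = 5  # documented distance from a corner, in pixels


def near_corner(x: int, y: int, screen_width: int, screen_height: int,
                margin: int = FAILSAFE_MARGIN) -> bool:
    """Return True if (x, y) lies within margin pixels of a screen corner on both axes."""
    near_left_or_right = x <= margin or x >= screen_width - 1 - margin
    near_top_or_bottom = y <= margin or y >= screen_height - 1 - margin
    return near_left_or_right and near_top_or_bottom


# Screen size would come from mcp_desktop_get_screen_size; 1920x1080 is illustrative.
print(near_corner(2, 3, 1920, 1080))        # True: top-left corner
print(near_corner(960, 2, 1920, 1080))      # False: top edge, but not a corner
print(near_corner(1917, 1078, 1920, 1080))  # True: bottom-right corner
```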
Common use cases
Fill out a web form
- Call mcp_browser_navigate to open the target page.
- Call mcp_browser_wait_for to wait for the form to load.
- Call mcp_browser_type to fill each field by CSS selector.
- Call mcp_browser_click to submit the form.
- Call mcp_browser_wait_for to wait for the confirmation element.
- Call mcp_browser_get_text to read and verify the result.
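The steps above can be sketched as a sequence of MCP tools/call requests, which use the JSON-RPC 2.0 envelope. The URL and CSS selectors are hypothetical placeholders, tool_call is an illustrative helper, and transport details such as authentication and session setup are out of scope here.

```python
import itertools
import json

_request_ids = itertools.count(1)


def tool_call(name: str, arguments: dict = None) -> dict:
    """Wrap a tool invocation in an MCP tools/call JSON-RPC request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# Hypothetical page and selectors, for illustration only.
form_steps = [
    tool_call("mcp_browser_navigate", {"url": "https://example.com/signup"}),
    tool_call("mcp_browser_wait_for", {"selector": "form#signup"}),
    tool_call("mcp_browser_type", {"selector": "#email", "text": "user@example.com"}),
    tool_call("mcp_browser_click", {"selector": "#submit-btn"}),
    tool_call("mcp_browser_wait_for", {"selector": ".confirmation", "timeoutMs": 10000}),
    tool_call("mcp_browser_get_text"),
]

print(json.dumps(form_steps[0], indent=2))
```

Each request would be sent only after the previous one succeeds, so that a failed wait stops the flow before the form is submitted.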
Automate a desktop application
- Call mcp_desktop_activate_window to bring the application to the foreground.
- Call mcp_desktop_take_screenshot to capture the current state.
- Call mcp_accessibility_find_ui_element to locate a button or field by name.
- Call mcp_desktop_click on the element's reported coordinates.
- Call mcp_desktop_type_text to enter data.
- Call mcp_desktop_press_keys for shortcuts (for example, ["ctrl","s"] to save).
- Call mcp_desktop_take_screenshot to verify the result.
Extract data from a web page
- Call mcp_browser_navigate to open the page.
- Call mcp_browser_get_text to extract visible text content.
- Call mcp_desktop_execute_python_code to parse and process the extracted data.
- Call mcp_browser_eval_js to query specific values via JavaScript when text extraction isn't enough.
Run development tasks
- Call mcp_desktop_execute_shell_command for git pull, npm install, and dotnet build.
- Call mcp_desktop_take_screenshot to capture build output.
- Call mcp_desktop_execute_python_code to analyze logs or test results.
- Call mcp_browser_navigate to open a local dev server in the browser.
- Call mcp_browser_screenshot to capture the rendered page.
Read and write files
- Read a file by using mcp_desktop_execute_shell_command with type C:\path\to\file.txt.
- Write a file by using mcp_desktop_execute_python_code with Python's open(...) and write(...).
- Verify by using mcp_desktop_execute_shell_command with dir C:\path\to\output.txt.
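The write, read, and verify steps above might be prepared like this. The file location reuses the illustrative C:/Users/me/project directory and is hypothetical, as is the python_code_arguments helper; the sketch only builds the argument objects and doesn't call the server.

```python
MAX_CODE_CHARS = 262_144  # documented limit for the code parameter

# Python source to run remotely; it writes a small text file and prints its size.
WRITE_CODE = '''
from pathlib import Path

target = Path("C:/Users/me/project/output.txt")  # hypothetical path
target.write_text("hello from the agent\\n", encoding="utf-8")
print(target.stat().st_size)
'''


def python_code_arguments(code: str, cwd: str = None) -> dict:
    """Build arguments for mcp_desktop_execute_python_code, checking the size limit."""
    if len(code) > MAX_CODE_CHARS:
        raise ValueError("code exceeds the documented 262,144-character limit")
    arguments = {"code": code}
    if cwd is not None:
        arguments["cwd"] = cwd
    return arguments


write_call = python_code_arguments(WRITE_CODE, cwd="C:/Users/me/project")
read_call = {"command": "type C:\\Users\\me\\project\\output.txt"}
verify_call = {"command": "dir C:\\Users\\me\\project\\output.txt"}

print(write_call["cwd"])
print(read_call["command"])
```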
Navigate complex UI with accessibility
- Call mcp_accessibility_get_accessibility_tree to understand the full UI structure.
- Call mcp_accessibility_find_ui_element to find a specific control (for example, role: "MenuItem", name: "Settings").
- Call mcp_desktop_click using the element's reported coordinates.
- Call mcp_accessibility_find_ui_element again to find the next control in the dialog.
- Call mcp_desktop_type_text or mcp_desktop_click to interact with it.
Keep a long-running session alive
- Send any MCP request at least once every 30 minutes to prevent idle eviction.
- mcp_desktop_get_screen_size is lightweight and works well as a heartbeat.
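A heartbeat along these lines could be tracked with a small timer object. In this sketch the 25-minute interval is an assumed safety margin under the 30-minute limit, the Heartbeat class and its send callable are hypothetical, and a fake clock stands in for real time so the example runs instantly.

```python
import time

IDLE_LIMIT_S = 30 * 60          # idle window stated above
HEARTBEAT_INTERVAL_S = 25 * 60  # assumed margin below the idle window


class Heartbeat:
    """Sends a lightweight request when no request has been made for a while."""

    def __init__(self, send, interval_s=HEARTBEAT_INTERVAL_S, clock=time.monotonic):
        self._send = send
        self._interval_s = interval_s
        self._clock = clock
        self._last_request = clock()

    def note_request(self):
        """Record that some MCP request was just sent."""
        self._last_request = self._clock()

    def tick(self) -> bool:
        """Send a heartbeat if the interval has elapsed; return True if one was sent."""
        if self._clock() - self._last_request < self._interval_s:
            return False
        self._send("mcp_desktop_get_screen_size", {})
        self.note_request()
        return True


sent = []
now = [0.0]  # fake clock value, in seconds
heartbeat = Heartbeat(lambda name, arguments: sent.append(name), clock=lambda: now[0])

now[0] = 10 * 60
first = heartbeat.tick()   # too early, nothing sent
now[0] = 26 * 60
second = heartbeat.tick()  # interval elapsed, heartbeat sent
print(first, second, sent)
```

Calling note_request after every regular tool call keeps the heartbeat from firing while the session is already active.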