Motivation
The @remix-run/multipart-parser is exceptionally fast and efficient for small-to-medium sized payloads. However, its current implementation buffers the entirety of each file part into memory before yielding it to the consumer. This behavior prevents what is arguably the most critical use case for a streaming parser: handling large file uploads in a memory-constrained environment.
This issue proposes introducing a true, end-to-end streaming API for file parts to make the parser robust for all use cases and align its implementation with the "Memory Efficient" promise in the README.
The Core Issue: Unbounded Memory Buffering
In a real-world test on a system with 16GB of RAM, the current buffering behavior becomes a critical bottleneck:
- Uploading a 1GB file caused the process's memory usage to spike to over 1GB.
- Attempting to upload a 2.5GB file exhausted all available system memory, crashing the process.
- In contrast, a library like `busboy` on the same system handled a 20GB file upload with a stable memory footprint of ~700MB.
The current API design encourages this memory-intensive pattern, as the entire file's content is loaded into `part.bytes` before it can be processed:
```ts
for await (let part of parseMultipartRequest(request)) {
  if (part.isFile) {
    // By the time this loop yields a `part`, its entire content is already
    // buffered in `part.bytes`, causing memory usage to spike to the size of the file.
    await saveFile(part.filename, part.bytes);
  }
}
```

This effectively turns a streaming transport layer into a buffered-per-part implementation at the application layer, negating the benefits of streaming for large files.
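For comparison, the chunked pattern this issue asks for can be sketched independently of the parser: consume a web `ReadableStream` one chunk at a time (here, feeding an incremental SHA-256 hash), so peak memory stays near a single chunk regardless of payload size. The stream below is a synthetic stand-in for a file part's content, using only standard Node APIs:

```typescript
import { createHash } from 'node:crypto';

// Synthetic stand-in for a large file part: 64 chunks of 1 MiB each.
const CHUNK = new Uint8Array(1024 * 1024);
const source = new ReadableStream<Uint8Array>({
  start(controller) {
    for (let i = 0; i < 64; i++) controller.enqueue(CHUNK);
    controller.close();
  },
});

// Chunked consumption: each chunk is folded into the hash and released,
// so peak memory is roughly one chunk, not the whole 64 MiB payload.
const hash = createHash('sha256');
let total = 0;
const reader = source.getReader();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  hash.update(value);
  total += value.byteLength;
}

console.log(`hashed ${total} bytes:`, hash.digest('hex').slice(0, 16));
```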
Steps to Reproduce
The memory exhaustion issue can be reliably reproduced using the `bun-large-file` demo within this repository.
- Clone the repository and navigate to the demo:

  ```sh
  git clone https://github.com/remix-run/remix.git
  cd remix/packages/multipart-parser/demos/bun-large-file
  ```

- Use a minimal server to isolate the issue: ensure `server.ts` uses the standard `parseMultipartRequest` and accesses `part.bytes`.

  ```ts
  // packages/multipart-parser/demos/bun-large-file/server.ts
  import { parseMultipartRequest } from '@remix-run/multipart-parser'
  import * as fs from 'fs/promises'
  import * as path from 'path'

  const UPLOAD_DIR = path.resolve(__dirname, 'uploads')
  await fs.mkdir(UPLOAD_DIR, { recursive: true })

  Bun.serve({
    port: 3001,
    maxRequestBodySize: Infinity,
    async fetch(request) {
      if (request.method === 'POST') {
        try {
          for await (let part of parseMultipartRequest(request, { maxFileSize: Infinity })) {
            if (part.isFile) {
              const filePath = path.join(UPLOAD_DIR, part.filename!)
              // This line buffers the entire file into memory before writing.
              await fs.writeFile(filePath, part.bytes)
            }
          }
          return new Response('Upload complete', { status: 200 })
        } catch (error) {
          console.error(error)
          return new Response('Error', { status: 500 })
        }
      }
      return new Response('OK')
    },
  })

  console.log('Server listening on http://localhost:3001 ...')
  ```
- Install dependencies and start the server:

  ```sh
  pnpm install
  bun start
  ```
- Upload a file larger than available RAM:

  ```sh
  # Create a dummy 3GB file
  dd if=/dev/zero of=large_file.bin bs=1G count=3

  # Upload the file
  curl -X POST -F "file=@large_file.bin" http://localhost:3001
  ```
- Monitor memory usage: observe the `bun` process's memory consumption. It will grow linearly with the size of the upload, eventually leading to process or system instability.
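As a quicker alternative for the upload step, a sparse file created with `truncate` (GNU coreutils; not available everywhere — on macOS, `mkfile` or `dd` with `bs=1g` are the analogues) has the same apparent size without actually writing 3GB of zeros, and uploads identically:

```shell
# Create a sparse 3GB file almost instantly (GNU coreutils)
truncate -s 3G large_file.bin

# Verify the apparent size in bytes (3 * 1024^3 = 3221225472)
wc -c < large_file.bin
```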
Proposed Solution: A True Streaming API for Parts
To address this, the content of a file part should be exposed as a ReadableStream, allowing the consumer to process the file in chunks as they arrive. This keeps memory usage low and constant, regardless of file size.
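To make the chunked write path concrete (this is generic Node plumbing, not the library's API), a web `ReadableStream` can be piped straight into a file via `stream.Writable.toWeb`, so only the in-flight chunk is held in memory. The stream here is a small synthetic stand-in for a file part's content:

```typescript
import * as fs from 'node:fs';
import { Writable } from 'node:stream';

// Synthetic stand-in for a file part's content: three small chunks.
const encoder = new TextEncoder();
const pieces = ['alpha ', 'beta ', 'gamma'];
const stream = new ReadableStream<Uint8Array>({
  start(controller) {
    for (const piece of pieces) controller.enqueue(encoder.encode(piece));
    controller.close();
  },
});

// Adapt a Node write stream to a web WritableStream; pipeTo moves one
// chunk at a time and resolves once the file has been flushed and closed.
await stream.pipeTo(Writable.toWeb(fs.createWriteStream('upload-demo.txt')));

console.log(fs.readFileSync('upload-demo.txt', 'utf8')); // "alpha beta gamma"
```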
Proposal 1: Expose a ReadableStream on the MultipartPart
This approach is idiomatic in modern JavaScript and maintains the ergonomic `for await...of` API.
```ts
import * as fs from 'node:fs';
import { Writable } from 'node:stream';

for await (let part of parseMultipartRequest(request)) {
  if (part.isFile) {
    // Get a stream of the file content
    const stream = part.stream; // or part.contentStream

    // Pipe it directly to a file on disk or a cloud storage service.
    // (`Writable.toWeb` adapts the Node write stream, since `pipeTo`
    // requires a web WritableStream rather than a Node Writable.)
    await stream.pipeTo(Writable.toWeb(fs.createWriteStream(part.filename)));
  } else {
    // Non-file parts can still be buffered as they are typically small
    console.log(part.name, await part.text());
  }
}
```

Implementation Considerations:
- To prevent accidental buffering, accessing `.bytes` or `.text()` on a part that has had its stream consumed should throw an error.
- Conversely, accessing `.stream` after `.bytes` has been read should yield an empty stream or throw.
- This new property would only be necessary for file parts (`isFile === true`).
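A minimal sketch of that single-consumption guard, under assumed names (`StreamingPart`, `bytes()`, and `stream` are illustrative only, not the library's API): whichever accessor runs first marks the body consumed, and the other then throws.

```typescript
// Hypothetical sketch only: names and shapes are illustrative.
class StreamingPart {
  #consumed = false;
  #body: ReadableStream<Uint8Array>;

  constructor(body: ReadableStream<Uint8Array>) {
    this.#body = body;
  }

  // Hand out the raw stream, at most once.
  get stream(): ReadableStream<Uint8Array> {
    this.#markConsumed('stream');
    return this.#body;
  }

  // Convenience buffering path, also single-use.
  async bytes(): Promise<Uint8Array> {
    this.#markConsumed('bytes');
    const chunks: Uint8Array[] = [];
    const reader = this.#body.getReader();
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      chunks.push(value);
    }
    const out = new Uint8Array(chunks.reduce((n, c) => n + c.byteLength, 0));
    let offset = 0;
    for (const c of chunks) {
      out.set(c, offset);
      offset += c.byteLength;
    }
    return out;
  }

  #markConsumed(via: string) {
    if (this.#consumed) throw new Error(`part body already consumed (tried .${via})`);
    this.#consumed = true;
  }
}

// Demo: buffering first, then attempting to stream, throws.
const part = new StreamingPart(new Response('hello').body!);
const data = await part.bytes();
console.log(data.byteLength); // 5

let threw = false;
try {
  void part.stream;
} catch {
  threw = true;
}
console.log(threw); // true
```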
Proposal 2: An Event-Driven API (like Busboy)
Alternatively, an event-based approach is a well-established pattern for memory-efficient stream processing.
```ts
import * as fs from 'node:fs';
import { Writable } from 'node:stream';

const parser = createStreamingMultipartParser(request);

parser.on('file', (filename, stream, contentType) => {
  // 'stream' is a ReadableStream of the file content
  console.log(`Receiving file: ${filename}`);
  stream.pipeTo(Writable.toWeb(fs.createWriteStream(filename)));
});

parser.on('field', (name, value) => {
  console.log(`Received field: ${name} = ${value}`);
});

await parser.done();
```

This pattern, while a larger departure from the current API, is proven to be highly effective for this use case.
Conclusion
Implementing a true streaming primitive for file parts would solidify @remix-run/multipart-parser's position as a top-tier solution. It would combine its already benchmarked speed with the memory safety required for modern, production-grade applications, making it a clear and compelling choice for all multipart parsing needs.