Provide a link to your weights, and we autonomously set up a production endpoint for you, usually within an hour. An agent handles container setup, GPU selection, and deployment for any common or custom model.
If you would like to try Moonshine, please reach out.
FAQ
What are the limits of “deploy every model”?
Moonshine deploys your model as a callable endpoint on serverless GPUs that scale with load. It's a model behind an API, not a backend, so it does not support long-running processes or persistent data.
What's the full workflow?
- Upload your model code + weights to GitHub, Hugging Face, etc.
- Provide the link and start the deployment process. Moonshine builds the image, selects the hardware, and deploys the model for you.
- Get an email with your endpoint and generated API documentation when it's live.
- Update the API, change the hardware, or redeploy afterwards.
What model types/frameworks do you support?
Any of them. Moonshine is framework and architecture agnostic. The only requirement is the shape, not the framework: the model runs on a GPU and works in a single call with inputs in and results out.
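To make the "single call" shape concrete, here is a minimal sketch in Python. It is an illustration only: the `predict` function name, the dict-in/dict-out signature, and the `EchoModel` stand-in are assumptions for this example, not Moonshine's actual interface.

```python
from typing import Any


class EchoModel:
    """Hypothetical stand-in for a real model; a real one would run on a GPU."""

    def __call__(self, text: str) -> str:
        return text.upper()


# Created once at import time, not once per request.
_model = EchoModel()


def predict(inputs: dict[str, Any]) -> dict[str, Any]:
    """Inputs in, results out: one call, no state carried between calls."""
    return {"output": _model(inputs["text"])}
```

Calling `predict({"text": "hi"})` returns `{"output": "HI"}`. Anything that fits this pattern, whatever the framework behind it, matches the shape described above.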
What GPUs do you support?
| GPU | VRAM |
|---|---|
| B300 | 288 GB |
| B200 | 180 GB |
| H200 | 141 GB |
| RTX Pro 6000 | 96 GB |
| H100 NVL | 94 GB |
| H100 SXM | 80 GB |
| H100 PCIe | 80 GB |
| A100 SXM | 80 GB |
| A100 PCIe | 80 GB |
| L40S | 48 GB |
| L40 | 48 GB |
| RTX 6000 Ada | 48 GB |
| RTX A6000 | 48 GB |
| A40 | 48 GB |
| RTX 5090 | 32 GB |
| RTX 4090 | 24 GB |
| RTX 3090 | 24 GB |
| L4 | 24 GB |
| RTX A5000 | 24 GB |
| Your own compute | |
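As a rough way to read the table, the sketch below finds the smallest VRAM tier that can hold a model's weights. It is not Moonshine's selection logic: the function names are hypothetical, and the estimate counts weights only (parameters times bytes per parameter), ignoring activations, caches, and framework overhead, so real requirements will be higher.

```python
# VRAM figures (GB) copied from the table above.
GPU_VRAM_GB = {
    "B300": 288, "B200": 180, "H200": 141, "RTX Pro 6000": 96, "H100 NVL": 94,
    "H100 SXM": 80, "H100 PCIe": 80, "A100 SXM": 80, "A100 PCIe": 80,
    "L40S": 48, "L40": 48, "RTX 6000 Ada": 48, "RTX A6000": 48, "A40": 48,
    "RTX 5090": 32, "RTX 4090": 24, "RTX 3090": 24, "L4": 24, "RTX A5000": 24,
}


def weights_size_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Weights-only estimate; 2 bytes per parameter corresponds to 16-bit weights."""
    return num_params * bytes_per_param / 1e9


def smallest_fitting_gpus(required_gb: float) -> list[str]:
    """Return every GPU in the smallest VRAM tier that is >= required_gb."""
    fitting = [vram for vram in GPU_VRAM_GB.values() if vram >= required_gb]
    if not fitting:
        return []  # nothing in the table is large enough on a single GPU
    tier = min(fitting)
    return [name for name, vram in GPU_VRAM_GB.items() if vram == tier]
```

For example, a 7-billion-parameter model in 16-bit precision needs about 14 GB for weights, so `smallest_fitting_gpus(weights_size_gb(7e9))` returns the 24 GB tier: `["RTX 4090", "RTX 3090", "L4", "RTX A5000"]`.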
Where does my model run?
Moonshine provisions serverless GPUs from a network of compute providers and auto-selects the right hardware. Nothing is stored.
How much does it cost?
Pricing is usage-based. Contact us to learn more.
Who is Moonshine?
We previously built video models and spent too much time on deployment. We're backed by Y Combinator and a number of angels.