Plans

Explaining how our different subscription tiers work.

Featherless provides serverless access to models as well as agent runtimes, enabling you to power AI applications without managing infrastructure.

Our plans are subscription-based and concurrency-based: each plan allows unlimited monthly requests, with a fixed cap on the number of concurrent requests. Subscription tiers differ by the model sizes and context lengths offered.

Featherless offers two groups of consumer plans: two smaller plans for interactive chat (assistants and role-playing), and two larger plans for agentic inference and coding.

Consumer Plans

*Plan*	*Tier*	*Price (/month)*	*Features*
Featherless Basic	Chat	$10	Private, secure, and anonymous usage (no logs); use any model up to 15B parameters; 2 concurrent connections*
Featherless Premium	Chat	$25	Private, secure, and anonymous usage (no logs); access *any* model in the catalogue (including K2.5 & GLM 5.1); up to 4 concurrent connections*
Featherless Agent Standard	Agentic	$100	Private, secure, and anonymous usage (no logs); access any model up to 229B (e.g. StepFun, MiniMax); up to 8 concurrent connections*; up to 256K context; one agent sandbox
Featherless Agent Standard	Agentic	$200	Private, secure, and anonymous usage (no logs); access *any* model in the catalogue (including DeepSeek, K2.5 & GLM 5.1); up to 8 concurrent connections*; up to 256K context; one agent sandbox

*Smaller models allow for higher concurrency than larger models. See the Concurrency Limits page linked below for details.
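
Because plans cap concurrent connections rather than total requests, a client usually wants to keep its own in-flight calls at or below the plan's limit. The following is a minimal sketch of that idea using a semaphore; it is not an official client. `fake_request` is a hypothetical stand-in for a real inference call, and `MAX_CONCURRENCY = 2` simply mirrors the Featherless Basic row above (the effective limit may differ by model size).

```python
import asyncio

# Assumption: 2 mirrors the Featherless Basic row in the table above.
MAX_CONCURRENCY = 2


async def fake_request(prompt: str) -> str:
    """Hypothetical stand-in for a real inference call."""
    await asyncio.sleep(0.01)  # simulate network and inference latency
    return f"response to {prompt!r}"


async def run_all(prompts, limit=MAX_CONCURRENCY, call=fake_request):
    """Run every prompt while keeping at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(prompt):
        # Each task waits here until one of the `limit` slots is free.
        async with semaphore:
            return await call(prompt)

    # gather preserves the order of the input prompts in its results.
    return await asyncio.gather(*(guarded(p) for p in prompts))


if __name__ == "__main__":
    results = asyncio.run(run_all([f"prompt {i}" for i in range(6)]))
    print(len(results))
```

With this pattern, the total number of requests can be arbitrarily large; only the number running at the same moment is bounded, which matches how the plans are metered.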

Business Plans

Business plans are scalable: users can purchase larger amounts of inference to power production applications, whether agent fleets or other AI applications.

*Plan*	*Price (/unit/month)*	*Features*
Featherless 229B Full Context	$100	Private, secure, and anonymous usage (no logs); access any model up to 229B (e.g. StepFun, MiniMax); up to 8 concurrent connections*; up to 256K context; one sandbox per unit
Featherless Max	$200	Private, secure, and anonymous usage (no logs); access *any* model in the catalogue (including DeepSeek, K2.5 & GLM 5.1); up to 8 concurrent connections*; up to 256K context; one sandbox per unit

*For more information on how the concurrency limits work, see:

Concurrency Limits

Explaining how subscription tiers translate to concurrent inference call maximums.

Last edited: May 7, 2026