whatcani.run is a benchmarking platform that provides real-world performance data for running Large Language Models (LLMs) locally. The service aggregates crowd-sourced results to help users identify which models their specific hardware can run efficiently, moving beyond theoretical benchmarks to verifiable results from community members.

Core metrics include decode speed, prefill speed, time to first token (TTFT), and peak memory usage. These metrics are standardized across trials using 4,096 input tokens and 1,024 output tokens. As of the current documentation, the database includes data from 18,611,200 tokens across 3,635 trials contributed by 161 individuals. For instance, a Qwen3.5-9B model (Unsloth q4_k_m) running on an M3 Pro chip shows a decode speed of 15.8 tok/s and peak memory usage of 2.93 GB.

Users can interact with the service via a web interface or a command-line tool by running 'npx whatcanirun'. Mentioned entities and tools include Unsloth, llama.cpp, Qwen3.5-9B, GitHub, and Hugging Face. The project is maintained by fiveoutofnine.
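To illustrate how the standardized trial metrics combine, here is a minimal sketch of the arithmetic behind an end-to-end trial time estimate. The function name is illustrative, and the 200 tok/s prefill speed is a hypothetical figure chosen for the example (only the 15.8 tok/s decode speed comes from the text); actual trial timings on the platform are measured, not derived this way.

```python
def estimate_trial_time(prefill_tps: float, decode_tps: float,
                        input_tokens: int = 4096,
                        output_tokens: int = 1024) -> float:
    """Rough end-to-end time (seconds) for one standardized trial:
    4,096 input tokens processed at the prefill speed, then
    1,024 output tokens generated at the decode speed."""
    prefill_s = input_tokens / prefill_tps   # time to process the prompt
    decode_s = output_tokens / decode_tps    # time to generate the output
    return prefill_s + decode_s

# Using the M3 Pro decode figure from the text (15.8 tok/s) and a
# hypothetical prefill speed of 200 tok/s:
total = estimate_trial_time(prefill_tps=200.0, decode_tps=15.8)
print(f"{total:.1f} s")  # ≈ 85.3 s for the full standardized trial
```

This back-of-the-envelope model also shows why decode speed dominates user experience for long generations: at these rates, decoding accounts for roughly three quarters of the total trial time.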
Timeln saves articles, videos, and posts — then summarizes, tags, and connects them so you never lose a good find again.
- Save anything: one click
- AI summaries: instant
- Connected ideas: automatic