whatcani.run is a benchmarking platform that provides real-world performance data for running Large Language Models (LLMs) locally. The service aggregates crowd-sourced results to help users identify which models their specific hardware can run efficiently, moving beyond theoretical benchmarks to verifiable results from community members.

Core metrics include decode speed, prefill speed, time to first token (TTFT), and peak memory usage. These metrics are standardized across trials using 4,096 input tokens and 1,024 output tokens. As of the current documentation, the database includes data from 18,611,200 tokens across 3,635 trials contributed by 161 individuals. For instance, a Qwen3.5-9B model (Unsloth q4_k_m) running on an M3 Pro chip shows a decode speed of 15.8 tok/s and peak memory usage of 2.93 GB.

Users can interact with the service via a web interface or a command-line tool by running 'npx whatcanirun'. Mentioned entities and tools include Unsloth, llama.cpp, Qwen3.5-9B, GitHub, and Hugging Face. The project is maintained by fiveoutofnine.
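To illustrate how the standardized trial metrics combine, here is a minimal sketch of the arithmetic behind an end-to-end trial time estimate. The function name is illustrative, and the 200 tok/s prefill speed is a hypothetical figure chosen for the example (only the 15.8 tok/s decode speed comes from the text); actual trial timings on the platform are measured, not derived this way.

```python
def estimate_trial_time(prefill_tps: float, decode_tps: float,
                        input_tokens: int = 4096,
                        output_tokens: int = 1024) -> float:
    """Rough end-to-end time (seconds) for one standardized trial:
    4,096 input tokens processed at the prefill speed, then
    1,024 output tokens generated at the decode speed."""
    prefill_s = input_tokens / prefill_tps   # time to process the prompt
    decode_s = output_tokens / decode_tps    # time to generate the output
    return prefill_s + decode_s

# Using the M3 Pro decode figure from the text (15.8 tok/s) and a
# hypothetical prefill speed of 200 tok/s:
total = estimate_trial_time(prefill_tps=200.0, decode_tps=15.8)
print(f"{total:.1f} s")  # ≈ 85.3 s for the full standardized trial
```

This back-of-the-envelope model also shows why decode speed dominates user experience for long generations: at these rates, decoding accounts for roughly three quarters of the total trial time.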
Timeln saves articles, videos, and posts — then summarizes, tags, and connects them so you never lose a good find again.
- Save anything: one click
- AI summaries: instant
- Connected ideas: automatic