
    whatcani.run: Local LLM Model Performance and Compatibility Guide

    local llm · model benchmarking · hardware performance · inference optimization · token metrics
    April 7, 2026 · Source

    whatcani.run is a benchmarking platform that provides real-world performance data for running Large Language Models (LLMs) locally. The service aggregates crowd-sourced results to help users identify which models their specific hardware can run efficiently. The core metrics analyzed are decode speed, prefill speed, time to first token (TTFT), and peak memory usage. The platform is designed to give practical insight into local inference, moving beyond theoretical benchmarks to verifiable results contributed by the community.

    Metrics are standardized across trials at 4,096 input tokens and 1,024 output tokens. As of the current documentation, the database includes data from 18,611,200 tokens across 3,635 trials contributed by 161 individuals. For instance, testing a Qwen3.5-9B model (Unsloth q4_k_m) on an M3 Pro chip shows a decode speed of 15.8 tok/s and peak memory usage of 2.93 GB. Users can interact with the service via a web interface or a command-line tool by running 'npx whatcanirun'. Mentioned entities and tools include Unsloth, llama.cpp, Qwen3.5-9B, GitHub, and Hugging Face. The project is maintained by fiveoutofnine.
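    To make the headline numbers concrete, the sketch below shows how per-trial timings could translate into the reported metrics. This is a minimal TypeScript illustration under stated assumptions: the trial record shape, field names, and the TTFT approximation (prefill time plus one decode step) are hypothetical, not whatcani.run's actual data model or CLI output.

```ts
// Hypothetical trial record; field names are illustrative only.
interface TrialTimings {
  inputTokens: number;    // whatcani.run standardizes this to 4,096
  outputTokens: number;   // standardized to 1,024
  prefillSeconds: number; // time spent processing the prompt
  decodeSeconds: number;  // time spent generating output tokens
  peakMemoryGb: number;   // peak memory observed during the run
}

interface TrialMetrics {
  prefillTokPerSec: number;
  decodeTokPerSec: number;
  ttftSeconds: number;
  peakMemoryGb: number;
}

function computeMetrics(t: TrialTimings): TrialMetrics {
  const decodeTokPerSec = t.outputTokens / t.decodeSeconds;
  return {
    prefillTokPerSec: t.inputTokens / t.prefillSeconds,
    decodeTokPerSec,
    // Assumption: TTFT ≈ prompt processing time + one decode step.
    ttftSeconds: t.prefillSeconds + 1 / decodeTokPerSec,
    peakMemoryGb: t.peakMemoryGb,
  };
}

// Example with numbers in the ballpark of the M3 Pro / Qwen3.5-9B figures above
// (prefill time is made up; only decode speed and memory match the article).
const example = computeMetrics({
  inputTokens: 4096,
  outputTokens: 1024,
  prefillSeconds: 30,
  decodeSeconds: 1024 / 15.8, // implies ~15.8 tok/s decode
  peakMemoryGb: 2.93,
});

console.log(
  `decode: ${example.decodeTokPerSec.toFixed(1)} tok/s, ` +
  `TTFT: ${example.ttftSeconds.toFixed(1)} s, ` +
  `peak memory: ${example.peakMemoryGb} GB`
);
```

    Standardizing every trial to the same 4,096-in / 1,024-out workload is what makes crowd-sourced numbers comparable across different machines and quantizations; the tok/s figures are then simple ratios of token counts to phase timings, as sketched above.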
