CodeSOTA
A continuation of the Papers With Code idea: task pages, model rankings, evidence notes, and practical guidance for choosing the right model.
- Benchmarks by task
- Editorial model selection
- Evidence over vendor claims
Neolab · applied AI systems
A small independent lab behind CodeSOTA, Hardparse, and Hermes SDK. We benchmark models, turn the reliable parts into tools, and run the experiments on our own GPU box when cloud credits stop being funny.
- Small lab, public artifacts
- Benchmarks before marketing
- Useful prototypes over decks
- One 3090 counts as a cluster if it has a queue
What this is
Fabryka.ai is the container for experiments that need a public face: model selection research, document AI infrastructure, and agentic tooling. The output is deliberately concrete: benchmark pages, SDKs, parsers, demos, and working services.
Portfolio
A continuation of the Papers With Code idea: task pages, model rankings, evidence notes, and practical guidance for choosing the right model.
Document parsing and OCR infrastructure spun out of benchmark work: receipts, PDFs, tables, layouts, and model comparisons.
An ongoing SDK and runtime track for agentic systems, tool use, local execution, routing, and practical automation research.
Our own inference router: a single endpoint that sends each request to the right model and provider, trading off quality, latency, and cost across open-weight and hosted backends.
An AI coworker that lives inside Microsoft Teams: it reads threads and file attachments, answers in-channel, and runs real work through the Graph API instead of being yet another standalone chatbot.
A 48-item benchmark for Polish bureaucratic reasoning across 11 domains, scored with a deterministic rubric instead of an LLM judge. Live leaderboard, frontier models ranked.
A Polish offline text-to-speech voice, fine-tuned from XTTS-v2 on our own RTX 3090 — the full pipeline from raw audio to a trained checkpoint, built in public.
Infrastructure
This is intentionally small. The point is not pretending to be a hyperscaler. The point is owning enough compute to reproduce runs, test open-weight models, compare OCR pipelines, and keep agent experiments honest before moving anything into hosted infrastructure.
Operating mode
CodeSOTA decides what is worth using. Hardparse turns document-model lessons into a focused OCR product. Hermes SDK explores how agents should call tools, execute work, and hand off between local and hosted compute.
Start a conversation
Open invitation
We’re looking for contributors — people who want to dig into genuinely interesting problems (benchmarks, parsers, agents, models) inside a small, helpful community. Bring taste and curiosity; we’ll bring the queue.
Join the bench
Want to hire the lab, start a company, or chase a weirder idea together? If it’s interesting and useful (applied AI, tooling, or something we haven’t thought of yet), we want to hear about it.
Start something