Neolab · applied AI systems

The weird useful layer between AI papers and working products.

A small independent lab behind CodeSOTA, Hardparse, and Hermes SDK. We benchmark models, turn the reliable parts into tools, and run the experiments on our own GPU box when cloud credits stop being funny.

fig. 01 · traction · codesota.com
trailing 12 mo: 45,233
Dec ’26 run-rate: 17.4K
→ 100K crossover: Jul ’27
f.01

Small lab, public artifacts

f.02

Benchmarks before marketing

f.03

Useful prototypes over decks

f.04

One 3090 counts as a cluster if it has a queue

What this is

Not an agency. Not a newsletter. A bench-to-product lab.

Fabryka.ai is the container for experiments that need a public face: model selection research, document AI infrastructure, and agentic tooling. The output is deliberately concrete: benchmark pages, SDKs, parsers, demos, and working services.

Portfolio

Active tracks

[ 07 / 07 ]
№ 01 · Benchmark page

CodeSOTA

A continuation of the Papers With Code idea: task pages, model rankings, evidence notes, and practical guidance for choosing the right model.

  • Benchmarks by task
  • Editorial model selection
  • Evidence over vendor claims
Open CodeSOTA
№ 02 · OCR spinoff

Hardparse

Document parsing and OCR infrastructure spun out of benchmark work: receipts, PDFs, tables, layouts, and model comparisons.

  • OCR pipelines
  • PDF and image parsing
  • Eval against messy documents
Open Hardparse
№ 03 · Agentic research

Hermes SDK

An ongoing SDK and runtime track for agentic systems, tool use, local execution, routing, and practical automation research.

  • Agent runtimes
  • Tool gateways
  • Local plus hosted execution
Follow research
№ 04 · Inference router

Router

Our own inference router: a single endpoint that sends each request to the right model and provider, trading off quality, latency, and cost across open-weight and hosted backends.

  • One endpoint, many models
  • Routes on quality, cost, latency
  • Open-weight + hosted backends
Open router
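
To make the quality, cost, and latency trade-off concrete, here is a minimal sketch of one way such a router could pick a backend. It is not the actual Router implementation: the `Backend` fields, the weights, the scoring formula, and the three example backends with their numbers are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    """Hypothetical backend description; every field is an illustrative assumption."""
    name: str
    quality: float      # 0..1, higher is better (for example, from a benchmark)
    cost_per_1k: float  # USD per 1K tokens
    latency_ms: float   # typical end-to-end latency


def pick_backend(backends, w_quality=1.0, w_cost=0.5, w_latency=0.5, min_quality=0.0):
    """Return the backend with the best weighted score.

    Cost and latency are scaled by the worst candidate so that all three
    terms live on a comparable 0..1 range before weighting.
    """
    candidates = [b for b in backends if b.quality >= min_quality]
    if not candidates:
        raise ValueError("no backend meets the quality floor")

    # `or 1.0` avoids dividing by zero when every candidate is free or instant.
    max_cost = max(b.cost_per_1k for b in candidates) or 1.0
    max_latency = max(b.latency_ms for b in candidates) or 1.0

    def score(b):
        return (
            w_quality * b.quality
            - w_cost * b.cost_per_1k / max_cost
            - w_latency * b.latency_ms / max_latency
        )

    return max(candidates, key=score)


# Made-up numbers, only to exercise the function.
BACKENDS = [
    Backend("local-open-weight", quality=0.70, cost_per_1k=0.0, latency_ms=900),
    Backend("hosted-small", quality=0.78, cost_per_1k=0.2, latency_ms=300),
    Backend("hosted-large", quality=0.92, cost_per_1k=3.0, latency_ms=1200),
]

balanced = pick_backend(BACKENDS)
quality_only = pick_backend(BACKENDS, w_cost=0.0, w_latency=0.0)
cost_sensitive = pick_backend(BACKENDS, w_cost=2.0, w_latency=0.0)
```

With these invented inputs, the balanced weights select `hosted-small`, quality-only weights select `hosted-large`, and a heavy cost weight falls back to `local-open-weight`. A per-request `min_quality` floor is one simple way to keep cheap backends away from tasks they cannot handle.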
№ 05 · Teams agent

Robotnik

An AI coworker that lives inside Microsoft Teams: it reads threads and file attachments, answers in-channel, and runs real work through the Graph API instead of being yet another standalone chatbot.

  • Lives in Microsoft Teams
  • Reads threads + attachments
  • Acts via the Graph API
See how it works
№ 06 · Polish LLM benchmark

ZusWaveBench

A 48-item benchmark for Polish bureaucratic reasoning across 11 domains, scored with a deterministic rubric instead of an LLM judge. Live leaderboard, frontier models ranked.

  • 48 items, 11 domains
  • Deterministic scorer
  • Live public leaderboard
Open leaderboard
№ 07 · Polish TTS

Bozenka

A Polish offline text-to-speech voice, fine-tuned from XTTS-v2 on our own RTX 3090 — the full pipeline from raw audio to a trained checkpoint, built in public.

  • XTTS-v2 fine-tune
  • Trained on the 3090
  • Built in public
Read the build log

Traction · codesota.com

Zero to 100K monthly visitors by Jul 2027.

[ visitors / mo ]
Chart: codesota.com, unique visitors / month. Y-axis 0 to 100K with a 100K / mo target line; x-axis Dec '25 to Jul '27. Actuals up to "now", forecast after, reaching 100K in Jul ’27.
45,233 · visitors, trailing 12 mo
17.4K · Dec ’26 run-rate / mo
+30% · modeled MoM growth
Jul ’27 · ≈ 100K / mo crossover
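
The crossover date can be sanity-checked from the figures above. The sketch below assumes plain monthly compounding from the Dec ’26 run-rate at the modeled +30% rate; the real forecast model is not described here, so treat this as a back-of-the-envelope check. The function names are illustrative.

```python
def months_to_target(start, target, monthly_growth):
    """Count whole months of compounding until `start` reaches `target`.

    Returns (months, projected_value_at_crossover).
    """
    if start <= 0 or monthly_growth <= 0:
        raise ValueError("start and monthly_growth must be positive")
    months, value = 0, float(start)
    while value < target:
        value *= 1 + monthly_growth
        months += 1
    return months, value


def add_months(year, month, n):
    """Shift a (year, month) pair forward by n months."""
    idx = year * 12 + (month - 1) + n
    return idx // 12, idx % 12 + 1


# Figures from the page: 17.4K / mo in Dec 2026, +30% modeled MoM growth.
months, projected = months_to_target(17_400, 100_000, 0.30)
crossover = add_months(2026, 12, months)

print(months)            # 7
print(crossover)         # (2027, 7), i.e. Jul 2027
print(round(projected))  # about 109K in the crossover month
```

Under that assumption the run-rate is still below 100K after six months (about 84K in Jun ’27) and passes it in the seventh, which matches the Jul ’27 crossover shown above.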

Infrastructure

The cluster is one RTX 3090, a queue, and taste.

This is intentionally small. The point is not pretending to be a hyperscaler. The point is owning enough compute to reproduce runs, test open-weight models, compare OCR pipelines, and keep agent experiments honest before moving anything into hosted infrastructure.

cluster · node 00 · online
1 × NVIDIA RTX 3090
24 GB · VRAM for local model runs
3090 · cluster size, if you ask politely

Operating mode

Benchmark, parse, route, ship.

CodeSOTA decides what is worth using. Hardparse turns document-model lessons into a focused OCR product. Hermes SDK explores how agents should call tools, execute work, and hand off between local and hosted compute.

Start a conversation

Open invitation

Two ways in.

[ join / work ]
contributors

Work on interesting things.

We’re looking for contributors — people who want to dig into genuinely interesting problems (benchmarks, parsers, agents, models) inside a small, helpful community. Bring taste and curiosity; we’ll bring the queue.

Join the bench
work with us

Hire us, or build something.

Want to hire the lab, start a company, or chase a weirder idea together? If it’s interesting and useful — applied AI, tooling, or something we haven’t thought of yet — we want to hear about it.

Start something