Instant Article: turned Groq's own blog post into a publish-ready article in 1.72 seconds

Paste any text and get a publish-ready, 800-1200 word SEO/GEO-optimized article in under 2 seconds, streamed live token by token. Accepts URLs and file uploads too. 3 free generations, no signup.

Link: https://www.citedy.com/tools/instant-article

How I came up with the idea:

Content creators spend 20+ minutes manually remixing blog posts into social snippets, newsletter segments, and email teasers. I wanted to prove that with the right inference stack, this takes under 2 seconds. The speed itself is the tech demo.

For this showcase I grabbed Groq's own blog post about MoE model support, pasted it in, and got a complete reformatted article back in 1.72 seconds. First token in 356ms.

What differentiates it:

  • Complete 800-1200 word articles from pasted text in 0.8-3 seconds

  • First token under 400ms; streaming-first, so users watch every word land live

  • Real-time performance metrics displayed right in the UI (TTFT, first paragraph, total time)

  • Output passes through our custom fine-tuned SEO/GEO optimization model; articles are optimized for both search engines and AI assistants (ChatGPT, Perplexity, Gemini, etc.) out of the box

  • 9-model fallback chain ensures 99.9% uptime; Groq is a key provider in the priority chain

Architecture:

Multi-provider inference chain with intelligent failover. Streaming SSE pipeline renders token deltas directly in the browser with live performance metrics. After generation, a second pass through our fine-tuned SEO/GEO model optimizes output for discoverability across both traditional search and AI assistants.
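The fallback chain can be sketched roughly like this. This is a minimal illustration, not the actual Citedy implementation: the provider names, the `stream()` interface, and the error type are all hypothetical stand-ins for real streaming API clients.

```python
# Sketch of a provider fallback chain that streams token deltas.
# Providers are tried in priority order; the first healthy one wins.
from typing import Callable, Iterator

class ProviderError(Exception):
    """Raised by a provider on rate limits, timeouts, or outages."""

def stream_with_failover(
    providers: list[tuple[str, Callable[[str], Iterator[str]]]],
    prompt: str,
) -> Iterator[str]:
    """Yield token deltas from the first provider that succeeds."""
    last_error: Exception | None = None
    for name, stream in providers:
        try:
            for token in stream(prompt):
                yield token
            return  # stream completed; do not try lower-priority providers
        except ProviderError as err:
            last_error = err  # record and fall through to the next provider
    raise RuntimeError("all providers failed") from last_error

# Toy providers: the first always fails, the second streams tokens.
def flaky(prompt: str) -> Iterator[str]:
    raise ProviderError("rate limited")
    yield  # makes this a generator function

def healthy(prompt: str) -> Iterator[str]:
    for word in ["Instant", " articles", " in", " seconds."]:
        yield word

chain = [("provider-a", flaky), ("provider-b", healthy)]
article = "".join(stream_with_failover(chain, "Rewrite this post"))
```

One design wrinkle a real version has to handle: if a provider dies mid-stream after emitting tokens, retrying from the top would duplicate output, so failover decisions are cheapest before the first token arrives.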

One non-AI thing about me: 25 years in product dev; shipped projects with Apple, LucasFilm, Sony Pictures, DreamWorks, Coca-Cola, and others. Father of 4 kids, which led me to build talents.kids, a talent discovery engine for children. Currently in Portugal, where the pastéis de nata intake is becoming a serious health concern.


Very nice work Ntty. Sub-400ms TTFT and a full 800-1200 word article in a couple of seconds, with live metrics and multi-provider failover, is impressive. Which Groq model are you using in the primary path?


Although they do not offer the highest tokens-per-second performance, models like Qwen3-235B-Instruct and ZAI-GLM-4.7 are used for their strong reasoning and instruction-following quality. The system applies unconventional chunking strategies to optimize context handling and improve response accuracy.

In addition, GPT-OSS-120B is deployed through agent cloning, enabling parallel candidate generation: four requests are sent simultaneously at a high temperature (0.9) to produce diverse ideas and approaches. The results are collected almost instantly thanks to the high-speed Cerebras infrastructure, with automatic failover if needed.
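The fan-out step looks roughly like this with `asyncio`. It is a sketch under stated assumptions: `generate_candidate()` is a placeholder for the real GPT-OSS-120B call, and the latency simulation stands in for the network round trip.

```python
# Sketch of parallel candidate generation: four concurrent
# high-temperature requests, gathered as soon as all complete.
import asyncio

async def generate_candidate(prompt: str, temperature: float, seed: int) -> str:
    """Placeholder for one model call; a real version would hit the
    GPT-OSS-120B endpoint with the given temperature."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"candidate-{seed} (temperature={temperature})"

async def fan_out(prompt: str, n: int = 4, temperature: float = 0.9) -> list[str]:
    """Issue n requests concurrently and return all candidate drafts."""
    tasks = [generate_candidate(prompt, temperature, seed) for seed in range(n)]
    return await asyncio.gather(*tasks)

candidates = asyncio.run(fan_out("Turn this post into an article"))
```

Because the calls run concurrently, total wall time is roughly one request's latency rather than four, which is what keeps the whole pipeline under the 2-second budget.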

After that, a synthesis engine with a low temperature (0.2) processes all candidates. It extracts the best insights from each response, resolves conflicts, and produces a single, consistent, high-quality final output.
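The synthesis pass boils down to prompt assembly plus one low-temperature call. A minimal sketch, assuming a hypothetical `build_synthesis_prompt()` helper; the actual merge instructions and model routing are not shown in the post.

```python
# Sketch of the synthesis step: fold all candidate drafts into a
# single prompt for one low-temperature (0.2) merge request.
def build_synthesis_prompt(candidates: list[str]) -> str:
    """Number each candidate and wrap them in merge instructions."""
    numbered = "\n\n".join(
        f"[Candidate {i + 1}]\n{text}" for i, text in enumerate(candidates)
    )
    return (
        "Merge the candidate drafts below into one consistent article. "
        "Keep the strongest insights from each, resolve any conflicts, "
        "and drop contradictions.\n\n" + numbered
    )

prompt = build_synthesis_prompt(["Draft A", "Draft B", "Draft C", "Draft D"])
# prompt would then be sent to the synthesis model with temperature=0.2
```

The temperature split (0.9 for fan-out, 0.2 for synthesis) is the key design choice: diversity when generating options, determinism when committing to the final text.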
