Instant Article: turned Groq's own blog post into a publish-ready article in 1.72 seconds

Paste any text and get a publish-ready, 800-1200 word SEO/GEO-optimized article in under 2 seconds, streamed live token by token. Accepts URLs and file uploads too. 3 free generations, no signup.

Link: https://www.citedy.com/tools/instant-article

How I came up with the idea:

Content creators spend 20+ minutes manually remixing blog posts into social snippets, newsletter segments, and email teasers. I wanted to prove that with the right inference stack, this takes under 2 seconds. The speed itself is the tech demo.

For this showcase I grabbed Groq's own blog post about MoE model support, pasted it in, and got a complete reformatted article back in 1.72 seconds. First token in 356ms.

What differentiates it:

  • Complete 800-1200 word articles from pasted text in 0.8-3 seconds

  • First token under 400ms; streaming-first, so users watch every word land live

  • Real-time performance metrics displayed right in the UI (TTFT, first paragraph, total time)

  • Output passes through our custom fine-tuned SEO/GEO optimization model; articles are optimized for both search engines and AI assistants (ChatGPT, Perplexity, Gemini, etc.) out of the box

  • 9-model fallback chain ensures 99.9% uptime; Groq is a key provider in the priority chain

Architecture:

Multi-provider inference chain with intelligent failover. Streaming SSE pipeline renders token deltas directly in the browser with live performance metrics. After generation, a second pass through our fine-tuned SEO/GEO model optimizes output for discoverability across both traditional search and AI assistants.
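The fallback chain can be sketched roughly like this. This is a minimal illustration, not the actual Citedy implementation: the provider names, the `stream()` interface, and the error type are all hypothetical stand-ins for real streaming API clients.

```python
# Sketch of a provider fallback chain that streams token deltas.
# Providers are tried in priority order; the first healthy one wins.
from typing import Callable, Iterator

class ProviderError(Exception):
    """Raised by a provider on rate limits, timeouts, or outages."""

def stream_with_failover(
    providers: list[tuple[str, Callable[[str], Iterator[str]]]],
    prompt: str,
) -> Iterator[str]:
    """Yield token deltas from the first provider that succeeds."""
    last_error: Exception | None = None
    for name, stream in providers:
        try:
            for token in stream(prompt):
                yield token
            return  # stream completed; do not try lower-priority providers
        except ProviderError as err:
            last_error = err  # record and fall through to the next provider
    raise RuntimeError("all providers failed") from last_error

# Toy providers: the first always fails, the second streams tokens.
def flaky(prompt: str) -> Iterator[str]:
    raise ProviderError("rate limited")
    yield  # makes this a generator function

def healthy(prompt: str) -> Iterator[str]:
    for word in ["Instant", " articles", " in", " seconds."]:
        yield word

chain = [("provider-a", flaky), ("provider-b", healthy)]
article = "".join(stream_with_failover(chain, "Rewrite this post"))
```

One design wrinkle a real version has to handle: if a provider dies mid-stream after emitting tokens, retrying from the top would duplicate output, so failover decisions are cheapest before the first token arrives.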

One non-AI thing about me: 25 years in product dev; shipped projects with Apple, LucasFilm, Sony Pictures, DreamWorks, Coca-Cola, and others. Father of 4 kids, which led me to build talents.kids, a talent discovery engine for children. Currently in Portugal, where the pastéis de nata intake is becoming a serious health concern.


Very nice work Ntty. Sub-400ms TTFT and a full 800-1200 word article in a couple of seconds, with live metrics and multi-provider failover, is impressive. Which Groq model are you using in the primary path?


Although they do not offer the highest tokens-per-second performance, models like Qwen3-235B-Instruct and ZAI-GLM-4.7 are used for their strong reasoning and instruction-following quality. The system applies unconventional chunking strategies to optimize context handling and improve response accuracy.

In addition, GPT-OSS-120B is deployed through agent cloning, enabling parallel candidate generation: four requests are sent simultaneously at a high temperature (0.9) to produce diverse ideas and approaches. The results are collected almost instantly thanks to the high-speed Cerebras infrastructure, with automatic failover if needed.
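The fan-out step looks roughly like this with `asyncio`. It is a sketch under stated assumptions: `generate_candidate()` is a placeholder for the real GPT-OSS-120B call, and the latency simulation stands in for the network round trip.

```python
# Sketch of parallel candidate generation: four concurrent
# high-temperature requests, gathered as soon as all complete.
import asyncio

async def generate_candidate(prompt: str, temperature: float, seed: int) -> str:
    """Placeholder for one model call; a real version would hit the
    GPT-OSS-120B endpoint with the given temperature."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"candidate-{seed} (temperature={temperature})"

async def fan_out(prompt: str, n: int = 4, temperature: float = 0.9) -> list[str]:
    """Issue n requests concurrently and return all candidate drafts."""
    tasks = [generate_candidate(prompt, temperature, seed) for seed in range(n)]
    return await asyncio.gather(*tasks)

candidates = asyncio.run(fan_out("Turn this post into an article"))
```

Because the calls run concurrently, total wall time is roughly one request's latency rather than four, which is what keeps the whole pipeline under the 2-second budget.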

After that, a synthesis engine with a low temperature (0.2) processes all candidates. It extracts the best insights from each response, resolves conflicts, and produces a single, consistent, high-quality final output.
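The synthesis pass boils down to prompt assembly plus one low-temperature call. A minimal sketch, assuming a hypothetical `build_synthesis_prompt()` helper; the actual merge instructions and model routing are not shown in the post.

```python
# Sketch of the synthesis step: fold all candidate drafts into a
# single prompt for one low-temperature (0.2) merge request.
def build_synthesis_prompt(candidates: list[str]) -> str:
    """Number each candidate and wrap them in merge instructions."""
    numbered = "\n\n".join(
        f"[Candidate {i + 1}]\n{text}" for i, text in enumerate(candidates)
    )
    return (
        "Merge the candidate drafts below into one consistent article. "
        "Keep the strongest insights from each, resolve any conflicts, "
        "and drop contradictions.\n\n" + numbered
    )

prompt = build_synthesis_prompt(["Draft A", "Draft B", "Draft C", "Draft D"])
# prompt would then be sent to the synthesis model with temperature=0.2
```

The temperature split (0.9 for fan-out, 0.2 for synthesis) is the key design choice: diversity when generating options, determinism when committing to the final text.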
