nextbigthing.my

Building a monokernel for LLM inference on AMD MI300X - up to 3,300 output tokens/s per request [P]

AI & ML · May 29, 2026 · 2 min read · via Reddit
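The headline figure can be turned into a per-token time budget. At 3,300 output tokens/s for a single request, each decode step has to finish in roughly 1/3,300 s, or about 303 µs. The sketch below does that arithmetic. It is a back-of-the-envelope conversion that assumes a steady decode rate and ignores prefill (time to first token), which the headline does not break out. The function names are hypothetical and used only for illustration.

```python
# Back-of-the-envelope conversion of a per-request decode throughput
# (output tokens/s) into time budgets. Assumes a constant decode rate
# and ignores prefill / time to first token.

def per_token_latency_us(tokens_per_second: float) -> float:
    """Average time per generated token, in microseconds."""
    if tokens_per_second <= 0:
        raise ValueError("throughput must be positive")
    return 1_000_000.0 / tokens_per_second


def generation_time_s(n_tokens: int, tokens_per_second: float) -> float:
    """Decode time in seconds for n_tokens at a steady rate."""
    if tokens_per_second <= 0:
        raise ValueError("throughput must be positive")
    return n_tokens / tokens_per_second


headline_tps = 3_300  # output tokens/s per request, from the headline
print(f"{per_token_latency_us(headline_tps):.1f} us per token")
print(f"{generation_time_s(1_000, headline_tps):.2f} s for 1,000 tokens")
```

By the same arithmetic, a 1,000-token reply would need about 0.30 s of decode time at that rate.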
