Building a monokernel for LLM inference on AMD MI300X - up to 3,300 output tokens/s per request [P]

AI & ML · 2 min read · via Reddit
