A chatbot with a massive user base that runs your own NLP model can hit a bottleneck: the back-end service struggles to handle that level of inference concurrency. To solve this kind of problem, NVIDIA has introduced Triton, an open-source inference server that lets you deploy AI models on GPUs or CPUs and maximizes hardware utilization for faster inference.
In this session, I will introduce Triton Inference Server and deploy an NLP model on Triton with a practical example.
About Ko Ko
Ko Ko is a Microsoft AI MVP dedicated to sharing AI and chatbot technology. He has spoken at large conferences in Taiwan, such as COSCUP, .NET Conf, and the Taiwan AI Academy Annual Conference. He is also a core member of Chatbot Developers Taiwan.