Deepgram + DeepSeek + Fish.Audio: Build Your Own Voice Assistant with TEN-Agent

TEN Framework
4 min readJan 29, 2025

--

Over the past couple of days, everyone has been talking about DeepSeek. On January 20th, they released their AI model, R1, which delivers performance comparable to OpenAI’s latest model at an incredibly low cost. Remarkably, their AI assistant app has even surpassed ChatGPT in rankings, becoming the most downloaded app. We’ve been closely following this and are eager to explore if it’s possible to integrate DeepSeek into TEN. DeepSeek also offers a free usage quota upon registration, so you can try out its API. If you’re interested, why not see how you can use DeepSeek to create your own voice assistant?

Today, we’ll walk you through how to use Deepgram, DeepSeek, and Fish.Audio to build a free voice assistant in TEN-Agent. We’ll use:

  • Deepgram as the STT (Speech-to-Text) service,
  • DeepSeek as the LLM (Large Language Model) service,
  • Fish.Audio as the TTS (Text-to-Speech) service,
  • Agora for real-time voice communication between users and cloud-based AI.

Prerequisites

First, we need to prepare the API keys for each service. Each platform provides a free usage quota. Here’s how to obtain the API keys:

  • Deepgram: Sign up to get your API Key.
  • DeepSeek: Sign up to get your API Key.
  • Fish.Audio: Sign up to get your API Key.
  • Agora.io: Sign up to get your App ID and App Certificate.

TEN-Agent relies on Docker for its development environment, so make sure to install Docker beforehand.

Setting Up TEN-Agent

Next, set up TEN-Agent by following the TEN-Agent Quick Start Guide.

Once properly launched, you should see a screen like this:

Playground on localhost: 3000

At this point, we haven’t configured the individual modules or their API keys. Let’s proceed to configure them step by step.

Configuring STT

First, we’ll configure the STT (Speech-to-Text) module using Deepgram. Open the module selector, choose Deepgram from the STT dropdown menu, and save. If Deepgram is already selected by default, no further action is needed.

Select STT

Next, configure the Deepgram API key. Click the button on the right of the module selector to open the property configuration. Enter your API Key in the field that appears and save.

Config STT

Configuring TTS

Now let’s configure the TTS (Text-to-Speech) module using Fish.Audio. Open the module selector, choose Fish.Audio from the TTS dropdown menu, and save. If Fish.Audio is already selected by default, no action is needed.

Select TTS

Next, configure the Fish.Audio API key. Click the button on the right of the module selector to open the property configuration. Enter your API Key and save. Fish.Audio supports different voice tones and allows you to clone custom voices. If you wish to configure a different voice tone, set the model_id property accordingly.

Config TTS

Configuring LLM

Finally, configure the LLM module to use DeepSeek as the language model. Since DeepSeek’s API is compatible with OpenAI’s API, select OpenAI as the LLM module. Open the module selector, choose OpenAI from the LLM dropdown menu, and save. If OpenAI is already selected by default, no action is needed.

Select LLM

Next, configure the LLM module properties to use DeepSeek’s service. Click the button on the right of the module selector to open the property configuration and set the following properties:

  • api_key: Your DeepSeek API Key
  • model: deepseek-chat (DeepSeek’s model name)
  • base_url: https://api.deepseek.com/v1 (DeepSeek’s API URL)
Config LLM

Save your changes.

Starting the Voice Assistant

Now that all modules are configured, click the Connect button. After a few seconds, you can start interacting with your voice assistant!

TEN Agent Connect with Deepseek

Additional Customizations

You can customize your voice assistant further by binding additional modules like weather, news, etc. Simply select the required module from the module selector and configure its properties. You can also adjust the LLM prompt to make the assistant more aligned with your specific needs and style.

--

--

TEN Framework
TEN Framework

Written by TEN Framework

Transformative Extensions Network and represents the Next-Gen AI-Agent Framework, the world's first truly real-time multimodal AI agent framework.

No responses yet