Multi-language VOD Dubbing Guide

This guide walks you through using MK.IO’s AI dubbing pipeline to automatically generate Spanish, German, and French audio tracks from English-language video content. At the end, your source asset streams with all four audio tracks (original English plus three dubbed languages), and viewers can select any of them.

Format requirement: VOD dubbing requires assets in the MP4v4 format, which produces a .mpd DASH manifest. Assets from older encoding pipelines — including those from Azure Media Services (AMS) or any source that generates a .ismc manifest — are not supported directly. If your asset is in the legacy HSS format, convert it first using the Convert assets to MP4v4 guide, then return here. For supported languages and specifications, see the AI workflows documentation.
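The format rule above can be sketched as a small pre-flight check. This is a minimal illustration, not an MK.IO API: the function name `dubbing_readiness` and its return values are hypothetical, and it assumes you already have a list of the file names stored in the asset.

```python
from pathlib import PurePosixPath

# Manifest extensions named in this guide.
SUPPORTED_MANIFEST = ".mpd"   # MP4v4 asset with a DASH manifest
LEGACY_MANIFEST = ".ismc"     # legacy HSS asset (for example, from AMS)


def dubbing_readiness(file_names):
    """Classify an asset by the manifest extensions in its file listing.

    Hypothetical helper: returns "ready", "convert-first", or "unknown".
    """
    suffixes = {PurePosixPath(name).suffix.lower() for name in file_names}
    if SUPPORTED_MANIFEST in suffixes:
        return "ready"
    if LEGACY_MANIFEST in suffixes:
        return "convert-first"
    return "unknown"


if __name__ == "__main__":
    print(dubbing_readiness(["video.ism", "video.ismc", "video_400k.mp4"]))
```

An "unknown" result usually means the asset has not been encoded yet, which is the case Step 1 covers.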

Prerequisites

Active MK.IO project access
API token: available in Organisation Settings → API Tokens
English-language MP4 video (2–10 minutes recommended for testing)
Azure Storage account connected to MK.IO

Starting point: If you already have an MP4v4 asset with a .mpd manifest, skip to Step 2. If your asset is in the legacy HSS format (it has a .ismc manifest), see Convert HSS assets for VOD dubbing before continuing.

Step 1: Upload and encode source video

1.1 Create an asset

In the MK.IO dashboard, navigate to Assets → + Add Asset.
Select your storage location.
Enter the asset details:
- Asset name: english-source-video
- Container: videos
- Storage account: select your Azure Storage account
Upload your MP4 file. This guide uses a file named english-video-demo.mp4.
Select Upload and wait for it to complete.

1.2 Create an encoding transform and job

Navigate to Video Processing → Transforms → + Create Transform.
Configure the transform:
- Name: encode-streaming
- Type: Encoding
- Preset: H.264 Multiple Bitrate 1080p
Select Create.
Navigate to Video Processing → Jobs → + Create Job.
Configure the job:
- Name: encode-english-source
- Transform: encode-streaming
- Input asset name: english-source-video — select english-video-demo.mp4
- Output asset name: english-encoded
Select Create and monitor the job status.

[Figure: Encoding job status view in the MK.IO dashboard]

Wait for the job status to show Finished.

Why encode first? Encoding generates the .mpd manifest file that track insertion operations require.

Step 2: Create multi-language dubs

2.1 Create a dubbing transform

A dubbing transform defines the source language and the target languages for the AI dubbing pipeline.

Parameter	Description
`@odata.type`	Must be set to `#MediaKind.AIPipelinePreset`
`pipeline.name`	Must be set to `Predefined_ACSVodSpeechToSpeech`
`language`	Source language code (for example, `en-US`)
`targetLanguages`	Array of target language codes
`speakerCount`	Number of speakers in the source audio (`auto` for automatic detection)
`personalVoice`	`true` to preserve the original speaker’s voice characteristics

Navigate to Video Processing → Transforms → + Create Transform.
Enter a Transform name, for example dubbing-transform.
Select AI workflow as the transform type.
Select Predefined_ACSVodSpeechToSpeech from the AI pipeline dropdown.
Configure the pipeline settings:
- Language: en-US
- Translate to: select es-ES, de-DE, and fr-FR
- Speaker count: auto
- Personal voice: leave unchecked to use synthetic voices
Select Create.

PUT https://app.mk.io/api/v1/projects/<YOUR_PROJECT_NAME>/media/transforms/<TRANSFORM_NAME>

Path Parameters

<YOUR_PROJECT_NAME>: Your unique project identifier

<TRANSFORM_NAME>: Transform name (for example, dubbing-transform)

Request Body

{
  "properties": {
    "description": "AI dub: English to Spanish, German, French",
    "outputs": [
      {
        "preset": {
          "@odata.type": "#MediaKind.AIPipelinePreset",
          "pipeline": {
            "name": "Predefined_ACSVodSpeechToSpeech",
            "arguments": {
              "VodSpeechToSpeechTranslation": [
                {
                  "name": "language",
                  "value": "en-US"
                },
                {
                  "name": "targetLanguages",
                  "value": [
                    "es-ES",
                    "de-DE",
                    "fr-FR"
                  ]
                },
                {
                  "name": "speakerCount",
                  "value": "auto"
                },
                {
                  "name": "personalVoice",
                  "value": false
                }
              ]
            }
          }
        }
      }
    ]
  }
}
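If you script this call, the request body above can be generated rather than hand-edited. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, and the `Authorization: Bearer` header is an assumption: confirm the authentication scheme for your API token in the MK.IO API reference before sending anything.

```python
import json
import urllib.request

API_BASE = "https://app.mk.io/api/v1"


def dubbing_transform_body(source_language, target_languages,
                           speaker_count="auto", personal_voice=False):
    """Build the transform body shown above for any set of target languages."""
    arguments = [
        {"name": "language", "value": source_language},
        {"name": "targetLanguages", "value": list(target_languages)},
        {"name": "speakerCount", "value": speaker_count},
        {"name": "personalVoice", "value": personal_voice},
    ]
    return {
        "properties": {
            "description": f"AI dub: {source_language} to {', '.join(target_languages)}",
            "outputs": [{
                "preset": {
                    "@odata.type": "#MediaKind.AIPipelinePreset",
                    "pipeline": {
                        "name": "Predefined_ACSVodSpeechToSpeech",
                        "arguments": {"VodSpeechToSpeechTranslation": arguments},
                    },
                }
            }],
        }
    }


def transform_put_request(project, transform, body, token):
    """Prepare (but do not send) the PUT request for a transform."""
    url = f"{API_BASE}/projects/{project}/media/transforms/{transform}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            # ASSUMPTION: bearer-token auth; verify against the API reference.
            "Authorization": f"Bearer {token}",
        },
    )


if __name__ == "__main__":
    body = dubbing_transform_body("en-US", ["es-ES", "de-DE", "fr-FR"])
    print(json.dumps(body, indent=2))
```

To send the prepared request, pass it to `urllib.request.urlopen`. The sketch stops short of that so you can inspect the payload first.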

Notes on the configuration:

Target languages: A single dubbing job generates all three language dubs simultaneously.
Personal voice: Setting personalVoice to false uses a synthetic voice. Set it to true to try to preserve each speaker’s voice characteristics across languages.

2.2 Create a dubbing job

Navigate to Video Processing → Jobs → + Create Job.
Enter a Job name, for example dub-english-source.
Under Select a transform, choose dubbing-transform.
Under Select input, set:
- Input asset name: english-encoded
- Filename: english-video-demo_320x180_400k.mp4
Under Configure output, set:
- Asset storage account: select your Azure Storage account
- Output asset name: enter a new name, for example dubbed-audio — MK.IO creates this asset automatically
Select Create.

PUT https://app.mk.io/api/v1/projects/<YOUR_PROJECT_NAME>/media/transforms/<TRANSFORM_NAME>/jobs/<JOB_NAME>

Path Parameters

<YOUR_PROJECT_NAME>: Your unique project identifier

<TRANSFORM_NAME>: The dubbing transform (for example, dubbing-transform)

<JOB_NAME>: Unique identifier for this dubbing job

Request Body

{
  "properties": {
    "description": "Generate Spanish, German, French dubs",
    "priority": "Normal",
    "input": {
      "files": [
        "english-video-demo_320x180_400k.mp4"
      ],
      "@odata.type": "#Microsoft.Media.JobInputAsset",
      "assetName": "english-encoded"
    },
    "outputs": [
      {
        "@odata.type": "#Microsoft.Media.JobOutputAsset",
        "assetName": "dubbed-audio"
      }
    ]
  }
}

When you use the API instead of the job form, create the output asset first with a PUT request to /projects/<YOUR_PROJECT_NAME>/media/assets/dubbed-audio, then submit the job.

Input file: Specify any single encoded bitrate variant from the source asset (for example, english-video-demo_320x180_400k.mp4). All variants contain the audio track required for dubbing — the lowest bitrate file is fine.
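Choosing the input file and building the job body can be sketched as follows. The helper names are hypothetical, and the file-name pattern `<name>_<width>x<height>_<bitrate>k.mp4` is inferred from the single example in this guide, so check it against the files your encoding preset actually produces.

```python
import re

# ASSUMPTION: encoded variants are named <name>_<width>x<height>_<bitrate>k.mp4,
# as in english-video-demo_320x180_400k.mp4.
VARIANT_PATTERN = re.compile(r"_(\d+)x(\d+)_(\d+)k\.mp4$")


def lowest_bitrate_variant(file_names):
    """Return the encoded variant with the smallest bitrate, or None."""
    variants = []
    for name in file_names:
        match = VARIANT_PATTERN.search(name)
        if match:
            variants.append((int(match.group(3)), name))
    return min(variants)[1] if variants else None


def dubbing_job_body(input_asset, input_file, output_asset, description=""):
    """Build the job body shown above for one input file."""
    return {
        "properties": {
            "description": description,
            "priority": "Normal",
            "input": {
                "@odata.type": "#Microsoft.Media.JobInputAsset",
                "assetName": input_asset,
                "files": [input_file],
            },
            "outputs": [
                {
                    "@odata.type": "#Microsoft.Media.JobOutputAsset",
                    "assetName": output_asset,
                }
            ],
        }
    }


if __name__ == "__main__":
    files = [
        "english-video-demo_1920x1080_6000k.mp4",
        "english-video-demo_320x180_400k.mp4",
        "english-video-demo.mpd",
    ]
    chosen = lowest_bitrate_variant(files)
    print(dubbing_job_body("english-encoded", chosen, "dubbed-audio"))
```

Picking the smallest variant keeps the amount of data the dubbing job has to read to a minimum. Any variant works equally well because they all carry the same audio track.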

Monitor progress: Navigate to Video Processing → Jobs and wait for the job status to show Finished.

Output files: When the job completes, the dubbed-audio asset contains three files:

english-video-demo_320x180_400k.mp4_es-ES.mp4 — Spanish
english-video-demo_320x180_400k.mp4_de-DE.mp4 — German
english-video-demo_320x180_400k.mp4_fr-FR.mp4 — French
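The three names above follow a visible pattern: the full input file name, an underscore, the target language code, and `.mp4`. A short sketch that derives them, useful when scripting Step 3, is shown below. The function name is hypothetical and the pattern is inferred from this example only, so confirm the actual file names in the dubbed-audio asset.

```python
def dubbed_file_names(input_file, target_languages):
    """Map each target language code to its expected dubbed file name.

    ASSUMPTION: output names are <input file>_<language code>.mp4,
    as in the three files listed above.
    """
    return {lang: f"{input_file}_{lang}.mp4" for lang in target_languages}


if __name__ == "__main__":
    names = dubbed_file_names(
        "english-video-demo_320x180_400k.mp4", ["es-ES", "de-DE", "fr-FR"]
    )
    for lang, name in names.items():
        print(lang, name)
```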

Step 3: Insert audio tracks

This step adds the dubbed audio tracks to the encoded video asset, making all languages available to viewers.

3.1 Create track insertion transforms

Track insertion transforms must be created via the API. Track insertion jobs can be created using either the UI or API — see step 3.2.

Create three transforms — one per language. Each transform defines the track name, display name, and language code for the inserted audio.

Spanish insert transform:

PUT https://app.mk.io/api/v1/projects/<YOUR_PROJECT_NAME>/media/transforms/<TRANSFORM_NAME>

Path Parameters

<YOUR_PROJECT_NAME>: Your unique project identifier

<TRANSFORM_NAME>: Transform name (for example, spanish-insert)

Request Body

{
  "properties": {
    "description": "Insert Spanish audio",
    "outputs": [
      {
        "preset": {
          "tracks": [
            {
              "@odata.type": "#MediaKind.AudioTrack",
              "trackName": "audio-spanish",
              "displayName": "Español (AI Dubbed)",
              "languageCode": "es-ES"
            }
          ],
          "@odata.type": "#MediaKind.TrackInserterPreset"
        },
        "relativePriority": "Normal"
      }
    ]
  }
}

Repeat for German and French:

German transform: set trackName to audio-german, displayName to Deutsch (AI Dubbed), and languageCode to de-DE
French transform: set trackName to audio-french, displayName to Français (AI Dubbed), and languageCode to fr-FR
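Because the three transforms differ only in four values, they can be generated from one table. The sketch below builds all three request bodies from the values given in this guide. The helper names are hypothetical, and the German and French descriptions are extrapolated from the Spanish example.

```python
# Transform name -> (language, trackName, displayName, languageCode),
# using the values given in this guide.
INSERT_TRACKS = {
    "spanish-insert": ("Spanish", "audio-spanish", "Español (AI Dubbed)", "es-ES"),
    "german-insert": ("German", "audio-german", "Deutsch (AI Dubbed)", "de-DE"),
    "french-insert": ("French", "audio-french", "Français (AI Dubbed)", "fr-FR"),
}


def insert_transform_body(language, track_name, display_name, language_code):
    """Build a track insertion transform body for one audio track."""
    return {
        "properties": {
            "description": f"Insert {language} audio",
            "outputs": [
                {
                    "preset": {
                        "@odata.type": "#MediaKind.TrackInserterPreset",
                        "tracks": [
                            {
                                "@odata.type": "#MediaKind.AudioTrack",
                                "trackName": track_name,
                                "displayName": display_name,
                                "languageCode": language_code,
                            }
                        ],
                    },
                    "relativePriority": "Normal",
                }
            ],
        }
    }


def all_insert_transforms():
    """Return {transform name: request body} for every language."""
    return {
        name: insert_transform_body(*fields)
        for name, fields in INSERT_TRACKS.items()
    }


if __name__ == "__main__":
    for name, body in all_insert_transforms().items():
        print(name, body["properties"]["outputs"][0]["preset"]["tracks"][0])
```

Each body is sent with the same PUT request shown for the Spanish transform, with the dictionary key used as the transform name.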

3.2 Create track insertion jobs

Create one job per language. Each job inserts the corresponding dubbed audio file into the english-encoded asset.

Repeat the following for each language (Spanish, German, French):

Navigate to Video Processing → Jobs → + Create Job.
Enter a Job name, for example job-insert-spanish.
Under Select a transform, choose the corresponding insertion transform (for example, spanish-insert).
Under Select input, set:
- Input asset name: dubbed-audio
- Filename: the dubbed file for this language (for example, english-video-demo_320x180_400k.mp4_es-ES.mp4)
Under Configure output, set:
- Output asset name: english-encoded
Select Create.

Language	Transform	Input filename	Output asset
Spanish	`spanish-insert`	`..._es-ES.mp4`	`english-encoded`
German	`german-insert`	`..._de-DE.mp4`	`english-encoded`
French	`french-insert`	`..._fr-FR.mp4`	`english-encoded`

Spanish insertion job:

PUT https://app.mk.io/api/v1/projects/<YOUR_PROJECT_NAME>/media/transforms/<TRANSFORM_NAME>/jobs/<JOB_NAME>

Path Parameters

<YOUR_PROJECT_NAME>: Your unique project identifier

<TRANSFORM_NAME>: The track insertion transform (for example, spanish-insert)

<JOB_NAME>: Unique job identifier (for example, job-insert-spanish)

Request Body

{
  "properties": {
    "input": {
      "files": [
        "english-video-demo_320x180_400k.mp4_es-ES.mp4"
      ],
      "@odata.type": "#Microsoft.Media.JobInputAsset",
      "assetName": "dubbed-audio"
    },
    "outputs": [
      {
        "@odata.type": "#Microsoft.Media.JobOutputAsset",
        "assetName": "english-encoded"
      }
    ],
    "priority": "Normal"
  }
}

Repeat for German and French:

German: input file english-video-demo_320x180_400k.mp4_de-DE.mp4, job name job-insert-german
French: input file english-video-demo_320x180_400k.mp4_fr-FR.mp4, job name job-insert-french
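The three insertion jobs can likewise be generated in a loop. This sketch pairs each job with its transform and input file, using the names from this guide, and returns the URL and body for each PUT request without sending anything. The function name and the project name in the usage line are placeholders.

```python
API_BASE = "https://app.mk.io/api/v1"
SOURCE_FILE = "english-video-demo_320x180_400k.mp4"

# (job name, transform name, language code), as used in this guide.
INSERT_JOBS = [
    ("job-insert-spanish", "spanish-insert", "es-ES"),
    ("job-insert-german", "german-insert", "de-DE"),
    ("job-insert-french", "french-insert", "fr-FR"),
]


def insertion_job_requests(project, input_asset="dubbed-audio",
                           output_asset="english-encoded"):
    """Return a list of (url, body) pairs, one per language."""
    requests = []
    for job_name, transform, lang in INSERT_JOBS:
        url = (f"{API_BASE}/projects/{project}/media/transforms/"
               f"{transform}/jobs/{job_name}")
        body = {
            "properties": {
                "input": {
                    "@odata.type": "#Microsoft.Media.JobInputAsset",
                    "assetName": input_asset,
                    "files": [f"{SOURCE_FILE}_{lang}.mp4"],
                },
                "outputs": [
                    {
                        "@odata.type": "#Microsoft.Media.JobOutputAsset",
                        "assetName": output_asset,
                    }
                ],
                "priority": "Normal",
            }
        }
        requests.append((url, body))
    return requests


if __name__ == "__main__":
    for url, body in insertion_job_requests("my-project"):
        print(url, body["properties"]["input"]["files"][0])
```

Note that every job writes to the same output asset, english-encoded, which is what makes the tracks appear side by side on the encoded video.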

Monitor progress: Wait for all three jobs to show Finished.
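If you monitor jobs from a script rather than the dashboard, a polling loop along these lines may help. It is a sketch under stated assumptions: `fetch_state` is a hypothetical callable that you supply (for example, one that reads the job's state from the API), and only the Finished state is named in this guide; Error and Canceled are assumed terminal states.

```python
import time

# Only "Finished" is named in this guide. "Error" and "Canceled" are
# ASSUMED terminal states; adjust to the states your jobs report.
TERMINAL_STATES = {"Finished", "Error", "Canceled"}


def wait_for_jobs(job_names, fetch_state, interval_seconds=15.0,
                  timeout_seconds=3600.0, sleep=time.sleep,
                  clock=time.monotonic):
    """Poll until every job reaches a terminal state.

    fetch_state(job_name) -> str is supplied by the caller.
    Returns {job name: final state}; raises TimeoutError on timeout.
    """
    deadline = clock() + timeout_seconds
    pending = set(job_names)
    final = {}
    while pending:
        for name in sorted(pending):
            state = fetch_state(name)
            if state in TERMINAL_STATES:
                final[name] = state
        pending.difference_update(final)
        if not pending:
            break
        if clock() >= deadline:
            raise TimeoutError(f"Jobs still running: {sorted(pending)}")
        sleep(interval_seconds)
    return final


if __name__ == "__main__":
    # Stand-in for a real status lookup: every job finishes immediately.
    jobs = ["job-insert-spanish", "job-insert-german", "job-insert-french"]
    print(wait_for_jobs(jobs, lambda name: "Finished"))
```

Injecting `fetch_state`, `sleep`, and `clock` keeps the loop independent of any particular HTTP client and makes it easy to exercise without waiting on real jobs.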

Verify: Navigate to Assets → english-encoded and open the Tracks section. You should see three dubbed audio tracks: Spanish, German, and French.

[Figure: Asset tracks view showing Spanish, German, and French dubbed audio tracks]

Step 4: Configure streaming

4.1 Create a streaming endpoint

Navigate to Streaming Endpoints → + Create Streaming Endpoint.
Configure the endpoint:
- Name: production
- Base URL: content
- Type: Dedicated
Select Create, then Start.

4.2 Create a streaming locator

Navigate to Assets and select english-encoded.
Select the production endpoint created in step 4.1.
Add a streaming locator:
- Name: live
- Policy: Predefined_DownloadAndClearStreaming
Copy the playback URLs provided.

4.3 Test multi-language playback

Select the embedded player in the asset details.
Use the audio track selector to switch between languages.
Confirm that Spanish, German, and French audio tracks are selectable alongside the original English track.
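The same check can be made outside the player by inspecting the DASH manifest. The sketch below lists the languages of the audio AdaptationSets in an MPD document, using the standard DASH namespace. It assumes you have already fetched the .mpd text from one of the playback URLs. The sample manifest is illustrative only, and the exact language tags MK.IO writes (for example, es versus es-ES) are not specified in this guide.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the MPEG-DASH MPD schema.
DASH_NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}


def audio_languages(mpd_xml):
    """Return the distinct lang values of audio AdaptationSets, in order."""
    root = ET.fromstring(mpd_xml)
    languages = []
    for adaptation_set in root.iterfind(".//mpd:AdaptationSet", DASH_NS):
        content_type = adaptation_set.get("contentType", "")
        mime_type = adaptation_set.get("mimeType", "")
        if content_type == "audio" or mime_type.startswith("audio/"):
            lang = adaptation_set.get("lang")
            if lang and lang not in languages:
                languages.append(lang)
    return languages


# Illustrative manifest only; a real one contains far more detail.
SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet contentType="video" mimeType="video/mp4"/>
    <AdaptationSet contentType="audio" mimeType="audio/mp4" lang="en"/>
    <AdaptationSet contentType="audio" mimeType="audio/mp4" lang="es"/>
    <AdaptationSet mimeType="audio/mp4" lang="de"/>
    <AdaptationSet mimeType="audio/mp4" lang="fr"/>
  </Period>
</MPD>"""

if __name__ == "__main__":
    print(audio_languages(SAMPLE_MPD))
```

If fewer than four languages come back for your asset, recheck that all three insertion jobs in step 3.2 finished.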