How to Build an Audio Transcription Bot for Microsoft Teams

Microsoft Teams powers collaboration for millions of organizations worldwide, yet capturing meeting discussions remains a manual, time-consuming process. Building an automated transcription bot eliminates this burden by joining meetings programmatically, recording conversations, and generating accurate transcripts. This guide demonstrates how to create a production-ready Teams transcription bot using Microsoft’s Bot Framework and Azure services.

Understanding Teams Bot Architecture

Microsoft Teams provides a robust Bot Framework that allows bots to join meetings as participants. Your bot will register with Azure Bot Service, authenticate using Microsoft Graph API, join meetings through the Teams client, capture audio streams, and process them through Azure Cognitive Services or third-party transcription APIs.

Prerequisites and Azure Setup

Install the required dependencies:

pip install botbuilder-core botbuilder-schema azure-cognitiveservices-speech msal requests python-dotenv aiohttp

Create an Azure Bot and register your application:

  1. Register an app in Azure Active Directory
  2. Create an Azure Bot resource
  3. Enable Microsoft Teams channel
  4. Configure bot permissions for meetings

Set up your .env file:

MICROSOFT_APP_ID=your_app_id
MICROSOFT_APP_PASSWORD=your_app_password
TENANT_ID=your_tenant_id
AZURE_SPEECH_KEY=your_speech_key
AZURE_SPEECH_REGION=your_region
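Before wiring up the bot, it helps to fail fast when a setting is missing rather than debugging a confusing authentication error later. A minimal sketch (the `missing_settings` helper and variable list are illustrative, not part of any SDK):

```python
import os

REQUIRED_VARS = [
    "MICROSOFT_APP_ID",
    "MICROSOFT_APP_PASSWORD",
    "TENANT_ID",
    "AZURE_SPEECH_KEY",
    "AZURE_SPEECH_REGION",
]

def missing_settings(env=os.environ):
    """Return the names of required settings that are absent or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        raise SystemExit(f"Missing required settings: {', '.join(missing)}")
    print("All required settings present")
```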

Step 1: Implement Bot Authentication

Create authentication handler for Microsoft Graph API:

import os

import msal
import requests
from dotenv import load_dotenv

load_dotenv()

class TeamsAuthProvider:
    def __init__(self):
        self.client_id = os.getenv("MICROSOFT_APP_ID")
        self.client_secret = os.getenv("MICROSOFT_APP_PASSWORD")
        self.tenant_id = os.getenv("TENANT_ID")
        self.authority = f"https://login.microsoftonline.com/{self.tenant_id}"
        self.scope = ["https://graph.microsoft.com/.default"]

    def get_access_token(self):
        """Acquire an access token for Microsoft Graph API."""
        app = msal.ConfidentialClientApplication(
            self.client_id,
            authority=self.authority,
            client_credential=self.client_secret
        )
        result = app.acquire_token_for_client(scopes=self.scope)
        if "access_token" in result:
            return result["access_token"]
        raise Exception(f"Authentication failed: {result.get('error_description')}")

    def get_app_token(self):
        """Get a Bot Framework authentication token."""
        token_url = "https://login.microsoftonline.com/botframework.com/oauth2/v2.0/token"
        data = {
            'grant_type': 'client_credentials',
            'client_id': self.client_id,
            'client_secret': self.client_secret,
            'scope': 'https://api.botframework.com/.default'
        }
        response = requests.post(token_url, data=data)
        if response.status_code == 200:
            return response.json()['access_token']
        raise Exception(f"Token acquisition failed: {response.text}")

Step 2: Create the Teams Bot Core

Build the bot that responds to Teams events and joins meetings:

import asyncio

from botbuilder.core import TurnContext
from botbuilder.core.teams import TeamsActivityHandler

class TranscriptionBot(TeamsActivityHandler):
    # TeamsActivityHandler (rather than the plain ActivityHandler)
    # provides the Teams meeting start/end event hooks used below.
    def __init__(self):
        super().__init__()
        self.auth_provider = TeamsAuthProvider()
        self.active_meetings = {}

    async def on_message_activity(self, turn_context: TurnContext):
        """Handle incoming messages."""
        text = turn_context.activity.text.strip().lower()
        if text == "join":
            await turn_context.send_activity("Please invite me to a meeting!")
        elif text == "status":
            status = f"Active transcriptions: {len(self.active_meetings)}"
            await turn_context.send_activity(status)
        else:
            await turn_context.send_activity(
                "Send 'join' to start transcription or 'status' to check active sessions"
            )

    async def on_teams_meeting_start_event(self, meeting, turn_context: TurnContext):
        """Handle the meeting start event.

        `meeting` is a MeetingStartEventDetails object carrying the id directly.
        """
        meeting_id = meeting.id
        print(f"Meeting started: {meeting_id}")
        # Join the meeting and start transcription
        await self.join_meeting(meeting_id, turn_context)

    async def on_teams_meeting_end_event(self, meeting, turn_context: TurnContext):
        """Handle the meeting end event."""
        meeting_id = meeting.id
        print(f"Meeting ended: {meeting_id}")
        # Stop transcription and generate the final transcript
        if meeting_id in self.active_meetings:
            await self.stop_transcription(meeting_id)

    async def join_meeting(self, meeting_id, turn_context):
        """Join a Teams meeting programmatically."""
        try:
            # Store meeting context
            self.active_meetings[meeting_id] = {
                'context': turn_context,
                'start_time': asyncio.get_event_loop().time()
            }
            await turn_context.send_activity("Transcription bot has joined the meeting")
            print(f"Successfully joined meeting: {meeting_id}")
        except Exception as e:
            print(f"Error joining meeting: {e}")
            await turn_context.send_activity(f"Failed to join meeting: {str(e)}")

    async def stop_transcription(self, meeting_id):
        """Drop the meeting from the active set; transcript saving is handled by the app layer."""
        self.active_meetings.pop(meeting_id, None)
        print(f"Stopped transcription for meeting: {meeting_id}")

Step 3: Implement Audio Capture and Processing

Create audio stream handler for Teams meetings:

import os

import azure.cognitiveservices.speech as speechsdk
from datetime import datetime

class TeamsAudioProcessor:
    def __init__(self):
        self.speech_key = os.getenv("AZURE_SPEECH_KEY")
        self.speech_region = os.getenv("AZURE_SPEECH_REGION")
        self.transcript_buffer = []

    def create_speech_config(self):
        """Configure Azure Speech Services."""
        speech_config = speechsdk.SpeechConfig(
            subscription=self.speech_key,
            region=self.speech_region
        )
        speech_config.speech_recognition_language = "en-US"
        speech_config.request_word_level_timestamps()
        speech_config.enable_dictation()
        return speech_config

    async def start_transcription(self, audio_stream):
        """Start real-time transcription from an audio stream.

        `audio_stream` must be a speechsdk.audio.AudioInputStream
        (for example a PushAudioInputStream fed with raw PCM frames).
        """
        speech_config = self.create_speech_config()
        # Configure audio input
        audio_config = speechsdk.audio.AudioConfig(stream=audio_stream)
        # Create a conversation transcriber for speaker identification
        conversation_transcriber = speechsdk.transcription.ConversationTranscriber(
            speech_config=speech_config,
            audio_config=audio_config
        )
        # Event handlers
        conversation_transcriber.transcribed.connect(self._on_transcribed)
        conversation_transcriber.canceled.connect(self._on_canceled)
        conversation_transcriber.session_started.connect(self._on_session_started)
        conversation_transcriber.session_stopped.connect(self._on_session_stopped)
        # Start transcription; the SDK returns a ResultFuture rather than
        # an awaitable, so block on .get() instead of awaiting it
        conversation_transcriber.start_transcribing_async().get()
        print("Transcription started")
        return conversation_transcriber

    def _on_transcribed(self, evt):
        """Handle transcribed text."""
        if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
            speaker_id = evt.result.speaker_id
            text = evt.result.text
            offset = evt.result.offset
            entry = {
                'timestamp': self._format_timestamp(offset),
                'speaker': f"Speaker {speaker_id}",
                'text': text
            }
            self.transcript_buffer.append(entry)
            print(f"[{entry['timestamp']}] {entry['speaker']}: {entry['text']}")

    def _on_canceled(self, evt):
        """Handle transcription errors."""
        details = evt.cancellation_details
        print(f"Transcription canceled: {details.reason}")
        if details.reason == speechsdk.CancellationReason.Error:
            print(f"Error details: {details.error_details}")

    def _on_session_started(self, evt):
        """Handle session start."""
        print("Transcription session started")

    def _on_session_stopped(self, evt):
        """Handle session stop."""
        print("Transcription session stopped")

    def _format_timestamp(self, offset_ticks):
        """Convert 100-nanosecond ticks to a mm:ss timestamp."""
        seconds = offset_ticks / 10_000_000
        minutes = int(seconds // 60)
        secs = int(seconds % 60)
        return f"{minutes:02d}:{secs:02d}"

    def save_transcript(self, meeting_id, filename=None):
        """Save the transcript to a file."""
        if not filename:
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            filename = f"teams_transcript_{meeting_id}_{timestamp}.txt"
        with open(filename, 'w', encoding='utf-8') as f:
            f.write("Microsoft Teams Meeting Transcript\n")
            f.write("=" * 70 + "\n")
            f.write(f"Meeting ID: {meeting_id}\n")
            f.write(f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
            f.write("=" * 70 + "\n\n")
            for entry in self.transcript_buffer:
                f.write(f"[{entry['timestamp']}] {entry['speaker']}:\n")
                f.write(f"{entry['text']}\n\n")
        print(f"Transcript saved: {filename}")
        return filename
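The Speech SDK reports result offsets in 100-nanosecond ticks, which is what `_format_timestamp` converts. A standalone version of the same conversion makes the arithmetic easy to verify:

```python
def format_timestamp(offset_ticks: int) -> str:
    """Convert Speech SDK 100-nanosecond ticks to a mm:ss label."""
    seconds = offset_ticks / 10_000_000  # 10 million ticks per second
    minutes = int(seconds // 60)
    secs = int(seconds % 60)
    return f"{minutes:02d}:{secs:02d}"

# 125 seconds of audio = 1,250,000,000 ticks
print(format_timestamp(1_250_000_000))  # 02:05
```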

Step 4: Handle Teams Meeting Media Streams

Implement media stream handling for audio capture:

import aiohttp

class TeamsMediaHandler:
    def __init__(self, bot_id, auth_provider):
        self.bot_id = bot_id
        self.auth_provider = auth_provider
        self.media_sessions = {}

    async def subscribe_to_audio(self, meeting_id, participant_id):
        """Subscribe to the audio stream of a Teams meeting."""
        token = self.auth_provider.get_access_token()
        headers = {
            'Authorization': f'Bearer {token}',
            'Content-Type': 'application/json'
        }
        # Subscribe to the audio stream
        subscription_data = {
            'resource': f'/communications/calls/{meeting_id}/audioStreams',
            'changeType': 'created,updated',
            'notificationUrl': 'https://your-bot-endpoint.com/api/notifications',
            'clientState': 'transcription-bot-secret'
        }
        async with aiohttp.ClientSession() as session:
            async with session.post(
                'https://graph.microsoft.com/v1.0/subscriptions',
                headers=headers,
                json=subscription_data
            ) as response:
                if response.status == 201:
                    subscription = await response.json()
                    self.media_sessions[meeting_id] = subscription
                    print(f"Subscribed to audio stream: {meeting_id}")
                    return subscription
                error = await response.text()
                raise Exception(f"Subscription failed: {error}")

    async def get_audio_stream(self, meeting_id):
        """Retrieve audio stream data."""
        if meeting_id not in self.media_sessions:
            raise Exception(f"No active subscription for meeting: {meeting_id}")
        token = self.auth_provider.get_access_token()
        headers = {'Authorization': f'Bearer {token}'}
        stream_url = f'https://graph.microsoft.com/v1.0/communications/calls/{meeting_id}/audioStream'
        async with aiohttp.ClientSession() as session:
            async with session.get(stream_url, headers=headers) as response:
                if response.status == 200:
                    # Read the payload before the session closes; returning
                    # response.content here would hand back a closed stream
                    return await response.read()
                error = await response.text()
                raise Exception(f"Failed to get audio stream: {error}")
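One detail the subscription call above depends on: Microsoft Graph validates a new subscription by POSTing to your `notificationUrl` with a `validationToken` query parameter, and expects that token echoed back as plain text before it will deliver any notifications. A sketch of that handshake as a pure function (the function name and return shape are illustrative, not part of the Graph SDK):

```python
from urllib.parse import parse_qs, urlparse

def handle_graph_notification(url, client_state, body=None):
    """Return (status, content_type, payload) for a Graph webhook call.

    Graph first probes the endpoint with a validationToken query
    parameter, which must be echoed back as plain text.
    """
    query = parse_qs(urlparse(url).query)
    token = query.get("validationToken")
    if token:
        return 200, "text/plain", token[0]
    # Real notifications: verify clientState before trusting the payload
    for note in (body or {}).get("value", []):
        if note.get("clientState") != client_state:
            return 202, "text/plain", ""  # ignore spoofed notifications
    return 202, "text/plain", "accepted"

status, ctype, payload = handle_graph_notification(
    "https://bot.example.com/api/notifications?validationToken=abc123",
    client_state="transcription-bot-secret",
)
print(status, payload)  # 200 abc123
```

In the aiohttp server below, this function would back the `/api/notifications` route configured in `subscription_data`.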

Step 5: Build the Complete Integration

Combine all components into a working transcription bot:

import os

from aiohttp import web
from botbuilder.core import BotFrameworkAdapter, BotFrameworkAdapterSettings
from botbuilder.schema import Activity

class TeamsTranscriptionBotApp:
    def __init__(self):
        self.settings = BotFrameworkAdapterSettings(
            app_id=os.getenv("MICROSOFT_APP_ID"),
            app_password=os.getenv("MICROSOFT_APP_PASSWORD")
        )
        self.adapter = BotFrameworkAdapter(self.settings)
        self.bot = TranscriptionBot()
        self.audio_processor = TeamsAudioProcessor()
        self.media_handler = TeamsMediaHandler(
            self.settings.app_id,
            TeamsAuthProvider()
        )

    async def messages_handler(self, req: web.Request) -> web.Response:
        """Handle incoming messages from Teams."""
        if "application/json" in req.headers.get("Content-Type", ""):
            body = await req.json()
        else:
            return web.Response(status=415)
        activity = Activity().deserialize(body)
        auth_header = req.headers.get("Authorization", "")

        async def call_bot(turn_context):
            await self.bot.on_turn(turn_context)

        await self.adapter.process_activity(activity, auth_header, call_bot)
        return web.Response(status=200)

    async def start_bot_transcription(self, meeting_id):
        """Initialize transcription for a meeting."""
        try:
            # Subscribe to the media stream
            await self.media_handler.subscribe_to_audio(meeting_id, self.settings.app_id)
            # Get the audio stream
            audio_stream = await self.media_handler.get_audio_stream(meeting_id)
            # Start transcription
            await self.audio_processor.start_transcription(audio_stream)
            print(f"Transcription active for meeting: {meeting_id}")
        except Exception as e:
            print(f"Error starting transcription: {e}")

    async def stop_bot_transcription(self, meeting_id):
        """Stop transcription and save results."""
        try:
            # Save the transcript
            filename = self.audio_processor.save_transcript(meeting_id)
            print(f"Transcription completed: {filename}")
            return filename
        except Exception as e:
            print(f"Error stopping transcription: {e}")

    def run(self, host='0.0.0.0', port=3978):
        """Start the bot web server."""
        app = web.Application()
        app.router.add_post('/api/messages', self.messages_handler)
        print(f"Bot running on {host}:{port}")
        # web.run_app blocks until the server shuts down
        web.run_app(app, host=host, port=port)

# Main execution
if __name__ == "__main__":
    bot_app = TeamsTranscriptionBotApp()
    bot_app.run()

Deployment and Configuration

Deploy your bot to Azure App Service or Azure Container Instances for production use. Configure your bot endpoint in Azure Bot Service and update the messaging endpoint URL. Enable the Teams channel and configure meeting event subscriptions.

Set up ngrok for local development testing:

ngrok http 3978

Update your bot’s messaging endpoint to the ngrok URL in Azure Portal.

For production, implement secure storage for transcripts using Azure Blob Storage. Add monitoring with Application Insights to track bot performance and errors. Configure auto-scaling to handle multiple concurrent meetings.
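As a sketch of the Blob Storage idea, transcripts can be uploaded with the `azure-storage-blob` package; the container name and date-partitioned path scheme below are assumptions, not requirements:

```python
from datetime import datetime, timezone

def transcript_blob_name(meeting_id, when):
    """Build a date-partitioned blob path so transcripts are easy to browse."""
    return f"transcripts/{when:%Y/%m/%d}/{meeting_id}.txt"

def upload_transcript(filename, meeting_id, connection_string):
    """Upload a saved transcript file to Azure Blob Storage."""
    from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob
    service = BlobServiceClient.from_connection_string(connection_string)
    blob = service.get_blob_client(
        container="meeting-transcripts",  # assumed container name
        blob=transcript_blob_name(meeting_id, datetime.now(timezone.utc)),
    )
    with open(filename, "rb") as data:
        blob.upload_blob(data, overwrite=True)

print(transcript_blob_name("meeting42", datetime(2024, 5, 17, tzinfo=timezone.utc)))
# transcripts/2024/05/17/meeting42.txt
```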

Security Best Practices

Store all credentials in Azure Key Vault instead of environment variables. Implement rate limiting to prevent abuse. Use managed identities for Azure resource access. Enable encryption for stored transcripts and implement role-based access control for transcript retrieval.
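Rate limiting can be as simple as a token bucket kept per caller. This sketch (not tied to any framework) allows a burst of up to `capacity` requests and refills at `rate` tokens per second; the injectable `clock` makes it testable:

```python
import time

class TokenBucket:
    """Simple token-bucket rate limiter."""
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self):
        """Consume one token if available; return whether the call is allowed."""
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)
print(all(bucket.allow() for _ in range(10)))  # True: burst within capacity
print(bucket.allow())  # False: bucket drained, refill not yet elapsed
```

In the bot, one bucket per tenant or per user id in front of `messages_handler` keeps a single noisy caller from starving other meetings.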

Your Microsoft Teams transcription bot now automatically joins meetings, captures audio, generates speaker-labeled transcripts, and integrates seamlessly with the Teams ecosystem.

Conclusion

Building a custom Microsoft Teams transcription bot gives you complete control over meeting documentation and data handling, with deep integration into your Azure infrastructure. However, managing bot registration, media streams, Azure services, and compliance requirements demands significant development and maintenance effort.

If you want enterprise-grade transcription without the complexity, consider Meetstream.ai API, which provides ready-to-use transcription for Microsoft Teams, Zoom, and Google Meet with simple API integration.
