Microsoft Teams powers collaboration for millions of organizations worldwide, yet capturing meeting discussions remains a manual, time-consuming process. Building an automated transcription bot eliminates this burden by joining meetings programmatically, recording conversations, and generating accurate transcripts. This guide demonstrates how to create a production-ready Teams transcription bot using Microsoft’s Bot Framework and Azure services.
Understanding Teams Bot Architecture
Microsoft Teams provides a robust Bot Framework that allows bots to join meetings as participants. Your bot will register with Azure Bot Service, authenticate using Microsoft Graph API, join meetings through the Teams client, capture audio streams, and process them through Azure Cognitive Services or third-party transcription APIs.
Prerequisites and Azure Setup
Install the required dependencies:
pip install botbuilder-core botbuilder-schema azure-cognitiveservices-speech msal requests python-dotenv aiohttp
Create an Azure Bot and register your application:
- Register an app in Azure Active Directory
- Create an Azure Bot resource
- Enable Microsoft Teams channel
- Configure bot permissions for meetings
Set up your .env file:
MICROSOFT_APP_ID=your_app_id
MICROSOFT_APP_PASSWORD=your_app_password
TENANT_ID=your_tenant_id
AZURE_SPEECH_KEY=your_speech_key
AZURE_SPEECH_REGION=your_region
Step 1: Implement Bot Authentication
Create authentication handler for Microsoft Graph API:
import os

import msal
import requests
from dotenv import load_dotenv

load_dotenv()

class TeamsAuthProvider:
    def __init__(self):
        self.client_id = os.getenv("MICROSOFT_APP_ID")
        self.client_secret = os.getenv("MICROSOFT_APP_PASSWORD")
        self.tenant_id = os.getenv("TENANT_ID")
        self.authority = f"https://login.microsoftonline.com/{self.tenant_id}"
        self.scope = ["https://graph.microsoft.com/.default"]

    def get_access_token(self):
        """Acquire an access token for Microsoft Graph API."""
        app = msal.ConfidentialClientApplication(
            self.client_id,
            authority=self.authority,
            client_credential=self.client_secret
        )
        result = app.acquire_token_for_client(scopes=self.scope)
        if "access_token" in result:
            return result["access_token"]
        raise Exception(f"Authentication failed: {result.get('error_description')}")

    def get_app_token(self):
        """Get a Bot Framework authentication token."""
        token_url = "https://login.microsoftonline.com/botframework.com/oauth2/v2.0/token"
        data = {
            "grant_type": "client_credentials",
            "client_id": self.client_id,
            "client_secret": self.client_secret,
            "scope": "https://api.botframework.com/.default"
        }
        response = requests.post(token_url, data=data)
        if response.status_code == 200:
            return response.json()["access_token"]
        raise Exception(f"Token acquisition failed: {response.text}")
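One refinement worth considering: the handler above builds a fresh client on every call, and under load that means a network round-trip per request (MSAL caches tokens inside a `ConfidentialClientApplication`, but only if you reuse the same instance). A minimal sketch of an expiry-aware cache around any token-fetching callable, with the class name, lifetime, and refresh margin all illustrative assumptions rather than SDK APIs:

```python
import time

class CachedTokenProvider:
    """Wrap a token-fetching callable with expiry-aware caching (illustrative sketch)."""

    def __init__(self, fetch_token, lifetime_seconds=3600, refresh_margin=300):
        self._fetch_token = fetch_token    # callable returning a fresh token string
        self._lifetime = lifetime_seconds  # assumed token lifetime in seconds
        self._margin = refresh_margin      # refresh this many seconds before expiry
        self._token = None
        self._expires_at = 0.0

    def get_token(self):
        # Reuse the cached token until it is close to expiring, then refetch
        if self._token is None or time.time() >= self._expires_at - self._margin:
            self._token = self._fetch_token()
            self._expires_at = time.time() + self._lifetime
        return self._token
```

You could wire it up as `graph_tokens = CachedTokenProvider(TeamsAuthProvider().get_access_token)` and call `graph_tokens.get_token()` wherever a Graph token is needed.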
Step 2: Create the Teams Bot Core
Build the bot that responds to Teams events and joins meetings:
import asyncio

from botbuilder.core import TurnContext
from botbuilder.core.teams import TeamsActivityHandler

class TranscriptionBot(TeamsActivityHandler):
    # TeamsActivityHandler (not the base ActivityHandler) is required for
    # Teams-specific events such as meeting start/end.
    def __init__(self):
        super().__init__()
        self.auth_provider = TeamsAuthProvider()
        self.active_meetings = {}

    async def on_message_activity(self, turn_context: TurnContext):
        """Handle incoming messages."""
        text = (turn_context.activity.text or "").strip().lower()
        if text == "join":
            await turn_context.send_activity("Please invite me to a meeting!")
        elif text == "status":
            status = f"Active transcriptions: {len(self.active_meetings)}"
            await turn_context.send_activity(status)
        else:
            await turn_context.send_activity(
                "Send 'join' to start transcription or 'status' to check active sessions"
            )

    async def on_teams_meeting_start_event(self, meeting_start_event, turn_context: TurnContext):
        """Handle the meeting start event."""
        meeting_id = meeting_start_event.id
        print(f"Meeting started: {meeting_id}")
        # Join the meeting and start transcription
        await self.join_meeting(meeting_id, turn_context)

    async def on_teams_meeting_end_event(self, meeting_end_event, turn_context: TurnContext):
        """Handle the meeting end event."""
        meeting_id = meeting_end_event.id
        print(f"Meeting ended: {meeting_id}")
        # Stop transcription and generate the final transcript
        if meeting_id in self.active_meetings:
            await self.stop_transcription(meeting_id)

    async def stop_transcription(self, meeting_id):
        """Remove the meeting from the active set; transcript saving happens in the app layer."""
        self.active_meetings.pop(meeting_id, None)

    async def join_meeting(self, meeting_id, turn_context):
        """Join a Teams meeting programmatically."""
        try:
            # Store meeting context
            self.active_meetings[meeting_id] = {
                "context": turn_context,
                "start_time": asyncio.get_event_loop().time()
            }
            await turn_context.send_activity("Transcription bot has joined the meeting")
            print(f"Successfully joined meeting: {meeting_id}")
        except Exception as e:
            print(f"Error joining meeting: {e}")
            await turn_context.send_activity(f"Failed to join meeting: {e}")
Step 3: Implement Audio Capture and Processing
Create audio stream handler for Teams meetings:
import os
from datetime import datetime

import azure.cognitiveservices.speech as speechsdk

class TeamsAudioProcessor:
    def __init__(self):
        self.speech_key = os.getenv("AZURE_SPEECH_KEY")
        self.speech_region = os.getenv("AZURE_SPEECH_REGION")
        self.transcript_buffer = []

    def create_speech_config(self):
        """Configure Azure Speech Services."""
        speech_config = speechsdk.SpeechConfig(
            subscription=self.speech_key,
            region=self.speech_region
        )
        speech_config.speech_recognition_language = "en-US"
        speech_config.request_word_level_timestamps()
        speech_config.enable_dictation()
        return speech_config

    async def start_transcription(self, audio_stream):
        """Start real-time transcription from an audio stream."""
        speech_config = self.create_speech_config()
        # Configure audio input
        audio_config = speechsdk.audio.AudioConfig(stream=audio_stream)
        # Create a conversation transcriber for speaker identification
        conversation_transcriber = speechsdk.transcription.ConversationTranscriber(
            speech_config=speech_config,
            audio_config=audio_config
        )
        # Event handlers
        conversation_transcriber.transcribed.connect(self._on_transcribed)
        conversation_transcriber.canceled.connect(self._on_canceled)
        conversation_transcriber.session_started.connect(self._on_session_started)
        conversation_transcriber.session_stopped.connect(self._on_session_stopped)
        # start_transcribing_async() returns an SDK future, not an awaitable;
        # block on .get() until transcription has started
        conversation_transcriber.start_transcribing_async().get()
        print("Transcription started")
        return conversation_transcriber

    def _on_transcribed(self, evt):
        """Handle transcribed text."""
        if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
            entry = {
                "timestamp": self._format_timestamp(evt.result.offset),
                "speaker": f"Speaker {evt.result.speaker_id}",
                "text": evt.result.text
            }
            self.transcript_buffer.append(entry)
            print(f"[{entry['timestamp']}] {entry['speaker']}: {entry['text']}")

    def _on_canceled(self, evt):
        """Handle transcription errors."""
        print(f"Transcription canceled: {evt.reason}")
        if evt.reason == speechsdk.CancellationReason.Error:
            print(f"Error details: {evt.error_details}")

    def _on_session_started(self, evt):
        """Handle session start."""
        print("Transcription session started")

    def _on_session_stopped(self, evt):
        """Handle session stop."""
        print("Transcription session stopped")

    def _format_timestamp(self, offset_ticks):
        """Convert 100-nanosecond ticks to a MM:SS timestamp."""
        seconds = offset_ticks / 10_000_000
        minutes = int(seconds // 60)
        secs = int(seconds % 60)
        return f"{minutes:02d}:{secs:02d}"

    def save_transcript(self, meeting_id, filename=None):
        """Save the transcript to a file."""
        if not filename:
            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
            filename = f"teams_transcript_{meeting_id}_{timestamp}.txt"
        with open(filename, "w", encoding="utf-8") as f:
            f.write("Microsoft Teams Meeting Transcript\n")
            f.write("=" * 70 + "\n")
            f.write(f"Meeting ID: {meeting_id}\n")
            f.write(f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
            f.write("=" * 70 + "\n\n")
            for entry in self.transcript_buffer:
                f.write(f"[{entry['timestamp']}] {entry['speaker']}:\n")
                f.write(f"{entry['text']}\n\n")
        print(f"Transcript saved: {filename}")
        return filename
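Raw transcripts tend to be choppy, because the transcriber emits a new entry each time a speaker pauses. A small post-processing pass, not part of the class above but operating on the same buffer format, can merge adjacent entries from the same speaker before saving; the helper name is our own:

```python
def merge_consecutive_entries(entries):
    """Merge adjacent transcript entries from the same speaker into one block.

    `entries` is a list of dicts with 'timestamp', 'speaker', and 'text' keys,
    matching the transcript_buffer format used above. Returns a new list;
    the input is not mutated.
    """
    merged = []
    for entry in entries:
        if merged and merged[-1]["speaker"] == entry["speaker"]:
            # Same speaker kept talking: append the text, keep the first timestamp
            merged[-1]["text"] += " " + entry["text"]
        else:
            merged.append(dict(entry))
    return merged
```

You could call `merge_consecutive_entries(self.transcript_buffer)` at the top of `save_transcript` to write the condensed version instead.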
Step 4: Handle Teams Meeting Media Streams
Implement media stream handling for audio capture:
import aiohttp

class TeamsMediaHandler:
    def __init__(self, bot_id, auth_provider):
        self.bot_id = bot_id
        self.auth_provider = auth_provider
        self.media_sessions = {}

    async def subscribe_to_audio(self, meeting_id, participant_id):
        """Subscribe to the audio stream from a Teams meeting."""
        token = self.auth_provider.get_access_token()
        headers = {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json"
        }
        # Subscribe to the audio stream; notificationUrl must be your
        # publicly reachable bot endpoint
        subscription_data = {
            "resource": f"/communications/calls/{meeting_id}/audioStreams",
            "changeType": "created,updated",
            "notificationUrl": "https://your-bot-endpoint.com/api/notifications",
            "clientState": "transcription-bot-secret"
        }
        async with aiohttp.ClientSession() as session:
            async with session.post(
                "https://graph.microsoft.com/v1.0/subscriptions",
                headers=headers,
                json=subscription_data
            ) as response:
                if response.status == 201:
                    subscription = await response.json()
                    self.media_sessions[meeting_id] = subscription
                    print(f"Subscribed to audio stream: {meeting_id}")
                    return subscription
                error = await response.text()
                raise Exception(f"Subscription failed: {error}")

    async def get_audio_stream(self, meeting_id):
        """Retrieve audio stream data."""
        if meeting_id not in self.media_sessions:
            raise Exception(f"No active subscription for meeting: {meeting_id}")
        token = self.auth_provider.get_access_token()
        headers = {"Authorization": f"Bearer {token}"}
        stream_url = f"https://graph.microsoft.com/v1.0/communications/calls/{meeting_id}/audioStream"
        async with aiohttp.ClientSession() as session:
            async with session.get(stream_url, headers=headers) as response:
                if response.status == 200:
                    return response.content
                error = await response.text()
                raise Exception(f"Failed to get audio stream: {error}")
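Microsoft Graph echoes the `clientState` you set at subscription time in every change notification it delivers, so your notification endpoint should reject any payload whose notifications carry a different value. A minimal sketch of that filtering step, assuming the standard Graph notification payload shape (a `value` array of notification objects); the function name is our own:

```python
def check_notifications(payload, expected_client_state):
    """Filter a Graph change-notification payload down to trusted notifications.

    Graph includes the subscription's clientState in each notification; anything
    that does not match the secret we registered is dropped (illustrative sketch).
    """
    trusted = []
    for note in payload.get("value", []):
        if note.get("clientState") == expected_client_state:
            trusted.append(note)
    return trusted
```

In the `/api/notifications` handler you would call this before acting on any notification, and separately echo back the `validationToken` query parameter that Graph sends when it first validates the endpoint.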
Step 5: Build the Complete Integration
Combine all components into a working transcription bot:
import os

from aiohttp import web
from botbuilder.core import BotFrameworkAdapter, BotFrameworkAdapterSettings
from botbuilder.schema import Activity

class TeamsTranscriptionBotApp:
    def __init__(self):
        self.settings = BotFrameworkAdapterSettings(
            app_id=os.getenv("MICROSOFT_APP_ID"),
            app_password=os.getenv("MICROSOFT_APP_PASSWORD")
        )
        self.adapter = BotFrameworkAdapter(self.settings)
        self.bot = TranscriptionBot()
        self.audio_processor = TeamsAudioProcessor()
        self.media_handler = TeamsMediaHandler(
            self.settings.app_id,
            TeamsAuthProvider()
        )

    async def messages_handler(self, req: web.Request) -> web.Response:
        """Handle incoming activities from Teams."""
        if "application/json" not in req.headers.get("Content-Type", ""):
            return web.Response(status=415)
        body = await req.json()
        activity = Activity().deserialize(body)
        auth_header = req.headers.get("Authorization", "")

        async def call_bot(turn_context):
            await self.bot.on_turn(turn_context)

        await self.adapter.process_activity(activity, auth_header, call_bot)
        return web.Response(status=200)

    async def start_bot_transcription(self, meeting_id):
        """Initialize transcription for a meeting."""
        try:
            # Subscribe to the media stream
            await self.media_handler.subscribe_to_audio(meeting_id, self.settings.app_id)
            # Get the audio stream
            audio_stream = await self.media_handler.get_audio_stream(meeting_id)
            # Start transcription
            await self.audio_processor.start_transcription(audio_stream)
            print(f"Transcription active for meeting: {meeting_id}")
        except Exception as e:
            print(f"Error starting transcription: {e}")

    async def stop_bot_transcription(self, meeting_id):
        """Stop transcription and save results."""
        try:
            filename = self.audio_processor.save_transcript(meeting_id)
            print(f"Transcription completed: {filename}")
            return filename
        except Exception as e:
            print(f"Error stopping transcription: {e}")

    def run(self, host="0.0.0.0", port=3978):
        """Start the bot web server."""
        app = web.Application()
        app.router.add_post("/api/messages", self.messages_handler)
        print(f"Bot running on {host}:{port}")
        # run_app blocks until the server shuts down
        web.run_app(app, host=host, port=port)

# Main execution
if __name__ == "__main__":
    bot_app = TeamsTranscriptionBotApp()
    bot_app.run()
Deployment and Configuration
Deploy your bot to Azure App Service or Azure Container Instances for production use. Configure your bot endpoint in Azure Bot Service and update the messaging endpoint URL. Enable the Teams channel and configure meeting event subscriptions.
Set up ngrok for local development testing:
ngrok http 3978
Update your bot’s messaging endpoint to the ngrok URL in Azure Portal.
For production, implement secure storage for transcripts using Azure Blob Storage. Add monitoring with Application Insights to track bot performance and errors. Configure auto-scaling to handle multiple concurrent meetings.
Security Best Practices
Store all credentials in Azure Key Vault instead of environment variables. Implement rate limiting to prevent abuse. Use managed identities for Azure resource access. Enable encryption for stored transcripts and implement role-based access control for transcript retrieval.
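Rate limiting deserves a concrete shape. One common approach is a token bucket per caller: requests consume tokens, tokens refill at a fixed rate, and bursts are capped by the bucket's capacity. A minimal in-process sketch (a production bot behind multiple instances would back this with a shared store such as Redis; the class is our own, not part of any SDK):

```python
import time

class TokenBucket:
    """Simple token-bucket rate limiter (in-process sketch)."""

    def __init__(self, rate_per_second, capacity):
        self.rate = rate_per_second   # tokens refilled per second
        self.capacity = capacity      # maximum burst size
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self):
        """Return True if the request may proceed, consuming one token."""
        now = time.monotonic()
        # Refill tokens based on elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In `messages_handler` you might keep one bucket per conversation ID and return HTTP 429 when `allow()` is False.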
Your Microsoft Teams transcription bot now automatically joins meetings, captures audio, generates speaker-labeled transcripts, and integrates seamlessly with the Teams ecosystem.
Conclusion
Building a custom Microsoft Teams transcription bot gives you complete control over meeting documentation and data handling, with deep integration into your Azure infrastructure. However, managing bot registration, media streams, Azure services, and compliance requirements demands significant development and maintenance effort.
If you want enterprise-grade transcription without the complexity, consider Meetstream.ai API, which provides ready-to-use transcription for Microsoft Teams, Zoom, and Google Meet with simple API integration.