How to Build an Audio Transcription Bot for Microsoft Teams

Microsoft Teams powers collaboration for millions of organizations worldwide, yet capturing meeting discussions remains a manual, time-consuming process. Building an automated transcription bot eliminates this burden by joining meetings programmatically, recording conversations, and generating accurate transcripts. This guide demonstrates how to create a production-ready Teams transcription bot using Microsoft’s Bot Framework and Azure services.

Understanding Teams Bot Architecture

Microsoft Teams provides a robust Bot Framework that allows bots to join meetings as participants. Your bot will register with Azure Bot Service, authenticate using the Microsoft Graph API, join meetings through the Graph calls and meetings APIs, capture audio streams, and process them through Azure Cognitive Services or third-party transcription APIs.

Prerequisites and Azure Setup

Install the required dependencies:

pip install botbuilder-core botbuilder-schema azure-cognitiveservices-speech msal requests python-dotenv aiohttp

Create an Azure Bot and register your application:

  1. Register an app in Azure Active Directory
  2. Create an Azure Bot resource
  3. Enable Microsoft Teams channel
  4. Configure bot permissions for meetings

Set up your .env file:

MICROSOFT_APP_ID=your_app_id
MICROSOFT_APP_PASSWORD=your_app_password
TENANT_ID=your_tenant_id
AZURE_SPEECH_KEY=your_speech_key
AZURE_SPEECH_REGION=your_region
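A fail-fast check at startup is cheaper than debugging a half-configured bot later. A small helper (names matching the .env file above) can report any missing values before the bot starts:

```python
import os

REQUIRED_VARS = [
    "MICROSOFT_APP_ID",
    "MICROSOFT_APP_PASSWORD",
    "TENANT_ID",
    "AZURE_SPEECH_KEY",
    "AZURE_SPEECH_REGION",
]

def missing_env(environ=None):
    # Return the names of required settings that are unset or empty.
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```

Call `missing_env()` at startup and exit with a clear message if the list is non-empty.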

Step 1: Implement Bot Authentication

Create an authentication handler for the Microsoft Graph API:

import os

import msal
import requests
from dotenv import load_dotenv

load_dotenv()

class TeamsAuthProvider:
    def __init__(self):
        self.client_id = os.getenv("MICROSOFT_APP_ID")
        self.client_secret = os.getenv("MICROSOFT_APP_PASSWORD")
        self.tenant_id = os.getenv("TENANT_ID")
        self.authority = f"https://login.microsoftonline.com/{self.tenant_id}"
        self.scope = ["https://graph.microsoft.com/.default"]

    def get_access_token(self):
        """Acquire an access token for the Microsoft Graph API."""
        app = msal.ConfidentialClientApplication(
            self.client_id,
            authority=self.authority,
            client_credential=self.client_secret
        )
        result = app.acquire_token_for_client(scopes=self.scope)
        if "access_token" in result:
            return result["access_token"]
        raise Exception(f"Authentication failed: {result.get('error_description')}")

    def get_app_token(self):
        """Get a Bot Framework authentication token."""
        token_url = "https://login.microsoftonline.com/botframework.com/oauth2/v2.0/token"
        data = {
            "grant_type": "client_credentials",
            "client_id": self.client_id,
            "client_secret": self.client_secret,
            "scope": "https://api.botframework.com/.default"
        }
        response = requests.post(token_url, data=data)
        if response.status_code == 200:
            return response.json()["access_token"]
        raise Exception(f"Token acquisition failed: {response.text}")
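MSAL caches Graph tokens internally, but `get_app_token` issues a fresh HTTP request on every call. A thin wrapper (illustrative, not part of any Microsoft SDK) can cache any token-fetching callable until shortly before expiry:

```python
import time

class CachedToken:
    """Cache the result of a token-fetching callable until near expiry."""

    def __init__(self, fetch_token, ttl_seconds=3600, refresh_margin=300):
        self._fetch = fetch_token
        self._ttl = ttl_seconds
        self._margin = refresh_margin
        self._token = None
        self._expires_at = 0.0

    def get(self):
        # Refresh when no token is cached or it is within the margin of expiring.
        if self._token is None or time.time() >= self._expires_at - self._margin:
            self._token = self._fetch()
            self._expires_at = time.time() + self._ttl
        return self._token
```

Wrap the provider once, e.g. `bot_token = CachedToken(provider.get_app_token)`, and call `bot_token.get()` wherever a token is needed.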

Step 2: Create the Teams Bot Core

Build the bot that responds to Teams events and joins meetings:

import asyncio

from botbuilder.core import ActivityHandler, TurnContext
from botbuilder.schema import Activity, ActivityTypes

class TranscriptionBot(ActivityHandler):
    def __init__(self):
        self.auth_provider = TeamsAuthProvider()
        self.active_meetings = {}

    async def on_message_activity(self, turn_context: TurnContext):
        """Handle incoming messages."""
        text = turn_context.activity.text.strip().lower()
        if text == "join":
            await turn_context.send_activity("Please invite me to a meeting!")
        elif text == "status":
            status = f"Active transcriptions: {len(self.active_meetings)}"
            await turn_context.send_activity(status)
        else:
            await turn_context.send_activity("Send 'join' to start transcription or 'status' to check active sessions")

    async def on_teams_meeting_start(self, meeting_start_event, turn_context: TurnContext):
        """Handle the meeting start event."""
        meeting_id = meeting_start_event.meeting.id
        print(f"Meeting started: {meeting_id}")
        await self.join_meeting(meeting_id, turn_context)

    async def on_teams_meeting_end(self, meeting_end_event, turn_context: TurnContext):
        """Handle the meeting end event."""
        meeting_id = meeting_end_event.meeting.id
        print(f"Meeting ended: {meeting_id}")
        if meeting_id in self.active_meetings:
            await self.stop_transcription(meeting_id)

    async def join_meeting(self, meeting_id, turn_context):
        """Track a Teams meeting the bot has been added to."""
        try:
            self.active_meetings[meeting_id] = {
                'context': turn_context,
                'start_time': asyncio.get_event_loop().time()
            }
            await turn_context.send_activity("Transcription bot has joined the meeting")
            print(f"Successfully joined meeting: {meeting_id}")
        except Exception as e:
            print(f"Error joining meeting: {e}")
            await turn_context.send_activity(f"Failed to join meeting: {str(e)}")

    async def stop_transcription(self, meeting_id):
        """Stop tracking a meeting and clear its session state."""
        session = self.active_meetings.pop(meeting_id, None)
        if session:
            await session['context'].send_activity("Transcription stopped for this meeting")
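The command routing in `on_message_activity` is easiest to verify in isolation. The same branching, extracted as a pure function (an illustrative refactoring, not a Bot Framework API), can be unit-tested without a Teams connection:

```python
def route_command(text, active_count):
    # Mirrors on_message_activity's branching on the incoming message text.
    text = (text or "").strip().lower()
    if text == "join":
        return "Please invite me to a meeting!"
    if text == "status":
        return f"Active transcriptions: {active_count}"
    return "Send 'join' to start transcription or 'status' to check active sessions"
```

The bot handler then reduces to `await turn_context.send_activity(route_command(text, len(self.active_meetings)))`.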

Step 3: Implement Audio Capture and Processing

Create an audio stream handler for Teams meetings:

import azure.cognitiveservices.speech as speechsdk
from datetime import datetime

class TeamsAudioProcessor:
    def __init__(self):
        self.speech_key = os.getenv("AZURE_SPEECH_KEY")
        self.speech_region = os.getenv("AZURE_SPEECH_REGION")
        self.transcript_buffer = []

    def create_speech_config(self):
        """Configure Azure Speech Services."""
        speech_config = speechsdk.SpeechConfig(
            subscription=self.speech_key,
            region=self.speech_region
        )
        speech_config.speech_recognition_language = "en-US"
        speech_config.request_word_level_timestamps()
        speech_config.enable_dictation()
        return speech_config

    async def start_transcription(self, audio_stream):
        """Start real-time transcription from an audio stream."""
        speech_config = self.create_speech_config()
        audio_config = speechsdk.audio.AudioConfig(stream=audio_stream)
        conversation_transcriber = speechsdk.transcription.ConversationTranscriber(
            speech_config=speech_config,
            audio_config=audio_config
        )
        conversation_transcriber.transcribed.connect(self._on_transcribed)
        conversation_transcriber.canceled.connect(self._on_canceled)
        conversation_transcriber.session_started.connect(self._on_session_started)
        conversation_transcriber.session_stopped.connect(self._on_session_stopped)
        # start_transcribing_async() returns a Speech SDK ResultFuture, not an
        # awaitable; block on .get() until the session has started.
        conversation_transcriber.start_transcribing_async().get()
        print("Transcription started")
        return conversation_transcriber

    def _on_transcribed(self, evt):
        """Handle transcribed text."""
        if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
            entry = {
                'timestamp': self._format_timestamp(evt.result.offset),
                'speaker': f"Speaker {evt.result.speaker_id}",
                'text': evt.result.text
            }
            self.transcript_buffer.append(entry)
            print(f"[{entry['timestamp']}] {entry['speaker']}: {entry['text']}")

    def _on_canceled(self, evt):
        """Handle transcription errors."""
        print(f"Transcription canceled: {evt.reason}")
        if evt.reason == speechsdk.CancellationReason.Error:
            print(f"Error details: {evt.error_details}")

    def _on_session_started(self, evt):
        print("Transcription session started")

    def _on_session_stopped(self, evt):
        print("Transcription session stopped")

    def _format_timestamp(self, offset_ticks):
        # Speech results report offsets in 100-nanosecond ticks.
        seconds = offset_ticks / 10000000
        minutes = int(seconds // 60)
        secs = int(seconds % 60)
        return f"{minutes:02d}:{secs:02d}"

    def save_transcript(self, meeting_id, filename=None):
        if not filename:
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            filename = f"teams_transcript_{meeting_id}_{timestamp}.txt"
        with open(filename, 'w', encoding='utf-8') as f:
            f.write("Microsoft Teams Meeting Transcript\n")
            for entry in self.transcript_buffer:
                f.write(f"[{entry['timestamp']}] {entry['speaker']}:\n{entry['text']}\n\n")
        return filename
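The `audio_stream` argument to `start_transcription` has to come from somewhere: the Speech SDK's `PushAudioInputStream` accepts raw PCM bytes that you pull from the Teams media session and write in. Teams real-time media is typically 16 kHz, 16-bit mono PCM, but treat that format as an assumption to confirm against your media platform:

```python
def pcm_frame_bytes(sample_rate=16000, bytes_per_sample=2, channels=1, frame_ms=20):
    # Bytes in one audio frame: rate * sample width * channels * duration.
    return int(sample_rate * bytes_per_sample * channels * frame_ms / 1000)

def make_push_stream():
    # Deferred import so this helper stays importable without the Speech SDK.
    import azure.cognitiveservices.speech as speechsdk
    fmt = speechsdk.audio.AudioStreamFormat(
        samples_per_second=16000, bits_per_sample=16, channels=1
    )
    return speechsdk.audio.PushAudioInputStream(stream_format=fmt)
```

Pass `make_push_stream()` as `audio_stream`, call `push_stream.write(chunk)` for each incoming audio chunk (640 bytes per 20 ms frame at this format), and `push_stream.close()` when the meeting ends so the transcriber can flush its final results.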

Step 4: Handle Teams Meeting Media Streams

Implement media stream handling for audio capture:

import aiohttp

class TeamsMediaHandler:
    def __init__(self, bot_id, auth_provider):
        self.bot_id = bot_id
        self.auth_provider = auth_provider
        self.media_sessions = {}

    async def subscribe_to_audio(self, meeting_id, participant_id):
        """Subscribe to audio stream notifications for a Teams meeting."""
        token = self.auth_provider.get_access_token()
        headers = {
            'Authorization': f'Bearer {token}',
            'Content-Type': 'application/json'
        }
        subscription_data = {
            'resource': f'/communications/calls/{meeting_id}/audioStreams',
            'changeType': 'created,updated',
            'notificationUrl': 'https://your-bot-endpoint.com/api/notifications',
            'clientState': 'transcription-bot-secret'
        }
        async with aiohttp.ClientSession() as session:
            async with session.post(
                'https://graph.microsoft.com/v1.0/subscriptions',
                headers=headers,
                json=subscription_data
            ) as response:
                if response.status == 201:
                    subscription = await response.json()
                    self.media_sessions[meeting_id] = subscription
                    return subscription
                error = await response.text()
                raise Exception(f"Subscription failed: {error}")
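Graph will not activate the subscription until the `notificationUrl` endpoint answers its validation handshake: a request carrying a `validationToken` query parameter must be echoed back promptly as plain text. A transport-agnostic sketch of that endpoint logic, where the `clientState` check guards against forged notifications:

```python
def handle_graph_notification(query, body, expected_client_state):
    """Return (status, payload) for a Graph change-notification request."""
    token = query.get("validationToken")
    if token is not None:
        # Subscription validation: echo the token back as text/plain with 200.
        return 200, token
    accepted = []
    for note in (body or {}).get("value", []):
        # Only trust notifications carrying the clientState we registered.
        if note.get("clientState") == expected_client_state:
            accepted.append(note.get("resource"))
    # Acknowledge quickly with 202; do heavy processing out of band.
    return 202, accepted
```

In the aiohttp app below this would back a `POST /api/notifications` route, with the accepted resources handed off to the media pipeline.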

Step 5: Build the Complete Integration

Combine all components into a working transcription bot:

from aiohttp import web
from botbuilder.core import BotFrameworkAdapter, BotFrameworkAdapterSettings
from botbuilder.schema import Activity

class TeamsTranscriptionBotApp:
    def __init__(self):
        self.settings = BotFrameworkAdapterSettings(
            app_id=os.getenv("MICROSOFT_APP_ID"),
            app_password=os.getenv("MICROSOFT_APP_PASSWORD")
        )
        self.adapter = BotFrameworkAdapter(self.settings)
        self.bot = TranscriptionBot()
        self.audio_processor = TeamsAudioProcessor()
        self.media_handler = TeamsMediaHandler(self.settings.app_id, TeamsAuthProvider())

    async def messages_handler(self, req: web.Request) -> web.Response:
        if "application/json" in req.headers.get("Content-Type", ""):
            body = await req.json()
        else:
            return web.Response(status=415)
        activity = Activity().deserialize(body)
        auth_header = req.headers.get("Authorization", "")

        async def call_bot(turn_context):
            await self.bot.on_turn(turn_context)

        await self.adapter.process_activity(activity, auth_header, call_bot)
        return web.Response(status=200)

    def run(self, host='0.0.0.0', port=3978):
        app = web.Application()
        app.router.add_post('/api/messages', self.messages_handler)
        web.run_app(app, host=host, port=port)

if __name__ == "__main__":
    bot_app = TeamsTranscriptionBotApp()
    bot_app.run()

Deployment and Configuration

Deploy your bot to Azure App Service or Azure Container Instances for production use. Configure your bot endpoint in Azure Bot Service and update the messaging endpoint URL. Enable the Teams channel and configure meeting event subscriptions.

Set up ngrok for local development testing:

ngrok http 3978

Update your bot’s messaging endpoint to the ngrok URL in Azure Portal.

For production, implement secure storage for transcripts using Azure Blob Storage. Add monitoring with Application Insights to track bot performance and errors. Configure auto-scaling to handle multiple concurrent meetings.
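For the Blob Storage step, one workable layout partitions transcripts by date so retention rules can be applied per day. A sketch, where the container and connection string are placeholders you supply, and the upload requires the azure-storage-blob package:

```python
from datetime import datetime, timezone

def transcript_blob_name(meeting_id, when=None):
    # e.g. "2024/05/17/<meeting_id>.txt" -- date-partitioned for retention rules.
    when = when or datetime.now(timezone.utc)
    return f"{when:%Y/%m/%d}/{meeting_id}.txt"

def upload_transcript(connection_string, container, meeting_id, text):
    # Deferred import: requires the azure-storage-blob package.
    from azure.storage.blob import BlobServiceClient
    service = BlobServiceClient.from_connection_string(connection_string)
    blob = service.get_blob_client(container, transcript_blob_name(meeting_id))
    blob.upload_blob(text, overwrite=True)
```

Called after `save_transcript`, this moves transcripts off the bot's local disk, which is ephemeral on App Service and Container Instances.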

Security Best Practices

Store all credentials in Azure Key Vault instead of environment variables. Implement rate limiting to prevent abuse. Use managed identities for Azure resource access. Enable encryption for stored transcripts and implement role-based access control for transcript retrieval.
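The rate-limiting recommendation can be as simple as a per-caller token bucket in front of the message handler. A minimal sketch (production deployments usually offload this to Azure API Management or another gateway instead):

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity` requests, refilling `rate` tokens/second."""

    def __init__(self, capacity=10, rate=1.0, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Keep one bucket per tenant or conversation ID in a dict and return HTTP 429 from `messages_handler` when `allow()` is false.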

Your Microsoft Teams transcription bot now automatically joins meetings, captures audio, generates speaker-labeled transcripts, and integrates seamlessly with the Teams ecosystem.

Conclusion

Building a custom Microsoft Teams transcription bot gives you complete control over meeting documentation and data handling, with deep integration into your Azure infrastructure. However, managing bot registration, media streams, Azure services, and compliance requirements demands significant development and maintenance effort.

If you want enterprise-grade transcription without the complexity, consider Meetstream.ai API, which provides ready-to-use transcription for Microsoft Teams, Zoom, and Google Meet with simple API integration.
