Google Meet dominates enterprise video conferencing, yet most organizations struggle to capture and document meeting discussions effectively.
Building an automated transcription bot solves this challenge by joining meetings programmatically, recording conversations, and generating accurate transcripts.
This guide shows you how to create a Google Meet transcription bot using Python and modern cloud APIs.
Understanding Google Meet Bot Architecture
Unlike Zoom, Google Meet offers no dedicated bot SDK, so a different approach is required. Your bot will use a browser-automation tool to control a headless browser, join meetings through the web interface, capture the audio stream, and send it to a speech recognition service. We’ll use Playwright for browser automation and AssemblyAI for transcription, chosen for its strong accuracy and built-in speaker identification.
Prerequisites and Setup
Install the required dependencies:
pip install playwright assemblyai python-dotenv
playwright install chromium
You’ll need:
- Google Workspace Account – For creating and joining meetings
- AssemblyAI API Key – Get from assemblyai.com
- Google OAuth Credentials – For bot authentication
Create your .env file:
GOOGLE_EMAIL=bot@yourdomain.com
GOOGLE_PASSWORD=your_bot_password
ASSEMBLYAI_API_KEY=your_api_key
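Before launching anything, it helps to fail fast if a credential is missing. The following is a small illustrative helper (check_env is our own name, not a library function) that verifies all three values load; the try/except fallback just lets it run even where python-dotenv isn’t installed:

```python
# Sanity check: confirm required credentials are available before the bot
# starts. check_env is a hypothetical helper, not part of any library API.
import os

try:
    from dotenv import load_dotenv  # already a bot dependency
except ImportError:                  # fall back to plain environment variables
    load_dotenv = lambda: False

def check_env(required=("GOOGLE_EMAIL", "GOOGLE_PASSWORD", "ASSEMBLYAI_API_KEY")):
    """Return the list of required variables that are missing or empty."""
    load_dotenv()  # no-op if no .env file is present
    return [key for key in required if not os.getenv(key)]

if __name__ == "__main__":
    missing = check_env()
    if missing:
        raise SystemExit(f"Missing required env vars: {', '.join(missing)}")
    print("All credentials loaded")
```

Running this once before a deploy catches typos in the .env file long before the bot sits silently at a login page.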
Step 1: Automate Browser Login
First, create a module to authenticate and join Google Meet:
from playwright.async_api import async_playwright
import asyncio
import os
from dotenv import load_dotenv

load_dotenv()

class GoogleMeetBot:
    def __init__(self):
        self.email = os.getenv("GOOGLE_EMAIL")
        self.password = os.getenv("GOOGLE_PASSWORD")
        self.browser = None
        self.page = None
        self.context = None

    async def initialize_browser(self):
        """Launch browser with audio capture enabled"""
        playwright = await async_playwright().start()
        self.browser = await playwright.chromium.launch(
            headless=False,  # Set True for production
            args=[
                '--use-fake-ui-for-media-stream',
                '--use-fake-device-for-media-stream',
                '--no-sandbox',
                '--disable-setuid-sandbox'
            ]
        )
        self.context = await self.browser.new_context(
            permissions=['microphone', 'camera'],
            viewport={'width': 1280, 'height': 720}
        )
        self.page = await self.context.new_page()
        print("Browser initialized")

    async def login_google(self):
        """Authenticate with Google account"""
        await self.page.goto('https://accounts.google.com')
        # Enter email
        await self.page.fill('input[type="email"]', self.email)
        await self.page.click('#identifierNext')
        await self.page.wait_for_timeout(2000)
        # Enter password
        await self.page.fill('input[type="password"]', self.password)
        await self.page.click('#passwordNext')
        await self.page.wait_for_timeout(3000)
        print("Logged in successfully")

    async def join_meeting(self, meeting_url):
        """Join a Google Meet meeting"""
        await self.page.goto(meeting_url)
        await self.page.wait_for_timeout(3000)
        # Turn off the bot's camera and microphone before joining
        try:
            await self.page.click('button[aria-label*="camera"]')
            await self.page.click('button[aria-label*="microphone"]')
        except Exception:
            pass  # Controls may already be off or labeled differently
        # Click the join button (reads "Join now" when the bot account
        # is allowed to join directly)
        await self.page.click('button:has-text("Ask to join")')
        await self.page.wait_for_timeout(5000)
        print(f"Joined meeting: {meeting_url}")
Step 2: Capture Audio Streams
Implement audio capture using browser audio APIs:
import subprocess

class AudioStreamCapture:
    def __init__(self, output_file="meet_audio.wav"):
        self.output_file = output_file
        self.is_recording = False
        self.process = None

    def start_capture(self):
        """Start capturing system audio using FFmpeg"""
        self.is_recording = True
        # FFmpeg command for audio capture: 16 kHz mono PCM, the format
        # speech-to-text services expect
        ffmpeg_cmd = [
            'ffmpeg',
            '-y',                    # Overwrite output without prompting
            '-f', 'pulse',           # Use 'avfoundation' on macOS
            '-i', 'default',
            '-acodec', 'pcm_s16le',
            '-ar', '16000',
            '-ac', '1',
            self.output_file
        ]
        self.process = subprocess.Popen(
            ffmpeg_cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE
        )
        print(f"Audio capture started: {self.output_file}")

    def stop_capture(self):
        """Stop audio recording"""
        if self.process:
            self.process.terminate()
            self.process.wait()
        self.is_recording = False
        print("Audio capture stopped")

    def get_audio_file(self):
        """Return the path to the recorded audio"""
        return self.output_file
Step 3: Transcribe the Recording
Send the captured audio to AssemblyAI for transcription once the recording stops:
import assemblyai as aai
import os
from datetime import datetime

class MeetTranscriber:
    def __init__(self):
        aai.settings.api_key = os.getenv("ASSEMBLYAI_API_KEY")
        self.transcripts = []
        self.current_speakers = {}

    def transcribe_file(self, audio_file):
        """Transcribe a recorded audio file"""
        config = aai.TranscriptionConfig(
            speaker_labels=True,
            speakers_expected=5,
            language_code="en_us"
        )
        transcriber = aai.Transcriber()
        transcript = transcriber.transcribe(audio_file, config=config)
        if transcript.status == aai.TranscriptStatus.error:
            raise Exception(f"Transcription error: {transcript.error}")
        return transcript

    def format_transcript(self, transcript):
        """Format transcript with speakers and timestamps"""
        formatted_output = []
        formatted_output.append("Google Meet Transcript")
        formatted_output.append("=" * 60)
        formatted_output.append(f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
        formatted_output.append("")
        for utterance in transcript.utterances:
            timestamp = self._format_timestamp(utterance.start)
            speaker = f"Speaker {utterance.speaker}"
            text = utterance.text
            formatted_output.append(f"[{timestamp}] {speaker}:")
            formatted_output.append(f"{text}")
            formatted_output.append("")
        return "\n".join(formatted_output)

    def _format_timestamp(self, milliseconds):
        """Convert milliseconds to HH:MM:SS format"""
        seconds = milliseconds // 1000
        hours = seconds // 3600
        minutes = (seconds % 3600) // 60
        secs = seconds % 60
        return f"{hours:02d}:{minutes:02d}:{secs:02d}"

    def extract_action_items(self, transcript):
        """Extract potential action items from the transcript"""
        action_keywords = ['todo', 'action item', 'follow up', 'will do', 'need to']
        action_items = []
        for utterance in transcript.utterances:
            text_lower = utterance.text.lower()
            if any(keyword in text_lower for keyword in action_keywords):
                action_items.append({
                    'speaker': f"Speaker {utterance.speaker}",
                    'text': utterance.text,
                    'timestamp': self._format_timestamp(utterance.start)
                })
        return action_items

    def save_transcript(self, transcript, filename):
        """Save the formatted transcript to a file"""
        formatted = self.format_transcript(transcript)
        with open(filename, 'w', encoding='utf-8') as f:
            f.write(formatted)
        # Save action items separately
        action_items = self.extract_action_items(transcript)
        if action_items:
            action_file = filename.replace('.txt', '_actions.txt')
            with open(action_file, 'w', encoding='utf-8') as f:
                f.write("Action Items & Follow-ups\n")
                f.write("=" * 60 + "\n\n")
                for item in action_items:
                    f.write(f"[{item['timestamp']}] {item['speaker']}:\n")
                    f.write(f"{item['text']}\n\n")
        print(f"Transcript saved: {filename}")
Step 4: Build the Complete Bot System
Integrate all components into a functional transcription bot:
import asyncio
import signal
from datetime import datetime

class GoogleMeetTranscriptionBot:
    def __init__(self):
        self.meet_bot = GoogleMeetBot()
        self.audio_capture = AudioStreamCapture()
        self.transcriber = MeetTranscriber()
        self.is_running = False

    async def start(self, meeting_url):
        """Start the bot and join the meeting"""
        print("Initializing Google Meet Transcription Bot...")
        # Initialize browser and log in
        await self.meet_bot.initialize_browser()
        await self.meet_bot.login_google()
        # Join the meeting
        await self.meet_bot.join_meeting(meeting_url)
        # Start audio capture
        self.audio_capture.start_capture()
        self.is_running = True
        print("Bot is now recording and will transcribe on exit...")

    async def stop(self):
        """Stop the bot and generate the transcript"""
        print("\nStopping bot and generating transcript...")
        self.is_running = False
        self.audio_capture.stop_capture()
        # Transcribe the recording
        audio_file = self.audio_capture.get_audio_file()
        print("Transcribing audio... This may take a few minutes.")
        transcript = self.transcriber.transcribe_file(audio_file)
        # Save the transcript
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        output_file = f"meet_transcript_{timestamp}.txt"
        self.transcriber.save_transcript(transcript, output_file)
        # Close the browser
        if self.meet_bot.browser:
            await self.meet_bot.browser.close()
        print("Transcription complete!")

    async def run(self, meeting_url, duration=None):
        """Run the bot for a specified duration or until interrupted"""
        await self.start(meeting_url)
        try:
            elapsed = 0
            # Sleep in one-second slices so a shutdown signal (which
            # clears is_running) takes effect promptly
            while self.is_running and (duration is None or elapsed < duration):
                await asyncio.sleep(1)
                elapsed += 1
        except KeyboardInterrupt:
            pass
        finally:
            await self.stop()

# Main execution
async def main():
    bot = GoogleMeetTranscriptionBot()
    # Replace with your meeting URL
    meeting_url = "https://meet.google.com/abc-defg-hij"
    # Graceful shutdown: flip the flag so run() exits its wait loop and
    # stop() fires in the finally block (scheduling coroutines directly
    # from a signal handler is unreliable)
    def signal_handler(sig, frame):
        bot.is_running = False
    signal.signal(signal.SIGINT, signal_handler)
    try:
        # Run for 60 minutes or until Ctrl+C
        await bot.run(meeting_url, duration=3600)
    except Exception as e:
        print(f"Error: {e}")

if __name__ == "__main__":
    asyncio.run(main())
Step 5: Add Advanced Features
Enhance your bot with summary generation and sentiment analysis. AssemblyAI only runs these features when they are requested in the config at transcription time, so the subclass overrides transcribe_file to enable them and then reads the results off the returned transcript:
class AdvancedTranscriber(MeetTranscriber):
    def transcribe_file(self, audio_file):
        """Transcribe with summarization and sentiment analysis enabled.
        These must be requested up front; they cannot be added to an
        already-completed transcript."""
        config = aai.TranscriptionConfig(
            speaker_labels=True,
            summarization=True,
            summary_model=aai.SummarizationModel.informative,
            summary_type=aai.SummarizationType.bullets,
            sentiment_analysis=True
        )
        transcriber = aai.Transcriber()
        transcript = transcriber.transcribe(audio_file, config=config)
        if transcript.status == aai.TranscriptStatus.error:
            raise Exception(f"Transcription error: {transcript.error}")
        return transcript

    def generate_summary(self, transcript):
        """Return the meeting summary produced by AssemblyAI"""
        return getattr(transcript, 'summary', None)

    def analyze_sentiment(self, transcript):
        """Collect per-sentence sentiment results from the transcript"""
        sentiments = []
        for result in getattr(transcript, 'sentiment_analysis', None) or []:
            sentiments.append({
                'timestamp': self._format_timestamp(result.start),
                'speaker': f"Speaker {result.speaker}" if result.speaker else "Unknown",
                'sentiment': result.sentiment,
                'text': result.text
            })
        return sentiments
Deployment and Production Tips
Use a dedicated Google Workspace account for your bot to avoid interrupting personal meetings. Deploy on a cloud VM with sufficient resources—at minimum 2GB RAM and 2 CPU cores. Implement webhook listeners to automatically join scheduled meetings.
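A webhook listener can be sketched with nothing but the standard library. The /webhook/meeting path, the JSON payload shape, and the queue_meeting helper below are all assumptions of this sketch, not part of any Google or calendar API; in production the queue would be consumed by the bot’s event loop:

```python
# Minimal webhook listener sketch (stdlib only) that queues meeting URLs
# for the bot to join. Path and payload format are illustrative.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

pending_meetings = []  # in production, feed these to GoogleMeetTranscriptionBot

def queue_meeting(payload):
    """Validate a webhook payload and queue its meeting URL.
    Returns an (http_status, response_body) pair."""
    url = payload.get("meeting_url", "")
    if not url.startswith("https://meet.google.com/"):
        return 400, {"error": "invalid meeting URL"}
    pending_meetings.append(url)
    return 202, {"status": "queued", "meeting_url": url}

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/webhook/meeting":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            payload = {}
        status, body = queue_meeting(payload)
        data = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(data)

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), WebhookHandler).serve_forever()
```

A calendar integration or scheduling service would POST `{"meeting_url": "https://meet.google.com/abc-defg-hij"}` to this endpoint shortly before each meeting starts.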
Store transcripts in cloud storage like Google Cloud Storage or AWS S3 for team access. Add monitoring with services like Sentry to track errors and bot performance. Implement retry logic for network failures and meeting access issues.
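Retry logic can be as simple as a decorator with exponential backoff. This is a generic sketch (the attempt counts and delays are illustrative), which you could wrap around calls like joining a meeting or uploading a recording:

```python
# Generic retry decorator with exponential backoff for transient failures
# (network drops, a meeting that is not joinable yet). Parameter values
# here are illustrative defaults.
import functools
import time

def retry(max_attempts=3, base_delay=1.0, exceptions=(Exception,)):
    """Retry the wrapped function, backing off 1x, 2x, 4x... base_delay."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts: surface the error
                    time.sleep(base_delay * (2 ** attempt))
        return wrapper
    return decorator

@retry(max_attempts=3, base_delay=2.0, exceptions=(ConnectionError, TimeoutError))
def upload_transcript(path):
    # In the real bot this would push the file to S3 / GCS
    ...
```

Keeping the retried exceptions narrow (connection and timeout errors rather than bare `Exception`) avoids pointlessly retrying bugs like a bad credential.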
For security, encrypt stored transcripts and use environment variables for all credentials. Consider implementing role-based access control if multiple team members need transcript access.
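Encryption at rest can be sketched with the third-party `cryptography` package (an added dependency, not used elsewhere in this guide); the helper names are our own. In production the key would come from a secret manager or environment variable rather than being generated per run:

```python
# Sketch: encrypt transcripts before writing them to disk or cloud storage,
# using symmetric Fernet encryption from the `cryptography` package
# (pip install cryptography). Helper names are illustrative.
from cryptography.fernet import Fernet

def encrypt_transcript(text, key):
    """Encrypt transcript text; returns an opaque token safe to store."""
    return Fernet(key).encrypt(text.encode("utf-8"))

def decrypt_transcript(token, key):
    """Recover the original transcript text from an encrypted token."""
    return Fernet(key).decrypt(token).decode("utf-8")

if __name__ == "__main__":
    key = Fernet.generate_key()  # load from a secret manager in production
    token = encrypt_transcript("[00:01:02] Speaker A: hello", key)
    print(decrypt_transcript(token, key))
```

Fernet also authenticates the ciphertext, so a tampered transcript fails to decrypt instead of silently returning garbage.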
Your Google Meet transcription bot now automatically joins meetings, captures conversations, and generates speaker-labeled transcripts with action items and sentiment analysis.
Conclusion
Building a custom Google Meet transcription bot provides full control over your meeting documentation pipeline and data privacy. However, managing browser automation, audio capture, and multiple API integrations requires ongoing maintenance and infrastructure management.
If you prefer a ready-to-use solution, consider Meetstream.ai API, which provides enterprise-grade transcription for Google Meet, Zoom, and Microsoft Teams without the complexity of building and maintaining your own bot infrastructure.