How to Calculate Speech-to-Text API Cost

What Is Speech-to-Text API Cost?

The Speech-to-Text API Cost calculator estimates the expense of transcribing audio using services like OpenAI Whisper API, Deepgram, AssemblyAI, Google Cloud Speech-to-Text, and AWS Transcribe. It covers both batch transcription and real-time streaming pricing.

Formula

Transcription Cost = Audio Duration (minutes) × Price per Minute

D: Audio Duration (minutes) — Total audio duration to transcribe
P: Price per Minute ($/minute) — Service rate per minute of audio
F: Features (list) — Additional features (diarization, timestamps, language detection)
M: Mode (batch/streaming) — Batch transcription vs. real-time streaming
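
The formula can be sketched in a few lines of Python. This is a minimal illustration, not any provider's SDK: the function name `transcription_cost` and its parameters are made up for this example, and it assumes that features (F) and mode (M) are already reflected in the per-minute rate you pass in.

```python
def transcription_cost(duration_minutes: float, price_per_minute: float) -> float:
    """Return the estimated cost in dollars: duration (D) times price per minute (P)."""
    if duration_minutes < 0 or price_per_minute < 0:
        raise ValueError("duration and price must be non-negative")
    return duration_minutes * price_per_minute


# 90 minutes of audio at a hypothetical rate of $0.01 per minute
print(f"${transcription_cost(90, 0.01):.2f}")  # prints $0.90
```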

Step-by-Step Guide

1. Enter the total audio duration to transcribe (minutes or hours)
2. Select the speech-to-text service and quality tier
3. Specify features needed: speaker diarization, timestamps, language detection
4. View cost comparison across providers with feature parity notes

Worked Examples

Input

1,000 hours of podcast audio, OpenAI Whisper API

Result

Duration: 1,000 hours × 60 = 60,000 minutes. OpenAI Whisper API: 60,000 × $0.006/min = $360. For comparison, Deepgram Nova-2: 60,000 × $0.0043/min = $258. AssemblyAI: 60,000 × $0.0065/min = $390.
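
The same comparison can be reproduced with a short script. This is a sketch only: the rates are the ones quoted in this example and may not match current provider pricing, and `compare_providers` is a hypothetical helper name.

```python
# Per-minute batch rates quoted in this worked example (USD); verify against
# each provider's current pricing before relying on them.
RATES_PER_MINUTE = {
    "OpenAI Whisper API": 0.006,
    "Deepgram Nova-2": 0.0043,
    "AssemblyAI": 0.0065,
}


def compare_providers(hours: float, rates: dict[str, float]) -> list[tuple[str, float]]:
    """Return (provider, cost) pairs sorted from cheapest to most expensive."""
    minutes = hours * 60  # convert hours to billable minutes
    costs = [(name, round(minutes * rate, 2)) for name, rate in rates.items()]
    return sorted(costs, key=lambda item: item[1])


for name, cost in compare_providers(1_000, RATES_PER_MINUTE):
    print(f"{name}: ${cost:,.2f}")
```

Sorting by cost puts the cheapest provider first, which is convenient when more services are added to the rate table.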

Input

Real-time transcription for a call center, 50,000 minutes/month

Result

Deepgram streaming: 50,000 × $0.0059/min = $295/month. Google Speech-to-Text: 50,000 × $0.009/min = $450/month.

Common Mistakes to Avoid

✕ Not checking whether pricing includes speaker diarization and timestamps or if those are add-on features
✕ Comparing batch pricing to real-time streaming pricing, which is typically 20-40% more expensive
✕ Forgetting about audio preprocessing (format conversion, silence trimming), which has its own cost but can reduce billable minutes
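
Two of these pitfalls are easy to quantify. The sketch below computes the streaming premium from the Deepgram rates quoted elsewhere on this page and estimates billable minutes after silence trimming. The helper names and the 15% silence figure are hypothetical, chosen only for illustration.

```python
def streaming_premium(batch_rate: float, streaming_rate: float) -> float:
    """Return how much more streaming costs than batch, as a fraction of the batch rate."""
    if batch_rate <= 0:
        raise ValueError("batch_rate must be positive")
    return streaming_rate / batch_rate - 1


def billable_minutes(raw_minutes: float, silence_fraction: float) -> float:
    """Return the minutes left to bill after trimming the given fraction of silence."""
    if not 0 <= silence_fraction <= 1:
        raise ValueError("silence_fraction must be between 0 and 1")
    return raw_minutes * (1 - silence_fraction)


# Deepgram rates quoted on this page: $0.0043/min batch, $0.0059/min streaming.
print(f"Streaming premium: {streaming_premium(0.0043, 0.0059):.0%}")  # prints 37%

# Hypothetical assumption: 15% of a 50,000-minute workload is silence.
print(f"Billable minutes after trimming: {billable_minutes(50_000, 0.15):,.0f}")
```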

Frequently Asked Questions

Which speech-to-text service is cheapest?

For batch transcription through a hosted API, Deepgram Nova-2 is typically cheapest at $0.0043/minute ($0.26/hour), compared with $0.006/minute ($0.36/hour) for the OpenAI Whisper API. At scale, self-hosted Whisper on a GPU is cheaper still: an A10G at $0.60/hr processes about 180 minutes of audio per hour, which works out to roughly $0.0033/minute counting GPU rental only.
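
The self-hosted figure comes from dividing the GPU's hourly price by the minutes of audio it can process in an hour. A minimal sketch, assuming the A10G price and throughput stated in this answer and ignoring idle time, storage, and engineering effort (the function name is illustrative):

```python
def self_hosted_cost_per_minute(gpu_price_per_hour: float,
                                audio_minutes_per_gpu_hour: float) -> float:
    """Return the GPU rental cost per minute of transcribed audio."""
    if audio_minutes_per_gpu_hour <= 0:
        raise ValueError("throughput must be positive")
    return gpu_price_per_hour / audio_minutes_per_gpu_hour


# A10G at $0.60/hr processing about 180 minutes of audio per hour
rate = self_hosted_cost_per_minute(0.60, 180)
print(f"${rate:.4f}/minute")                        # prints $0.0033/minute
print(f"${rate * 60_000:,.2f} for 60,000 minutes")  # prints $200.00 for 60,000 minutes
```

Measured throughput varies with model size, batch settings, and audio quality, so the 180 min/hr input is worth replacing with a benchmark from your own workload.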

How accurate is AI speech-to-text compared to human transcription?

Top AI models (Whisper large-v3, Deepgram Nova-2) achieve 5-10% word error rate on clean audio, approaching human transcriptionist accuracy (~3-5% WER). Accuracy drops significantly with background noise, accents, technical jargon, and multiple overlapping speakers. For legal or medical use, human review of AI transcription is still recommended.
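
Word error rate is the number of word substitutions, deletions, and insertions needed to turn the AI transcript into the reference transcript, divided by the number of words in the reference. The sketch below implements that definition with a standard word-level edit distance. It lowercases and splits on whitespace only, which is a simplifying assumption; real evaluations usually also normalize punctuation and numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One deleted word ("the") and one substituted word ("light" -> "night")
# against a six-word reference.
wer = word_error_rate("turn left at the next light", "turn left at next night")
print(f"{wer:.1%}")  # prints 33.3%
```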
