AnthemScore
About AnthemScore
Transcribing music by ear is the slow, painful skill every serious musician eventually develops, or eventually gives up on. AnthemScore is a desktop application that tries to do it for you: it takes an audio file in MP3, WAV, FLAC, OGG, M4A, or AAC format and produces a notated score along with a MIDI export. A neural network listens to the audio, detects pitches over time, quantizes them against an estimated tempo and key, and renders the result as sheet music you can edit, print, or load into a DAW.
It’s not perfect. No automatic transcription tool is, and the application is honest about that in the way it surfaces the underlying spectrogram and lets you correct what the AI got wrong.
The right way to think about it is as a head start on transcription rather than a finished score, and from that angle it saves a meaningful amount of time on the right kind of source material.
How the AI engine actually works
The transcription pipeline runs in stages. The application converts the audio into a constant-Q spectrogram, which is a frequency-time representation tuned to musical pitches rather than linear frequencies. That spectrogram feeds a neural network trained on labeled audio-to-note pairs across instruments and genres. The network outputs a probability for each pitch being active at each time step, and the application thresholds and quantizes those probabilities into discrete notes.
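The last stage described above, turning per-frame pitch probabilities into discrete notes, can be sketched in a few lines of Python. This is a minimal illustration and not AnthemScore's actual code: the function name `probabilities_to_notes`, the 0.5 threshold, the `min_frames` filter, and the mapping of column 0 to a chosen MIDI pitch are all assumptions made for the example.

```python
from typing import List, Tuple

# (midi_pitch, start_frame, end_frame_exclusive)
Note = Tuple[int, int, int]

def probabilities_to_notes(
    probs: List[List[float]],  # probs[frame][pitch_index], values in 0..1
    lowest_midi: int = 21,     # assumed: column 0 is MIDI note 21 (A0)
    threshold: float = 0.5,    # assumed cutoff for "pitch is active"
    min_frames: int = 2,       # assumed: drop blips shorter than this
) -> List[Note]:
    """Threshold a frame-by-pitch probability grid into note events."""
    notes: List[Note] = []
    if not probs:
        return notes
    for p in range(len(probs[0])):
        start = None  # frame where the current run of active frames began
        for t, frame in enumerate(probs):
            active = frame[p] >= threshold
            if active and start is None:
                start = t
            elif not active and start is not None:
                if t - start >= min_frames:
                    notes.append((lowest_midi + p, start, t))
                start = None
        # Close a note that is still sounding at the end of the grid.
        if start is not None and len(probs) - start >= min_frames:
            notes.append((lowest_midi + p, start, len(probs)))
    return sorted(notes, key=lambda n: (n[1], n[0]))

# Hypothetical 4-frame, 3-pitch grid starting at middle C (MIDI 60).
grid = [
    [0.9, 0.1, 0.2],
    [0.8, 0.1, 0.7],
    [0.7, 0.6, 0.9],
    [0.1, 0.2, 0.8],
]
print(probabilities_to_notes(grid, lowest_midi=60))
# [(60, 0, 3), (62, 1, 4)]
```

The one-frame spike in the middle column is discarded by `min_frames`. Lowering `threshold` in this sketch admits quieter notes at the cost of more false positives.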
The two-pane interface reflects that pipeline. The top pane shows the spectrogram with color intensity representing the network’s confidence, and the bottom pane shows the resulting notation.
You can drag notes between the two panes, adjust the detection threshold to pick up quieter notes (or filter out noise), and toggle between viewing the raw network output and the cleaned final result.
That visibility into the model’s confidence is the practical part. When a chord comes through clearly the spectrogram shows bright bands at the right pitches and the notes appear in the score.
When the AI is guessing, the bands are blurry and notes appear in gray for low confidence. You can immediately tell which sections need manual review without listening through the whole piece.
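The idea of using confidence to decide where to look can be sketched as follows. This is an illustration under stated assumptions, not the application's logic: the `DetectedNote` type, the 0.6 cutoff, and the 2-second merge gap are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class DetectedNote:
    midi_pitch: int
    start_sec: float
    confidence: float  # assumed: mean network probability over the note

def review_sections(
    notes: List[DetectedNote],
    cutoff: float = 0.6,       # assumed: below this a note counts as a guess
    max_gap_sec: float = 2.0,  # assumed: merge flagged notes this close together
) -> List[Tuple[float, float]]:
    """Group low-confidence notes into (start, end) time ranges to review."""
    flagged = sorted(n.start_sec for n in notes if n.confidence < cutoff)
    sections: List[Tuple[float, float]] = []
    for t in flagged:
        if sections and t - sections[-1][1] <= max_gap_sec:
            # Extend the current section to cover this note.
            sections[-1] = (sections[-1][0], t)
        else:
            sections.append((t, t))
    return sections

notes = [
    DetectedNote(60, 0.0, 0.95),
    DetectedNote(64, 1.0, 0.40),
    DetectedNote(67, 1.5, 0.50),
    DetectedNote(72, 2.0, 0.90),
    DetectedNote(59, 10.0, 0.30),
]
print(review_sections(notes))
# [(1.0, 1.5), (10.0, 10.0)]
```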
What it transcribes well
Solo piano, solo guitar, and other harmonic instruments with clear pitched onsets are the engine’s strong suit. The training data leans heavily on this material and the results show it. A clean recording of a piano performance can come out at 80-90% accuracy on the first pass, leaving only a handful of mis-detected octaves or misread chords to fix by hand.
Monophonic vocal lines also work, with the caveat that pitch bends and slides confuse the quantizer. The notation will show stairsteps where a singer actually slid between pitches, which is technically correct but musically misleading. Manual cleanup is required if the goal is a singable transcription rather than a literal pitch trace.
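The stairstep effect follows directly from rounding a continuous pitch to the nearest semitone. The sketch below uses the standard equal-temperament mapping from frequency to MIDI note number (A4 = 440 Hz = note 69). The function names and the sample glide are invented for illustration, and the application's own quantizer may work differently.

```python
import math
from typing import List

def hz_to_midi(freq_hz: float) -> float:
    """Fractional MIDI note number for a frequency (A4 = 440 Hz = 69)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)

def quantize_glide(freqs_hz: List[float]) -> List[int]:
    """Round each frame to the nearest semitone and collapse repeats."""
    steps: List[int] = []
    for f in freqs_hz:
        pitch = round(hz_to_midi(f))
        if not steps or steps[-1] != pitch:
            steps.append(pitch)
    return steps

# A hypothetical smooth slide from A4 up to C5, sampled at nine points
# (offsets are in semitones above A4).
offsets = (0.0, 0.4, 0.8, 1.2, 1.6, 2.0, 2.4, 2.8, 3.0)
glide = [440.0 * 2 ** (s / 12) for s in offsets]
print(quantize_glide(glide))
# [69, 70, 71, 72]  i.e. A4, A#4, B4, C5
```

A singer who slid from A4 to C5 never "sang" the A#4 and B4 as separate notes, yet a literal pitch trace reports both.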
Drums and percussion transcribe in a separate mode using onset detection rather than pitch detection. The output assigns each detected hit to a drum kit piece based on spectral characteristics, with mixed accuracy.
Kick and snare are usually right, but hi-hats and cymbals occupy the same frequency ranges and get confused with each other.
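An onset-based approach of this kind can be sketched in a simplified form. Everything here is an assumption for illustration: the energy-rise rule, the `rise_ratio` and `floor` values, and the spectral-centroid boundaries in `classify_hit` are hypothetical and are not taken from AnthemScore.

```python
from typing import List

def detect_onsets(
    frame_energies: List[float],
    rise_ratio: float = 2.0,  # assumed: energy must at least double
    floor: float = 0.01,      # assumed: ignore frames quieter than this
) -> List[int]:
    """Return indices of frames where energy jumps sharply over the previous frame."""
    onsets: List[int] = []
    for i in range(1, len(frame_energies)):
        prev, cur = frame_energies[i - 1], frame_energies[i]
        if cur > floor and cur > rise_ratio * max(prev, floor):
            onsets.append(i)
    return onsets

def classify_hit(centroid_hz: float) -> str:
    """Crude label from a hit's spectral centroid (boundaries are hypothetical)."""
    if centroid_hz < 150.0:
        return "kick"
    if centroid_hz < 3000.0:
        return "snare"
    # Hi-hats and cymbals both land here, which is why they get confused.
    return "hi-hat or cymbal"

energies = [0.0, 0.0, 0.8, 0.5, 0.2, 0.9, 0.4, 0.005]
print(detect_onsets(energies))
# [2, 5]
print(classify_hit(80.0), classify_hit(1200.0), classify_hit(7000.0))
# kick snare hi-hat or cymbal
```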
What it transcribes badly
Dense ensemble recordings with overlapping instruments are the hard case. Multiple instruments playing the same notes simultaneously combine in the spectrogram, and the network can’t reliably untangle which voice played what. The result is a single merged transcription that captures roughly what’s happening but assigns everything to one staff with no instrumental separation.
Heavily processed audio with chorus, reverb, or distortion smears the pitch information and reduces accuracy. Rock guitar with heavy gain transcribes worse than the same guitarist playing clean. Live recordings with audience noise or room reverb degrade the transcription quality in proportion to how much the dry signal is buried.
For recordings that need cleanup before transcription, running them through Audacity to isolate the relevant instrument, remove reverb, or normalize the level can substantially improve the application’s hit rate. The transcription engine is only as good as the audio you feed it.
The editing workflow
After transcription you land in a notation editor with the spectrogram still available for reference. The editor supports standard operations including adding and removing notes, changing note durations, splitting and merging measures, adjusting the time signature, changing the key, and adding key changes mid-piece.
The interaction model is partly mouse-based and partly keyboard-shortcut driven. Click on the spectrogram to add a note at that pitch and time, click on an existing note to select and modify it, drag the duration handle to extend or shorten. Power users learn the shortcuts and move quickly. New users hunt for menu items and move slowly.
There’s also a piano roll view as an alternative to notation. For users who think in MIDI rather than staff notation, that view is more direct, and the underlying data is the same so you can flip between them.
Export options
MIDI export is the most useful output for most workflows. The file carries pitch, duration, velocity, and tempo data, which is everything a DAW or sample player needs. Drop the MIDI into a project, point it at a virtual instrument, and you have a playable rendition of the transcription. Pairing the MIDI output with Kontakt Player running a piano library gives you a clean, controllable playback of the transcribed material.
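To make the timing side of MIDI concrete, the sketch below converts note times in seconds into MIDI ticks and computes the value a set-tempo event stores. In a standard MIDI file, time is counted in ticks per quarter note and tempo is stored as microseconds per quarter note. The helper names, the 480 ticks-per-quarter resolution, and the sample notes are assumptions for the example and do not describe how AnthemScore writes its files.

```python
def seconds_to_ticks(seconds: float, bpm: float, ppq: int = 480) -> int:
    """Convert a time in seconds to MIDI ticks at a constant tempo."""
    beats = seconds * bpm / 60.0
    return round(beats * ppq)

def tempo_meta_value(bpm: float) -> int:
    """Microseconds per quarter note, as stored in a MIDI set-tempo event."""
    return round(60_000_000 / bpm)

# Hypothetical transcription output: (pitch, start_sec, duration_sec, velocity)
notes = [(60, 0.0, 0.5, 80), (64, 0.5, 0.25, 72), (67, 0.75, 1.0, 90)]
bpm = 120.0

# (pitch, start_tick, duration_ticks, velocity)
events = [
    (pitch, seconds_to_ticks(start, bpm), seconds_to_ticks(dur, bpm), vel)
    for pitch, start, dur, vel in notes
]
print(tempo_meta_value(bpm))
# 500000
print(events)
# [(60, 0, 480, 80), (64, 480, 240, 72), (67, 720, 960, 90)]
```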
MusicXML export targets standard notation software. The format preserves more notation-specific details than MIDI does, including beaming, slurs, and dynamics where the application has detected them. PDF export produces a printable score directly. WAV export renders the transcribed notation through built-in synthesis if you want a quick audio preview of what the score reads as.
For users wanting to learn the transcribed material visually rather than reading the notation, exporting MIDI and loading it into Synthesia gives you a falling-bar piano roll display that’s easier to follow for self-taught players.
Tempo, key, and time signature detection
The application detects tempo, key, and time signature automatically and uses those values to quantize the transcription. Tempo detection works well for music with a steady beat, less well for rubato performances or pieces with frequent tempo changes. The detected tempo can be overridden manually, which is the right move whenever the auto-detection produces an oddly fractional BPM.
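Why a slightly wrong tempo matters can be seen in a small quantization sketch. Snapping onsets to a sixteenth-note grid uses a step of 60 / BPM / 4 seconds, so a small tempo error accumulates over the length of a piece. The function name, the grid resolution, and the onset times below are assumptions for illustration only.

```python
from typing import List

def snap_to_grid(
    onsets_sec: List[float],
    bpm: float,
    subdivisions_per_beat: int = 4,  # assumed: sixteenth-note grid
) -> List[int]:
    """Return the nearest grid index for each onset time."""
    step = 60.0 / bpm / subdivisions_per_beat
    return [round(t / step) for t in onsets_sec]

# Slightly uneven onsets from a hypothetical performance at 120 BPM.
onsets = [0.02, 0.49, 1.01, 1.26, 8.0]

print(snap_to_grid(onsets, bpm=120.0))
# [0, 4, 8, 10, 64]
print(snap_to_grid(onsets, bpm=117.3))
# [0, 4, 8, 10, 63]
```

With the fractional tempo, the first few notes still land on the same grid slots, but by the eight-second mark the note has slipped a sixteenth early. Overriding the tempo with the round value fixes the whole grid at once.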
Key detection is reasonably accurate on tonal music in major or minor keys. Modal music and music with frequent modulations can fool it into picking a related key rather than the actual one. Manual key correction is one click. Time signature detection is the weakest of the three, with the application sometimes choosing duple feels for triple-meter pieces or vice versa.
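One classic textbook way to estimate a key, which may or may not resemble what AnthemScore does internally, is to correlate a pitch-class histogram with the Krumhansl-Kessler key profiles for all 24 major and minor keys. The sketch below shows that technique. The function names and the test histogram are assumptions for the example.

```python
import math
from typing import List, Tuple

# Krumhansl-Kessler profiles, index 0 = tonic.
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0

def rank_keys(histogram: List[float]) -> List[Tuple[float, str]]:
    """Score all 24 keys against a 12-bin pitch-class histogram (index 0 = C)."""
    scores: List[Tuple[float, str]] = []
    for tonic in range(12):
        # Rotate so the candidate tonic sits at index 0, matching the profiles.
        rotated = histogram[tonic:] + histogram[:tonic]
        scores.append((pearson(rotated, MAJOR), NAMES[tonic] + " major"))
        scores.append((pearson(rotated, MINOR), NAMES[tonic] + " minor"))
    return sorted(scores, reverse=True)

# Hypothetical histogram: total note duration per pitch class, C through B.
histogram = [0.0] * 12
for i, weight in enumerate(MAJOR):
    histogram[(i + 7) % 12] = weight  # an idealized piece in G major

for score, name in rank_keys(histogram)[:3]:
    print(f"{name}: {score:.2f}")
```

In a ranking like this the runners-up are typically closely related keys, such as the relative minor or a neighbor on the circle of fifths, which is consistent with the kind of near-miss described above.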
Where the application falls short
CPU and RAM usage on long files can be substantial. A full album transcription is technically possible, but most users break it down into individual tracks for processing speed. The application keeps the spectrogram in memory during editing, which keeps the editor responsive but means memory usage grows in proportion to track length.
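The proportionality is easy to see with back-of-the-envelope arithmetic. The figures below are purely hypothetical: 100 spectrogram frames per second, 352 frequency bins, and 4 bytes per value are assumptions chosen for the example and are not AnthemScore's real parameters.

```python
def spectrogram_bytes(
    duration_sec: float,
    frames_per_sec: int = 100,  # assumed time resolution
    bins: int = 352,            # assumed: 4 bins per semitone over 88 keys
    bytes_per_value: int = 4,   # assumed: 32-bit floats
) -> int:
    """Rough size of an in-memory spectrogram for a track of given length."""
    frames = int(duration_sec * frames_per_sec)
    return frames * bins * bytes_per_value

for label, seconds in [("4-minute track", 240), ("60-minute album", 3600)]:
    size = spectrogram_bytes(seconds)
    print(f"{label}: {size / 2**20:.1f} MiB")
# 4-minute track: 32.2 MiB
# 60-minute album: 483.4 MiB
```

Under these assumptions a file fifteen times longer needs fifteen times the memory for the spectrogram alone, which is one reason splitting an album into tracks helps.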
The notation engraving is competent but not on the level of dedicated notation software. Layout decisions are utilitarian, page breaks happen where they happen, and complex notation features like cross-staff beaming and multi-voice writing on one staff are limited compared to what professional engravers expect. For polished final output, exporting MusicXML and finishing the layout in a dedicated notation tool is the realistic workflow.
The trial version is time-limited and the saved files carry a watermark. The licensing tier you buy also matters because higher tiers improve the AI engine accuracy and lift export limits.
Conclusion
AnthemScore is the practical answer for musicians who want a starting point for transcription work without spending hours on every measure. The AI engine isn’t going to replace a trained transcriber for complex material, and the application doesn’t pretend otherwise. What it does well is take a clean source and produce a draft score in seconds, leaving you to refine rather than build from scratch.
The natural audience is hobbyist musicians transcribing songs to learn or arrange, educators preparing material from recordings, and producers who want a quick MIDI capture of musical ideas from audio reference. Professional transcribers will find the accuracy ceiling frustrating on demanding material. Casual users wanting a one-click solution will discover that meaningful results still require editing time.
For everyone in the middle, the application earns its position in the workflow by handling the tedious detection work and letting human judgment finish the job.
Pros & Cons
Pros
- AI-driven transcription provides a working draft from raw audio in seconds
- Spectrogram view shows model confidence and supports direct editing
- Strong performance on solo piano, solo guitar, and monophonic material
- Exports cover MIDI, MusicXML, PDF, and rendered WAV
- Tempo, key, and time signature auto-detection with manual override
- Piano roll alternative view for users who don't read standard notation
Cons
- Accuracy drops on dense ensemble recordings and heavily processed audio
- Drum transcription confuses similar-frequency percussion elements
- Notation engraving is functional but lags behind dedicated notation software
- CPU and memory usage scale with track length, making long files demanding to process
- Trial restrictions limit serious evaluation and tiered licensing affects engine quality
Frequently asked questions
What audio formats does AnthemScore accept?
MP3, WAV, FLAC, OGG, M4A, and AAC are all read natively. The application decodes them into its internal spectrogram representation, so the file format mostly determines how faithful the audio is to the original recording rather than directly affecting transcription accuracy.
How accurate is the transcription?
On clean solo piano or solo guitar material, expect 80-90% accuracy on the first pass. On dense ensemble recordings or heavily processed audio, accuracy drops significantly. The application surfaces confidence visually through the spectrogram so you know where to focus manual review.
Can I edit the transcription after it is generated?
Yes. The notation editor supports adding, removing, and modifying notes, plus adjusting measures, time signatures, key changes, and tempo. A piano roll view is available as an alternative for users who don't read standard notation.
Can it transcribe drums?
Drums and percussion have a dedicated transcription mode using onset detection rather than pitch detection. Results are usable for clear recordings of kick, snare, and basic patterns, with accuracy dropping for cymbal-heavy or complex percussion.
Can I export the result as MIDI?
Yes. MIDI export is one of the primary outputs, along with MusicXML for notation software, PDF for printing, and rendered WAV for audio preview.
Why do some notes come out in the wrong octave?
Octave errors are a common failure mode of pitch detection on harmonic-rich instruments. The network sometimes detects a strong harmonic instead of the fundamental. Manual correction is one click per affected note, and adjusting the detection threshold sometimes helps in systematic cases.
How can I improve transcription accuracy?
Pre-process the audio if possible. Isolating a single instrument, reducing reverb, normalizing the level, and removing background noise all help. Clean monophonic or harmonically clear sources transcribe far better than busy mixes.

