I have been searching for a long time for tools that check the real quality of MP3 files. Many tools base their assessment on the bitrate and encoder type alone (e.g. EncSpot). I think that’s rather insufficient. Although everything there is to know about compression artifacts is documented, and programmers are out there making incredible stuff, all of the serious quality analysis of MPEG audio still has to be done by ear…
Let’s see how we can at least do “something” to get a definite answer to the question: “Is this a bad file” without having to put our ears to the test. Mind though that it is in many ways like a cheap medical test where (in this case) a NO is 100% accurate, but a YES is just a way of saying: I’m not sure… let’s do more testing.
But first, an introduction.
With MP3 a revolution was born: music became “portable”. But there’s a catch. MP3 is a lossy compression format and sacrifices quality for size. To do this without being noticed too much, MP3 uses a technique based on psychoacoustics: it filters out quiet frequencies that are adjacent to loud frequencies. Scientists noticed that our ears mask those frequencies (we can’t hear them), so getting rid of them should go unnoticed. This way the encoder doesn’t have to encode those inaudible frequencies and can save a lot of space (10:1 is no exception). To save even more space (20:1) and so use a lower bitrate, the encoder filters out more frequencies. At some point that process becomes noticeable. That’s why very low bitrate MP3 sounds so horrible. (This description is far from complete.) If you want to know more about psychoacoustics, read the Wikipedia article on the subject.
MP3s come in different qualities, i.e. how much compression is applied. Usually the compression correlates with the number of kb/s (kilobits per second) used to store and reproduce the music. The more kb/s are available to store sounds, the better the quality can be, because the encoder can leave more of the quieter frequencies intact. And while this is “technically” sound, I’m going to prove to you that kb/s alone is far from a trustworthy indication of quality.
Digital MP3 music comes in different “qualities”, depending on how “lossy” the compression is. Let me list a few of them in order of “sounds horrible (like a cellphone)” to “sounds like the real thing”:
- 48 kb/s – This amount of compression sounds like your average portable AM radio and is mainly used to deliver speech.
- 96 kb/s – This sounds like a very poor quality YouTube video. This compression is often used to broadcast speech and music to mobile devices on GPRS (EDGE) or 3G, like your iPhone.
- 128 kb/s – This sounds acceptable to most people, but most (if not all) audiophiles will notice the loss in quality without comparing it to the source. It’s the most used compression for (private) MP3 music files on the internet and is mainly spread by filesharing, because it’s the best trade-off between size and quality.
- 192 kb/s – This sounds like CD quality to most people. Audiophiles will notice a slight degradation in quality, but only when comparing it to the source (so called A/B tests). It’s less portable, because it’s bigger and is better known as a better quality illegal download.
- 320 kb/s – This is the closest you can get to CD quality and still be compatible with most portable players. Audiophiles claim they can hear the difference, but it’s like tasting wine… (you know what I mean). This compression is used on many (legal) music download sites.
- 640 kb/s – Like twice as good as the best… (can that be?). This is a non-standard “free format” bitrate; standard MP3 bitrates stop at 320 kb/s. Sadly, many players can’t keep up with this many bits per second, or their hardware and/or software simply won’t allow it.
Now, besides MP3 (which is MPEG-1 Audio Layer III) there are many other lossy compression techniques out there, like MP2 (MPEG-1 Audio Layer II, used by digital satellite broadcasts), Ogg Vorbis, WMA and AAC. AAC is known as the successor to MP3 and can achieve much better quality than MP3 at the same bitrate, through better use of psychoacoustic models. An AAC file of 96 kb/s sounds like a 192 kb/s MP3 file (debatable). AAC at 256 kb/s is widely used by iTunes and sounds awesome (very little trade-off between quality and compression, you get both).
Just for the record, there are also lossless compression techniques out there, like FLAC. These techniques reduce the size of CD quality music without discarding any frequencies. The decoded output of such a file is bit-for-bit identical to the source. FLAC is mainly used for archiving CD collections and can achieve a considerable reduction in file size.
After this lengthy, but still absolutely incomplete introduction, NOW ON TO THE MAIN DISCUSSION.
How to check the REAL quality
As we discussed above: 320 kb/s sounds better than 128 kb/s. Given this knowledge, some people make a habit of re-encoding bad 128 kb/s source material into 320 kb/s files (or even FLAC!!). In doing so, they incorrectly assume the quality will improve. It doesn’t, because once a music file is compressed to 128 kb/s, the information removed to reduce size is lost forever. Nothing will ever recover what was lost. It’s like an image resized from 1000×1000 pixels to 100×100 pixels and then back to 1000×1000. Only 1% of the original pixels survive the downscale, so 99% of the information is lost and the image will be blurry. There is no way anyone can sharpen it up again so that it contains the same details as the original 1000×1000 image.
Sadly, for whatever reason, there are 320 kb/s files out there that are in fact 128 kb/s MP3s that have been upscaled somehow. But because 128 kb/s sounds pretty good as it is, it can be tricky to hear whether or not the file has been tampered with. There is (afaik) NO software out there that will do anything more than read the headers and look at the bits per second to tell you what’s good or not. According to audiophiles, “listening” is the only way to tell if a file is good or bad, and that’s it. I will show you, however, that a “bad” or “fraud” MP3 can be uncovered just by looking at its spectrum, without waking the neighbors.
The analysis is based on the frequency spectrum of the sound in the MP3 file. You can visualize this in many audio editors, like Audition. We will be looking for giveaway features of most (if not all) MP3 encoders, namely the cutoff frequency at 15 or 16 kHz, and clipping. Let’s look at some pictures:
The above image shows the spectral analysis of a CD quality file. The way it works is quite simple: the X axis is time, the Y axis is frequency and the brightness is the volume. Bright yellow means strong frequencies, red is weaker and black is none. This graph shows that all frequencies between 0 Hz and 20,000 Hz are well represented. Now, let’s look at a 128 kb/s compression of the same file:
Notice the difference in this 128 kb/s compressed file. All frequencies above 15.8 kHz are gone (black). That’s because most MP3 encoders apply a “filter” on the sound before compressing it. Yes… This has absolutely nothing to do with psychoacoustics, but is just a rude filter that almost all encoders use. The cut-off frequency is thereby a strong indicator of the original bitrate. Most encoders cut off at 16 kHz for 128 kb/s to 8 kHz for 64 kb/s. Once these filters have been applied, the filtered-out frequencies are lost and cannot ever be recovered.
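The cutoff check can also be sketched in code. What follows is a minimal Python sketch, assuming the MP3 has already been decoded to an array of samples (the decoding step is left out). It averages the spectrum over the whole track and reports the highest frequency that still rises above a threshold. The function names, the 4096-sample frame and the -60 dB threshold are illustrative assumptions, not values taken from any encoder or tool.

```python
import numpy as np

def average_spectrum_db(samples, sample_rate, frame_size=4096):
    """Mean power spectrum over all frames, in dB relative to its own peak."""
    samples = np.asarray(samples, dtype=np.float64)
    hop = frame_size // 2
    n_frames = 1 + (len(samples) - frame_size) // hop
    if n_frames < 1:
        raise ValueError("signal is shorter than one analysis frame")
    window = np.hanning(frame_size)
    power = np.zeros(frame_size // 2 + 1)
    for i in range(n_frames):
        frame = samples[i * hop : i * hop + frame_size] * window
        power += np.abs(np.fft.rfft(frame)) ** 2
    power /= n_frames
    if power.max() == 0.0:
        raise ValueError("signal is silent")
    level_db = 10.0 * np.log10(power / power.max() + 1e-30)
    freqs = np.fft.rfftfreq(frame_size, d=1.0 / sample_rate)
    return freqs, level_db

def estimate_cutoff_hz(samples, sample_rate, threshold_db=-60.0):
    """Highest frequency whose average level is still above threshold_db."""
    freqs, level_db = average_spectrum_db(samples, sample_rate)
    audible_bins = np.nonzero(level_db > threshold_db)[0]
    return float(freqs[audible_bins[-1]])
```

A result near 16 kHz on a file labelled 320 kb/s is a reason to look closer, not proof of upscaling: as shown further on, some encoders filter at 16 kHz even at 320 kb/s.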
Note the small spikes that go all the way up to 22 kHz. These are points where the music “clips”. Clipping happens when the waveform is bigger than the container and gets flattened at the top; this produces strong harmonic distortion, or overmodulation. Read more about clipping.
Above image: Example of clipping.
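Flattened tops like the one in the example can be counted directly in the decoded waveform. This is a rough Python sketch, assuming samples are floats scaled to the range -1.0 to 1.0; the function name, the 0.999 tolerance and the minimum run of 3 samples are illustrative assumptions.

```python
import numpy as np

def count_clipped_runs(samples, full_scale=1.0, min_run=3):
    """Count stretches of at least min_run consecutive samples stuck at full scale."""
    at_limit = np.abs(np.asarray(samples, dtype=np.float64)) >= full_scale * 0.999
    runs = 0
    length = 0
    for flag in at_limit:
        if flag:
            length += 1
        else:
            if length >= min_run:
                runs += 1
            length = 0
    if length >= min_run:  # a run that lasts until the end of the signal
        runs += 1
    return runs
```

A handful of runs in a loud modern master is common; the count is only a hint about how hard the source was driven, not a quality score.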
Once we save the file again as MP3 at 320 kb/s, the data previously lost in converting it to 128 kb/s is still missing. Consequently, “upgrading” our file to 320 kb/s did not improve quality at all; it only made the file 2.5 times as big (from 3.97 MB to 9.93 MB).
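The size increase follows directly from the bitrates: at a constant bitrate, file size is simply bitrate times duration. A small Python sketch of that arithmetic, assuming 1 MB = 1,000,000 bytes and ignoring tags and frame headers (both function names are made up for this example):

```python
def duration_seconds(size_mb, kbps):
    """Playing time of a constant-bitrate file of the given size."""
    return size_mb * 1_000_000 * 8 / (kbps * 1000)

def reencoded_size_mb(size_mb, old_kbps, new_kbps):
    """Size of the same audio after re-encoding at a different constant bitrate."""
    return size_mb * new_kbps / old_kbps

# The 3.97 MB file at 128 kb/s is a bit over four minutes of audio,
# and the same four minutes at 320 kb/s take 320 / 128 = 2.5 times the space.
seconds = duration_seconds(3.97, 128)
upscaled = reencoded_size_mb(3.97, 128, 320)
```

The result is about 9.9 MB, in line with the 9.93 MB file above, and not a single extra bit of it is music.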
Many so-called CD quality 320 kb/s downloads are ripoffs. Let’s take a look at one of these wonderful 320 kb/s downloads that are in fact upgraded low quality MP3s (in the television world they speak of upscaling).
Now, this is quite an interesting one. Notice how the cutoff frequency seems to be there, but not quite “cutting off”… This file looks like it was encoded to 128 kb/s but with the cutoff filter disabled. As you can see, the encoder tries to encode the high frequencies when they are really strong (hi-hats + percussion), but eventually the encoder is losing a lot of the information. Typically this will look like a largely blank upper frequency area with brief spikes of high frequencies, strongly reduced in resolution.
I took the original to compare: The original (full bandwidth) file is very different from the (fake) 320 kb/s file. So what’s going on here?
Actually, during my investigation I stumbled upon some annoying properties of Fraunhofer’s MP3 encoder. It seems that even with the cutoff filter disabled (or set to 22 kHz), the 320 kb/s file (from the same original source as the image above) is somewhat filtered. There is a noticeable “cutoff” line that isn’t present in the original file. Though the filtered 128 kb/s file will look flattened, this one still has some high frequencies in it. This file is bad, that’s for sure, but it’s not because of upscaling!
This example above shows the same file, once in 320 kb/s unfiltered (left) and once in 128 kb/s unfiltered (right). Notice that there is some difference. The 320 kb/s file is probably better, but still not what we have come to expect from 320 kb/s files.
This is a comparison between the original and LAME (3.8). There is some degree of graininess, but the overall frequency response seems better (wider) than FhG.
You can always spot a bad quality 320 kb/s MP3 by looking at the spectrum analyzer, but you can only be sure of the cause of the degradation when you see a very obvious 16 kHz (or lower) cutoff. When the cutoff is overshooting, matters get more complicated. It can be a 128 kb/s source that was rendered without the filter, like we tried, or it could be an original 320 kb/s file that was rendered with a bad encoder. In this case (to my surprise) the Fraunhofer encoder did not perform as expected, while the LAME encoder was quite good, although there is a slight change in graininess between the original and the LAME 320 kb/s encoded file.
Anyway, I hope these little tests have been of some help today. 🙂
56 thoughts on “How to check quality of MP3 file”
Have you tried AQuA? We have also used it to not only detect audio quality, but also optimize audio compression. You are welcome to our blog: http://blog.sevana.fi/optimize-bitrate-and-size-preserving-high-audio-quality-in-tracks-podcasts-tunes-with-aqua-wideband/
This article is right about what I’m thinking nowadays. So now I can start here to discover more knowledge, thank you 🙂
As I see the Aqua works only when you have the original, uncompressed file to compare with the compressed track.
If anybody knows a simple, free solution, please tell us!
It shouldn’t be hard to make such a tool, considering this blog already laid out all the important parameters to analyse. I hope someone will make such a tool, or else we will all have to switch to Apple’s iCloud.
Thanks for the clear way of explaining! I try to learn a bit about how to improve the quality of my music collection and in most articles I get lost within a minute in the (for me) technical abracadabra. But you kept me engaged through the whole article. Well done!
Thanks for all the comments.
I’ve taken up C# programming; maybe some day I’ll devise a tool that does a fair assessment of “quality”. 🙂
Just found a simple and easy-to-use tool that does exactly this: https://fakinthefunk.net 🙂 So maybe you don’t need to code it yourself!
Now what program did you actually use in this article to compare the songs? Will any old spectrum analyzer do? In the end, is it actually possible to rip a good quality sound file from youtube? And finally I was wondering if you could analyze an upload of mine to youtube because I use a site called mp32tube.com to upload songs to youtube along with a picture, I can clearly see the picture loses quality but the sound files sound fine at the 480p that it uploads at.
You can use any audio editor that has a spectrum analyzer. Try SPEK http://spek-project.org/ or take a look at iZotope RX http://www.izotope.com/products/audio/rx/ ( check right top TRY for free demo download — fully functional except saving )
This is a youtube rip of your video: http://www.youtube.com/watch?v=Lu2lq3nIgjw
As you can see it’s capped at around 15khz, which isn’t that bad for youtube! 🙂
And that’s at 480p? So that’s about 128kbps? And according to Wikipedia that’s the max 480p can host! Good, means I’m not losing anything using mp32tube =)
Can you analyze this one? http://www.youtube.com/watch?v=VG4ZteY3NG4
This guy only uploads in 360p but his uploads always sound GLORIOUS.
Do you have a recommendation for a video editor/spectrum analyzer? (Feel free to e-mail me, I don’t want to clutter this page with comments)
There is a great difference between low- and high quality mp3’s (128 vs 320). Difficult to detect with only headphones, but noticeable when you compare them on a proper sound-system.
You do realize people can’t hear frequencies above 16KHz, right? There’s nothing “rude” about that filter. You can safely delete all frequencies at 16KHz and above, and no one will ever notice.
Steve, the human hearing range is between 20 and 20k and it may vary depending on each individual, according to http://en.wikipedia.org/wiki/Hearing_range
Steve, if you ever heard a DJ play a 128 mp3 on a big sound system you’ll know that it doesn’t matter what science might say, it just sounds crap!
I agree with this 100%. As a DJ, we try to stick to 320, but anything below 192 you don’t want playing on any sound system of reasonable power.
Very helpful. Apart from spectrum analysis, is there a synthetic test to determine if an mp3 file has been upscaled from 128 kbps to 320 kbps?
Great, useful information. Thank you for this. I’m also very interested in hearing an answer to S Rahul Bose’s question.
Thanks for this nice and useful information, I have been looking for an explanation of this matter!
I have two different rips of Bryan Adams song “Everything I Do (I Do It For You)”…..each one ripped from different compilation CD Albums (of which I OWN the original copies of) and converted to 320kbps CBR Mp3 format.
They both look almost identical, except that the lines on one reach up a little higher up than the other one. The higher one almost reaches the 22 kHz line, where as the other song’s image is just barely shy of the 20 kHz line.
Since I havent had time to study your lesson closely YET…… can you tell me simply if the one with the HIGHER lines is the better or the HIGHER QUALITY mp3 copy…. or what???
And thank you very much for the wisdom you are sharing. I will learn all I can later… WHEN I CAN.
PS. WHY does TIME have to be such a LIMITED resource? : )
Yes, the higher the lines, the higher the frequencies. One could conclude the one with the higher lines is better.
Generally, you can’t tell the sound quality by looking at graphs. What looks like a big deal in a graph is not necessarily important to our ears at all. Blind testing (ABX) is the only way to know. When dealing with perceptual codecs, transparency (inability to discern a difference) is the definition of maximum quality; if you can’t reliably tell the difference, then neither choice is better than the other.
You are also not taking a couple of things into account. MP3 has a design feature/flaw where the 16+ kHz range can’t be preserved as well as the lower frequencies; it’s noisier. The bitrate is also constrained. These factors create a tradeoff: when the encoder is forced to deal with the content of upper extremes (mostly atonal noise which may not even be audible), valuable bits must be taken from the lower frequencies, where our ears are much more sensitive. It is presumptuous to say that better sound quality results from this. It’s more sensible to assume that the encoder has been authored to choose an ideal cutoff for a given bitrate or target quality level, and that you shouldn’t be messing with it in a naïve effort to make sexier spectrograms.
Some people simply can’t hear above 16 kHz, so they benefit from not having the encoder waste bits on those frequencies. Many of us can hear a little above 16 kHz, but not all the way to 20, even under ideal conditions. So for us, an ~18 kHz cutoff can put the audio at less risk, quality-wise, than a higher one. However, I wouldn’t want to force the encoder to use 18 when its algorithms have determined that better quality is obtained at 16; it knows its limits better than I do.
Even when we hear no difference in the lower bands, and we can hear differences in the extreme high end, a file that’s missing some or all of the extreme high end isn’t necessarily “lower quality” to our ears; people are just as apt to feel that the one that’s less noisy sounds better…and in those upper bands, that’s mostly all there is: background hiss and the upper harmonics of noisy percussion.
True. But the intent was to find upscaled MP3s. For example, when you download a FLAC file, who’s to say it isn’t just an MP3 that someone has repackaged as FLAC? The same goes for 320 kb/s MP3 downloads: nothing in the metadata tells you anything. The frequency graph can bring artifacts to the surface that are caused by lossy algorithms, and is thus a fine way to “quickly” assess whether a file is upscaled or not.
But the article title is still misleading…
Frequency graphs show the low-pass filter plus some additional frequency removal, and not the quantization noise, which is more important.
The quantization noise, not the filtering, is what makes an MP3 sound bad and harsh.
Thought to share some objective tests I have made using Sevana AQuA mentioned above:
MP3_Torture_Test.wav vs. MP3_Torture_Test-128kbps.wav
Voice Quality assessment using Aqua
Processing MP3_Torture_Test.wav vs. MP3_Torture_Test_320kbps_mp3.wav
Voice Quality assessment using Aqua
And now with “music option on”
MP3_Torture_Test.wav vs. MP3_Torture_Test-128kbps.wav
Voice Quality assessment using Aqua
Processing MP3_Torture_Test.wav vs. MP3_Torture_Test_320kbps_mp3.wav
Voice Quality assessment using Aqua
Clearly this clip has no need to be saved with 320kbps, but then I wanted to try it with Vorbis OGG:
By default OGG compression gives
Voice Quality assessment using Aqua
Quite high quality actually… but let’s try OGG with fixed bitrate of 128
Voice Quality assessment using Aqua
Still the same… and now with 48kbps to just try…
Voice Quality assessment using Aqua
It’s almost the same… 48kbps OGG file is 3 times smaller than 128kbps OGG file and 3 times smaller than MP3 128kpbs file…
Conclusion: Sevana AQuA is an interesting tool…
Thank you for the info on your blog. I found an app which might do just what you’re looking for. I am testing it right now. The Spectrum Analysis feature will cost a few bucks. So I am wondering if that’s a good deal? Here is the info: http://www.similarityapp.com/
Perhaps someone would be willing to provide an answer to a dilemma I recently encountered. I rip ALL my music directly from CD at 320kbps using Windows Media Player (Windows 7) set to the highest audio quality possible. However, I noticed that when using Spek to analyze it, EVERY file pretty much cuts off at 16kHz, making it appear to be an upscaled 128kbps file. If it’s true that frequency range can determine bitrate, it appears that WMP rips files with a 128kbps quality and a 320kbps file size. Why do WMP files appear this way? Why do they appear to be upscaled 128kbps files when they are clearly ripped at 320kbps settings? Nothing I hate worse than a damned program making me out to look like a liar when offering 320kbps files.
Apparently Windows Media Player has been using Fraunhofer’s MP3 encoder since version 10, which explains why my 320kbps rips look like they are 128kbps upscales. Thank you, Microsoft, for yet another crappy program!
For best results use the LAME encoder. It comes as a command line utility, but you can find many third-party user interfaces for it.
I would suggest LameXP. It’s easy to use and has a nice interface.
find it here: http://lamexp.sourceforge.net/
If you want the command line version only, look for windows ports here: http://www.rarewares.org/mp3-lame-bundle.php
The LAME project page is here: http://lame.sourceforge.net/
Good luck. 🙂
Very interesting article. I decided to check my library and found out that most of the 320 kbps MP3s I downloaded over the years are actually upscaled from 128 kbps!
Good thing I mostly download FLACs and none of them have a cutoff at the higher frequencies; I didn’t want to have to download it all again from different sources.
Audioshell is a very useful freeware utility that will tell you which encoder was used to create an MP3.
Download and read more here: http://www.softpointer.com/AudioShell.htm
Hei … I finally found a tool which also checks whether the bitrate is real and estimates the real bitrate, and does so for whole collections, not only single files
It supports German and English
Only saw this interesting article today. To quote Walter … “Most encoders cut off at 16 kHz for 128 kb/s to 8 kHz for 64 kb/s.” I found the cut-off for 64 kb/s is approximately 11 kHz. Made a whole lot of CBR tests from 8 kHz all the way up to 20kHz with the same song. Ripping to .wav was done with Cool Edit Pro and conversions to .mp3 using Easy CD-DA Extractor.
just found this article and since it appears as one of the first entries in google search it seems to be pretty well frequented. I would like to correct an error you made in your “clipping” part. The thin purple streaks that extend all the way to the top of the scale have absolutely nothing to do with clipping. As you yourself mentioned in the beginning of your article, the colors in the graphs indicate the amplitude/loudness. Hence the color of the streaks indicate that these parts show quite low amplitudes.
The streaks ranging all the way to 20 or 22kHz just indicate that you have some type of (white) noise at relatively low amplitudes (check the color of the streaks on your color scale), because the streaks go over the whole frequency spectrum present in the sound file.
If these streaks are already visible in the original wav or flac file (which they are in your samples) something already happened in the studio during recording/mastering.
I don’t know the reason why these streaks remain after MP3 compression, because they artificially increase the file size while not carrying any usable information. It could be a mathematical artifact of the algorithms used, or something else. This I don’t know.
Back to the clipping. Clipping is identified by color in this type of spectrum, not by the position or extension of datapoints.
I’m actually not sure where the white noise originates, but as clipping is “mathematically equivalent” to the addition of impulse noise, it made the most sense at the time. But it could just as well originate somewhere else. You’re right though that the encoder should not take these extra frequencies in, even if they are in the original recording. So maybe it’s noise added while playing the data back. It makes an interesting side-story that I’d like to dig into further someday.
Thanks for pointing it out!
The streaks do represent clipping. They remain in the MP3 spectrogram because the signal was clipped again after decoding. If the stream were decoded in floating point and scaled to where clipping does not occur, only the impulse noise that the encoder considered audible would remain, and only at very high bitrate (clipped peaks would no longer remain as flat as they were in the source).
It’s always easier to tell the difference between an upscaled FLAC and a real one just by looking at a spectrogram than by ABX.
I ABX’ed 192 kb/s MP3 against FLAC a couple of months ago. Most of the time I was able to distinguish them, but at 320, no, never.
But I still use FLACs even on my phone, so that I don’t have to worry about losing data.
The culprit is my headphone though, it’s a Sony MDR.
But sometimes it just isn’t that easy. Don’t believe me? Just encode a FLAC to AAC at ~256 kb/s VBR; it’ll keep the same frequency range as the FLAC.
So, an upscaled AAC cannot be identified by looking at graphs.
Can someone please enlighten me: if AAC keeps the same frequency range as FLAC, is it perhaps weaker on quieter signals? If not, how does AAC use its bandwidth so precisely that it can go up to 22 kHz?
Not all AAC encoders will do that. Most will show spectrograms similar to MP3, minus the sharp 16kHz cutoff. Apple has tuned both their AAC and MP3 encoders to attempt to preserve full bandwidth. We can speculate that maybe Apple wanted to impress audiophiles who listen with their eyes.
If you use a spectrum analyzer that is detailed enough, you will still see blockiness in the upper few kHz while strong signal exists in lower bands to mask the high frequencies. You must generate a sonogram (a realtime analyzer won’t do), and it must be sensitive enough to reveal detail down to about -110 dB(FS), and at least five or so samplings per second.
Older codecs exist that will preserve full bandwidth (with some added noise): Qdesign MP2, Dolby AC-3, Microsoft ADPCM. They don’t necessarily sound good.
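For anyone who wants to produce the kind of sonogram described in this comment without an audio editor, here is a minimal Python sketch using only NumPy. It assumes the audio is already decoded to floats in the range -1.0 to 1.0; the function name, frame size and hop are illustrative assumptions. Levels are scaled so that a full-scale sine reads roughly 0 dBFS, and with a hop of 2048 samples at 44.1 kHz it yields about 21 columns per second.

```python
import numpy as np

def spectrogram_dbfs(samples, sample_rate, frame_size=4096, hop=2048):
    """Return (times_s, freqs_hz, levels) with levels[i, j] in dBFS for frame i, bin j."""
    samples = np.asarray(samples, dtype=np.float64)
    if len(samples) < frame_size:
        raise ValueError("signal is shorter than one analysis frame")
    window = np.hanning(frame_size)
    starts = np.arange(0, len(samples) - frame_size + 1, hop)
    frames = np.array([samples[s : s + frame_size] * window for s in starts])
    # Dividing by half the window sum makes a full-scale sine peak near 0 dBFS.
    magnitude = np.abs(np.fft.rfft(frames, axis=1)) / (window.sum() / 2.0)
    levels = 20.0 * np.log10(magnitude + 1e-12)  # floor at -240 dB to avoid log(0)
    freqs = np.fft.rfftfreq(frame_size, d=1.0 / sample_rate)
    return starts / sample_rate, freqs, levels
```

To see detail down to about -110 dBFS, map the color scale over something like -120 to 0 dB when plotting `levels`, rather than letting the plotting library pick a range.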
Great article. Also seems to be creeping at the top of the “mp3 encoding reader” search on Google.
I am trying the AudioExpert app now which states:
“AudioExpert can detect musicfiles that were re-encoded with a higher bitrate
Re-encoding audiofiles with a higher bitrate does NOT improve the quality, it just claims that it is better. In reality, the quality gets even worse. AudioExpert can help you to identify those songs and fix this by re-importing a better version.”
If that doesn’t work I will test it against:
I am going to check Amazon’s MP3 download of a DVD I bought. Chris Botti in Boston to see if Amazon is ripping me off.
Looks like AudioExpert worked for me.
Not sure it tells me enough info to make a good decision about Amazon. But they seem to be using VBR at about 250 kb/s, and top out around 16 kHz. BUT… I do at least see some sound reaching up into the 20-22 kHz range when it needs to. So I suppose I can live with it. Here is an image of the spectrum analyzer.
Ok. I checked another song (If I ever lose my faith in you – Live – Boston Chris Botti), and it looks like Amazon is doing a good job. This one is 275 kb/s VBR, but it reaches all the way up into the 22 kHz area. So I guess it is really song dependent.
Again. Great article. Really helped me figure this out.
The Fraunhofer encoder will actually not utilize the entire 320 kbit rate. Streams encoded with FhG contain unused space in every frame, which can be seen with a hex editor. This space can be reclaimed using MP3Packer without loss, and the output bitrate will be approximately 270 kbit/s. FhG doesn’t use joint stereo, nor the bit reservoir at the highest bitrate, and is shy of encoding any energy above 16kHz. Early versions of the encoder distributed by Radium didn’t even have the 320 setting enabled, which was certainly practical. The part of the spectrum below 16 kHz is almost completely solid in a 320 version, but contains dropouts at much lower bitrates, where the sound gets quieter.
LAME 3.90.3 preserves the most sound data at --preset insane. It will also not use joint stereo even on mono input, which can be tweaked with --nsmsfix 1.5. Later versions of LAME limited the use of the bit reservoir and thus maximum instantaneous bitrate, after some decoders were discovered that couldn’t handle that much data at once. (search for a discussion by halb27)
A spectrum analyzer with good resolution and natural colors, as shown in this article, works well for analyzing encodes. Many freeware tools have too saturated colors, and don’t show low level signal. The harsh colors cut to black even on normal lossless input. I prefer to use SoX with Frontah as a frontend for analysis. It is possible to generate spectrograms of multiple files and switch between them without accidentally zooming in or out, and then waiting for the sonogram to rebuild. SoX does not respect LAME’s encoder delay, so MP3 will always appear shifted relative to the source file.
This configuration zooms into the spectrum enough to see 16-bit noise floor; fixed time resolution (X) makes it easier to compare files of different length:
-n spectrogram -X 24 -Z -10 -z 120 -t "file" -o "file-24.png"
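To run that configuration over many files, the options can be wrapped in a small script. This is a sketch in Python, assuming the `sox` binary is installed and on the PATH and was built with support for the input format; the helper names are made up for this example, and the input file is passed before `-n` as SoX expects.

```python
import subprocess

def sox_spectrogram_command(input_path, output_png, title=None):
    """Build the SoX argument list for the spectrogram settings quoted above."""
    return [
        "sox", input_path, "-n", "spectrogram",
        "-X", "24",    # fixed time resolution: 24 pixels per second
        "-Z", "-10",   # top of the color scale at -10 dBFS
        "-z", "120",   # 120 dB of dynamic range on the color scale
        "-t", title or input_path,
        "-o", output_png,
    ]

def render_spectrogram(input_path, output_png, title=None):
    """Run SoX; raises CalledProcessError if SoX reports a failure."""
    subprocess.run(sox_spectrogram_command(input_path, output_png, title), check=True)
```

Calling `render_spectrogram("song.flac", "song-24.png")` for each file in a folder gives images with the same scale, which is what makes them comparable side by side.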
I’ve observed that some encoders are very difficult to detect (visually or by AuCDtect.exe): Qdesign MP2 in fast mode will encode full spectrum in high detail at 320 kbit. Dolby AC-3 will encode up to 20kHz, collapse the upper frequencies to mono, and partially bury them in smooth noise floor, which is not clearly apparent without a reference. However, these encoders are not used in practice. We could expect transcodes from Ogg Opus though, which appears flawless up to the 20kHz limit.
interesting! I found this tutorial a bit more clear: http://stealmylyrics.com/lossy — but it has video clips, not static pictures
One vertical column of pixels in a spectrogram represents one frame in a conventional analyzer, which makes it easier to use for this application, particularly in difficult examples such as the “AAC 320kbps” item. You don’t have to remember how the analyzer moved in the past and follow it, to notice a pattern. You just look left or right on a picture that is always in front of you.
So compare… it is that easy. My player already shows the true output and asks me to shrink the fakes.
This article is misleading and misinformed. Parts are correct or good advice (i.e. don’t transcode because it won’t increase quality, and spectrograms can help spot suspected transcodes being passed off as lossless) … but the use of spectrograms and high-frequency content to assess subjective quality is basically just wrong.
The amount of content in the highest frequency band(s), and how it looks on a spectrogram, is not a measure of quality. Quality can only be determined by blind listening tests, and when you can’t tell the difference, “transparency” and thus maximum quality (as good as lossless) has been achieved, and it does not matter at all what the spectrogram looks like. With lossy codecs, transparency occurs even when the spectrograms are easily distinguishable. Likewise, obvious differences can sometimes be heard even when spectrograms look about the same.
The most important concept that seems to be missed here is that lossy codecs, and MP3 especially, simply cannot preserve the highest frequencies without sacrificing the quality of the lower ones. It’s partly a matter of lack of space, partly a matter of design priorities, and partly psychoacoustics. Being so rarely audible, the highest frequencies matter the least, and they are expensive to keep…so they are the first to be discarded, by design. (Also if the bitrate is very low, the input must be downsampled, which requires using a lowpass filter cutting off at half of the new sample rate, to prevent aliasing.)
The difficulty of keeping overall quality high when preserving high frequencies is why, when you disabled the lowpass filter in the FhG encoder, the encoder was very selective about what parts of the 16+ kHz range it let through. It only kept what the psychoacoustic model predicted would be audible. The look of the spectrogram is good in this regard; you want to see that ghostly cutoff because it means that the encoder is being judicious about what to keep and when to sacrifice quality. Unfortunately, whenever it kept anything at all in that uppermost band, it was sucking away precious bits from the 0-16 kHz range, adding noise to the parts of the music to which your ear, not eye, is far more sensitive. Hopefully it only did this where you wouldn’t notice, but in general it’s safe to say the result of forcing the encoder to use a higher cutoff is likely worse/riskier than using the defaults it was tuned for.
Exactly how much high frequency content you can let through without noticing the quality decline in the lower bands is hard to determine; it will vary by listener and selection of music, and of course the listening gear & conditions, and the efficiency of the encoder in other regards. LAME’s developers made it a priority; they’ve adjusted the lowpass cutoffs several times over the years so that it defaults to optimum values as allowed by the efficiency of that particular version of LAME. FhG did not make it a priority, hence cutoffs still visible at 320 kbps, but their encoder is still pretty good overall, at high bitrates.
Again, we hear with our ears, not our eyes, so comparing graphs which exaggerate minuscule objective differences is not a substitute for proper ABX testing.
Interesting. Why doesn’t the encoder just fatten the file and keep all the data?
I was trying to find something similar, with less precision. I just don’t want any MP3 below 128 kbps. I have many folders containing 2,700 MP3 files in total, so I want an automated tool that finds every file with a bitrate below 128 kbps, so I can clean up and organize the collection.
Maybe this will help some of you. Right now I’m trying: http://www.similarityapp.com/
Check it out and reply back.
My Best Regards
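For the cleanup task described in this comment, here is a minimal sketch in plain Python, with some loud assumptions: it only understands MPEG-1 Layer III headers, it reads the bitrate of the first frame it finds after an optional ID3v2 tag, and it trusts the first byte pattern that looks like a frame sync. That is a reasonable guess for CBR files but not for VBR files, where the first frame says little about the average. All function names are hypothetical.

```python
from pathlib import Path

# Bitrates (kbps) for MPEG-1 Layer III, indexed by the 4-bit bitrate field.
# Index 0 is "free format" and index 15 is invalid; both are treated as unknown.
MPEG1_L3_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                 128, 160, 192, 224, 256, 320, None]


def header_bitrate_kbps(header):
    """Bitrate stored in a 4-byte MPEG-1 Layer III frame header, or None."""
    if len(header) < 4:
        return None
    if header[0] != 0xFF or (header[1] & 0xE0) != 0xE0:
        return None  # the 11 sync bits are not all set
    version = (header[1] >> 3) & 0x03  # 3 means MPEG-1
    layer = (header[1] >> 1) & 0x03    # 1 means Layer III
    if version != 3 or layer != 1:
        return None
    return MPEG1_L3_KBPS[header[2] >> 4]


def first_frame_bitrate_kbps(path):
    """Bitrate of the first frame found after an optional ID3v2 tag, or None."""
    with open(path, "rb") as f:
        head = f.read(10)
        start = 0
        if len(head) == 10 and head[:3] == b"ID3":
            # The ID3v2 tag size is a 28-bit "syncsafe" integer (7 bits per byte).
            size = ((head[6] & 0x7F) << 21 | (head[7] & 0x7F) << 14
                    | (head[8] & 0x7F) << 7 | (head[9] & 0x7F))
            start = 10 + size
        f.seek(start)
        chunk = f.read(64 * 1024)
    for i in range(len(chunk) - 3):
        kbps = header_bitrate_kbps(chunk[i:i + 4])
        if kbps is not None:
            return kbps
    return None


def find_low_bitrate(root, threshold_kbps=128):
    """Yield (path, kbps) for every .mp3 under root whose first frame is below the threshold."""
    for path in sorted(Path(root).rglob("*.mp3")):
        kbps = first_frame_bitrate_kbps(path)
        if kbps is not None and kbps < threshold_kbps:
            yield path, kbps
```

Something like `for path, kbps in find_low_bitrate("Music"): print(kbps, path)` would then list the candidates; the folder name is only a placeholder. Keep in mind the point of the article: a file that passes this check can still be a re-encode of something worse.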
super helpful, thank you!
Don’t be scared of VBR MP3 files. LAME VBR at the highest setting, V0 (roughly 256 kbps), will achieve almost identical results at a far smaller file size. To 99% of people the resulting VBR MP3 will sound indistinguishable from the source.
Most people hold CBR 320 kb/s up as the holy grail and turn their noses up at VBR (V0 Q0, the highest VBR quality setting), but for all intents and purposes they will NEVER actually hear the difference.
Do not be fooled into thinking the 256 kb/s VBR V0 that Amazon uses is inferior to 320 kb/s CBR. The sound quality difference is imperceptible, and the file size is 15% smaller for V0 and 30% smaller for V1 than for 320 kbps.
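The size side of this argument is plain arithmetic: audio size is average bitrate times duration. A small sketch follows (the function names are mine, and tags and cover art are ignored). Note that the saving you actually get depends on the average bitrate a given track lands at, which for VBR varies with the music.

```python
def mp3_size_mb(avg_kbps, seconds):
    """Approximate audio payload size in megabytes (10**6 bytes), ignoring tags."""
    return avg_kbps * 1000 / 8 * seconds / 1_000_000


def saving_vs(reference_kbps, avg_kbps):
    """Fraction of file size saved relative to a reference bitrate."""
    return 1 - avg_kbps / reference_kbps


if __name__ == "__main__":
    four_minutes = 4 * 60
    for kbps in (320, 256, 224, 128):
        size = mp3_size_mb(kbps, four_minutes)
        saved = saving_vs(320, kbps)
        print(f"{kbps} kbps: {size:.1f} MB, {saved:.0%} smaller than 320 kbps")
```

A track that averages exactly 256 kbps comes out 20% smaller than its 320 kbps CBR counterpart, and one that averages 224 kbps comes out 30% smaller.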
“LAME offers CBR and VBR encoding modes. VBR is best used to target a specific quality level, and CBR is best used to target a specific bitrate. Unlike in VBR, the perceived quality of decoded audio will tend to vary across a CBR file.
CBR encoding is not efficient. Whereas VBR modes can supply more bits to complex music passages and save bits on simpler ones, CBR encodes every frame at the same bitrate.
CBR is only recommended for usage in streaming situations where the upper bitrate must be strictly enforced or for hardware that has problems with VBR.
For example, using the same encoder, a 128 kbps CBR MP3 will almost never sound better than a VBR MP3 that averages 128 kbps, because in VBR, the simple parts of audio can be better compressed than in CBR, thereby allowing more bits to be available for the complex parts.
At high bitrates, the quality difference between typical CBR and VBR files approaches zero.
In general, however, for most types of input, assuming identical input, identical encoding methods, and sensible targets for VBR quality and bitrate bounds, VBR will almost always produce equal or better perceived-quality results than CBR for files of the same size or average bitrate, and this has been demonstrated in numerous double-blind listening tests.” ~ Hydrogen Audio
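The bit-allocation argument in the quote can be illustrated with a toy model. This is only a sketch under strong assumptions: each frame is given a made-up "demand" (the bitrate it would need to sound transparent), CBR hands every frame the same bitrate, and an idealised VBR spreads the same total in proportion to demand. Real encoders are far more subtle; among other things, MP3's bit reservoir lets CBR frames borrow a limited number of bits from their neighbours.

```python
def cbr_allocation(demands_kbps, avg_kbps):
    """CBR: every frame gets the same bitrate, whatever it needs."""
    return [avg_kbps] * len(demands_kbps)


def vbr_allocation(demands_kbps, avg_kbps):
    """Idealised VBR: same average bitrate, spread in proportion to demand."""
    mean_demand = sum(demands_kbps) / len(demands_kbps)
    return [d * avg_kbps / mean_demand for d in demands_kbps]


def shortfall_kbps(demands_kbps, allocated_kbps):
    """Total bitrate the frames needed but did not get (toy quality penalty)."""
    return sum(max(0, d - a) for d, a in zip(demands_kbps, allocated_kbps))


if __name__ == "__main__":
    # Hypothetical demands: two simple frames, then two complex frames.
    demands = [64, 64, 192, 192]
    for name, allocate in (("CBR", cbr_allocation), ("VBR", vbr_allocation)):
        allocated = allocate(demands, 128)
        print(name, allocated, "shortfall:", shortfall_kbps(demands, allocated))
```

Both allocations average 128 kbps, but in this toy CBR starves the two complex frames while wasting bits on the simple ones.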
If you want the absolute highest possible quality (read: likelihood the MP3 will sound indistinguishable from the source) and don’t care about file size, use 320 kbps CBR.* The command-line option is -b320. That is all.
If you want BOTH the highest possible quality and the smallest file size, use VBR -V0 -Q0, which will probably achieve almost identical results for most sources, while not wasting space on 320 kbps frames when they’re not needed.
Some more pointers: don’t use simple stereo; use joint stereo. Don’t resample.
While technically 320 kbps CBR is slightly better than VBR, you will never hear the difference. VBR has a smaller file size and, in my experience, has fewer problems with these fake 320 kbps files that are actually re-encodes of 128 kbps.
While the gurus at Hydrogen Audio agree that CBR is technically the best, what I find very interesting is that when I did some tests myself using the spectrum analyser, to my surprise I often found that the VBR file showed more detail and intensity than the CBR file, for the same .flac source file encoded to CBR 320 kbps and to VBR V0 Q1 (which results in an average of roughly 256 kbps).
So while you all use 320 kbps, I am going to stick to my preference for 30% smaller V1 encodings, because you cannot hear the difference even if you can see a difference in the spectrum analyser.
I’d like to find a program that will identify the version of the MP3 compressor used. I ran across an MP3 with the spectro of a superior FLAC, but my spectro, which identifies MP3 encoder versions from 3.9 and earlier, only displayed “unknown” for this file. Either MP3 has a new, drastically improved compressor that I’d like to get my hands on for my editor, or all filtering was removed somehow. Anyway, if anyone knows of such a tester, please post it. Thanks!
“While the gurus at Hydrogen Audio agree that CBR is technically the best, what I find very interesting is that when I did some tests myself using the spectrum analyser, to my surprise I often found that the VBR file showed more detail and intensity than the CBR file, for the same .flac source file encoded to CBR 320 kbps and to VBR V0 Q1 (which results in an average of roughly 256 kbps).”
@Anton: I was OK with your argument until here. 320 CBR-JS will always be better than VBR-JS, UNLESS your VBR sometimes goes above the 320 kbps bitrate. AFAIK, MP3s top out at 320 kbps for any single frame. The “detail and intensity” you might have found could be artifacts of poor filtering that escape the low-pass, plus other high-frequency garbage.
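The per-frame ceiling mentioned here can be made concrete. For MPEG-1 Layer III, each frame carries 1152 samples and the highest entry in the bitrate table is 320 kbps, so the size of a frame follows directly from the bitrate and the sample rate. The sketch below uses the standard frame-length formula; the function names are mine.

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III
MAX_KBPS = 320            # highest value in the MPEG-1 Layer III bitrate table


def frame_bytes(kbps, sample_rate_hz, padding=0):
    """Length in bytes of one MPEG-1 Layer III frame (padding is 0 or 1)."""
    return 144 * kbps * 1000 // sample_rate_hz + padding


def frame_duration_ms(sample_rate_hz):
    """Playing time of one frame in milliseconds."""
    return SAMPLES_PER_FRAME / sample_rate_hz * 1000


if __name__ == "__main__":
    for kbps in (128, 256, MAX_KBPS):
        print(f"{kbps} kbps at 44.1 kHz: {frame_bytes(kbps, 44100)} bytes per frame")
    print(f"each frame lasts {frame_duration_ms(44100):.2f} ms")
```

At 44.1 kHz a 320 kbps frame is 1044 bytes (1045 with the padding bit set) and covers about 26 ms of audio. A VBR file can pick a different table entry for every frame, but it cannot pick one above 320 kbps.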
So, as a corollary: during encoding, NEVER use a sampling frequency more than 2x the highest frequency that spectrum analysis shows in your original source. The caveat to NEVER is if the encoder has a bad low-pass filter, hence the artifacts you “might have” noticed.
Just wanted to say thank you for the informative article. Appreciate the work you put into this.