BBC Proms in Extra High Quality on the Internet - The Tech
Rupert Brun has written a great post on the Internet Blog outlining the XHQ experiment with the Proms. He has very kindly agreed to give us on the R&D blog a more detailed technical overview of his team's work, for those of us with a liking for the techy stuff!
This post explains the signal path used to deliver the 320kb/s AAC internet stream of Radio 3 for the final week of the BBC Proms. For background information about the experimental extra high quality feed, you may wish to read the entry on the BBC Internet Blog; to listen to the audio, visit the web page hosting the experiment here.
The Radio 3 Proms at the Royal Albert Hall, available this year in Extra High Quality. Image CC Steve Bowbrick
The signal from the microphones at the Royal Albert Hall is converted on the stage to 48ks/s 24bit audio and sent to the outside broadcast vehicle over fibre. Each microphone has appropriate equalisation and time alignment applied and the sound is mixed down to stereo for broadcast on Radio 3.
Still at 24 bit 48ks/s, the stereo audio is fed over an "E1" 2Mb/s circuit to London Broadcasting House and passes through the main audio router to the Radio 3 Continuity Studio. Here it is unfortunately necessary to sample rate convert the audio to 44.1ks/s. The reasons for this are largely historic. When Radio 3 moved from analogue tape to digital production, the majority of the audio was stored on CD - either CD(R) for BBC recordings or commercial CDs. Due to limitations in faster-than-real-time sample rate conversion at the time, this in turn meant that the computer playout system used to hold audio for transmission had to operate at 44.1ks/s so that CDs could be "ripped" into it. Nonetheless, the playout system does work with uncompressed BWAV files rather than MP2 as was normal at the time. The same system is still in use today and it has so far not been possible to convert the system (and all the content within it) to 48ks/s.
London Broadcasting House is a multi-media site and in preparation for the arrival of TV news we have set our core audio router to operate at 48ks/s. This means that for the immediate future radio works in a mixed economy of sample rates. By operating the studios and playout system at 44.1ks/s the number of conversions is minimised. For the live BBC Proms concerts there would be fewer conversions if we switched the Radio 3 continuity suite to 48ks/s, but for the majority of the time this would not be the best configuration and it is not the sort of change that can be made on a regular basis - certainly not whilst the studio is on air. Once we have a new playout system (planned for 2012) it is hoped that it will be possible to operate at 48ks/s end to end, although archive content and commercial CDs will obviously need to be converted from 44.1ks/s for broadcast.
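The conversion between 48ks/s and 44.1ks/s is not a simple integer ratio. As a rough illustration only (a Python sketch, not the converter used in the studio; the function name is made up for this example), the exact ratio can be reduced to lowest terms:

```python
from fractions import Fraction


def resample_ratio(src_hz: int, dst_hz: int) -> Fraction:
    """Return the exact output/input sample ratio in lowest terms.

    Hypothetical helper for illustration; it only does the arithmetic
    and performs no filtering or actual resampling.
    """
    return Fraction(dst_hz, src_hz)


# 48ks/s down to 44.1ks/s, as in the Radio 3 continuity studio
ratio = resample_ratio(48_000, 44_100)

# Number of output samples produced from one second of 48ks/s input
samples_out = 48_000 * ratio
```

Here `ratio` reduces to 147/160, so every 160 input samples become 147 output samples, and one second of input yields exactly 44,100 output samples.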
The signal passes through the continuity suite mixing desk, to allow the broadcast to be faded up at the appropriate time to become part of Radio 3's output. The mixing desk output feeds a "Transmission Router" which is used to send the studio broadcasting Radio 3 at any given time to the Radio 3 transmitters. This involves a second sample rate conversion, back to 48ks/s. The transmission router feeds a number of transmission chains such as FM, DAB and DTV (terrestrial and satellite). For Radio 3 we do not use transmission processing for any digital platform, so the feed to the DAB coders is the same as that to the digital television platforms and the internet. For other radio networks, transmission processing is used and it is matched to the platform. We use the digital television feed for the internet because we believe the processing used is the most appropriate - the bit rates and intended listening environments are similar. Ideally we would use separate transmission chains and processing for the internet but the small audience size does not yet justify the cost of this.
The audio is then fed to a small router which feeds the sound cards of our Coyopa system. Coyopa codes (with one exception) all network radio audio for the internet, including live streaming and on demand. The exception is the production of podcasts, which usually require a separate editorial version of the programme so the files for podcasts are created using the desktop production tools used to edit them.
Coyopa has two halves for resilience, each with about 60 servers. It creates audio streams for each of our network radio stations in a number of formats. For each radio station there are both national and international streams because we don't have the rights to make all of our content available outside the UK and have to give international listeners a restricted service at times. Coyopa also records all of our output according to the broadcast schedule (in essence the ) and uses these recordings to create the "on demand" files for programmes.
The sound cards in the servers carry out the third and final sample rate conversion to 44.1ks/s because the domestic codecs used to replay the audio in listeners' computers don't support a wide range of bit rates at 48ks/s. The sound cards are also used to provide some protection limiting and gain adjustment so that the codecs are fed at the correct level. We feed the codecs with a peak level of -4dBFS because the codecs themselves can generate overshoots and if we fed them with a 0dBFS signal, clipping would occur. The codecs use the Fraunhofer encoder which outputs AAC-LC. The audio streams from both halves of the Coyopa system are then sent to a third party for distribution over the internet. The iPlayer just provides a link which points to the appropriate stream. The 320kb/s experimental feed uses exactly the same audio and codecs as the normal 192kb/s feed; the only difference is that the codec is set to deliver a higher bit rate.
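To put the -4dBFS figure in context, the short Python sketch below converts a dBFS level to a linear amplitude and works out how much margin is left below full scale after a codec overshoot. It is illustrative only: the function names are invented for this example, and the 1.5dB overshoot is an assumed value, not a measurement from Coyopa.

```python
def dbfs_to_linear(level_dbfs: float) -> float:
    """Convert a level in dBFS to a linear amplitude (1.0 = full scale)."""
    return 10 ** (level_dbfs / 20)


def margin_after_overshoot(peak_dbfs: float, overshoot_db: float) -> float:
    """Return the margin in dB left below 0dBFS once the codec overshoots.

    A negative result would mean the decoded signal clips.
    """
    return 0.0 - (peak_dbfs + overshoot_db)


feed_peak_dbfs = -4.0        # peak level fed to the codecs (from the post)
assumed_overshoot_db = 1.5   # ASSUMPTION: hypothetical codec overshoot

linear_peak = dbfs_to_linear(feed_peak_dbfs)
margin_db = margin_after_overshoot(feed_peak_dbfs, assumed_overshoot_db)
clips_at_full_scale = margin_after_overshoot(0.0, assumed_overshoot_db) < 0
```

A -4dBFS peak corresponds to roughly 63% of full-scale amplitude. With the assumed overshoot, feeding the codec at 0dBFS would leave a negative margin, which is the clipping case described above.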
Rupert Brun is Head of Technology for BBC Audio and Music
- Listen to Radio 3's Extra High Quality Proms audio on the Radio 3 web site during live broadcasts of The Proms until 11 September 2010. On the same page you'll find a link to a survey about the experiment. Please take a minute to complete it once you've tried the Extra High Quality experience.
- Help us spread the word about Extra Quality Audio for the Proms by tweeting about the experiment using the hashtag .
- Read Rupert's FAQ for answers to the big questions about PromsXHQ.
- Read this blog post by Radio 3 Interactive Editor Gabriel Gilson on the Radio 3 blog for some additional context.
Comment number 1.
At 9th Sep 2010, michaelkenward wrote: Had any problems with the experiment?
A message over on a music fans' group said something about level problems. (Now fixed, apparently.)
And, is this stream available only in the UK?
Nice idea. Unlike the latest iPlayer update!
Comment number 2.
At 10th Sep 2010, simoncn wrote: Thanks for this very detailed information. Is the current experimental live stream available in any form other than Flash/rtmp? My internet radio music system can only consume http streams, so I have been unable to listen to the 320 kb/s Flash stream through my hi-fi system.
When you have the new playout system and are able to operate at 48ks/s end to end, will there be an internet stream at 48ks/s? This would be beneficial for those with receiver equipment that can handle a 48ks/s stream.
Comment number 3.
At 10th Sep 2010, Rupert Brun wrote: I'm afraid the experiment is only available in Flash/rtmp, although I see Squeezebox have picked it up and made it available to a wider range of devices.
We currently plan to stick with 44.1ks/s for final distribution even when the production chain is 48ks/s end to end, because 44.1 is the "domestic" sample rate and allows for the best compatibility in domestic products. Such decisions are reviewed periodically in the light of revisions to receiver profiles and standards; by the time we get our systems to 48ks/s the situation may have changed.
Rupert Brun, Head of Technology, BBC Audio & Music.
Comment number 4.
At 10th Sep 2010, cantobel wrote: Rupert, does that mean that the Astra satellite stream, for example, which is at 48kHz, is just upsampled from 44.1?
Comment number 5.
At 11th Sep 2010, rmgalley wrote: #4 cantobel.
Yes, this is exactly what Rupert means, until at least 2012. All the variants of Radio 3 go through the Radio 3 Continuity desk which operates at 44.1 kHz sampling frequency. The digital variants are then up-sampled to 48 kHz for routing to the DAB and both DTV platforms (to share commonality with the TV audio routing system). This 48 kHz signal is down-converted again to 44.1 kHz in the Coyopa servers for the on-line streams to ensure better compatibility with domestic PC on-board audio (most dedicated sound cards would nowadays support 48 kHz or higher sampling rates).
In the case of this year's Proms the signal is re-sampled twice for the DAB and DTV platforms (and FM?) and three times for the internet streams.
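As a quick check on those counts, here is a small Python sketch that walks each chain and counts the points where the sample rate changes. The stage lists are my reading of the post, and the function and variable names are invented for illustration.

```python
def count_conversions(rates_khz):
    """Count the stage boundaries at which the sample rate changes."""
    return sum(1 for a, b in zip(rates_khz, rates_khz[1:]) if a != b)


# Sample rate (kHz) at each stage, as described in the post:
# RAH stage/OB truck, BH main router, R3 continuity, transmission router
BROADCAST_CHAIN = [48.0, 48.0, 44.1, 48.0]

# The internet streams add the Coyopa sound cards at 44.1 kHz
INTERNET_CHAIN = BROADCAST_CHAIN + [44.1]

broadcast_conversions = count_conversions(BROADCAST_CHAIN)
internet_conversions = count_conversions(INTERNET_CHAIN)
```

This gives two conversions for the DAB/DTV feeds and three for the internet streams.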
Comment number 6.
At 11th Sep 2010, simoncn wrote: Rupert,
Thanks for these answers. I understand that most of the general listening audience might have equipment that copes better with 44.1ks/s. However, listeners who are most interested in the better quality of the 320 kb/s stream are presumably also more likely to have equipment that can take advantage of a non-resampled 48ks/s stream. Would it make sense to distribute a "regular" stream at 44.1/192 and an "audiophile" stream at 48/320?
Comment number 7.
At 13th Sep 2010, rmgalley wrote: Rupert,
Thank you very much for the detailed information about the setup at the RAH and the signal routings from there to the various digital transmission platforms. This clarification was important as you confirm the Coyopa sound cards do ‘provide some protection limiting and gain adjustment’. Limiters are present on the outputs of the RAH OB truck, Radio 3 Continuity and the Coyopa system.
I haven’t seen any evidence of limiting on either of the DTV platforms so I presume what I am still seeing on the internet streams (but to a very much lesser degree now) is caused by those operating in the Coyopa sound cards.
With regard to comments I made earlier, I did some checks on the level of overshoots caused by aac coding. For this I used the Nero v1.5.4 aac coder examining the results on Sound Forge 9 DAW. I tried a variety of audio content coding at 128, 192 and 320 kbps all AAC-LC CBR.
I found the encoded > decoded waveform differed from the original by varying amounts being greatest for the lowest bit-rate. I found the coding artefacts could either add to or subtract from the original. The greatest deviation from the original was 0.2 dB with 128 kbps and 0.1 dB for 192 and 320 kbps. On this evidence it would suggest the limiting thresholds of the Coyopa system could be relaxed. A more important factor is likely to be the speed and behaviour of the limiters when subjected to a severe overload. Prior to Wednesday, when audio content was trying to exceed the – 4 dB threshold by about 6 dB, content was occasionally getting through to a level of – 3.5 dB of FS.
Until the ‘Last Night’ I had recorded the XHQ streams on my main PC using Sound Forge 9 DAW with a Creative SB Audigy 2ZS audio card. The 192 kbps stream was recorded on another PC with a lesser Creative SB Audigy-SE with Sound Forge 5. The latter recordings were transferred to the main PC via a memory stick. These roles were swapped for the Last Night.
My intention was to prepare a level matched edit prepared from DSat, 192 aac and 320 aac and fed through to the Hi-Fi for evaluation. It has become apparent the sound card in the second PC degrades the audio to a greater degree than the difference between the two online streams. I have not been able to undertake the careful tests I had intended and can only, for the purpose of evaluating the 320 kbps stream, rely upon impressions gained listening to the live or recorded content via the main PC. These are however entirely favourable and the XHQ stream has been a delight to listen to.
For what it is worth the steps I took when making my comments were:
• Remove any DC offsets from the waveforms
• Check relative levels by measuring the RMS value of carefully defined sections where there was lower level content using Dsat as reference. Compare that with the instantaneous peak levels in loud sections looking for limiting.
• Check the polarity of the waveform so they are all the same.
• For listening tests adjust the RMS levels to within 0.1 dB.
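For anyone wanting to repeat these steps without a DAW, they can be sketched in a few lines of Python. This is only an outline under my own assumptions (samples held as plain lists of floats with 1.0 as full scale; all function names invented here), not what Sound Forge does internally.

```python
import math


def remove_dc(samples):
    """Subtract the mean so the waveform is centred on zero."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def rms_db(samples):
    """RMS level of the samples in dB relative to full scale (1.0)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms)


def same_polarity(a, b):
    """True if two time-aligned waveforms are positively correlated."""
    return sum(x * y for x, y in zip(a, b)) > 0


def match_level(reference, candidate):
    """Scale candidate to the reference RMS; return (samples, gain in dB)."""
    gain_db = rms_db(reference) - rms_db(candidate)
    gain = 10 ** (gain_db / 20)
    return [s * gain for s in candidate], gain_db


# Synthetic example: the candidate is the reference at half level
# with a DC offset added.
reference = remove_dc([0.1, 0.3, -0.1, -0.3])
candidate = remove_dc([0.5 * s + 0.05 for s in reference])

polarity_ok = same_polarity(reference, candidate)
matched, gain_db = match_level(reference, candidate)
level_error_db = abs(rms_db(matched) - rms_db(reference))
```

In this synthetic case the candidate needs about 6 dB of gain, after which the RMS levels agree to well within the 0.1 dB target. Real recordings would also need to be time-aligned before the polarity and level checks mean anything.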
One other anomaly: for the Last Night, with the Freeview mp2 as reference, I found the 192 kbps stream was now 0.7 dB higher and the 320 kbps stream 1.6 dB higher. I will check the 192 stream again as previously I had found both to be 1.9 dB higher than the mp2.
Had it been possible to record the aac streams directly (before conversion to wav files) in the same manner I am able for the mp2 Dsat and Freeview feeds, the evaluation could have been better expedited. I guess this is a limitation of having the Flash/rtmp wrapper.
One other question, do you have a remit for the audio and music on networks other than Radio 3? When I last looked the on-line Radio 4 stream was also suffering from too high audio and the threshold of the limiters was also set too low.
Finally I think I have discovered the significance of a recording of a Bridgewater Hall Mahler Series recording made last May which was marred by internet breaks and level changes. In addition to the breaks frequently the level would jump up significantly with the limiters being hit. I have just checked and the level was increasing by 6.9 dB! I think I know the reason why - I was getting one of the other banks of servers with the non-updated software.
Thank you again for setting up the experiment. It has given much food for thought and other unanticipated benefits - such as remedying the impairments on the daytime 'Listen Live' and 'Listen Again' streams by removing the dynamic range compression. Your experiment has been very well received.
Comment number 8.
At 13th Sep 2010, rmgalley wrote: Today I was able to compare the afternoon repeat of the Mahler Symphony 1 with the small part I’d recorded live from the XHQ stream on 3 September. At this end the equipment and signal paths were identical other than the bit-rate. The other potential difference was between the live source and this afternoon's playout source.
Levels were carefully matched by choosing sections where limiting wasn’t taking place on the original XHQ stream. I did find a suspicion of dynamic range compression as, after matching levels on very quiet passages, intermediate loudness sections were different on the XHQ stream but only by about 2 dB – not, I think, subjectively significant.
The conclusion reached was that there were tangible but subtle benefits heard on the XHQ stream as compared with this afternoon's repeat broadcast. (The benefits were much more obvious comparing with the Dsat mp2 feed).
The fine detail was better reproduced on the XHQ stream. You could hear deeper into the sound coming from the instruments. The rosin on bow sound of the stringed instruments was just more ‘there’, with a delicacy and precision not quite heard to the same extent on the 192 kbps stream. The space between the instruments was more tangible giving a more solid, three-dimensional portrayal of the space. Massed strings were sweeter, easier to listen to and free of any strain. Wind instruments such as flute exhibited a greater delicacy with a feathery breathiness. Despite the limiting on XHQ crescendos these seemed to have more ‘bite’ or attack. Also great gains were heard in the reproduction of human speech compared to the mp2 feed where the lack of a spread or smeared image was such a relief. Compared to the normal aac stream, speech on the XHQ portrayed more subtle nuances and it was so very easy on the ear.
All in all this has been a very worthwhile experiment and, while the normal stream is very good, I would characterise the XHQ stream as superb. Internet feeds are susceptible to momentary interruptions so, if a multiplicity of these aac streams for all BBC radio networks could be provided in the future, I would suggest the ideal platform would be via satellite.
Thank you again, Rupert.
Comment number 9.
At 15th Sep 2010, Rupert Brun wrote: Thank you for the thorough evaluation. Yes, my remit does cover more than Radio 3. I'm responsible for all the "National Radio Networks", so Radio 1, 1X, 2, 3, 4, 5L, 5LSx, 6, 7, Asian Network.
I agree there's almost certainly room to move the limiter threshold up a bit in Coyopa. Our measurements found that the limiters overshoot more with heavily compressed content than with Radio 3. We have also found that the codecs overshoot more at lower bit rates. It’s clearly likely that the optimal settings are different for different networks. We have started some experimental work to determine the optimum settings but as the experimental stream will be "up and down" and carrying various networks whilst we do this I will not be making it available to the public. If we make changes as a result of these experiments, I’ll let you know through the blogs.
Rupert Brun, Head of Technology for BBC Audio & Music.
Comment number 10.
At 15th Sep 2010, Alexander Melhuish wrote: This sounds like an interesting experiment, and I only wish I'd read my news feeds last week so I could have caught it! Hopefully there'll be further experiments in the future.
Regarding the assertion that most consumer equipment is based on 44.1 ks/s: certainly as far as computer equipment is concerned, nearly all devices manufactured in the last 10 years for 'on-board' audio use chips compliant with Intel's AC97 standard. According to Wikipedia, the base revision of this standard (1.x) requires fixed 48ks/s sampling rate capability, with support for other sampling rates implemented through (usually crude) software.
Recent revisions allow for different sampling rates, but I've read that many of these implementations use crude resampling techniques, while running the DAC at 48ks/s for simplicity. Consequently, by far the best sound quality is achieved by a 48ks/s signal chain.
I'd be interested to know exactly what the codec limitations are regarding bitrate and sampling rate. The only two codecs currently used (AAC and WMA), to my knowledge, have no bitrate restrictions with respect to sampling rate. It wouldn't surprise me if RealAudio does, but now that that service has been phased out, that shouldn't be a restriction now?
I do realise, however, that a lot of consumer electronics (mobile phones, internet radios, etc.) are less likely to support 48ks/s, especially as part of their decoding firmware (which I think is your main concern). Hopefully this will improve though, as many of these new consumer devices support 24-bit 96 ks/s audio now.
Re: rmgalley and AAC codec clipping: It was noted in the original post that Fraunhofer codecs are used, which I presume are reference implementations. The Nero AAC encoder is arguably more able than the reference codec, and continues to be improved. So while your measurements may show rare clipping using Nero's encoder, that doesn't necessarily apply to the codecs used by the BBC.
It would be interesting to know what kind of sound quality improvement could be achieved by using a more capable encoder like Nero's, though I don't know if it's possible to use the Nero codec for live streams.
Finally @rmgalley: it is possible to record the AAC RTMP streams with the right tools. They're well known among other commenters here; a little googling should show them up. I shan't name them here, as their use isn't condoned by the BBC for licensing reasons!
Comment number 11.
At 20th Sep 2010, rmgalley wrote: Just a quick reply to Alexander Melhuish #10.
With regard to the merits of the variously available aac codecs, my main point, confirmed by the checks I did, was that any codec which purports to provide very high quality audio must by definition provide an output waveform virtually identical to the input - the differences should be minimal. The point, correctly made by Rupert, was that content at or close to FS could be clipped on the output. This is generally the case but would only be a problem with codec/bit-rates which did not produce a near identical output. The XHQ stream output is unlikely to differ from the input by more than 0.1 dB so the threshold of the Coyopa limiters, sometimes set at -6 dB FS, could be relaxed. This is especially so if, as stated, the servers are fed directly from the output of the Radio 3 Continuity, which has its own limiter.
Thank you for the advice about recording the RTMP streams. After investigation I discovered I already had software that could accomplish this. But, as it was entitled 'Streaming Video Recorder', it never occurred to me it could be used for the radio streams. It is a pity I didn't realise this while the experiment was in progress.
Comment number 12.
At 27th Sep 2010, 2Bdecided wrote: I was delighted to hear this experiment (sad to be on holiday for the last night!). Difficult to make a fair comparison without a level-matched double-blind test, but it sounded superb. TBH the existing 192k feed is already a beacon of best practice and excellent sound (unless audible dynamic range compression kicks in, which is rare).
Only just spotted the comment that Rupert is responsible for _all_ the National Radio Networks...
Rupert - please - there are far bigger issues with the other networks than bitrate: the audio processing on Radio 2 (especially) _shreds_ the slightly more "serious" music that sometimes gets broadcast on that network. Can't it be turned down/off for the iPlayer/DTT/DSat feeds? It's not like people can listen to these in noisy cars anyway, so there's no need to have it "as loud as possible".
Programmes like Friday Night is Music Night, The Organist Entertains, and the whole Sunday evening sequence, just don't sound right with aggressive dynamic range compression. If an older recording is included, the background noise bounces all over the place. If a quieter recording is played, it's dragged up to be as loud as anything else. If there's a loud moment in a recording, it's really obviously dragged down - sometimes it sounds like someone is fighting over the volume control, dragging it up and down as the music plays. It might be OK for pop music (though modern CDs are dynamically compressed more than enough already IMO), but it doesn't work for light music, jazz, etc.
Please take a look and see if anything can be done. I find the audio on iPlayer near-perfect otherwise, and it's a shame the processing seems to work against some of the material.
Cheers,
David.