See my MIDI notes for more on related topics.

The Crazy Riot Of Linux Sound Technologies

Getting a computer running Linux to make the sounds you want it to make can be a real challenge. The problem seems inherent in the solution - open source software (and the Unix philosophy) tends to fragment into very specific functional components. So a low level hardware driver related function is cleanly separated from a user level function. With Linux sound there are actually many layers and some projects encompass several layers. This creates some choice, flexibility, developer modularity, and, well, confusion.

Here is an excellent guide to troubleshooting Linux sound problems. (Hint, MM is mute and OO is not mute in alsamixer. Sounds obvious, but when everything goes mute suddenly for no discernible reason there can be uncertainty.)


"OpenAL is a cross-platform 3D audio API appropriate for use with gaming applications and many other types of audio applications." This library is focused on creating correct sound levels to model multichannel sound of events in 3d space (and explosion on your left comes more out of the speaker on your left). This is probably most useful for virtual reality applications and games. Output: ALSA (default), native, sdl, arts (unstable), ESD, OpenAL enabled hardware devices (like some Audigy and X-Fi cards)


OSS

This is the original Open Sound System. There are ports of it on BSD. The naming conventions established with it in the dark ages persist today, even though other facilities now actually manage the traditional /dev/ locations. There is also OSSv4 which is not, apparently, completely obsolete. Generally it’s best to use one of the newer sound systems though. I think the general criticism was that OSS applications had sound device locking problems.


ALSA

Advanced Linux Sound Architecture. This is the main standard for low level Linux sound processing. It can output directly to kernel hardware drivers (which start with snd_). It can also output to legacy OSS devices. Inputs: PulseAudio, JACK, GStreamer, Xine, SDL, ESD Outputs: Hardware drivers, OSS


A "sound server". This correctly implies that it will serve audio events over a network connection, though in practice this seems hard to set up a realistic example. It also mixes, sets volumes (per application and per sound device) and prevents bad applications (e.g. Flash) from monopolizing the sound output resources. It is also useful for sending audio output to multiple sound devices and can capture audio from many sources. Inputs: GStreamer, Xine, ALSA Outputs: ALSA, JACK, ESD, OSS


ESD

Enlightened Sound Daemon. This is similar to PulseAudio in attempted scope. It has good mixing but limited support and is being replaced by PulseAudio.


"PortAudio is a free, cross-platform, open-source, audio I/O library. It lets you write simple audio programs in C or C++ that will compile and run on many platforms including Windows, Macintosh OS X, and Unix (OSS/ALSA). It is intended to promote the exchange of audio software between developers on different platforms." Outputs: Windows (MME, DirectSound, and ASIO), ALSA, OSS, JACK, Mac OSX Core Audio Input: Audacity


JACK

JACK Audio Connection Kit. Supposed to be low-latency. Needs applications that are JACK aware. Doesn’t go directly to sound device output. Basically it just seems to route between JACK aware applications (which is no small thing). Inputs: GStreamer, PulseAudio, ALSA Outputs: OSS, FFADO, ALSA


SDL

Simple DirectMedia Layer. "Simple DirectMedia Layer (SDL) is a cross-platform, free and open source multimedia library written in C that presents a simple interface to various platforms' graphics, sound, and input devices." There is a joystick subsystem (Hmm…).


Xine

This is the back end library for the Xine media player. It supports a very wide range of codecs and is a good choice for projects needing broad support of multifarious formats. It can, in theory, output to JACK, but sometimes that support is not compiled in. Inputs: Phonon Outputs: PulseAudio, ALSA, ESD, (JACK)


GStreamer

An advanced encoder/decoder stack. Inputs: Phonon Outputs: ALSA, PulseAudio, JACK, ESD


Phonon

This is a framework that was used by Qt to make "cross-platform" capable stuff where the details of sound production would be transparent to the programmer. The details were generally passed on to GStreamer for actual sound production. It does for Qt what SDL does for games. Apparently this interface wasn’t too popular overall and may be dropped. Inputs: Qt and KDE apps Outputs: GStreamer, Xine


FFADO

Free FireWire Audio Drivers. This basically links high end studio equipment (fancy physical audio gear) to JACK. Apparently in high end gear FireWire is (was?) considered the good way to go. I think this one can be safely ignored by most people. Inputs: JACK Outputs: Fancy Audio Hardware


aRts

Analog Real Time Synthesizer. This was an audio framework for KDE (old versions). Mercifully it has conceded defeat and was replaced with Phonon. The sound daemon was called artsd. If you come across this, the documentation is old. It can be safely ignored.


"VLC is a free and open source cross-platform multimedia player and framework that plays most multimedia files as well as DVD, Audio CD, VCD, and various streaming protocols."


"Linux Audio Developer’s Simple Plugin API is a standard that allows software audio processors and effects to be plugged into a wide range of audio synthesis and recording packages. For instance, it allows a developer to write a reverb program and bundle it into a LADSPA plugin library. Ordinary users can then use this reverb within any LADSPA-friendly audio application. Most major audio applications on Linux support LADSPA."


DSSI

Might stand for "Disposable Soft Synth Interface". "DSSI (pronounced dizzy) is an API for audio processing plugins, particularly useful for software synthesis plugins with user interfaces. DSSI is an open and well-documented specification developed for use in Linux audio applications, although portable to other platforms. It may be thought of as LADSPA-for-instruments, or something comparable to VSTi."


OSC

Open Sound Control: "…a protocol for communication among computers, sound synthesizers, and other multimedia devices that is optimized for modern networking technology…" Seems like this could be obsolete. I couldn’t quite figure out its place.


VST

Virtual Studio Technology. From Wikipedia: "an interface for integrating software audio synthesizer and effect plugins with audio editors and hard-disk recording systems. VST and similar technologies use digital signal processing to simulate traditional recording studio hardware with software. Thousands of plugins exist, both commercial and freeware, and VST is supported by a large number of audio applications." Might be some proprietary nonsense with this one. Perhaps the proprietary inspiration for LADSPA. Also VSTi is the "i"nstrument plugin. Just being complete with the weird acronyms one encounters.

Making MP3 Files

The general technique for making mp3s from raw wav files is something like this:

ffmpeg -i The_Rain_Song.wav -b:a 192k The_Rain_Song.mp3

Note that the bitrate option goes after -i; options placed before -i apply to the input file, not the output.
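To convert a whole directory of wav files in one go, a simple shell loop works (a sketch; assumes your ffmpeg was built with an MP3 encoder):

```shell
# Convert every .wav in the current directory to a 192 kbps MP3;
# ${f%.wav} strips the extension so only the suffix changes.
for f in *.wav; do
    ffmpeg -i "$f" -b:a 192k "${f%.wav}.mp3"
done
```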

After ripping the file, it might be nice to make the ID3 tags sensible. There are a couple of programs that do this. The ones I know about are id3ed, mp3info, and id3. The latter seems reasonable to me. Here’s how to query the ID3 tags:

$ id3 -l Elmore_James-The_Sky_Is_Crying/01.Dust_My_Broom.mp3
Title  : Dust My Broom                   Artist: Elmore James
Album  : The Sky Is Crying               Year:     , Genre: Blues (0)
Comment:                                 Track: 1

Recording Sounds

One thing I sometimes want to do but often find strangely difficult is to simply record things. Sometimes there is a generic "sound recorder" utility ("gnome-sound-recorder"?) present on some distros. If that works, great. For quick command line work, also think about rec from the sox package. I think the serious tools are Ardour and Audacity. Ardour seems great but I couldn’t get it to work.

I used Audacity to record all of my videos.

That leaves Audacity which also seems great. The classic problem is that there is OSS/ALSA/JACK/PA all fighting and emulating each other - a big mess. With Audacity you can go to Edit → Devices and skip a lot of that.

One of the things I need to do is to record from my Boss GT10 Guitar Effects Processor. This capital piece of hardware has a USB port on it which presents itself to computers as a USB sound device would. So in Audacity I can just set Edit → Devices → Recording → Device to "GT-10: USB Audio (hw:1,0)". I can leave Edit → Devices → Playback at "Default" and recording will be done from the guitar while playback goes to the headphones on the computer. What’s annoying and seems intractable is that I can’t monitor the guitar input in the headphones. If I figure that out I’ll note it, but right now I can just click the "Recording Level" meter to monitor it graphically - it’s not audible until recorded. The answer is to use headphones on the GT-10. This is ok since it will remove latency in the bargain. Then use headphones on the computer to verify the recording is ok and to edit it.

Some tricks I use…
  • Set a loop of guitar noise to practice recording with.

  • Use "r" to start recording and "space" to stop.

  • Lots of effects to play with in Audacity.

Using A Computer As A Remote Listening Device

Imagine that you have two computers that can ping each other on some network (could be the wild internet, could be a home LAN). There is the computer you’re sitting at which we’ll call "home" and the one that’s not near you which we’ll call "away". Now imagine that you want to listen to what’s going on around "away" while sitting at "home".

Start by telling the "home" computer that you want it to wait for a network connection (which presumably will be from "away") and when it makes the connection to pipe it to something that will send the data to the speakers for you to hear. You need to choose a port; here I’ll use 7777. You need sudo to use ports lower than 1024.

[home]$ nc -l -p 7777 | aplay -

That should appear to do pretty much nothing, but it’s actually waiting for a network connection from somewhere.

Next we log in to "away" and fire up the microphone and send it to home (substitute the name of your host in the home position).

[away]$ rec -t raw -b 16 -e signed -c1 -r 44100 - | nc -v home 7777

The 7777 is the port number and can be whatever you like that is not being used or blocked by a firewall. The -b is bits per sample. The -c1 means one channel (mono). The -r 44100 is the sampling rate. Another simpler setup might be -t raw -b8 -c1 -r8k - which is pretty modest and designed to keep things simple for efficient transfer of basic noises, especially talking. But maybe those settings don’t quite work. Moving a recording of a violin concerto over a network is a different problem.

I found that the sound quality was kind of scratchy. I made that better by recording some very quiet time on "away" and using that to generate a noise reduction profile that could then be used to filter the recording process.

[away]$ rec -t wav - > noise-profile-silent-sample.wav
[away]$ sox noise-profile-silent-sample.wav -n trim 0 1.5 noiseprof away.noise-profile
[away]$ rec -t raw -b8 -c1 -r8k - noisered away.noise-profile | nc home 7777


Walkie-Talkie

Same deal. To make a live walkie-talkie, setting up a receiver and then a transmitter on each unit should be possible. First, if there are problems with undetected devices, make sure this environment variable is set everywhere.

[unitA]$ export AUDIODEV=hw:1,0
[unitB]$ export AUDIODEV=hw:1,0

You can double check what value it should be by having a look at the output of arecord -l. Note that the first number is the card number shown in card X: ... and the second is the device number shown in device Y: ... - that is, the format is hw:card,device.

Next set up a listener on each unit. Note the different ports.

[unitA]$ nc -T critical -u -l -p 12334 | play -q --buffer 128 -t raw -b16 -e signed -c1 -r44100 -
[unitB]$ nc -T critical -u -l -p 12333 | play -q --buffer 128 -t raw -b16 -e signed -c1 -r44100 -

Then set up a transmitter.

[unitA]$ rec -q --buffer 128 -t raw -b 16 -e signed -c1 -r 44100 - | nc -T critical -u $UNITB 12333
[unitB]$ rec -q --buffer 128 -t raw -b 16 -e signed -c1 -r 44100 - | nc -T critical -u $UNITA 12334

The -T option sets the connection’s type of service ("critical") - we’re trying to give the system a hint that this is a real-time critical activity. Also the low buffer setting will (hopefully) keep choppiness down and force the system to send what it has immediately.

If you need to keep a full recording of the conversation it’s easy to do by putting a tee in the pipeline.

[unitB]$ nc -T critical -u -l -p 12333 | tee /tmp/chat.raw | play -q --buffer 128 -t raw -b16 -e signed -c1 -r44100 -

This didn’t slow things down very much as far as I could tell, even to a microSD card on a Raspberry Pi Zero W. Still, it might be good to not have both sides of the conversation saved on one unit if the units are lightweight. I haven’t yet worked out how to merge the two halves of the conversation. Presumably there will be some timing registration error. Maybe some format other than raw will be needed so that timing information can be encoded.
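The raw capture saved by the tee has no header, so ordinary players won’t understand it. A sketch of wrapping it in a WAV container with sox - the flags must restate exactly the 16-bit signed mono 44100 Hz parameters used in the pipeline above:

```shell
# A raw file carries no format information, so every parameter must be
# given again; mismatched flags produce sped-up/slowed-down noise.
sox -t raw -b 16 -e signed -c 1 -r 44100 /tmp/chat.raw /tmp/chat.wav
```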

Ubuntu’s Problem With Multiple Sound Users

It can happen that everything works fine in controlled testing but while deploying in a live situation, nothing works. It used to be that if some user was logged in and playing sound (or had a Flash thing going on in a browser) that other logged in users couldn’t access the sound device. Apparently this was cured and was only a problem for people who had legacy accounts which were members of the audio group. You can cure it by making sure no one is in the audio group:

$ sudo gpasswd -d xed audio

Counterintuitive, but it seems to work. Fixing this will allow the remote listening strategy to work even if someone else is logged onto "away" and busy watching videos or something.
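A quick way to check whether the fix is even needed, i.e. whether an account is still in the legacy audio group (a sketch using the current user; substitute a specific username as needed):

```shell
# Print the user's groups one per line; grep prints "audio" and exits
# zero only if the user is still a member of that group.
id -nG "$(id -un)" | tr ' ' '\n' | grep -x audio
```

If it prints audio, the gpasswd removal above applies.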

What’s With Those Damn .m4a Files?

Ok, these audio files are not cool. They often don’t work with mp3 infrastructure. Here’s how to convert them to mp3s:

ffmpeg -y -i ./BWV0666.m4a -b:a 192k -ac 2 ./BWV0666.mp3
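Before converting, it can be worth checking what is actually inside the .m4a, since that container can hold lossy AAC or lossless ALAC; a sketch assuming ffprobe (which ships with ffmpeg):

```shell
# Show container and codec details; look for "aac" or "alac" in the
# Audio stream line before deciding on an MP3 bitrate.
ffprobe -hide_banner ./BWV0666.m4a
```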

I don’t really know if there’s a serious performance hit, but the resulting file sizes from the set I did were smaller. Probably just saved some space without any noticeable loss of quality.

Troubleshooting ALSA

Here is a nice description of all the overly complex details of ALSA. Things like what does hw:0,1 mean? (Answer: "the second device on the first soundcard" - the format is hw:card,device.)

Also to get useful information, try aplay -l or aplay -L or cat /proc/asound/cards
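To check that a given ALSA device actually reaches the speakers you expect, ALSA ships a tone generator; a sketch (hw:0,0 is just an example device - check aplay -l for yours):

```shell
# Play one pass of a sine test tone on the first device of card 0;
# -c 2 tests each of the two channels in turn, -l 1 plays one loop.
speaker-test -D hw:0,0 -c 2 -t sine -l 1
```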

I often have problems getting mpg123 to work, which I notice since that’s the player I tend to use. The trick is to do something like this:

mpg123 -oalsa -C Jimi_Hendrix-Red_House.mp3

Which option argument should you use after the -o? Try the --list-modules option to see. In the same style look at --test-cpu but pretty much everyone should have fancy decoding tricks in their CPUs by now.

Note that mpg123 can easily run into problems, but they are generally solvable. I often use this now.

alias mp3='mpg123 -o pulse --control'

Troubleshooting PulseAudio

Problem with the wrong mic or speakers or they’re just not working? Check out pavucontrol.

If you have an application that looks for /dev/dsp and doesn’t find it, check out the man page for padsp, a wrapper that sets the LD_PRELOAD variable to point into /usr/lib/x86_64-linux-gnu/pulseaudio/ and things magically start working.
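For example, a legacy OSS-only program (here a hypothetical ./old-oss-app) can be launched under the wrapper like this:

```shell
# padsp LD_PRELOADs a shim library so the program's open() of /dev/dsp
# is transparently redirected to the PulseAudio daemon.
padsp ./old-oss-app
```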

Interesting Sound/Music Applications

To rip audio CDs from the command line (sometimes the only way I can figure out how to play the music on them) use abcde. This program works very well but note that it defaults to ogg. If you need mp3 for some player’s compatibility, try the -o mp3 option. This may want eyed3 and lame as dependency packages. To clean up the abominable file names that come from CDDB, try apt install detox. Run with detox -vr dir_filled_w_mp3s. Also -n for detox dry-run if you want to preview. Don’t confuse the program abcde with the following.

ABC notation is a way to comprehensively express music in ASCII. For details, check out this simple ABC primer or the full standard.

Standard Notation in XML It seems this is being actively worked on. The most standard of open standards, good to finally see some modern attention to the problem.

Linux Multimedia Studio "LMMS is a free cross-platform software which allows you to produce music with your computer. This covers creating melodies and beats, synthesizing and mixing sounds and arranging samples. You can have fun with your MIDI keyboard and much more - all in a user-friendly and modern interface. Furthermore LMMS comes with many ready-to-use instrument and effect plugins, presets and samples."

ZynAddSubFX A software synthesizer that allows one to create custom wave forms. Nice Demo here.

ReZound This is a very awesome sound editor. You can do all kinds of crazy things with this and I can only understand 1% of it. I have used it to crop sound files that had undesirable stuff at the ends, and isolate certain sounds in a longer sound file. I also managed to get it to slow down a song by 50% while bringing up the pitch by an octave so that I could try to figure out some guitar part. Unfortunately that produced a seg fault on my system when I tried to do too much at once. Probably my machine doesn’t have enough memory for this kind of thing. Still a great piece of software. Note that the right mouse button is used to define the right selection boundary. This can be a bit non-standard.

playitslowly This is a simple but quite functional program that basically plays songs at different speed while adjusting the pitch to remain constant. I have used this and it works great (contrast to my experience with rezound). It is ideal for figuring out lyrics, guitar riffs, and drum parts, etc.

SooperLooper "SooperLooper is a live looping sampler capable of immediate loop recording, overdubbing, multiplying, reversing and more. It allows for multiple simultaneous multi-channel loops limited only by your computer’s available memory." Uses JACK.

Hydrogen Drum "Hydrogen is an advanced drum machine for GNU/Linux. Its main goal is to bring professional yet simple and intuitive pattern-based drum programming."

Freewheeling "Freewheeling allows us to build repetitive grooves by sampling and directing loops…"

kluppe "kluppe is a loop-player and recorder, designed for live use."

qtractor "Qtractor is an Audio/MIDI multi-track sequencer application written in C++ with the Qt4 framework. Target platform is Linux, where the Jack Audio Connection Kit (JACK) for audio, and the Advanced Linux Sound Architecture (ALSA) for MIDI, are the main infrastructures to evolve as a fairly-featured Linux desktop audio workstation GUI, specially dedicated to the personal home-studio."

Ardour "Digital Audio Workstation. record mix edit collaborate."

BEAST Stands for "BEtter Audio SysTem" and it comes with BSE, "Better Audio Engine" but it’s really an audio synthesizer and multi-track editor.

GTick "Tick is a metronome application written for GNU/Linux and other UN*X-like operting systems supporting different meters (Even, 2/4, 3/4, 4/4 and more) and speeds ranging from 10 to 1000 bpm. It utilizes GTK+ and OSS (ALSA compatible). It is part of the GNU Project."

Rosegarden "Rosegarden is a well-rounded audio and MIDI sequencer, score editor, and general-purpose music composition and editing environment. Rosegarden is an easy-to-learn, attractive application that runs on Linux, ideal for composers, musicians, music students, and small studio or home recording environments."

AlgoScore "AlgoScore is a graphical environment for algorithmic composition, where music is constructed directly in an interactive graphical score. The result is output as audio (through CSound), arbitrary control data (through JACK ports) for control of other applications, MIDI through JACK or to file, or OpenSoundControl messages. The generated audio can be played back through JACK or exported to an audiofile."