The Analysis & Resynthesis Sound Spectrograph : Basic Operation

versions concerned : 0.2 and above

I. Launch program, load and save files

As of now, there are only three ways to load and save files with the ARSS.

a. Double-click the program

If you simply double-click the program, you'll be asked for a file to load from and another file to save to. If the file is located in the same directory as the program, just type its name with the extension included (e.g. enter 'sound.wav', not just 'sound'). If your file isn't in the same directory as the program, you can type a full path instead. On Windows that would be something like C:\Documents and Settings\Desktop\sound.wav, whereas on Unix systems (such as Linux, Mac OS X, BSD and BeOS) it would be something like /home/user/sound.wav. Note that if you write the output file's name without an extension, the program will add the appropriate one.

If the file to load cannot be found, the program will ask you for its name/path again.

b. Drag and drop

Since version 0.2 you can load a file simply by dragging its icon onto the program's icon (while it's not running). The file can be located anywhere on your system. You will then be asked for the name of the output file. Unless you type a full path (i.e. if you type a simple file name), the output file will be saved in the current working directory.

Note : As of version 0.2, due to a bug in the Windows version, you need to type the full path for the output file; otherwise you will find it in C:\windows\system32. Fortunately, this can be worked around by writing a script in the form of a .bat file.

c. From the command line

The ARSS can be launched from the command line without any arguments, with only the input file as an argument or with both input and output file names.

Without any arguments, the program behaves as it does when launched by double-clicking, except that simple file names refer to files in the current working directory instead of the directory the program is in.

With only the input file specified (for example by typing 'arss sound.wav'), the program acts the same way as with drag and drop. With both file names or paths specified (e.g. 'arss input.wav output.bmp'), the program proceeds directly to the next step.
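
If you want to launch the two-argument form from a script, it can help to always pass absolute paths, so that the result never depends on the current working directory. Here is a minimal Python sketch of that idea. It assumes the executable can be found as 'arss' on your PATH, and the helper names build_arss_command and run_arss are made up for this illustration.

```python
import os
import subprocess


def build_arss_command(input_path, output_path, program="arss"):
    """Return the argument list for 'arss input output' with absolute paths."""
    # Absolute paths avoid any dependence on the current working directory.
    return [program, os.path.abspath(input_path), os.path.abspath(output_path)]


def run_arss(input_path, output_path, program="arss"):
    """Launch the program and return its exit code."""
    # Assumption: 'program' names an ARSS executable reachable from the PATH.
    # The program will still ask for the processing parameters afterwards.
    return subprocess.call(build_arss_command(input_path, output_path, program))
```

Calling run_arss('input.wav', 'output.bmp') is then equivalent to typing 'arss input.wav output.bmp' with both names expanded to full paths.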

Note : If you get an error about the output file, it most likely means that the file already exists and cannot be written to because another program has it open.

II. Mode selection

There are currently three processing modes in the ARSS. The first one is analysis, the process by which a sound is turned into an image (called a spectrogram). The program selects it automatically if it detects that the input file is a sound and the output file is an image.

The two other processing modes are synthesis modes, which turn images into sounds. Sine and noise synthesis differ in the sound they produce because they are built from different basic sounds. Sine synthesis mixes together a set of modulated sines, which is what anyone would think of as pure tones, whereas noise synthesis replaces these pure tones with band-pass filtered noise. The most obvious audible difference is that if you synthesise from a plain white image, sine synthesis will sound as if someone just sat on a church organ's keyboard, whereas noise synthesis will sound like pink noise (which sounds like white noise with less treble and more bass).
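
To make the idea of 'mixing together modulated sines' more concrete, here is a rough Python sketch. It is not the ARSS's actual implementation, only an illustration under stated assumptions: the image is a list of rows with row 0 as the lowest band, brightness values lie between 0.0 and 1.0, bands are spaced logarithmically from the minimum frequency, and each row is linearly interpolated to form the envelope of its sine. The function name sine_synthesis is made up for this example.

```python
import math


def sine_synthesis(image, min_freq, bpo, pps, sample_rate):
    """Sum one amplitude-modulated sine per image row.

    image[0] is assumed to be the lowest band; each row is a list of
    brightness values between 0.0 and 1.0, one per pixel column.
    """
    width = len(image[0])
    n_samples = int(width * sample_rate / pps)
    out = [0.0] * n_samples
    for row_index, row in enumerate(image):
        # Assumed band spacing: 'bpo' rows per doubling of frequency.
        freq = min_freq * 2.0 ** (row_index / bpo)
        for n in range(n_samples):
            # Linear interpolation along the row gives the band's envelope.
            x = n * pps / sample_rate
            left = min(int(x), width - 1)
            right = min(left + 1, width - 1)
            frac = x - left
            envelope = row[left] * (1.0 - frac) + row[right] * frac
            out[n] += envelope * math.sin(2.0 * math.pi * freq * n / sample_rate)
    return out
```

With a plain white image every row has a constant envelope of 1.0, so all the sines sound at full strength at once, which is the 'church organ' effect described above.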

Another difference can be pictured as follows. Picture an image, let's say a spectrogram, that we would enlarge vertically using two different methods. The first method simply consists of smoothly enlarging it, in the same way you'd expect any decent image editing program to do. That would be noise synthesis. Now imagine that instead we cut the image into the thinnest possible horizontal slices and simply move them further apart from each other on a black background. That would be sine synthesis.

That last method has an obvious issue: it breaks the continuity between image features unless they are already thin, distinct and perfectly horizontal. That's the main thing to remember about sine synthesis: it breaks vertical continuity. You can hear the effect by synthesising from a diagonal line or, most noticeably, from spectrograms of noisy sounds such as drums. These need the vertical continuity of noise synthesis for the result to sound right.

Unfortunately, noise synthesis has its own flaw. Because of the relationship between a sound's spectrum and its envelope, this method makes the envelopes of the sounds it produces noisy and irregular. In simpler terms, it will make a piano note, a guitar note or a human voice sound quite 'bubbly'.

That's why it helps to think of all sounds as either sines or noises. To do so, simply look at a spectrogram and consider anything that looks noisy and covers an area as made out of noise, and anything that looks like thin continuous lines as made out of sines. That said, noise synthesis is by far the better general-purpose synthesis mode, so it will be the best choice most of the time.

III. Processing parameters

The next step is the input of the processing parameters. They are the same regardless of the processing mode.

Note that in the (default) interactive mode you can accept the value shown between brackets for any of these parameters by not entering a value.

It is important to note that four of these parameters (image height, frequency resolution in bands per octave, minimum frequency and maximum frequency) are interdependent. Because in synthesis mode the image height is set by the input image, you can only define two of the other three parameters. If you are unsure which of these parameters to omit, choose maximum frequency.
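
The interdependence can be sketched with a little arithmetic. The Python below assumes that the image has bpo rows per octave between the minimum and maximum frequency, so that height is roughly bpo times the number of octaves. The exact rounding the ARSS applies is not stated here, so treat the results as approximate. The function names are made up for this example.

```python
import math


def image_height(min_freq, max_freq, bpo):
    """Approximate number of pixel rows covering min_freq..max_freq."""
    # Assumption: 'bpo' rows per octave, rounded to the nearest row.
    return round(bpo * math.log2(max_freq / min_freq))


def max_frequency(min_freq, bpo, height):
    """Approximate top frequency implied by an image of the given height."""
    return min_freq * 2.0 ** (height / bpo)
```

For instance, under this assumption image_height(27.5, 20000, 12) gives 114 rows, and in synthesis mode a 120-row image read at 12 bpo from 27.5 Hz would reach max_frequency(27.5, 12, 120), that is 28,160 Hz.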

Minimum and maximum frequency determine the range of sound we care about, from its lowest to its highest frequency. It's usually a good idea to cover the entire human hearing range, which spans 20 Hz to 20,000 Hz. Because there isn't much to hear under 30 Hz, because A0 (also known as La 0) sits right at 27.5 Hz, and because most adults hear little above 16 kHz, I usually set the lower frequency limit to 27.5 Hz and the upper limit to between 16,000 and 20,000 Hz.

The bands per octave parameter, or bpo for short, defines the frequency resolution, which can also be thought of as the image's vertical resolution.

It is crucial to note that a higher vertical resolution means a lower horizontal resolution. The higher the frequency resolution, the smaller the bandwidth of each band, and that bandwidth equals half the resolution of the band's envelope. Therefore the quality of the results in analysis mode depends a lot on this parameter, and the right choice depends a lot on the type of sound.
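
To get a feel for how quickly the bands narrow as the frequency resolution rises, here is a small Python sketch. It assumes each band spans 1/bpo of an octave centred (geometrically) on its frequency. That is a simplification chosen for illustration, not a description of the filters the ARSS actually uses, and the function name is made up.

```python
def band_width_hz(center_freq, bpo):
    """Width in Hz of a band spanning 1/bpo octave around center_freq."""
    # Half the band lies below the centre and half above, in octaves.
    half_step = 1.0 / (2.0 * bpo)
    return center_freq * (2.0 ** half_step - 2.0 ** -half_step)
```

Under this assumption a band centred on 440 Hz is about 153 Hz wide at 2 bpo, about 25 Hz wide at 12 bpo and about 5 Hz wide at 60 bpo. The narrower the band, the more slowly its envelope can change, which is the loss of time resolution described above.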

Basically, sounds that demand time resolution, like drums, explosions and such, are best reproduced with the bpo parameter set between 2 and 6. Human voice, most types of music and high-pitched sounds require between 12 and 60 bpo. However, feel free to experiment with various other settings.

Bands per octave means how many horizontal pixel lines there will be in an octave. Note that there are roughly 10 octaves in our hearing range (about 9.5 between 27.5 Hz and 20,000 Hz). Also, because there are 12 semitones in an octave, it's best when dealing with music to choose a multiple of 12. Used with a minimum frequency of 27.5 Hz (or 27.5 Hz multiplied by a power of 2), this guarantees that the bands line up neatly with musical notes (provided nothing is out of tune to begin with).
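
The reason multiples of 12 work so well can be checked with a few lines of Python. This sketch assumes that band number k (counting from 0 at the bottom of the image) sits at the minimum frequency multiplied by 2 to the power k/bpo. The function name is hypothetical.

```python
def band_frequency(band_index, min_freq=27.5, bpo=12):
    """Frequency in Hz of a band, counting from 0 at the minimum frequency."""
    # Assumption: 'bpo' bands per doubling of frequency.
    return min_freq * 2.0 ** (band_index / bpo)
```

With 12 bpo and a minimum frequency of 27.5 Hz, each band is one semitone above the previous one: band 12 is A1 at 55 Hz and band 48 is A4 at 440 Hz. With 24 bpo the same notes fall on every second band.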

The pixels per second (pps for short) parameter is independent of all the others. It defines the time resolution (independently of the aforementioned limitations) and thus the width of your image. Whereas you might expect to need about 1,000 pps, it turns out you only need about 100. 30 pps is about the lower limit of usability, and you'll rarely need as much as 300 pps, no matter what you're doing. Besides, if you want to analyse an entire 4-minute song, keep in mind that even at only 100 pps the resulting image will be 24,000 pixels wide, which is about 20 times wider than your average PC screen.
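
The width arithmetic is simple enough to check before you start a long analysis. This one-function Python sketch assumes the width is just the duration in seconds multiplied by pps, rounded to a whole pixel. The name image_width is made up for the example.

```python
def image_width(duration_seconds, pps):
    """Approximate image width in pixels for a sound of the given length."""
    return round(duration_seconds * pps)
```

For the 4-minute song mentioned above, image_width(4 * 60, 100) gives 24,000 pixels, and dropping to 30 pps would bring that down to 7,200.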

You will meet another parameter in synthesis mode: the sampling rate. It's often best to leave it at its default value, but don't be afraid to change it. If it's too low, the program will simply tell you the lower limit and ask you to set the sampling rate above it.