Just Stuff

Just another WordPress.com weblog

Archive for May 2009

Draft Paper for Citizen Scientist

leave a comment »

Auto-Correlation of Rectified Signal re DEMON spectral Analysis

(Slightly modified version to be published on Citizen Scientist in Feb of March 2010)

It has been pointed out (reference 1) that the speed of a domino wave can be estimated from peaks in the auto-correlation of the rectified recording of the noise of the domino array collapse (note here I am ignoring filtering etc which is useful with the auto-correlation method as it is also with the DEMON method (references 2 and 3). There can be no argument with this in principle since for periodic phenomenon the auto-correlation and spectrum of the rectified signal both encode the frequency and/or period of the phenomenon. The auto-correlation will contain a series of peaks at intervals corresponding to the period of the phenomenon (and the time delay of the for  such peak will be at a value corresponding to the period), while the DEMON spectrum will have a peak at a frequency corresponding to the frequency (1 over the period) of the phenomenon.

Time Domain Auto-Correlation Complexity

One option for computing the auto-correlation of a data vector is to do the calculations in the time domain. The pseudo-code below shows how this could be done.

————————————————————————————————–
Psuedo-Code for Time Domain Auto-Correlation:
// Assume input data (real) is in array(1..Npoints) and
// output data is in array ss(1..Ndelays) with the first
// point corresponding to zero delay

ss=zeros(1,Ndelays)
for delay=0 to Ndelays-1
  for idx=delay+1 to Npoints
    ss(delay+1)=ss(delay+1)+data(idx)*data(idx+delay);
  endfor
endfor
-----------------------------------------------------------

This allows us to compute the number of floating point add and multiply operations needed for this time domain auto-correlation, which is:

N_{op}=2 N_{points} ~N_{delays}+N_{delays}^2\ \ \ \ \ ... (Eq 1)

Now if as is often the case N_{points} \gg N_{delays}, then we can write:

N_{op} \approx 2 N_{points}~N_{delays}\ \ \ \ \ ... (Eq 1a)

Note: loop control variable increments and array index arithmetic operations are not counted as these are generally much faster than the (usually) double precision floating point operations.

Frequency Domain Auto-Correlation Complexity

The main alternative to the above time domain method for computing the auto-correlation is to use the convolution theorem to convert the auto correlation calculation from what is a discrete convolution in the time domain to a point wise product in the frequency domain, and then transforming back into the time domain. To do the conversion from time to frequency domain we will be using the Fast Fourier Transform (FFT) (see reference 4 for some general background for the FFT), which is an efficient implementation of the Discrete Fourier Transform (DFT). This is the Fourier transform for a discrete periodic signal,  so to avoid wrap around effects we need to pad the data with sufficient zeros so the wrap around does not effect the part of the result we are interested in (that is the FD auto-correlation we will compute using the DFT will be a circular auto-correlation). This means that if we want N_{delays} points of the auto-correlation we will need to pad our data vector with N_{delays} zeros giving an effective data vector length of M=N_{points}+N_{delays}.

Now the equivalent of the convolution theorem for correlations tells us that:

(data \star data)_i =\sum_j \overline{data}_j ~data_{i+j}=\left[\mathfrak{F}^{-1}\left( \mathfrak{F}(data).\overline{\mathfrak{F}(data)}\right)\right]_i

Where \star denotes the (circular) correlation operator (not to be confused with the convolution operator *), \mathfrak{F}^{-1}  the inverse DFT operator, \mathfrak{F} the DFT operator, “.” the pointwise product operator,i is the correlation delay which starts at 0 and index arithmetic is modulo the length of data. For convenience it is best to regard the indicies to follow the C/C++ convention and start at 0 and run up to M-1, where M is the length of data, rather than form 1 to M.

That this works can be demonstrated on zero mean random complex simulated data (2048 points 0-19 sample delays), the absolute value of the difference between the time and frequency domain results are shown in figure 1 below, where we can see that the differences are of the order of 10^{-14} (all calculations done in double precision), which is pretty good agreement.

Figure 1: Plot of the Absolute Value of the Time and Frequency Domain Auto-Correlations for a Random Complex Data Set

Figure 1: Plot of the Absolute Value of the Time and Frequency Domain Auto-Correlations for a Random Complex Data Set

The operations count for the frequency domain auto-correlation computation are a bit are difficult to compute than for the time domain equivalent. This is mainly because we will be using the FFT to compute the DFT’s and these all differ in efficiency. Other things being equal and a good FFT code used the number of operations to compute a FFT is:

N_{op} \approx k N \log_2 (N)

where N is the length of the transform, and k is a constant of the order of 10. for a complex transform and half that for a real transform.

So if we wish to compute N_{delays} points of the auto-correlation of a signal of length N_{points} we will need two FFT’s of length N_{delays}+N_{points} (one forward transform and one inverse transform) and an additional N_{delays}+N_{points} complex multiplies, which is equivalent to three times as many real multiplies and one real add but half of these can be eliminated if the original signal is real because of the symmetries in the FFT of a real signal.

So we have (to a handwaving approximation for the coefficient and a conservative allowance for the number of multiples) for a frequency domain auto correlation the floating point operations count is:

N_{op} \approx (N_{delays}+N_{points})\left[10\log_2(N_{delays}+N_{points})+4\right] \ ... (Eq 2)

Comparison Of Operations Counts for Time and Frequency Domain Auto-Correlation

We may plot the operations counts corresponding to equations 1 and 2, when we get the plot in figure 2, where we see a crossover where the frequency domain auto-correlation becomes more efficient than the time domain algorithm somewhere in the region of 50 delays (this is for a data sample of 30000 points, a value typical of a recording of a domino wave of duration slightly over 1 second). As for domino wave we know that we are looking for impact frequencies down to something between 15 and 20 Hz for a sampling frequency of the order of 20 kHz we will need to look at auto-correlation delays of up to something over 1000 sample intervals. In which case the frequency domain auto-correlation is significantly more efficient.

Figure 2: Plot comparing the Floating Point Operations Counts for Time and Frequency Domain Auto-Correlations as a Function of the Number of Delays Required for a Signal with 30000 Sample Points

Figure 2: Plot Comparing the Floating Point Operations Counts for Time and Frequency Domain Auto-Correlations as a Function of the Number of Delays Required for a Signal with 30000 Sample Points

That this is not just a theoretical result can be seen from timings for the two auto-correlations produced for random complex signals shown in figure 3. Here the crossover is at something around 20 sample interval delays. The exact point of the cut off depends on the relative efficiency of the implementations of the two auto-correlation algorithms, and here the time domain auto-correlation is relatively inefficient (this is coded in Euler, and so is partially interpreted code while the FD algorithm is almost entirely library calls to compiled code)
Figure 3:

Figure 3: Plot of Timings for TD and FD Auto-Correlations for a 30000 (complex) Random Signal.

Also of note in Figure 3 is the variation in the timing of the FD Auto-Correlation, this is because a mixed radix FFT is in use and its efficiency varies somewhat with the length of the FFT.

Complexity of DEMON analysis

In the previous section we have seen that if we wish to analyse a signal using the auto-correlation of the rectified signal we will use a frequency domain implementation of the auto-correlation. This is more or less equivalent in operations count to two FFT’s of length equal to the signal length plus the number of points in the auto correlation we require. However when analysing a recording of a domino collapse wave we can just as well look for the frequency of the peak in the DEMON spectrum (the spectrum of the absolute value of the signal) (references 2 and 3). This involves just one FFT so it requires no more than half the operations that looking at the auto-correlation requires.

This difference is largely academic if we are doing a one-off analysis but will become significant when a large number of analyses are to be performed (and in an embeded system analysing signals in real time even more so).

Refences

1. Hannon J., “Domino Speed”, Citizen Scientist, May 2009,  http://www.sas.org/tcs/weeklyIssues_2009/2009-05-01/project1/index.html

2. Larham R., Measuring the Speed of the Domino Effect Using the Windows Sound Recorder, Part 1, Citizen Scientist, Nov 2007,  http://www.sas.org/tcs/weeklyIssues_2007/2007-11-02/project2/index.html

3. Larham R., Measuring the Speed of the Domino Effect Using the Windows Sound Recorder, Part 2, Citizen Scientist, Dec 2007,  http://www.sas.org/tcs/weeklyIssues_2007/2007-12-07/project2/index.html

4. Press W. H, Teukolsky S. A. , et al. “Numerical Recipes in C” (or any other of the series), CUP, 2-nd ed., 1992.

Advertisements

Written by CaptainBlack

May 9, 2009 at 13:03

Posted in DSP

Tagged with ,