# Cross Correlation

Cross correlation is a technique for comparing two time series and finding objectively how well they match up with each other, and in particular where the best match occurs. It can also reveal any periodicities in the data. The technique takes the two time series and lines them up with each other as follows:

lag 0

```
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
```

A correlation coefficient is computed to see how well one series predicts the values in the other. Then the series are shifted and the process repeated:

lag 1

```
 ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
```

lag -1

```
ABCDEFGHIJKLMNOPQRSTUVWXYZ
 abcdefghijklmnopqrstuvwxyz
```

lag 2

```
  ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
```

lag -2

```
ABCDEFGHIJKLMNOPQRSTUVWXYZ
  abcdefghijklmnopqrstuvwxyz
```

The lag refers to how far the series are offset, and its sign determines which series is shifted. Note that as the magnitude of the lag increases, the number of overlapping points decreases, because the ends of the series overhang and have nothing to be compared with. The lag with the highest correlation coefficient represents the best fit between the two series. The lag times the sampling interval gives the time by which one series leads or trails the other, that is, how long it takes the effect to propagate from one variable to the other. If you have hourly data and the best lag is 12, the time difference between the two series is 12 hours.
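The procedure described above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the function names `lagged_correlation` and `best_lag` are hypothetical, the data are synthetic, and the sign convention (a positive lag pairs `x[i]` with `y[i + lag]`, as in the lag 1 diagram) is one reading of the diagrams, not a fixed standard.

```python
import numpy as np

def lagged_correlation(x, y, lag):
    """Pearson r between x[i] and y[i + lag], using only the overlapping samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if lag >= 0:
        a, b = x[:len(x) - lag], y[lag:]
    else:
        a, b = x[-lag:], y[:len(y) + lag]
    n = min(len(a), len(b))          # guard in case the series differ in length
    return np.corrcoef(a[:n], b[:n])[0, 1]

def best_lag(x, y, max_lag):
    """Try every lag from -max_lag to +max_lag and return the one with the highest r."""
    lags = np.arange(-max_lag, max_lag + 1)
    r = np.array([lagged_correlation(x, y, k) for k in lags])
    i = int(np.argmax(r))
    return int(lags[i]), float(r[i])

# Synthetic "hourly" data (an assumption for illustration): y repeats x 12 samples later.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.roll(x, 12)

lag, r = best_lag(x, y, max_lag=48)
sampling_interval_hours = 1.0
delay_hours = lag * sampling_interval_hours   # lag times sampling interval
print(lag, delay_hours)
```

Because the second series here is an exact delayed copy of the first, the best lag recovers the 12-sample shift; real data would give a lower and broader peak.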

You can plot the correlation coefficients versus lag to look for periodicities in the original time series. If the data are periodic, the correlation coefficients will oscillate with lag. They will be large and positive when the two series are in phase, and large and negative when the two series are out of phase (peaks aligned with troughs).
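To make the oscillation concrete, here is a minimal sketch that builds the correlation-versus-lag values for two synthetic sine waves. The function name `correlogram`, the 24-sample period, and the 6-sample offset are all assumptions chosen for illustration.

```python
import numpy as np

def correlogram(x, y, max_lag):
    """Return (lags, r), where r holds Pearson r between x[i] and y[i + lag]."""
    lags = np.arange(-max_lag, max_lag + 1)
    r = []
    for lag in lags:
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        r.append(np.corrcoef(a, b)[0, 1])
    return lags, np.array(r)

t = np.arange(480)                        # 20 cycles of a 24-sample period
x = np.sin(2 * np.pi * t / 24)
y = np.sin(2 * np.pi * (t - 6) / 24)      # same signal, 6 samples later

lags, r = correlogram(x, y, max_lag=36)
in_phase = lags[np.argmax(r)]             # a lag where peaks line up with peaks
out_of_phase = lags[np.argmin(r)]         # a lag where peaks line up with troughs
print(in_phase, out_of_phase)
```

The coefficients swing between strongly positive and strongly negative every half period. Plotting `r` against `lags` (for example with matplotlib's `plt.plot`) would give the correlogram.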

 Correlogram (correlation coefficient versus lag) for two series. Note that even the best correlation is less than 0.5, and that the pattern is not symmetrical.

An enlargement of the correlogram above, showing that the best correlation does not occur at lag 0; if the two phenomena are related, the effect appears after a time delay. The pattern is also not symmetrical between positive and negative lags.

Some series (tides, or annual climate) can be strongly periodic.

Some variables will have very strong correlations, while others (such as many things dealing with human behavior or noisy natural phenomena) will have very weak correlations that may still be statistically significant.
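A weak but significant correlation can be illustrated with synthetic data. The sketch below uses the common large-sample rule of thumb that, for two unrelated series of independent samples, the correlation coefficient falls within about ±1.96/√n roughly 95% of the time. That threshold is an assumption brought in for illustration, not something taken from this page, and it is too lenient for series that are themselves strongly autocorrelated.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
x = rng.normal(size=n)
y = 0.1 * x + rng.normal(size=n)     # weak dependence buried in noise

r = np.corrcoef(x, y)[0, 1]
threshold = 1.96 / np.sqrt(n)        # approximate 95% bound for unrelated white-noise series
is_significant = abs(r) > threshold
print(round(float(r), 3), round(float(threshold), 3), bool(is_significant))
```

With this many samples, a small coefficient still stands well clear of what chance alone would produce.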

Auto correlation is a special case of cross correlation with only one time series, which is compared with shifted copies of itself. It is used only to look for periodicity in the data set.
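As a rough sketch of auto correlation, the same lagged correlation can be computed with one series playing both roles. The function name `autocorrelation`, the 50-sample period, and the noise level are hypothetical choices for illustration.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Pearson r between x[i] and x[i + k] for k = 0 .. max_lag."""
    x = np.asarray(x, dtype=float)
    lags = np.arange(0, max_lag + 1)
    r = np.array([np.corrcoef(x[:len(x) - k], x[k:])[0, 1] for k in lags])
    return lags, r

# Synthetic noisy series with a 50-sample period (an assumption for illustration).
rng = np.random.default_rng(1)
t = np.arange(600)
x = np.sin(2 * np.pi * t / 50) + 0.3 * rng.normal(size=t.size)

lags, r = autocorrelation(x, max_lag=80)

# Skip the small lags near 0, where a series trivially matches itself.
search = lags >= 25
period_estimate = int(lags[search][np.argmax(r[search])])
print(period_estimate)
```

The coefficient at lag 0 is 1 by construction, since the series is being compared with itself. The next strong positive peak should land near the period of the data. Only non-negative lags are computed because the result is the same for a lag of either sign.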

Directions for cross correlation.

Last revised 6/19/2015