Introduction

SinusoidalRegressions.jl provides a set of functions for conveniently fitting noisy data to a variety of sinusoidal models, with or without initial estimates of the parameters.

Features

This package supports five sinusoidal models:

  • 3-parameter sinusoidal, where the frequency $f$ is known exactly:

    $s_s(x; \textrm{DC}, I, Q) = \textrm{DC} + Q\sin(2πfx) + I\cos(2πfx)$

  • 4-parameter sinusoidal, where the frequency $f$ is unknown:

    $s_s(x; f, \textrm{DC}, I, Q) = \textrm{DC} + Q\sin(2πfx) + I\cos(2πfx)$

  • Mixed linear-sinusoidal:

    $s_{mls}(x; f, \textrm{DC}, I, Q, m) = \textrm{DC} + mx + Q\sin(2πfx) + I\cos(2πfx)$

  • (Not yet implemented) General sinusoidal with unknown coefficients $λ_1, λ_2, \ldots, λ_m$, and known functions $g_1(x), g_2(x), \ldots, g_m(x)$:

    $s_g(x; f, I, Q, λ_1, \ldots, λ_n) = Q\sin(2πfx) + I\cos(2πfx) + λ_1g_1(x) + \ldots + λ_mg_m(x)$

  • (Not yet implemented) Damped sinusoidal:

    $s_d(x; f, I, Q, α) = \exp(αx) ( Q\sin(2πfx) + I\cos(2πfx) )$

A note on the frequency

The frequency $f$ is assumed to be expressed in hertz throughout.

Notation

In the model $s(x; θ_1, θ_2, \ldots, θ_n)$, $x$ is the independent variable and $θ_i$, $i = 1, \ldots, n,$ are the unkown parameters.

Four types of sinusoidal fitting algorithms are provided:

  • IEEE 1057 3-parameter and 4-parameter algorithms[1]. These are able to fit only the sinusoidal model $s_s(x)$. The 4-parameter version requires a very close estimate of the frequency $f$ and dense sampling of several periods of the sinusoid.

  • The algorithms proposed by J. Jacquelin, based on the idea of finding an integral equation whose solution is the desired sinusoidal model[2]. This integral equation can be solved using linear least-squares. These algorithms have several benefits:

    • They are non-iterative.
    • They do not require initial estimates of any of the model's parameters.
    • They often perform quite well, but may also be used to calculate a good initial guess for the more powerful non-linear methods described below.
  • The non-linear fitting function curve_fit from the package LsqFit. This function uses the Levenberg-Marquardt algorithm[3][4][5] and is quite powerful, but it requires an initial estimate of the parameters. SinusoidalRegressions.jl "wraps" curve_fit, automatically defining the correct model and calculating initial parameter estimates (using IEEE 1057 or Jacquelin's algorithms, as appropriate) if none are provided by the user.

  • The algorithm proposed by Liang et al[6], designed for the case where only a fraction of a period of a sinusoid is sampled.

Installation

This package is in Julia's general registry. It can be installed by running

import Pkg; Pkg.add("SinusoidalRegressions")

Roadmap and Contributions

The following features are on the roadmap towards version 1.0:

  • Implement Jacquelin's algorithms for damped and general sinusoids.
  • Enhance the front-end to curve_fit by allowing some paramters to be declared as known a priori, and fitting only the remaining parameters (some progress done in v0.2).
  • Fine-tune the API (v0.2 should be close to final).
  • Implement other algorithms and add support for other models.
  • Improve tests.

Contributions, feature requests, bug reports and suggestions are welcome; please use the issue tracker on github, or open a discussion on the Julia Discourse forum or on Zulip.

Tutorial

Fitting experimental data

Fitting data using this package requires three steps:

  1. Define a regression problem. The problem encapsulates all known data: the sampling points and data at a minimum, and possibly also the frequency. Initial estimates and bounds are also part of the problem definition.
  2. Specify and possibly configure an algorithm to solve the problem.
  3. Run the sinfit function on the problem with the chosen algorithm. This function returns the estimated parameters in a struct of the appropriate type.

After the data is fit, it may be easily plotted, and its RMSE and MAE may be calculated.

Example 1

As a first example, let us fit noisy data to a mixed linear-sinusoidal model using Jacquelin's integral equation algorithm. The model is

$s(x ; f, \textrm{DC}, Q, I, m) = \textrm{DC} + mx + Q\sin(2πfx) + I\cos(2πfx)$

with parameters $f$ (the frequency is unknown), $\textrm{DC}$, $Q$, $I$ and $m$.

Our tasks are: define an "exact" model and generate noisy samples from it, fit the noisy samples, determine the error (a measure of the difference between the exact model and the fit), and plot the fit. First we need to learn about defining a model.

Model types

This package includes several types used to store the parameters of the different supported models. All model types are a subtype of the abstract type SRModel.

  • SinModel for representing sinusoidal models $s_s(x)$.
  • MixedLinSinModel for representing mixed linear-sinusoidal models $s_{mls}(x)$.
  • GenSinModel for representing general sinusoidal models $s_g(x)$ (not yet implemented).
  • DampedSinModel for representing damped sinusoidal models $s_d(x)$ (not yet implemented).

Instances of these types are also function types (functors), making it easy to evaluate the model at any point (or collection of points). A plot recipe is also provided. We will use both of these features below.

Since we're assuming a mixed linear-sinusoidal model, we'll use the type MixedLinSinModel.

Generating the exact and noisy data

We will first generate the "exact" or "true" data, and then add noise.

The exact parameters in this example are $f = 4.1$, $\textrm{DC} = 0.2$, $m = 1$, $Q = -0.75$, and $I = 0.8$. We define the model as follows:

using SinusoidalRegressions
param_exact = MixedLinSinModel(f = 4.1, DC = 0.2, m = 1, Q = -0.75, I = 0.8)
Mixed Linear-Sinusoidal model MixedLinSinModel{Float64}:
  Frequency (Hz)       : 4.1
  DC                   : 0.2
  Sine amplitude (Q)   : -0.75
  Cosine amplitude (I) : 0.8
  Linear term (m)      : 1.0

We now genereate 40 exact data points with a sampling rate of 40 Hz.

x = range(0, length = 40, step = 1/40)
data_exact = param_exact(x)

And finally, we add gaussian noise to the data:

using Random: seed!
seed!(6778899)

data = data_exact .+ 0.2*randn(40)  # noisy data

Alternatively, the data could have been generated directly, as in:

data_exact = 0.2 .+ m.*x .- 0.75*sin.(2*pi*4.1*x) .+ 0.8*cos.(2*pi*4.1*x)

Now we need to define the problem.

Defining the problem

There are several types used to define the different problems that this package can handle. All are subtypes of the abstract type SRProblem.

  • Sin3Problem to specify a 3-parameter sinusoidal problem (frequency is known).
  • Sin4Problem to specify a 4-parameter sinusoidal problem (frequency is unknown).
  • MixedLinSin4Problem for a mixed linear-sinusoidal problem with 4 parameters (the frequency is known).
  • MixedLinSin5Problem for a mixed linear-sinusoidal problem with 5 parameters (the frequency is unknown).

In our example, the frequency is unknown, which means MixedLinSin5Problem is the appropriate type. The problem is simply defined as follows:

problem = MixedLinSin5Problem(x, data)
5-Parameter Mixed Linear-Sinusoidal Problem MixedLinSin5Problem:
  X                    : StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64} with 40 elements
  Y                    : Vector{Float64} with 40 elements
Parameter estimates:
  Frequency (Hz)       : missing
  DC                   : missing
  Sine amplitude (Q)   : missing
  Cosine amplitude (I) : missing
  Linear term (m)      : missing
  Lower bounds         : missing
  Upper bounds         : missing

All parameters are missing because, in this example, we assume we know nothing about the underlying model beyond the assumption that it is mixed linear-sinusoid; all we have is the data samples.

Specifying an algorithm

Now, we need to choose and instantiate a solution algorithm, from the following list. All algorithms are subytpes of the abstract type SRAlgorithm.

Some of these algorithms can take configuration parameters. In our case, we want to use the integral equations method, which does not require any configuration:

algorithm = IntegralEquations()
IntegralEquations()

Fitting and error measurement

We are ready to calculate the fit:

fit = sinfit(problem, algorithm)
Mixed Linear-Sinusoidal model MixedLinSinModel{Float64}:
  Frequency (Hz)       : 4.007299758712902
  DC                   : 0.272063407279927
  Sine amplitude (Q)   : -0.9174404283779977
  Cosine amplitude (I) : 0.579524744987893
  Linear term (m)      : 0.9597992006642564

Note that sinfit returns a value of type MixedLinSinModel, which contains the estimated parameters in its fields.

We calculate the fit's root mean-square error and mean absolute error:

rmse(fit, param_exact, x)
0.1330860129428344
mae(fit, param_exact, x)
0.10876438830411586

Plotting

We can plot the measured data along with the fit:

using Plots
plot(x, data, fit)

Improving the fit with non-linear least squares

We may try to improve upon this fit by using Levenberg-Marquardt with the parameters estimated above as initial parameters. We can do so as follows:

(; f, DC, Q, I, m) = fit # transfer the fit to a new problem instance
problem_levmar = MixedLinSin5Problem(x, data; f, DC, Q, I, m)
5-Parameter Mixed Linear-Sinusoidal Problem MixedLinSin5Problem:
  X                    : StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64} with 40 elements
  Y                    : Vector{Float64} with 40 elements
Parameter estimates:
  Frequency (Hz)       : 4.007299758712902
  DC                   : 0.272063407279927
  Sine amplitude (Q)   : -0.9174404283779977
  Cosine amplitude (I) : 0.579524744987893
  Linear term (m)      : 0.9597992006642564
  Lower bounds         : missing
  Upper bounds         : missing

Note that now the problem includes initial estimates of the paramaters (except for the bounds). If these are not provided, then sinfit (when using LevMar()) would have used the integral equations algorithm to calculate them automatically. It is also possible to give estimates of only some of the paramaters.

Now we can fit the data:

fit_levmar = sinfit(problem_levmar, LevMar())
Mixed Linear-Sinusoidal model MixedLinSinModel{Float64}:
  Frequency (Hz)       : 4.04178681088101
  DC                   : 0.2615232845594382
  Sine amplitude (Q)   : -0.8538423033723734
  Cosine amplitude (I) : 0.6713713094475814
  Linear term (m)      : 0.9669430564245449

We can re-evaluate the error:

rmse(fit_levmar, param_exact, x)
0.08911610415562218
mae(fit_levmar, param_exact, x)
0.07433163202382657

As expected, this second fit has a smaller error. Plotting the data, the fit, and the exact function for comparison:

plot(x, data, fit_levmar, exact = param_exact)

Example 2: A more difficult scenario

Let us fit the same model as above, with fewer data points, more noise and non-equidistant samples.

x = sort(rand(20))
data_exact = param_exact(x)
data = data_exact .+ 0.3*randn(20)
problem = MixedLinSin5Problem(x, data)
fit = sinfit(problem, IntegralEquations())
Mixed Linear-Sinusoidal model MixedLinSinModel{Float64}:
  Frequency (Hz)       : 3.729391703172656
  DC                   : 0.7576672898795151
  Sine amplitude (Q)   : -0.8971422293277231
  Cosine amplitude (I) : -0.31594358703581404
  Linear term (m)      : 0.22201333974105733

Error for Jacquelin's algorithm:

rmse(fit, param_exact, x)
0.42852062995159534
mae(fit, param_exact, x)
0.3619237685526934

As expected, we see larger errors than in the more benign scenario above. Let us improve the fit using non-linear least-squares. Note that the Levenberg-Marqvardt algorithm requires initial estimates of the parameters, which we have not provided. In this case, sinfit will calculate the initial estimates "behind the scenes" using IntegralEquations().

fit_nls = sinfit(problem, LevMar())
Mixed Linear-Sinusoidal model MixedLinSinModel{Float64}:
  Frequency (Hz)       : 4.048762854758658
  DC                   : 0.36048744956040524
  Sine amplitude (Q)   : -0.8585200544595646
  Cosine amplitude (I) : 0.6046841485972138
  Linear term (m)      : 0.8732104974593492

Error for LsqFit.curve_fit:

rmse(fit_nls, param_exact, x)
0.10271143790817262
mae(fit_nls, param_exact, x)
0.08120521802966725

We see that LsqFit.curve_fit was able to improve the fit significantly. Plotting the data, the fit and the exact function:

plot(x, data, fit_nls, exact = param_exact)

References

  • 1IEEE, "IEEE Standard for Digitizing Waveform Recorders", 2018.
  • 2Jacquelin J., Régressions et Equations Intégrales, 2014, (online) https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales.
  • 3Levenberg, K., "A Method for the Solution of Certain Non-Linear Problems in Least Squares", Quarterly of Applied Mathematics, 2 (2), 1944. doi:10.1090/qam/10666.
  • 4Marquardt, D., "An Algorithm for Least-Squares Estimation of Nonlinear Parameters", SIAM Journal on Applied Mathematics, 11 (2), 1944. doi:10.1137/0111030.
  • 5LsqFit.jl User Manual, (online) https://julianlsolvers.github.io/LsqFit.jl/latest/
  • 6Liang et al, "Fitting Algorithm of Sine Wave with Partial Period Waveforms and Non-Uniform Sampling Based on Least-Square Method." Journal of Physics: Conference Series 1149.1 (2018)ProQuest. Web. 17 Apr. 2023.