Title: | Goodness-of-Fit Tests for Functional Data |
---|---|
Description: | Implementation of several goodness-of-fit tests for functional data. Currently, mostly related with the functional linear model with functional/scalar response and functional/scalar predictor. The package allows for the replication of the data applications considered in García-Portugués, Álvarez-Liébana, Álvarez-Pérez and González-Manteiga (2021) <doi:10.1111/sjos.12486>. |
Authors: | Eduardo García-Portugués [aut, cre] , Javier Álvarez-Liébana [aut] , Gonzalo Álvarez-Pérez [ctb], Manuel Febrero-Bande [ctb] |
Maintainer: | Eduardo García-Portugués <[email protected]> |
License: | GPL-3 |
Version: | 0.1.2 |
Built: | 2024-11-08 05:14:03 UTC |
Source: | https://github.com/egarpor/goffda |
Implementation of several goodness-of-fit tests for functional data. Currently, mostly related with the functional linear model with functional/scalar response and functional/scalar predictor. The package allows for the replication of the data applications considered in García-Portugués, Álvarez-Liébana, Álvarez-Pérez and González-Manteiga (2021) <doi:10.1111/sjos.12486>.
Eduardo García-Portugués and Javier Álvarez-Liébana.
García-Portugués, E., Álvarez-Liébana, J., Álvarez-Pérez, G. and Gonzalez-Manteiga, W. (2021). A goodness-of-fit test for the functional linear model with functional response. Scandinavian Journal of Statistics, 48(2):502–528. doi:10.1111/sjos.12486
García-Portugués, E., González-Manteiga, W. and Febrero-Bande, M. (2014). A goodness-of-fit test for the functional linear model with scalar response. Journal of Computational and Graphical Statistics, 23(3):761–778. doi:10.1080/10618600.2013.812519
Series of daily temperatures of 73 Spanish weather stations during the 40-year period 1974–2013.
aemet_temp
aemet_temp
A list with the following entries:
an fdata
with 2892 temperature
(in Celsius degrees) curves, discretized on 365 equispaced
grid points (days) on . Each curve corresponds to the
yearly records of a weather station.
a dataframe with metadata for each curve:
ind
: identifier of the weather station.
name
: name of the weather station.
year
: year of the observation.
For consistency with the fda.usc-package
's
aemet
dataset, the names and identifiers of the 73
weather stations are the same as in that dataset. Only a minor fix has been
applied to the "A CORUÑA/ALVEDRO" station, whose identifier was the same
as the "A CORUÑA" station, "1387"
. The former was set to
"1387A"
.
Information about the province, altitude, longitude, and latitude of
each weather station can be retrieved in df
from the
fda.usc-package
's aemet
dataset.
The dataset is a curated version of a larger database of 115 stations. It excludes stations with inconsistent records or that were relocated, closed, or opened during the 40-year period. There are 9 stations with missing years. The total of missing years is 28.
In leap years, the daily-average temperature is computed as the average of February 28th and 29th.
Original data processing scripts by Manuel Febrero-Bande and Manuel Oviedo de la Fuente. Adaptations by Eduardo García-Portugués.
The data was retrieved from the FTP of the
Meteorological State Agency of Spain
(AEMET) in 2014 using a processing script by the authors of the
fda.usc-package
.
Febrero-Bande, M. and Oviedo de la Fuente, M. (2012). Statistical Computing in Functional Data Analysis: The R Package fda.usc. Journal of Statistical Software, 51(4):1–28. doi:10.18637/jss.v051.i04
## Data splitting # Load raw data data("aemet_temp") # Partition the dataset in the first and last 20 years with(aemet_temp, { ind_pred <- which((1974 <= df$year) & (df$year <= 1993)) ind_resp <- which((1994 <= df$year) & (df$year <= 2013)) aemet_temp_pred <<- list("df" = df[ind_pred, ], "temp" = temp[ind_pred]) aemet_temp_resp <<- list("df" = df[ind_resp, ], "temp" = temp[ind_resp]) }) # Average the temperature on each period mean_aemet <- function(x) { m <- tapply(X = 1:nrow(x$temp$data), INDEX = x$df$ind, FUN = function(i) colMeans(x$temp$data[i, , drop = FALSE], na.rm = TRUE)) x$temp$data <- do.call(rbind, m) return(x$temp) } # Build predictor and response fdatas aemet_temp_pred <- mean_aemet(aemet_temp_pred) aemet_temp_resp <- mean_aemet(aemet_temp_resp) # Plot old_par <- par(mfrow = c(1, 2)) plot(aemet_temp_pred) plot(aemet_temp_resp) par(old_par) # Average daily temperatures day_avg_pred <- func_mean(aemet_temp_pred) day_avg_resp <- func_mean(aemet_temp_resp) # Average yearly temperatures avg_year_pred <- rowMeans(aemet_temp_pred$data) avg_year_resp <- rowMeans(aemet_temp_resp$data) ## Test the linear model with functional response and predictor (comp_flmfr <- flm_test(X = aemet_temp_pred, Y = aemet_temp_resp, est_method = "fpcr_l1s")) beta0 <- diag(rep(1, length(aemet_temp_pred$argvals))) (simp_flmfr <- flm_test(X = aemet_temp_pred, Y = aemet_temp_resp, beta0 = beta0, est_method = "fpcr_l1s")) # Visualize estimation filled.contour(x = aemet_temp_pred$argvals, y = aemet_temp_resp$argvals, z = comp_flmfr$fit_flm$Beta_hat, color.palette = viridisLite::viridis, nlevels = 20) ## Test the linear model with scalar response and functional predictor (comp_flmsr <- flm_test(X = aemet_temp_pred, Y = avg_year_resp, est_method = "fpcr_l1s")) (simp_flmsr <- flm_test(X = aemet_temp_pred, Y = avg_year_resp, beta0 = 1 / 365, est_method = "fpcr_l1s")) # Visualize estimation plot(aemet_temp_pred$argvals, comp_flmsr$fit_flm$Beta_hat, type = "l", ylim = c(0, 30 / 365)) abline(h = 1 / 365, col = 2) ## Test the linear model with functional response and scalar predictor (comp_frsp <- flm_test(X = avg_year_pred, Y = aemet_temp_resp)) (simp_frsp <- flm_test(X = avg_year_pred, Y = aemet_temp_resp, beta0 = 1)) ## Test the linear model with scalar response and predictor (comp_srsp <- flm_test(X = avg_year_pred, Y = avg_year_resp)) (simp_srsp <- flm_test(X = avg_year_pred, Y = avg_year_resp, beta0 = 1))
## Data splitting # Load raw data data("aemet_temp") # Partition the dataset in the first and last 20 years with(aemet_temp, { ind_pred <- which((1974 <= df$year) & (df$year <= 1993)) ind_resp <- which((1994 <= df$year) & (df$year <= 2013)) aemet_temp_pred <<- list("df" = df[ind_pred, ], "temp" = temp[ind_pred]) aemet_temp_resp <<- list("df" = df[ind_resp, ], "temp" = temp[ind_resp]) }) # Average the temperature on each period mean_aemet <- function(x) { m <- tapply(X = 1:nrow(x$temp$data), INDEX = x$df$ind, FUN = function(i) colMeans(x$temp$data[i, , drop = FALSE], na.rm = TRUE)) x$temp$data <- do.call(rbind, m) return(x$temp) } # Build predictor and response fdatas aemet_temp_pred <- mean_aemet(aemet_temp_pred) aemet_temp_resp <- mean_aemet(aemet_temp_resp) # Plot old_par <- par(mfrow = c(1, 2)) plot(aemet_temp_pred) plot(aemet_temp_resp) par(old_par) # Average daily temperatures day_avg_pred <- func_mean(aemet_temp_pred) day_avg_resp <- func_mean(aemet_temp_resp) # Average yearly temperatures avg_year_pred <- rowMeans(aemet_temp_pred$data) avg_year_resp <- rowMeans(aemet_temp_resp$data) ## Test the linear model with functional response and predictor (comp_flmfr <- flm_test(X = aemet_temp_pred, Y = aemet_temp_resp, est_method = "fpcr_l1s")) beta0 <- diag(rep(1, length(aemet_temp_pred$argvals))) (simp_flmfr <- flm_test(X = aemet_temp_pred, Y = aemet_temp_resp, beta0 = beta0, est_method = "fpcr_l1s")) # Visualize estimation filled.contour(x = aemet_temp_pred$argvals, y = aemet_temp_resp$argvals, z = comp_flmfr$fit_flm$Beta_hat, color.palette = viridisLite::viridis, nlevels = 20) ## Test the linear model with scalar response and functional predictor (comp_flmsr <- flm_test(X = aemet_temp_pred, Y = avg_year_resp, est_method = "fpcr_l1s")) (simp_flmsr <- flm_test(X = aemet_temp_pred, Y = avg_year_resp, beta0 = 1 / 365, est_method = "fpcr_l1s")) # Visualize estimation plot(aemet_temp_pred$argvals, comp_flmsr$fit_flm$Beta_hat, type = "l", ylim = c(0, 30 / 365)) abline(h = 1 / 365, col = 2) ## Test the linear model with functional response and scalar predictor (comp_frsp <- flm_test(X = avg_year_pred, Y = aemet_temp_resp)) (simp_frsp <- flm_test(X = avg_year_pred, Y = aemet_temp_resp, beta0 = 1)) ## Test the linear model with scalar response and predictor (comp_srsp <- flm_test(X = avg_year_pred, Y = avg_year_resp)) (simp_srsp <- flm_test(X = avg_year_pred, Y = avg_year_resp, beta0 = 1))
Convenience function for fitting multivariate linear models
with multivariate response by relying on cv.glmnet
from the glmnet-package
. The function fits the
multivariate linear model
where is a
-dimensional vector,
and
are two
-dimensional vectors, and
is a
matrix.
If and
are centered
(i.e., have zero-mean columns), the function estimates
by solving, for the sample
, the elastic-net optimization problem
where stands for
the Frobenious norm of the matrix
and
for the Euclidean norm
of the
-th row of
. The choice
in the elastic-net penalization corresponds to ridge
regression, whereas
yields a lasso-type estimator.
The unpenalized least-squares estimator is obtained with
.
cv_glmnet(x, y, alpha = c("lasso", "ridge")[1], lambda = NULL, intercept = TRUE, thresh = 1e-10, cv_1se = TRUE, cv_nlambda = 50, cv_folds = NULL, cv_grouped = FALSE, cv_lambda = 10^seq(2, -3, length.out = cv_nlambda), cv_second = TRUE, cv_tol_second = 0.025, cv_log10_exp = c(-0.5, 3), cv_thresh = 1e-05, cv_parallel = FALSE, cv_verbose = FALSE, ...)
cv_glmnet(x, y, alpha = c("lasso", "ridge")[1], lambda = NULL, intercept = TRUE, thresh = 1e-10, cv_1se = TRUE, cv_nlambda = 50, cv_folds = NULL, cv_grouped = FALSE, cv_lambda = 10^seq(2, -3, length.out = cv_nlambda), cv_second = TRUE, cv_tol_second = 0.025, cv_log10_exp = c(-0.5, 3), cv_thresh = 1e-05, cv_parallel = FALSE, cv_verbose = FALSE, ...)
x |
input matrix of size |
y |
response matrix of size |
alpha |
elastic net mixing argument in |
lambda |
scalar giving the regularization parameter |
intercept |
flag passed to the |
thresh |
convergence threshold of the coordinate descending algorithm,
passed to the |
cv_1se |
shall the optimal lambda be the |
cv_nlambda |
the length of the sequence of |
cv_folds |
number of folds to perform cross-validation. If
|
cv_grouped |
passed to the |
cv_lambda |
passed to the |
cv_second |
flag to perform a second cross-validation search if the
optimal |
cv_tol_second |
tolerance for performing a second search if
|
cv_log10_exp |
expansion of the |
cv_thresh |
convergence threshold used during cross-validation in
|
cv_parallel |
passed to the |
cv_verbose |
flag to display information about the cross-validation
search with plots and messages. More useful if |
... |
further parameters to be passed to |
If , then the lasso-type fit shrinks to zero,
simultaneously, all the elements of certain rows of
, thus
helping the selection of the
most influential variables in
for explaining/predicting
.
The function first performs a cross-validation search for the optimal
if
lambda = NULL
(using cv_thresh
to control
the convergence threshold). After the optimal penalization parameter is
determined, a second fit (now with convergence threshold thresh
)
using the default sequence in
glmnet
is performed. The final estimate is obtained via
predict.glmnet
from the optimal
determined in the first step.
Due to its cross-validatory nature, cv_glmnet
can be
computationally demanding. Approaches for reducing the computation time
include: considering a smaller number of folds than n
, such as
cv_folds = 10
(but will lead to random partitions of the
data); decrease the tolerance of the coordinate descending
algorithm by increasing cv_thresh
; reducing the number of
candidate values with
nlambda
; setting
second = FALSE
to avoid a second cross-validation; or considering
cv_parallel = TRUE
to use a parallel backend (must be registered
before hand; see examples).
By default, the sequence is used with standardized
and
(both divided by their
columnwise variances); see
glmnet
and the
standardized
argument. Therefore, the optimal selected
value assumes standardization and must be used with care if the variables
are not standardized. For example, when using the ridge analytical
solution, a prior change of scale that depends on
needs to be done.
See the examples for the details.
A list with the following entries:
beta_hat |
the estimated |
lambda |
the optimal |
cv |
if |
fit |
the |
Eduardo García-Portugués. Initial contributions by Gonzalo Álvarez-Pérez.
Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1–22. doi:10.18637/jss.v033.i01
## Quick example for multivariate linear model with multivariate response # Simulate data n <- 100 p <- 10; q <- 5 set.seed(123456) x <- matrix(rnorm(n * p, sd = rep(1:p, each = n)), nrow = n, ncol = p) e <- matrix(rnorm(n * q, sd = rep(q:1, each = n)), nrow = n, ncol = q) beta <- matrix(((1:p - 1) / p)^2, nrow = p, ncol = q) y <- x %*% beta + e # Fit lasso (model with intercept, the default) cv_glmnet(x = x, y = y, alpha = "lasso", cv_verbose = TRUE)$beta_hat ## Multivariate linear model with multivariate response # Simulate data n <- 100 p <- 10; q <- 5 set.seed(123456) x <- matrix(rnorm(n * p, sd = rep(1:p, each = n)), nrow = n, ncol = p) e <- matrix(rnorm(n * q, sd = rep(q:1, each = n)), nrow = n, ncol = q) beta <- matrix(((1:p - 1) / p)^2, nrow = p, ncol = q) y <- x %*% beta + e # Fit ridge cv0 <- cv_glmnet(x = x, y = y, alpha = "ridge", intercept = FALSE, cv_verbose = TRUE) cv0$beta_hat # Same fit for the chosen lambda cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda, intercept = FALSE)$beta_hat # Fit lasso (model with intercept, the default) cv1 <- cv_glmnet(x = x, y = y, alpha = "lasso", cv_verbose = TRUE) cv1$beta_hat # Use cv_1se = FALSE cv1_min <- cv_glmnet(x = x, y = y, alpha = "lasso", cv_verbose = TRUE, cv_1se = FALSE) # Compare it with ridge analytical solution. Observe the factor in lambda, # necessary since lambda is searched for the standardized data. Note also # that, differently to the case q = 1, no standardarization with respect to # y happens sd_x <- apply(x, 2, function(x) sd(x)) * sqrt((n - 1) / n) cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda, thresh = 1e-20, intercept = FALSE)$beta_hat solve(crossprod(x) + diag(cv0$lambda * sd_x^2 * n)) %*% t(x) %*% y # If x is standardized, the match between glmnet and usual ridge # analytical expression does not require scaling of lambda x_std <- scale(x, scale = sd_x, center = TRUE) cv_glmnet(x = x_std, y = y, alpha = "ridge", lambda = cv0$lambda, intercept = FALSE, thresh = 1e-20)$beta_hat solve(crossprod(x_std) + diag(rep(cv0$lambda * n, p))) %*% t(x_std) %*% y ## Simple linear model # Simulate data n <- 100 p <- 1; q <- 1 set.seed(123456) x <- matrix(rnorm(n * p), nrow = n, ncol = p) e <- matrix(rnorm(n * q), nrow = n, ncol = q) beta <- 2 y <- x * beta + e # Fit by ridge (model with intercept, the default) cv0 <- cv_glmnet(x = x, y = y, alpha = "ridge", cv_verbose = TRUE) cv0$beta_hat cv0$intercept # Comparison with linear model with intercept lm(y ~ 1 + x)$coefficients # Fit by ridge (model without intercept) cv0 <- cv_glmnet(x = x, y = y, alpha = "ridge", cv_verbose = TRUE, intercept = FALSE) cv0$beta_hat # Comparison with linear model without intercept lm(y ~ 0 + x)$coefficients # Same fit for the chosen lambda (and without intercept) cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda, intercept = FALSE)$beta_hat # Same for lasso (model with intercept, the default) cv1 <- cv_glmnet(x = x, y = y, alpha = "lasso") cv1$beta_hat ## Multivariate linear model (p = 3, q = 1) # Simulate data n <- 50 p <- 10; q <- 1 set.seed(123456) x <- matrix(rnorm(n * p, mean = 1, sd = rep(1:p, each = n)), nrow = n, ncol = p) e <- matrix(rnorm(n * q), nrow = n, ncol = q) beta <- ((1:p - 1) / p)^2 y <- x %*% beta + e # Fit ridge (model without intercept) cv0 <- cv_glmnet(x = x, y = y, alpha = "ridge", intercept = FALSE, cv_verbose = TRUE) cv0$beta_hat # Same fit for the chosen lambda cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda, intercept = FALSE)$beta_hat # Compare it with ridge analytical solution. Observe the factor in lambda, # necessary since lambda is searched for the standardized data sd_x <- apply(x, 2, function(x) sd(x)) * sqrt((n - 1) / n) sd_y <- sd(y) * sqrt((n - 1) / n) cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda, intercept = FALSE, thresh = 1e-20)$beta_hat solve(crossprod(x) + diag(cv0$lambda * sd_x^2 / sd_y * n)) %*% t(x) %*% y # If x and y are standardized, the match between glmnet and usual ridge # analytical expression does not require scaling of lambda x_std <- scale(x, scale = sd_x, center = TRUE) y_std <- scale(y, scale = sd_y, center = TRUE) cv_glmnet(x = x_std, y = y_std, alpha = "ridge", lambda = cv0$lambda, intercept = FALSE, thresh = 1e-20)$beta_hat solve(crossprod(x_std) + diag(rep(cv0$lambda * n, p))) %*% t(x_std) %*% y_std # Fit lasso (model with intercept, the default) cv1 <- cv_glmnet(x = x, y = y, alpha = "lasso", cv_verbose = TRUE) cv1$beta_hat # ## Parallelization # # # Parallel # doMC::registerDoMC(cores = 2) # microbenchmark::microbenchmark( # cv_glmnet(x = x, y = y, nlambda = 100, cv_parallel = TRUE), # cv_glmnet(x = x, y = y, nlambda = 100, cv_parallel = FALSE), # times = 10)
## Quick example for multivariate linear model with multivariate response # Simulate data n <- 100 p <- 10; q <- 5 set.seed(123456) x <- matrix(rnorm(n * p, sd = rep(1:p, each = n)), nrow = n, ncol = p) e <- matrix(rnorm(n * q, sd = rep(q:1, each = n)), nrow = n, ncol = q) beta <- matrix(((1:p - 1) / p)^2, nrow = p, ncol = q) y <- x %*% beta + e # Fit lasso (model with intercept, the default) cv_glmnet(x = x, y = y, alpha = "lasso", cv_verbose = TRUE)$beta_hat ## Multivariate linear model with multivariate response # Simulate data n <- 100 p <- 10; q <- 5 set.seed(123456) x <- matrix(rnorm(n * p, sd = rep(1:p, each = n)), nrow = n, ncol = p) e <- matrix(rnorm(n * q, sd = rep(q:1, each = n)), nrow = n, ncol = q) beta <- matrix(((1:p - 1) / p)^2, nrow = p, ncol = q) y <- x %*% beta + e # Fit ridge cv0 <- cv_glmnet(x = x, y = y, alpha = "ridge", intercept = FALSE, cv_verbose = TRUE) cv0$beta_hat # Same fit for the chosen lambda cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda, intercept = FALSE)$beta_hat # Fit lasso (model with intercept, the default) cv1 <- cv_glmnet(x = x, y = y, alpha = "lasso", cv_verbose = TRUE) cv1$beta_hat # Use cv_1se = FALSE cv1_min <- cv_glmnet(x = x, y = y, alpha = "lasso", cv_verbose = TRUE, cv_1se = FALSE) # Compare it with ridge analytical solution. Observe the factor in lambda, # necessary since lambda is searched for the standardized data. Note also # that, differently to the case q = 1, no standardarization with respect to # y happens sd_x <- apply(x, 2, function(x) sd(x)) * sqrt((n - 1) / n) cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda, thresh = 1e-20, intercept = FALSE)$beta_hat solve(crossprod(x) + diag(cv0$lambda * sd_x^2 * n)) %*% t(x) %*% y # If x is standardized, the match between glmnet and usual ridge # analytical expression does not require scaling of lambda x_std <- scale(x, scale = sd_x, center = TRUE) cv_glmnet(x = x_std, y = y, alpha = "ridge", lambda = cv0$lambda, intercept = FALSE, thresh = 1e-20)$beta_hat solve(crossprod(x_std) + diag(rep(cv0$lambda * n, p))) %*% t(x_std) %*% y ## Simple linear model # Simulate data n <- 100 p <- 1; q <- 1 set.seed(123456) x <- matrix(rnorm(n * p), nrow = n, ncol = p) e <- matrix(rnorm(n * q), nrow = n, ncol = q) beta <- 2 y <- x * beta + e # Fit by ridge (model with intercept, the default) cv0 <- cv_glmnet(x = x, y = y, alpha = "ridge", cv_verbose = TRUE) cv0$beta_hat cv0$intercept # Comparison with linear model with intercept lm(y ~ 1 + x)$coefficients # Fit by ridge (model without intercept) cv0 <- cv_glmnet(x = x, y = y, alpha = "ridge", cv_verbose = TRUE, intercept = FALSE) cv0$beta_hat # Comparison with linear model without intercept lm(y ~ 0 + x)$coefficients # Same fit for the chosen lambda (and without intercept) cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda, intercept = FALSE)$beta_hat # Same for lasso (model with intercept, the default) cv1 <- cv_glmnet(x = x, y = y, alpha = "lasso") cv1$beta_hat ## Multivariate linear model (p = 3, q = 1) # Simulate data n <- 50 p <- 10; q <- 1 set.seed(123456) x <- matrix(rnorm(n * p, mean = 1, sd = rep(1:p, each = n)), nrow = n, ncol = p) e <- matrix(rnorm(n * q), nrow = n, ncol = q) beta <- ((1:p - 1) / p)^2 y <- x %*% beta + e # Fit ridge (model without intercept) cv0 <- cv_glmnet(x = x, y = y, alpha = "ridge", intercept = FALSE, cv_verbose = TRUE) cv0$beta_hat # Same fit for the chosen lambda cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda, intercept = FALSE)$beta_hat # Compare it with ridge analytical solution. Observe the factor in lambda, # necessary since lambda is searched for the standardized data sd_x <- apply(x, 2, function(x) sd(x)) * sqrt((n - 1) / n) sd_y <- sd(y) * sqrt((n - 1) / n) cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda, intercept = FALSE, thresh = 1e-20)$beta_hat solve(crossprod(x) + diag(cv0$lambda * sd_x^2 / sd_y * n)) %*% t(x) %*% y # If x and y are standardized, the match between glmnet and usual ridge # analytical expression does not require scaling of lambda x_std <- scale(x, scale = sd_x, center = TRUE) y_std <- scale(y, scale = sd_y, center = TRUE) cv_glmnet(x = x_std, y = y_std, alpha = "ridge", lambda = cv0$lambda, intercept = FALSE, thresh = 1e-20)$beta_hat solve(crossprod(x_std) + diag(rep(cv0$lambda * n, p))) %*% t(x_std) %*% y_std # Fit lasso (model with intercept, the default) cv1 <- cv_glmnet(x = x, y = y, alpha = "lasso", cv_verbose = TRUE) cv1$beta_hat # ## Parallelization # # # Parallel # doMC::registerDoMC(cores = 2) # microbenchmark::microbenchmark( # cv_glmnet(x = x, y = y, nlambda = 100, cv_parallel = TRUE), # cv_glmnet(x = x, y = y, nlambda = 100, cv_parallel = FALSE), # times = 10)
Simulation of , a random variable in the Hilbert space
of square-integrable functions in
,
, and
, a random variable in
.
Together with the bivariate kernel
, they are the necessary
elements for sampling a Functional Linear Model with Functional Response
(FLMFR):
The next functions sample and
, and
construct
, using different proposals in the literature:
r_cm2013_flmfr
is based on the numerical example given in
Section 3 of Crambes and Mas (2013). Termed as S1 in Section 2 of
García-Portugués et al. (2021).
r_ik2018_flmfr
is based on the numerical example given in
Section 4 of Imaizumi and Kato (2018), but zeroing the first Functional
Principal Components (FPC) coefficients of (so the first FPC
are not adequate for estimation). S3 in Section 2 of
García-Portugués et al. (2021).
r_gof2021_flmfr
gives a numerical example in Section 2
of García-Portugués et al. (2021), denoted therein as S2.
r_cm2013_flmfr(n, s = seq(0, 1, len = 101), t = seq(0, 1, len = 101), std_error = 0.15, n_fpc = 50, concurrent = FALSE) r_ik2018_flmfr(n, s = seq(0, 1, l = 101), t = seq(0, 1, l = 101), std_error = 1.5, parameters = c(1.75, 0.8, 2.4, 0.25), n_fpc = 50, concurrent = FALSE) r_gof2021_flmfr(n, s = seq(0, 1, len = 101), t = seq(0, 1, len = 101), std_error = 0.35, concurrent = FALSE)
r_cm2013_flmfr(n, s = seq(0, 1, len = 101), t = seq(0, 1, len = 101), std_error = 0.15, n_fpc = 50, concurrent = FALSE) r_ik2018_flmfr(n, s = seq(0, 1, l = 101), t = seq(0, 1, l = 101), std_error = 1.5, parameters = c(1.75, 0.8, 2.4, 0.25), n_fpc = 50, concurrent = FALSE) r_gof2021_flmfr(n, s = seq(0, 1, len = 101), t = seq(0, 1, len = 101), std_error = 0.35, concurrent = FALSE)
n |
number of trajectories to sample. |
s , t
|
grid points where functional covariates and responses are valued, respectively. |
std_error |
standard deviation of the random variables
involved in the generation of the functional error |
n_fpc |
number of FPC to be taken into account for the data generation.
Must be greater than |
concurrent |
flag to consider a concurrent FLMFR (degenerate case).
Defaults to |
parameters |
vector of parameters, only required for
|
Descriptions of the processes and
,
and of
can be seen in the references.
A list with the following elements:
X_fdata |
functional covariates, an
|
error_fdata |
functional errors, an
|
beta |
either the matrix with |
Javier Álvarez-Liébana.
Cardot, H. and Mas, A. (2013). Asymptotics of prediction in functional linear regression with functional outputs. Bernoulli, 19(5B):2627–2651. doi:10.3150/12-BEJ469
Imaizumi, M. and Kato, K. (2018). PCA-based estimation for functional linear regression with functional responses. Journal of Multivariate Analysis, 163:15–36. doi:10.1016/j.jmva.2017.10.001
García-Portugués, E., Álvarez-Liébana, J., Álvarez-Pérez, G. and Gonzalez-Manteiga, W. (2021). A goodness-of-fit test for the functional linear model with functional response. Scandinavian Journal of Statistics, 48(2):502–528. doi:10.1111/sjos.12486
# FLMFR based on Imaizumi and Kato (2018) adopting different Hilbert spaces s <- seq(0, 1, l = 201) t <- seq(2, 4, l = 301) r_ik2018 <- r_ik2018_flmfr(n = 50, s = s, t = t, std_error = 1.5, parameters = c(1.75, 0.8, 2.4, 0.25), n_fpc = 50) plot(r_ik2018$X_fdata) plot(r_ik2018$error_fdata) image(x = s, y = t, z = r_ik2018$beta, col = viridisLite::viridis(20)) # FLMFR based on Cardot and Mas (2013) adopting different Hilbert spaces r_cm2013 <- r_cm2013_flmfr(n = 50, s = s, t = t, std_error = 0.15, n_fpc = 50) plot(r_cm2013$X_fdata) plot(r_cm2013$error_fdata) image(x = s, y = t, z = r_cm2013$beta, col = viridisLite::viridis(20)) # FLMFR in García-Portugués et al. (2021) adopting different Hilbert spaces r_gof2021 <- r_gof2021_flmfr(n = 50, s = s, t = t, std_error = 0.35, concurrent = FALSE) plot(r_gof2021$X_fdata) plot(r_gof2021$error_fdata) image(x = s, y = t, z = r_gof2021$beta, col = viridisLite::viridis(20)) # Concurrent model in García-Portugués et al. (2021) r_gof2021 <- r_gof2021_flmfr(n = 50, s = s, t = s, std_error = 0.35, concurrent = TRUE) plot(r_gof2021$X_fdata) plot(r_gof2021$error_fdata) plot(r_gof2021$beta)
# FLMFR based on Imaizumi and Kato (2018) adopting different Hilbert spaces s <- seq(0, 1, l = 201) t <- seq(2, 4, l = 301) r_ik2018 <- r_ik2018_flmfr(n = 50, s = s, t = t, std_error = 1.5, parameters = c(1.75, 0.8, 2.4, 0.25), n_fpc = 50) plot(r_ik2018$X_fdata) plot(r_ik2018$error_fdata) image(x = s, y = t, z = r_ik2018$beta, col = viridisLite::viridis(20)) # FLMFR based on Cardot and Mas (2013) adopting different Hilbert spaces r_cm2013 <- r_cm2013_flmfr(n = 50, s = s, t = t, std_error = 0.15, n_fpc = 50) plot(r_cm2013$X_fdata) plot(r_cm2013$error_fdata) image(x = s, y = t, z = r_cm2013$beta, col = viridisLite::viridis(20)) # FLMFR in García-Portugués et al. (2021) adopting different Hilbert spaces r_gof2021 <- r_gof2021_flmfr(n = 50, s = s, t = t, std_error = 0.35, concurrent = FALSE) plot(r_gof2021$X_fdata) plot(r_gof2021$error_fdata) image(x = s, y = t, z = r_gof2021$beta, col = viridisLite::viridis(20)) # Concurrent model in García-Portugués et al. (2021) r_gof2021 <- r_gof2021_flmfr(n = 50, s = s, t = s, std_error = 0.35, concurrent = TRUE) plot(r_gof2021$X_fdata) plot(r_gof2021$error_fdata) plot(r_gof2021$beta)
Estimation of the linear operator relating a
functional predictor with a functional response
in the
linear model
where is a random variable in the Hilbert space of
square-integrable functions in
,
,
and
are random variables
in
, and
and
.
The linear, Hilbert–Schmidt, integral operator is parametrized by
the bivariate kernel . Its estimation is done through the truncated expansion
of
in the tensor product of the data-driven
bases of the Functional Principal Components (FPC) of
and
, and through the fitting of the resulting multivariate
linear model. The FPC basis for
is truncated in
components, while the FPC basis for
is truncated in
components. Automatic selection of
and
is detailed below.
The particular cases in which either or
are
constant functions give either a scalar predictor or response.
The simple linear model arises if both
and
are scalar,
for which
is a constant.
flm_est(X, Y, est_method = "fpcr_l1s", p = NULL, q = NULL, thre_p = 0.99, thre_q = 0.99, lambda = NULL, X_fpc = NULL, Y_fpc = NULL, compute_residuals = TRUE, centered = FALSE, int_rule = "trapezoid", cv_verbose = FALSE, ...)
flm_est(X, Y, est_method = "fpcr_l1s", p = NULL, q = NULL, thre_p = 0.99, thre_q = 0.99, lambda = NULL, X_fpc = NULL, Y_fpc = NULL, compute_residuals = TRUE, centered = FALSE, int_rule = "trapezoid", cv_verbose = FALSE, ...)
X , Y
|
samples of functional/scalar predictors and functional/scalar
response. Either |
est_method |
either |
p , q
|
index vectors indicating the specific FPC to be
considered for the truncated bases expansions of |
thre_p , thre_q
|
thresholds for the proportion of variance
that is explained, at least, by the first |
lambda |
regularization parameter |
X_fpc , Y_fpc
|
FPC decompositions of |
compute_residuals |
whether to compute the fitted values |
centered |
flag to indicate if |
int_rule |
quadrature rule for approximating the definite
unidimensional integral: trapezoidal rule ( |
cv_verbose |
flag to display information about the estimation procedure
(passed to |
... |
further parameters to be passed to |
flm_est
deals seamlessly with either functional or scalar inputs
for the predictor and response. In the case of scalar inputs, the
corresponding dimension-related arguments (p
, q
,
thre_p
or thre_q
) will be ignored as in these cases either
or
.
The function translates the functional linear model into a multivariate
model with multivariate response and then estimates the
matrix of coefficients of
in the
tensor basis of the FPC of
X
and Y
. The following estimation
methods are implemented:
"fpcr"
: Functional Principal Components Regression (FPCR);
see details in Ramsay and Silverman (2005).
"fpcr_l2"
: FPCR, with ridge penalty on the associated
multivariate linear model.
"fpcr_l1"
: FPCR, with lasso penalty on the associated
multivariate linear model.
"fpcr_l1s"
: FPCR, with FPC selected by lasso regression
on the associated multivariate linear model.
The last three methods are explained in García-Portugués et al. (2021).
The FPC of
X
and FPC of
Y
are determined
as follows:
If p = NULL
, then p
is set as
p_thre <- 1:j_thre
, where j_thre
is the -th FPC of
X
for which the cumulated proportion of explained variance is
greater than thre_p
. If p != NULL
, then p_thre <- p
.
If q = NULL
, then the same procedure is followed with
thre_q
, resulting q_thre
.
Once p_thre
and q_thre
have been obtained, the methods
"fpcr_l1"
and "fpcr_l1s"
perform a second selection
of the FPC that are effectively considered in the estimation of .
This subset of FPC (of
p_thre
) is encoded in p_hat
. No further
selection of FPC is done for the methods "fpcr"
and "fpcr_l2"
.
The flag compute_residuals
controls if Y_hat
,
Y_hat_scores
, residuals
, and residuals_scores
are
computed. If FALSE
, they are set to NULL
. Y_hat
equals
and
residuals
stands for , both for
.
Y_hat_scores
andresiduals_scores
are the matrices of coefficients (or scores) of these
functions in the FPC of
Y
.
Missing values on X
and Y
are automatically removed.
A list with the following entries:
Beta_hat |
estimated |
Beta_hat_scores |
the matrix of coefficients of |
H_hat |
hat matrix of the associated fitted multivariate
linear model, a matrix of size |
p_thre |
index vector indicating the FPC of |
p_hat |
index vector of the FPC considered by the methods
|
q_thre |
index vector indicating the FPC of |
est_method |
the estimation method employed. |
Y_hat |
fitted values, either an |
Y_hat_scores |
the matrix of coefficients of |
residuals |
residuals of the fitted model, either an
|
residuals_scores |
the matrix of coefficients of
|
X_fpc , Y_fpc
|
FPC of |
lambda |
regularization parameter |
cv |
cross-validation object returned by
|
Eduardo García-Portugués and Javier Álvarez-Liébana.
García-Portugués, E., Álvarez-Liébana, J., Álvarez-Pérez, G. and Gonzalez-Manteiga, W. (2021). A goodness-of-fit test for the functional linear model with functional response. Scandinavian Journal of Statistics, 48(2):502–528. doi:10.1111/sjos.12486
Ramsay, J. and Silverman, B. W. (2005). Functional Data Analysis. Springer-Verlag, New York.
## Quick example of functional response and functional predictor # Generate data set.seed(12345) n <- 50 X_fdata <- r_ou(n = n, t = seq(0, 1, l = 201), sigma = 2) epsilon <- r_ou(n = n, t = seq(0, 1, l = 201), sigma = 0.5) Y_fdata <- 2 * X_fdata + epsilon # Lasso-selection FPCR (p and q are estimated) flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s") ## Functional response and functional predictor # Generate data set.seed(12345) n <- 50 X_fdata <- r_ou(n = n, t = seq(0, 1, l = 201), sigma = 2) epsilon <- r_ou(n = n, t = seq(0, 1, l = 201), sigma = 0.5) Y_fdata <- 2 * X_fdata + epsilon # FPCR (p and q are estimated) fpcr_1 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr") fpcr_1$Beta_hat_scores fpcr_1$p_thre fpcr_1$q_thre # FPCR (p and q are provided) fpcr_2 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr", p = c(1, 5, 2, 7), q = 2:1) fpcr_2$Beta_hat_scores fpcr_2$p_thre fpcr_2$q_thre # Ridge FPCR (p and q are estimated) l2_1 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l2") l2_1$Beta_hat_scores l2_1$p_hat # Ridge FPCR (p and q are provided) l2_2 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l2", p = c(1, 5, 2, 7), q = 2:1) l2_2$Beta_hat_scores l2_2$p_hat # Lasso FPCR (p and q are estimated) l1_1 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1") l1_1$Beta_hat_scores l1_1$p_thre l1_1$p_hat # Lasso estimator (p and q are provided) l1_2 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1", p = c(1, 5, 2, 7), q = 2:1) l1_2$Beta_hat_scores l1_2$p_thre l1_2$p_hat # Lasso-selection FPCR (p and q are estimated) l1s_1 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s") l1s_1$Beta_hat_scores l1s_1$p_thre l1s_1$p_hat # Lasso-selection FPCR (p and q are provided) l1s_2 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s", p = c(1, 5, 2, 7), q = 1:4) l1s_2$Beta_hat_scores l1s_2$p_thre l1s_2$p_hat ## Scalar response # Generate data set.seed(12345) n <- 50 beta <- r_ou(n = 1, t = seq(0, 1, l = 201), sigma = 0.5, x0 = 3) X_fdata <- fdata_cen(r_ou(n = n, t = seq(0, 1, l = 201), sigma = 2)) epsilon <- rnorm(n, sd = 0.25) Y <- drop(inprod_fdata(X_fdata1 = X_fdata, X_fdata2 = beta)) + epsilon # FPCR fpcr_4 <- flm_est(X = X_fdata, Y = Y, est_method = "fpcr") fpcr_4$p_hat # Ridge FPCR l2_4 <- flm_est(X = X_fdata, Y = Y, est_method = "fpcr_l2") l2_4$p_hat # Lasso FPCR l1_4 <- flm_est(X = X_fdata, Y = Y, est_method = "fpcr_l1") l1_4$p_hat # Lasso-selection FPCR l1s_4 <- flm_est(X = X_fdata, Y = Y, est_method = "fpcr_l1s") l1s_4$p_hat ## Scalar predictor # Generate data set.seed(12345) n <- 50 X <- rnorm(n) epsilon <- r_ou(n = n, t = seq(0, 1, l = 201), sigma = 0.5) beta <- r_ou(n = 1, t = seq(0, 1, l = 201), sigma = 0.5, x0 = 3) beta$data <- matrix(beta$data, nrow = n, ncol = ncol(beta$data), byrow = TRUE) Y_fdata <- beta * X + epsilon # FPCR fpcr_4 <- flm_est(X = X, Y = Y_fdata, est_method = "fpcr") plot(beta, col = 2) lines(beta$argvals, drop(fpcr_4$Beta_hat))
## Quick example of functional response and functional predictor # Generate data set.seed(12345) n <- 50 X_fdata <- r_ou(n = n, t = seq(0, 1, l = 201), sigma = 2) epsilon <- r_ou(n = n, t = seq(0, 1, l = 201), sigma = 0.5) Y_fdata <- 2 * X_fdata + epsilon # Lasso-selection FPCR (p and q are estimated) flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s") ## Functional response and functional predictor # Generate data set.seed(12345) n <- 50 X_fdata <- r_ou(n = n, t = seq(0, 1, l = 201), sigma = 2) epsilon <- r_ou(n = n, t = seq(0, 1, l = 201), sigma = 0.5) Y_fdata <- 2 * X_fdata + epsilon # FPCR (p and q are estimated) fpcr_1 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr") fpcr_1$Beta_hat_scores fpcr_1$p_thre fpcr_1$q_thre # FPCR (p and q are provided) fpcr_2 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr", p = c(1, 5, 2, 7), q = 2:1) fpcr_2$Beta_hat_scores fpcr_2$p_thre fpcr_2$q_thre # Ridge FPCR (p and q are estimated) l2_1 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l2") l2_1$Beta_hat_scores l2_1$p_hat # Ridge FPCR (p and q are provided) l2_2 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l2", p = c(1, 5, 2, 7), q = 2:1) l2_2$Beta_hat_scores l2_2$p_hat # Lasso FPCR (p and q are estimated) l1_1 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1") l1_1$Beta_hat_scores l1_1$p_thre l1_1$p_hat # Lasso estimator (p and q are provided) l1_2 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1", p = c(1, 5, 2, 7), q = 2:1) l1_2$Beta_hat_scores l1_2$p_thre l1_2$p_hat # Lasso-selection FPCR (p and q are estimated) l1s_1 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s") l1s_1$Beta_hat_scores l1s_1$p_thre l1s_1$p_hat # Lasso-selection FPCR (p and q are provided) l1s_2 <- flm_est(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s", p = c(1, 5, 2, 7), q = 1:4) l1s_2$Beta_hat_scores l1s_2$p_thre l1s_2$p_hat ## Scalar response # Generate data set.seed(12345) n <- 50 beta <- r_ou(n = 1, t = seq(0, 1, l = 201), sigma = 0.5, x0 = 3) X_fdata <- fdata_cen(r_ou(n = n, t = seq(0, 1, l = 201), sigma = 2)) epsilon <- rnorm(n, sd = 0.25) Y <- drop(inprod_fdata(X_fdata1 = X_fdata, X_fdata2 = beta)) + epsilon # FPCR fpcr_4 <- flm_est(X = X_fdata, Y = Y, est_method = "fpcr") fpcr_4$p_hat # Ridge FPCR l2_4 <- flm_est(X = X_fdata, Y = Y, est_method = "fpcr_l2") l2_4$p_hat # Lasso FPCR l1_4 <- flm_est(X = X_fdata, Y = Y, est_method = "fpcr_l1") l1_4$p_hat # Lasso-selection FPCR l1s_4 <- flm_est(X = X_fdata, Y = Y, est_method = "fpcr_l1s") l1s_4$p_hat ## Scalar predictor # Generate data set.seed(12345) n <- 50 X <- rnorm(n) epsilon <- r_ou(n = n, t = seq(0, 1, l = 201), sigma = 0.5) beta <- r_ou(n = 1, t = seq(0, 1, l = 201), sigma = 0.5, x0 = 3) beta$data <- matrix(beta$data, nrow = n, ncol = ncol(beta$data), byrow = TRUE) Y_fdata <- beta * X + epsilon # FPCR fpcr_4 <- flm_est(X = X, Y = Y_fdata, est_method = "fpcr") plot(beta, col = 2) lines(beta$argvals, drop(fpcr_4$Beta_hat))
Computation of the Projected Cramér–von Mises (PCvM) test
statistic and its associated matrix.
For a sample of functional covariates
, the test
statistic is computed from
,
the coefficients (scores) of the sample in a
-truncated basis
expansion, such as Functional Principal Components (FPC).
The PCvM statistic is defined as
where
is the
matrix of multivariate residuals, and
is a
matrix whose
-th element is
, for
depending on
. Its exact expression can be seen in
Escanciano (2006) and García-Portugués et al. (2021).
flm_stat(E, p, Adot_vec, constant = TRUE) Adot(X)
flm_stat(E, p, Adot_vec, constant = TRUE) Adot(X)
E |
the matrix of multivariate residuals, with dimension
|
p |
dimension of the covariates space. Must be a positive integer. |
Adot_vec |
output from |
constant |
whether to include the constant of the PCvM test
statistic, |
X |
a matrix of size |
Adot
assumes that X
does not have repeated rows or otherwise
NaN
s will be present in the result. If X
has repeated rows,
Adot
will throw a warning.
The implementation of the PCvM test statistic for scalar response is
addressed in García-Portugués et al. (2014), whereas García-Portugués et al.
(2021) presents its multivariate extension and shows that
induces a weighted quadratic norm (if
there are no repetitions in the sample). The PCvM statistic is rooted in
the proposal by Escanciano (2006).
Both flm_stat
and A_dot
are coded in C++.
flm_stat
: the value of the test statistic, a scalar.
A_dot
: a vector of length n * (n - 1) / 2 + 1
.
The first entry contains the common diagonal element of
. The remaining entries are the upper
triangular matrix (excluding the diagonal) of
, stacked by columns.
Eduardo García-Portugués.
García-Portugués, E., Álvarez-Liébana, J., Álvarez-Pérez, G. and Gonzalez-Manteiga, W. (2021). A goodness-of-fit test for the functional linear model with functional response. Scandinavian Journal of Statistics, 48(2):502–528. doi:10.1111/sjos.12486
Escanciano, J. C. (2006) A consistent diagnostic test for regression models using projections. Econometric Theory, 22(6):1030–-1051. doi:10.1017/S0266466606060506
García-Portugués, E., González-Manteiga, W. and Febrero-Bande, M. (2014). A goodness-of-fit test for the functional linear model with scalar response. Journal of Computational and Graphical Statistics, 23(3):761–778. doi:10.1080/10618600.2013.812519
## flm_stat # Generate data n <- 200 q <- 2 p <- 3 E <- matrix(rnorm(n * q), nrow = n, ncol = q) X_fdata <- r_ou(n = n, t = seq(0, 1, l = 101)) # Compute FPC X_fpc <- fpc(X_fdata) # Adot Adot_vec <- Adot(X = X_fpc[["scores"]]) # Check equality constant <- n^(-2) * 2 * pi^((p / 2) - 1) / gamma(p / 2) constant * .Fortran("pcvm_statistic", n = as.integer(n), Adot_vec = Adot_vec, residuals = E[, 2], statistic = 0)$statistic flm_stat(E = E[, 2, drop = FALSE], p = p, Adot_vec = Adot_vec, constant = FALSE) ## Adot # Generate data n <- 200 X_fdata <- r_ou(n = n, t = seq(0, 1, l = 101)) # Compute FPC X_fpc <- fpc(X_fdata) # Using inprod_fdata and Adot Adot_vec <- Adot(X = X_fpc[["scores"]]) # Check with fda.usc::Adot with adequate inprod head(drop(Adot_vec)) head(fda.usc::Adot(X_fdata)) # Obtention of the entire Adot matrix Ad <- diag(rep(Adot_vec[1], n)) Ad[upper.tri(Ad, diag = FALSE)] <- Adot_vec[-1] head(Ad <- t(Ad) + Ad - diag(diag(Ad))) # Positive definite eigen(Ad)$values # # Warning if X contains repeated observations # Adot(X = rbind(1:3, 1:3, 3:5)) # Comparison with .Fortran("adot", PACKAGE = "fda.usc") n <- as.integer(n) a <- as.double(rep(0, (n * (n - 1) / 2 + 1))) inprod <- X_fpc[["scores"]] %*% t(X_fpc[["scores"]]) inprod <- inprod[upper.tri(inprod, diag = TRUE)] X <- X_fpc[["scores"]] microbenchmark::microbenchmark( .Fortran("adot", n = n, inprod = inprod, Adot_vec = a, PACKAGE = "fda.usc"), Adot(X = X), times = 50, control = list(warmup = 10))
## flm_stat # Generate data n <- 200 q <- 2 p <- 3 E <- matrix(rnorm(n * q), nrow = n, ncol = q) X_fdata <- r_ou(n = n, t = seq(0, 1, l = 101)) # Compute FPC X_fpc <- fpc(X_fdata) # Adot Adot_vec <- Adot(X = X_fpc[["scores"]]) # Check equality constant <- n^(-2) * 2 * pi^((p / 2) - 1) / gamma(p / 2) constant * .Fortran("pcvm_statistic", n = as.integer(n), Adot_vec = Adot_vec, residuals = E[, 2], statistic = 0)$statistic flm_stat(E = E[, 2, drop = FALSE], p = p, Adot_vec = Adot_vec, constant = FALSE) ## Adot # Generate data n <- 200 X_fdata <- r_ou(n = n, t = seq(0, 1, l = 101)) # Compute FPC X_fpc <- fpc(X_fdata) # Using inprod_fdata and Adot Adot_vec <- Adot(X = X_fpc[["scores"]]) # Check with fda.usc::Adot with adequate inprod head(drop(Adot_vec)) head(fda.usc::Adot(X_fdata)) # Obtention of the entire Adot matrix Ad <- diag(rep(Adot_vec[1], n)) Ad[upper.tri(Ad, diag = FALSE)] <- Adot_vec[-1] head(Ad <- t(Ad) + Ad - diag(diag(Ad))) # Positive definite eigen(Ad)$values # # Warning if X contains repeated observations # Adot(X = rbind(1:3, 1:3, 3:5)) # Comparison with .Fortran("adot", PACKAGE = "fda.usc") n <- as.integer(n) a <- as.double(rep(0, (n * (n - 1) / 2 + 1))) inprod <- X_fpc[["scores"]] %*% t(X_fpc[["scores"]]) inprod <- inprod[upper.tri(inprod, diag = TRUE)] X <- X_fpc[["scores"]] microbenchmark::microbenchmark( .Fortran("adot", n = n, inprod = inprod, Adot_vec = a, PACKAGE = "fda.usc"), Adot(X = X), times = 50, control = list(warmup = 10))
Computation of the functional linear term
of a Functional Linear Model with Functional Response (FLMFR), where
is a random variable in the Hilbert space of
square-integrable functions in
,
,
is the bivariate kernel of the FLMFR, and
is a random variable in
.
flm_term(X_fdata, beta, t, int_rule = "trapezoid", equispaced = NULL, concurrent = FALSE)
flm_term(X_fdata, beta, t, int_rule = "trapezoid", equispaced = NULL, concurrent = FALSE)
X_fdata |
sample of functional data as an
|
beta |
matrix containing the values |
t |
grid points where responses are valued. |
int_rule |
quadrature rule for approximating the definite
unidimensional integral: trapezoidal rule ( |
equispaced |
flag to indicate if |
concurrent |
flag to consider a concurrent FLMFR (degenerate case).
Defaults to |
Functional linear model term as the integral (in s
) between
X_fdata
and beta
, as an fdata
object of
length n
.
Javier Álvarez-Liébana.
García-Portugués, E., Álvarez-Liébana, J., Álvarez-Pérez, G. and Gonzalez-Manteiga, W. (2021). A goodness-of-fit test for the functional linear model with functional response. Scandinavian Journal of Statistics, 48(2):502–528. doi:10.1111/sjos.12486
## Generate a sample of functional responses via FLMFR # Bivariate kernel beta(s,t) as an egg carton shape s <- seq(0, 1, l = 101) t <- seq(0, 1, l = 201) beta <- outer(s, t, FUN = function(s, t) sin(6 * pi * s) + cos(6 * pi * t)) # Functional data as zero-mean Gaussian process with exponential variogram X_fdata <- fda.usc::rproc2fdata(n = 50, t = s, sigma = "vexponential", list = list(scale = 2.5)) # Functional error as an OU process with variance 0.075 sigma <- sqrt(0.075) * 2 error_fdata <- r_ou(n = 50, t = t, sigma = sigma) Y_fdata <- flm_term(X_fdata = X_fdata, beta = beta, t = t) + error_fdata plot(Y_fdata) ## Generate a sample of functional responses via concurrent model # Function beta(t) s <- seq(1, 3, l = 201) t <- seq(2, 5, l = 201) beta <- sin(pi * t) + cos(pi * t) # Functional data as zero-mean Gaussian process with exponential variogram X_fdata <- fda.usc::rproc2fdata(n = 50, t = s, sigma = "vexponential", list = list(scale = 2.5)) # Functional error as an OU process with variance 0.075 sigma <- sqrt(0.075) * 2 error_fdata <- r_ou(n = 50, t = t, sigma = sigma) Y_fdata <- flm_term(X_fdata = X_fdata, beta = beta, t = t, concurrent = TRUE) + error_fdata plot(Y_fdata)
## Generate a sample of functional responses via FLMFR # Bivariate kernel beta(s,t) as an egg carton shape s <- seq(0, 1, l = 101) t <- seq(0, 1, l = 201) beta <- outer(s, t, FUN = function(s, t) sin(6 * pi * s) + cos(6 * pi * t)) # Functional data as zero-mean Gaussian process with exponential variogram X_fdata <- fda.usc::rproc2fdata(n = 50, t = s, sigma = "vexponential", list = list(scale = 2.5)) # Functional error as an OU process with variance 0.075 sigma <- sqrt(0.075) * 2 error_fdata <- r_ou(n = 50, t = t, sigma = sigma) Y_fdata <- flm_term(X_fdata = X_fdata, beta = beta, t = t) + error_fdata plot(Y_fdata) ## Generate a sample of functional responses via concurrent model # Function beta(t) s <- seq(1, 3, l = 201) t <- seq(2, 5, l = 201) beta <- sin(pi * t) + cos(pi * t) # Functional data as zero-mean Gaussian process with exponential variogram X_fdata <- fda.usc::rproc2fdata(n = 50, t = s, sigma = "vexponential", list = list(scale = 2.5)) # Functional error as an OU process with variance 0.075 sigma <- sqrt(0.075) * 2 error_fdata <- r_ou(n = 50, t = t, sigma = sigma) Y_fdata <- flm_term(X_fdata = X_fdata, beta = beta, t = t, concurrent = TRUE) + error_fdata plot(Y_fdata)
Goodness-of-fit test of a functional linear model with
functional response and functional predictor
, where
is the Hilbert space of
square-integrable functions in
.
The goodness-of-fit test checks the linearity of the regression model
that relates
and
by
where is a random variable in
and
. The check is formalized as the
test of the composite hypothesis
where
is the linear, Hilbert–Schmidt, integral operator parametrized by
the bivariate kernel . Its estimation is done by the
truncated expansion of
in the tensor product of the
data-driven bases of Functional Principal Components (FPC) of
and
. The FPC basis for
is truncated in
components, while the FPC basis for
is truncated in
components.
The particular cases in which either or
are
constant functions give either a scalar predictor or response.
The simple linear model arises if both
and
are scalar,
for which
is a constant.
flm_test(X, Y, beta0 = NULL, B = 500, est_method = "fpcr", p = NULL, q = NULL, thre_p = 0.99, thre_q = 0.99, lambda = NULL, boot_scores = TRUE, verbose = TRUE, plot_dens = TRUE, plot_proc = TRUE, plot_max_procs = 100, plot_max_p = 2, plot_max_q = 2, save_fit_flm = TRUE, save_boot_stats = TRUE, int_rule = "trapezoid", refit_lambda = FALSE, ...)
flm_test(X, Y, beta0 = NULL, B = 500, est_method = "fpcr", p = NULL, q = NULL, thre_p = 0.99, thre_q = 0.99, lambda = NULL, boot_scores = TRUE, verbose = TRUE, plot_dens = TRUE, plot_proc = TRUE, plot_max_procs = 100, plot_max_p = 2, plot_max_q = 2, save_fit_flm = TRUE, save_boot_stats = TRUE, int_rule = "trapezoid", refit_lambda = FALSE, ...)
X , Y
|
samples of functional/scalar predictors and functional/scalar
response. Either |
beta0 |
if provided (defaults to |
B |
number of bootstrap replicates. Defaults to |
est_method |
either |
p , q
|
either index vectors indicating the specific FPC to be
considered for the truncated bases expansions of |
thre_p , thre_q
|
thresholds for the proportion of variance
that is explained, at least, by the first |
lambda |
regularization parameter |
boot_scores |
flag to indicate if the bootstrap shall be applied to the
scores of the residuals, rather than to the functional residuals. This
improves the computational expediency notably. Defaults to |
verbose |
flag to show information about the testing progress. Defaults
to |
plot_dens |
flag to indicate if a kernel density estimation of the
bootstrap statistics shall be plotted. Defaults to |
plot_proc |
whether to display a graphical tool to identify the
degree of departure from the null hypothesis. If |
plot_max_procs |
maximum number of bootstrapped processes to plot in
the graphical tool. Set as the minimum of |
plot_max_p , plot_max_q
|
maximum number of FPC directions to be
considered in the graphical tool. They limit the resulting plot to be at
most of size |
save_fit_flm , save_boot_stats
|
flag to return |
int_rule |
quadrature rule for approximating the definite
unidimensional integral: trapezoidal rule ( |
refit_lambda |
flag to reselect |
... |
further parameters to be passed to |
The function implements the bootstrap-based goodness-of-fit test for the functional linear model with functional/scalar response and functional/scalar predictor, as described in Algorithm 1 in García-Portugués et al. (2021). The specifics are detailed there.
By default cv_1se = TRUE
for cv_glmnet
is
considered, unless it is changed via ...
. This is the recommended
choice for conducting the goodness-of-fit test based on regularized
estimators, as the oversmoothed estimate of the regression model under the
null hypothesis notably facilitates the calibration of the test (see
García-Portugués et al., 2021).
The graphical tool obtained with plot_proc = TRUE
is based on
an extension of the tool described in García-Portugués et al. (2014).
Repeated observations on X
are internally removed, as otherwise they
would cause NaN
s in Adot
. Missing values on X
and
Y
are also automatically removed.
An object of the htest
class with the following elements:
statistic |
test statistic. |
p.value |
|
boot_statistics |
the bootstrapped test statistics, a vector
of length |
method |
information on the type of test performed. |
parameter |
a vector with the dimensions |
p |
the index of the FPC considered for |
q |
the index of the FPC considered for |
fit_flm |
the output resulted from calling |
boot_lambda |
bootstrapped |
boot_p |
a list with the bootstrapped indexes of the FPC considered
for |
data.name |
name of the value of |
Eduardo García-Portugués.
García-Portugués, E., Álvarez-Liébana, J., Álvarez-Pérez, G. and Gonzalez-Manteiga, W. (2021). A goodness-of-fit test for the functional linear model with functional response. Scandinavian Journal of Statistics, 48(2):502–528. doi:10.1111/sjos.12486
García-Portugués, E., González-Manteiga, W. and Febrero-Bande, M. (2014). A goodness-of-fit test for the functional linear model with scalar response. Journal of Computational and Graphical Statistics, 23(3):761–778. doi:10.1080/10618600.2013.812519
## Quick example for functional response and predictor # Generate data under H0 n <- 100 set.seed(987654321) X_fdata <- r_ou(n = n, t = seq(0, 1, l = 101), sigma = 2) epsilon <- r_ou(n = n, t = seq(0, 1, l = 101), sigma = 0.5) Y_fdata <- epsilon # Test the FLMFR flm_test(X = X_fdata, Y = Y_fdata) # Simple hypothesis flm_test(X = X_fdata, Y = Y_fdata, beta0 = 0) # Generate data under H1 n <- 100 set.seed(987654321) sample_frm_fr <- r_frm_fr(n = n, scenario = 3, s = seq(0, 1, l = 101), t = seq(0, 1, l = 101), nonlinear = "quadratic") X_fdata <- sample_frm_fr[["X_fdata"]] Y_fdata <- sample_frm_fr[["Y_fdata"]] # Test the FLMFR flm_test(X = X_fdata, Y = Y_fdata) ## Functional response and predictor # Generate data under H0 n <- 50 B <- 100 set.seed(987654321) t <- seq(0, 1, l = 201) X_fdata <- r_ou(n = n, t = t, sigma = 2) epsilon <- r_ou(n = n, t = t, sigma = 0.5) Y_fdata <- epsilon # With boot_scores = TRUE flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr", B = B) flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l2", B = B) flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s", B = B) # With boot_scores = FALSE flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr", boot_scores = FALSE, B = B) flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l2", boot_scores = FALSE, B = B) flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1", boot_scores = FALSE, B = B) flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s", boot_scores = FALSE, B = B) # Simple hypothesis flm_test(X = X_fdata, Y = Y_fdata, beta0 = 2, est_method = "fpcr", B = B) flm_test(X = X_fdata, Y = Y_fdata, beta0 = 0, est_method = "fpcr", B = B) flm_test(X = X_fdata, Y = Y_fdata, beta0 = 0, est_method = "fpcr_l1s", B = B) # Generate data under H1 n <- 50 B <- 100 set.seed(987654321) sample_frm_fr <- r_frm_fr(n = n, scenario = 3, s = t, t = t, nonlinear = "quadratic") X_fdata <- sample_frm_fr$X_fdata Y_fdata <- sample_frm_fr$Y_fdata # With boot_scores = TRUE flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr", B = B) flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l2", B = B) flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s", B = B) # With boot_scores = FALSE flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr", boot_scores = FALSE, B = B) flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l2", boot_scores = FALSE, B = B) flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1", boot_scores = FALSE, B = B) flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s", boot_scores = FALSE, B = B) ## Scalar response and functional predictor # Generate data under H0 n <- 50 B <- 100 set.seed(987654321) t <- seq(0, 1, l = 201) X_fdata <- r_ou(n = n, t = t, sigma = 2) beta <- r_ou(n = 1, t = t, sigma = 0.5, x0 = 2) epsilon <- rnorm(n = n) Y <- drop(inprod_fdata(X_fdata1 = X_fdata, X_fdata2 = beta) + epsilon) # With boot_scores = TRUE flm_test(X = X_fdata, Y = Y, est_method = "fpcr", B = B) flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l2", B = B) flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l1s", B = B) # With boot_scores = FALSE flm_test(X = X_fdata, Y = Y, est_method = "fpcr", boot_scores = FALSE, B = B) flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l2", boot_scores = FALSE, B = B) flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l1", boot_scores = FALSE, B = B) flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l1s", boot_scores = FALSE, B = B) # Simple hypothesis flm_test(X = X_fdata, Y = Y, beta0 = beta, est_method = "fpcr", B = B) flm_test(X = X_fdata, Y = Y, beta0 = 0, est_method = "fpcr", B = B) flm_test(X = X_fdata, Y = Y, beta0 = 0, est_method = "fpcr_l1s", B = B) # Generate data under H1 n <- 50 B <- 100 set.seed(987654321) X_fdata <- r_ou(n = n, t = t, sigma = 2) beta <- r_ou(n = 1, t = t, sigma = 0.5) epsilon <- rnorm(n = n) Y <- drop(exp(inprod_fdata(X_fdata1 = X_fdata^2, X_fdata2 = beta)) + epsilon) # With boot_scores = TRUE flm_test(X = X_fdata, Y = Y, est_method = "fpcr", B = B) flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l2", B = B) flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l1s", B = B) # With boot_scores = FALSE flm_test(X = X_fdata, Y = Y, est_method = "fpcr", boot_scores = FALSE, B = B) flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l2", boot_scores = FALSE, B = B) flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l1", boot_scores = FALSE, B = B) flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l1s", boot_scores = FALSE, B = B) ## Functional response and scalar predictor # Generate data under H0 n <- 50 B <- 100 set.seed(987654321) X <- rnorm(n) t <- seq(0, 1, l = 201) beta <- r_ou(n = 1, t = t, sigma = 0.5, x0 = 3) beta$data <- matrix(beta$data, nrow = n, ncol = ncol(beta$data), byrow = TRUE) epsilon <- r_ou(n = n, t = t, sigma = 0.5) Y_fdata <- X * beta + epsilon # With boot_scores = TRUE flm_test(X = X, Y = Y_fdata, est_method = "fpcr", B = B) # With boot_scores = FALSE flm_test(X = X, Y = Y_fdata, est_method = "fpcr", boot_scores = FALSE, B = B) # Simple hypothesis flm_test(X = X, Y = Y_fdata, beta0 = beta[1], est_method = "fpcr", B = B) flm_test(X = X, Y = Y_fdata, beta0 = 0, est_method = "fpcr", B = B) # Generate data under H1 n <- 50 B <- 100 set.seed(987654321) X <- rexp(n) beta <- r_ou(n = 1, t = t, sigma = 0.5, x0 = 3) beta$data <- matrix(beta$data, nrow = n, ncol = ncol(beta$data), byrow = TRUE) epsilon <- r_ou(n = n, t = t, sigma = 0.5) Y_fdata <- log(X * beta) + epsilon # With boot_scores = TRUE flm_test(X = X, Y = Y_fdata, est_method = "fpcr", B = B) # With boot_scores = FALSE flm_test(X = X, Y = Y_fdata, est_method = "fpcr", boot_scores = FALSE, B = B)
## Quick example for functional response and predictor # Generate data under H0 n <- 100 set.seed(987654321) X_fdata <- r_ou(n = n, t = seq(0, 1, l = 101), sigma = 2) epsilon <- r_ou(n = n, t = seq(0, 1, l = 101), sigma = 0.5) Y_fdata <- epsilon # Test the FLMFR flm_test(X = X_fdata, Y = Y_fdata) # Simple hypothesis flm_test(X = X_fdata, Y = Y_fdata, beta0 = 0) # Generate data under H1 n <- 100 set.seed(987654321) sample_frm_fr <- r_frm_fr(n = n, scenario = 3, s = seq(0, 1, l = 101), t = seq(0, 1, l = 101), nonlinear = "quadratic") X_fdata <- sample_frm_fr[["X_fdata"]] Y_fdata <- sample_frm_fr[["Y_fdata"]] # Test the FLMFR flm_test(X = X_fdata, Y = Y_fdata) ## Functional response and predictor # Generate data under H0 n <- 50 B <- 100 set.seed(987654321) t <- seq(0, 1, l = 201) X_fdata <- r_ou(n = n, t = t, sigma = 2) epsilon <- r_ou(n = n, t = t, sigma = 0.5) Y_fdata <- epsilon # With boot_scores = TRUE flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr", B = B) flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l2", B = B) flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s", B = B) # With boot_scores = FALSE flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr", boot_scores = FALSE, B = B) flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l2", boot_scores = FALSE, B = B) flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1", boot_scores = FALSE, B = B) flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s", boot_scores = FALSE, B = B) # Simple hypothesis flm_test(X = X_fdata, Y = Y_fdata, beta0 = 2, est_method = "fpcr", B = B) flm_test(X = X_fdata, Y = Y_fdata, beta0 = 0, est_method = "fpcr", B = B) flm_test(X = X_fdata, Y = Y_fdata, beta0 = 0, est_method = "fpcr_l1s", B = B) # Generate data under H1 n <- 50 B <- 100 set.seed(987654321) sample_frm_fr <- r_frm_fr(n = n, scenario = 3, s = t, t = t, nonlinear = "quadratic") X_fdata <- sample_frm_fr$X_fdata Y_fdata <- sample_frm_fr$Y_fdata # With boot_scores = TRUE flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr", B = B) flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l2", B = B) flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s", B = B) # With boot_scores = FALSE flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr", boot_scores = FALSE, B = B) flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l2", boot_scores = FALSE, B = B) flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1", boot_scores = FALSE, B = B) flm_test(X = X_fdata, Y = Y_fdata, est_method = "fpcr_l1s", boot_scores = FALSE, B = B) ## Scalar response and functional predictor # Generate data under H0 n <- 50 B <- 100 set.seed(987654321) t <- seq(0, 1, l = 201) X_fdata <- r_ou(n = n, t = t, sigma = 2) beta <- r_ou(n = 1, t = t, sigma = 0.5, x0 = 2) epsilon <- rnorm(n = n) Y <- drop(inprod_fdata(X_fdata1 = X_fdata, X_fdata2 = beta) + epsilon) # With boot_scores = TRUE flm_test(X = X_fdata, Y = Y, est_method = "fpcr", B = B) flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l2", B = B) flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l1s", B = B) # With boot_scores = FALSE flm_test(X = X_fdata, Y = Y, est_method = "fpcr", boot_scores = FALSE, B = B) flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l2", boot_scores = FALSE, B = B) flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l1", boot_scores = FALSE, B = B) flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l1s", boot_scores = FALSE, B = B) # Simple hypothesis flm_test(X = X_fdata, Y = Y, beta0 = beta, est_method = "fpcr", B = B) flm_test(X = X_fdata, Y = Y, beta0 = 0, est_method = "fpcr", B = B) flm_test(X = X_fdata, Y = Y, beta0 = 0, est_method = "fpcr_l1s", B = B) # Generate data under H1 n <- 50 B <- 100 set.seed(987654321) X_fdata <- r_ou(n = n, t = t, sigma = 2) beta <- r_ou(n = 1, t = t, sigma = 0.5) epsilon <- rnorm(n = n) Y <- drop(exp(inprod_fdata(X_fdata1 = X_fdata^2, X_fdata2 = beta)) + epsilon) # With boot_scores = TRUE flm_test(X = X_fdata, Y = Y, est_method = "fpcr", B = B) flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l2", B = B) flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l1s", B = B) # With boot_scores = FALSE flm_test(X = X_fdata, Y = Y, est_method = "fpcr", boot_scores = FALSE, B = B) flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l2", boot_scores = FALSE, B = B) flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l1", boot_scores = FALSE, B = B) flm_test(X = X_fdata, Y = Y, est_method = "fpcr_l1s", boot_scores = FALSE, B = B) ## Functional response and scalar predictor # Generate data under H0 n <- 50 B <- 100 set.seed(987654321) X <- rnorm(n) t <- seq(0, 1, l = 201) beta <- r_ou(n = 1, t = t, sigma = 0.5, x0 = 3) beta$data <- matrix(beta$data, nrow = n, ncol = ncol(beta$data), byrow = TRUE) epsilon <- r_ou(n = n, t = t, sigma = 0.5) Y_fdata <- X * beta + epsilon # With boot_scores = TRUE flm_test(X = X, Y = Y_fdata, est_method = "fpcr", B = B) # With boot_scores = FALSE flm_test(X = X, Y = Y_fdata, est_method = "fpcr", boot_scores = FALSE, B = B) # Simple hypothesis flm_test(X = X, Y = Y_fdata, beta0 = beta[1], est_method = "fpcr", B = B) flm_test(X = X, Y = Y_fdata, beta0 = 0, est_method = "fpcr", B = B) # Generate data under H1 n <- 50 B <- 100 set.seed(987654321) X <- rexp(n) beta <- r_ou(n = 1, t = t, sigma = 0.5, x0 = 3) beta$data <- matrix(beta$data, nrow = n, ncol = ncol(beta$data), byrow = TRUE) epsilon <- r_ou(n = n, t = t, sigma = 0.5) Y_fdata <- log(X * beta) + epsilon # With boot_scores = TRUE flm_test(X = X, Y = Y_fdata, est_method = "fpcr", B = B) # With boot_scores = FALSE flm_test(X = X, Y = Y_fdata, est_method = "fpcr", boot_scores = FALSE, B = B)
Computation of Functional Principal Components (FPC) for equispaced and non equispaced functional data.
fpc(X_fdata, n_fpc = 3, centered = FALSE, int_rule = "trapezoid", equispaced = FALSE, verbose = FALSE)
fpc(X_fdata, n_fpc = 3, centered = FALSE, int_rule = "trapezoid", equispaced = FALSE, verbose = FALSE)
X_fdata |
sample of functional data as an
|
n_fpc |
number of FPC to be computed. If |
centered |
flag to indicate if |
int_rule |
quadrature rule for approximating the definite
unidimensional integral: trapezoidal rule ( |
equispaced |
flag to indicate if |
verbose |
whether to show or not information about the |
The FPC are obtained by performing the single value decomposition
where is the matrix of discretized functional data,
is a diagonal matrix of weights (computed by
w_integral1D
according to int_rule
),
is the diagonal matrix with singular values (standard deviations of FPC),
is a matrix whose columns contain the left singular
vectors, and
is a matrix whose columns contain the
right singular vectors (FPC).
An "fpc"
object containing the following elements:
d |
standard deviations of the FPC (i.e., square roots of eigenvalues of the empirical autocovariance estimator). |
rotation |
orthonormal eigenfunctions (loadings or functional
principal components), as an |
scores |
rotated samples: inner products.
between |
l |
vector of index of FPC. |
equispaced |
|
Javier Álvarez-Liébana and Gonzalo Álvarez-Pérez.
Jolliffe, I. T. (2002). Principal Component Analysis. Springer-Verlag, New York.
## Computing FPC for equispaced data # Sample data X_fdata1 <- r_ou(n = 200, t = seq(2, 4, l = 201)) # FPC with trapezoid rule X_fpc1 <- fpc(X_fdata = X_fdata1, n_fpc = 50, equispaced = TRUE, int_rule = "trapezoid") # FPC with Simpsons's rule X_fpc2 <- fpc(X_fdata = X_fdata1, n_fpc = 50, equispaced = TRUE, int_rule = "Simpson") # Check if FPC are orthonormal norms1 <- rep(0, length(X_fpc1$l)) for (i in X_fpc1$l) { norms1[i] <- integral1D(fx = X_fpc1$rotation$data[i, ]^2, t = X_fdata1$argvals) } norms2 <- rep(0, length(X_fpc2$l)) for (i in X_fpc2$l) { norms2[i] <- integral1D(fx = X_fpc2$rotation$data[i, ]^2, t = X_fdata1$argvals) } ## Computing FPC for non equispaced data # Sample data X_fdata2 <- r_ou(n = 200, t = c(seq(0, 0.5, l = 201), seq(0.51, 1, l = 301))) # FPC with trapezoid rule X_fpc3 <- fpc(X_fdata = X_fdata2, n_fpc = 5, int_rule = "trapezoid", equispaced = FALSE) # Check if FPC are orthonormal norms3 <- rep(0, length(X_fpc3$l)) for (i in X_fpc3$l) { norms3[i] <- integral1D(fx = X_fpc3$rotation$data[i, ]^2, t = X_fdata2$argvals) } ## Efficiency comparisons # fpc() vs. fda.usc::fdata2pc() data(phoneme, package = "fda.usc") mlearn <- phoneme$learn[1:10, ] res1 <- fda.usc::fdata2pc(mlearn, ncomp = 3) res2 <- fpc(X_fdata = mlearn, n_fpc = 3) plot(res1$x[, 1:3], col = 1) points(res2$scores, col = 2) microbenchmark::microbenchmark(fda.usc::fdata2pc(mlearn, ncomp = 3), fpc(X_fdata = mlearn, n_fpc = 3), times = 1e3, control = list(warmup = 20))
## Computing FPC for equispaced data # Sample data X_fdata1 <- r_ou(n = 200, t = seq(2, 4, l = 201)) # FPC with trapezoid rule X_fpc1 <- fpc(X_fdata = X_fdata1, n_fpc = 50, equispaced = TRUE, int_rule = "trapezoid") # FPC with Simpsons's rule X_fpc2 <- fpc(X_fdata = X_fdata1, n_fpc = 50, equispaced = TRUE, int_rule = "Simpson") # Check if FPC are orthonormal norms1 <- rep(0, length(X_fpc1$l)) for (i in X_fpc1$l) { norms1[i] <- integral1D(fx = X_fpc1$rotation$data[i, ]^2, t = X_fdata1$argvals) } norms2 <- rep(0, length(X_fpc2$l)) for (i in X_fpc2$l) { norms2[i] <- integral1D(fx = X_fpc2$rotation$data[i, ]^2, t = X_fdata1$argvals) } ## Computing FPC for non equispaced data # Sample data X_fdata2 <- r_ou(n = 200, t = c(seq(0, 0.5, l = 201), seq(0.51, 1, l = 301))) # FPC with trapezoid rule X_fpc3 <- fpc(X_fdata = X_fdata2, n_fpc = 5, int_rule = "trapezoid", equispaced = FALSE) # Check if FPC are orthonormal norms3 <- rep(0, length(X_fpc3$l)) for (i in X_fpc3$l) { norms3[i] <- integral1D(fx = X_fpc3$rotation$data[i, ]^2, t = X_fdata2$argvals) } ## Efficiency comparisons # fpc() vs. fda.usc::fdata2pc() data(phoneme, package = "fda.usc") mlearn <- phoneme$learn[1:10, ] res1 <- fda.usc::fdata2pc(mlearn, ncomp = 3) res2 <- fpc(X_fdata = mlearn, n_fpc = 3) plot(res1$x[, 1:3], col = 1) points(res2$scores, col = 2) microbenchmark::microbenchmark(fda.usc::fdata2pc(mlearn, ncomp = 3), fpc(X_fdata = mlearn, n_fpc = 3), times = 1e3, control = list(warmup = 20))
Computation of coefficients and reconstructions based on
Functional Principal Components (FPC). The function fpc_coefs
allows
to project a functional data sample into a basis of FPC; the reconstruction
of the sample from its projections and the FPC is done with
fpc_to_fdata
. The functions beta_fpc_coefs
and
fpc_to_beta
do analogous operations but for the
bivariate kernel and the tensor product
of two FPC bases.
fpc_coefs(X_fdata, X_fpc, ind_X_fpc = 1:3, int_rule = "trapezoid") beta_fpc_coefs(beta, X_fpc, Y_fpc, ind_X_fpc = 1:3, ind_Y_fpc = 1:3, int_rule = "trapezoid") fpc_to_fdata(coefs, X_fpc, ind_coefs = seq_len(ncol(coefs))) fpc_to_beta(beta_coefs, X_fpc, Y_fpc, ind_coefs_X = seq_len(nrow(beta_coefs)), ind_coefs_Y = seq_len(ncol(beta_coefs)))
fpc_coefs(X_fdata, X_fpc, ind_X_fpc = 1:3, int_rule = "trapezoid") beta_fpc_coefs(beta, X_fpc, Y_fpc, ind_X_fpc = 1:3, ind_Y_fpc = 1:3, int_rule = "trapezoid") fpc_to_fdata(coefs, X_fpc, ind_coefs = seq_len(ncol(coefs))) fpc_to_beta(beta_coefs, X_fpc, Y_fpc, ind_coefs_X = seq_len(nrow(beta_coefs)), ind_coefs_Y = seq_len(ncol(beta_coefs)))
X_fdata |
sample of functional data as an
|
X_fpc , Y_fpc
|
|
ind_X_fpc , ind_Y_fpc
|
vectors giving the FPC indexes for whom the
coefficients are computed. Their lengths must be smaller than the number of
FPC in |
int_rule |
quadrature rule for approximating the definite
unidimensional integral: trapezoidal rule ( |
beta |
a matrix containing the bivariate kernel |
coefs |
a vector of coefficients to combine linearly the FPC. Its
length must be smaller than the number of FPC in |
ind_coefs , ind_coefs_X , ind_coefs_Y
|
indexes of FPC to associate to the
provided coefficients. By default, from the first FPC to the sizes of
|
beta_coefs |
a matrix of coefficients to combine linearly the tensor
products of FPC. Its size must be smaller than the number of FPC in
|
fpc_coefs |
a vector of the same length as |
beta_fpc_coefs |
a matrix of the same size as |
fpc_to_fdata |
an |
fpc_to_beta |
a matrix with the reconstructed kernel and size |
Eduardo García-Portugués.
Jolliffe, I. T. (2002). Principal Component Analysis. Springer-Verlag, New York.
## Compute FPC coefficients and reconstruct data # Sample data X_fdata <- r_ou(n = 200, t = seq(2, 4, l = 201)) # Compute FPC X_fpc <- fpc(X_fdata = X_fdata, n_fpc = 50) # FPC coefficients are the same if the data is centered fpc_coefs(X_fdata = fdata_cen(X_fdata), X_fpc = X_fpc)[1:4, ] X_fpc$scores[1:4, 1:3] # Reconstruct the first two curves for an increasing number of FPC plot(X_fdata[1:2, ], col = 1) n_fpc <- c(2, 5, 10, 25, 50) for (j in 1:5) { lines(fpc_to_fdata(X_fpc = X_fpc, coefs = X_fpc$scores[, 1:n_fpc[j]])[1:2, ], col = j + 1) } ## Project and reconstruct beta # Surface beta_fun <- function(s, t) sin(6 * pi * s) + cos(6 * pi * t) s <- seq(0, 1, l = 101) t <- seq(0, 1, l = 201) beta_surf <- outer(s, t, FUN = beta_fun) # Functional data as zero-mean Gaussian process with exponential variogram X_fdata <- fda.usc::rproc2fdata(n = 100, t = s, sigma = "vexponential", list = list(scale = 2.5)) Y_fdata <- flm_term(X_fdata = X_fdata, beta = beta_surf, t = t) + r_ou(n = 100, t = t, sigma = sqrt(0.075) * 2) # FPC X_fpc <- fpc(X_fdata = X_fdata, n_fpc = 50) Y_fpc <- fpc(X_fdata = Y_fdata, n_fpc = 50) # Coefficients beta_coefs <- beta_fpc_coefs(beta = beta_surf, X_fpc = X_fpc, Y_fpc = Y_fpc, ind_X_fpc = 1:50, ind_Y_fpc = 1:50) # Reconstruction beta_surf1 <- fpc_to_beta(beta_coefs = beta_coefs[1:2, 1:5], X_fpc = X_fpc, Y_fpc = Y_fpc) beta_surf2 <- fpc_to_beta(beta_coefs = beta_coefs[1:15, 1:10], X_fpc = X_fpc, Y_fpc = Y_fpc) beta_surf3 <- fpc_to_beta(beta_coefs = beta_coefs[1:50, 1:50], X_fpc = X_fpc, Y_fpc = Y_fpc) # Show reconstructions old_par <- par(mfrow = c(2, 2)) col <- viridisLite::viridis(20) image(s, t, beta_surf, col = col, zlim = c(-2.5, 2.5), main = "Original") image(s, t, beta_surf1, col = col, zlim = c(-2.5, 2.5), main = "2 x 5") image(s, t, beta_surf2, col = col, zlim = c(-2.5, 2.5), main = "15 x 10") image(s, t, beta_surf3, col = col, zlim = c(-2.5, 2.5), main = "50 x 50") par(old_par)
## Compute FPC coefficients and reconstruct data # Sample data X_fdata <- r_ou(n = 200, t = seq(2, 4, l = 201)) # Compute FPC X_fpc <- fpc(X_fdata = X_fdata, n_fpc = 50) # FPC coefficients are the same if the data is centered fpc_coefs(X_fdata = fdata_cen(X_fdata), X_fpc = X_fpc)[1:4, ] X_fpc$scores[1:4, 1:3] # Reconstruct the first two curves for an increasing number of FPC plot(X_fdata[1:2, ], col = 1) n_fpc <- c(2, 5, 10, 25, 50) for (j in 1:5) { lines(fpc_to_fdata(X_fpc = X_fpc, coefs = X_fpc$scores[, 1:n_fpc[j]])[1:2, ], col = j + 1) } ## Project and reconstruct beta # Surface beta_fun <- function(s, t) sin(6 * pi * s) + cos(6 * pi * t) s <- seq(0, 1, l = 101) t <- seq(0, 1, l = 201) beta_surf <- outer(s, t, FUN = beta_fun) # Functional data as zero-mean Gaussian process with exponential variogram X_fdata <- fda.usc::rproc2fdata(n = 100, t = s, sigma = "vexponential", list = list(scale = 2.5)) Y_fdata <- flm_term(X_fdata = X_fdata, beta = beta_surf, t = t) + r_ou(n = 100, t = t, sigma = sqrt(0.075) * 2) # FPC X_fpc <- fpc(X_fdata = X_fdata, n_fpc = 50) Y_fpc <- fpc(X_fdata = Y_fdata, n_fpc = 50) # Coefficients beta_coefs <- beta_fpc_coefs(beta = beta_surf, X_fpc = X_fpc, Y_fpc = Y_fpc, ind_X_fpc = 1:50, ind_Y_fpc = 1:50) # Reconstruction beta_surf1 <- fpc_to_beta(beta_coefs = beta_coefs[1:2, 1:5], X_fpc = X_fpc, Y_fpc = Y_fpc) beta_surf2 <- fpc_to_beta(beta_coefs = beta_coefs[1:15, 1:10], X_fpc = X_fpc, Y_fpc = Y_fpc) beta_surf3 <- fpc_to_beta(beta_coefs = beta_coefs[1:50, 1:50], X_fpc = X_fpc, Y_fpc = Y_fpc) # Show reconstructions old_par <- par(mfrow = c(2, 2)) col <- viridisLite::viridis(20) image(s, t, beta_surf, col = col, zlim = c(-2.5, 2.5), main = "Original") image(s, t, beta_surf1, col = col, zlim = c(-2.5, 2.5), main = "2 x 5") image(s, t, beta_surf2, col = col, zlim = c(-2.5, 2.5), main = "15 x 10") image(s, t, beta_surf3, col = col, zlim = c(-2.5, 2.5), main = "50 x 50") par(old_par)
Real dataset employed Benatia et al. (2017). Contains the hourly electricity consumption and air temperature curves in the province of Ontario (Canada). It features a set of daily curves during the summer months of 2010–2014.
ontario
ontario
A list with the following entries:
an fdata
with 368 smoothed
daily temperature (in Celsius degrees) curves of the Ontario province,
discretized on 73 equispaced grid points on
(see examples).
an fdata
with the daily
electricity consumption (in gigawatts) curves of the Ontario province.
Discretized on 25 equispaced grid points on .
a dataframe with time metadata for each curve:
date
: the date of the observation, a POSIXct
object.
weekday
: the weekday of the observation.
The summer months correspond to June 1st to September 15th. Weekend days and holidays are disregarded.
The smoothed temperature curves are constructed by a weighted average of the temperatures of 41 Ontarian cities that is afterwards smoothed with a local polynomial regression. The curves correspond to a 3-days window of the temperature (see examples). The temperature is standardized such that its original minimum, 6 ºC, is subtracted.
The electricity consumption curves are discretized on the interval
. That means that the last observation of the
-th curve is the same as the first observation of the
-th curve if the curves correspond to consecutive days.
See more details about the construction of the dataset in Benatia et al. (2017).
Data gathered and processed by David Benatia, Marine Carrasco, and Jean-Pierre Florens. Javier Álvarez-Liébana and Eduardo García-Portugués imported the dataset and added temporal metadata.
The dataset comes from the companion data to Benatia et al. (2017), which was retrieved from the first author's website. The source of the electricity consumption data is the System operator's website. The source of the preprocessed temperature values is the Environment Canada's website.
Benatia, D., Carrasco, M. and Florens, J. P. (2017) Functional linear regression with functional response. Journal of Econometrics, 201(2):269–291. doi:10.1016/j.jeconom.2017.08.008
## Show data # Load data data("ontario") # Plot old_par <- par(mfrow = c(1, 2)) plot(ontario$temp) plot(ontario$elec) par(old_par) # Observe the 3-day windows for each observation plot(ontario$temp$argvals, ontario$temp$data[2, ], type = "o", xlim = c(-48, 72), ylim = c(7, 13), xlab = "Hours", ylab = "Electricity consumption", pch = 16) points(ontario$temp$argvals - 24, ontario$temp$data[1, ], col = 3, pch = 2) points(ontario$temp$argvals + 24, ontario$temp$data[3, ], col = 2, cex = 1.5) abline(v = 24 * -2:3, lty = 2) legend("top", legend = c("Curve 1", "Curve 2", "Curve 3"), col = c(3, 1, 2), pt.cex = c(1, 1, 1.5), pch = c(2, 16, 1)) # If the days are not consecutive, then the electricity consumptions at the # end of one day and the beginning of the next do not match head(abs(ontario$elec$data[-368, 25] - ontario$elec$data[-1, 1])) head(diff(ontario$df$date)) ## Test the linear model with functional response and predictor (comp_flmfr <- flm_test(X = ontario$temp, Y = ontario$elec, est_method = "fpcr_l1s")) (simp_flmfr <- flm_test(X = ontario$temp, Y = ontario$elec, beta0 = 0, est_method = "fpcr_l1s")) # Visualize estimation filled.contour(x = ontario$temp$argvals, y = ontario$elec$argvals, z = comp_flmfr$fit_flm$Beta_hat, color.palette = viridisLite::viridis, nlevels = 20)
## Show data # Load data data("ontario") # Plot old_par <- par(mfrow = c(1, 2)) plot(ontario$temp) plot(ontario$elec) par(old_par) # Observe the 3-day windows for each observation plot(ontario$temp$argvals, ontario$temp$data[2, ], type = "o", xlim = c(-48, 72), ylim = c(7, 13), xlab = "Hours", ylab = "Electricity consumption", pch = 16) points(ontario$temp$argvals - 24, ontario$temp$data[1, ], col = 3, pch = 2) points(ontario$temp$argvals + 24, ontario$temp$data[3, ], col = 2, cex = 1.5) abline(v = 24 * -2:3, lty = 2) legend("top", legend = c("Curve 1", "Curve 2", "Curve 3"), col = c(3, 1, 2), pt.cex = c(1, 1, 1.5), pch = c(2, 16, 1)) # If the days are not consecutive, then the electricity consumptions at the # end of one day and the beginning of the next do not match head(abs(ontario$elec$data[-368, 25] - ontario$elec$data[-1, 1])) head(diff(ontario$df$date)) ## Test the linear model with functional response and predictor (comp_flmfr <- flm_test(X = ontario$temp, Y = ontario$elec, est_method = "fpcr_l1s")) (simp_flmfr <- flm_test(X = ontario$temp, Y = ontario$elec, beta0 = 0, est_method = "fpcr_l1s")) # Visualize estimation filled.contour(x = ontario$temp$argvals, y = ontario$elec$argvals, z = comp_flmfr$fit_flm$Beta_hat, color.palette = viridisLite::viridis, nlevels = 20)
Simulation of trajectories of the Ornstein–Uhlenbeck process
. The process is the solution to the stochastic
differential equation
whose stationary distribution is , for
and
.
Given an initial point and the evaluation times
, a sample trajectory
can be obtained by sampling the joint Gaussian distribution of
.
r_ou(n, t = seq(0, 1, len = 201), mu = 0, alpha = 1, sigma = 1, x0 = rnorm(n, mean = mu, sd = sigma/sqrt(2 * alpha)))
r_ou(n, t = seq(0, 1, len = 201), mu = 0, alpha = 1, sigma = 1, x0 = rnorm(n, mean = mu, sd = sigma/sqrt(2 * alpha)))
n |
number of trajectories to sample. |
t |
evaluation times for the trajectories, a vector. |
mu |
mean of the process, a scalar. |
alpha |
strength of the drift, a positive scalar. |
sigma |
diffusion coefficient, a positive scalar. |
x0 |
a vector of length |
Random trajectories, an fdata
object of
length n
and t
as argvals
.
Eduardo García-Portugués.
# Same initial point plot(r_ou(n = 20, x0 = 5), col = viridisLite::viridis(20)) # Different initial points plot(r_ou(n = 100, alpha = 2, sigma = 4, x0 = 1:100), col = viridisLite::viridis(100))
# Same initial point plot(r_ou(n = 20, x0 = 5), col = viridisLite::viridis(20)) # Different initial points plot(r_ou(n = 100, alpha = 2, sigma = 4, x0 = 1:100), col = viridisLite::viridis(100))
Simulation of a Functional Regression Model with Functional Response (FRMFR) comprised of an additive mix of a linear and nonlinear terms:
where is a random variable in the Hilbert space of
square-integrable functions in
,
,
is the bivariate kernel of the FRMFR,
is a random variable in
,
and
is a nonlinear term.
In particular, the scenarios considered in García-Portugués et al. (2021) can be easily simulated.
r_frm_fr(n, scenario = 3, X_fdata = NULL, error_fdata = NULL, beta = NULL, s = seq(0, 1, l = 101), t = seq(0, 1, l = 101), std_error = 0.15, nonlinear = NULL, concurrent = FALSE, int_rule = "trapezoid", n_fpc = 50, verbose = FALSE, ...) nl_dev(X_fdata, t = seq(0, 1, l = 101), nonlinear = NULL, int_rule = "trapezoid", equispaced = equispaced, verbose = FALSE)
r_frm_fr(n, scenario = 3, X_fdata = NULL, error_fdata = NULL, beta = NULL, s = seq(0, 1, l = 101), t = seq(0, 1, l = 101), std_error = 0.15, nonlinear = NULL, concurrent = FALSE, int_rule = "trapezoid", n_fpc = 50, verbose = FALSE, ...) nl_dev(X_fdata, t = seq(0, 1, l = 101), nonlinear = NULL, int_rule = "trapezoid", equispaced = equispaced, verbose = FALSE)
n |
sample size, only required when |
scenario |
an index from |
X_fdata |
sample of functional covariates |
error_fdata |
sample of functional errors |
beta |
matrix containing the values |
s , t
|
grid points. If |
std_error |
standard deviation of the random variables
involved in the generation of the functional error |
nonlinear |
nonlinear term. Either a character string ( |
concurrent |
flag to consider a concurrent FLRFR (degenerate case).
Defaults to |
int_rule |
quadrature rule for approximating the definite
unidimensional integral: trapezoidal rule ( |
n_fpc |
number of components to be considered for the generation of
functional variables. Defaults to |
verbose |
flag to display information about the sampling procedure.
Defaults to |
... |
further parameters passed to
|
equispaced |
flag to indicate if |
r_frm_fr
samples the above regression model,
where the nonlinear term is computed by
nl_dev
.
Functional covariates, errors, and are generated
automatically from the scenarios in García-Portugués et al. (2021) when
scenario != NULL
(see the documentation of
r_gof2021_flmfr
). If scenario = NULL
,
covariates, errors and must be provided.
When concurrent = TRUE
, the concurrent FRMFR
is considered.
nl_dev
computes a nonlinear deviation
:
(for
"exp"
),
(
"quadratic"
) or
(
"sin"
). Also, can be manually set as an
fdata
object of length n
and valued in
the same grid as error_fdata
.
A list with the following elements:
X_fdata |
functional covariates, an
|
Y_fdata |
functional responses, an
|
error_fdata |
functional errors, an
|
beta |
either the matrix with |
nl_dev |
nonlinear term, an |
Javier Álvarez-Liébana.
García-Portugués, E., Álvarez-Liébana, J., Álvarez-Pérez, G. and Gonzalez-Manteiga, W. (2021). A goodness-of-fit test for the functional linear model with functional response. Scandinavian Journal of Statistics, 48(2):502–528. doi:10.1111/sjos.12486
## Generate samples for the three scenarios # Equispaced grids and Simpson's rule s <- seq(0, 1, l = 101) samp <- list() old_par <- par(mfrow = c(3, 5)) for (i in 1:3) { samp[[i]] <- r_frm_fr(n = 100, scenario = i, s = s, t = s, int_rule = "Simpson") plot(samp[[i]]$X_fdata) plot(samp[[i]]$error_fdata) plot(samp[[i]]$Y_fdata) plot(samp[[i]]$nl_dev) image(x = s, y = s, z = samp[[i]]$beta, col = viridisLite::viridis(20)) } par(old_par) ## Linear term as a concurrent model # The grids must be have the same number of grid points for a given # nonlinear term and a given beta function s <- seq(1, 2, l = 101) t <- seq(0, 1, l = 101) samp_c_1 <- r_frm_fr(n = 100, scenario = 3, beta = sin(t) - exp(t), s = s, t = t, nonlinear = fda.usc::fdata(mdata = t(matrix(rep(sin(t), 100), nrow = length(t))), argvals = t), concurrent = TRUE) old_par <- par(mfrow = c(3, 2)) plot(samp_c_1$X_fdata) plot(samp_c_1$error_fdata) plot(samp_c_1$Y_fdata) plot(samp_c_1$nl_dev) plot(samp_c_1$beta) par(old_par) ## Sample for given X_fdata, error_fdata, and beta # Non equispaced grids with sinusoidal nonlinear term and intensity 0.5 s <- c(seq(0, 0.5, l = 50), seq(0.51, 1, l = 101)) t <- seq(2, 4, len = 151) X_fdata <- r_ou(n = 100, t = s, alpha = 2, sigma = 4, x0 = 1:100) error_fdata <- r_ou(n = 100, t = t, alpha = 1, sigma = 1, x0 = 1:100) beta <- r_gof2021_flmfr(n = 100, s = s, t = t)$beta samp_Xeps <- r_frm_fr(scenario = NULL, X_fdata = X_fdata, error_fdata = error_fdata, beta = beta, nonlinear = "exp", int_rule = "trapezoid") old_par <- par(mfrow = c(3, 2)) plot(samp_Xeps$X_fdata) plot(samp_Xeps$error_fdata) plot(samp_Xeps$Y_fdata) plot(samp_Xeps$nl_dev) image(x = s, y = t, z = beta, col = viridisLite::viridis(20)) par(old_par)
## Generate samples for the three scenarios # Equispaced grids and Simpson's rule s <- seq(0, 1, l = 101) samp <- list() old_par <- par(mfrow = c(3, 5)) for (i in 1:3) { samp[[i]] <- r_frm_fr(n = 100, scenario = i, s = s, t = s, int_rule = "Simpson") plot(samp[[i]]$X_fdata) plot(samp[[i]]$error_fdata) plot(samp[[i]]$Y_fdata) plot(samp[[i]]$nl_dev) image(x = s, y = s, z = samp[[i]]$beta, col = viridisLite::viridis(20)) } par(old_par) ## Linear term as a concurrent model # The grids must be have the same number of grid points for a given # nonlinear term and a given beta function s <- seq(1, 2, l = 101) t <- seq(0, 1, l = 101) samp_c_1 <- r_frm_fr(n = 100, scenario = 3, beta = sin(t) - exp(t), s = s, t = t, nonlinear = fda.usc::fdata(mdata = t(matrix(rep(sin(t), 100), nrow = length(t))), argvals = t), concurrent = TRUE) old_par <- par(mfrow = c(3, 2)) plot(samp_c_1$X_fdata) plot(samp_c_1$error_fdata) plot(samp_c_1$Y_fdata) plot(samp_c_1$nl_dev) plot(samp_c_1$beta) par(old_par) ## Sample for given X_fdata, error_fdata, and beta # Non equispaced grids with sinusoidal nonlinear term and intensity 0.5 s <- c(seq(0, 0.5, l = 50), seq(0.51, 1, l = 101)) t <- seq(2, 4, len = 151) X_fdata <- r_ou(n = 100, t = s, alpha = 2, sigma = 4, x0 = 1:100) error_fdata <- r_ou(n = 100, t = t, alpha = 1, sigma = 1, x0 = 1:100) beta <- r_gof2021_flmfr(n = 100, s = s, t = t)$beta samp_Xeps <- r_frm_fr(scenario = NULL, X_fdata = X_fdata, error_fdata = error_fdata, beta = beta, nonlinear = "exp", int_rule = "trapezoid") old_par <- par(mfrow = c(3, 2)) plot(samp_Xeps$X_fdata) plot(samp_Xeps$error_fdata) plot(samp_Xeps$Y_fdata) plot(samp_Xeps$nl_dev) image(x = s, y = t, z = beta, col = viridisLite::viridis(20)) par(old_par)