## Based on the code published on qrmtutorial.org
## Original code written by Alexander McNeil and Marius Hofert

#################################################################################
########## EXERCISE 1: EMPIRICAL PROPERTIES OF FINANCIAL LOG-RETURNS ############
#################################################################################

## The goal of this exercise is to empirically verify the following observations
## about log-returns. We focus on univariate time series, and refer to the
## Quantitative Risk Management book by McNeil et al., as well as the summary paper
## "Empirical Properties of Asset Returns: Stylized Facts and Statistical Issues"
## by Rama Cont, for further details.
## The stylized facts described below typically apply to time series of daily
## log-returns and often continue to hold for weekly and monthly data as well.
## Very high-frequency financial time series have their own stylized facts that we
## do not cover here.

## Univariate stylized facts:
## (U1) Return series are not iid, although they show little serial correlation;
## (U2) Series of absolute (or squared) returns show profound serial correlation;
## (U3) Conditional expected returns are close to zero;
## (U4) Volatility (conditional standard deviation) appears to vary over time;
## (U5) Extreme returns appear in clusters;
## (U6) Return series are leptokurtic or heavy-tailed (power-like tail). This holds
##      for short time horizons.
## (U6 bis) Aggregational Gaussianity: over longer time horizons, the distribution
##      of returns looks more Gaussian.
## (U7) Leverage effect: market information has an asymmetric effect on volatility.
##      Bad news, leading to a fall in the equity value of a company, tends to
##      increase volatility; good news, leading to an increase in the equity value,
##      has a much smaller effect on volatility.

### Setup ######################################################################

# You will need the following packages.
# If a package is not already installed, run install.packages("package_name")
# before executing the following lines.
library(xts)      # uniform handling of R's different time-based data classes by extending zoo
library(qrmtools) # functions and data sets for reproducing selected results from the book
                  # "Quantitative Risk Management: Concepts, Techniques and Tools"
library(quantmod)

### 1 Getting and preprocessing the data #######################################

# In this exercise you will work with stock market data of Goldman Sachs (GS),
# Amazon (AMZN) and Microsoft (MSFT).
# Make sure you understand how the data is downloaded and preprocessed.
# If you don't know an R function, use the ?function_name command to open its help file.
tickers <- c("GS", "AMZN", "MSFT")
getSymbols(tickers, from = "2000-01-01", to = "2020-03-15")

# For this exercise we only use the close prices.
GS   <- GS[, "GS.Close"]
AMZN <- AMZN[, "AMZN.Close"]
MSFT <- MSFT[, "MSFT.Close"]

## Build log-returns
GS.X   <- returns(GS)
AMZN.X <- returns(AMZN)
MSFT.X <- returns(MSFT)

## Remove zero returns (caused almost certainly by market closure)
GS.X   <- GS.X[GS.X != 0]
AMZN.X <- AMZN.X[AMZN.X != 0]
MSFT.X <- MSFT.X[MSFT.X != 0]

## Merge and aggregate
X <- merge(GS = GS.X, AMZN = AMZN.X, MSFT = MSFT.X, all = FALSE)
X.w <- apply.weekly(X, FUN = colSums)  # weekly log-returns (summing within weeks)
X.m <- apply.monthly(X, FUN = colSums) # monthly log-returns

### QUESTION 1 #################################################################

## The goal of this question is to verify (U1) and (U2).
## Plot the auto- (and cross-)correlations of the returns and of the absolute returns.
## What do you observe?
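## Hint (illustrative, synthetic data only -- not the GS/AMZN/MSFT returns): the
## eyeball ACF check can be backed by a formal Ljung-Box test via Box.test().
## The sketch below builds white noise modulated by a slowly varying "volatility":
## such a series is uncorrelated but not iid, mimicking (U1)/(U2).

```r
## Synthetic series: uncorrelated, but with serially dependent absolute values
set.seed(271)
nn  <- 2000
vol <- 1 + 0.8 * sin(2 * pi * (1:nn) / 250) # slowly varying "volatility"
x   <- vol * rnorm(nn)                      # uncorrelated, but not independent
Box.test(x,      lag = 10, type = "Ljung-Box") # raw series: no autocorrelation built in (U1)
Box.test(abs(x), lag = 10, type = "Ljung-Box") # absolute series: tiny p-value => serial dependence (U2)
```

## The same two tests can be applied to each column of X below as a formal
## complement to acf().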
### SOLUTION 1 #################################################################

# The correlograms of the raw and absolute data can be obtained using the following commands
acf(X)
acf(abs(X))
acf(X.w)
acf(abs(X.w))
# We observe little evidence of serial correlation in the raw data, but the
# absolute values of the raw financial data appear to show evidence of serial
# dependence. More than 5% of the estimated correlations lie outside the dashed
# lines, which represent the 95% confidence intervals for serial correlations when
# the underlying process consists of iid finite-variance random variables. This
# serial dependence seems to confirm the presence of volatility clustering, and
# provides strong evidence against the iid hypothesis for the log-returns. In
# particular, this contradicts the geometric Brownian motion hypothesis for the
# continuous-time development of prices, on which the Black-Scholes pricing theory
# is based.

### QUESTION 2 #################################################################

## a) Plot the log-returns of each index using the plot.zoo() function.
##    Plot the evolution of the stock prices for the same period.
## b) Volatility is often formally modelled as the conditional standard deviation
##    of financial returns. Compute the monthly volatilities using the
##    apply.monthly() function and plot the results.
## c) What do you observe?
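## Hint: if apply.monthly() feels opaque, the underlying idea -- one standard
## deviation per calendar month -- can be sketched in base R without xts. The
## sketch below uses synthetic daily "returns" over one year (all values
## illustrative):

```r
## Base-R sketch of a monthly volatility estimate (synthetic data)
set.seed(1)
dates <- seq(as.Date("2000-01-01"), as.Date("2000-12-31"), by = "day")
x     <- rnorm(length(dates), sd = 0.01) # synthetic daily "returns"
mon   <- format(dates, "%Y-%m")          # month label per observation
vols  <- tapply(x, mon, sd)              # one volatility estimate per month
length(vols) # 12 monthly estimates
```

## apply.monthly() does the same grouping, but keys off the xts time index and
## returns an xts object that plot.zoo() can display directly.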
### SOLUTION 2 #################################################################

# Plot of the log-returns
plot.zoo(X, xlab = "Time", main = "Log-returns") # this (partly) confirms (U3)
# Plot of the stock prices
S <- merge(GS, AMZN, MSFT, all = FALSE)
names(S) <- c("GS", "AMZN", "MSFT")
plot.zoo(S, xlab = "Time", main = "Stock prices")

## Plot of volatilities
X.vols <- apply.monthly(X, FUN = function(x) apply(x, 2, sd)) # componentwise volatilities
plot.zoo(X.vols, xlab = "Time", main = "Volatility estimates")

## Although the conditional expected returns are consistently close to zero (U3),
## the presence of volatility clustering suggests that conditional standard
## deviations are continually changing in a partly predictable manner (U4).

### QUESTION 3 #################################################################

# In this question we study the extreme returns and further illustrate the
# volatility clustering effect.
## a) To get a first empirical validation of (U5), you are asked to plot the
##    largest losses. Define the losses as the negative log-returns, and compute
##    an empirical (1-r)-quantile for r = 0.01. Finally, plot the exceedances over
##    the chosen empirical quantile. What do you observe?
## b) One can show (we take this for granted) that the very largest values of iid
##    data occur like events in a Poisson process, separated by waiting times that
##    are iid with an exponential distribution. Compute these waiting times, and
##    compare them to an exponential distribution using a QQ-plot. Does the
##    exponential hypothesis hold?
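## Hint: the Poisson-process claim in b) is easy to check on simulated iid data.
## For iid observations, the indices of exceedances over a high empirical quantile
## are roughly uniformly scattered, so the spacings between them are approximately
## Geometric(r) -- the discrete analogue of Exponential(r) -- with mean 1/r.
## A base-R sketch (synthetic data, illustrative values):

```r
## Spacings between exceedances for iid data: mean close to 1/r
set.seed(314)
nn <- 1e5
r  <- 0.01
x  <- rnorm(nn)                   # iid data: no clustering by construction
u  <- quantile(x, probs = 1 - r)  # empirical (1-r)-quantile
spcs <- diff(which(x > u))        # waiting times between exceedances
mean(spcs)                        # close to 1/r = 100 for iid data
```

## For real returns, clustering of extremes destroys this pattern, which is what
## the QQ-plot against the exponential distribution reveals below.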
### SOLUTION 3 #################################################################

## a)
L <- -AMZN.X # consider -log-returns of Amazon
r <- 0.01 # probability to determine large losses
u <- quantile(L, probs = 1 - r) # determine empirical (1-r)-quantile
xtr.L <- L[L > u] # exceedances over the chosen empirical quantile = largest so-many losses
plot(as.numeric(xtr.L), type = "h", xlab = "Time",
     ylab = substitute("Largest"~r.*"% of losses ("*n.~"losses)",
                       list(r. = 100 * r, n. = length(xtr.L))))
## We can already partially conclude that extreme returns tend to appear in clusters.

## b)
## Now consider spacings
spcs <- as.numeric(diff(time(xtr.L)))
qq_plot(spcs, FUN = function(p) qexp(p, rate = r)) # r = exceedance probability
## The QQ-plot shows a poor fit, providing strong evidence against the exponential
## hypothesis. There are too many short and too many long waiting times, caused by
## the clustering of extreme values.

### QUESTION 4 #################################################################

# We study the typical non-normality and heavy tails of daily returns.
# Plot a normal QQ-plot for each of the three indices. What can you conclude about
# the tail behaviour of the daily log-returns? Repeat the exercise with the weekly
# and monthly log-returns. What do you observe?

### SOLUTION 4 #################################################################

# daily log-returns
layout(t(1:3)) # (1,3)-layout
for (i in 1:3)
    qq_plot(X[,i], FUN = qnorm, method = "empirical",
            # ... to account for the unknown location (mu) and scale (sig) of qnorm()
            main = names(X)[i]) # equivalent to qqnorm(); qqline()
layout(1) # restore layout
## The QQ-plots illustrate the leptokurtosis (heavier tails than a normal) of the
## daily log-returns.
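## Aside: the leptokurtosis, and its weakening under temporal aggregation
## (cf. (U6 bis)), can also be quantified with the sample kurtosis. A base-R sketch
## on synthetic heavy-tailed data (t5 innovations standing in for daily returns;
## all parameter values illustrative):

```r
## Sample kurtosis before and after aggregation into ~monthly sums
set.seed(42)
kurt <- function(x) mean((x - mean(x))^4) / mean((x - mean(x))^2)^2
y <- rt(1e5, df = 5)                                # heavy-tailed "daily returns"
h <- 21                                             # roughly one trading month
agg <- colSums(matrix(y[1:(h * (length(y) %/% h))], nrow = h)) # h-period sums
kurt(y)   # well above 3 (the normal kurtosis)
kurt(agg) # much closer to 3: aggregational Gaussianity
```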
## The inverted S-shaped curve of points suggests that the more extreme empirical
## quantiles of the data tend to be larger than the corresponding quantiles of a
## normal distribution, indicating that the normal distribution does not place
## enough probability mass on extreme returns. This again contradicts the
## assumptions of the famous Black-Scholes model in continuous time.

# weekly log-returns
layout(t(1:3)) # (1,3)-layout
for (i in 1:3)
    qq_plot(X.w[,i], FUN = qnorm, method = "empirical",
            # ... to account for the unknown location (mu) and scale (sig) of qnorm()
            main = names(X.w)[i]) # equivalent to qqnorm(); qqline()
layout(1) # restore layout

# monthly log-returns
layout(t(1:3)) # (1,3)-layout
for (i in 1:3)
    qq_plot(X.m[,i], FUN = qnorm, method = "empirical",
            # ... to account for the unknown location (mu) and scale (sig) of qnorm()
            main = names(X.m)[i]) # equivalent to qqnorm(); qqline()
layout(1) # restore layout

# We observe that the weekly data still shows leptokurtic behaviour, but much less
# severe than the daily returns. The heavy-tailed behaviour is much less pronounced
# for the monthly returns, where the Gaussian distribution seems reasonable.
# This confirms (U6 bis), and makes sense intuitively.
# Indeed, note that the h-period log-returns at time t (h = 5 for weekly,
# h ~ 21 for monthly data) satisfy
# X_t^(h) = ln(S_t/S_{t-h}) = ln(S_t/S_{t-1} * S_{t-1}/S_{t-2} * ... * S_{t-h+1}/S_{t-h})
#         = \sum_{j=0}^{h-1} X_{t-j}
# Due to the sum structure of the h-period returns, it is expected that some central
# limit effect will take place, whereby their distribution becomes less leptokurtic
# and more normal as h is increased.

### QUESTION 5 #################################################################

## The goal of this question is to formally study the leverage effect.
## Recall that X.vols contains the monthly volatilities, and X.m the monthly returns.
## Compute the correlation between X.vols and X.m. What do you expect?
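## Hint: the sign of this correlation can be previewed on synthetic data with a
## built-in leverage effect. The toy process below (all parameter values
## illustrative, not fitted to anything) raises volatility the day after a
## negative shock, and the monthly return/volatility correlation comes out
## negative, as (U7) predicts:

```r
## Toy leverage effect: vol is raised after negative shocks => cor(ret, vol) < 0
set.seed(11)
nn <- 252 * 20                        # ~20 "years" of synthetic daily data
zz <- rnorm(nn)                       # iid shocks
vv <- c(1, 1 + 0.5 * (zz[-nn] < 0))   # volatility raised the day after a negative shock
xx <- 0.01 * vv * zz                  # toy returns with a built-in leverage effect
mon   <- rep(seq_len(nn %/% 21), each = 21) # crude 21-day "months"
m.ret <- tapply(xx, mon, sum)         # monthly returns
m.vol <- tapply(xx, mon, sd)          # monthly volatilities
cor(m.ret, m.vol)                     # negative, as expected under (U7)
```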
### SOLUTION 5 #################################################################

# We expect a negative correlation between the price movements and the volatility,
# due to the leverage effect. Indeed,
cor(X.m, X.vols)
# is negative for all assets, suggesting that drops in stock prices lead to higher
# uncertainty (volatility).

#################################################################################
######################### EXERCISE 2: GARCH models ##############################
#################################################################################

# The goal of this exercise is to demonstrate that GARCH models can replicate the
# empirical observations (U1)-(U6) made in the previous exercise. We also show the
# limits of GARCH models, and introduce some generalizations.

# Load an additional library to study GARCH models
library(rugarch)

### QUESTION 1 #################################################################

# Walk through the following code, which specifies a GARCH(1,1) model with Gaussian
# innovations and then simulates two paths of length 2000 from the specified model.
# The paths are saved in an R object of class 'uGARCHpath'. For more details about
# such objects, we can use the getClass("uGARCHpath") command.

### 1 Specify the GARCH(1,1) model #############################################

## GARCH(1,1) model specification
(uspec <- ugarchspec(variance.model = list(model = "sGARCH", # standard GARCH
                                           garchOrder = c(1, 1)), # GARCH(1,1)
                     mean.model = list(armaOrder = c(0, 0), # no ARMA part
                                       include.mean = FALSE), # no mean included
                     distribution.model = "norm", # normal innovations
                     fixed.pars = list(omega = 0.02, # cond. var. constant (= alpha_0)
                                       alpha1 = 0.15, # alpha_1
                                       beta1 = 0.8))) # beta_1

### 2 Generate paths from a specified model ####################################

## Generate two realizations of length 2000
m <- 2 # number of paths
n <- 2000 # sample size
(paths <- ugarchpath(uspec, n.sim = n,
                     n.start = 50, # burn-in sample size
                     m.sim = m))
str(paths) # S4 object of class 'uGARCHpath'
str(paths@path) # simulated processes X_t, volatilities sigma_t and unstandardized residuals epsilon_t
## We can use getClass() to see more information about such objects
getClass("uGARCHpath")
## We can also use getSlots() to see the composition
getSlots("uGARCHpath")

## EXERCISE: a) What are the parameters of the simulated GARCH model?
##           b) Try simulating paths with different innovation distributions
##              (e.g. Student t).
## SOLUTION: a) alpha_0 = 0.02, alpha_1 = 0.15, beta_1 = 0.8
##           b) Replace the distribution.model = "norm" argument of the
##              ugarchspec() function by distribution.model = "std".

### QUESTION 2 #################################################################

## We first extract the simulated series
X <- fitted(paths) # simulated process X_t = mu_t + epsilon_t for epsilon_t = sigma_t * Z_t
sig <- sigma(paths) # volatilities sigma_t (conditional standard deviations)
eps <- paths@path$residSim # unstandardized residuals epsilon_t = sigma_t * Z_t
## Note: There are no extraction methods for the unstandardized residuals epsilon_t.

## Sanity checks (=> fitted() and sigma() grab out the right quantities)
stopifnot(all.equal(X, paths@path$seriesSim, check.attributes = FALSE),
          all.equal(sig, paths@path$sigmaSim, check.attributes = FALSE))

## a) Plot the simulated paths (X_t).
## b) Plot the corresponding volatilities (sigma_t).
## c) Plot the corresponding unstandardized residuals (epsilon_t).
## d) Plot the standardized residuals.
## e) Check the distribution of the standardized residuals for each simulation
##    using an appropriate QQ-plot.
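## Hint (illustrative, no rugarch required): the bookkeeping behind parts d) and
## e) is that epsilon_t = sigma_t * Z_t, so dividing the residuals by the
## volatilities recovers the iid innovations. The base-R sketch below hand-rolls
## the same GARCH(1,1) recursion with the parameter values fixed in uspec;
## variable names are chosen so as not to overwrite X, sig, eps or Z.

```r
## Hand-rolled GARCH(1,1) with omega = 0.02, alpha_1 = 0.15, beta_1 = 0.8
set.seed(100)
nn <- 2000
a0 <- 0.02; a1 <- 0.15; b1 <- 0.8              # omega, alpha_1, beta_1 from uspec
zz <- rnorm(nn)                                # iid N(0,1) innovations Z_t
s2 <- numeric(nn); ee <- numeric(nn)
s2[1] <- a0 / (1 - a1 - b1)                    # start at the stationary variance
ee[1] <- sqrt(s2[1]) * zz[1]
for (t in 2:nn) {
  s2[t] <- a0 + a1 * ee[t-1]^2 + b1 * s2[t-1]  # GARCH(1,1) variance recursion
  ee[t] <- sqrt(s2[t]) * zz[t]                 # epsilon_t = sigma_t * Z_t
}
## With no ARMA part and no mean, X_t = epsilon_t, and standardizing the
## residuals recovers the innovations exactly:
all.equal(ee / sqrt(s2), zz) # TRUE
```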
### SOLUTIONS ##################################################################

## a) Plot the simulated paths (X_t)
plot(X[,1], type = "l", ylab = expression(X[1]))
plot(X[,2], type = "l", ylab = expression(X[2]))

## b) Plot the corresponding volatilities (sigma_t)
plot(sig[,1], type = "h", ylab = expression(sigma[1]))
plot(sig[,2], type = "h", ylab = expression(sigma[2]))

## c) Plot the corresponding unstandardized residuals (epsilon_t)
plot(eps[,1], type = "l", ylab = expression(epsilon[1]))
plot(eps[,2], type = "l", ylab = expression(epsilon[2]))

## d) Compute and plot the standardized residuals (Z_t)
Z <- eps/sig
plot(Z[,1], type = "p", ylab = expression(Z[1]))
plot(Z[,2], type = "p", ylab = expression(Z[2]))

## e) Check their N(0,1) distribution
qq_plot(Z[,1])
qq_plot(Z[,2])

### QUESTION 3 #################################################################

## In this question, consider the first path only.
## Does the simulated series conform to our univariate stylized facts?
## Which stylized facts are reproduced by GARCH models? Is there some empirical
## property of financial returns that GARCH models cannot capture?

### SOLUTION ###################################################################

## We (only) consider the first path here.
# (U1)
acf(X[,1], main = expression(X[1])) # => no serial correlation (U1: correct)
# (U2)
acf(abs(X[,1]), main = expression(group("|", X[1], "|"))) # => profound serial correlation (U2: correct)
# (U3)
mean(X[,1]) # close to zero (U3: correct)
# (U4)
plot(sig[,1], type = "h", ylab = expression(sigma[1])) # volatility varies over time (U4: correct)
# (U5)
L <- ts(-X[,1]) # consider -log-returns
r <- 0.05 # probability to determine large losses
u <- quantile(L, probs = 1 - r) # determine empirical (1-r)-quantile
xtr.L <- L[L > u] # exceedances over the chosen empirical quantile = largest so-many losses
plot(as.numeric(xtr.L), type = "h", xlab = "Time",
     ylab = substitute("Largest"~r.*"% of losses ("*n.~"losses)",
                       list(r. = 100 * r, n. = length(xtr.L))))
# => shows evidence of volatility clustering (U5: correct)
# (U6)
qq_plot(X[,1], method = "empirical", main = "Normal Q-Q plot") # => heavier tailed than normal (U6: correct)
shapiro.test(X[,1]) # normality rejected (U6: correct)
# (U7)
## Note that, by default, the volatility in GARCH models reacts to recent returns
## in a symmetric way, regardless of their sign. Economic theory, however, suggests
## that market information should have an asymmetric effect on volatility (U7).
## One way of adding a leverage effect to GARCH models is to make the time-t
## dynamics depend on whether the previous value of the process, X_{t-1}, was below
## or above a fixed threshold. For more details, please refer to the QRM book,
## pages 122 and 123.
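## The threshold mechanism just described can be sketched in a few lines of base R:
## a GJR-type recursion with threshold 0, where negative past returns receive an
## extra loading g1 in the variance equation. The parameter values below are
## illustrative (not taken from the book):

```r
## Toy asymmetric (GJR-type) GARCH: extra variance loading after negative returns
set.seed(271)
nn <- 5000
a0 <- 0.02; a1 <- 0.05; g1 <- 0.20; b1 <- 0.8  # illustrative parameters
zz <- rnorm(nn)
s2 <- numeric(nn); xx <- numeric(nn)
s2[1] <- a0 / (1 - a1 - g1/2 - b1)             # stationary variance (P(Z < 0) = 1/2)
xx[1] <- sqrt(s2[1]) * zz[1]
for (t in 2:nn) {
  ## negative past returns get the extra loading g1 (leverage effect, U7)
  s2[t] <- a0 + (a1 + g1 * (xx[t-1] < 0)) * xx[t-1]^2 + b1 * s2[t-1]
  xx[t] <- sqrt(s2[t]) * zz[t]
}
## Average next-period variance is higher after negative returns:
neg <- xx[-nn] < 0
mean(s2[-1][neg]) > mean(s2[-1][!neg]) # TRUE: asymmetric volatility response
```

## In rugarch, a fitted version of this idea is available by setting
## variance.model = list(model = "gjrGARCH") in ugarchspec().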