A Panel Regression Analysis to Determine Returns on Assets of Banks in Ghana

This article examines the determinants of the returns on assets of banks using a panel regression analysis. The explanatory variables considered in the regression model were Capital Adequacy, Asset Quality, Management Quality and Liquidity Ratio. Their effects on the returns on assets were explored by fitting a panel multiple regression model to the data. The results show that the observed returns on assets across the banks are stationary, that is, the behaviour of the series does not change over time. There is evidence of significant differences across the banks considered in the study, hence the use of the random effects model. All the explanatory variables considered in the study, as well as the intercept, were found to have statistically significant effects on the returns on assets across the banks. About 73% of the variation in the returns on assets across the banks is accounted for by the independent variables.


Introduction
Banks form the most significant element of the financial sector and play an important role in the functioning of the financial economy of every country, as the greater part of the financial sector is typically made up of banks. Analyzing the variables that determine the overall performance of banks is therefore of great importance to the economies of many countries (IMF, 2009). Earlier research shows that there is a strong positive relationship between determinants such as equity capital ratio, deposits, returns on assets and returns on equity and the overall performance of the bank (Eski et al., 2011). Kaya (2002), studying the determinants of profitability in the Turkish banking sector using panel data, found that capital, loans and liquidity, among others, are the most important contributors to the returns on assets of an institution such as a bank.
Return on assets is a measure of how profitably a bank uses its assets to generate revenue; hence a higher return on assets indicates that the bank is more profitable. Panel data (also known as longitudinal or cross-sectional time-series data) is a data set in which the behavior of entities is observed across time (Ljung and Box, 1978).
Panel data regression is a method used to estimate economic relationships from cross-sectional series observed over a period of time (Torres-Reyna, 2010). The main idea of this work is to use a panel data regression model to analyze the determinants of the Return on Assets of Ghanaian banks listed on the Ghana Stock Exchange from 2009 to 2019 (Quaicoe et al., 2015). The explanatory variables of the regression model were expressed using the CAMEL rating, an international bank rating system in which banks are assessed on five factors abbreviated as CAMEL. C (Capital adequacy) is a measure of the ability of the bank to retain enough equity capital to pay depositors whenever they ask for their money while still having enough funds to enhance the assets of the bank (Berger, 1995). A (Asset quality) is a measure of risk that specifies the quality of the loans granted by the bank. M (Management quality) deals with the ability of top management to offer professional opinion on policies and procedures, to take risks and to formulate strategic plans for the bank. E (Earning ability) is how the management of the bank ensures a sufficient quantity of capital through retained earnings. L (Liquidity) is the ability of management to make cash available or to turn short-term assets into cash (Nuriyeva, 2014). However, the earning ability of the banks was not considered in this work because of constraints in the data collection.

Methodology
This work is based on real data sourced from the Ghana Stock Exchange and the Bank of Ghana. The analysis involves fitting a panel data multiple regression model to the data, i.e. finding the optimum model for the data. The sample consists of annual nominal data on the CAML variables and the returns on assets from 2009 to 2019, comprising 66 data points, and was analyzed using the statistical computing package R. The stationarity of the data is usually assessed using time plots and the correlogram. A unit root test (the Dickey-Fuller or Augmented Dickey-Fuller test) determines whether a time series is stable around its level or stable around the difference in its level (Dickey and Fuller, 1979).

The Linear Panel Regression Model
The basic linear panel models used in econometrics can be described through suitable restrictions of a general model in which the response Y_it depends on the regressors X_it through an intercept α_it and slope coefficients β_it, plus a disturbance u_it. Here i = 1, ..., n indexes the individual banks, t = 1, ..., T is the time index and u_it is a random disturbance term with mean 0. This general model is not estimable with N = n × T data points, since it has more parameters than observations. A number of assumptions are therefore usually made about the parameters, the errors and the exogeneity of the regressors, giving rise to a taxonomy of feasible models for panel data.
The most common assumption is parameter homogeneity, which means that all parameters (the constant and the slope coefficients) are identical across banks and over time, that is, α_it = α and β_it = β for all i, t. The resulting model is represented in Equation 2 (Hurling, 2018).
Equation 2 is a standard linear model pooling all the data across i and t. To model individual heterogeneity, we often assume that the error term has two separate components: an individual component μ_i, which is specific to the individual and does not change over time, and an idiosyncratic component ε_it, which varies across both individuals and time. This is called the unobserved effects model (Equation 3).
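The general model and Equations 2 and 3 can be written out as below. This is a reconstruction from the definitions given in the text (α_it, β_it, X_it, u_it, μ_i), not a copy of the original display. It assumes that the composite error is u_it = μ_i + ε_it, which agrees with the relation Δu_it = Δε_it used for the first-difference model.

```latex
\begin{align}
Y_{it} &= \alpha_{it} + \beta_{it}^{\top} X_{it} + u_{it} \tag{1} \\
Y_{it} &= \alpha + \beta^{\top} X_{it} + u_{it} \tag{2} \\
Y_{it} &= \alpha + \beta^{\top} X_{it} + \mu_{i} + \varepsilon_{it} \tag{3}
\end{align}
```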
The appropriate estimation method for this model depends on the properties of the two error components. The idiosyncratic error ε_it is usually assumed to be well-behaved and independent of both the regressors X_it and the individual error component μ_i. The individual component may in turn be either independent of the regressors or correlated with them. If it is correlated, the ordinary least squares (OLS) estimator of β would be inconsistent, so it is customary to treat the μ_i as a further set of n parameters to be estimated, as if in the general model α_it = α_i for all t. This is called the fixed effects (within) model or least squares dummy variables model, which is usually estimated by OLS on transformed data and gives consistent estimates for β (Torres-Reyna, 2010; Hurling, 2018).
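The "OLS on transformed data" step can be illustrated with a minimal sketch, assuming a single regressor and a balanced panel. The function names and the two-bank data set are hypothetical and serve only as illustration; they are not the study's data or code, which was run in R.

```python
# Minimal sketch of the within (fixed effects) estimator for one regressor.
# The panel below is hypothetical illustration data, not the study's data.

def demean(values):
    """Subtract the entity mean from every observation of one entity."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def within_slope(x_by_entity, y_by_entity):
    """OLS slope computed on entity-demeaned data (single regressor)."""
    sxy = 0.0
    sxx = 0.0
    for xs, ys in zip(x_by_entity, y_by_entity):
        dx = demean(xs)
        dy = demean(ys)
        sxy += sum(a * b for a, b in zip(dx, dy))
        sxx += sum(a * a for a in dx)
    return sxy / sxx

# Two hypothetical banks with y = 2 * x + bank-specific effect (5 and -3).
x_panel = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
y_panel = [[2 * x + 5 for x in x_panel[0]],
           [2 * x - 3 for x in x_panel[1]]]

slope = within_slope(x_panel, y_panel)
print(slope)  # the bank-specific effects drop out; the slope of 2 is recovered
```

Demeaning removes anything that is constant within a bank, which is why the individual effects (and any time-invariant regressor) disappear from the transformed model.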
If the individual-specific component μ_i is uncorrelated with the regressors, a situation usually termed random effects, the overall error u_it is also uncorrelated with the regressors, and the OLS estimator is consistent. Nevertheless, the common error component over individuals induces correlation across the composite error terms, making OLS estimation inefficient. As a result, one has to resort to some form of feasible generalized least squares (GLS) estimator. This is based on the estimation of the variances of the two error components, for which a number of different procedures are available. In general, if the error terms are correlated, then the fixed effects model is not suitable, and we may consider the random effects model.
The rationale behind the random effects model is that, unlike in the fixed effects model, the variation across entities (country, company, etc.) is assumed to be random and uncorrelated with the predictor variables included in the model. "The crucial distinction between fixed and random effects is whether the unobserved individual effect embodies elements that are correlated with the regressors in the model, not whether these effects are stochastic or not" (Greene, 2008). If we have any reason to believe that differences across entities have some influence on our dependent variable, then we may consider the random effects model. An advantage of random effects is the flexibility to include time-invariant variables, which in the fixed effects model are absorbed by the intercept. The random effects model assumes that the entity's error term is not correlated with the predictors, which allows time-invariant variables to play a role as explanatory variables. In random effects, one needs to specify those individual characteristics that may or may not influence the predictor variables. The problem with this is that some variables may not be available, leading to omitted variable bias in the model.
The random effects model also allows us to generalize the inferences beyond the sample used in the model.
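Feasible GLS for the random effects model is commonly carried out as OLS on quasi-demeaned data, where a fraction θ of each entity mean is subtracted. The sketch below assumes a balanced panel and takes the two variance components as already estimated; the function names and numbers are hypothetical.

```python
import math

def re_theta(sigma2_idiosyncratic, sigma2_individual, periods):
    """Quasi-demeaning weight: 1 - sqrt(s2_eps / (s2_eps + T * s2_mu))."""
    ratio = sigma2_idiosyncratic / (
        sigma2_idiosyncratic + periods * sigma2_individual
    )
    return 1.0 - math.sqrt(ratio)

def quasi_demean(values, theta):
    """Subtract theta times the entity mean from each observation."""
    mean = sum(values) / len(values)
    return [v - theta * mean for v in values]

# Hypothetical variance components with T = 3 periods per entity.
theta = re_theta(sigma2_idiosyncratic=1.0, sigma2_individual=1.0, periods=3)
transformed = quasi_demean([2.0, 4.0, 6.0], theta)
print(theta, transformed)
```

When the individual variance is zero, θ is 0 and the transformation leaves the data unchanged (pooled OLS); as the individual variance grows, θ approaches 1 and the transformation approaches the within (fixed effects) demeaning.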
If the individual component of the error term is missing altogether, pooled OLS is the most efficient estimator for β. This set of assumptions is usually labeled the pooling model, although the label actually refers to the properties of the errors and the appropriate estimation method rather than to the model itself. If one relaxes the usual hypotheses of well-behaved white noise errors and allows the idiosyncratic error to be arbitrarily heteroskedastic and serially correlated over time, a more general kind of feasible GLS is needed, called the unrestricted or general GLS. This specification can also be augmented with individual-specific error components possibly correlated with the regressors, in which case it is termed fixed effects GLS.
Another way of estimating unobserved effects models by removing time-invariant individual components is to first-difference the data: by lagging the model and subtracting, the time-invariant components (the intercept and the individual error component) are eliminated, and the model is presented in Equation 4.
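Written out, the first-difference model takes the form below. This is a reconstruction obtained by subtracting the lagged version of the unobserved effects model from itself, under the same notation as the rest of the section, rather than a copy of the original display.

```latex
\begin{equation}
\Delta Y_{it} = \beta^{\top} \Delta X_{it} + \Delta u_{it} \tag{4}
\end{equation}
```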
In Equation 4, ΔY_it = Y_it − Y_i,t−1, ΔX_it = X_it − X_i,t−1 and, from Equation 3, Δu_it = u_it − u_i,t−1 = Δε_it for t = 2, ..., T. The parameters can be consistently estimated by pooled OLS. This is called the first-difference, or FD, estimator. Its relative efficiency, and hence the case for choosing it over other consistent alternatives, depends on the properties of the error term. The FD estimator is usually preferred if the errors u_it are strongly persistent in time, since Δu_it will then tend to be serially uncorrelated. Lastly, the between model, which is computed on time (group) averages of the data, discards all the information due to intragroup variability but is consistent in some settings (e.g., non-stationarity) where the others are not, and is often preferred for estimating long-run relationships.
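The first-difference estimator described above can be sketched as pooled OLS without an intercept on differenced data. The example assumes a single regressor; the function names and the two-bank panel are hypothetical illustration values.

```python
# Minimal sketch of the first-difference (FD) estimator for one regressor.
# The panel below is hypothetical illustration data, not the study's data.

def first_differences(values):
    """Return the consecutive differences v[t] - v[t-1] for t = 2, ..., T."""
    return [b - a for a, b in zip(values, values[1:])]

def fd_slope(x_by_entity, y_by_entity):
    """Pooled OLS slope (no intercept) on first-differenced data."""
    sxy = 0.0
    sxx = 0.0
    for xs, ys in zip(x_by_entity, y_by_entity):
        dx = first_differences(xs)
        dy = first_differences(ys)
        sxy += sum(a * b for a, b in zip(dx, dy))
        sxx += sum(a * a for a in dx)
    return sxy / sxx

# Two hypothetical banks with y = 2 * x + bank-specific effect (5 and -3).
x_panel = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
y_panel = [[2 * x + 5 for x in x_panel[0]],
           [2 * x - 3 for x in x_panel[1]]]

fd_estimate = fd_slope(x_panel, y_panel)
print(fd_estimate)  # differencing removes the bank effects; slope of 2 is recovered
```

Each bank loses its first observation to differencing, so the regression uses n × (T − 1) rows.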
Variable coefficients models relax the assumption that β_it = β for all i, t. Fixed coefficients models allow the coefficients to vary along one dimension, such as β_it = β_i for all t. Random coefficients models assume that the coefficients vary randomly around a common average, as β_it = β + η_i for all t, where η_i is a group-specific effect with mean zero (Torres-Reyna, 2010).

Fixed or Random Model: Hausman Test
The choice between fixed and random effects depends on the outcome of the Hausman test, which tests the null hypothesis that the preferred model is random effects against the alternative that it is fixed effects (Greene, 2008). Fundamentally, the test ascertains whether or not the unique errors (u_i) are correlated with the regressors; the null hypothesis is that they are not. To perform the test, we run both the fixed effects and the random effects models, save the estimates, and then carry out the Hausman test. If the p-value is significant (for example, p < 0.05), the fixed effects model is used; otherwise, the random effects model is used.
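The logic of the test can be sketched for the simplest case of a single slope coefficient, where the statistic is the squared difference between the two estimates divided by the difference of their variances, and is compared with a chi-square distribution with 1 degree of freedom. With k coefficients the statistic becomes a quadratic form with k degrees of freedom. The function name and the input values below are hypothetical, not results from this study.

```python
import math

def hausman_scalar(b_fe, b_re, var_fe, var_re):
    """Hausman statistic and p-value for one coefficient (chi-square, 1 df)."""
    diff_var = var_fe - var_re
    if diff_var <= 0:
        raise ValueError("var_fe must exceed var_re for the scalar test")
    stat = (b_fe - b_re) ** 2 / diff_var
    # Upper tail of chi-square with 1 df: P(Z^2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical fixed and random effects estimates of the same slope.
stat, p_value = hausman_scalar(b_fe=0.50, b_re=0.40, var_fe=0.02, var_re=0.01)
decision = "fixed effects" if p_value < 0.05 else "random effects"
print(stat, p_value, decision)
```

A large gap between the two estimates, relative to its sampling variance, signals that the individual effects are correlated with the regressors and that only the fixed effects estimator is consistent.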

Testing for Random Effects: Breusch-Pagan Lagrange Multiplier (LM)
The LM test helps to decide between a random effects regression and a simple OLS regression. The null hypothesis in the LM test is that the variance across the entities is zero, that is, there is no significant difference across the entities (i.e. no panel effect).
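For a balanced panel, the Breusch-Pagan LM statistic can be computed from the pooled OLS residuals e_it as LM = nT / (2(T − 1)) × [Σ_i (Σ_t e_it)² / Σ_i Σ_t e_it² − 1]², which is compared with a chi-square distribution with 1 degree of freedom. The sketch below assumes the residuals are already available, grouped by entity; the function name and residual values are hypothetical.

```python
import math

def breusch_pagan_lm(residuals_by_entity):
    """Breusch-Pagan LM statistic and p-value for a balanced panel."""
    n = len(residuals_by_entity)
    t = len(residuals_by_entity[0])
    if any(len(r) != t for r in residuals_by_entity):
        raise ValueError("this sketch requires a balanced panel")
    # Sum over entities of the squared within-entity residual sums.
    sum_sq_of_sums = sum(sum(r) ** 2 for r in residuals_by_entity)
    # Sum of all squared residuals.
    sum_of_squares = sum(e * e for r in residuals_by_entity for e in r)
    stat = (n * t) / (2.0 * (t - 1)) * (sum_sq_of_sums / sum_of_squares - 1.0) ** 2
    # Upper tail of chi-square with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical pooled OLS residuals for two entities observed over two periods.
lm_stat, lm_p = breusch_pagan_lm([[1.0, 1.0], [-1.0, -1.0]])
print(lm_stat, lm_p)
```

Residuals that share a sign within each entity, as in this example, are what an entity-level error component would produce; rejecting the null favours the random effects model over pooled OLS.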

Results and Discussion
Distribution of ROA
We begin the analysis by observing the distribution of the returns on assets across the banks considered in the study and testing the observed series for a stochastic trend. Figure 1 shows the time plot of the Return on Assets recorded across the banks from 2009 to 2019; the histogram and normal Q-Q plot are shown in Figures 2 and 3. It can be observed from the time plot that the return on assets series appears to be stationary around the mean. Performing the unit root test on the series, we found that the Augmented Dickey-Fuller (ADF) test statistic (-5.29) is lower than the critical value (-2.86) at the 5% significance level, with a p-value of 0.01, so we reject the null hypothesis that there is a unit root in the series. Hence the series is stationary and does not need any differencing. All the independent variables considered failed the stationarity test and so required differencing.

Figure 1: Time Plot of Returns on Assets from 2008 to 2020
The results of the stationarity tests are shown in Table 1. Unit roots in capital adequacy, asset quality and management quality were eliminated after the first difference, while a second difference was applied to the liquidity ratio to make it stationary. The histogram in Figure 2 looks roughly symmetric with a slight skew to the right. The normal Q-Q plot in Figure 3 indicates that the empirical distribution of the series is non-normal. This is confirmed by the Shapiro-Wilk test of normality, with W = 0.88 and a p-value of 9.17 × 10^-6, which allows us to reject the null hypothesis that the sample is normally distributed.

Figure 2: Distribution of the Returns on Assets
Note: The asterisks indicate the number of times the specified variable was differenced before attaining stationarity, as given by its order of integration: 1, the variable was stationary and was not differenced; 2, the variable was differenced once; 3, the variable was differenced twice before it became stationary.

Test for Heteroscedasticity (ARCH Effect)
Financial data in which the variance of the error term is larger for some points or ranges of the data than for others (non-constant variance) are said to suffer from heteroscedasticity. In the presence of heteroscedasticity, the estimated standard errors of the regression coefficients may be too small and the resulting confidence intervals too narrow, giving a false sense of precision. Two tests were conducted: the Box-Ljung test for ARCH effect produced a test statistic of 6.42 and a p-value of 0.07, while the Lagrange Multiplier (LM) test for ARCH resulted in a test statistic of 116.09 with a p-value of 0.10. Neither test was significant at the 5% significance level; hence we fail to reject the null hypothesis of no ARCH effect and conclude that there is no problem of heteroscedasticity in the data.
Table 2 shows the descriptive statistics for both the dependent variable (Return on Assets) and the explanatory variables (capital adequacy, asset quality, management quality and liquidity). With a minimum value of 0.63 and a maximum of 1.58, Return on Assets recorded a mean of 0.79 and a standard deviation of 0.41. Capital adequacy, which measures the ability of the bank to retain enough equity capital, has a mean value of 9.94 and a standard deviation of 1.29. Asset quality, a measure of risk that specifies the quality of the loans granted by the bank, has a minimum value of 0.23, a maximum value of 1.57 and a mean of 0.56 with a standard deviation of 0.15. Management quality has a mean of 1.14 and a standard deviation of 0.15, while the liquidity ratio recorded a mean value of 6.32 and a standard deviation of 2.32.
Table 3 presents the correlations between the variables and shows that there is a weak linear relationship between Return on Assets and all the explanatory variables.
Among the explanatory variables, there is a weak positive linear relationship between capital adequacy and asset quality, but a weak negative relationship between capital adequacy and both management quality and liquidity, with correlation coefficients of -0.11 and -0.32 respectively. There is also a weak positive linear relationship between asset quality, management quality and liquidity. This indicates that the response variable and the independent variables are not strongly correlated.

Testing for Multicollinearity
It is important to check for multicollinearity among the independent variables. We consider two recognized diagnostics for detecting multicollinearity: the tolerance and the variance inflation factor (VIF). According to Menard (2002), a tolerance value of less than 0.1 indicates a serious collinearity problem. Similarly, Myers (1990) indicated that a VIF value greater than 10 is cause for concern about multicollinearity. Table 4 shows the results. None of the tolerance values is below 0.1 and the VIF values are well below 10. Therefore, multicollinearity is not a concern for this study.
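The two diagnostics are linked: for predictor j, the tolerance is 1 − R_j² and the VIF is its reciprocal, where R_j² comes from regressing predictor j on the other predictors. The sketch below uses the simplest case of two predictors, where R_j² is the squared correlation between them; with four predictors, as in this study, R_j² would come from a multiple regression. The function names and data are hypothetical.

```python
def r_squared_simple(x, y):
    """R^2 from regressing y on x with an intercept (squared correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def vif_and_tolerance(r_squared):
    """Return (VIF, tolerance) for a predictor with auxiliary R^2."""
    tolerance = 1.0 - r_squared
    return 1.0 / tolerance, tolerance

# Two hypothetical predictors observed four times.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 1.0, 4.0, 3.0]

r2 = r_squared_simple(x1, x2)
vif, tolerance = vif_and_tolerance(r2)
flagged = tolerance < 0.1 or vif > 10  # thresholds quoted in the text
print(r2, vif, tolerance, flagged)
```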

Panel Multiple Regression: Fixed Effect Model
The fixed effect model explores the relationship between the predictor and outcome variables within an entity (each bank). It is assumed that time-invariant characteristics within an entity may affect or bias the predictor or outcome variables and therefore need to be controlled for. The fixed effect model removes the effects of those time-invariant characteristics so that the net effect of the predictors on the outcome variable can be assessed. The time-invariant variables are absorbed by the intercept. We begin by testing whether the explanatory variables taken together have an effect on the response variable (ROA) within each bank by testing the null hypothesis H0: β1 = β2 = … = βp = 0. Table 5 shows the results of the regression analysis. With F(4, 61) = 10.10 and a p-value of 1 × 10^-4, the coefficients are jointly different from zero, so we reject the null hypothesis that the explanatory variables (capital adequacy, asset quality, management quality and liquidity) within each bank collectively have no effect on the Return on Assets. Thus, the predictor variables within each bank have an influence on the return on assets (Table 5). The adjusted R² is 0.69, which indicates that the model is appropriate, with approximately 69% of the variation in returns on assets explained by the independent variables.

Testing for the Significance of the Explanatory Variables (Fixed Effect Model)
From Table 5, our panel multiple regression equation is given by ROA = 0.61 Capital Adequacy + 1.48 Assets Quality − 0.29 Management Quality − 0.03 Liquidity Ratio. We can also see from Table 5 that the explanatory variables, which are stationary, are statistically significant at the 5% level of significance, with p-values of 0.0060, 0.0032, 0.0146 and 0.0074 respectively, and so help to explain the ROA of the banks. We therefore conclude that all the explanatory variables have a significant effect on the returns on assets of the banks. There is a direct relationship (positive effect) between Capital Adequacy and Assets Quality and the returns on assets: as they increase, the returns on assets also increase. Conversely, as Management Quality and Liquidity Ratio decrease, the returns on assets increase, indicating an inverse relationship (negative effect) between them.

Panel Multiple Regression: Random Effect Model
Unlike the fixed effect model, the random effect model assumes that the variation across entities (the banks) is random and uncorrelated with the predictor or independent variables in the model. Here we believe that differences across entities have some influence on our response variable. The random effect model assumes that the bank's error term is not correlated with the predictor variables, which allows time-invariant variables to play a role as predictor variables. It also allows us to generalize inferences beyond the sample used in the model. We begin by testing whether the explanatory variables taken together have an effect on the response variable (ROA) across the banks by testing the null hypothesis H0: β1 = β2 = … = βp = 0. The results of the regression analysis are shown in Table 6. With F(4, 61) = 11.24 and a p-value of 0.0002, the coefficients are jointly different from zero, so we reject the null hypothesis that the explanatory variables (capital adequacy, asset quality, management quality and liquidity) across the banks collectively have no effect on the Return on Assets. Thus, the predictor variables across the banks have an influence on the return on assets. The model appears to be appropriate, as about 73% of the variation in returns on assets is accounted for by the explanatory variables used in the model, as shown by the adjusted R² (Table 6).
Source: Analysis of field data, 2020

Testing for the Significance of the Explanatory Variables (Random Effect Model)
Our panel multiple regression equation, as shown in Table 6, is given by ROA = 0.30 + 0.42 Capital Adequacy + 0.19 Assets Quality − 0.25 Management Quality − 0.10 Liquidity Ratio. We can also see from Table 6 that the intercept and the explanatory variables are statistically significant at the 5% level of significance, with p-values of 0.0000, 0.0002, 0.0000, 0.0004 and 0.0001 respectively, and so help to explain the ROA of the banks. We therefore conclude that all the explanatory variables have a significant effect on the returns on assets of the banks. There is a direct relationship (positive effect) between Capital Adequacy and Assets Quality and the returns on assets: as they increase, the returns on assets also increase. Conversely, as Management Quality and Liquidity Ratio decrease, the returns on assets increase, indicating an inverse relationship (negative effect) between them.
The intercept is also significant, with a positive coefficient, indicating a positive baseline level of return on assets for the banks when all the explanatory variables are equal to zero.
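As a worked illustration of how the fitted random effect equation is read, the sketch below evaluates it for a set of input values and shows the change in fitted ROA from a one-unit increase in Capital Adequacy. The coefficients are those reported from Table 6; the function name and the input values are hypothetical, and the inputs are assumed to be on the same (possibly differenced) scale used in estimation.

```python
# Coefficients of the fitted random effect model as reported from Table 6.
RE_COEFFICIENTS = {
    "intercept": 0.30,
    "capital_adequacy": 0.42,
    "asset_quality": 0.19,
    "management_quality": -0.25,
    "liquidity_ratio": -0.10,
}

def predict_roa(capital_adequacy, asset_quality, management_quality, liquidity_ratio):
    """Fitted ROA from the reported random effect regression equation."""
    c = RE_COEFFICIENTS
    return (
        c["intercept"]
        + c["capital_adequacy"] * capital_adequacy
        + c["asset_quality"] * asset_quality
        + c["management_quality"] * management_quality
        + c["liquidity_ratio"] * liquidity_ratio
    )

# Hypothetical input values, chosen only to make the arithmetic easy to follow.
baseline = predict_roa(1.0, 1.0, 1.0, 1.0)
higher_capital = predict_roa(2.0, 1.0, 1.0, 1.0)
marginal_effect = higher_capital - baseline  # equals the Capital Adequacy coefficient
print(baseline, marginal_effect)
```

Holding the other variables fixed, each coefficient is the change in fitted ROA per unit change in that variable, which is the sense in which the signs are interpreted in the text.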

Model Diagnosis: Random Effect Model or Fixed Effect Model (Hausman Specification Test)
Since both models appear to be appropriate for the observed data, the Hausman specification test was used to select between the fixed and random effect estimation techniques and so determine the most suitable model. With a Hausman chi-square test statistic of 0.70 on 4 degrees of freedom and a p-value of 0.40, the test was not significant at the 5% level of significance, so we fail to reject the null hypothesis that the errors are not correlated with the regressors (the independent variables). We therefore conclude that the random effect model is the most suitable model for explaining the stochastic mechanism behind the observed data.

Testing for the Random Effect: Lagrange Multiplier (LM) Test
The Breusch-Pagan Lagrange Multiplier (LM) test helps to decide between a random effect regression and a simple ordinary least squares regression (Breusch and Pagan, 1980). The null hypothesis in the LM test is that the variance across entities (the banks) is zero, that is, there are no significant differences across the entities and hence no panel effect in the data. With a chi-square value of 11.27 on 1 degree of freedom and a p-value of 0.0001, the null hypothesis of no panel effect was rejected at the 5% level of significance. We conclude that there is evidence of significant differences across the banks, i.e. there is a random effect, and hence the fitted random effect model is appropriate for the data. We can therefore make inferences based on the fitted random effect model.

Conclusion and Policy Implications
The results of the study show that the observed series of returns on assets across the banks is stationary; hence the probability law that governs the behavior of the process does not change over time. The study revealed that the optimum model for the observed data was the panel multiple regression using the random effect model, with regression equation ROA = 0.30 + 0.42 Capital Adequacy + 0.19 Assets Quality − 0.25 Management Quality − 0.10 Liquidity Ratio. The analysis also shows that there is evidence of significant differences across the banks considered in the study, hence the use of the random effect model. All the explanatory variables considered in the study, as well as the intercept, were found to have a statistically significant effect on the returns on assets across the banks.