Inequality and water pollution in India

India is notorious for high inequality and high water pollution. A growing body of literature argues that inequality is harmful to the environment, but this claim has not received strong empirical support. We discuss some econometric problems that may have caused mixed findings in the empirical literature and use appropriate tools to overcome them. Our empirical results using Indian time-series data show (i) that inequality leads to an increase in water pollution, (ii) that the magnitude of the effect of inequality is nearly as large as that of corruption, suggesting that reducing inequality is almost as important as curbing corruption in addressing water pollution challenges in India, and (iii) that increases in water pollution, in turn, widen inequality in India. Our results are robust to various sensitivity checks. We also find no evidence of the environmental Kuznets curve hypothesis for water pollution in India.

• We examine the relationship between inequality and water pollution in India and take into account econometric issues in the literature.
• We find that inequality leads to an increase in water pollution and that the magnitude of the effect of inequality is nearly as large as that of corruption.
• Increases in water pollution, in turn, widen inequality in India.


INTRODUCTION
In India, rivers are much more than bodies of water. The Hindu majority believes that Indian rivers are sacred, can wash away sins, and bring people closer to god. Despite being perceived as pure, the rivers are not free from pollution: they serve as a dumping ground for sewage and for solid and industrial wastes (Bari, 2018). Every day, more than 10.5 million gallons of wastewater flow into rivers and other watercourses in India (Hirani & Dimble, 2019). Water pollution in India poses a serious threat to the health of its economy and society. Recent research suggests that in a developing country like India, pollution upstream could cut economic growth in downstream regions by nearly half a percentage point. Every year, more than 400,000 Indians die from diarrheal illness due to inadequate sanitation and hygiene (DeFrancis, 2011). The health costs of water pollution in India are estimated at about $6.7-8.7 billion per year (Mani et al., 2012). India has spent billions of dollars on clean-up efforts, yet serious water pollution persists (The Economist, 2019a, para. 6). Given the significant impact of water pollution on the economy and society, gaining a better understanding of its underlying causes is critical.
Seventh, some previous studies do not deal with potential reverse causality or simultaneity (Scruggs, 1998; Torras & Boyce, 1998; Grafton & Knowles, 2004; Kasuga & Takaya, 2017). Not only could inequality affect water pollution, but water pollution could also affect inequality (through health). For example, water pollution impairs poor people's health because they are more exposed to such pollution than the rich are (Ravi Rajan, 2014). Poor health subsequently increases absenteeism and reduces productivity at work, reducing the earning capacity of the poor (Cole & Neumayer, 2006). Sickness also lowers the school attendance of poor children, affecting their educational attainment and thereby their future earnings (Cole & Neumayer, 2006). Failure to account for this reverse causation (i.e., water pollution affecting inequality) will result in biased estimates. Possible solutions, for example IV and GMM estimators, have the limitations we mentioned above.
This study uses the Johansen (1995) approach to overcome the above limitations. It treats all variables as endogenous and thus allows for a simultaneous relation or reverse causation between inequality and water pollution. Biases due to omitted variables are also likely to be minimal because cointegration estimates are super consistent 3 and robust to omitted variables (Bonham & Cohen, 2001). Moreover, a cointegration relation is invariant to increases in the information set (Juselius, 2006, p. 11). This differs from standard regression analysis, where one additional variable can considerably change parameter estimates (Juselius, 2006, p. 11) 4. As shown by Hassler & Kuzin (2009), the Johansen approach is also robust to measurement errors. This study focuses on a single country, India, and thus is not subject to the problems associated with cross-country regressions that we mentioned above: parameter homogeneity, cross-sectional dependence, and data comparability. We use data for a long time span 5 and focus on a long-run relationship. Hence, the limited year-to-year variation in inequality data should not be of concern in this study. One might argue that the sample size this study employs is small (T = 31). Although the sample size is important for statistical analysis, the sample span is also important for uncovering a true relationship between particular variables (Lahiri & Mamingi, 1995). A sample of 30 observations over a span of 30 years produces better information than a sample of, say, 300 observations over a span of 300 days, or a sample of 300 observations over a span of 300 months (25 years) 6. However, this study takes appropriate measures (such as applying a small-sample correction) whenever possible to guard against small-sample bias.
The objective of this study is to examine the relationship between inequality and water pollution in India. We have several main results. First, we find that inequality increases water pollution in India. Second, the effect of inequality on water pollution is about as large as the effect of corruption. Third, we also observe that increases in water pollution, in turn, exacerbate inequality in India. It is worth mentioning that no evidence of the environmental Kuznets curve (EKC) hypothesis is found for water pollution in India.
The rest of the article is organized as follows. Section 2 presents a literature review. Sections 3 and 4 describe data and methodology, respectively. Section 5 presents the results. Section 6 concludes, and Section 7 provides policy recommendations.
3 Super consistent estimates converge to the true parameter values at a faster rate than consistent estimates do, as sample size increases.
4 Cointegration tests can also serve as a misspecification test (Stern, 2011). A lack of cointegration indicates that additional nonstationary variables may be needed for the model (Stern, 2011). The reason is that if relevant nonstationary variables are omitted, they will be part of the error term, making the error term nonstationary (Everaert, 2011).
5 The sample period is dictated by the availability of data.

LITERATURE REVIEW
There is a growing body of theoretical literature arguing that inequality is harmful to the environment (Cushing et al., 2015; Islam, 2015). Several possible causal pathways have been offered. Inequality is bad for the environment because the rich consume more (e.g., water for private swimming pools, gardening, and washing cars) and thereby generate more waste and pollution (Islam, 2015; Dorling, 2017). This hypothesis could be true in India. For example, a survey of 1,495 households in Bengaluru by the Ashoka Trust for Research in Ecology and the Environment found that rich households (the top ten percentile) use four times as much water as average households do: rich households consume 340 L per person per day, while average households consume 85 L per person per day (The Hindu, 2017, para. 1). According to the survey, gardening and washing cars are among the reasons for the high consumption (The Hindu, 2017, para. 3). The inequality-environment theoretical literature also argues that the rich set the norm for what defines a high-status lifestyle, inducing the rest to work more and consume more (Berthe & Elie, 2015; Cushing et al., 2015; Islam, 2015). It is well known that, in India, owning a car is considered a status symbol (Mohan, 2020). Imitating the lifestyle of the rich could result in more cars being bought unnecessarily and more water being used, for both automotive manufacturing and household consumption (e.g., car washing).
The theoretical literature also suggests that inequality erodes trust, cohesion, and cooperation, which undermines collective action to protect environmental resources (Wisman, 2011). This phenomenon can also be seen in India. For example, in Calcutta, interaction and cooperation among environmental NGOs are rare (Dembowski, 2001, p. 81). Many of them reportedly view one another with suspicion, believing that others have a hidden agenda and are pursuing personal interests (e.g., fame and foreign funds) (Dembowski, 2001, p. 2). Moreover, those environmental NGOs find it easier to mobilize people from within their own social class than from other classes (Dembowski, 2001, p. 82). In 2019, water in Calcutta was ranked the second most unsafe, after that of Delhi (DNA India, 2019, para. 1).
The inequality-environment theoretical literature also holds that inequality causes people to think about unemployment, status insecurity, and making the economy work again, and so shifts their attention away from environmental issues (Wilkinson & Pickett, 2010, p. 263). This may be happening in India. In Calcutta, Dembowski (2001, p. 82) reported that the environment is not considered a serious issue by communities struggling for daily survival, although such people are among the most exposed and vulnerable to environmental hazards. In another city, Kanpur, Do et al. (2018) find that public demand for water quality was low in the past; people were more concerned about the impact that water pollution policy would have on the economy, as they depended on the highly river-polluting tanning industry for jobs. In 2017, India's leather industry employed an estimated 3 million workers, most of whom came from poor and marginalized groups (Chitnis, 2017).

DATA
To investigate the impact of inequality on water pollution in India, this study uses aggregate annual time-series data. We focus on water quality in rivers, as they are the most important source of water in India (Agrawal, 1994). We employ biological oxygen demand (BOD) as our measure of water pollution. BOD measures the amount of oxygen consumed by organisms to break down organic matter. It has anthropogenic elements, which are important for capturing the effect of socioeconomic variables such as inequality. Also, BOD is a common measure of water quality in the literature on income and the environment (e.g., Greenstone & Hanna, 2014; Sigman, 2014), so using this measure allows for comparability. Data on BOD (mg/L) 7 are taken from the Global Environment Monitoring System (GEMS) (available at https://gemstat.org/). This study uses two different measures of inequality: the Gini and Theil indices. Data on Gini are drawn from the Standardized World Income Inequality Database (SWIID, version 7.1). As shown later, Gini from the SWIID is not robustly I(1), so we use the Theil index from the UTIP-UNIDO (available at https://utip.lbj.utexas.edu/data.html) in our main analysis and Gini for robustness checks. The main advantage of the Theil index is that it is based on datasets that are collected consistently over long intervals (and are thus comparable across time) (Galbraith, 2009). This advantage is significant in the present study, given that our analysis is based primarily on long time-series data. Another advantage of this Theil index is that it is not imputed like the Gini of the SWIID and is thus more reliable. The Theil index has been commonly used to measure country-level income inequality (see, e.g., Navarro et al., 2006; Krieger & Meierrieks, 2019). Real income per capita comes from the World Development Indicators (WDI). The sample period is 1976-2008. The end year is the latest period for which all the data were available at the time of writing. As is standard, income is log-transformed. For water pollution and inequality, because one of our main purposes is to test for cointegration (that is, to see whether both time series move together), we prefer both series to appear in their original form 8. Table 1 provides summary statistics.
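To make the two inequality measures concrete, the following is a minimal sketch of how a Theil index and a Gini coefficient can be computed from a list of incomes. It assumes individual-level incomes and the Theil T form; the UTIP-UNIDO series is built from industrial pay data and the SWIID Gini is model-based, so this illustrates the general formulas rather than the construction of either dataset. The function names and the income values are hypothetical.

```python
import math

def theil_t(incomes):
    """Theil T index: mean of (x/mu) * ln(x/mu); zero means perfect equality."""
    mu = sum(incomes) / len(incomes)
    # Terms with x == 0 contribute 0 in the limit, so they are skipped.
    return sum((x / mu) * math.log(x / mu) for x in incomes if x > 0) / len(incomes)

def gini(incomes):
    """Gini coefficient from the mean absolute difference between all pairs."""
    n = len(incomes)
    mu = sum(incomes) / n
    total_abs_diff = sum(abs(a - b) for a in incomes for b in incomes)
    return total_abs_diff / (2 * n * n * mu)

# Hypothetical incomes, not data from the study.
equal = [10.0, 10.0, 10.0, 10.0]
unequal = [1.0, 3.0]
print(theil_t(equal), gini(equal))
print(round(theil_t(unequal), 4), gini(unequal))
```

Both measures are zero under perfect equality and rise as incomes diverge, which is why either can serve as the inequality series in the analysis.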

METHODOLOGY
The Johansen (1995) approach 9 is then employed to test for cointegration between the series. The main advantages of this approach over other common cointegration methods are that (i) it treats all variables as endogenous, (ii) it allows us to perform hypothesis testing on the cointegrating vectors, and (iii) it is robust to measurement errors (Hassler & Kuzin, 2009). To use the Johansen approach, we estimate the following vector error correction model (VECM):

$$\Delta y_t = \Pi y_{t-1} + \sum_{i=1}^{k-1} \Gamma_i \Delta y_{t-i} + \varepsilon_t \qquad (1)$$

where y_t is an n × 1 vector of first-order integrated [I(1)] variables that can be endogenous and k is the number of lags. The n × n matrix Π is the product of two n × r matrices, Π = αβ′, where r is the cointegration rank. The matrix α contains the speeds of adjustment, and the matrix β′ contains the cointegrating coefficients. Γ_i is an n × n matrix of short-run parameters, and ε_t is an n × 1 vector of white-noise disturbance terms. Δ is the difference operator. The above equation is estimated using maximum likelihood (ML).
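To make the roles of the adjustment and cointegrating coefficients concrete, the error-correction part of the VECM can be written out for the bivariate case of BOD and Theil. The expansion below is an illustrative sketch under stated assumptions: a single cointegrating vector (r = 1) normalized on BOD, no lagged differences, and no deterministic terms. The symbols α_1, α_2, and β_2 are labels introduced here and are not estimates from the paper.

```latex
\begin{aligned}
\Delta \mathrm{BOD}_t   &= \alpha_1 \left( \mathrm{BOD}_{t-1} - \beta_2\, \mathrm{Theil}_{t-1} \right) + \varepsilon_{1t}, \\
\Delta \mathrm{Theil}_t &= \alpha_2 \left( \mathrm{BOD}_{t-1} - \beta_2\, \mathrm{Theil}_{t-1} \right) + \varepsilon_{2t}.
\end{aligned}
```

The term in parentheses is the deviation from the long-run relation BOD = β_2 Theil, so a positive β_2 corresponds to a positive long-run association between inequality and water pollution. If α_1 = 0, BOD does not adjust to that deviation, which is the sense in which a variable is called weakly exogenous.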
To test for cointegration, we look at the rank (r) of Π through its eigenvalues. If none of the eigenvalues is significantly different from zero (i.e., the rank of Π is zero), then there is no cointegration (the variables under consideration do not move together in the long run). If the rank equals one, they are cointegrated (move together). Two test statistics can be used to determine the cointegration rank under the Johansen approach, the trace and the maximum-eigenvalue statistics, which are computed as

$$\lambda_{\text{trace}}(r) = -T \sum_{i=r+1}^{n} \ln\left(1 - \hat{\lambda}_i\right), \qquad \lambda_{\max}(r, r+1) = -T \ln\left(1 - \hat{\lambda}_{r+1}\right)$$

where r is the number of cointegrating vectors, T is the number of observations, and λ̂_i is the estimated value of the i-th eigenvalue from the Π matrix. We first test the null hypothesis of no cointegrating vectors (r = 0). If this null is rejected, the null that there is at most one cointegrating vector (r ≤ 1) is tested. If this second null is not rejected, we conclude that there is a long-run relationship between the set of variables. Osterwald-Lenum (1992) provides critical values for the trace and the maximum-eigenvalue statistics.
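The trace and maximum-eigenvalue statistics described above can be sketched in a few lines once the eigenvalues are available. The snippet below assumes the standard Johansen formulas, in which the trace statistic sums -T ln(1 - λ) over the eigenvalues beyond rank r and the maximum-eigenvalue statistic uses only the next eigenvalue. The eigenvalues are hypothetical and the function name is an illustrative choice; estimating the eigenvalues themselves is not shown.

```python
import math

def johansen_statistics(eigenvalues, T):
    """Return (trace, max_eig) lists indexed by the null rank r = 0, 1, ..., n-1."""
    lam = sorted(eigenvalues, reverse=True)  # largest eigenvalue first
    n = len(lam)
    # Trace statistic: uses all eigenvalues beyond the hypothesized rank r.
    trace = [-T * sum(math.log(1 - l) for l in lam[r:]) for r in range(n)]
    # Maximum-eigenvalue statistic: uses only the (r+1)-th eigenvalue.
    max_eig = [-T * math.log(1 - lam[r]) for r in range(n)]
    return trace, max_eig

# Hypothetical eigenvalues for a bivariate system; T = 31 as in the text.
trace, max_eig = johansen_statistics([0.5, 0.1], T=31)
for r in range(2):
    print(f"H0: r <= {r}  trace = {trace[r]:.3f}  max-eig = {max_eig[r]:.3f}")
```

Each statistic would then be compared with the appropriate critical value, starting from r = 0 and stopping at the first null that is not rejected.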

RESULTS
To use the Johansen approach, all variables must be integrated of order 1, denoted as I(1). A time series is said to be I(1) if it needs to be differenced once to become stationary. A time series is stationary, or I(0), if its mean, variance, and autocovariances are constant over time. To test the order of integration of the time series under consideration, we use the standard augmented Dickey-Fuller (ADF) (1979) and Phillips and Perron (PP) (1988) unit root tests. Table 2 reports the results of the tests. In levels, the tests do not reject the null hypothesis of a unit root. For the first differences, the null hypothesis is rejected for all series except Gini. It can be concluded that BOD, Theil, and income are integrated of order 1, I(1). Because we cannot find strong evidence that Gini is I(1), it is used only for robustness checks.
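The logic of a unit root test can be illustrated with the simplest member of the Dickey-Fuller family. The sketch below regresses the first difference of a series on a constant and its lagged level and returns the t-statistic on the lagged level. It omits the lagged differences that the augmented version adds, and the example series is hypothetical. The statistic follows the Dickey-Fuller distribution rather than the usual t distribution; for a regression with a constant and a few dozen observations, the 5% critical value is roughly -3, which is treated here as an approximation.

```python
import math

def difference(series):
    """First difference: y_t - y_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]

def dickey_fuller_t(series):
    """t-statistic on rho in the regression diff(y)_t = c + rho * y_{t-1} + e_t."""
    x = series[:-1]            # lagged level y_{t-1}
    dy = difference(series)    # diff(y)_t
    n = len(x)
    x_bar, dy_bar = sum(x) / n, sum(dy) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (di - dy_bar) for xi, di in zip(x, dy))
    rho = sxy / sxx
    c = dy_bar - rho * x_bar
    ssr = sum((di - c - rho * xi) ** 2 for xi, di in zip(x, dy))
    se_rho = math.sqrt(ssr / (n - 2) / sxx)
    return rho / se_rho

# Hypothetical, strongly mean-reverting series (not data from the study).
stationary = [(-1) ** t + 0.1 * (t % 3) for t in range(30)]
t_stat = dickey_fuller_t(stationary)
# A t-statistic far below roughly -3 rejects the unit-root null.
print(f"DF t-statistic: {t_stat:.2f}")
```

Applying the same function first to a series in levels and then to `difference(series)` mirrors the two-step check described above: a series is treated as I(1) when the null is not rejected in levels but is rejected in first differences.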
To test for cointegration between inequality and water pollution, Equation (1) is estimated for three models. Model 1 includes BOD and Theil. Model 2 consists of BOD, Theil, and income. Model 3 includes BOD and Gini. Table 3 presents the cointegration test results. For model 1, both the trace and maximum-eigenvalue tests indicate that BOD and inequality are cointegrated. For model 2, following Heerink et al. (2001), we add income to our set of variables. It is common in the literature to add the square of income to account for a nonlinear relationship between income and pollutants, known as the EKC (Grossman & Krueger, 1995). However, it is inappropriate to include the square of income in our model because nonlinear transformations of integrated variables require a different asymptotic theory from the one underlying the Johansen approach and many other popular cointegration methods (see Müller-Fürstenberger & Wagner, 2007). Nevertheless, the EKC can still be assessed by comparing the long- and short-run coefficients on income: if the former is negative or smaller than the latter, then the EKC holds (Narayan & Narayan, 2010).
The trace statistic suggests that water pollution, inequality, and income are cointegrated. The maximum-eigenvalue test, however, does not confirm this result. A possible explanation for this conflicting result is that income may not belong to the cointegration relation (see Barua & Hubacek, 2009; Greenstone & Hanna, 2011). To check whether this is the case, we perform long-run exclusion tests to see whether income can be excluded from our model. Table 4 reports the results of the exclusion tests. It shows that income can be safely excluded from the long-run relation (insignificant p-value). Given that adding the income variable does not improve our specification, our preferred model is the bivariate one 10. Model 3 replaces Theil with Gini as a measure of inequality. As can be seen, both statistics support the presence of cointegration. Although we find robust evidence of cointegration between water pollution and inequality, it could be argued that the results are due to the sample size we use. In small samples, the Johansen test tends to reject the null of no cointegration too often (Cheung & Lai, 1993). To see whether our results are biased by a small sample, we adjust our test statistics for model 1 (our preferred model) by multiplying them by the small-sample correction factor of Reinsel & Ahn (1992), (T − k × n)/T, where T is the number of observations, k is the number of lags, and n is the number of variables. This study also applies the Bartlett correction proposed by Johansen (2002) to the trace test statistic. Both corrections lower the test statistics, making it more difficult to reject the null hypothesis. Table 5 presents the cointegration test results with small-sample correction. All the corrected statistics reject the null of no cointegration at the 5% level. This suggests that our finding of cointegration is not driven by our small sample. Table 6 presents estimates of the long-run relationship from the above three models.
All models show a significant positive effect of inequality on water pollution. The estimated parameters imply that a 1% increase in inequality leads to, on average, a 1.931-2.958% increase in water pollution in the long run. The income variable in model 2 is not statistically significant. To check for the existence of the EKC, in unreported analysis, we compare this long-run coefficient with its short-run counterpart, as suggested by Narayan & Narayan (2010). Although the long-run coefficient (0.655) is smaller than the short-run coefficient (8.173), neither coefficient is statistically significant (results available upon request), so we find no evidence of the EKC. This study agrees with the findings of Herzer (2020). The cointegration property is invariant to increases in the information set (Juselius, 2006, p. 349): if cointegration is found within a set of variables, the same cointegration relation will still be found when additional variables are added (Juselius, 2006, p. 349).
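The small-sample adjustment of Reinsel & Ahn (1992) used for the corrected cointegration tests amounts to scaling each test statistic by (T − k × n)/T. A minimal sketch follows; T = 31 comes from the text, while the number of variables, the number of lags, and the raw statistic are hypothetical values chosen for illustration, and the function names are not from the paper.

```python
def reinsel_ahn_factor(T, n, k):
    """Scaling factor (T - n*k) / T applied to Johansen test statistics."""
    if T <= n * k:
        raise ValueError("T must exceed n * k")
    return (T - n * k) / T

def corrected_statistic(statistic, T, n, k):
    """Raw test statistic multiplied by the small-sample factor."""
    return statistic * reinsel_ahn_factor(T, n, k)

# Hypothetical inputs: n = 2 variables, k = 2 lags, raw statistic of 25.0.
print(round(reinsel_ahn_factor(31, 2, 2), 4))
print(round(corrected_statistic(25.0, 31, 2, 2), 3))
```

Because the factor is always below one, the corrected statistic is smaller than the raw one, so a rejection that survives the correction is less likely to be a small-sample artifact.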
To test the robustness of this result, we re-estimate the long-run relationship using three alternative estimation methods: the dynamic ordinary least squares (OLS) of Stock & Watson (1993), the fully modified OLS estimator of Phillips & Hansen (1990), and the canonical cointegrating regression of Park (1992). Table 7 presents estimation results using these estimators. The ML estimation of Johansen (model 1; Table 6) is reported for comparison. We continue to find a positive and statistically significant effect of inequality on water pollution, and the estimates from the three methods are generally rather close to that of ML. We conclude that our results are robust to different estimators.
Next, we examine the sensitivity of our results to the inclusion of other explanatory variables. To this end, we use the dynamic OLS estimator, as it has important econometric advantages over other estimators. First, it performs well in small samples such as ours. This matters because one variable that we are going to use has a short time series, which reduces our already small sample. Second, it allows I(0) independent variables (such as corruption) in the regression. Third, it is robust to endogenous explanatory variables, an issue that may arise from measurement error in inequality and from reverse causation between inequality and water pollution. Table 8 reports our results. In column 1, corruption is added as an explanatory variable. Its inclusion is motivated by the fact that corruption is correlated with inequality (Gupta et al., 2002), and as a result, each variable could proxy for the other. It is possible that what we have estimated so far is the impact of corruption rather than that of inequality. The corruption data are obtained from the International Country Risk Guide. The data have been rescaled so that greater values correspond to more corruption. Column 2 adds democracy, as it also tends to be correlated with inequality (Eriksson & Persson, 2003) and has been found to be important in explaining variation in water pollution (Lin & Liscow, 2013). The Polity2 Index, from the Polity IV dataset, is used to measure democracy. In column 3, we add literacy, which relates to inequality (Torras & Boyce, 1998) and was found to be significant in Barua & Hubacek (2009). Literacy data for India are available from the WDI but have many gaps. For that reason, we use the secondary school enrollment rate (which has far fewer missing data) from the same source as a proxy for literacy 11. In column 4, all additional explanatory variables are entered in the regression simultaneously.
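The dynamic OLS idea referred to above can be sketched as an ordinary regression of the dependent variable on the level of the regressor plus leads and lags of its first difference, with the coefficient on the level read as the long-run effect. The snippet below is a minimal single-regressor sketch with one lead and one lag, using only the standard library. The simulated data, the true slope of 3, and the helper names are assumptions for illustration; standard errors, which require a long-run variance correction, are not computed.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def dols(y, x, p=1):
    """Regress y_t on [1, x_t, dx_{t-p}, ..., dx_{t+p}]; return the coefficient on x_t."""
    dx = [0.0] + [b - a for a, b in zip(x, x[1:])]  # dx[t] = x[t] - x[t-1]; dx[0] unused
    rows, targets = [], []
    for t in range(p + 1, len(x) - p):
        rows.append([1.0, x[t]] + [dx[t + j] for j in range(-p, p + 1)])
        targets.append(y[t])
    k = len(rows[0])
    # Normal equations (X'X) b = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)[1]

# Simulated data (hypothetical): x is a random walk, y has a long-run slope of 3.
random.seed(0)
x = [0.0]
for _ in range(39):
    x.append(x[-1] + random.gauss(0, 1))
y = [2.0 + 3.0 * xi + random.gauss(0, 0.1) for xi in x]
print(round(dols(y, x), 2))
```

The leads and lags of the differenced regressor absorb the correlation between the regressor and the error term, which is the source of the robustness to endogeneity mentioned in the text.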
Overall, the estimated effect of inequality is not very sensitive to which control variables are entered in the regression. In all cases, the coefficient of inequality remains statistically significant. The democracy and literacy variables are not significant. Corruption is statistically significant at the 5% level with the expected sign in column 1 12. In column 4, corruption is not significant, which is understandable, since more degrees of freedom are lost as more variables are included. Since only one variable is significant in column 4, the results in column 1 are preferred. Interestingly, we observe that the magnitude of the effect of inequality (1.498%) is nearly as large as that of corruption (1.591%), suggesting the importance of inequality in explaining water pollution 13. It is important to bear in mind that this finding is based on a small sample. However, the estimated effect of corruption is within the range reported in a previous study, -0.681 to -1.682% (see Table 2 in Sigman (2014)) 14, and that gives us some confidence about the reliability of our finding. Because inequality and corruption have different distributions, we also compute standardized coefficients for both variables. For column 1, we find that a one-standard-deviation increase in inequality increases water pollution by 0.544 standard deviations; a one-standard-deviation increase in corruption increases water pollution by 0.565 standard deviations. Again, the magnitude of the effect of inequality is nearly as large as that of corruption, meaning that inequality is almost as important as corruption in determining water pollution.

Notes to Table 8: Sample periods for columns 1-4 are 1986-2007, 1980-2007, 1980-2007, and 1986-2007, respectively. t-statistics in parentheses. **Statistically significant at 5%. Given the small sample, only one lead and one lag of the regressors are used in the estimation. All coefficient estimates are converted to elasticities for comparability.

11 Data for missing years are interpolated.
12 A cointegration test with corruption included is not undertaken because corruption is not persistent, that is, it is I(0) (Seldadyo & De Haan, 2011). Furthermore, our unit root tests do not provide strong evidence that corruption is I(1).
13 Note that although inequality has a relatively larger t-statistic than corruption does, such statistics only indicate statistical significance, not the economic or practical significance of variables (Wooldridge, 2013, p. 135).
14 Corruption elasticities in Sigman (2014) are computed by multiplying its estimated coefficients of corruption by its sample mean of corruption.

Water Policy Vol 23 No 4, 993
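The comparison of inequality and corruption in Table 8 relies on two rescalings of a regression coefficient: a standardized coefficient and an elasticity. The sketch below shows one common way to compute each. Evaluating the elasticity at the sample means is an assumption made here, since the text does not state how the conversion was done, and the data, slope, and function names are hypothetical.

```python
import statistics

def standardized_coefficient(b, x, y):
    """Effect of a one-standard-deviation change in x, in standard deviations of y."""
    return b * statistics.stdev(x) / statistics.stdev(y)

def elasticity_at_means(b, x, y):
    """Percent change in y per 1% change in x, evaluated at the sample means."""
    return b * statistics.mean(x) / statistics.mean(y)

# Hypothetical data and slope, not values from the study.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
b = 2.0  # slope of y on x in original units
print(standardized_coefficient(b, x, y))
print(elasticity_at_means(b, x, y))
```

Standardized coefficients remove the dependence on each variable's units and spread, which is why they are useful when two regressors, such as inequality and corruption, are measured on very different scales.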
Our results so far show that inequality increases water pollution. An increase in water pollution may also lead to an increase in inequality because bottom-class people are more exposed to water pollution (and thus more vulnerable to waterborne diseases) 15. Sickness reduces productivity and increases absenteeism at work and in school. This consequently reduces the earning capacity of bottom-class adults and the potential income of their children, perpetuating poverty and reproducing inequality. Therefore, causality may run in both directions: not only from inequality to water pollution but also from water pollution to inequality. To test for long-run causality, this paper performs a test of weak exogeneity (a test of zero restrictions in the α matrix). A rejection of the null of weak exogeneity indicates long-run (Granger) causality (Hall & Milne, 1994). In Table 9, the null hypothesis of weak exogeneity of water pollution and the null hypothesis of weak exogeneity of inequality are both rejected at the 5% level; long-run Granger causality runs in both directions, from inequality to water pollution and from water pollution to inequality. These results hold even when income is included in our model (model 2) and when Gini is used instead of Theil (model 3). This finding justifies our use of the Johansen approach, which treats all variables as endogenous.
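A test of zero restrictions on the α matrix is commonly carried out as a likelihood ratio test that compares the eigenvalues of the unrestricted model with those obtained when the adjustment coefficient of one variable is forced to zero. The sketch below assumes that form of the test and a single cointegrating vector, so the statistic is compared with a chi-squared distribution with one degree of freedom. The text does not state which statistic underlies Table 9, and the eigenvalues and function name here are hypothetical.

```python
import math

CHI2_1DF_5PCT = 3.841  # 5% critical value of the chi-squared distribution, 1 d.o.f.

def weak_exogeneity_lr(T, unrestricted, restricted):
    """LR statistic: T * sum of ln[(1 - restricted_i) / (1 - unrestricted_i)]."""
    return T * sum(math.log((1 - lam_r) / (1 - lam_u))
                   for lam_u, lam_r in zip(unrestricted, restricted))

# Hypothetical eigenvalues with r = 1; not taken from Table 9.
stat = weak_exogeneity_lr(T=31, unrestricted=[0.5], restricted=[0.3])
print(f"LR = {stat:.3f}, reject weak exogeneity at 5%: {stat > CHI2_1DF_5PCT}")
```

A large statistic means that imposing a zero adjustment coefficient costs a lot of fit, so the variable does respond to deviations from the long-run relation and is not weakly exogenous.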
The long-run causality tests indicate the direction of the relationship (i.e., from water pollution to inequality) but do not determine its sign. Does an increase in water pollution lead to an increase in inequality? To answer this, we compute an impulse-response function from our estimated VECM. Figure 1 plots the response of inequality to a shock in water pollution over a period of 10 years. As a robustness check, the impulse-response function using Gini is also reported. In both panels, we find that a shock to water pollution has a significant, permanent, positive effect on inequality 16. The full impact is reached after 2 years for Theil and after 5 years for Gini. Furthermore, the bootstrapped confidence interval in both panels lies almost entirely in the positive domain. We conclude that there is a positive effect of water pollution on inequality.
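The way a shock can have a permanent effect in a cointegrated system can be seen in a small simulation. The sketch below traces the response of a two-variable VECM to a one-off unit shock in the first variable. It assumes no lagged differences, so that the levels follow y_t = (I + αβ′) y_{t-1} + ε_t, and it uses a simple unit shock rather than an orthogonalized one and no bootstrapped confidence bands. The parameter values are hypothetical and are not the estimates behind Figure 1.

```python
def mat_vec(a, v):
    """Multiply matrix a (list of rows) by vector v."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def impulse_responses(alpha, beta, shock, horizons):
    """Responses of y to a one-off shock in a VECM with no lagged differences.

    The levels follow y_t = (I + alpha * beta') y_{t-1} + e_t.
    """
    n = len(alpha)
    a = [[(1.0 if i == j else 0.0) + alpha[i] * beta[j] for j in range(n)]
         for i in range(n)]
    path, state = [], list(shock)
    for _ in range(horizons + 1):
        path.append(state)
        state = mat_vec(a, state)
    return path

# Hypothetical parameters; variable order is (pollution, inequality).
alpha = [-0.5, 0.25]   # adjustment speeds
beta = [1.0, -1.0]     # cointegrating vector: pollution - inequality
path = impulse_responses(alpha, beta, shock=[1.0, 0.0], horizons=10)
for h, (pollution, inequality) in enumerate(path):
    print(h, round(pollution, 3), round(inequality, 3))
```

In this example, the response of the second variable rises from zero and settles at a positive level instead of dying out, which is the pattern described as a permanent positive effect.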
Future research that examines the effect of inequality on environmental outcomes may need to tackle the endogeneity bias that arises from reverse causality running from more pollution to higher inequality. The reason is that such reverse causality would bias the estimated effect of inequality on pollution upward. To put it another way, if pollution has a positive effect on inequality, then the estimated effect of inequality on pollution tends to be biased in the positive direction.
15 Access to clean water in India is correlated with a person's social class (Zargar, 2018).

CONCLUSION
Although India is notorious for its water quality and inequality, little attention has been paid to the question of whether these two phenomena are related. Previous literature that examined the relationship between water quality and inequality did not explicitly analyze the case of India. Although lessons can still be learned from the empirical literature, it suffered from many econometric issues, which were addressed in the present study. We found that inequality has a robust positive long-run effect on water pollution in India. There are several possible reasons why inequality increases water pollution in India (in no particular order). One is that the rich consume more water than is necessary and thus produce more pollution. Another is that the wealthy set the norm for what defines a high-status lifestyle, inducing people to spend and consume more (with more production in the economy as a result). A third plausible reason is that inequality erodes trust, making cooperation to protect common resources difficult. A fourth is that inequality causes people to think more about economic issues and shifts their attention away from environmental problems, thereby reducing demand for a better environment. The reader should bear in mind that these are just possible causal pathways, and more research is needed to develop a deeper understanding of the interplay between inequality and water pollution in India.
We showed that the magnitude of the effect of inequality on water pollution is almost as large as that of corruption, a factor that is often seen as an important cause of water pollution (Rowlatt, 2016). This suggests that reducing inequality is almost as important as curbing corruption in addressing water challenges in India. Given that an increase in water pollution leads to an increase in inequality, future studies may need to consider the possible endogeneity of inequality when assessing the impact of inequality on water pollution or other environmental outcomes.

POLICY RECOMMENDATION
From a policy point of view, reducing inequality is important not only for fairness but also for the environment. In 2016, India discontinued the wealth tax due to high collection costs, and in 2019, the government announced a corporate tax cut to spur the economy. This study recommends that the government reconsider these decisions, since inequality may worsen as a result (Jones, 2015; Nallareddy et al., 2018), which can in turn have an unintended effect on the country's water. Although reducing tax collection may spur more economic growth, which according to the EKC hypothesis could later be good for the environment (by increasing public demand for environmental protection, for example), we did not find evidence of the EKC for water pollution in India. It is estimated that the corporate tax revenue loss due to the recent tax cut would be about $20 billion (around 0.7% of GDP) (The Economist, 2019b, para. 3). India also suffers a significant loss due to tax avoidance. For example, in 2013, India's corporate tax revenue loss due to tax avoidance was estimated to be between $41.17 billion and $47.53 billion, or between 2.34 and 2.70% of GDP (Cobham & Janský, 2018). It is well known that, in reducing water pollution, the country's water sector faces a huge financial challenge in funding investments in capital-intensive modern wastewater collection and treatment (Narain, 2016). If India could increase its tax revenue, the additional revenue could be used to finance the necessary investments. It could also be used to fund vital public services that could help reduce inequality, such as education and healthcare for people from all walks of life in India. The government can also consider a consumption-based tax to reduce consumption among high consumers, in addition to increasing tax collection (Mohan, 2019).