Abstract
The United States Environmental Protection Agency (USEPA) drinking water violation report is currently one of the most reliable measures of United States drinking water quality. While states continuously strive to comply with federal water quality standards, which keeps this documentation relevant, consumers are likely to perceive water quality through sensory aesthetics or physical and virtual social networks. This research quantifies the relationship between consumer perceptions and government-reported drinking water quality to provide insights to state water managers and policymakers. We evaluated consumer perceptions of tap water using weekly social media data. The online search returned 898,709 mentions and 799,035 posts. Net sentiment, measured as the number of positive posts minus the number of negative posts divided by the number of posts expressing sentiment and multiplied by 100, ranged from −100 to 100. Net sentiment was uncorrelated with USEPA weekly water quality violations for most states. For Washington, net sentiment was correlated with violations related to arsenic standards (−0.223) and with the total number of violations (−0.220). For California, net sentiment was correlated with violations related to disinfectants and other organic compounds (−0.295). In many cases, water violations in one city became national news, which eclipsed local water issues circulating on social media.
HIGHLIGHTS
Estimated state sentiment scores on tap water perception in the U.S.
Compiled government agency data on water quality violation reports.
Found no correlation between sentiment scores and government agency data for most states.
Increasing consumer engagement and awareness of violation report data is needed.
INTRODUCTION
The quality and acceptance of drinking water distributed by public water systems (PWS) are essential to consumer health and well-being. Drinking safe, good-quality water is important for avoiding the negative health consequences of contaminated water (USEPA 2015a). If consumers perceive drinking water to be unsafe, regardless of the true water quality status, those who can afford to will increase expenditures on bottled water or other filtered water sources to minimize negative health outcomes. Research finds that when United States (U.S.) consumers perceive tap water to be unsafe, they are likely to switch to more expensive bottled water. This occurs despite the lack of evidence that bottled water is safer than tap water and despite the negative impact plastic bottles have on the environment (Opel 1999; Hu et al. 2011; Saylor et al. 2011).
In the U.S., safe drinking water is assured by legislative regulations and standards, implementation and enforcement of these regulations and standards, and government funding to support and maintain infrastructure. The Safe Drinking Water Act (SDWA), passed in 1974 and amended in 1986 and 1996, forms the basis of public drinking water legislation. The SDWA sets legal standards for drinking water contaminant levels and treatment protocols to protect public health under the administration of the United States Environmental Protection Agency (USEPA) and state agencies. To ensure compliance, PWS must inspect and report contaminant levels to the states following the water supply guidance manual (USEPA 2000). States review the results and may conduct their own tests on water samples. The USEPA reviews state violation reports and assists PWS with compliance (Tiemann 2014). PWS are responsible for notifying the public of any health-threatening violation within 24 h (Tiemann 2014). The USEPA also provides financial assistance to support rural water systems, projects addressing exceedance of acceptable lead levels, and watershed protection infrastructure. Over $32 billion has been spent on 13,183 projects from 1997 to 2016 to improve water quality (USEPA 2015a). Before the SDWA, about 40% of PWS did not meet federal standards (USEPA 1999). Today, the U.S. has some of the safest drinking water in the world. More than 90% of U.S. tap water meets all standards set by the SDWA and similar policies (Salzman 2014). Despite this success, political and scientific challenges remain, including providing the same high water quality standards to all Americans (Weinmeyer et al. 2017). There are over 150,000 PWS in the U.S., and most of them are small systems that serve fewer than 10,000 people (USEPA 2015b). Over 90% of these systems rely on groundwater, and the remainder on surface water (USEPA 2015b).
This scope and variation make it difficult for all PWS to comply consistently with SDWA standards. For example, in 2019, the USEPA issued more than 4,500 severe health standard violations. The latest Natural Resources Defense Council (NRDC) report also found a strong association between inadequate enforcement of standards and certain disadvantaged groups, and concluded that information regarding water contamination should be more accessible and conveyed more efficiently (Fedinick et al. 2019).
While government agencies rely on lab testing and onsite monitoring to ensure tap water complies with health standards, consumers put their trust in these institutions. At the same time, consumers evaluate tap water quality through sensory aesthetics including taste, odor, and color. Research finds that sensorial information affects consumer perceptions of drinking water quality (de França Doria 2010; WHO 2017). Consumers may also associate negative drinking water sensory aesthetics with health risks (Jardine et al. 1999; Arnedo-Pena et al. 2003; de França Doria et al. 2005; Schade et al. 2015; WBTV 2019). In addition, public trust in PWS varies across demographic groups (Pierce & Gonzalez 2017).
Consumers' ability to recognize changes in taste and odor of tap water may provide early warning signs of water quality deterioration, but these cues are generally limited (Whelton et al. 2007). Untrained consumers may detect taste and odor changes only when certain mineral or contaminant levels in tap water have reached a threshold that humans can perceive (Young et al. 1996; Dietrich & Gallagher 2013). Although some contaminants such as iron and copper affect the taste of water, other violations identified by the USEPA may be undetectable. Having a better understanding of how people communicate their water quality perceptions is helpful for identifying potential or ongoing water quality issues.
This study aims to provide insight into the relationship between consumer perceptions of tap water quality expressed on social media and water quality standard violations reported by the USEPA. We summarize the weekly online and social media content related to tap water to proxy consumer perceptions of tap water. These data are used to determine the net sentiment (number of positive posts minus the number of negative posts divided by the total number of posts with sentiment then multiplied by 100) for each of the lower 48 states. We analyzed the correlation between sentiment score and recorded weekly number of violations over the same period to explore the relevancy of social media data to actual water quality status as defined by government agency data.
PREVIOUS RESEARCH
Chloride, copper, iron, sulfate, manganese, and zinc affect the taste of water. Existing research finds relationships between water taste and quality. For example, consumers may interpret the subtle taste of chlorine as a sign of safe drinking water (Kelly & Pomfret 1997), but would consider higher levels undesirable (Bryan et al. 1973). Elevated levels of certain elements may cause water to taste salty, metallic, or bitter (Burlingame & Mackey 2007). Metallic and astringent tastes, experienced as a lingering aftertaste, more often arise from the corrosion or leaching of copper and iron (Burlingame & Mackey 2007). Consumers also may find water with low or no mineral content to taste flat (Burlingame & Mackey 2007). Water sensory studies are typically conducted in labs with trained panelists to generate consistent results (Bartels et al. 1986; Dietrich 2006). However, these settings may not accurately represent the broader population (Burlingame & Doty 2018).
Outside laboratory settings, consumers exchange information about tap water quality through physical or virtual social networks. Can unprompted virtual conversations about water help identify potential water quality standard violations? In recent years, there has been increasing attention focused on the collection and use of social media online data by individuals, industries, governments, and researchers. Social media and online data have been used for various purposes ranging from emergency management (Panagiotopoulos et al. 2014, 2016) to earthquake detection and evaluation (Earle 2010; Sakaki et al. 2010; Mendoza et al. 2019). Online reviews or comments are factored into consumer decision-making (Kim et al. 2008; Ruiz-Mafe et al. 2018). Private businesses conduct sentiment analysis of online content to understand consumer perceptions of various services and products (Chiarello et al. 2020). Government agencies and lawmakers could use online platforms as communication channels to convey information and understand consumer perceptions on various social issues (Bonsón et al. 2012; Kavanaugh et al. 2012; Driss et al. 2019). Despite its use for other products and events, to the authors' knowledge, social media data sentiment analysis has not been applied to consumer perceptions of tap water and its implications for positive health outcomes. This research contributes to the application of social media data on the topic of tap water that builds on previous research on sentiment analysis and further expands the literature on social media analysis.
DATA AND METHOD
Social media data and net sentiment
There are many databases, web search engines, and platforms available to collect information and data from internet content. Some platforms are tailored to news and business sources, such as LexisNexis, while others focus more on marketing and sales. Researchers, in collaboration with computer scientists, can also develop their own algorithms. This research employed the NetBase platform to collect social media data, including Twitter and other content such as blogs (NetBase 2020). NetBase is one of the leading social media search engines. The platform provides full service to users and offers access to a wide range of social media content. Similar to other full-service search engine platforms, NetBase's patented search engine employs a natural language processing (NLP) system and artificial intelligence (AI) tools to conduct sentiment analysis and classify posts into different categories (Li et al. 2003). NLP search engines provide accurate and reliable results for semantic searches with well-trained AI tools (Hayes et al. 2021). Previous academic research has incorporated data from the NetBase platform, including studies of perceptions of mosquito-borne and food-borne illnesses and threats (Jung et al. 2021; Widmar et al. 2021), public engagement in showcasing livestock events (Mahoney et al. 2020), and product development for private businesses (Carr et al. 2015).
Data collected in this research were not limited to any particular social media platform. Since no previous work has established how water quality is communicated on social media, a wide net was cast to analyze all data. Blogs, travel websites, and other such platforms could and do include discussion of water quality. Weekly volume data were collected from 12:00 AM on December 16, 2018 to 11:59 PM on January 9, 2021, resulting in 108 weeks of data for each state. Information was downloaded on March 11, 2021. It is important to note the download date because online posts may be removed or reinstated by the author or moderator at any point after posting.
A query including 13 terms was developed to gather social media posts related to tap water quality. Terms included: tap water, #tapwater, city water, #citywater, public water, #publicwater, water from the tap, piped water, tap-water, #tap-water, faucet water, water from the faucet, and mains water. Geography was limited to the lower 48 states and posts in English. The collection of data in other languages is possible, but given the frequent use of slang and other shorthand terms in social media posts, the fluency required for interpretability is high. The authors, therefore, chose to focus on posts exclusively in English. A more inclusive sample would include Spanish and other language posts. To compare social media data between states, data were limited to posts for which a location could be identified. For some social media data such as Twitter, the application programming interface allows us to determine the location of the post when a Twitter Place (‘geo-tag’) is attached to the posting. If there is no ‘geo-tag’ associated with the posting, the user registered location is used. Facebook and Instagram, two other major social media outlets, do not provide geolocation data. At the national level, the location of posts can be determined broadly through the internet domain. For example, the country of origin can be determined by addresses that include .uk or .fr as well as codes within the domain. It is important to note that not all posts contained enough geographical data to even begin to determine state-level location. Difficulty determining the location at the time of post is a limitation of not only this research, but all social media data. Zheng et al. (2018) summarized existing approaches of social media post location identification and confirm that location identification of social media posts is a complicated issue requiring additional researcher attention. 
We aimed to be transparent by noting the number of posts we were able to classify for each state and for other demographic categories. Only a subset of the total number of data collected was able to be further classified at the state level.
An important part of collecting social media data is ensuring that the data reflect the targeted topic. Because of slang, colloquial phrasing, and bots, a random subset of the posts collected using the keywords was manually checked by the researchers to determine whether the posts were related to the search topic. Phrases and terms unrelated to tap water quality were removed. For example, advertisements related to the shoe Nike SB GT Blazer contained water-related terms and were disqualified from inclusion. Phrases excluded due to promotion by bots (autonomous programs on the internet that can interact with other systems or users) included #notesfromnationalemergency. The focus of this research is water quality, not other water-related issues such as the cost of water. Therefore, the terms water disconnected and water is disconnected were excluded from search results.
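The query and exclusion logic described above can be illustrated with a minimal sketch. It assumes simple case-insensitive substring matching, which is a simplification for illustration and not the NetBase query syntax; the function name `matches_query` is hypothetical.

```python
# The 13 query terms listed in the text.
QUERY_TERMS = [
    "tap water", "#tapwater", "city water", "#citywater",
    "public water", "#publicwater", "water from the tap", "piped water",
    "tap-water", "#tap-water", "faucet water", "water from the faucet",
    "mains water",
]

# Phrases the text reports excluding because they concern service
# disconnection (cost of water) rather than water quality.
EXCLUDED_PHRASES = ["water disconnected", "water is disconnected"]


def matches_query(text: str) -> bool:
    """Return True if a post contains a query term and no excluded phrase.

    Assumption: plain case-insensitive substring matching stands in for
    the platform's own query engine.
    """
    lowered = text.lower()
    if any(phrase in lowered for phrase in EXCLUDED_PHRASES):
        return False
    return any(term in lowered for term in QUERY_TERMS)
```

Under these assumptions, a post such as "Is #TapWater safe?" would be retained, while "My tap water is disconnected again" would be dropped by the exclusion list.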
Both the number of posts and mentions found using the search criteria were recorded. Posts are the number of documents containing mention of the topics (NetBase 2020). Mentions are individual sentences within the post that mention the primary terms, in this case the tap water-related terms (NetBase 2020). The number of mentions will never be less than the number of posts, as each post will contain at least one, if not multiple mentions. For example, in a single blog post, someone may say ‘The tap water in city A is wonderful. However, the tap water in city B is terrible.’ This blog post would count as one post, but contains two mentions of tap water, one which would be classified as negative, and the other positive. The number of posts and mentions was recorded weekly for the study period and reflects the volume of data on social media.
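The distinction between posts and mentions can be sketched as follows. This is an illustration only: it assumes a naive sentence splitter on terminal punctuation and a single search term, whereas the platform's own segmentation is not described in the source. The helper `count_mentions` is a hypothetical name.

```python
import re


def count_mentions(post: str, term: str = "tap water") -> int:
    """Count the sentences in a post that contain the term.

    Assumption: sentences end with '.', '!' or '?' followed by whitespace.
    """
    sentences = re.split(r"(?<=[.!?])\s+", post.strip())
    return sum(1 for sentence in sentences if term in sentence.lower())


posts = [
    "The tap water in city A is wonderful. However, the tap water in city B is terrible.",
    "I only drink tap water.",
]

# A post is counted once if it contains at least one mention.
n_posts = sum(1 for p in posts if count_mentions(p) > 0)
# Mentions are counted per sentence, so they can exceed the post count.
n_mentions = sum(count_mentions(p) for p in posts)
```

Here the first post contributes two mentions and the second contributes one, so the mention count (3) is at least as large as the post count (2), as the text states.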
Retweets are a process on Twitter where a person can, for all intents and purposes, quote another person's post and either add additional text at the top or not. The retweets can be either removed to reduce the amplification of the opinion if they are not providing additional information to a specific problem or kept to capture the influence of a broader issue depending on the purpose of the research (Gasco et al. 2019). Our analysis is on the general perceptions of water quality, in addition to state-specific analyses. Therefore, we wanted to be able to identify spikes in the overall number of posts and mentions. We, hence, left retweets in the dataset. In terms of sentiment analysis, the original tweet, and any additional text that was added to a quoted or ‘retweeted’ tweet, was analyzed separately.
Demographic information was also determined for this analysis. Determining non-stated demographics from social media often requires several assumptions. Self-reported information through the author's profile included gender, interests, and profession (NetBase 2016). Self-reported gender is available for Twitter data, with Twitter allowing participants to select male or female, or to write in their preferred gender. For those who did not declare a gender, gender was assigned based on the popularity of the poster's name among males and females. Therefore, only male and female tweets are reported. We acknowledge that this limits reporting to binary genders. Age was imputed by the NetBase software using U.S. Social Security Administration data. Trends in names occur over time, and the stated name of the author of the post or blog was used to probabilistically determine their birth year based on that name's frequency (popularity) for any given birth year (NetBase 2016). Research has shown that gender and age can be inferred from social media user identifiers and first and last names (Lansley & Longley 2016; Hu et al. 2021) and may achieve 77–95% accuracy (Tang et al. 2011). However, this form of identification still leaves much to be desired. Many people have social media handles that do not include their names at all, which limits the number of posts that can be identified. We reported the number of posts that could be identified for each demographic category.
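The idea of inferring a birth year from name popularity can be sketched as below. This is not the NetBase algorithm, which the source does not detail; it simply assumes that the probability of each birth year is proportional to how many people received the name in that year. The counts are made-up toy values, not Social Security Administration figures.

```python
def birth_year_distribution(counts_by_year: dict) -> dict:
    """Turn per-year name counts into a probability for each birth year.

    Assumption: P(year | name) is proportional to the number of people
    given that name in that year.
    """
    total = sum(counts_by_year.values())
    if total == 0:
        raise ValueError("no records for this name")
    return {year: n / total for year, n in counts_by_year.items()}


# Hypothetical counts for one first name (illustrative values only).
toy_counts = {1950: 100, 1980: 300, 2000: 600}

dist = birth_year_distribution(toy_counts)
most_likely_year = max(dist, key=dist.get)
```

A name absent from the reference data, or a handle with no name at all, yields no estimate, which is consistent with the limitation noted above that only a subset of posts could be assigned an age.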
Self-reported professions are available for Twitter, Instagram, and several other smaller sources. Self-reported professions were grouped into categories. The domains and sources of posts were also recorded. A source provides a general idea of where a post appeared, for example on a news site. A domain is a more detailed example of where the post appeared, for example cnn.com.
Beyond the measurement of volume, determining whether people were talking about tap water in a positive or negative manner was important for assessing whether these data could be used to evaluate water quality. The positivity or negativity of each post was determined using NLP (NetBase 2015). Although social media sentiment analysis is improving, there are still some topic-specific instances that require human correction. For example, although the word ‘frightening’ is negative in most instances, in research on the U.S. holiday Halloween it may be positive or neutral. Therefore, the sentiment of approximately 10% of posts was spot-checked by researchers to ensure sentiment assignment accuracy. Posts were read and assigned a negative, positive, or neutral sentiment and were then checked against the algorithm's assignment. Adjustments can be made by reassigning words to be positive, negative, or neutral as needed. For this analysis, no reassignment was necessary. The top five positive and negative attributes, emotions, terms, and hashtags found in posts were reported. It is important to note that for items such as hashtags and terms, the language surrounding the hashtag or term was analyzed to determine negativity or positivity. This may result in the same term or hashtag appearing as both negative and positive depending on the surrounding language.
Net sentiment is bounded between −100 and 100. Expressing net sentiment as a percentage is simply a presentational convenience; it could equally be presented as a decimal between −1 and 1. We calculated the weekly net sentiment score, which tracks changes in net sentiment over time while controlling for volume. The average, standard deviation, minimum, and maximum of the net sentiment values of each state during the study period were reported.
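As a minimal sketch, the net sentiment definition given in the introduction (positive posts minus negative posts, divided by the number of posts expressing sentiment, multiplied by 100) can be written as a small function. The function name and the choice to raise an error for weeks with no sentiment-bearing posts are assumptions made for illustration; the source does not say how such weeks were handled.

```python
def net_sentiment(positive: int, negative: int) -> float:
    """Net sentiment on the -100 to 100 scale.

    Neutral posts are excluded: the denominator counts only posts
    that express positive or negative sentiment.
    """
    total = positive + negative
    if total == 0:
        # Assumption: a week with no sentiment-bearing posts has no score.
        raise ValueError("net sentiment is undefined without sentiment posts")
    return 100.0 * (positive - negative) / total


# Three positive posts and one negative post in a week.
example_score = net_sentiment(positive=3, negative=1)
```

Because the difference is divided by the number of sentiment-bearing posts, a week with 30 positive and 10 negative posts receives the same score as a week with 3 positive and 1 negative, which is how the measure controls for volume.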
USEPA data
The USEPA measures water quality against predetermined standards and rules. The agency also reports violations if a water supply system does not meet the standards and rules. USEPA data (USEPA 2020a, 2020b) were downloaded on March 11, 2021 and classified following Allaire et al. (2018). To match the social media data period, only violations occurring between December 16, 2018 and January 9, 2021 were included. Violations for total coliform, treatment rule and nitrate, arsenic, lead and copper, and other violations were included and categorized following Allaire et al. (2018). The ‘Other violations’ category includes violations of the ‘stage 1 disinfectants and disinfection byproducts rule’, ‘stage 2 disinfectants and disinfection byproducts rule’, ‘inorganic chemicals’, ‘volatile organic chemical’, and ‘synthetic organic chemicals and radionuclides’. The USEPA determines water quality standards for each of the categories (e.g., arsenic). For arsenic, the USEPA has determined that the allowable amount for drinking water is 0.01 mg/l or 10 parts per billion (ppb) (USEPA 2020a, 2020b). Any amount greater than the acceptable amount (determined based on human health criteria) results in a violation. Therefore, the more violations a state has, the lower the water quality from a human health and safety perspective.
We restricted violation counts to Community Water Systems (CWSs). A CWS serves at least 25 people at their primary residence. CWSs serve year-round populations and are subject to SDWA regulations. Our collection differed from Allaire et al. (2018) in that they only included CWSs serving more than 500 people. We opted to include rural communities to better match the social media data which were not limited to only less rural or urban areas. Violations for each category were downloaded for each state, then summarized to weekly violation numbers by each category to match the net sentiment time frame. The number of violations was added across all violation categories to enumerate the total number of violations for each state.
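The weekly summarization step can be sketched as follows. The study dates come from the text; everything else is an assumption for illustration, including the idea that each violation record carries a single date and that weeks are consecutive 7-day blocks starting on December 16, 2018. The names `week_index` and `weekly_counts` are hypothetical.

```python
from datetime import date

STUDY_START = date(2018, 12, 16)  # first day of the first study week
STUDY_END = date(2021, 1, 9)      # last day of the last study week


def week_index(day: date) -> int:
    """Return the 0-based study week that contains `day`."""
    if not STUDY_START <= day <= STUDY_END:
        raise ValueError("date outside the study period")
    return (day - STUDY_START).days // 7


def weekly_counts(violation_dates) -> list:
    """Count violations per study week for one state and one category."""
    n_weeks = (STUDY_END - STUDY_START).days // 7 + 1
    counts = [0] * n_weeks
    for day in violation_dates:
        counts[week_index(day)] += 1
    return counts


# Hypothetical violation dates for a single state and category.
sample = [date(2018, 12, 18), date(2018, 12, 20), date(2019, 1, 2)]
counts = weekly_counts(sample)
```

With these boundaries the study period spans exactly 108 weekly bins, so each state's violation series lines up one-to-one with its weekly net sentiment series. Summing the per-category lists element-wise would give the weekly total described above.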
Correlation between social media data and number of violations
For each state individually, we analyzed the association over the study period between weekly net sentiment and the weekly number of violations within each category, including coliform violations (Coliform), lead and copper violations (Lead and copper), arsenic violations (Arsenic), violations of treatment rule and nitrate level (Treatment rule and nitrate), and other violations (Other), as well as the total number of violations (Total).
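The surviving text does not state which correlation coefficient was used. As an illustration only, the sketch below assumes a Pearson correlation between a state's weekly net sentiment series and its weekly violation counts; a rank-based coefficient would be computed differently. The function name `pearson_r` is hypothetical.

```python
from math import sqrt


def pearson_r(x, y) -> float:
    """Pearson correlation between two equal-length numeric series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant series (e.g. zero violations every week) has no variance.
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


# Toy series: sentiment falls as violations rise (illustrative values only).
toy_sentiment = [40.0, 10.0, -20.0, -50.0]
toy_violations = [0, 1, 2, 3]
r = pearson_r(toy_sentiment, toy_violations)
```

Under this reading, a negative coefficient means that weeks with more violations tend to have lower net sentiment. States with no recorded violations in a category would have a constant series, for which the coefficient is undefined.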
RESULTS
Social media volumes and net sentiment
The online search returned 898,709 mentions and 799,035 posts. Of the posts with identifiable gender (220,706), 54% were men and 46% were women (Table 1). The age of the posters was evenly distributed. Only 10% of posters (n = 223,578) were under 18, 19% were 55–64, and 17% were 25–34. Family was most often self-reported as an interest (30%, n = 107,660), followed by politics (24%) and religion (17%). Interestingly, for a water-related search, only 9% of posters listed food and drink as an interest. Top professions (n = 58,770) included creative arts (43%), education (9%), journalism (7%), student (7%), and science and research (7%). Examples of professions that fall under creative arts include actor, composer, painter, film director, and model. For education, example professions include lecturer, professor, and teacher. Science and research professions include jobs such as researcher, scholar, scientist, and biologist.
Social media data demographics
Category | Percentage of posts |
---|---|
Gender n = 220,706 | |
Male | 54 |
Female | 46 |
Implied age n = 223,578 | |
<18 | 10 |
18–24 | 12 |
25–34 | 17 |
35–44 | 15 |
45–54 | 14 |
55–64 | 19 |
65+ | 12 |
Interests n = 107,660 | |
Family | 30 |
Politics | 24 |
Religion | 17 |
Food and drink | 9 |
Pets | 9 |
Profession n = 58,770 | |
Creative arts | 43 |
Education | 9 |
Journalism | 7 |
Student | 7 |
Science and research | 7 |
Domains n = 697,515 | |
twitter.com | 56 |
reddit.com | 7 |
forum.grasscity.com | 1 |
tripadvisor.com | 1 |
booking.com | 1 |
Sources n = 697,515 | |
Twitter | 57 |
Forums | 20 |
Blogs | 12 |
News | 10 |
Consumer reviews | <1 |
Twitter.com (a social media platform) accounted for the largest share of posts with an identifiable domain (57%, n = 697,515). Reddit.com (a large collection of online forums) was the second-largest domain at 7%. Each of the remaining domains accounted for 1% or less. Top sources (n = 697,515) included Twitter (57%), forums (20%), blogs (12%), news (10%), and consumer reviews (<1%).
Spikes in the number of posts and mentions occurred as a result of a wide range of events (Figure 1). Around January 27, 2019, a spike in mentions and posts occurred as news of a report regarding the Flint, Michigan water crisis became public (Eggert 2019). This report outlined the crucial errors made by staffers in the Department of Environmental Quality's drinking water office. Discussion of water quality issues in the southeastern states was a large driver of posts around July 7, 2019 (Fouriezos 2019). Additional discussion surrounded poor tap water quality at Parchman state prison in Mississippi. The spike around August 11, 2019 was primarily driven by issues with tap water in Newark, New Jersey, with reports of officials offering bottled water (Fitzsimmons 2019). Additional posts were driven by a boil water notice in Marshfield, Massachusetts (Kukstis 2019). Reports of brain-eating microbes drove posts around September 20, 2020, when the Houston, Texas area was warned to stop using tap water (abcNews 2020). Spikes around November 15, 2020 were the result of a discussion of water filtration in Men's Journal (Men's Journal Editor 2020).
When considering the positivity or negativity of words and hashtags, it is important to remember that the surrounding words were analyzed to determine context. Top positive attributes included best tap water (10%), safe to drink (10%), taste (5%), clean tap water (3%), and clean (2%) (Table 2). Negative attributes mirrored the positive with taste (10%), brain-eating ameba (5%), contaminate with toxic (5%), contaminate (5%), and not safe to drink (4%). Leading emotions were unsurprising, with the words good (22%) and best (15%) topping the list for positive. For negative emotions, bad (6%) and warn (5%) topped the list. For both positive and negative terms, drink made up the highest percentage (positive 9%, negative 10%), followed by use (positive 6%) and using (negative 5%). The negativity or positivity of the word drink depended on the sentiment of the words surrounding it. Safe, good, and filter each made up 3% of the top positive terms. Not drink, residents, and bottled water accounted for 3, 2, and 2% of the negative terms, respectively. Although residents may be a surprising term when considering water quality, many people were talking about the plight of residents who were experiencing poor water quality, such as those in Flint, Michigan. This resulted in ‘residents’ being a negative term. Top positive hashtags included #covid19 (7%), #water (6%), #coronavirus (5%), #drinktap (3%), and #utah (3%). Many posts related to COVID-19 were people explaining that water was safe during the pandemic. The safe status of water during the pandemic was accompanied by positive language, rendering #covid19 positive in the context of water discussions. Top negative hashtags included #flintwatercrisis (6%), #saveflintchallenge (5%), #metrodetroit (4%), #detroit (4%), and #water (3%).
Attributes, emotions, terms and hashtags from social media data
| Category | Positive | Percent | Negative | Percent |
|---|---|---|---|---|
| Top attributes n = 24,204 | Best tap water | 10 | Taste | 10 |
| | Safe to drink | 10 | Brain-eating ameba | 5 |
| | Taste | 5 | Contaminate with toxic | 5 |
| | Clean tap water | 3 | Contaminant | 5 |
| | Clean | 2 | Not safe to drink | 4 |
| Top emotions n = 32,442 | Good | 22 | Bad | 6 |
| | Best | 15 | Warn | 5 |
| | Great | 5 | Not like | 3 |
| | Love | 3 | Terrible | 3 |
| | Delicious | 2 | Hate | 3 |
| Top terms n = 238,814 | Drink | 9 | Drink | 10 |
| | Use | 6 | Using | 5 |
| | Safe | 3 | Not drink | 3 |
| | Good | 3 | Residents | 2 |
| | Filter | 3 | Bottled water | 2 |
| Top hashtags n = 6,965 | #covid19 | 7 | #flintwatercrisis | 6 |
| | #water | 6 | #saveflintchallenge | 5 |
| | #coronavirus | 5 | #metrodetroit | 4 |
| | #drinktap | 3 | #detroit | 4 |
| | #utah | 3 | #water | 3 |
Overall, net sentiment for U.S. drinking water was barely positive at 3%. Several events caused either negative or positive spikes in net sentiment (Figure 2). Many of these are the same events that caused spikes in the number of mentions and posts. Net sentiment dropped to −51% around January 27, 2019 because of the issues with tap water quality in Flint, Michigan. Around April 21, 2020, net sentiment increased to 47% due to general discussion regarding quality tap water. Mirroring the spike in the number of posts and mentions, net sentiment around August 11, 2019 fell to −20% because of the unsafe tap water in New Jersey and Massachusetts. Reports of a brain-eating ameba in tap water in the Houston, Texas area induced a negative spike in net sentiment (−76%) around September 20, 2020. A positive spike in net sentiment occurred around November 15, 2020 due to the Men's Journal article.
The average net sentiment at the state level, as well as the minimum, maximum, and standard deviation for each state, are presented in Table 3. States with the highest average net sentiment were Louisiana (46), Wyoming (44), Iowa (23), Minnesota (19), and Indiana (17). Vermont had the lowest average net sentiment (−62), followed by Arkansas (−42), North Dakota (−32), Georgia (−30), and Idaho (−28). States with average net sentiment close to zero (between −2 and 2) were California (−2), Illinois (−2), Michigan (−1), North Carolina (−1), South Carolina (−1), Florida (1), New Hampshire (1), Massachusetts (2), and Ohio (2).
Statistics of net sentiment by state
State | Number of posts | Minimum | Maximum | Average net sentiment (Standard deviation) |
---|---|---|---|---|
Alabama | 708 | −100 | 100 | −15 (48) |
Arizona | 1,506 | −100 | 100 | −6 (45) |
Arkansas | 313 | −100 | 60 | −42 (48) |
California | 11,960 | −100 | 100 | −2 (47) |
Colorado | 2,252 | −67 | 100 | 15 (46) |
Connecticut | 596 | −100 | 100 | 5 (39) |
Delaware | 180 | −100 | 60 | 9 (41) |
Florida | 5,096 | −100 | 93 | 1 (45) |
Georgia | 2,424 | −100 | 100 | −30 (46) |
Idaho | 272 | −100 | 60 | −28 (62) |
Illinois | 2,846 | −100 | 100 | −2 (51) |
Indiana | 895 | −100 | 100 | 17 (61) |
Iowa | 535 | −96 | 100 | 23 (56) |
Kansas | 645 | −100 | 100 | −11 (55) |
Kentucky | 1,010 | −100 | 100 | 9 (39) |
Louisiana | 490 | −60 | 100 | 46 (38) |
Maine | 301 | −60 | 100 | 12 (54) |
Maryland | 1,302 | −100 | 100 | −8 (54) |
Massachusetts | 1,805 | −100 | 100 | 2 (51) |
Michigan | 2,434 | −100 | 100 | −1 (48) |
Minnesota | 1,219 | −100 | 100 | 19 (58) |
Mississippi | 251 | −90 | 100 | −3 (57) |
Missouri | 1,721 | −100 | 100 | 16 (51) |
Montana | 151 | −100 | 60 | −11 (66) |
Nebraska | 293 | −100 | 100 | −19 (47) |
Nevada | 1,231 | −100 | 100 | 7 (42) |
New Hampshire | 204 | −100 | 60 | 1 (41) |
New Jersey | 1,508 | −100 | 100 | −16 (58) |
New Mexico | 357 | −100 | 67 | 7 (49) |
New York | 6,367 | −100 | 100 | 6 (49) |
North Carolina | 2,050 | −83 | 100 | −1 (43) |
North Dakota | 130 | −100 | 33 | −32 (49) |
Ohio | 2,433 | −100 | 100 | 2 (52) |
Oklahoma | 636 | −67 | 100 | 3 (47) |
Oregon | 1,518 | −100 | 100 | 10 (47) |
Pennsylvania | 2,762 | −100 | 100 | 3 (47) |
Rhode Island | 205 | −100 | 100 | 10 (46) |
South Carolina | 700 | −100 | 100 | −1 (60) |
South Dakota | 68 | −100 | 100 | −28 (56) |
Tennessee | 1,549 | −100 | 100 | −5 (62) |
Texas | 8,365 | −84 | 100 | 3 (46) |
Utah | 484 | −100 | 100 | −20 (40) |
Vermont | 71 | −100 | 100 | −62 (48) |
Virginia | 1,551 | −100 | 100 | −4 (45) |
Washington | 2,154 | −100 | 100 | 7 (47) |
West Virginia | 301 | −60 | 60 | 10 (23) |
Wisconsin | 857 | −91 | 100 | −12 (44) |
Wyoming | 100 | 0 | 100 | 44 (34) |
State violations
The state violations summed over the study period are presented in Table 4. States with the lowest total violations over the study period were Delaware (20), Kentucky (38), Rhode Island (51), and North Dakota (53). States with the highest total violations were Texas (5,059), Pennsylvania (3,846), Washington (2,370), Oklahoma (2,093), and West Virginia (1,618). The leading violation categories are ‘Arsenic’, ‘Lead and copper’, ‘Treatment rule and nitrate’, and ‘Other’. ‘Coliform’ violations totaled 239 during the study period, while ‘Other’ violations totaled 11,884. The three states with the most ‘Arsenic’ violations were Texas (542), California (439), and Washington (416). For ‘Lead and copper’, they were Texas (2,028), New Jersey (486), and Kansas (354); for ‘Treatment rule and nitrate’, Pennsylvania (2,550), Texas (816), and Oregon (741); and for ‘Other’, Texas (1,672), Oklahoma (1,484), and Washington (1,185).
Number of water quality violations per state over the study period
State | Arsenic | Lead and copper | Coliform | Treatment rule and nitrate | Other | Total |
---|---|---|---|---|---|---|
Alabama | 1 | 30 | 0 | 5 | 128 | 164 |
Arizona | 224 | 350 | 0 | 274 | 199 | 1,047 |
Arkansas | 2 | 13 | 0 | 25 | 90 | 130 |
California | 439 | 285 | 6 | 424 | 93 | 1,247 |
Colorado | 42 | 338 | 0 | 520 | 475 | 1,375 |
Connecticut | 19 | 163 | 0 | 137 | 154 | 473 |
Delaware | 0 | 12 | 0 | 5 | 3 | 20 |
Florida | 4 | 128 | 2 | 533 | 182 | 849 |
Georgia | 51 | 58 | 0 | 169 | 518 | 796 |
Idaho | 96 | 85 | 0 | 262 | 118 | 561 |
Illinois | 49 | 104 | 0 | 49 | 238 | 440 |
Indiana | 12 | 138 | 0 | 64 | 69 | 283 |
Iowa | 30 | 51 | 0 | 103 | 124 | 308 |
Kansas | 43 | 354 | 0 | 208 | 239 | 844 |
Kentucky | 3 | 3 | 0 | 12 | 20 | 38 |
Louisiana | 31 | 226 | 0 | 630 | 194 | 1,081 |
Maine | 20 | 60 | 0 | 21 | 16 | 117 |
Maryland | 7 | 59 | 0 | 24 | 0 | 90 |
Massachusetts | 5 | 47 | 0 | 67 | 39 | 158 |
Michigan | 34 | 274 | 2 | 71 | 272 | 653 |
Minnesota | 24 | 55 | 2 | 15 | 23 | 119 |
Mississippi | 21 | 36 | 0 | 143 | 71 | 271 |
Missouri | 24 | 66 | 0 | 283 | 36 | 409 |
Montana | 20 | 47 | 0 | 139 | 151 | 357 |
Nebraska | 20 | 0 | 0 | 59 | 63 | 142 |
Nevada | 88 | 14 | 0 | 41 | 127 | 270 |
New Hampshire | 35 | 35 | 0 | 38 | 14 | 122 |
New Jersey | 24 | 486 | 0 | 197 | 292 | 999 |
New Mexico | 55 | 135 | 0 | 513 | 189 | 892 |
New York | 46 | 151 | 4 | 158 | 332 | 691 |
North Carolina | 11 | 202 | 0 | 71 | 362 | 646 |
North Dakota | 4 | 33 | 0 | 1 | 15 | 53 |
Ohio | 11 | 25 | 0 | 19 | 39 | 94 |
Oklahoma | 74 | 198 | 0 | 337 | 1,484 | 2,093 |
Oregon | 61 | 186 | 14 | 741 | 111 | 1,113 |
Pennsylvania | 104 | 193 | 25 | 2,550 | 974 | 3,846 |
Rhode Island | 0 | 17 | 0 | 7 | 27 | 51 |
South Carolina | 0 | 36 | 0 | 20 | 3 | 59 |
South Dakota | 3 | 21 | 0 | 37 | 33 | 94 |
Tennessee | 0 | 58 | 0 | 20 | 32 | 110 |
Texas | 542 | 2,028 | 1 | 816 | 1,672 | 5,059 |
Utah | 24 | 122 | 0 | 175 | 96 | 417 |
Vermont | 10 | 42 | 0 | 42 | 214 | 308 |
Virginia | 18 | 67 | 0 | 50 | 158 | 293 |
Washington | 416 | 74 | 181 | 514 | 1,185 | 2,370 |
West Virginia | 40 | 271 | 0 | 442 | 865 | 1,618 |
Wisconsin | 12 | 137 | 0 | 95 | 141 | 385 |
Wyoming | 25 | 24 | 2 | 64 | 4 | 119 |
Correlations between net sentiment and violations
For most states, the correlations are not statistically significant (Table 5). California's net sentiment is weakly correlated with the number of ‘Other’ violations (−0.295, p = 0.045), while Washington state's net sentiment is weakly correlated with the numbers of ‘Arsenic’ (−0.223) and ‘Total’ (−0.220) violations (p = 0.020). All three correlation coefficients are negative, indicating that an increase in the number of violations accompanies a decrease in net sentiment. Although California's ‘Other’ violations are not very high compared with other states, California had the highest number of social media posts during the study period (11,960) (Table 3). Washington also had a relatively high number of social media posts (2,154) (Table 3), and its ‘Arsenic’ and ‘Other’ violation counts both rank third in the country (Table 4). It is possible that consumers in California and Washington have a higher awareness of their water quality, as well as a greater willingness to share their opinions on social media regarding local and national water quality issues.
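To make the per-state comparison concrete, the sketch below correlates a weekly violation series with a weekly net sentiment series. It is illustrative only: this section does not state which correlation coefficient was used, so Pearson's r is assumed here, and the function names and weekly values are hypothetical. The p-value would come from a t distribution with n − 2 degrees of freedom; a library routine such as `scipy.stats.pearsonr` returns the coefficient and a two-sided p-value directly.

```python
import math
from typing import Optional, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Pearson correlation, or None when either series is constant."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two series of equal length with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A series with no variation (e.g. zero violations every week)
        # has no defined correlation.
        return None
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing r = 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1:
        raise ValueError("t statistic is undefined for |r| = 1")
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical weekly data for one state and one violation category.
weekly_violations = [0, 2, 1, 4, 3, 0]
weekly_net_sentiment = [10.0, -20.0, 0.0, -40.0, -30.0, 5.0]

r = pearson_r(weekly_violations, weekly_net_sentiment)
if r is not None:
    t = t_statistic(r, len(weekly_violations))
    print(f"r = {r:.3f}, t = {t:.3f}")
```

A negative r in this setup means that weeks with more violations tend to coincide with lower net sentiment. Returning `None` for a constant series reflects that no coefficient can be computed when a state records no violations of a given type during the period.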
Correlation between net sentiment and the number of violations by state
State | Arsenic | Lead and copper | Coliform | Treatment rule and nitrate | Other | Total |
---|---|---|---|---|---|---|
Alabama | −0.037 (0.702) | 0.032 (0.746) | | −0.166 (0.086) | 0.066 (0.496) | 0.062 (0.523) |
Arizona | −0.005 (0.956) | −0.001 (0.991) | | 0.002 (0.986) | 0.122 (0.209) | 0.048 (0.620) |
Arkansas | 0.177 (0.067) | −0.131 (0.178) | | −0.091 (0.349) | 0.184 (0.056) | 0.152 (0.115) |
California | 0.186 (0.053) | −0.017 (0.858) | 0.182 (0.059) | 0.187 (0.0528) | −0.205* (0.034) | 0.114 (0.240) |
Colorado | 0.003 (0.978) | 0.035 (0.718) | | 0.040 (0.684) | 0.079 (0.414) | 0.072 (0.456) |
Connecticut | 0.064 (0.508) | 0.065 (0.503) | | 0.103 (0.287) | 0.061 (0.528) | 0.100 (0.302) |
Delaware | | −0.167 (0.083) | | 0.015 (0.874) | 0.036 (0.710) | −0.110 (0.259) |
Florida | 0.111 (0.252) | 0.040 (0.684) | 0.070 (0.474) | 0.025 (0.792) | 0.017 (0.861) | 0.028 (0.773) |
Georgia | −0.038 (0.699) | 0.035 (0.716) | | 0.059 (0.546) | −0.076 (0.436) | −0.062 (0.524) |
Hawaii | 0.023 (0.809) | 0.081 (0.406) | 0.084 (0.386) | | | |
Idaho | −0.039 (0.690) | −0.082 (0.394) | | −0.028 (0.771) | 0.171 (0.078) | 0.026 (0.792) |
Illinois | 0.004 (0.970) | 0.082 (0.396) | | 0.092 (0.346) | −0.006 (0.949) | 0.023 (0.814) |
Indiana | −0.111 (0.254) | −0.026 (0.792) | | 0.008 (0.937) | 0.007 (0.945) | −0.019 (0.841) |
Iowa | −0.032 (0.744) | −0.127 (0.191) | | −0.011 (0.906) | 0.024 (0.807) | −0.013 (0.892) |
Kansas | −0.072 (0.457) | −0.021 (0.830) | | −0.086 (0.378) | 0.017 (0.858) | −0.028 (0.775) |
Kentucky | 0.081 (0.402) | −0.007 (0.941) | | −0.150 (0.121) | −0.147 (0.130) | −0.146 (0.132) |
Louisiana | −0.034 (0.727) | 0.123 (0.206) | | 0.056 (0.563) | 0.045 (0.644) | 0.086 (0.377) |
Maine | −0.110 (0.256) | 0.044 (0.650) | | −0.118 (0.224) | −0.121 (0.212) | −0.025 (0.795) |
Maryland | −0.081 (0.403) | 0.006 (0.952) | | 0.041 (0.671) | | 0.009 (0.925) |
Massachusetts | 0.020 (0.835) | 0.036 (0.710) | | 0.107 (0.270) | −0.002 (0.985) | 0.078 (0.420) |
Michigan | 0.053 (0.583) | 0.069 (0.478) | −0.017 (0.861) | 0.036 (0.710) | −0.07 (0.4291) | −0.016 (0.871) |
Minnesota | 0.099 (0.309) | 0.057 (0.557) | −0.009 (0.926) | 0.065 (0.502) | 0.092 (0.341) | 0.097 (0.319) |
Mississippi | −0.046 (0.639) | −0.016 (0.869) | | 0.028 (0.775) | −0.112 (0.247) | −0.040 (0.680) |
Missouri | 0.158 (0.102) | −0.038 (0.697) | | −0.040 (0.680) | −0.011 (0.910) | −0.025 (0.796) |
Montana | −0.169 (0.080) | −0.042 (0.668) | | −0.063 (0.517) | 0.117 (0.226) | 0.057 (0.557) |
Nebraska | −0.036 (0.712) | | | 0.198 (0.046) | −0.072 (0.460) | 0.000 (1.000) |
Nevada | −0.055 (0.570) | 0.029 (0.769) | | −0.030 (0.758) | 0.170 (0.078) | 0.129 (0.183) |
New Hampshire | 0.034 (0.729) | 0.157 (0.105) | | 0.002 (0.980) | 0.065 (0.504) | 0.077 (0.429) |
New Jersey | 0.026 (0.787) | −0.023 (0.814) | | 0.005 (0.959) | 0.034 (0.724) | 0.010 (0.922) |
New Mexico | −0.002 (0.981) | 0.063 (0.520) | | 0.072 (0.457) | −0.015 (0.874) | 0.048 (0.621) |
New York | 0.035 (0.722) | −0.112 (0.250) | 0.135 (0.165) | −0.113 (0.243) | 0.027 (0.782) | −0.022 (0.820) |
North Carolina | 0.034 (0.725) | −0.100 (0.302) | | −0.119 (0.221) | −0.062 (0.523) | −0.113 (0.245) |
North Dakota | 0.107 (0.269) | −0.052 (0.594) | | −0.136 (0.161) | 0.165 (0.088) | 0.044 (0.651) |
Ohio | −0.023 (0.812) | 0.003 (0.976) | | 0.025 (0.794) | 0.057 (0.557) | 0.042 (0.668) |
Oklahoma | 0.007 (0.939) | −0.100 (0.303) | | 0.009 (0.924) | −0.033 (0.733) | −0.037 (0.707) |
Oregon | −0.021 (0.826) | −0.025 (0.798) | 0.113 (0.242) | −0.036 (0.714) | 0.053 (0.589) | −0.016 (0.871) |
Pennsylvania | −0.053 (0.585) | 0.014 (0.889) | −0.044 (0.651) | −0.060 (0.535) | 0.010 (0.919) | −0.035 (0.718) |
Rhode Island | | 0.013 (0.892) | | −0.030 (0.758) | −0.058 (0.555) | −0.043 (0.660) |
South Carolina | | −0.141 (0.145) | | −0.054 (0.581) | 0.016 (0.867) | −0.121 (0.213) |
South Dakota | −0.136 (0.159) | −0.012 (0.902) | | −0.060 (0.538) | 0.100 (0.302) | 0.014 (0.883) |
Tennessee | | 0.009 (0.925) | | −0.054 (0.576) | 0.067 (0.490) | 0.036 (0.713) |
Texas | 0.132 (0.173) | 0.106 (0.273) | 0.036 (0.708) | 0.124 (0.202) | −0.019 (0.850) | 0.074 (0.443) |
Utah | 0.047 (0.630) | 0.100 (0.303) | | 0.034 (0.725) | −0.020 (0.840) | 0.064 (0.507) |
Vermont | 0.045 (0.646) | 0.033 (0.732) | | −0.056 (0.567) | 0.059 (0.543) | 0.057 (0.560) |
Virginia | 0.0657 (0.500) | 0.067 (0.487) | | 0.215 (0.025) | −0.023 (0.817) | 0.040 (0.677) |
Washington | −0.223* (0.020) | −0.133 (0.171) | −0.102 (0.292) | −0.180 (0.061) | −0.163 (0.092) | −0.220* (0.022) |
West Virginia | 0.035 (0.722) | 0.105 (0.280) | | −0.001 (0.993) | −0.046 (0.636) | −0.018 (0.850) |
Wisconsin | −0.097 (0.318) | −0.132 (0.174) | | −0.107 (0.271) | −0.065 (0.505) | −0.106 (0.276) |
Wyoming | 0.147 (0.129) | −0.087 (0.368) | −0.126 (0.193) | 0.015 (0.873) | 0.026 (0.790) | 0.042 (0.668) |
*Statistically significant at p<0.05 level.
DISCUSSION
In 2019, 78% of men and 65% of women used a social media platform of any kind (Pew Research Center 2019). When focusing on the topic of water, we found that slightly more posters were male than female. This likely reflects the makeup of social media users. Interestingly, a lower percentage of younger people were posting about water when compared with the older age groups. Young people were the earliest adopters of social media, and although the gap in usage has decreased, they are still more prevalent on social media platforms (Pew Research Center 2019). This may indicate that issues related to tap water are of greater interest to older individuals participating in online media.
The number of violations and net sentiment were uncorrelated for most states, with the exception of California and Washington. The correlation in California could be a result of higher numbers of weekly social media posts that may better reflect net sentiment related to drinking water issues. For Washington state, it could be a result of both a higher number of violations and a relatively high number of weekly posts. The results for these two states may begin to provide insights into why sentiment from social media posts does not reflect water quality in the other states. However, without further investigation, it is not possible to conclude that correlations would become significant if consumers in other states posted more frequently about their water quality. Also, given the limitations and issues surrounding correctly identifying the location of posts and posters, some data could not be assigned to a specific state. In addition, given the current legislation surrounding the level of granularity that social media and online data scraping can reach, as well as the privacy desires of individuals, it is unclear whether additional and more accurate location data will ever be made available.
Although the results indicate that it is unlikely that current U.S. social media data can be used to indicate water quality, potential water quality violations, or the need for additional testing, the data collected are still compelling. There are two trends that can be found in the data. One trend is that some words related to the posts are not necessarily related to USEPA violations, such as taste, good, and terrible. Although researchers find that lead and copper in drinking water could result in metallic smells and tastes (Burlingame & Mackey 2007), other contaminants like arsenic require lab testing. Additionally, based on the spikes in both mentions and sentiment, many posts were related to large national stories. For states in the U.S. that are trying to attract new residents or visitors, it is important to note that water quality issues are no longer just a local problem. Incidents of water quality issues appear to reverberate across the internet, potentially with long-term reputational impacts.
CONCLUSION
Quantifying the inconsistency between consumer perception and reported water quality and safety can be a useful practice to provide some insight to state water managers and policymakers. Comparing the numbers of violations reported by the USEPA with social media net sentiment scores at the state level over the same period, we found no statistically significant relationships between these two measurements except in California and Washington.
In this research, we used numbers of USEPA reported violations as a proxy of tap water quality and safety, while sentiment scores reflect consumers' perception of tap water, either positive or negative. Most of the states with higher numbers of USEPA violations did not have negative net sentiment scores. Of the top five highest violation states, only Arizona had a negative average net sentiment. As all states continue working to comply with federal and state water quality and safety standards, it is crucial to educate consumers regarding the water safety standards and make them aware of the health-related violation reports.
Social media demographic data show that older people were more engaged with the topic of water quality compared with younger people. Water managers and policymakers should consider this factor when trying to communicate with consumers on tap water quality information or future quality improvement. Analysis of the social media data also revealed that larger-scale violations in one state were discussed across the entire country, often eclipsing any local water issues that may be happening. Bad water violations can impact the good standing of a state or city beyond local residents.
Our research focused on state-level data because access to geo-tagged social media data is currently inconsistent. Due to limitations caused by the number of cellphone towers that are used to geo-tag social media posts, drilling down to the county level was not possible for states that are more rural. As sentiment analysis technology develops, it may be possible for future research to evaluate social media net sentiment at more granular levels. This research could be extended to include local water sensory tests to identify PWS that need to improve water taste (the USEPA secondary standard), although this standard is not currently enforced by federal policy or legislation.
There are additional limitations to social media data collection and language processing. First, demographic data from online content are difficult to quantify; hence, they were reported only in summary form and were not included in the analysis. Unless self-reporting of demographic data becomes more common on social media platforms, it is unlikely that such data can be used for extensive analysis. Second, although precautions were taken to improve language processing by having a subset of posts inspected by the researchers, there is still a chance that misclassification of sentiment occurred. The magnitude of the data as well as privacy laws currently prevent human inspection of all posts. Additionally, linguistic trends such as sarcasm can be difficult to interpret when presented textually and are often difficult to identify in spoken language as well. Third, location identification on social media posts is also a challenging issue. In this research, we used an already established, tailorable algorithm and cast a wide net for sentiment analysis. Future research may seek to limit the media sources studied to only those that explicitly provide location information.
DATA AVAILABILITY STATEMENT
The datasets generated during and/or analyzed for the current study are available from the corresponding author on reasonable request.