The United States Environmental Protection Agency (USEPA) drinking water violation report is currently one of the most reliable measures of United States drinking water quality. While states continuously strive to comply with federal water quality standards, which keeps this documentation relevant, consumers are likely to perceive water quality through sensory aesthetics or through physical and virtual social networks. This research quantifies the relationship between consumer perceptions and government-reported drinking water quality to provide insights to state water managers and policymakers. We evaluated consumer perceptions of tap water using weekly social media data. The online search returned 898,709 mentions and 799,035 posts. Net sentiment, measured as the number of positive posts minus the number of negative posts, divided by the number of posts expressing sentiment and multiplied by 100, was determined and ranged from −100 to 100. Net sentiment was uncorrelated with USEPA weekly water quality violations for most states. For Washington, net sentiment was correlated with violations related to arsenic standards (−0.223) and with the total number of violations (−0.220). For California, net sentiment was correlated with violations related to disinfectants and other organic compounds (−0.295). In many cases, water violations in one city became national news, which eclipsed local water issues circulating on social media.

  • Estimated state sentiment scores on tap water perception in the U.S.

  • Compiled government agency data on water quality violation reports.

  • Found no correlation between sentiment scores and government agency data for most states.

  • Increasing consumer engagement and awareness of violation report data is needed.

The quality and acceptance of drinking water distributed by public water systems (PWS) are essential to consumer health and well-being. Drinking safe, good-quality water is important for avoiding the negative health consequences of contaminated water (USEPA 2015a). If consumers perceive drinking water to be unsafe, regardless of the true water quality status, those who can afford to will increase expenditures on bottled water or other filtered water sources to minimize negative health outcomes. Research finds that when United States (U.S.) consumers perceive tap water to be unsafe, they are likely to switch to more expensive bottled water. This occurs despite the lack of evidence that bottled water is safer than tap water and despite the negative impact plastic bottles have on the environment (Opel 1999; Hu et al. 2011; Saylor et al. 2011).

In the U.S., safe drinking water is assured by legislative regulations and standards, implementation and enforcement of these regulations and standards, and government funding to support and maintain infrastructure. The Safe Drinking Water Act (SDWA), passed in 1974 and amended in 1986 and 1996, forms the basis of public drinking water legislation. The SDWA sets legal standards for drinking water contaminant levels and treatment protocols to protect public health under the administration of the United States Environmental Protection Agency (USEPA) and state agencies. To ensure compliance, PWS must inspect and report contaminant levels to the states following the water supply guidance manual (USEPA 2000). States review the results and may conduct their own tests on water samples. The USEPA reviews state violation reports and assists PWS with compliance (Tiemann 2014). PWS are responsible for notifying the public of any health-threatening violation within 24 h (Tiemann 2014). The USEPA also provides financial assistance to support rural water systems, projects addressing exceedance of acceptable lead levels, and watershed protection infrastructure. Over $32 billion was spent on 13,183 projects from 1997 to 2016 to improve water quality (USEPA 2015a). Before the SDWA, about 40% of PWS did not meet federal standards (USEPA 1999). Today, the U.S. has some of the safest drinking water in the world. More than 90% of U.S. tap water meets all standards set by the SDWA and similar policies (Salzman 2014). Despite this success, political and scientific challenges remain, including providing the same high water quality standards to all Americans (Weinmeyer et al. 2017). There are over 150,000 PWS in the U.S., and most of them are small systems that serve fewer than 10,000 people (USEPA 2015b). Over 90% of these systems rely on groundwater, and the remainder on surface water (USEPA 2015b). This scope and variation make it difficult for all PWS to comply consistently with SDWA standards. For example, in 2019, the USEPA issued more than 4,500 severe health standard violations. The latest Natural Resources Defense Council (NRDC) report also found a strong correlation between inadequate enforcement of standards and the presence of certain disadvantaged groups, and concluded that information regarding water contamination should be accessible and conveyed more efficiently (Fedinick et al. 2019).

While government agencies rely on lab testing and onsite monitoring to ensure tap water complies with health standards, consumers put their trust in the institution. At the same time, consumers evaluate tap water quality through sensory aesthetics including taste, odor, and color. Research finds that sensorial information affects consumer perceptions of drinking water quality (de França Doria 2010; WHO 2017). Consumers may also associate negative drinking water sensory aesthetics with health risks (Jardine et al. 1999; Arnedo-Pena et al. 2003; de Franca Doria et al. 2005; Schade et al. 2015; WBTV 2019). In addition, public trust in PWS varies across demographic groups (Pierce & Gonzalez 2017).

Consumers' ability to recognize changes in taste and odor of tap water may provide early warning signs of water quality deterioration, but these cues are generally limited (Whelton et al. 2007). Untrained consumers may detect taste and odor changes only when certain mineral or contaminant levels in tap water have reached a threshold that humans can perceive (Young et al. 1996; Dietrich & Gallagher 2013). Although some contaminants such as iron and copper affect the taste of water, other violations identified by the USEPA may be undetectable. Having a better understanding of how people communicate their water quality perceptions is helpful for identifying potential or ongoing water quality issues.

This study aims to provide insight into the relationship between consumer perceptions of tap water quality expressed on social media and water quality standard violations reported by the USEPA. We summarize the weekly online and social media content related to tap water to proxy consumer perceptions of tap water. These data are used to determine the net sentiment (number of positive posts minus the number of negative posts divided by the total number of posts with sentiment then multiplied by 100) for each of the lower 48 states. We analyzed the correlation between sentiment score and recorded weekly number of violations over the same period to explore the relevancy of social media data to actual water quality status as defined by government agency data.

Chloride, copper, iron, sulfate, manganese, and zinc affect the taste of water. Existing research finds relationships between water taste and quality. For example, consumers may interpret the subtle taste of chlorine as a sign of safe drinking water (Kelly & Pomfret 1997), but would consider higher levels undesirable (Bryan et al. 1973). Elevated levels of certain elements may cause water to taste salty, metallic, or bitter (Burlingame & Mackey 2007). Metallic and astringent tastes, experienced as a lingering aftertaste, more often arise from the corrosion or leaching of copper and iron (Burlingame & Mackey 2007). Consumers also may find water with low or no mineral content to taste flat (Burlingame & Mackey 2007). Water sensory studies are typically conducted in labs with trained panelists to generate consistent results (Bartels et al. 1986; Dietrich 2006). However, these settings may not accurately represent the broader population (Burlingame & Doty 2018).

Outside laboratory settings, consumers exchange information about tap water quality through physical or virtual social networks. Can unprompted virtual conversations about water help identify potential water quality standard violations? In recent years, there has been increasing attention focused on the collection and use of social media online data by individuals, industries, governments, and researchers. Social media and online data have been used for various purposes ranging from emergency management (Panagiotopoulos et al. 2014, 2016) to earthquake detection and evaluation (Earle 2010; Sakaki et al. 2010; Mendoza et al. 2019). Online reviews or comments are factored into consumer decision-making (Kim et al. 2008; Ruiz-Mafe et al. 2018). Private businesses conduct sentiment analysis of online content to understand consumer perceptions of various services and products (Chiarello et al. 2020). Government agencies and lawmakers could use online platforms as communication channels to convey information and understand consumer perceptions on various social issues (Bonsón et al. 2012; Kavanaugh et al. 2012; Driss et al. 2019). Despite its use for other products and events, to the authors' knowledge, social media data sentiment analysis has not been applied to consumer perceptions of tap water and its implications for positive health outcomes. This research contributes to the application of social media data on the topic of tap water that builds on previous research on sentiment analysis and further expands the literature on social media analysis.

Social media data and net sentiment

There are many databases, web search engines, and platforms available to collect information and data from internet content. Some platforms are tailored to news and business sources, such as LexisNexis, while others focus more on marketing and sales. Researchers, in collaboration with computer scientists, can also develop their own algorithms. This research employed the NetBase platform to collect social media data, including Twitter posts and other content such as blogs (NetBase 2020). NetBase is one of the leading social media search engines. The platform provides full service to users and offers access to a wide range of social media content. Similar to other full-service search engine platforms, NetBase's patented search engine employs a natural language processing (NLP) system and artificial intelligence (AI) tools to conduct sentiment analysis and classify posts into different categories (Li et al. 2003). NLP search engines provide accurate and reliable results for semantic searches with well-trained AI tools (Hayes et al. 2021). Previous academic research has incorporated data from the NetBase platform, including studies of perceptions of mosquito-borne and food-borne illnesses and threats (Jung et al. 2021; Widmar et al. 2021), public engagement in showcasing livestock events (Mahoney et al. 2020), and product development for private businesses (Carr et al. 2015).

Data collected in this research were not limited to any particular social media platform. Since there is no previous work establishing water quality communication on social media, a wide net was cast to analyze all data. Blogs, travel websites, and other such platforms could and do include discussion of water quality. Weekly volume data were collected from 12:00 AM on December 16, 2018 to 11:59 PM on January 9, 2021, resulting in 108 weeks of data for each state. Information was downloaded on March 11, 2021. It is important to note the date on which the data were downloaded because online posts may be removed or reinstated by the author or a moderator at any point after posting.

A query including 13 terms was developed to gather social media posts related to tap water quality. Terms included: tap water, #tapwater, city water, #citywater, public water, #publicwater, water from the tap, piped water, tap-water, #tap-water, faucet water, water from the faucet, and mains water. Geography was limited to the lower 48 states and posts in English. The collection of data in other languages is possible, but given the frequent use of slang and other shorthand terms in social media posts, the fluency required for interpretability is high. The authors, therefore, chose to focus on posts exclusively in English. A more inclusive sample would include Spanish and other language posts. To compare social media data between states, data were limited to posts for which a location could be identified. For some social media data such as Twitter, the application programming interface allows us to determine the location of the post when a Twitter Place (‘geo-tag’) is attached to the posting. If there is no ‘geo-tag’ associated with the posting, the user registered location is used. Facebook and Instagram, two other major social media outlets, do not provide geolocation data. At the national level, the location of posts can be determined broadly through the internet domain. For example, the country of origin can be determined by addresses that include .uk or .fr as well as codes within the domain. It is important to note that not all posts contained enough geographical data to even begin to determine state-level location. Difficulty determining the location at the time of post is a limitation of not only this research, but all social media data. Zheng et al. (2018) summarized existing approaches of social media post location identification and confirm that location identification of social media posts is a complicated issue requiring additional researcher attention. 
We aimed to be transparent by noting the number of posts we were able to classify for each state and for other demographic categories. Only a subset of the collected posts could be classified further at the state level.

An important part of collecting social media data is ensuring that the data reflect the targeted topic. Because of slang, colloquial phrasing, and bots, a random subset of the posts collected using the keywords was manually checked by the researchers to determine whether the posts were related to the search topic. Phrases and terms unrelated to tap water quality were removed. For example, advertisements related to the shoe Nike SB GT Blazer contained water-related terms and were disqualified from inclusion. Phrases excluded due to promotion by bots (autonomous programs on the internet that can interact with other systems or users) included #notesfromnationalemergency. The focus of this research is water quality, not other water-related issues such as the cost of water. Therefore, the terms water disconnected and water is disconnected were excluded from search results.
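The inclusion and exclusion logic described above can be illustrated with a short Python sketch. It is not the NetBase query itself: the 13 query terms and the excluded phrases are taken from the text, while the case-insensitive substring matching and the function name `is_relevant` are assumptions made purely for illustration.

```python
# The 13 query terms listed in the text.
QUERY_TERMS = [
    "tap water", "#tapwater", "city water", "#citywater",
    "public water", "#publicwater", "water from the tap", "piped water",
    "tap-water", "#tap-water", "faucet water", "water from the faucet",
    "mains water",
]

# Exclusions named in the text. How NetBase applies them is not documented
# here; plain substring matching is an assumption of this sketch.
EXCLUDED_PHRASES = [
    "nike sb gt blazer",
    "#notesfromnationalemergency",
    "water disconnected",
    "water is disconnected",
]


def is_relevant(post: str) -> bool:
    """Return True if a post matches a query term and no excluded phrase."""
    text = post.lower()
    if any(phrase in text for phrase in EXCLUDED_PHRASES):
        return False
    return any(term in text for term in QUERY_TERMS)


if __name__ == "__main__":
    print(is_relevant("The tap water here tastes great"))
    print(is_relevant("My city water is disconnected again"))
```

A keyword filter of this kind only narrows the candidate set; the manual check of a random subset described above is still needed to catch off-topic posts that happen to contain a query term.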

Both the number of posts and the number of mentions found using the search criteria were recorded. Posts are the number of documents containing a mention of the topics (NetBase 2020). Mentions are individual sentences within a post that mention the primary terms, in this case the tap water-related terms (NetBase 2020). The number of mentions will never be less than the number of posts, as each post contains at least one mention and may contain several. For example, in a single blog post, someone may say ‘The tap water in city A is wonderful. However, the tap water in city B is terrible.’ This blog post would count as one post but contains two mentions of tap water, one of which would be classified as positive and the other as negative. The numbers of posts and mentions were recorded weekly for the study period and reflect the volume of data on social media.
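The distinction between posts and mentions can be made concrete with a minimal Python sketch using the blog example above. The sentence splitter and the abbreviated term list are simplifying assumptions; NetBase's own segmentation is not described in the source.

```python
import re

# Abbreviated, illustrative subset of the query terms.
TERMS = ("tap water", "city water", "faucet water")


def count_mentions(post: str) -> int:
    """Count sentences in a post that contain at least one query term."""
    # Naive sentence split on ., ! or ? followed by whitespace (an assumption).
    sentences = re.split(r"(?<=[.!?])\s+", post.strip())
    return sum(any(term in s.lower() for term in TERMS) for s in sentences)


posts = [
    "The tap water in city A is wonderful. "
    "However, the tap water in city B is terrible.",
]

n_posts = len(posts)
n_mentions = sum(count_mentions(p) for p in posts)
print(n_posts, n_mentions)
```

For this single blog post the sketch counts one post and two mentions, which is why the number of mentions can never fall below the number of posts.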

Retweeting is a process on Twitter by which a person can, for all intents and purposes, quote another person's post, with or without adding text at the top. Depending on the purpose of the research, retweets can either be removed to reduce the amplification of an opinion when they provide no additional information on a specific problem, or kept to capture the influence of a broader issue (Gasco et al. 2019). Our analysis concerns general perceptions of water quality, in addition to state-specific analyses. Therefore, we wanted to be able to identify spikes in the overall number of posts and mentions, and we left retweets in the dataset. In terms of sentiment analysis, the original tweet and any additional text that was added to a quoted or ‘retweeted’ tweet were analyzed separately.

Demographic information was also determined for this analysis. Determining non-stated demographics from social media often requires several assumptions. Self-reported information from the author's profile included gender, interests, and profession (NetBase 2016). Self-reported gender is available for Twitter data, with Twitter allowing participants to select male or female, or to write in their preferred gender. For those who did not declare a gender, gender was assigned based on the popularity of the poster's name among males and females. Therefore, only male and female tweets are reported. We acknowledge that this limits reporting to binary genders. Age was imputed by the NetBase software using U.S. Social Security Administration data. Trends in names occur over time, and the stated name of the author of the post or blog was used to probabilistically determine their birth year based on that name's frequency (popularity) for any given birth year (NetBase 2016). Research has shown that gender and age can be inferred from social media user identifiers and first and last names (Lansley & Longley 2016; Hu et al. 2021) and may achieve 77–95% accuracy (Tang et al. 2011). However, this form of identification still leaves much to be desired. Many people have social media handles that do not include their names at all. This limits the number of posts that can be identified. We reported the number of posts that could be identified for each demographic category.
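The idea of inferring a birth year from name popularity can be sketched as follows. This is only an illustration of the general principle, not NetBase's algorithm, which is not documented in the source: the function names are hypothetical, the name counts are made-up toy numbers rather than Social Security Administration data, and the sketch ignores mortality and differences in social media use by age.

```python
def birth_year_distribution(name_counts_by_year):
    """Turn counts of births with a given name per year into probabilities."""
    total = sum(name_counts_by_year.values())
    if total <= 0:
        raise ValueError("name has no recorded births")
    return {year: count / total for year, count in name_counts_by_year.items()}


def age_band_probability(distribution, reference_year, min_age, max_age):
    """Probability that the implied age falls within [min_age, max_age]."""
    return sum(
        p for year, p in distribution.items()
        if min_age <= reference_year - year <= max_age
    )


# Toy counts for one hypothetical first name (NOT real data).
toy_counts = {1960: 300, 1985: 600, 2000: 100}
dist = birth_year_distribution(toy_counts)
p_35_44 = age_band_probability(dist, reference_year=2020, min_age=35, max_age=44)
print(round(p_35_44, 2))
```

Under these toy counts, most of the probability mass falls in the 35–44 band, so a poster with this name would most likely be assigned to that age group.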

Self-reported professions are available for Twitter, Instagram, and several other smaller sources. Self-reported professions were grouped into categories. The domains and sources of posts were also recorded. A source provides a general idea of where a post appeared, for example on a news site. A domain is a more detailed example of where the post appeared, for example cnn.com.

Beyond the measurement of volume, whether people were talking about tap water in a positive or negative manner was important to determine if these data could be used to assess water quality. The positivity or negativity of each post was determined using NLP (NetBase 2015). Although social media sentiment analysis is improving, there are still some topic-specific instances that require human correction. For example, although the word ‘frightening’ is negative in most instances, in research on the U.S. holiday Halloween it may be positive or neutral. Therefore, the sentiment of approximately 10% of posts was spot-checked by researchers to ensure sentiment assignment accuracy. Posts were read and assigned a negative, positive, or neutral sentiment and were then checked against the algorithm's assignment. Adjustments can be made by reassigning words to be positive, negative, or neutral as needed. For this analysis, no reassignment was necessary. The top five positive and negative attributes, emotions, terms, and hashtags found in posts were reported. It is important to note that for items such as hashtags and terms, the language surrounding the hashtag or term was analyzed to determine negativity or positivity. This may result in the same term or hashtag appearing as both negative and positive based on the surrounding language.

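Weekly net sentiment was calculated for the time period studied. Net sentiment is the number of positive posts minus the number of negative posts, divided by the total number of posts with sentiment and multiplied by 100 (Equation (1)):

$$\text{Net sentiment} = \frac{\text{Number of positive posts} - \text{Number of negative posts}}{\text{Total number of posts with sentiment}} \times 100 \tag{1}$$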

Net sentiment is bound between −100 and 100. Expressing net sentiment as a percentage is simply an aesthetic convenience. It could also be presented as a decimal. We calculated the weekly net sentiment score, which provides changes of net sentiment over time while controlling for volume. The average, standard deviation, minimum, and maximum for the net sentiment values of each state during the study period were reported.
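As a worked illustration of the net sentiment calculation, the following Python sketch computes the score for one week. It assumes that the 'posts with sentiment' are exactly the positive and negative posts (neutral posts excluded), which is consistent with the stated bounds of −100 and 100 but is not spelled out in the source; the function name and the handling of weeks with no sentiment-bearing posts are likewise illustrative choices.

```python
from typing import Optional


def net_sentiment(positive: int, negative: int) -> Optional[float]:
    """Net sentiment on a -100..100 scale, or None if no post has sentiment.

    Assumes the denominator is positive + negative posts (neutral excluded).
    """
    total = positive + negative
    if total == 0:
        return None  # undefined for a week with no sentiment-bearing posts
    return 100.0 * (positive - negative) / total


# Example: 30 positive and 10 negative posts in a week.
print(net_sentiment(30, 10))   # 50.0
print(net_sentiment(0, 5))     # -100.0
```

Because the score is a ratio, a week with 3 positive and 1 negative post receives the same value as a week with 300 positive and 100 negative posts, which is what is meant by controlling for volume.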

USEPA data

The USEPA measures water quality with predetermined standards and rules. The agency also reports violations if a water supply system does not meet the standards and rules. USEPA data (USEPA 2020a, 2020b) were downloaded on March 11, 2021. To match the social media data period, only violations occurring between December 16, 2018 and January 9, 2021 were included. Violations for total coliform, treatment rule and nitrate, arsenic, lead and copper, and other violations were included and categorized following Allaire et al. (2018). The ‘Other violations’ category includes violations of the ‘stage 1 disinfectants and disinfection byproducts rule’, ‘stage 2 disinfectants and disinfection byproducts rule’, ‘inorganic chemicals’, ‘volatile organic chemical’, and ‘synthetic organic chemicals and radionuclides’ standards. The USEPA determines water quality standards for each of the categories (e.g., arsenic). For arsenic, the USEPA has determined that the allowable amount in drinking water is 0.01 mg/l, or 10 parts per billion (ppb) (USEPA 2020a, 2020b). Any amount greater than the acceptable amount (determined based on a human health criterion) results in a violation. Therefore, the more violations a state has, the lower its water quality from a human health and safety perspective.

We restricted violation counts to Community Water Systems (CWSs). A CWS serves at least 25 people at their primary residence. CWSs serve year-round populations and are subject to SDWA regulations. Our collection differed from Allaire et al. (2018) in that they only included CWSs serving more than 500 people. We opted to include rural communities to better match the social media data which were not limited to only less rural or urban areas. Violations for each category were downloaded for each state, then summarized to weekly violation numbers by each category to match the net sentiment time frame. The number of violations was added across all violation categories to enumerate the total number of violations for each state.
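The weekly summarization step can be sketched in Python as below. The study period dates come from the text; everything else is an assumption for illustration, including the record layout (a date paired with a category label), the choice of which date field in the USEPA data defines when a violation 'occurs', and the function names.

```python
from collections import Counter
from datetime import date

STUDY_START = date(2018, 12, 16)
STUDY_END = date(2021, 1, 9)


def week_index(day: date) -> int:
    """Zero-based index of the 7-day study week containing `day`."""
    return (day - STUDY_START).days // 7


def weekly_counts(violations):
    """Count violations per (week, category) and per (week, 'Total').

    `violations` is an iterable of (date, category) pairs (hypothetical layout).
    Records outside the study period are ignored.
    """
    counts = Counter()
    for day, category in violations:
        if STUDY_START <= day <= STUDY_END:
            week = week_index(day)
            counts[(week, category)] += 1
            counts[(week, "Total")] += 1
    return counts


# Toy records for one state (NOT real USEPA data).
records = [
    (date(2018, 12, 16), "Arsenic"),
    (date(2018, 12, 22), "Arsenic"),
    (date(2018, 12, 23), "Coliform"),
    (date(2021, 2, 1), "Coliform"),  # outside the study period, dropped
]
counts = weekly_counts(records)
print(counts[(0, "Arsenic")], counts[(1, "Total")], week_index(STUDY_END) + 1)
```

With this indexing, the last day of the study period falls in week index 107, giving the 108 weekly observations per state that are paired with the weekly net sentiment values.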

Correlation between social media data and number of violations

For each state individually, we analyzed the association between weekly net sentiment and the weekly number of violations over the study period. Violations were considered within each category, namely coliform violations (Coliform), lead and copper violations (Lead and copper), arsenic violations (Arsenic), violations of treatment rule and nitrate level (Treatment rule and nitrate), and other violations (Other), as well as in total (Total).

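For each state, the correlation between net sentiment and the number of violations in each category j was measured using the Pearson correlation coefficient (Equation (2)):

$$r_j = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_{ij} - \bar{y}_j)}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_{ij} - \bar{y}_j)^2}} \tag{2}$$

where $x_i$ is the net sentiment value for week $i$, and the mean net sentiment of the state is denoted $\bar{x}$. $y_{ij}$ is the number of violations for violation category $j$ (e.g., coliform) in week $i$, $\bar{y}_j$ is the mean weekly number of violations in category $j$, and $n$ is the number of weeks. STATA software was used to calculate the correlation coefficients (STATA 2019). Statistical significance was reported via p-value.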
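For readers who want to reproduce the calculation outside STATA, the Pearson correlation coefficient can be computed directly from its definition. The Python sketch below is an illustration rather than the authors' code; the function name is hypothetical, and returning `None` for a series with no variation (for example, a state with zero violations in every week) is a design choice of this sketch.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences.

    Returns None when either sequence has zero variance, because the
    coefficient is undefined in that case.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


# Toy weekly series (NOT study data): sentiment falls as violations rise.
sentiment = [40.0, 10.0, -20.0, -50.0]
violations = [0, 1, 2, 3]
print(pearson_r(sentiment, violations))
```

A negative coefficient, as in this toy example, corresponds to the expected direction of the relationship: weeks with more violations coincide with lower net sentiment.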

Social media volumes and net sentiment

The online search returned 898,709 mentions and 799,035 posts. Of the posts with identifiable gender (n = 220,706), 54% were by men and 46% were by women (Table 1). The implied age of the posters (n = 223,578) was fairly evenly distributed: 10% of posters were under 18, 17% were 25–34, and 19% were 55–64. Family was most often self-reported as an interest (30%, n = 107,660), followed by politics (24%) and religion (17%). Interestingly, for a water-related search, only 9% of posters listed food and drink as an interest. Top professions (n = 58,770) included creative arts (43%), education (9%), journalism (7%), student (7%), and science and research (7%). Examples of professions that fall under creative arts include actor, composer, painter, film director, and model. For education, example professions include lecturer, professor, and teacher. Science and research professions include jobs such as researcher, scholar, scientist, and biologist.

Table 1

Social media data demographics

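| Category | Item | Percentage of posts |
|---|---|---|
| Gender (n = 220,706) | Male | 54 |
| | Female | 46 |
| Implied age (n = 223,578) | <18 | 10 |
| | 18–24 | 12 |
| | 25–34 | 17 |
| | 35–44 | 15 |
| | 45–54 | 14 |
| | 55–64 | 19 |
| | 65+ | 12 |
| Interests (n = 107,660) | Family | 30 |
| | Politics | 24 |
| | Religion | 17 |
| | Food and drink | 9 |
| | Pets | |
| Profession (n = 58,770) | Creative arts | 43 |
| | Education | 9 |
| | Journalism | 7 |
| | Student | 7 |
| | Science and research | 7 |
| Domains (n = 697,515) | twitter.com | 56 |
| | reddit.com | 7 |
| | forum.grasscity.com | <1 |
| | tripadvisor.com | <1 |
| | booking.com | <1 |
| Sources (n = 697,515) | Twitter | 57 |
| | Forums | 20 |
| | Blogs | 12 |
| | News | 10 |
| | Consumer reviews | <1 |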

Twitter.com (a social media platform) accounted for the largest share of posts with an identifiable domain (57%, n = 697,515). Reddit.com (a large collection of online forums) was the second-largest domain at 7%. All other domains each accounted for less than 1%. Top sources (n = 697,515) included Twitter (57%), forums (20%), blogs (12%), news (10%), and consumer reviews (<1%).

Spikes in the number of posts and mentions occurred as a result of a wide range of events (Figure 1). Around January 27, 2019, a spike in mentions and posts occurred as news of a report regarding the Flint, Michigan, water crisis became public (Eggert 2019). This report outlined the crucial errors made by staffers in the Department of Environmental Quality's drinking water office. Discussion of water quality issues in the southeastern states was a large driver of posts around July 7, 2019 (Fouriezos 2019). Additional discussion surrounded poor tap water quality at Parchman state prison in Mississippi. The spike around August 11, 2019 was primarily driven by issues with tap water in Newark, New Jersey, with reports of officials offering bottled water (Fitzsimmons 2019). Additional posts were driven by another boil water notice in Marshfield, Massachusetts (Kukstis 2019). Further reports of brain-eating microbes drove posts around September 20, 2020, when the Houston, Texas, area was warned to stop using tap water (abcNews 2020). Spikes around November 15, 2020 were the result of a discussion of water filtration in Men's Journal (Men's Journal Editor 2020).

Figure 1

Weekly national number of posts and mentions over the study period.


When considering the positivity or negativity of words and hashtags, it is important to remember that the surrounding words were analyzed to determine context. Top positive attributes included best tap water (10%), safe to drink (10%), taste (5%), clean tap water (3%), and clean (2%) (Table 2). Negative attributes mirrored the positive ones, with taste (10%), brain-eating ameba (5%), contaminate with toxic (5%), contaminate (5%), and not safe to drink (4%). Leading emotions were unsurprising, with the words good (22%) and best (15%) topping the list for positive emotions. For negative emotions, bad (6%) and warn (5%) topped the list. For both negative and positive terms, drink (9%) made up the highest percentage, followed by use (positive, 6%) and using (negative, 5%). The negativity or positivity of the word drink depended on the sentiment of the words surrounding it. Safe, good, and filter each made up 3% of the top positive terms. Not drink, residents, and bottled water accounted for 3, 2, and 2% of the negative terms, respectively. Although residents may be a surprising term when considering water quality, many people were talking about the plight of residents who were experiencing poor water quality, such as those in Flint, Michigan. This resulted in the term ‘resident’ being a negative term. Top positive hashtags included #covid19 (7%), #water (6%), #coronavirus (5%), and #drinktap (3%). Many posts related to COVID-19 were people explaining that water was safe during the pandemic. The safe status of water during the pandemic was accompanied by positive language, rendering #covid19 positive in the context of water discussions. Top negative hashtags included #flintwatercrisis (6%), #saveflintchallenge (5%), #metrodetroit (4%), #detroit (4%), and #water (3%).

Table 2

Attributes, emotions, terms and hashtags from social media data

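| Category | Positive | Percent | Negative | Percent |
|---|---|---|---|---|
| Top attributes (n = 24,204) | Best tap water | 10 | Taste | 10 |
| | Safe to drink | 10 | Brain-eating ameba | 5 |
| | Taste | 5 | Contaminate with toxic | 5 |
| | Clean tap water | 3 | Contaminant | 5 |
| | Clean | 2 | Not safe to drink | 4 |
| Top emotions (n = 32,442) | Good | 22 | Bad | 6 |
| | Best | 15 | Warn | 5 |
| | Great | | Not like | |
| | Love | | Terrible | |
| | Delicious | | Hate | |
| Top terms (n = 238,814) | Drink | 9 | Drink | 10 |
| | Use | 6 | Using | 5 |
| | Safe | 3 | Not drink | 3 |
| | Good | 3 | Residents | 2 |
| | Filter | 3 | Bottled water | 2 |
| Top hashtags (n = 6,965) | #covid19 | 7 | #flintwatercrisis | 6 |
| | #water | 6 | #saveflintchallenge | 5 |
| | #coronavirus | 5 | #metrodetroit | 4 |
| | #drinktap | 3 | #detroit | 4 |
| | #utah | | #water | 3 |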

In general, net sentiment for U.S. drinking water was barely positive at 3%. Several events caused either negative or positive spikes in net sentiment (Figure 2). Many of these are the same events that caused spikes in the number of mentions and posts. Net sentiment dropped to −51% around January 27, 2019 because of the issues with tap water quality in Flint, Michigan. Around April 21, 2020, net sentiment increased to 47% due to general discussion regarding quality tap water. Mirroring the spike in the number of posts and mentions, net sentiment around August 11, 2019 fell to −20% because of the unsafe tap water in New Jersey and Massachusetts. Another brain-eating ameba report induced a negative spike in net sentiment (−76%) around September 20, 2020, pertaining to amebas in the tap water of Houston, Texas. A positive spike in net sentiment for water occurred around November 15, 2020 due to the Men's Journal article.

Figure 2

Weekly national net sentiment over the study period.


The average net sentiment at the state level, as well as the minimum, maximum, and standard deviation for each state, is presented in Table 3. States with the highest average net sentiment included Louisiana (46), Wyoming (44), Iowa (23), Minnesota (19), and Indiana (17). Vermont had the lowest average net sentiment (−62), followed by Arkansas (−42), North Dakota (−32), Georgia (−30), and Idaho (−28). States with average sentiment scores between −2 and 2 were California (−2), Illinois (−2), Michigan (−1), North Carolina (−1), South Carolina (−1), Florida (1), New Hampshire (1), Massachusetts (2), and Ohio (2).

Table 3

Statistics of net sentiment by state

StateNumber of postsMinimumMaximumAverage net sentiment (Standard deviation)
Alabama 708 −100 100 −15 (48) 
Arizona 1,506 −100 100 −6 (45) 
Arkansas 313 −100 60 −42 (48) 
California 11,960 −100 100 −2 (47) 
Colorado 2,252 −67 100 15 (46) 
Connecticut 596 −100 100 5 (39) 
Delaware 180 −100 60 9 (41) 
Florida 5,096 −100 93 1 (45) 
Georgia 2,424 −100 100 −30 (46) 
Idaho 272 −100 60 −28 (62) 
Illinois 2,846 −100 100 −2 (51) 
Indiana 895 −100 100 17 (61) 
Iowa 535 −96 100 23 (56) 
Kansas 645 −100 100 −11 (55) 
Kentucky 1,010 −100 100 9 (39) 
Louisiana 490 −60 100 46 (38) 
Maine 301 −60 100 12 (54) 
Maryland 1,302 −100 100 −8 (54) 
Massachusetts 1,805 −100 100 2 (51) 
Michigan 2,434 −100 100 −1 (48) 
Minnesota 1,219 −100 100 19 (58) 
Mississippi 251 −90 100 −3 (57) 
Missouri 1,721 −100 100 16 (51) 
Montana 151 −100 60 −11 (66) 
Nebraska 293 −100 100 −19 (47) 
Nevada 1,231 −100 100 7 (42) 
New Hampshire 204 −100 60 1 (41) 
New Jersey 1,508 −100 100 −16 (58) 
New Mexico 357 −100 67 7 (49) 
New York 6,367 −100 100 6 (49) 
North Carolina 2,050 −83 100 −1 (43) 
North Dakota 130 −100 33 −32 (49) 
Ohio 2,433 −100 100 2 (52) 
Oklahoma 636 −67 100 3 (47) 
Oregon 1,518 −100 100 10 (47) 
Pennsylvania 2,762 −100 100 3 (47) 
Rhode Island 205 −100 100 10 (46) 
South Carolina 700 −100 100 −1 (60) 
South Dakota 68 −100 100 −28 (56) 
Tennessee 1,549 −100 100 −5 (62) 
Texas 8,365 −84 100 3 (46) 
Utah 484 −100 100 −20 (40) 
Vermont 71 −100 100 −62 (48) 
Virginia 1,551 −100 100 −4 (45) 
Washington 2,154 −100 100 7 (47) 
West Virginia 301 −60 60 10 (23) 
Wisconsin 857 −91 100 −12 (44) 
Wyoming 100 100 44 (34) 
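
The per-state statistics in Table 3 can be reproduced in principle from each state's weekly net sentiment series. The Python sketch below shows one way to do so. It assumes that the standard deviation is the sample standard deviation and that weeks without sentiment-bearing posts are skipped; neither choice is stated in the text. The state names and weekly values are hypothetical.

```python
import statistics


def summarize_weekly_net_sentiment(weekly_net):
    """Return min, max, mean, and sample SD of one state's weekly net sentiment.

    Weeks with no sentiment-bearing posts are passed as None and skipped.
    """
    values = [v for v in weekly_net if v is not None]
    if len(values) < 2:
        raise ValueError("at least two weeks with sentiment are needed")
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1); an assumption
    }


# Hypothetical weekly net sentiment (percent) for two made-up states.
weekly_by_state = {
    "State A": [-100.0, 0.0, 100.0],
    "State B": [10.0, None, 30.0],
}
summary = {s: summarize_weekly_net_sentiment(w) for s, w in weekly_by_state.items()}
for state, stats in summary.items():
    print(state, stats)
```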

State violations

The state violations summed over the study period are presented in Table 4. The states with the lowest total number of violations over the study period were Delaware (20), Kentucky (38), Rhode Island (51), and North Dakota (53). The states with the highest total number of violations were Texas (5,059), Pennsylvania (3,846), Washington (2,370), Oklahoma (2,093), and West Virginia (1,618). The leading violation categories are ‘Arsenic’, ‘Lead and copper’, ‘Treatment rule and nitrate’, and ‘Other’. ‘Coliform’ violations totaled 239 during the study period, while ‘Other’ violations totaled 11,884. The three states with the most violations in each category, in ascending order, are Washington (416), California (439), and Texas (542) for ‘Arsenic’; Kansas (354), New Jersey (486), and Texas (2,028) for ‘Lead and copper’; Oregon (741), Texas (816), and Pennsylvania (2,550) for ‘Treatment rule and nitrate’; and Washington (1,185), Oklahoma (1,484), and Texas (1,672) for ‘Other’.

Table 4

Number of water quality violations per state over the study period

State  Arsenic  Lead and copper  Coliform  Treatment rule and nitrate  Other  Total
Alabama 30 128 164 
Arizona 224 350 274 199 1,047 
Arkansas 13 25 90 130 
California 439 285 424 93 1,247 
Colorado 42 338 520 475 1,375 
Connecticut 19 163 137 154 473 
Delaware 12 20 
Florida 128 533 182 849 
Georgia 51 58 169 518 796 
Idaho 96 85 262 118 561 
Illinois 49 104 49 238 440 
Indiana 12 138 64 69 283 
Iowa 30 51 103 124 308 
Kansas 43 354 208 239 844 
Kentucky 12 20 38 
Louisiana 31 226 630 194 1,081 
Maine 20 60 21 16 117 
Maryland 59 24 90 
Massachusetts 47 67 39 158 
Michigan 34 274 71 272 653 
Minnesota 24 55 15 23 119 
Mississippi 21 36 143 71 271 
Missouri 24 66 283 36 409 
Montana 20 47 139 151 357 
Nebraska 20 59 63 142 
Nevada 88 14 41 127 270 
New Hampshire 35 35 38 14 122 
New Jersey 24 486 197 292 999 
New Mexico 55 135 513 189 892 
New York 46 151 158 332 691 
North Carolina 11 202 71 362 646 
North Dakota 33 15 53 
Ohio 11 25 19 39 94 
Oklahoma 74 198 337 1,484 2,093 
Oregon 61 186 14 741 111 1,113 
Pennsylvania 104 193 25 2,550 974 3,846 
Rhode Island 17 27 51 
South Carolina 36 20 59 
South Dakota 21 37 33 94 
Tennessee 58 20 32 110 
Texas 542 2,028 816 1,672 5,059 
Utah 24 122 175 96 417 
Vermont 10 42 42 214 308 
Virginia 18 67 50 158 293 
Washington 416 74 181 514 1,185 2,370 
West Virginia 40 271 442 865 1,618 
Wisconsin 12 137 95 141 385 
Wyoming 25 24 64 119 
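
As a consistency check on Table 4, the Python sketch below sums the five category counts for the three states that report a value in every category (Oregon, Pennsylvania, and Washington) and compares each sum with the reported total. The counts are copied from Table 4; the variable and function names are illustrative. Rows with blank cells are left out because the blank values cannot be recovered from the table.

```python
CATEGORIES = (
    "Arsenic",
    "Lead and copper",
    "Coliform",
    "Treatment rule and nitrate",
    "Other",
)

# Rows from Table 4: five category counts followed by the reported total.
rows = {
    "Oregon": (61, 186, 14, 741, 111, 1113),
    "Pennsylvania": (104, 193, 25, 2550, 974, 3846),
    "Washington": (416, 74, 181, 514, 1185, 2370),
}


def category_sum(row):
    """Sum the category counts, leaving out the reported total in the last slot."""
    return sum(row[: len(CATEGORIES)])


for state, row in rows.items():
    reported_total = row[-1]
    print(state, category_sum(row), reported_total, category_sum(row) == reported_total)

# Rank these states by reported total, highest first.
ranked = sorted(rows, key=lambda state: rows[state][-1], reverse=True)
print(ranked)
```

For these three rows the category counts add up to the reported totals.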

Correlations between net sentiment and violations

For most states, the correlations are not statistically significant (Table 5). California's net sentiment is weakly correlated with the number of ‘Other’ violations (p = 0.045), while Washington state's net sentiment is weakly correlated with the numbers of ‘Arsenic’ and ‘Total’ violations (p = 0.020 and p = 0.022, respectively). All three correlation coefficients are negative, indicating that an increase in the number of violations accompanies a decrease in net sentiment. For Washington state, the coefficients were −0.223 for ‘Arsenic’ and −0.220 for ‘Total’ violations; for California, the coefficient for ‘Other’ violations was −0.295. Although California's number of ‘Other’ violations is not very high compared with other states, California had the highest number of social media posts during the study period (11,960) (Table 3). The state of Washington also had a relatively high number of social media posts (2,154) (Table 3), and its ‘Arsenic’ and ‘Other’ violation numbers both rank third in the country (Table 4). It is possible that consumers in California and Washington have a higher awareness of their water quality, as well as a greater willingness to share their opinions on social media regarding local and national water quality issues.

Table 5

Correlation between net sentiment and the number of violations by state

State  Arsenic  Lead and copper  Coliform  Treatment rule and nitrate  Other  Total
Alabama −0.037 (0.702) 0.032 (0.746)  −0.166 (0.086) 0.066 (0.496) 0.062 (0.523) 
Arizona −0.005 (0.956) −0.001 (0.991)  0.002 (0.986) 0.122 (0.209) 0.048 (0.620) 
Arkansas 0.177 (0.067) −0.131 (0.178)  −0.091 (0.349) 0.184 (0.056) 0.152 (0.115) 
California 0.186 (0.053) −0.017 (0.858) 0.182 (0.059) 0.187 (0.0528) −0.205* (0.034) 0.114 (0.240) 
Colorado 0.003 (0.978) 0.035 (0.718)  0.040 (0.684) 0.079 (0.414) 0.072 (0.456) 
Connecticut 0.064 (0.508) 0.065 (0.503)  0.103 (0.287) 0.061 (0.528) 0.100 (0.302) 
Delaware  −0.167 (0.083)  0.015 (0.874) 0.036 (0.710) −0.110 (0.259) 
Florida 0.111 (0.252) 0.040 (0.684) 0.070 (0.474) 0.025 (0.792) 0.017 (0.861) 0.028 (0.773) 
Georgia −0.038 (0.699) 0.035 (0.716)  0.059 (0.546) −0.076 (0.436) −0.062 (0.524) 
Hawaii  0.023 (0.809)  0.081 (0.406)  0.084 (0.386) 
Idaho −0.039 (0.690) −0.082 (0.394)  −0.028 (0.771) 0.171 (0.078) 0.026 (0.792) 
Illinois 0.004 (0.970) 0.082 (0.396)  0.092 (0.346) −0.006 (0.949) 0.023 (0.814) 
Indiana −0.111 (0.254) −0.026 (0.792)  0.008 (0.937) 0.007 (0.945) −0.019 (0.841) 
Iowa −0.032 (0.744) −0.127 (0.191)  −0.011 (0.906) 0.024 (0.807) −0.013 (0.892) 
Kansas −0.072 (0.457) −0.021 (0.830)  −0.086 (0.378) 0.017 (0.858) −0.028 (0.775) 
Kentucky 0.081 (0.402) −0.007 (0.941)  −0.150 (0.121) −0.147 (0.130) −0.146 (0.132) 
Louisiana −0.034 (0.727) 0.123 (0.206)  0.056 (0.563) 0.045 (0.644) 0.086 (0.377) 
Maine −0.110 (0.256) 0.044 (0.650)  −0.118 (0.224) −0.121 (0.212) −0.025 (0.795) 
Maryland −0.081 (0.403) 0.006 (0.952)  0.041 (0.671)  0.009 (0.925) 
Massachusetts 0.020 (0.835) 0.036 (0.710)  0.107 (0.270) −0.002 (0.985) 0.078 (0.420) 
Michigan 0.053 (0.583) 0.069 (0.478) −0.017 (0.861) 0.036 (0.710) −0.07 (0.4291) −0.016 (0.871) 
Minnesota 0.099 (0.309) 0.057 (0.557) −0.009 (0.926) 0.065 (0.502) 0.092 (0.341) 0.097 (0.319) 
Mississippi −0.046 (0.639) −0.016 (0.869)  0.028 (0.775) −0.112 (0.247) −0.040 (0.680) 
Missouri 0.158 (0.102) −0.038 (0.697)  −0.040 (0.680) −0.011 (0.910) −0.025 (0.796) 
Montana −0.169 (0.080) −0.042 (0.668)  −0.063 (0.517) 0.117 (0.226) 0.057 (0.557) 
Nebraska −0.036 (0.712)   0.198 (0.046) −0.072 (0.460) 0.000 (1.000) 
Nevada −0.055 (0.570) 0.029 (0.769)  −0.030 (0.758) 0.170 (0.078) 0.129 (0.183) 
New Hampshire 0.034 (0.729) 0.157 (0.105)  0.002 (0.980) 0.065 (0.504) 0.077 (0.429) 
New Jersey 0.026 (0.787) −0.023 (0.814)  0.005 (0.959) 0.034 (0.724) 0.010 (0.922) 
New Mexico −0.002 (0.981) 0.063 (0.520)  0.072 (0.457) −0.015 (0.874) 0.048 (0.621) 
New York 0.035 (0.722) −0.112 (0.250) 0.135 (0.165) −0.113 (0.243) 0.027 (0.782) −0.022 (0.820) 
North Carolina 0.034 (0.725) −0.100 (0.302)  −0.119 (0.221) −0.062 (0.523) −0.113 (0.245) 
North Dakota 0.107 (0.269) −0.052 (0.594)  −0.136 (0.161) 0.165 (0.088) 0.044 (0.651) 
Ohio −0.023 (0.812) 0.003 (0.976)  0.025 (0.794) 0.057 (0.557) 0.042 (0.668) 
Oklahoma 0.007 (0.939) −0.100 (0.303)  0.009 (0.924) −0.033 (0.733) −0.037 (0.707) 
Oregon −0.021 (0.826) −0.025 (0.798) 0.113 (0.242) −0.036 (0.714) 0.053 (0.589) −0.016 (0.871) 
Pennsylvania −0.053 (0.585) 0.014 (0.889) −0.044 (0.651) −0.060 (0.535) 0.010 (0.919) −0.035 (0.718) 
Rhode Island  0.013 (0.892)  −0.030 (0.758) −0.058 (0.555) −0.043 (0.660) 
South Carolina  −0.141 (0.145)  −0.054 (0.581) 0.016 (0.867) −0.121 (0.213) 
South Dakota −0.136 (0.159) −0.012 (0.902)  −0.060 (0.538) 0.100 (0.302) 0.014 (0.883) 
Tennessee  0.009 (0.925)  −0.054 (0.576) 0.067 (0.490) 0.036 (0.713) 
Texas 0.132 (0.173) 0.106 (0.273) 0.036 (0.708) 0.124 (0.202) −0.019 (0.850) 0.074 (0.443) 
Utah 0.047 (0.630) 0.100 (0.303)  0.034 (0.725) −0.020 (0.840) 0.064 (0.507) 
Vermont 0.045 (0.646) 0.033 (0.732)  −0.056 (0.567) 0.059 (0.543) 0.057 (0.560) 
Virginia 0.0657 (0.500) 0.067 (0.487)  0.215 (0.025) −0.023 (0.817) 0.040 (0.677) 
Washington −0.223* (0.020) −0.133 (0.171) −0.102 (0.292) −0.180 (0.061) −0.163 (0.092) −0.220* (0.022) 
West Virginia 0.035 (0.722) 0.105 (0.280)  −0.001 (0.993) −0.046 (0.636) −0.018 (0.850) 
Wisconsin −0.097 (0.318) −0.132 (0.174)  −0.107 (0.271) −0.065 (0.505) −0.106 (0.276) 
Wyoming 0.147 (0.129) −0.087 (0.368) −0.126 (0.193) 0.015 (0.873) 0.026 (0.790) 0.042 (0.668) 

*Statistically significant at p<0.05 level.
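
Each entry in Table 5 pairs a correlation coefficient with a p-value. The following Python sketch shows one way such a pair could be computed for a single state and violation category. It assumes a Pearson correlation between weekly violation counts and weekly net sentiment, and it approximates the two-sided p-value with the Fisher z-transformation rather than the exact t-test that statistical packages typically report, so its p-values will differ slightly from those in Table 5. The function names and weekly series are hypothetical.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 4:
        raise ValueError("need two series of equal length with at least 4 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def fisher_p_value(r, n):
    """Approximate two-sided p-value for H0: no correlation (Fisher z-transformation)."""
    if abs(r) >= 1.0:
        return 0.0  # perfect correlation; atanh would be infinite
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical weekly series for one state: violation counts and net sentiment (%).
weekly_violations = [1, 2, 3, 4, 5]
weekly_net_sentiment = [20.0, 30.0, 0.0, 10.0, -10.0]

r = pearson_r(weekly_violations, weekly_net_sentiment)
p = fisher_p_value(r, len(weekly_violations))
print(f"r = {r:.3f}, approximate p = {p:.3f}")
```

The function raises an error for a constant series, for which the correlation is undefined (for example, a violation category with the same count in every week).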

In 2019, 78% of men and 65% of women used a social media platform of any kind (Pew Research Center 2019). When focusing on the topic of water, we found a slightly higher number of posters were male when compared with females. This is likely a reflection of those on social media. Interestingly, a lower percentage of younger people were posting about water when compared with the older age groups. Young people were the earliest adopters of social media, and although the gap in usage has decreased, they are still more prevalent on social media platforms (Pew Research Center 2019). This may indicate that issues related to tap water are of a greater interest to older individuals participating in online media.

The number of violations and net sentiment were uncorrelated for most states, with the exception of California and Washington. The correlation in California could be a result of higher numbers of weekly social media posts, which may better reflect net sentiment related to drinking water issues. For Washington state, it could be a result of both a higher number of violations and a relatively high number of weekly posts. The results for these two states may begin to provide insights into why sentiment from social media posts does not reflect water quality in the other states. However, without further investigation, it is not possible to conclude that correlations could become significant if consumers in other states posted more frequently about their water quality. Also, given the limitations and issues surrounding correctly identifying the location of posts and posters, some data could not be assigned to a specific state. In addition, given current legislation on the level of granularity that social media and online data scraping can reach, and the privacy desires of individuals, it is unclear whether additional and more accurate location data will ever be made available.

Although the results indicate that it is unlikely that current U.S. social media data can be used to indicate water quality, potential water quality violations, or the need for additional testing, the data collected are still compelling. Two trends can be found in the data. First, some words associated with the posts, such as taste, good, and terrible, are not necessarily related to USEPA violations. Although researchers find that lead and copper in drinking water could result in metallic smells and tastes (Burlingame & Mackey 2007), other contaminants like arsenic require lab testing to detect. Second, based on the spikes in both mentions and sentiment, many posts were related to large national stories. For states in the U.S. that are trying to attract new residents or visitors, it is important to note that water quality issues are no longer just a local problem. Incidents of water quality issues appear to reverberate across the internet, potentially with long-term reputational impacts.

Quantifying the inconsistency of consumer perception and reported water quality and safety can be a useful practice to provide some insight to state water managers and policymakers. Comparing the numbers of violations reported by the USEPA with social media net sentiment scores at the state level over the same period, we found no statistically significant relationships between these two measurements except for California and Washington states.

In this research, we used the number of USEPA-reported violations as a proxy for tap water quality and safety, while sentiment scores reflect consumers' perception of tap water, either positive or negative. Most of the states with higher numbers of USEPA violations did not have negative net sentiment scores. None of the five states with the most violations (Texas, Pennsylvania, Washington, Oklahoma, and West Virginia) had a negative average net sentiment (Tables 3 and 4). As all states continue working to comply with federal and state water quality and safety standards, it is crucial to educate consumers regarding the water safety standards and make them aware of the health-related violation reports.

Social media demographic data show that older people were more engaged with the topic of water quality than younger people. Water managers and policymakers should consider this factor when communicating with consumers about tap water quality or future quality improvements. Analysis of the social media data also revealed that larger-scale violations in one state were discussed across the entire country, often eclipsing any local water issues that may have been happening. Serious water quality violations can damage the standing of a state or city well beyond its local residents.

Our research focused on state-level data because access to geo-tagged social media data is currently inconsistent. Due to limitations caused by the number of cellphone towers that are used to geo-tag social media posts, drilling down to the county level was not possible for states that are more rural. As sentiment analysis technology develops, it may be possible for future research to evaluate social media net sentiment at more granular levels. This research could be extended to include local water sensory tests to identify PWS that need to improve water taste (the USEPA secondary standard), although this standard is not currently enforced by federal policy or legislation.

There are additional limitations to social media data collection and language processing. First, demographic data from online content are difficult to quantify; hence, they were reported only in summary form and were not included in the analysis. Unless self-reporting of demographic data becomes more common on social media platforms, it is unlikely that such data can be used for extensive analysis. Second, although precautions were taken to improve language processing by having a subset of posts inspected by the researchers, there is still a chance that sentiment was misclassified. The volume of data as well as privacy laws currently prevent the human inspection of all posts. Additionally, linguistic features such as sarcasm can be difficult to interpret in text and are often difficult to identify in spoken language as well. Third, location identification on social media posts is also a challenging issue. In this research, we used an already established, tailorable algorithm and cast a wide net for sentiment analysis. Future research may seek to limit the media sources studied to only those that explicitly provide location information.

The datasets generated during and/or analyzed for the current study are available from the corresponding author on reasonable request.

abcNews 2020 Deadly Microbe Water Warning Lifted for All But 1 Texas City. ABC News.
Allaire M., Wu H. & Lall U. 2018 National trends in drinking water quality violations. Proceedings of the National Academy of Sciences 115 (9), 2078–2083. https://doi.org/10.1073/pnas.1719805115.
Arnedo-Pena A., Bellido-Blasco J., Villamarin-Vazquez J.-L., Aranda-Mares J.-L., Font-Cardona N., Gobba F. & Kogevinas M. 2003 Acute health effects after accidental exposure to styrene from drinking water in Spain. Environmental Health 2 (1), 1–9.
Bartels J. H. M., Burlingame G. A. & Suffet I. H. (Mel) 1986 Flavor profile analysis: taste and odor control of the future. Journal AWWA 78 (3), 50–55. https://doi.org/10.1002/j.1551-8833.1986.tb05714.x.
Bonsón E., Torres L., Royo S. & Flores F. 2012 Local e-government 2.0: social media and corporate transparency in municipalities. Government Information Quarterly 29 (2), 123–132. https://doi.org/10.1016/j.giq.2011.10.001.
Bryan P. E., Kuzminski L. N., Sawyer F. M. & Feng T. H. 1973 Taste thresholds of halogens in water. Journal AWWA 65 (5), 363–368. https://doi.org/10.1002/j.1551-8833.1973.tb01851.x.
Burlingame G. A. & Doty R. L. 2018 Important considerations for estimating odor threshold concentrations of contaminants found in water supplies. Journal AWWA 110 (12), E1–E12. https://doi.org/10.1002/awwa.1147.
Burlingame G. A. & Mackey E. D. 2007 Philadelphia obtains useful information from its customers about taste and odour quality. Water Science and Technology 55 (5), 257–263.
Carr J., Decreton L., Qin W., Rojas B., Rossochacki T. & Yang Y. wen 2015 Social media in product development. Food Quality and Preference 40, 354–364. https://doi.org/10.1016/j.foodqual.2014.04.001.
Chiarello F., Bonaccorsi A. & Fantoni G. 2020 Technical sentiment analysis. Measuring advantages and drawbacks of new products using social media. Computers in Industry 123, 103299. https://doi.org/10.1016/j.compind.2020.103299.
de Franca Doria M., Pidgeon N. & Hunter P. 2005 Perception of tap water risks and quality: a structural equation model approach. Water Science and Technology 52 (8), 143–149.
Dietrich A. M. 2006 Aesthetic issues for drinking water. Journal of Water and Health 4 (S1), 11–16. https://doi.org/10.2166/wh.2006.0038.
Dietrich A. M. & Gallagher C. D. 2013 Consumer ability to detect the taste of total dissolved solids. Journal AWWA 105 (5), E255–E263. https://doi.org/10.5942/jawwa.2013.105.0049.
Driss O. B., Mellouli S. & Trabelsi Z. 2019 From citizens to government policy-makers: social media data analysis. Government Information Quarterly 36 (3), 560–570.
Earle P. 2010 Earthquake twitter. Nature Geoscience 3 (4), 221–222.
Eggert D. 2019 Audit: State Partly Complies with Post-Flint Recommendations. AP NEWS.
Fedinick K. P., Taylor S., Roberts M., Moore R. & Olson E. 2019 Watered Down Justice (R:19-09-A; p. 52). Natural Resources Defense Council. Available from: https://www.nrdc.org/sites/default/files/watered-down-justice-report.pdf (accessed 25 March 2021).
Fitzsimmons E. G. 2019 In Echo of Flint Lead Crisis, Newark Offers Bottled Water. The New York Times. Available from: https://www.nytimes.com/2019/08/11/nyregion/newark-water-lead.html.
Fouriezos N. 2019 Love thy Neighbor: The Bible Belt Is Becoming a Dumping Ground. Available from: https://www.ozy.com/the-new-and-the-next/love-thy-neighbor-the-bible-belt-is-becoming-a-dumping-ground/93854/ (accessed 12 November 2020).
Gasco L., Clavel C., Asensio C. & de Arcas G. 2019 Beyond sound level monitoring: exploitation of social media to gather citizens subjective response to noise. Science of the Total Environment 658, 69–79.
Hayes J. L., Britt B. C., Evans W., Rush S. W., Towery N. A. & Adamson A. C. 2021 Can social media listening platforms’ artificial intelligence be trusted? Examining the accuracy of crimson hexagon's (now Brandwatch Consumer Research's) AI-driven analyses. Journal of Advertising 50 (1), 81–91. https://doi.org/10.1080/00913367.2020.1809576.
Hu Z., Morton L. W. & Mahler R. L. 2011 Bottled water: United States consumers and their perceptions of water quality. International Journal of Environmental Research and Public Health 8 (2), 565–578.
Hu Y., Hu C., Tran T., Kasturi T., Joseph E. & Gillingham M. 2021 What's in a Name? – Gender Classification of Names with Character Based Machine Learning Models. ArXiv:2102.03692 [Cs]. Available from: http://arxiv.org/abs/2102.03692.
Jardine C. G., Gibson N. & Hrudey S. E. 1999 Detection of odour and health risk perception of drinking water. Water Science and Technology 40 (6), 91–98.
Jung J., Bir C., Widmar N. O. & Sayal P. 2021 Initial reports of foodborne illness drive more public attention than food recall announcements. Journal of Food Protection. https://doi.org/10.4315/JFP-20-383.
Kavanaugh A. L., Fox E. A., Sheetz S. D., Yang S., Li L. T., Shoemaker D. J., Natsev A. & Xie L. 2012 Social media use by government: from the routine to the critical. Government Information Quarterly 29 (4), 480–491.
Kelly M. G. & Pomfret J. R. 1997 Tastes and odours in potable water: perception versus reality. In: The Microbiological Quality of Water (Sutcliffe D. W., ed.). Freshwater Biological Association, Ambleside, UK, pp. 71–80.
Kukstis J. 2019 Marshfield Boil Water Order Remains in Effect. Marshfield Mariner.
Lansley G. & Longley P. 2016 Deriving age and gender from forenames for consumer analytics. Journal of Retailing and Consumer Services 30, 271–278. https://doi.org/10.1016/j.jretconser.2016.02.007.
Li W., Zhang X., Niu C., Jiang Y. & Srihari R. 2003 An expert lexicon approach to identifying English phrasal verbs. In: Proceedings of the 41st Annual Meeting on Association for Computational Linguistics – ACL ‘03, 1, pp. 513–520.
Mahoney J. A., Widmar N. J. O. & Bir C. L. 2020 #GoingtotheFair: a social media listening analysis of agricultural fairs. Translational Animal Science 4, txaa139. https://doi.org/10.1093/tas/txaa139.
Mendoza M., Poblete B. & Valderrama I. 2019 Nowcasting earthquake damages with Twitter. EPJ Data Science 8 (1), 3.
Men's Journal Editor 2020 Why Pro Athletes Are Swearing by the Kangen Water System.
NetBase 2015 New Enhancements: Flexible Sentiment Classification.
NetBase 2016 Accuracy Differentiators: Why Precision Matters in Consumer Sentiment Analysis. NetBase Quid.
NetBase 2020 NetBaseQuid Overview.
Panagiotopoulos P., Bigdeli A. Z. & Sams S. 2014 Citizen–government collaboration on social media: the case of Twitter in the 2011 riots in England. Government Information Quarterly 31 (3), 349–357.
Panagiotopoulos P., Barnett J., Bigdeli A. Z. & Sams S. 2016 Social media in emergency management: Twitter as a tool for communicating risks to the public. Technological Forecasting and Social Change 111, 86–96.
Pew Research Center 2019 Social Media Fact Sheet.
Ruiz-Mafe C., Chatzipanagiotou K. & Curras-Perez R. 2018 The role of emotions and conflicting online reviews on consumers’ purchase intentions. Journal of Business Research 89, 336–344.
Sakaki T., Okazaki M. & Matsuo Y. 2010 Earthquake shakes Twitter users: real-time event detection by social sensors. In: Proceedings of the 19th International Conference on World Wide Web, pp. 851–860.
Salzman J. 2014
STATA 2019 STATA Statistical Software: Release 16.
Tang C., Ross K., Saxena N. & Chen R. 2011 What's in a name: a study of names, gender inference, and gender behavior in Facebook. In: Database Systems for Advanced Applications, Vol. 6637 (Xu J., Yu G., Zhou S. & Unland R., eds). Springer Berlin Heidelberg, pp. 344–356. https://doi.org/10.1007/978-3-642-20244-5_33.
Tiemann M. 2014 Safe Drinking Water Act (SDWA): A Summary of the Act and Its Major Requirements. Congressional Research Service, Washington, DC, USA.
USEPA 1999 25 Years of Drinking Water Act History and Trends. EPA 816-R-99-007. United States Environmental Protection.
USEPA 2000 Water Supply Guidance Manual. EPA 816-R-00-003. Office of Water.
USEPA 2015a Drinking Water Contaminant Human Health Effects Information [Collections and Lists].
USEPA 2015b Learn about Capacity Development [Overviews and Factsheets].
USEPA 2020a SDWIS Federal Reports Advanced Search. Available from: https://ofmpub.epa.gov/apex/sfdw/f?p=108:1:::NO:1 (accessed 12 December 2020).
USEPA 2020b Drinking Water Arsenic Rule History. Available from: https://www.epa.gov/dwreginfo/drinking-water-arsenic-rule-history (accessed 22 November 2020).
WBTV 2019 Water Woes in Concord: Residents Complain of Health Issues after Weeks of Foul-Smelling Water.
Weinmeyer R., Norling A., Kawarski M. & Higgins E. 2017 The Safe Drinking Water Act of 1974 and its role in providing access to safe drinking water in the United States. AMA Journal of Ethics 19 (10), 1018–1026.
Whelton A. J., Dietrich A. M., Gallagher D. L. & Roberson J. A. 2007 Using customer feedback for improved water quality and infrastructure monitoring. Journal American Water Works Association 99 (11), 62–76.
WHO 2017 Guidelines for Drinking-Water Quality, 4th edn. Incorporating the 1st Addendum. World Health Organization.
Widmar N. J. O., Bir C., Long E. & Ruple A. 2021 Public perceptions of threats from mosquitoes in the U.S. using online media analytics. Pathogens and Global Health 115 (1), 40–52. https://doi.org/10.1080/20477724.2020.1842641.
Young W. F., Horth H., Crane R., Ogden T. & Arnott M. 1996 Taste and odour threshold concentrations of potential potable water contaminants. Water Research 30 (2), 331–340.
Zheng X., Han J. & Sun A. 2018 A survey of location prediction on twitter. IEEE Transactions on Knowledge and Data Engineering 30 (9), 1652–1671. https://doi.org/10.1109/TKDE.2018.2807840.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY-NC-ND 4.0), which permits copying and redistribution for non-commercial purposes with no derivatives, provided the original work is properly cited (http://creativecommons.org/licenses/by-nc-nd/4.0/).