This study conducts a text semantic analysis of mainstream media coverage of the COVID-19 outbreak that began in China in late 2019. By examining frequently used keywords and their co-occurrences, we infer semantic networks and word collocations. Coding epidemic-related frames offers insight into the cognitive structures people use to understand and communicate about issues: through framing, media and individuals emphasize certain aspects of a crisis while downplaying others. The study reveals that Chinese mainstream media employed 12 frames during the COVID-19 crisis. Methodologically, it demonstrates how frames in Chinese media news can be identified through text mining. Using Multiple Correspondence Analysis (MCA) and Hierarchical Cluster Analysis (HCA), the study elucidates the connections between stages and frames and the relationships among frames, and collocation statistics are used to examine mainstream media attention to environmental pollution amid the COVID-19 pandemic. Results show that frames changed across pandemic stages, reflecting mainstream media's role in maintaining social stability. MCA and HCA group the 12 frames into four clusters, highlighting consistent patterns of frame usage by Chinese mainstream media. Mainstream media also began to address COVID-19-related environmental pollution, focusing on virus contamination of goods, medical waste, and wastewater, but did not pay comprehensive attention to broader environmental pollution. These findings offer insights for public health professionals and environmentalists and can aid the formulation of crisis communication strategies for future emergencies.

  • This study conducts a text semantic analysis of mainstream media news related to the COVID-19 outbreak that began in China at the end of 2019.

  • By examining the most frequently used keywords and their co-occurrences, researchers can infer a semantic network that represents the major frames.

  • Frames are cognitive structures that people use to understand and communicate about issues.

The COVID-19 outbreak is a significant global infectious disease event that profoundly affected many facets of society. Analyzing media coverage of this period across different regions offers valuable insights for health communication and future disease management. News media is considered one of the most mature sources of health information for the public (Viswanath & Emmons 2006), and media coverage serves as a cognitive shortcut that helps individuals make sense of events and interpret the world (Scheufele & Lewenstein 2005). How news frames concepts can influence audience understanding, attitudes, and behaviors related to health issues.

Mainstream media plays an important role in disseminating information about infectious diseases (Lee & Basnyat 2013). Mainstream media content, including news and commentary, reflects both public information and a country's dominant ideology, and researchers use such data to assess stakeholder responses to and perceptions of infectious diseases. For example, Shih et al. (2011) examined the New York Times' coverage of West Nile virus and avian influenza through quantitative content analysis, and Wu (2006) compared the coverage of HIV/AIDS in China by Xinhua News Agency and the Associated Press, revealing how each shaped the social reality of the issue. Currently, many studies on diseases categorize and analyze text content according to pre-existing categories, using either manual coding or training datasets built from manual coding (Tang et al. 2018). Semantic network analysis, by contrast, is an approach grounded in natural language processing that allows researchers to investigate the characteristics of a country's mainstream media coverage during outbreaks without training datasets. It explores the meaning of a text based on word frequency, co-occurrence frequency, and the distance between words (Danowski 1993). The approach has been applied to public health events, including the COVID-19 pandemic. For example, Mattei et al. (2021) analyzed the semantic network observed on Twitter during the first Italian lockdown; Luo et al. (2021) employed semantic network analysis to compare public perceptions of COVID-19 vaccines on Twitter and Weibo; Gao et al. (2021) used semantic networks and sentiment analysis to explore changes in public attention to and attitudes toward domestic COVID-19 vaccines; and Meadows et al. (2022) extracted semantic networks from Sina Weibo posts at each stage of the COVID-19 epidemic to explore government legitimacy management in China.
After the COVID-19 pandemic broke out, it quickly attracted researchers' attention, and social media platforms such as Twitter and Weibo became easy targets for semantic analysis (Abd-Alrazaq et al. 2020; Pobiruchin et al. 2020; Shim et al. 2021; Suzuki et al. 2021; Alhuzali et al. 2022; Mori et al. 2023).

Text mining (Feldman & Sanger 2007; Risch et al. 2008; Berry & Kogan 2010) is a technique for extracting latent information from unstructured text. Semantic networks discovered through text mining can be used to infer the frames used in texts, and studying these frames makes it possible to explore the relationship between news communication and public health. Framing theory is one of the main theories in media effects research and has been widely applied to disease, health, politics, economics, and other fields (Tian & Stewart 2005; Sweetser & Brown 2008; Shih et al. 2008, 2011; Nucci et al. 2009; Van der Meer et al. 2014).

Despite the rise of social media in China, mainstream media remains a crucial source of information for the Chinese public, and the way it disseminates information significantly influences public perceptions of matters related to the COVID-19 pandemic (Gao et al. 2021). However, traditional mainstream media is often overlooked as a target of semantic network and framing analysis. We therefore propose the first research question (RQ1):

  • RQ1: What frames are used in Chinese mainstream media news to discuss the COVID-19 pandemic as revealed by text mining?

Reynolds & Seeger (2005) discussed Crisis and Emergency Risk Communication (CERC), which comprises five stages: pre-crisis, initial, maintenance, opening-up, and resolution. Each stage involves different crisis communication tasks and reflects how the public processes and constructs meaning during a health crisis. Although specific crises may not unfold exactly as the model predicts, the media likely employs different frames at each stage. Therefore, we propose the following research question:

  • RQ2: How have the frames used by mainstream media during China's COVID-19 outbreak changed across stages?

This article employs the natural language processing software KH Coder to conduct text mining on news articles from the various stages of the COVID-19 pandemic in China. Qualitative and quantitative analyses, including semantic network analysis and frame analysis, are performed to explore patterns of frame usage in Chinese mainstream media.

The COVID-19 pandemic's impact on the environment has also drawn attention. On the positive side, research indicates that lockdown measures and reduced economic activity led to improved air quality, reduced greenhouse gas emissions, decreased water pollution, and less noise pollution in many cities worldwide (Rume & Islam 2020; Yang et al. 2022; Hammad et al. 2023). However, the pandemic had a more significant short-term impact on carbon dioxide emissions from electricity and industry than on those from ground transportation and aviation (Ang et al. 2023). There were also negative effects: the increased use of personal protective equipment and growing amounts of medical waste contributed to environmental pollution (Rume & Islam 2020; Hammad et al. 2023).

Environmental factors, including air pollution, chemical exposure, and climate, may also affect the COVID-19 pandemic (Weaver et al. 2022). Specifically, exposure to air pollution is associated with an increased risk of COVID-19 infection and death, particularly among vulnerable groups, contributing to differences in morbidity and mortality rates (Weaver et al. 2022). Whereas semantic network analysis and content analysis based on infectious disease frames offer a macro-level view of crisis communication by Chinese mainstream media during the COVID-19 pandemic, statistical analysis of language on specific research topics can reveal micro-level language patterns that macro-level analyses do not capture. We therefore propose the third research question (RQ3):

  • RQ3: Have mainstream Chinese media paid attention to the relationship between the COVID-19 pandemic and environmental pollution?

This paper examines the collocation patterns of words related to environmental pollution to explore whether and how Chinese mainstream media included environmental pollution in their reporting on the COVID-19 pandemic. This serves as a valuable supplement for a more comprehensive understanding of how Chinese mainstream media frame and disseminate crisis information to the public.

Sampling

The global COVID-19 pandemic, which lasted several years, was declared no longer a ‘public health emergency of international concern’ by the World Health Organization (WHO) in May 2023. China experienced its initial outbreak in December 2019. We collected online news related to COVID-19 published between December 1, 2019 and September 30, 2023 from zhonghua.cloud.gmw.cn, a multimedia library of the Guangming Daily Press. We included news from Chinese mainstream media outlets, which are recognized and influenced by official institutions, yielding 4,439 news articles with 1,240,554 tokens. This total includes some repetitive content, which reflects journalists' attention.

To explore how media discussions evolved during the pandemic, we focused on five key time points: (a) December 31, 2019, when Wuhan reported pneumonia cases; (b) January 23, 2020, when Wuhan went into lockdown; (c) January 1, 2021, when vaccine administration began; (d) December 7, 2022, when comprehensive reopening began; and (e) May 5, 2023, when the WHO made its declaration. The pandemic in China was accordingly divided into five stages, and each stage was sampled using online news from its first three months, excluding video and image materials.

Data cleaning

First, the original text was cleaned and converted for mining. Video and image materials were excluded. Second, to enable semantic network analysis, links and special characters were deleted from the text. Finally, a stop-word list was defined, covering conjunctions, auxiliary words, and other function words, as well as non-relevant words such as ‘responsible editor’.
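
The cleaning steps above can be sketched in Python as follows. This is a minimal illustration rather than the pipeline actually used (the study relied on KH Coder); it assumes the text has already been segmented into space-separated words, and the regular expressions, the helper names `clean_text` and `remove_stop_words`, and the sample stop words are hypothetical.

```python
import re

# Hypothetical patterns. Python's \w is Unicode-aware, so Chinese
# characters count as word characters and are kept.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
SPECIAL_CHAR_PATTERN = re.compile(r"[^\w\s]")


def clean_text(text: str) -> str:
    """Delete links and special characters, then collapse whitespace."""
    text = URL_PATTERN.sub(" ", text)
    text = SPECIAL_CHAR_PATTERN.sub(" ", text)
    return " ".join(text.split())


def remove_stop_words(tokens: list[str], stop_words: set[str]) -> list[str]:
    """Drop function words and non-relevant words from a token list."""
    return [token for token in tokens if token not in stop_words]


# Usage on pre-segmented text; "责任编辑" means "responsible editor".
raw = "武汉 报告 新增 病例！详见 https://example.org 责任编辑：张三"
tokens = clean_text(raw).split()
print(remove_stop_words(tokens, {"详见", "责任编辑"}))
```

Removing URLs before stripping punctuation matters: stripping punctuation first would break a link into fragments that the URL pattern could no longer recognize.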

Semantic network analysis

R and KH Coder 3, which have been applied in text research (Higuchi 2016, 2017), were used for linguistic and statistical processing of the dataset. The word lists and figures produced by text mining were generated by the first author. A semantic network was constructed for each stage using Jaccard coefficients, including only words that appeared in more than 1% of paragraphs (n = 43,997). We restricted the words in the semantic network to nouns, because nouns carry meaning even without context. Each stage's semantic network was visualized using its top 60 connections as ranked by the Jaccard coefficient, which ranges from 0 to 1. In the semantic network, words that frequently appear in the same text are considered closely related.
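
To make the edge-weighting step concrete, the Python sketch below computes the Jaccard coefficient J(A, B) = |A ∩ B| / |A ∪ B|, where A and B are the sets of paragraphs containing each of two words, keeps only words that occur in more than 1% of paragraphs, and returns the 60 strongest edges. It assumes each paragraph has already been reduced to the set of nouns it contains; it is not KH Coder's implementation, and the names `top_edges`, `min_share`, and `n_edges` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a: set[int], b: set[int]) -> float:
    """Jaccard coefficient |A & B| / |A | B|, which lies between 0 and 1."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_edges(paragraphs: list[set[str]], min_share: float = 0.01,
              n_edges: int = 60) -> list[tuple[str, str, float]]:
    """Return the n_edges strongest word pairs ranked by Jaccard coefficient."""
    # Map each word to the indices of the paragraphs that contain it.
    occurrences: dict[str, set[int]] = defaultdict(set)
    for idx, words in enumerate(paragraphs):
        for word in words:
            occurrences[word].add(idx)

    # Keep only words found in more than min_share of all paragraphs.
    threshold = min_share * len(paragraphs)
    kept = sorted(w for w, idxs in occurrences.items() if len(idxs) > threshold)

    edges = [(a, b, jaccard(occurrences[a], occurrences[b]))
             for a, b in combinations(kept, 2)]
    edges = [edge for edge in edges if edge[2] > 0]
    # Strongest co-occurrence first; ties broken alphabetically.
    edges.sort(key=lambda edge: (-edge[2], edge[0], edge[1]))
    return edges[:n_edges]


paragraphs = [{"virus", "case"}, {"virus", "vaccine"},
              {"virus", "case"}, {"vaccine", "dose"}]
print(top_edges(paragraphs, n_edges=2))
```

Because the Jaccard coefficient divides by the union, a very frequent word does not automatically get strong edges to everything, which is one reason it is a common choice for co-occurrence networks.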

To analyze variations in frame usage, we created a dictionary of typical keywords for each frame from words and phrases that appeared in more than 1% of news articles (n = 4,439). This process was guided by the semantic networks, established frames (Tang et al. 2018), and close reading of the news. Coding was carried out by the first and second authors to ensure reliability. Typical keywords were identified for each frame; for example, ‘pneumonia’ was assigned to the basic information frame. Using these frame-specific keywords, each news paragraph was tagged with frame labels or marked as frameless. We then calculated the percentage of frame usage in each of the four stages (excluding the pre-crisis stage because of a lack of published news). χ2 tests were used to assess differences between stages, and the Holm–Bonferroni correction was applied to control for multiple testing.
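
The dictionary-based tagging can be sketched as follows. Only the keyword ‘pneumonia’ for the basic information frame comes from the text; every other keyword, the three-frame dictionary, and the function names are hypothetical placeholders. In this sketch a paragraph may carry several frames at once, and a paragraph matching no keyword is treated as frameless.

```python
from collections import Counter

# Hypothetical dictionary: only "pneumonia" (basic information frame) comes
# from the text; the other keywords are placeholders for illustration.
FRAME_KEYWORDS: dict[str, set[str]] = {
    "basic information": {"pneumonia", "symptom"},
    "preventive": {"mask", "disinfection"},
    "vaccine": {"vaccine", "dose"},
}


def tag_paragraph(tokens: list[str],
                  frame_keywords: dict[str, set[str]]) -> set[str]:
    """Return every frame whose keywords occur in the paragraph (empty = frameless)."""
    words = set(tokens)
    return {frame for frame, keywords in frame_keywords.items() if words & keywords}


def frame_percentages(paragraphs_by_stage: dict[str, list[list[str]]],
                      frame_keywords: dict[str, set[str]]) -> dict[str, dict[str, float]]:
    """Percentage of paragraphs in each stage that carry each frame."""
    result = {}
    for stage, paragraphs in paragraphs_by_stage.items():
        counts: Counter = Counter()
        for tokens in paragraphs:
            counts.update(tag_paragraph(tokens, frame_keywords))
        result[stage] = {frame: 100 * counts[frame] / len(paragraphs)
                         for frame in frame_keywords}
    return result


stages = {"initial": [["pneumonia", "case"], ["mask", "pneumonia"],
                      ["economy"], ["vaccine"]]}
print(frame_percentages(stages, FRAME_KEYWORDS))
```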

Multiple correspondence analysis

To visually represent the relationships between the frames, we used multiple correspondence analysis (MCA). MCA is a dimension-reduction technique for datasets containing multiple categorical variables. It maps these variables into a lower-dimensional space so that the relationships between them can be observed and interpreted more easily. We used MCA to explore associations between the categorical variables in the COVID-19 news dataset because it plots frames and stages simultaneously in the same scatterplot, allowing us to examine the news data in more depth and extract more detailed insights.
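
One standard way to compute MCA is as a correspondence analysis of the complete disjunctive (indicator) matrix, in which every category of every variable gets its own 0/1 column. The NumPy sketch below follows that route. It assumes each news paragraph is a record holding its stage plus a yes/no value for each frame; the record layout, the data, and the function names are hypothetical, and the study's own MCA was produced with its R/KH Coder toolchain.

```python
import numpy as np


def indicator_matrix(records: list[dict[str, str]]) -> tuple[np.ndarray, list[str]]:
    """Complete disjunctive coding: one 0/1 column per category of each variable."""
    labels, columns = [], []
    for variable in records[0]:
        for category in sorted({r[variable] for r in records}):
            labels.append(f"{variable}={category}")
            columns.append([1.0 if r[variable] == category else 0.0 for r in records])
    return np.array(columns).T, labels


def mca(Z: np.ndarray, n_dims: int = 2) -> tuple[np.ndarray, np.ndarray]:
    """MCA as correspondence analysis of the indicator matrix Z.

    Returns the principal coordinates of the categories (one row per column
    of Z) and the principal inertias of the first n_dims dimensions.
    """
    P = Z / Z.sum()
    r = P.sum(axis=1)            # row masses
    c = P.sum(axis=0)            # column (category) masses
    expected = np.outer(r, c)
    # Standardized residuals D_r^(-1/2) (P - r c^T) D_c^(-1/2).
    S = (P - expected) / np.sqrt(expected)
    _, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    coords = Vt.T[:, :n_dims] * sigma[:n_dims] / np.sqrt(c)[:, None]
    return coords, sigma[:n_dims] ** 2


# Hypothetical paragraphs: a stage plus yes/no indicators for two frames.
records = [
    {"stage": "initial", "vaccine": "no", "political": "yes"},
    {"stage": "initial", "vaccine": "no", "political": "yes"},
    {"stage": "maintenance", "vaccine": "yes", "political": "no"},
    {"stage": "maintenance", "vaccine": "yes", "political": "yes"},
    {"stage": "opening-up", "vaccine": "no", "political": "no"},
]
Z, labels = indicator_matrix(records)
coords, inertias = mca(Z)
for label, (x, y) in zip(labels, coords):
    print(f"{label:18s} {x:+.3f} {y:+.3f}")
```

Stage and frame categories end up in the same coordinate system, which is what allows a joint plot in which nearby points indicate stronger associations.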

Hierarchical cluster analysis

In addition to MCA, we conducted hierarchical cluster analysis (HCA) on the identified frames. HCA is an exploratory technique that groups items into nested clusters according to their similarity, so that cluster membership reflects how similar the items are. The two techniques complement each other: the cluster memberships from HCA help detect connections between frames, while the MCA results explain the connections between stages and frames. Specifically, all frames were clustered using Ward's method with the Jaccard coefficient and grouped into four higher-level clusters. These clusters summarize the connections and distinctions between frames. Applied to text, HCA can induce higher-level, objective patterns of frame usage from the bottom up.
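
A minimal Python sketch of this clustering step follows. It assumes each frame is represented by the set of paragraph indices in which it occurs, uses 1 − Jaccard as the distance, and applies Ward's criterion through the Lance–Williams update on squared distances, which mirrors the squared-dissimilarity update that R's `hclust` uses for `method = "ward.D2"`. Ward's method is strictly defined for Euclidean distances, so pairing it with Jaccard distances is a heuristic. The function names and data are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a: set[int], b: set[int]) -> float:
    """1 - Jaccard coefficient between two sets of paragraph indices."""
    union = a | b
    return 1.0 - (len(a & b) / len(union) if union else 0.0)


def ward_clusters(frames: dict[str, set[int]], k: int) -> list[list[str]]:
    """Agglomerate frames with Ward's criterion until k clusters remain."""
    names = list(frames)
    members = {i: [name] for i, name in enumerate(names)}
    size = {i: 1 for i in members}
    # Squared distances between cluster ids, keyed with the smaller id first.
    dist = {(i, j): jaccard_distance(frames[names[i]], frames[names[j]]) ** 2
            for i, j in combinations(range(len(names)), 2)}

    def d(i: int, j: int) -> float:
        return dist[(i, j) if i < j else (j, i)]

    next_id = len(names)
    while len(members) > k:
        # Merge the closest pair of current clusters.
        i, j = min(combinations(sorted(members), 2), key=lambda pair: d(*pair))
        ni, nj, dij = size[i], size[j], d(i, j)
        for m in members:
            if m in (i, j):
                continue
            nm = size[m]
            # Lance-Williams update for Ward's method on squared distances.
            dist[(m, next_id)] = ((ni + nm) * d(i, m) + (nj + nm) * d(j, m)
                                  - nm * dij) / (ni + nj + nm)
        members[next_id] = members.pop(i) + members.pop(j)
        size[next_id] = ni + nj
        next_id += 1
    return sorted(sorted(group) for group in members.values())


# Hypothetical frames described by the paragraphs in which they occur.
frames = {"A": {1, 2, 3}, "B": {1, 2, 4}, "C": {7, 8, 9}, "D": {7, 8}}
print(ward_clusters(frames, 2))
```

Cutting the merge sequence at k = 4 clusters, as done for the 12 frames, corresponds to calling the function with `k=4`.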

Collocation stats

Existing research on COVID-19 and environmental pollution focuses on various types of environmental pollution and on medical waste, including air, water, and noise pollution, as well as waste from medicines, protective clothing, and masks. Given these focal points, we chose the keywords ‘pollution,’ ‘waste,’ and ‘wastewater’ for collocation statistics. Collocation statistics show which words are frequently used together with these keywords in news from Chinese mainstream media, and the resulting collocation patterns may reveal how much attention Chinese mainstream media gave to environmental pollution in the context of the COVID-19 pandemic. Specifically, we retrieved all news paragraphs containing ‘pollution,’ ‘waste,’ or ‘wastewater,’ counted the words occurring within five words before and after each keyword, sorted them by total number of occurrences, and compared the top 10 co-occurring words.
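
The window-based counting described above can be sketched as follows, assuming paragraphs are already tokenized into word lists. The function name `collocates` and the sample paragraphs are hypothetical, and the sketch counts raw co-occurrences within the window without any association measure.

```python
from collections import Counter


def collocates(paragraphs: list[list[str]], node: str, span: int = 5,
               top_n: int = 10) -> list[tuple[str, int]]:
    """Count words within `span` tokens before and after each occurrence of `node`."""
    counts: Counter = Counter()
    for tokens in paragraphs:
        for pos, token in enumerate(tokens):
            if token != node:
                continue
            left = tokens[max(0, pos - span):pos]
            right = tokens[pos + 1:pos + 1 + span]
            counts.update(left + right)
    # Sorted by total number of occurrences, highest first.
    return counts.most_common(top_n)


paragraphs = [["medical", "waste", "should", "be", "collected"],
              ["dispose", "medical", "waste"]]
print(collocates(paragraphs, "waste", span=2))
```

With `span=5` and `top_n=10` (the defaults here), the output corresponds to the five-word window and the top 10 list used for the keywords ‘pollution,’ ‘waste,’ and ‘wastewater.’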

Figure 1 displays a co-occurrence network of word correlations, in which nodes represent words and edges represent co-occurrence relationships between them. Node size is proportional to word frequency. The strength of co-occurrence is indicated by edge thickness and is not reflected in the distance between nodes. The absence of an edge does not necessarily imply that two nodes are unrelated, because only the top 60 connections are displayed for clarity. Although the network itself was generated automatically, words closely related to COVID-19 were grouped and colored based on the authors' judgment for ease of interpretation. Group A involves basic information about COVID-19; Group B, COVID-19 treatment and medical resources; Group C, COVID-19 prevention and vaccines; Group D, authoritative institutions; and Group E, socio-economic aspects.
Figure 1

Semantic network of Chinese mainstream media during the COVID-19 pandemic.

Based on our reading of the contexts in which high-frequency keywords (occurring in more than 1% of news) appear across the entire sample, we identified 12 distinct frames: basic information frame, preventive frame, treatment frame, authority frame, information update frame, political frame, economic frame, vaccine frame, social security frame, medical research frame, responsibility frame, and war metaphor. To avoid overlap between frames, we classified each keyword into the frame that best matches most of the contexts in which it appears. For example, words related to vaccines were assigned to the vaccine frame, whereas words related to new drugs were assigned to the medical research frame. Similarly, words associated with preventive measures were assigned to the preventive frame, whereas governmental actions beyond preventive measures were assigned to the political frame. Each frame was named after the typical information it conveys. Specifically:

  • Basic information frame: It conveys fundamental information about COVID-19 to the public such as the definition of COVID-19, the virus, and major symptoms.

  • Preventive frame: It conveys information on how to prevent COVID-19.

  • Treatment frame: It conveys information on the treatment of COVID-19, including where to seek treatment, what conventional medications to take, etc.

  • Authority frame: It conveys information from authoritative sources, including doctors from well-known hospitals, researchers from universities and institutes, and officials from disease control centers. The content of their statements usually includes predictions about the development of the COVID-19 pandemic and scientific knowledge related to COVID-19. The content of the statements by authoritative figures may also encompass other frames, but appealing to authority is a common characteristic of news texts that use the authority frame.

  • Information update frame: It conveys information about suspected and confirmed COVID-19 cases during the COVID-19 pandemic. Reports using the information update frame are usually concise and mainly report the number of cases in a specific area in China. Sometimes, information about confirmed COVID-19 cases outside of China is also reported.

  • Political frame: It describes political responses to the pandemic, including government measures and outbreak causes.

  • Economic frame: It conveys the impact of COVID-19 on the economy. China experienced large-scale shutdowns and production stoppages due to the COVID-19 pandemic, which also had a significant impact on various aspects of the Chinese and global economy.

  • Vaccine frame: It addresses COVID-19 vaccine development, discussions of vaccine safety, and public perceptions; mainstream media coverage is generally supportive of vaccines.

  • Social security frame: It conveys the social security measures taken in response to COVID-19. Reports using this frame inform the public about protective measures related to the novel coronavirus, including social insurance, medical insurance, and specific measures to protect those affected by the epidemic.

  • Medical research frame: It conveys information on scientific research related to COVID-19, including genetic research, tracing, and the development of effective drugs for the novel coronavirus.

  • Responsibility frame: It conveys the responsibilities of individuals and society regarding COVID-19, including who should be held responsible for certain issues during the COVID-19 period and what kinds of responsibilities are assigned.

  • War metaphor: It presents COVID-19 metaphorically as a war. Mainstream media uses vocabulary originally associated with warfare to describe the novel coronavirus outbreak, thereby achieving specific pragmatic purposes. For example, words such as ‘defeat,’ ‘enemy,’ and ‘reinforcements’ have been used in pandemic-related news to describe certain aspects of the COVID-19 situation.

In determining the frames, we drew on the episodic–thematic typology of generic media frames proposed by Iyengar (1991) and on the practices of other studies on disease and health (Dudo et al. 2007; Lee & Basnyat 2013; Odlum & Yoon 2015; Tang et al. 2018). However, the frames were derived mainly from the semantic network that emerged from actual news reports. After the frames were determined, coding of the vocabulary showed high inter-coder reliability, with a Cohen's kappa of 0.841 (p < 0.001). According to Landis & Koch (1977), Altman (1990), and McHugh (2012), a coefficient greater than 0.8 indicates a high level of agreement between annotators. Disagreements were resolved through discussion, and only codes on which full consensus was reached were retained.
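
For readers unfamiliar with the statistic, Cohen's kappa compares observed agreement p_o with the agreement p_e expected by chance from each coder's label frequencies: κ = (p_o − p_e) / (1 − p_e). The sketch below computes it for two coders' frame assignments; the labels are hypothetical and do not reproduce the study's coding data, and the sketch does not compute the accompanying p-value.

```python
from collections import Counter


def cohens_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    """Cohen's kappa, (p_o - p_e) / (1 - p_e), for two coders' labels."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty list of items")
    n = len(coder_a)
    # Observed agreement: share of items given the same label by both coders.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / n ** 2
    if p_e == 1:
        return 1.0  # both coders used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical frame assignments for four keywords.
coder_a = ["vaccine", "vaccine", "political", "political"]
coder_b = ["vaccine", "political", "political", "political"]
print(cohens_kappa(coder_a, coder_b))
```

Here the coders agree on three of four items (p_o = 0.75), but half of that agreement would be expected by chance (p_e = 0.5), so kappa is 0.5, well below the 0.8 threshold cited above.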

A series of χ2 tests indicated significant differences in the usage of these frames across the four stages of the COVID-19 outbreak in China (see Table 1). Pairwise χ2 tests were conducted to further examine the differences in frame usage among these four stages (see Table 2).
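
Each test above can be read as a Pearson χ2 test of independence on a contingency table with two rows (paragraphs with and without the frame) and one column per stage, giving df = 3 for the four-stage test and df = 1 for each pairwise comparison. The standard-library sketch below illustrates this; the counts are hypothetical, no continuity correction is applied (the text does not say whether one was used), and the p-value uses the closed-form chi-square survival function for integer degrees of freedom.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Survival function P(X >= x) of the chi-square distribution (closed forms)."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series of correction terms.
    result = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for i in range(1, (df - 1) // 2 + 1):
        result += term
        term *= x / (2 * i + 1)
    return result


def chi_square_test(table: list[list[float]]) -> tuple[float, int, float]:
    """Pearson chi-square test of independence (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts for one frame: paragraphs with / without it per stage.
with_frame = [120, 80, 60, 30]
without_frame = [380, 420, 440, 170]
print(chi_square_test([with_frame, without_frame]))          # 2 x 4 table, df = 3
print(chi_square_test([with_frame[:2], without_frame[:2]]))  # pairwise 2 x 2, df = 1
```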

Table 1

Results of χ2 tests on frames in four stages

Frame               Initial          Maintenance      Opening-up       Resolution    χ2 (df = 3)   P-value
Basic information   10,615 (55.76)   5,715 (49.46)    5,844 (48.95)    888 (60.57)   222.500       <0.001*
Preventive          3,033 (15.93)    1,483 (12.83)    2,315 (19.39)    308 (21.01)   211.615       <0.001*
Treatment           8,261 (43.39)    2,997 (25.94)    6,550 (54.86)    894 (60.98)   2,254.858     <0.001*
Authority           3,542 (18.61)    2,122 (18.36)    1,888 (15.81)    286 (19.51)   46.047        <0.001*
Information update  3,160 (16.60)    1,594 (13.79)    540 (4.52)       95 (6.48)     1,069.309     <0.001*
Political           6,903 (36.26)    3,864 (33.44)    2,602 (21.79)    291 (19.85)   835.979       <0.001*
Economic            1,704 (8.95)     802 (6.94)       305 (2.55)       19 (1.30)     568.155       <0.001*
Vaccine             833 (4.38)       4,345 (37.60)    868 (7.27)       119 (8.12)    7,296.919     <0.001*
Social security     1,196 (6.28)     666 (5.76)       1,863 (15.60)    149 (10.16)   974.694       <0.001*
Medical research    3,649 (19.17)    2,410 (20.86)    1,742 (14.59)    355 (24.22)   200.612       <0.001*
Responsibility      649 (3.41)       273 (2.36)       196 (1.64)       7 (0.48)      122.823       <0.001*
War metaphor        2,909 (15.28)    790 (6.84)       566 (4.74)       76 (5.18)     1,135.637     <0.001*

Note: Values are presented as n (%). *Denotes significance of items with p < 0.01.

Table 2

Pairwise comparison of frames in four stages

Frame               Initial vs. maintenance     Maintenance vs. opening-up  Opening-up vs. resolution
                    Adjusted p-value            Adjusted p-value            Adjusted p-value
Basic information   <0.001**                    0.434                       <0.001**
Preventive          <0.001**                    <0.001**                    0.140
Treatment           <0.001**                    <0.001**                    <0.001**
Authority           0.598                       <0.001**                    <0.001**
Information update  <0.001**                    <0.001**                    <0.001**
Political           <0.001**                    <0.001**                    0.088
Economic            <0.001**                    <0.001**                    0.003**
Vaccine             <0.001**                    <0.001**                    0.241
Social security     0.066                       <0.001**                    <0.001**
Medical research    <0.001**                    <0.001**                    <0.001**
Responsibility      <0.001**                    <0.001**                    <0.001**
War metaphor        <0.001**                    <0.001**                    0.453

Note: All pairwise χ2 tests have 1 degree of freedom. P-values were adjusted with the Holm–Bonferroni correction. **Denotes significance at adjusted p < 0.01.

Three comparisons were made in chronological order of the development stages to observe how frame usage changed over time. This follows an approach similar to that of Tang et al. (2018), who examined social media discussions during a measles outbreak in the United States. Specifically, we compared the initial and maintenance stages, the maintenance and opening-up stages, and the opening-up and resolution stages. Between the initial and maintenance stages, differences in the usage of the authority frame and the social security frame were not significant at the adjusted p < 0.01 level. Between the maintenance and opening-up stages, the difference in usage of the basic information frame was not significant at that level. Between the opening-up and resolution stages, differences in the usage of the preventive frame, the political frame, the vaccine frame, and the war metaphor were not significant at that level. All other differences were significant at the adjusted p < 0.01 level.
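
The Holm–Bonferroni step-down adjustment behind the "adjusted p" values can be sketched in a few lines. The text does not state the family over which the correction was applied (the three comparisons for one frame, or all 36 pairwise tests), so the grouping in the usage line is an assumption, and the raw p-values are hypothetical.

```python
def holm_adjust(p_values: list[float]) -> list[float]:
    """Holm-Bonferroni step-down adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - rank.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for the three pairwise comparisons of one frame.
print(holm_adjust([0.01, 0.04, 0.03]))
```

Compared with a plain Bonferroni correction, which would multiply every p-value by m, Holm's procedure is uniformly at least as powerful while still controlling the family-wise error rate.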

In the four stages of the COVID-19 outbreak in China, the use of different frames showed varying patterns (see Figure 2). The vertical axis represents the percentage of frame usage, and the horizontal axis represents the stages. The basic information frame was one of the most prominent frames in every stage. The treatment frame was one of the two most prominent frames in the initial, opening-up, and resolution stages, but its usage dropped in the maintenance stage. The vaccine frame became the second most prominent frame in the maintenance stage, after the basic information frame, but its usage in the other stages was relatively low.
Figure 2

Twelve frames in different stages.

The preventive frame decreased in proportion from the initial stage to the maintenance stage, but increased in the opening-up stage, even surpassing the early stage, and further increased in the resolution stage. The medical research frame showed an upward trend from the initial stage to the maintenance stage, but decreased in the opening-up stage, and then increased again in the resolution stage. The authority frame remained relatively stable in the initial and maintenance stages, decreased in the opening-up stage, and then rebounded in the resolution stage. The social security frame also remained relatively stable in the initial and maintenance stages, but it increased significantly in the opening-up stage and then fell back in the resolution stage.

The political frame had a high proportion in the initial stage, ranking third, but its proportion gradually decreased as the stages progressed. There was a similar trend for the economic frame, the responsibility frame, and the war metaphor frame, all of which were used most in the early stage of the outbreak and then gradually decreased. The information update frame was relatively active in the early stage of the outbreak compared to other stages and then gradually decreased until it increased again in the resolution stage.

Figure 3 plots the news frames and the stages of COVID-19 in the same space, where the points are unevenly distributed. Blue circles represent news frames, and red squares represent the stages of COVID-19. The relative proximity of data points indicates the strength of their association: the closer the points, the stronger the association. The data points for the opening-up and resolution stages are relatively close to each other and form a fairly stable triangle with the other two stages. The vaccine frame is separated from the other blue circles, indicating that its use differs from that of the other frames and is closely associated with the maintenance stage. The social security frame also lies at some distance from the other blue circles and is closely associated with the resolution and opening-up stages. Three frames (basic information, authority, and medical research) lie near the origin and are not clearly distinguished from one another.
Figure 3

Results of MCA.

As shown in Figure 3, the 12 frames used by mainstream Chinese media during the COVID-19 period are distributed across four quadrants and lie at varying distances from one another, which gives a preliminary view of how the frames were used. Figure 4 applies HCA to identify the hierarchical relationships among the frames, which also helps verify, with quantitative parameters, the patterns of frame usage observed in the MCA results.
Figure 4

Results of HCA.

The dendrogram in Figure 4 shows the results of the HCA, using Ward's method and the Jaccard coefficient, for the frames applied in mainstream Chinese media news during the COVID-19 period. As shown in Figure 4, the 12 frames form four possible clusters. Cluster 1 includes the ‘Authority Frame’ and the ‘Information Update Frame’; Cluster 2 includes the ‘Political Frame’, the ‘Basic Information Frame’, the ‘Treatment Frame’, the ‘Preventive Frame’, the ‘Vaccine Frame’, and the ‘Medical Research Frame’; Cluster 3 includes the ‘Responsibility Frame’; and Cluster 4 includes the ‘Social Security Frame’, the ‘Economic Frame’, and ‘War Metaphor’. In terms of hierarchy, the eight frames in Clusters 1 and 2 form the left branch, which parallels the right branch formed by the four frames in Clusters 3 and 4. At the next level, the left branch splits into two sub-branches (the two frames of Cluster 1 and the six frames of Cluster 2), and the right branch likewise splits into two sub-branches (the single frame of Cluster 3 and the three frames of Cluster 4).

Table 3 displays the top 10 words paired with ‘pollution,’ ‘waste,’ and ‘wastewater,’ respectively. The frequency values give the total number of pairings, obtained by counting all words within five words before and after the target word used as the reference node. The term ‘environment’ collocates with ‘pollution’ 27 times, ranking third. This indicates that when Chinese mainstream media mentions pollution, environmental pollution is not an uncommon concept, suggesting a potentially increased focus on the impact of COVID-19 on environmental pollution. By examining contexts that include both ‘pollution’ and ‘environment,’ we found that media attention was directed toward environmental pollution caused by medical waste, emphasizing that medical waste in vaccination areas does not harm humans. Moreover, higher-order collocations reveal that the media primarily disseminated knowledge about virus-contaminated items and infection pathways rather than calling attention to environmental pollution caused by the COVID-19 pandemic. Among words paired with ‘waste,’ ‘dispose,’ ‘treating,’ and ‘collect’ occur 16, 13, and 10 times, respectively, all ranking in the top 10. This indicates a clear awareness in Chinese media of the need to handle medical waste, which the media sought to convey to the public. Medical waste is identified as a major environmental pollution issue resulting from COVID-19, and Chinese mainstream media made efforts to address pollution caused by medical waste. Among words paired with ‘wastewater,’ ‘virus,’ ‘city,’ ‘region,’ and ‘monitoring’ all rank in the top 10. This suggests that media attention to wastewater focused primarily on using it to monitor and control the COVID-19 pandemic.

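Table 3

Collocations of environmental pollution

| No. | Pollution: collocation | Frequency | Waste: collocation | Frequency | Wastewater: collocation | Frequency |
|---|---|---|---|---|---|---|
| 1 | Goods | 31 | Medical | 72 | Virus | 14 |
| 2 | Contact | 30 | Dispose | 16 | COVID-19 | 10 |
| 3 | Environment | 27 | Vaccinate | 14 | Discovery | |
| 4 | Virus | 21 | Treating | 13 | City | |
| 5 | Surface | 14 | Should | 13 | Monitoring | |
| 6 | Object | 11 | Pneumonia | 12 | Variation | |
| 7 | Not | 10 | Waste | 10 | Sample | |
| 8 | Touch | 10 | Collect | 10 | State | |
| 9 | Avoid | 10 | COVID-19 | 10 | Region | |
| 10 | Infection | | Danger | | Exist | |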
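
The five-words-before-and-after window used for Table 3 can be sketched as follows. The sketch assumes the news text has already been segmented into word tokens (Chinese text needs a word segmenter first) and counts every token within five positions of each occurrence of the node word, including other occurrences of the node word itself. The function name and the sample sentence are hypothetical.

```python
from collections import Counter


def collocations(tokens, node, span=5):
    """Count words occurring within `span` tokens before or after each `node`.

    Other occurrences of `node` that fall inside a window are counted too.
    """
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - span)
        stop = min(len(tokens), i + span + 1)
        for j in range(start, stop):
            if j != i:  # skip the node occurrence itself
                counts[tokens[j]] += 1
    return counts


# Hypothetical, already segmented example sentence.
tokens = "medical waste must be collected then medical waste is disposed".split()
print(collocations(tokens, "waste").most_common(3))
```

Calling `most_common(10)` on the returned `Counter` gives a top-10 list in the shape of one column pair of Table 3.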

In the context of health crisis communication, the role of mainstream media in framing differs from that in traditional contexts such as conventional health communication and political communication. In this study, we applied text mining methods for the first time to explore the contextual factors that influence how mainstream Chinese media frame epidemic news, and we compared frame usage by mainstream Chinese media across different stages of the COVID-19 outbreak. Previous studies have shown the effects of COVID-19 health strategies on the economy and other global factors (Abbas 2021; Su et al. 2022; Abbas et al. 2023). Emerging global epidemics similar to COVID-19 are likely to recur. This study contributes to the limited literature and theory on information dissemination during emerging epidemics.

Because the analyzed data consist of news texts about COVID-19, the appearance of Groups A, B, and C in Figure 1 is not surprising. They represent the focus of Chinese mainstream media on aspects closely related to the novel coronavirus, such as prevention, vaccines, and treatment. Beyond Groups A, B, and C, terms representing authoritative institutions and socio-economic aspects (Groups D and E) also emerge in the semantic network. This indicates the eagerness of Chinese mainstream media to convey the will and actions of authoritative institutions during the COVID-19 pandemic and to show the impact of the crisis on socio-economic factors, which aligns with the results of the frame analysis. However, because the semantic network includes only high-frequency words, it does not cover all 12 frames analyzed in this study.
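
A co-occurrence network restricted to high-frequency words can be sketched as below. It assumes each news article is a list of already segmented tokens, keeps the `top_n` words with the highest document frequency, and links two words when the Jaccard coefficient of the sets of articles containing them reaches `min_jaccard`. Both thresholds, the function name, and the sample articles are illustrative assumptions, not the settings behind Figure 1. Because only the most frequent words enter the network, frames expressed mostly through rarer vocabulary can drop out, which is the limitation noted above.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(docs, top_n=60, min_jaccard=0.2):
    """Return weighted edges (word_a, word_b, jaccard) among frequent words.

    docs: list of token lists, one per article (already segmented).
    Edges are sorted strongest first, ties broken alphabetically.
    """
    doc_sets = [set(tokens) for tokens in docs]
    # Document frequency: number of articles containing each word.
    df = Counter(word for tokens in doc_sets for word in tokens)
    # Deterministic top-n selection: higher frequency first, then alphabetical.
    vocab = sorted(df, key=lambda w: (-df[w], w))[:top_n]
    # Map each retained word to the indices of the articles containing it.
    postings = {w: {i for i, s in enumerate(doc_sets) if w in s} for w in vocab}
    edges = []
    for a, b in combinations(sorted(vocab), 2):
        shared = len(postings[a] & postings[b])
        either = len(postings[a] | postings[b])  # never 0: each word occurs somewhere
        weight = shared / either
        if weight >= min_jaccard:
            edges.append((a, b, round(weight, 3)))
    return sorted(edges, key=lambda e: (-e[2], e[0], e[1]))
```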

Changes in frame usage during the COVID-19 pandemic can be traced across four stages: initial, opening-up, maintenance, and resolution. In the initial stage, the treatment frame dominates, reflecting public interest in treatment methods. During the maintenance stage, however, discussion of treatment declines under China's ‘zero-COVID’ approach, while the vaccine frame gains importance. Preventive measures and information updates receive more attention during the opening-up or resolution stages. The medical research frame declines as understanding of COVID-19 grows, while the authority frame remains stable at first but decreases during the opening-up stage. The political, economic, responsibility, and war metaphor frames decline as the pandemic evolves. One possible explanation for these shifts lies in the role of mainstream media in maintaining social stability, which stems from their authority and influence over public behavior (Lee & Basnyat 2013). As key information channels, mainstream media swiftly disseminate facts to prevent public panic. Drawing on their authority, they filter and distribute pertinent COVID-19 information, often selectively emphasizing certain aspects. For instance, at the onset of the pandemic, the media prioritized conveying basic information about the virus and bolstered credibility through expert opinions to quell panic and promote protective measures. As Holmes (2008) observed, such selective framing aims to induce behavioral change by providing accurate information through trusted channels. However, when communicating about an emerging pandemic, focusing solely on facts may not suffice, given uncertainty, public anxiety, trust issues, and related matters such as vaccine development, effective drugs, and socio-economic consequences. These complexities demand more than basic information provision.
For instance, use of the political frame decreased significantly as the pandemic shifted to the opening-up and resolution stages. While stringent administrative measures effectively contained the spread of the virus and ensured treatment, prolonged enforcement might have sustained public tension. Mainstream media helped ease public anxiety by selectively framing news, which contributed to the declining use of war metaphors. Initially, war metaphors were prevalent in framing COVID-19-related phenomena, fostering urgency and solidarity. However, as understanding grew, vaccines became more available, and public resilience strengthened, mainstream media gradually reduced the use of war metaphors to alleviate public anxiety.
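
The stage-by-stage comparison rests on the share of articles in each stage that use a given frame. The following is a minimal sketch, assuming each coded article is a (stage, frames) pair and that one article may carry several frames, so the shares within a stage need not sum to 1. The stage and frame labels follow the text, but the function names and coded articles are hypothetical, and this is not the authors' MCA procedure.

```python
from collections import Counter, defaultdict


def frame_shares(coded_articles):
    """Return {stage: {frame: share of that stage's articles using the frame}}.

    `coded_articles` is an iterable of (stage, frames) pairs, where `frames`
    is any iterable of frame labels assigned to one article.
    """
    totals = Counter()           # number of articles per stage
    hits = defaultdict(Counter)  # stage -> frame -> articles using it
    for stage, frames in coded_articles:
        totals[stage] += 1
        for frame in set(frames):  # count each frame once per article
            hits[stage][frame] += 1
    return {
        stage: {frame: n / totals[stage] for frame, n in hits[stage].items()}
        for stage in totals
    }


def share_change(shares, frame, from_stage, to_stage):
    """Change in a frame's share between two stages (absent frames count as 0)."""
    return shares[to_stage].get(frame, 0.0) - shares[from_stage].get(frame, 0.0)
```

A negative `share_change` for, say, the war metaphor frame between two stages corresponds to the kind of decline described above.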

Furthermore, we observed some interesting patterns in mainstream media news. Although the 12 frames extracted in this study already provide a comprehensive overview of information selection by Chinese mainstream media, these frames can be further grouped. Using MCA and HCA, we computed and examined the relationships between frames and identified four clusters. Cluster 1 includes the ‘Authority Frame’ and the ‘Information Update Frame,’ suggesting a similarity in their roles of providing official information and real-time updates. This indicates that during the pandemic, authoritative information and the latest news complement each other, as people seek both authoritative guidance and real-time updates. Cluster 2 is larger, encompassing the ‘Political Frame,’ the ‘Basic Information Frame,’ the ‘Treatment Frame,’ the ‘Preventive Frame,’ the ‘Vaccine Frame,’ and the ‘Medical Research Frame.’ This cluster covers various aspects directly related to pandemic management, including policymaking, basic information dissemination, treatment methods, preventive measures, vaccine development, and medical research, which likely reflects the multifaceted demands of pandemic management and the comprehensive strategies it requires. Cluster 3 contains only the ‘Responsibility Frame,’ suggesting that accountability and responsibility attribution form a distinct topic with a unique position in media reporting during the pandemic. Cluster 4 comprises the ‘Social Security Frame,’ the ‘Economic Frame,’ and ‘War Metaphor.’ These frames focus on the impact of the pandemic on society and the economy, as well as the measures taken to address these effects. They are grouped together because they highlight the widespread impact of the pandemic on people's lives and discussions of strategic responses.
The left and right branches of the hierarchy reflect two major approaches to handling information: the eight frames on the left focus predominantly on pandemic management and scientific aspects, while the four frames on the right emphasize socio-economic impacts and responsibility attribution. Together, these clusters and their hierarchical relationships reveal the complexity and multidimensionality of media reporting during the pandemic and illustrate the different roles that frames play in conveying information and shaping public perception.

Amid the rise of emerging infectious diseases, news media contend with diverse information sources such as the Internet and social media, which complicates information dissemination. For instance, mainstream media's vaccine coverage may be influenced by social media discourse. Mori et al. (2023) found that on Twitter, younger generations associated the coronavirus vaccine with ‘death,’ although this perception shifted over time. Gao et al. (2021) noted pre-approval concerns about vaccine safety and priority groups, with post-approval discussions focusing on vaccine side effects. In contrast, Chinese mainstream media adopted a positive, scientific stance in vaccine reporting to alleviate public anxiety while competing with social platforms. This dynamic interaction reflects the influence of the evolving media landscape on public discourse (Tang et al. 2018; Gao et al. 2021).

Finally, although the collocation statistics indicate that Chinese mainstream media have begun to pay attention to the environmental pollution caused by the COVID-19 pandemic, the focus remains on contamination of items by the virus and on pollution from medical waste and wastewater. By emphasizing the safe handling of medical waste, the media aim to raise public awareness of environmental protection during the pandemic. Reporting on wastewater monitoring highlights its value as a tool for monitoring and preventing the spread of the virus, demonstrating the role of technology and environmental monitoring in epidemic prevention and control. Efforts to reduce the adverse effects of COVID-19 on the environment can bring significant benefits, and the positive effects of the pandemic on the environment are another aspect the media could cover. Although further in-depth research is needed on the relationship between China's COVID-19 news and environmental pollution, the results of this study highlight both the efforts of Chinese mainstream media regarding medical waste and wastewater treatment during the pandemic and their lack of comprehensive attention to environmental impact. The results also reflect the important role of the media in public health crises and environmental pollution governance, namely, promoting public awareness and behavioral change by disseminating key information.

In conclusion, our text-mining analysis of mainstream media coverage of the COVID-19 pandemic in China reveals 12 frames. The selection of these frames varied across the stages of the epidemic, with significant overall differences. Using natural language processing for data mining offers broader coverage and more efficient data processing than traditional analysis. These findings underscore the importance of tailored communication strategies by Chinese mainstream media in addressing emerging pandemics. The study's outcomes can inform crisis communication recommendations for governments, public health administrations, and media outlets facing evolving epidemics. Notably, in addition to the pandemic itself, we expanded our research scope to include environmental pollution. We found that mainstream Chinese media's understanding of the relationship between the COVID-19 pandemic and environmental pollution is relatively narrow, focusing primarily on medical waste and medical wastewater. Reporting centers on the harm that medical waste and wastewater pose to people and on how they can be used in responding to the COVID-19 pandemic. Chinese mainstream media currently pay little attention to broader pollution issues such as energy, climate, and noise, and do not convey the positive ways in which the COVID-19 pandemic has affected environmental pollution. This finding offers a new perspective for understanding China's media focus on environmental issues during a crisis. We therefore suggest that future research on pandemics also consider environmental pollution to gain a more comprehensive understanding. We also emphasize the need for cross-cultural comparisons of media discourse during crises.
How Chinese media's framing choices during the COVID-19 pandemic differ from those in other countries, and how their focus on environmental pollution aligns with global mainstream perspectives, are crucial questions to explore. Investigating them can pave the way for a more comprehensive and diverse research approach and enhance our understanding of crisis communication strategies and their effectiveness across different contexts.

Y.L. and Z.Z. conceived the idea, designed the study, wrote the manuscript, and completed other major work. L.Y. reviewed the various drafts of the article and completed other auxiliary work. All authors participated in the review and revision of the manuscript and approved the final version for submission.

This research did not require ethics approval. However, we strictly adhere to ethical principles.

The authors did not receive any funding or benefits from any source to conduct this study.

Data cannot be made publicly available; readers should contact the corresponding author for details.

The authors declare that there is no conflict of interest.

Abbas J., Al-Sulaiti K., Lorente D. B., Shah S. A. R. & Shahzad U. 2023 Reset the industry redux through corporate social responsibility: The COVID-19 tourism impact on hospitality firms through business model innovation. In: Economic Growth and Environmental Quality in a Post-Pandemic World: New Directions in the Econometrics of the Environmental Kuznets Curve (1st edn) (Shahbaz M., Lorente D. B. & Sharma R., eds). Routledge, London, pp. 177–201.
Abd-Alrazaq A., Alhuwail D., Househ M., Hamdi M. & Shah Z. 2020 Top concerns of tweeters during the COVID-19 pandemic: Infoveillance study. Journal of Medical Internet Research 22 (4), e19016. https://doi.org/10.2196/19016.
Alhuzali H., Zhang T. & Ananiadou S. 2022 Emotions and topics expressed on Twitter during the COVID-19 pandemic in the United Kingdom: Comparative geolocation and text mining analysis. Journal of Medical Internet Research 24 (10), e40323. https://doi.org/10.2196/40323.
Altman D. G. 1990 Practical Statistics for Medical Research. Chapman & Hall Press, New York.
Ang L., Hernández-Rodríguez E., Cyriaque V. & Yin X. 2023 COVID-19's environmental impacts: Challenges and implications for the future. Science of the Total Environment 899, 165581. https://doi.org/10.1016/j.scitotenv.2023.165581.
Berry M. W. & Kogan J. 2010 Text Mining: Applications and Theory. Wiley. https://doi.org/10.1002/9780470689646.
Danowski J. A. 1993 Network analysis of message content. Progress in Communication Sciences 12, 198–221.
Dudo A. D., Dahlstrom M. F. & Brossard D. 2007 Reporting a potential pandemic: A risk-related assessment of Avian Influenza Coverage in U.S. Newspapers. Science Communication 28 (4), 429–454. https://doi.org/10.1177/107554700730221.
Feldman R. & Sanger J. 2007 The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data. Cambridge University Press, Cambridge.
Hammad H. M., Nauman H. M. F., Abbas F., Jawad R., Farhad W., Shahid M., Bakhat H. F., Farooque A. A., Mubeen M., Fahad S. & Cerda A. 2023 Impacts of COVID-19 pandemic on environment, society, and food security. Environmental Science and Pollution Research 30, 99261–99272. https://doi.org/10.1007/s11356-023-25714-1.
Higuchi K. 2016 A two-step approach to quantitative content analysis: KH Coder tutorial using Anne of Green Gables (Part I). Ritsumeikan Social Science Review 52 (3), 77–91.
Higuchi K. 2017 A two-step approach to quantitative content analysis: KH Coder tutorial using Anne of Green Gables (Part II). Ritsumeikan Social Sciences Review 53 (1), 137–147.
Holmes B. J. 2008 Communicating about emerging infectious disease: The importance of research. Health, Risk & Society 10 (4), 349–360. https://doi.org/10.1080/13698570802166431.
Iyengar S. 1991 Is Anyone Responsible? How Television Frames Political Issues. University of Chicago Press, Chicago.
Landis J. R. & Koch G. G. 1977 The measurement of observer agreement for categorical data. Biometrics 33 (1), 159–174. https://doi.org/10.2307/2529310.
Lee S. T. & Basnyat I. 2013 From press release to news: Mapping the framing of the 2009 H1N1 A Influenza Pandemic. Health Communication 28 (2), 119–132. https://doi.org/10.1080/10410236.2012.658550.
Mattei M., Caldarelli G., Squartini T. & Saracco F. 2021 Italian Twitter semantic network during the COVID-19 epidemic. EPJ Data Science 10 (1), 47. https://doi.org/10.1140/epjds/s13688-021-00301-x.
McHugh M. L. 2012 Interrater reliability: The kappa statistic. Biochemia Medica 22 (3), 276–282. https://doi.org/10.11613/BM.2012.031.
Meadows C. Z., Tang L. & Zou W. 2022 Managing government legitimacy during the COVID-19 pandemic in China: A semantic network analysis of state-run media Sina Weibo posts. Chinese Journal of Communication 15 (2), 156–181. https://doi.org/10.1080/17544750.2021.2016876.
Mori Y., Miyatake N., Suzuki H., Mori Y., Okada S. & Tanimoto K. 2023 Comparison of impressions of COVID-19 vaccination and influenza vaccination in Japan by analyzing social media using text mining. Vaccines 11 (8), 1327. https://doi.org/10.3390/vaccines11081327.
Nucci M. L., Cuite C. L. & Hallman W. K. 2009 When good food goes bad: Television network news and the spinach recall of 2006. Science Communication 31 (2), 238–265. https://doi.org/10.1177/107554700934033.
Odlum M. & Yoon S. 2015 What can we learn about the Ebola outbreak from tweets? American Journal of Infection Control 43 (6), 563–571. https://doi.org/10.1016/j.ajic.2015.02.023.
Reynolds B. & Seeger M. W. 2005 Crisis and Emergency Risk Communication as an integrative model. Journal of Health Communication 10 (1), 43–55. https://doi.org/10.1080/10810730590904571.
Risch J., Kao A., Poteet S. R. & Wu Y. J. J. 2008 Text visualization for visual text analytics. In: Visual Data Mining: Theory, Techniques and Tools for Visual Analytics (Simoff S. J., Böhlen M. H. & Mazeika A., eds). Springer, Berlin, Heidelberg, pp. 154–171.
Rume T. & Islam S. M. D. 2020 Environmental effects of COVID-19 pandemic and potential strategies of sustainability. Heliyon 6 (9), e04965. https://doi.org/10.1016/j.heliyon.2020.e04965.
Scheufele D. A. & Lewenstein B. V. 2005 The public and nanotechnology: How citizens make sense of emerging technologies. Journal of Nanoparticle Research 7, 659–667. https://doi.org/10.1007/s11051-005-7526-2.
Shih T. J., Wijaya R. & Brossard D. 2008 Media coverage of epidemic hazards: Linking framing and issue attention cycle towards an integrated theory of print news coverage of epidemic hazards. Mass Communication and Society 11 (2), 141–160. https://doi.org/10.1080/15205430701668121.
Shih T. J., Brossard D. & Wijaya R. 2011 News coverage of public health issues: The role of news sources and the processes of news construction. International Public Health Journal 3 (1), 87–97.
Shim J. G., Ryu K. H., Lee S. H., Cho E. A., Lee Y. J. & Ahn J. H. 2021 Text mining approaches to analyze public sentiment changes regarding COVID-19 vaccines on social media in Korea. International Journal of Environmental Research and Public Health 18 (12), 6549. https://doi.org/10.3390/ijerph18126549.
Su Z., Cheshmehzangi A., Bentley B. L., McDonnell D., Šegalo S., Ahmad J. & da Veiga C. P. 2022 Technology-based interventions for health challenges older women face amid COVID-19: A systematic review protocol. Systematic Reviews 11 (1), 271. https://doi.org/10.1186/s13643-022-02150-9.
Suzuki R., Iizuka Y. & Lefor A. K. 2021 COVID-19 related discrimination in Japan: A preliminary analysis utilizing text-mining. Medicine 100 (36), e27105. https://doi.org/10.1097/MD.00000000000027105.
Sweetser K. D. & Brown C. W. 2008 Information subsidies and agenda building during the Israel–Lebanon crisis. Public Relations Review 34 (4), 359–366. https://doi.org/10.1016/j.pubrev.2008.06.008.
Tang L., Bie B. & Zhi D. 2018 Tweeting about measles during stages of an outbreak: A semantic network approach to the framing of an emerging infectious disease. American Journal of Infection Control 46 (12), 1375–1380. https://doi.org/10.1016/j.ajic.2018.05.019.
Tian Y. & Stewart C. M. 2005 Framing the SARS crisis: A computer-assisted text analysis of CNN and BBC online news reports of SARS. Asian Journal of Communication 15 (3), 289–301. https://doi.org/10.1080/01292980500261605.
Van der Meer T. G., Verhoeven P., Beentjes H. & Vliegenthart R. 2014 When frames align: The interplay between PR, news media, and the public in times of crisis. Public Relations Review 40 (5), 751–761. https://doi.org/10.1016/j.pubrev.2014.07.008.
Viswanath K. & Emmons K. M. 2006 Message effects and social determinants of health: Its application to cancer disparities. Journal of Communication 56 (1), 238–264. https://doi.org/10.1111/j.1460-2466.2006.00292.x.
Weaver A. K., Head J. R., Gould C. F., Carlton E. J. & Remais J. V. 2022 Environmental factors influencing COVID-19 incidence and severity. Annual Review of Public Health 43, 271–291. https://doi.org/10.1146/annurev-publhealth-052120-101420.
Wu M. 2006 Framing AIDS in China: A comparative analysis of US and Chinese wire news coverage of HIV/AIDS in China. Asian Journal of Communication 16 (3), 251–272. https://doi.org/10.1080/01292980600857781.
Yang M., Chen L., Msigwa G., Tang K. H. D. & Yap P.-S. 2022 Implications of COVID-19 on global environmental pollution and carbon emissions with strategies for sustainability in the COVID-19 era. Science of the Total Environment 809, 151657. https://doi.org/10.1016/j.scitotenv.2021.151657.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (CC BY 4.0), which permits copying, adaptation and redistribution, provided the original work is properly cited (http://creativecommons.org/licenses/by/4.0/).