How to analyze politicians’ discourse on climate change using natural language processing techniques
Climate change is currently one of the biggest challenges facing society. Its socio-environmental impact is already visible worldwide, with rising temperatures, intensification of extreme weather events, and severe consequences for biodiversity and humans. Therefore, governments, organizations, and individuals must take concrete steps to confront this challenge.
In this sense, it is crucial to monitor the policies and speeches of government representatives on climate change. They are responsible for creating laws and establishing measures to protect the environment and combat climate change. Understanding what they say about the subject, whether they are committed to taking concrete measures, and whether they are aligned with international goals and objectives is therefore essential for dealing with contemporary challenges.
In this context, natural language processing offers powerful tools for discourse analysis. With the popularization of artificial intelligence and of models specific to this area, it is now possible to analyze politicians’ speeches on climate change in greater depth and with greater precision.
Discourse analysis[1] allows us to identify trends and patterns in the words and phrases used by politicians about the climate and also helps us evaluate the degree of politicians’ commitment to the theme. Additionally, it is possible to compare speeches from politicians of different parties and regions to verify if there are differences in approaches and priorities.
Taking into account the aspects above, the objective of this text is to present some tips and examples of how to perform discourse analysis of politicians on climate change using Python. In the following subsections of this text, we will show how to collect data from online sources, clean them, and prepare them for analysis[2]. Finally, we will provide the source code so that others interested can replicate and expand this research.
Collecting data from online sources
Data about the speeches of congress members can be collected from the Brazilian National Congress website[3]: https://dadosabertos.camara.leg.br/swagger/api.html#api.
These data can be extracted manually or through an API (application programming interface), which allows automated access to the data and integration with other systems. In this case, the API was called from the Python code available in the GitHub repository listed at the end of this text, which made data collection faster and more efficient.
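As a rough illustration of the automated route, the sketch below pages through one deputy's speeches using only the Python standard library. It is not the code from the repository: the endpoint path (`/deputados/{id}/discursos`), the query parameters (`dataInicio`, `dataFim`, `itens`), and the response fields (`dados`, `links`) are assumptions that should be checked against the Swagger documentation linked above, and `collect_speeches` is a hypothetical helper name.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed API root; confirm it in the Swagger documentation.
BASE_URL = "https://dadosabertos.camara.leg.br/api/v2"


def fetch_json(url):
    """Download one URL and decode its JSON body."""
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=30) as response:
        return json.load(response)


def collect_speeches(deputy_id, start_date, end_date, fetch=fetch_json):
    """Return all speech records for one deputy, following pagination links.

    `fetch` is injectable so the function can be exercised without network access.
    """
    # Assumed query parameter names (dates as YYYY-MM-DD strings).
    query = urlencode({"dataInicio": start_date, "dataFim": end_date, "itens": 100})
    url = f"{BASE_URL}/deputados/{deputy_id}/discursos?{query}"
    speeches = []
    while url:
        payload = fetch(url)
        speeches.extend(payload.get("dados", []))
        # Assumed response shape: "links" holds an entry with rel == "next"
        # while more pages remain; the loop stops when it is absent.
        url = next(
            (link["href"] for link in payload.get("links", []) if link.get("rel") == "next"),
            None,
        )
    return speeches
```

A call such as `collect_speeches(deputy_id, "2022-01-01", "2022-08-31")` would then return a list of dictionaries, one per speech, with `deputy_id` being whichever identifier the API uses for deputies.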
Cleaning and preparing data for analysis
For the analysis of speeches, it is important to use techniques that aim to clean and prepare the data for later modeling. In the present text, the following actions were performed on the speeches of congressmen:
- Removal of accents: this technique removes accents from the texts so that the processing can be done without distinction between accented and unaccented letters.
- Removal of stopwords: stopwords are common words, such as “is”, “and”, “of”, “in”, that do not add significant information to the speech analysis. Therefore, they are removed to reduce the amount of data to be processed.
- Removal of punctuation: this technique removes all punctuation from the texts, including commas, periods, exclamation marks, among others, so that processing is done only with words.
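The three steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the repository code: the function names are hypothetical, the stopword list is a tiny made-up sample of Portuguese words (a real analysis would use a much longer list), and lowercasing is an extra step added here so that matching is case-insensitive.

```python
import string
import unicodedata

# Tiny illustrative stopword list (assumption); a real list would be far longer.
STOPWORDS = {"a", "o", "e", "de", "do", "da", "em", "que", "é", "para"}


def strip_accents(text):
    """Remove accents by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# Stopwords are normalized the same way as the text, so "é" is stored as "e".
NORMALIZED_STOPWORDS = {strip_accents(word) for word in STOPWORDS}

# Map every ASCII punctuation character to a space.
# Note: typographic characters such as curly quotes are not covered by this table.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_speech(text):
    """Lowercase, remove accents and punctuation, then drop stopwords."""
    text = strip_accents(text.lower())
    text = text.translate(PUNCT_TABLE)
    tokens = [token for token in text.split() if token not in NORMALIZED_STOPWORDS]
    return " ".join(tokens)


print(clean_speech("A mudança climática é urgente, senhor Presidente!"))
# prints: mudanca climatica urgente senhor presidente
```

Removing accents before comparing against the stopword list is a design choice: it means the list itself must be accent-free, which is why it is normalized once up front.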
The steps above produced a list of clean texts ready for analysis. From this, words related to the theme “climate change” were counted: a corpus of related terms, such as “climate”, “climate change”, “global warming”, and “climate crisis”, among others, was created and used to count occurrences in the speeches. This count indicates how much attention each deputy gives to the theme and helps to understand each one's position on the subject. The speeches collected cover the period from January 1988 to August 2022. It is important to note that, mainly in the early years, various pieces of information are missing from the database made available by the Brazilian National Congress.
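To make the counting step concrete, here is a hedged sketch that assumes the speeches are already cleaned (lowercased, without accents or punctuation). The term list is an assumed, accent-free Portuguese rendering of the examples above, not the corpus actually used, and the function names are hypothetical. Longer terms are tried first so that an expression is counted as a whole when a shorter term is its prefix.

```python
import re
from collections import Counter

# Assumed accent-free Portuguese equivalents of the example terms in the text.
CLIMATE_TERMS = ["clima", "mudanca climatica", "aquecimento global", "crise climatica"]

# Longest terms first, with word boundaries so "clima" does not match inside "climatica".
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(term) for term in sorted(CLIMATE_TERMS, key=len, reverse=True))
    + r")\b"
)


def count_climate_terms(clean_text):
    """Count occurrences of each climate-related term in one cleaned speech."""
    return Counter(_PATTERN.findall(clean_text))


def total_mentions_by_deputy(speeches):
    """Sum term occurrences per deputy from (deputy_name, clean_text) pairs."""
    totals = Counter()
    for deputy, text in speeches:
        totals[deputy] += sum(count_climate_terms(text).values())
    return totals
```

For example, `total_mentions_by_deputy([("Deputy A", "crise climatica exige acao"), ("Deputy A", "clima")])` gives a total of 2 for the placeholder name "Deputy A".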
Results
From the data, static and dynamic graphs were built for data presentation. These covered the following aspects regarding the “climate change” topic:
- Deputies who presented the highest number of speeches
- Parties that presented the highest number of speeches
- States that presented the highest number of speeches by their deputies
- Years that presented the highest number of speeches
- Birth years of congress members who presented the highest number of speeches
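Each of the rankings listed above comes down to counting speeches per group. The sketch below shows one way to do that with the standard library, assuming every speech has already been turned into a dictionary. The field names (`deputy`, `party`, `state`, `year`, `birth_year`, `mentions`) and the rule that a speech counts when it has at least one climate-related mention are assumptions made for illustration, not the repository's actual schema.

```python
from collections import Counter


def top_by(records, field, n=10):
    """Rank the values of `field` by number of speeches mentioning the theme.

    `records` is a list of dicts, one per speech; a speech is counted only
    when its "mentions" value is greater than zero (assumed criterion).
    """
    counts = Counter(record[field] for record in records if record["mentions"] > 0)
    return counts.most_common(n)
```

Calling `top_by(records, "party")`, `top_by(records, "state")`, or `top_by(records, "birth_year")` would then produce the data behind each chart, as a list of `(value, number_of_speeches)` pairs.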
Static and dynamic charts are presented below for most of the aspects above. The dynamic charts make it possible to observe how the number of speeches evolved over the years.
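One possible way to prepare data for a chart that evolves over time is to compute, for each year, the running total of speeches per group. The sketch below is an assumption about how such frames could be built, not a description of how the charts in this text were produced; `cumulative_frames` and the field names are hypothetical.

```python
from collections import Counter


def cumulative_frames(records, field):
    """Return {year: {group: cumulative speech count}} in chronological order.

    `records` is a list of dicts with a "year" key and the grouping `field`.
    """
    years = sorted({record["year"] for record in records})
    running = Counter()
    frames = {}
    for year in years:
        # Add this year's speeches to the running totals.
        running.update(record[field] for record in records if record["year"] == year)
        # Copy the totals so later years do not modify earlier frames.
        frames[year] = dict(running)
    return frames
```

Each value in the returned dictionary can be drawn as one frame of an animated bar chart with the plotting library of your choice.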
Additionally, it is important to note that the word count did not consider the context in which the words of the corpus were spoken. Caution is therefore needed in the analysis, since many of these words may have been used with a connotation that goes against confronting the climate crisis. Finally, some speeches may not have been cataloged in the National Congress database, so the total number of speeches may be inaccurate.
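Because the count ignores context, one simple complement is to print each matched term together with a few neighboring words and read a sample by hand. The helper below is a hypothetical sketch of that idea, assuming whitespace-separated, already cleaned text; it was not part of the analysis described here.

```python
def keyword_in_context(text, term, window=5):
    """Return each occurrence of `term` with up to `window` words on each side."""
    tokens = text.split()
    term_tokens = term.split()
    size = len(term_tokens)
    hits = []
    for i in range(len(tokens) - size + 1):
        if tokens[i:i + size] == term_tokens:
            left = tokens[max(0, i - window):i]
            right = tokens[i + size:i + size + window]
            # Mark the matched term with brackets so it stands out in the snippet.
            hits.append(" ".join(left + ["[" + term + "]"] + right))
    return hits


print(keyword_in_context("nao existe crise climatica alguma neste pais", "crise climatica", window=2))
# prints: ['nao existe [crise climatica] alguma neste']
```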
Graphs 1 and 2 below show the members of the Chamber of Deputies who presented the highest number of speeches in the years analyzed.
Graph 1: Members of the Chamber of Deputies who presented the highest number of speeches (static)
Source: own elaboration. Data from the Brazilian National Congress.
Graph 2: Members of the Chamber of Deputies who presented the highest number of speeches (dynamic)
Source: own elaboration. Data from the Brazilian National Congress.
Graphs 3 and 4 below show the political parties whose affiliated deputies presented the highest number of speeches, considering the years analyzed.
Graph 3: Political parties that presented the highest number of speeches (static)
Source: own elaboration. Data from the Brazilian National Congress.
Graph 4: Political parties that presented the highest number of speeches (dynamic)
Source: own elaboration. Data from the Brazilian National Congress.
Graphs 5 and 6 below show the States whose deputies presented the highest number of speeches, considering the years analyzed.
Graph 5: States that presented the highest number of speeches by their deputies (static)
Source: own elaboration. Data from the Brazilian National Congress.
Graph 6: States that presented the highest number of speeches by their deputies (dynamic)
Source: own elaboration. Data from the Brazilian National Congress.
Graph 7 below shows the years with the highest number of speeches in the period analyzed. Only the static chart is included.
Graph 7: Years that presented the highest number of speeches (static)
Source: own elaboration. Data from the Brazilian National Congress.
Graphs 8 and 9 below show the birth years of the members of the Chamber of Deputies who presented the highest number of speeches, considering the period analyzed.
Graph 8: Birth years of the members of the Chamber of Deputies who presented the highest number of speeches (static)
Source: own elaboration. Data from the Brazilian National Congress.
Graph 9: Birth years of the members of the Chamber of Deputies who presented the highest number of speeches (dynamic)
Source: own elaboration. Data from the Brazilian National Congress.
Source code in Python
To obtain the code and reproduce the steps explained here, from downloading the speeches to building the graphs, see the GitHub repository cpscesar/deputyspeech (github.com).
Finally, it is worth noting that the code provided can easily be adapted to other word corpora.
If you have any questions/suggestions about the content of the text and code, leave a comment below.
[1] I use “discourse analysis” here in its broader sense, not as the name of a specific qualitative data analysis technique.
[2] This text will not address the implementation of natural language processing models. This subject will be addressed in later texts.
[3] This research is restricted to Brazil. However, the techniques presented here can be applied to other speeches and also to different themes from those presented here.