Last updated: 2023-12-20
Checks: 7 passed, 0 failed
Knit directory: workflowr-policy-landscape/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20220505) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version a30bb03. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .RData
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: data/.DS_Store
Ignored: data/original_dataset_reproducibility_check/.DS_Store
Ignored: output/.DS_Store
Ignored: output/Figure_3B/.DS_Store
Ignored: output/created_datasets/.DS_Store
Untracked files:
Untracked: gutenbergr_0.2.3.tar.gz
Unstaged changes:
Modified: Policy_landscape_workflowr.R
Modified: data/original_dataset_reproducibility_check/original_cleaned_data.csv
Modified: data/original_dataset_reproducibility_check/original_dataset_words_stm_5topics.csv
Modified: output/Figure_3A/Figure_3A.png
Modified: output/created_datasets/cleaned_data.csv
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/1a_Data_preprocessing.Rmd) and HTML (docs/1a_Data_preprocessing.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
html | 5c836ab | zuzannazagrodzka | 2023-12-07 | Build site. |
html | c494066 | zuzannazagrodzka | 2023-12-02 | Build site. |
Rmd | 627fdff | zuzannazagrodzka | 2023-12-01 | correction number documents |
Rmd | 926693b | zuzannazagrodzka | 2023-12-01 | correction |
html | 8b3a598 | zuzannazagrodzka | 2023-11-10 | Build site. |
html | 729fc52 | zuzannazagrodzka | 2023-11-10 | Build site. |
html | a66c8a9 | zuzannazagrodzka | 2023-11-09 | Build site. |
Rmd | e8c5afe | zuzannazagrodzka | 2023-11-09 | wflow_publish(c("./analysis/ListMissionVision.Rmd", "./analysis/1b_Dictionaries_preparation.Rmd", |
Rmd | 41dd1ca | Thomas Frederick Johnson | 2022-11-25 | Revisions to the text, and pushing the write thing this time… |
html | 5bdfc2a | Andrew Beckerman | 2022-11-24 | Build site. |
html | 34ddc80 | Andrew Beckerman | 2022-11-24 | Build site. |
html | 93838e7 | Andrew Beckerman | 2022-11-24 | fixing paths in index |
html | a3f02dc | Andrew Beckerman | 2022-11-24 | fixing paths in index |
html | 693000e | Andrew Beckerman | 2022-11-24 | Build site. |
html | 60a6c61 | Andrew Beckerman | 2022-11-24 | Build site. |
html | fb90a00 | Andrew Beckerman | 2022-11-24 | Build site. |
Rmd | e08d7ac | Andrew Beckerman | 2022-11-24 | more organising and editing of workflowR mappings |
Rmd | 31239cd | Andrew Beckerman | 2022-11-24 | more organising and editing of workflowR mappings |
Rmd | c95aa82 | Andrew Beckerman | 2022-11-10 | updating pre-processing mission html for workflowr |
html | c95aa82 | Andrew Beckerman | 2022-11-10 | updating pre-processing mission html for workflowr |
html | 0a21152 | zuzannazagrodzka | 2022-09-21 | Build site. |
html | 796aa8e | zuzannazagrodzka | 2022-09-21 | Build site. |
html | 91d5fb6 | zuzannazagrodzka | 2022-09-20 | Build site. |
Rmd | e8852f1 | zuzannazagrodzka | 2022-09-20 | wflow_publish(c("analysis/1a_Data_preprocessing.Rmd", "analysis/1b_Dictionaries_preparation.Rmd")) |
We collected 129 mission and aim statements from six stakeholder groups involved in the ecology and evolutionary biology research landscape.
We used the Scimago Journal & Country Rank website (https://www.scimagojr.com/) to search for the journals with the highest impact value in 2020 (subject areas: Environmental Science; Agricultural and Biological Sciences; Biochemistry, Genetics and Molecular Biology), all of which publish ecology and evolutionary biology research. In total, we identified 14 open access (OA) journals and 16 non-open access (non-OA) journals. We also included some journals that we were already aware of but that were not on the list. This collection of journals included both learned-society and non-society journals.
We identified publishers as the owners or production units of these journals.
To find funders, we searched the “Acknowledgments” sections of scientific articles published in 2019 and 2020 in high-impact-factor journals (OA and non-OA). We focused on finding funders from all continents, with a limit of three national funders per country. We also contacted colleagues at colleges and universities outside of the UK for information on the funding sources in their countries.
We looked at the data availability statements of articles published in 2019 and 2020 in high-impact-factor journals (OA and non-OA) and collected information on where the data and code were archived. Our list includes both generalist and subject-specific repositories.
We identified societies based on the journals they own and from prior experience.
Advocates are a group of organisations that actively support or promote good-quality and accessible research (open research). We considered different aspects of open research (open access, open data, open methods) when looking for these advocacy organisations. Most advocates do not exclusively support research in ecology and evolutionary biology.
In August 2021 we collected the aims and mission statements from the official website of each stakeholder. We did not contact anyone associated with the stakeholders to request more information. If there was no separate section for the aim or mission statement, but text resembling these statements was contained within an “About” section, this was deemed acceptable. The text from these websites was manually copied and saved separately for each stakeholder (see the list of organisations). The first line in each document is the source website.
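For illustration, a saved statement file (hypothetical contents, not a real file from the dataset) follows the NameOfStakeholder_DocumentType naming convention described below, with the source website on the first line and the copied text after it:
# Hypothetical contents of Dryad_About.txt (illustrative only)
https://www.example-stakeholder.org/about
We aim to ... (the copied aim/mission statement text)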
To analyse the content of the statements, we first preprocessed the documents following the cleaning process suggested in Maier et al. 2018 “Applying LDA Topic Modelling in Communication Research: Toward a Valid and Reliable Methodology”:
Importing all documents and converting them into a table with the following columns:
- name: name of the stakeholder
- filename: name of the file (NameOfStakeholder_DocumentType)
- stakeholder: stakeholder group (here: advocates, funders, journals, for-profit publishers, not-for-profit publishers, repositories, societies)
- txt: text of the statement
- doc_type: type of the document (Mission Statement or About)
Removing link formatting from the text (http:// and https:// links)
Separating text into sentences and keeping information on what document and stakeholder they belong to.
Tokenisation - creating a tidy text, converting tokens to lowercase, removing punctuation, deleting special characters
Removing stop words; for this we used the SMART and snowball lexicons from the stop_words dataset (library tidytext), and we also removed other uninformative words such as numbering (ii, iii, iv, v), document-type names (aim, aims, mission…), and stakeholder names (erc, nerc, wellcome)
Lemmatization (library lexicon) - converting words to their lemma form/lexeme (e.g., “contaminating” and “contamination” become “contaminate”) (Manning & Schütze, 2003, p. 132).
Because we worked with a relatively small number of documents, we did not perform relative pruning (stripping very rare and extremely frequent word occurrences from the observed data); had we pruned, it would have looked something like the sketch below.
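A minimal quanteda sketch of relative pruning (illustrative only; this step was not run in our pipeline, and the thresholds are hypothetical):
# Illustrative only - relative pruning with quanteda (loaded later in this page)
toy_dfm <- dfm(tokens(c("open research data", "open data policy", "archive data")))
dfm_trim(toy_dfm,
         min_docfreq = 0.05,   # drop words occurring in fewer than 5% of documents
         max_docfreq = 0.95,   # drop words occurring in more than 95% of documents
         docfreq_type = "prop")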
Cleaning environment and loading R packages
rm(list=ls())
library(tidyverse)
library(purrr)
library(tidyr)
library(stringr)
library(tidytext)
# Additional libraries
library(quanteda)
Warning in .recacheSubclasses(def@className, def, env): undefined subclass
"pcorMatrix" of class "replValueSp"; definition not updated
Warning in .recacheSubclasses(def@className, def, env): undefined subclass
"pcorMatrix" of class "xMatrix"; definition not updated
Warning in .recacheSubclasses(def@className, def, env): undefined subclass
"pcorMatrix" of class "mMatrix"; definition not updated
library(quanteda.textplots)
library(quanteda.dictionaries)
library(tm)
library(topicmodels)
library(ggplot2)
library(dplyr)
library(wordcloud)
library(reshape2)
library(igraph)
library(ggraph)
library(stm)
library("kableExtra") # to create a table when converting to html
Importing stakeholder statements (.txt format), compiling them into a list, and converting this list into a corpus
dirs <- list.dirs(path = "./data/mission_statements", recursive = FALSE)
getwd()
[1] "/Users/zuzannazagrodzka/Library/CloudStorage/GoogleDrive-z.zagrodzka@sheffield.ac.uk/My Drive/PhD_folder_laptop/11_2023_Accelerating_OR_agenda/workflowr-policy-landscape"
# List of files
files <- list()
for (i in 1:length(dirs)){
  files[[i]] <- list.files(path = dirs[i],
                           pattern = ".txt",
                           full.names = TRUE,
                           recursive = FALSE)
}
# files
use_files <- unlist(files)
# use_files
# using purrr to generate a data frame of the corpus texts
corpus_df <- map_df(use_files,
~ data_frame(txt = read_file(.x)) %>%
mutate(filename = basename(.x)))
Warning: `data_frame()` was deprecated in tibble 1.1.0.
ℹ Please use `tibble()` instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
generated.
corpus_df$txt <- iconv(corpus_df$txt, from = "ISO-8859-1", to = "UTF-8")
# removing encoded junk from the text column
corpus_df$txt <- gsub("[^[:print:]]", " ", corpus_df$txt)
Adding metadata to the corpus to clarify which stakeholder and stakeholder group each statement belongs to
# create new columns: name, stakeholder
corpus_df$name <- corpus_df$filename
corpus_df <- corpus_df %>% separate(name, c("name","doc_type"), sep = "_")
corpus_df <- corpus_df %>% mutate_at("doc_type", str_replace, ".txt", "")
# creating a column: stakeholder
corpus_df$stakeholder <- corpus_df$name
# filling stakeholder column with the stakeholders' names
# Funders
corpus_df$stakeholder[corpus_df$stakeholder%in% c("CNPq", "Alexander von Humboldt Foundation", "Australian Research Council", "Chinese Academy of Sciences", "Conacyt", "CONICYT", "Consortium of African Funds for the Environment", "Coordenacao de Aperfeicoamento de Pessoal de Nivel Superior", "CSIR South Africa", "Deutsche Forschungsgemeinschaft", "ERC", "FORMAS", "French National Centre for Scientific Research", "Helmholtz-Gemeinschaft", "JST", "Max Planck Society", "MOE China", "National Natural Science Foundation", "National Research Council Italy", "National Science Foundation", "NERC", "NRC Egypt", "NRF South Africa", "NSERC", "RSPB", "Russian Academy of Science", "Sea World Research and Rescue Foundation", "Spanish National Research Council", "The Daimler and Benz Foundation", "The French National Research Agency", "Wellcome")] <- "funders"
# Journals OA
corpus_df$stakeholder[corpus_df$stakeholder%in% c("Arctic, Antarctic, and Alpine Research", "Biogeosciences","Conservation Letters", "Diversity and Distributions", "Ecology and Evolution", "Ecology and Society", "eLifeJournal", "Evolution Letters", "Evolutionary Applications", "Frontiers in Ecology and Evolution", "Neobiota", "PeerJJournal", "Plos Biology", "Remote Sensing in Ecology and Conservation")] <- "journals_OA"
# Journals nonOA (including transitioning, hybrid and closed - last time checked August 2021)
corpus_df$stakeholder[corpus_df$stakeholder%in% c("BioSciences", "American Naturalist", "Annual Review of Ecology Evolution and Systematics", "Biological Conservation", "Conservation Biology", "Ecological Applications", "Ecology Letters", "Ecology", "Evolution", "Frontiers in Ecology and the Environment", "Global Change Biology", "Journal of Applied Ecology", "Nature Ecology and Evolution", "Philosophical Transactions of the Royal Society B", "Proceedings of the Royal Society B Biological Sciences", "Trends in Ecology & Evolution")] <- "journals_nonOA"
# Societies
corpus_df$stakeholder[corpus_df$stakeholder%in% c("BES", "ESEB", "RS", "SORTEE", "The Society for Conservation Biology", "The Zoological Society of London", "Society for the Study of Evolution", "Max Planck Society", "American Society of Naturalists", "British Ecological Society", "Ecological Society of America", "European Society for Evolutionary Biology", "National Academy of Sciences", "Australasian Evolution Society", "Ecological Society of Australia", "Royal Society Te Aparangi", "The Royal Society")] <- "societies"
# Repositories
corpus_df$stakeholder[corpus_df$stakeholder%in% c("Australian Antarctic Data Centre", "BCO-DMO", "DNA Databank of Japan", "Dryad", "European Bioinformatics Institute", "Figshare", "GBIF", "Harvard Dataverse", "KNB", "Marine Data Archive", "NCBI", "TERN", "World Data Center for Climate", "Zenodo", "EcoEvoRxiv", "bioRxiv", "OSF")] <- "repositories"
# Publishers: not-for-profit and for-profit
corpus_df$stakeholder[corpus_df$stakeholder%in% c("The University of Chicago Press", "Annual Reviews", "BioOne", "eLife", "Frontiers", "PLOS", "Resilience Alliance", "The Royal Society Publishing", "AIBS")] <- "publishers_nonProfit"
corpus_df$stakeholder[corpus_df$stakeholder%in% c("Cell Press", "Elsevier", "Springer Nature", "PeerJ", "Pensoft", "Wiley")] <- "publishers_Profit"
# Advocates - stakeholders promoting good research practices and Open Research agenda
corpus_df$stakeholder[corpus_df$stakeholder%in% c("Center for Open Science", "coalitionS", "CoData", "DataCite", "DOAJ", "Gitlab", "Peer Community In", "RDA", "Research Data Canada", "Africa Open Science and Hardware", "Amelica", "Bioline International", "Coko", "COPDESS", "FAIRsharing" , "FORCE11", "FOSTER" , "Free our knowledge", "Jisc", "Open Access Australasia", "Reference Center for Environmental Information", "Research4life" , "ROpenSci" , "SPARC" )] <- "advocates"
Creating corpus_df_website_info, which will be used later to get a list of the source websites
corpus_df_website_info <- corpus_df
Cleaning and lemmatising the text, and removing all stakeholder names
# Cleaning the text from http:// and https:// links, removing numbers and "'s"
# remove http:// and https:// and www.
corpus_df$txt <- gsub("(s?)(f|ht)tp(s?)://\\S+\\b", " ", corpus_df$txt, useBytes = TRUE)
corpus_df$txt <- gsub("www.\\S+\\s*", "", corpus_df$txt, useBytes = TRUE)
# removing full names and phrases before tokenisation:
# change oa to open access and or to open research, for-profit and for profit to forprofit, no-profit
corpus_df$txt <- gsub(" F.A.I.R. ", " FAIR ", corpus_df$txt, useBytes = TRUE)
corpus_df$txt <- gsub(" OA ", " open access ", corpus_df$txt, useBytes = TRUE)
corpus_df$txt <- gsub(" OR ", " open research ", corpus_df$txt, useBytes = TRUE)
corpus_df$txt <- gsub(" OS ", " open science ", corpus_df$txt, useBytes = TRUE)
corpus_df$txt <- gsub(" OA ", " open access ", corpus_df$txt, useBytes = TRUE)
corpus_df$txt <- gsub("no-profit|not-for-profit|not for-profit|no profit", "nonprofit", corpus_df$txt,useBytes = TRUE)
corpus_df$txt <- gsub("for-profit|for profit", "forprofit", corpus_df$txt,useBytes = TRUE)
corpus_df$txt <- gsub("DOIs|dois|DOI", "doi", corpus_df$txt, useBytes = TRUE)
# removing email addresses @
corpus_df$txt <- gsub("\\S*@\\S*","",corpus_df$txt, useBytes = TRUE)
# removing names mentioned in the documents:
corpus_df$txt <- gsub("Marc Schiltz the President of Science Europe|Dr. Francesca Dominici|Kaiser Wilhelm|Harold Varmus|Patrick Brown|Michael Eisen|Adolph von Harnack|Harnack|Otto Hahn Medal|Albert Einstein|Robert-Jan Smits|Carl Folke|Lance Gunderson|Abraham Lincoln|Sewall Wright|Ruth Patric|Douglas Futuyama|Louis Agassiz at Harvard's Museum of Comparative Zoology|Charles Darwin|Isaac Newton|Rosalind Franklin|Theodosius Dobzhansky","",corpus_df$txt, useBytes = TRUE)
# removing all names (part 1)
corpus_df$txt <- gsub("General Conference of the United Nations Educational, Scientific and Cultural Organization|International Association of Scientific, Technical & Medical Publishers|Coordination for the Improvement of Higher Education Personnel (CAPES)|Jasper Loftus-Hills Young Investigator Award|Edward O. Wilson Naturalist Award|International Network for the Availability of Scientific Publications|United Nations Educational, Scientific and Cultural Organization|Office of Polar Programs at the U.S. National Science Foundation|National Commission for Scientific and Technological Research|Coalition for Publishing Data in the Earth and Space Sciences|Natural Sciences and Engineering Research Council of Canada|Coordenacao de Aperfeicoamento de Pessoal de Nivel Superior|Catalogue of Australian Antarctic and Subantarctic Metadata|Open Reliable Transparent Ecology and Evolutionary biology|International Nucleotide Sequence Database Collaboration|United States Government's National Science Foundation|Proceedings of the Royal Society B Biological Sciences|National Charter of Ethics for the Research Profession|Consortium of African Funds for the Environment (CAFE)|Committee on Data of the International Science Council|South African National Biodiversity Institute (SANBI)|Scholarly Publishing and Academic Resources Coalition|Malawi Environmental Endowment Trust (MEET) in Malawi|National Council of Science and Technology (Conacyt)|Annual Review of Ecology Evolution and Systematics|the University of Chicago Press Journals Division|Philosophical Transactions of the Royal Society B|International Max Planck Research Schools (IMPRS)|the National Health and Medical Research Council|Australian Government’s Department of Innovation|Consortium of African Funds for the Environment|the National Competitive Grants Program (NCGP)|European Society of Evolutionary Biology|Research for Development and Innovation (ARDI)|National Institute of Standards and Technology|International Congress of Conservation Biology|French National Centre for Scientific Research|University of Chicago Press Journals Division|Study of Environmental Arctic Change (SEARCH)|South African National Biodiversity Institute|Reference Center on Environmental Information|Biological and Chemical Oceanography Sections|Open Access Envoy of the European Commission|National Natural Science Foundation of China|National Institutes of Health|Big Hairy Audacious Goal|Deutsche Zentren für Gesundheitsforschung|University of Colorado Boulder|Study of Environmental Arctic Change (SEARCH)|John Maynard Smith|Darwin Core|PeerJ – the Journal of Life & Environmental Sciences (PeerJ)|PeerJ Computer Science|PeerJ Physical Chemistry|PeerJ Organic Chemistry|PeerJ Inorganic Chemistry|PeerJ Analytical Chemistry and PeerJ Materials Science", "", corpus_df$txt, useBytes = TRUE)
# removing all names (part 2)
corpus_df$txt <- gsub("African Institute of Open Science & Hardware|Electronic Publishing Trust for Development|Remote Sensing in Ecology and Conservation|National Competitive Grants Program (NCGP)|Journal of Biogeography and Global Ecology|Excellence in Research for Australia (ERA)|Excellence in Research for Australia (ERA)|Intergovernmental Panel on Climate Change|Gottlieb Daimler and Karl Benz Foundation|Carl Benz House|European Society for Evolutionary Biology|Sea World Research and Rescue Foundation|Science for Nature and People Parnership|Global Biodiversity Information Facility|Frontiers in Ecology and the Environment|EMBL's European Bioinformatics Institute|Artificial Intelligence Review Assistant|Institute of Arctic and Alpine Research|State of Florida and Palm Beach County|Peer Community in Evolutionary Biology|European Group on Biological Invasions|Arctic, Antarctic, and Alpine Research|Weizmann Institute in Rehovot, Israel|UNESCO Universal Copyright Convention|UNESCO Recommendation on Open Science|International Panel on Climate Change|European Molecular Biology Laboratory|European Molecular Biology Laboratory|University of Toronto at Scarborough|Natural Environment Research Council|Knut and Alice Wallenberg Foundation|Global Open Science Hardware Roadmap|State of Alaska's Salmon and People|Research for Global Justice (GOALI)|National Natural Science Foundation|Knowledge Network for Biocomplexity|Society for the Study of Evolution|Research in the Environment (OARE)|Frontiers in Ecology and Evolution|Data Observation Network for Earth|Collaborative Peer Review Platform|the American Journal of Sociology|Spanish National Research Council|Research Ideas and Outcomes (RIO)|Research Ideas and Outcomes (RIO)|European Bioinformatics Institute|Directory of Open Access Journals|Cambridge Conservation Initiative|Alexander von Humboldt Foundation|the Zoological Society of London|Society for Conservation Biology|Open Educational Resources (OER)|Field Chief Editor Mark A. Elgar|Biogeosciences Discussions (BGD)|Australian Antarctic Data Centre|University of Toronto Libraries|The University of Chicago Press|Research in Agriculture (AGORA)|NIH Intramural Research Program|National Research Council|National Academy of Engineering|Millennium Ecosystem Assessment|Journal of Evolutionary Biology|Howard Hughes Medical Institute|German Climate Computing Centre|French National Research Agency|European Research Council (ERC)|eLife Sciences Publications Ltd|Ecological Society of Australia|Deutsche Forschungsgemeinschaft|American Society of Naturalists|Japan's Science and Technology|Australian Government Minister|Australasian Evolution Society|African Journals OnLine (AJOL)|Africa Open Science & Hardware|World Data Center for Climate|Trends in Ecology & Evolution|National Institutes of Health|Kurchatov Institute in Russia|International Science Council|Elsevier’s Clinical Solutions|Ecological Society of America|Department of Social Sciences|Cornell and Yale Universities|Cold Spring Harbor Laboratory|American Journal of Sociology|Research for Health (Hinari)|Philosophical Transactions B|Nature Ecology and Evolution|National Research Foundation|National Library of Medicine|National Academy of Sciences|National Academy of Medicine|Journal of Political Economy|Journal of Political Economy|Helmholtz-Alberta Initiative|Harvard Dataverse Repository|European Research Area (ERA)|ISI ScienceWatch|Royal Charter|Springer Nature|The Nature Portfolio|Scientific American", "", corpus_df$txt, useBytes = TRUE)
# removing all names (part 3)
corpus_df$txt <- gsub("University of Chicago Press|Tropical Database in Brazil|Research Ideas and Outcomes|National Science Foundation|Ministry of Education (MEC)|Federal Republic of Germany|Diversity and Distributions|Daimler and Benz Foundation|Chinese Academy of Sciences|Chinese Academy of Sciences|Australian Research Council|Australia’s Chief Scientist|Russian Academy of Science|Nature Ecology & Evolution|National Research Strategy|Max Planck Innovation GmbH|Journal of Applied Ecology|Further Max Planck Centers|British Ecological Society|WHO, FAO, UNEP, WIPO, ILO|Royal Society Te Aparangi|Peer Community in Ecology|National Research Council|Evolutionary Applications|European Research Council|Environmental Funds|EFs|Biodiversity Data Journal|Biodiversity Data Journal|Royal Society Publishing|Dryad Digital Repository|Digital Editorial Office|Data Distribution Centre|Comparative Cytogenetics|Comparative Cytogenetics|American Biology Teacher|University of Melbourne|Public Research Centers|International Data Week|Ecological Applications|Ecological Applications|Center for Open Science|Biological Conservation|African Journals OnLine|African Journals OnLine|Wellcome Genome Campus|Research Data Alliance|Kaiser Wilhelm Society|Helmholtz-Gemeinschaft|Deutscher Wetterdienst|BirdLife international|Swedish Energy Agency|Social Service Review|Senator Claude Pepper|Ministry of Education|Institute of Medicine|Helmholtz Association|Helmholtz Association|Global Change Biology|Ecology and Evolution|DNA Databank of Japan|Congress of the Union|Bioline International|Bioline|Australian Government|ARC Discovery Program|Research Data Canada|Conservation Letters|Conservation Biology|Brazilian Federation|Big Garden Birdwatch|Albatross Task Force|Resilience Alliance|Nature Conservation|Nature Conservation|Marine Data Archive|European Commission|European Commission|Environmental Funds|Environmental Funds|Ecology and Society|Clarivate Analytics|American Naturalist|Russian Federation|Publication Ethics|Max Planck Society|Max Planck Society|Give Nature a Home|Free Our Knowledge|Fraunhofer Society|Peer Community In|Harvard Dataverse|Evolution Letters|Ecology & Society|CSIR South Africa|Bertha Benz Prize|United Utilities|Carl Benz House|NRF South Africa|Nature Portfolio|Helmholtz Senate|Ecology Letters|Daimler-Benz AG|CSIRO Australia|Colorado alpine|BioOne Complete|BioOne|HAMAGUCHI Plan|Gray's Anatomy|Biogeosciences|Annual Reviews|ZSL Whipsnade|ScienceDirect|ScienceDirect|Royal Society|Research4Life|PCI Evol Biol|Mexican State|GCB Bioenergy|Cell Symposia|Bose-Einstein|Plos Biology|Humboldtians|Humboldt|Horizon 2020|Google Drive|Future Earth|Biogeography|WDC-Climate|the Academy|Kichstarter|Humboldtian|FOSTER Plus|FAIRsharing|ELIXIR Node|cOAlition S|ZSL London|SciDataCon|Max Planck|Figure 360|EcoEvoRxiv|Daimler AG|CU-Boulder|Cell Press|Africa OSH|Sea World|PhytoKeys|NRC Egypt|MOE China|Frontiers|Evolution|Elseviere|CiteScore|Wellcome|rOpenSci|PCI Ecol|OpenAIRE|CU-Boulder |Neobiota|NeoBiota|MycoKeys|HUPO PSI|Figshare|EMBL-EBI|Elsevier|DataCite|ZooKeys|RESTful|Redalyc|Pensoft|FORCE11|Figshare|figshare|Ecology|Dropbox|DataONE|Conacyt|COMBINE|bioRxiv|AmeliCA|Zenodo|Plan S|Lancet|Gitlab|GitLab|Git|FORMAS|CoData|CODATA|Wiley|PeerJ|Inter|eLife|Dryad|Coko|CNPq|Cell |Hinari|Pronaces|Cnr|Vinnova|Minerva|uGREAT|Benz|GitHub|protocols.io|Andrea Stephens|Mtauranga|Metacat|ELIXIR|VSNU and the UKB|Springer|Nikau Consultancy|Aspiration", "", corpus_df$txt, useBytes = TRUE)
# removing all names (part 4)
corpus_df$txt <- gsub("Washington Watch|BioScience|Eye on Education|AIBS Bulletin|Dr. Francesca Dominici|PeerJ – the Journal of Life & Environmental Sciences (PeerJ)|PeerJ Computer Science|PeerJ Physical Chemistry|PeerJ Organic Chemistry|PeerJ Inorganic Chemistry|PeerJ Analytical Chemistry and PeerJ Materials Science", "", corpus_df$txt, useBytes = TRUE)
# removing words related to the locations and names
corpus_df$txt <- gsub("Global South|Global North|New Zealanders|New Zelanders|New Zeland|New Zealand|Great Britain|North America|Eastern Europe|South America|South africans|South africa|Eastern Europe|ARPHA Platform|Woods Hole Oceanographic Institution|US JGOFS|US GLOBEC|NSF Geosciences Directorate (GEO) Division of Ocean Sciences (OCE) Biological and Chemical Oceanography Sections, Division of Polar Programs (PLR) Antarctic Sciences (ANT) Organisms & Ecosystems, and Arctic Sciences (ARC) awards|(DACST)|(CSD)|(FRD)|GBIF.org","",corpus_df$txt, useBytes = TRUE)
# removing abbreviations and other missed words
corpus_df$txt <- gsub("(CREDIT)|BCO-DMO|CONICYT|NEOBIOTA|INSTAAR|COPDESS|CLOCKSS|CoESRA|CAASM|AADC|CONZUL|EMPSEB|SHaRED|SORTEE|SEARCH|SANBI|SPARC|INSTAAR|UNESCO|APEC|AOASG|ARPHA|NCEAS|ICPSR|IMPRS|CMIP5|JDAP|CERN|MBMG|INASP|NSERC|GOALI|AIRA|AJOL|APIs|EMBL|AIBS|CAUL|CRIA|DOAJ|ICBB|ESEB|GBIF|K-12|NCBI|NCGP|NERC|IPCC|CNRS|CSIC|CSIR|BEIS|OARE|HSRC|PLOS|AAAR|USGS|NCAR|NOAA|NEON|ARDI|RSPB|DDBJ|INSDC|INSD|STAR|TERN|TREE|UTSC|UKRI|ARC|BES|SSE|COS|CAS|CTFs|DDI|EPT|ERC|ERA|JST|KNB|NRF|DFG|MDA|NIH|NLM|NRC|NRF|OSF|SCB|OSH|OAI|OCE|PCB|PCI|RDA|GCB|RDC|NSF|BGD|BMC|BHAG|ESA|ZSL|SPP|RCC|RMB|TRL|API|ARC|PLR|DDC|DKRZ|DWD|DVCS|NAE|NAM|EBI|ANR|API|NAS|ASN|NSF|OCE|ANT|UIs|API|EiC|TEE|UCL|SDGs|PIA|CL|RA|RS|STI|SNI|BG|U.K.|U.S.|EC|SC|CU|R&D|Eos|EIDs","",corpus_df$txt, useBytes = TRUE)
# removing numbers
corpus_df$txt <- gsub("[0-9]+","",corpus_df$txt, useBytes = TRUE)
# removing "'s"
corpus_df$txt <- gsub("'s","",corpus_df$txt, useBytes = TRUE)
# Replace [^a-zA-Z0-9 -] (any character that is not a letter, digit, space or hyphen) with an empty string.
corpus_df$txt <- gsub("[^a-zA-Z0-9 -]", "",corpus_df$txt, useBytes = TRUE)
Tokenising each statement’s sentences, then identifying and removing stop words
# Tokenisation - creating a tidy text: it converts tokens to lowercase and removes punctuation
# Starting with tokenizing text into sentences:
corpus_df$txt_copy <- corpus_df$txt
# library(stringi)
# corpus_df$txt_copy <- stri_enc_toutf8(corpus_df$txt)
data_tidy_sentences <- corpus_df %>%
unnest_tokens(sentence, txt_copy, token = "sentences")
data_tidy_sentences <- data_tidy_sentences %>% group_by(name) %>% mutate(sentence_id = row_number())
data_tidy_sentences$sentence_doc <- paste0(data_tidy_sentences$name, "_", data_tidy_sentences$sentence_id)
colnames(data_tidy_sentences)
[1] "txt" "filename" "name" "doc_type" "stakeholder"
[6] "sentence" "sentence_id" "sentence_doc"
data_tidy_sentences <- as.data.frame(data_tidy_sentences)
data_tidy <- data_tidy_sentences %>%
# mutate(as.character(sentence)) %>%
unnest_tokens(word, sentence, token = "words" ) %>%
select(-sentence_id)
# Removal of stop words: check the lexicons in stop_words and create a custom list of stop words, e.g. numbering (ii, iii, iv, v), document-type names (aim, aims, mission...), stakeholder names (erc, nerc, wellcome)
# The onix lexicon contains words like "open", "opened" and so on, so I decided to exclude it from the analysis
my_stop_words <- stop_words %>%
filter(!grepl("onix", lexicon))
# removing other words (names of stakeholders, types of documents, months, abbreviations and other uninformative words)
my_stop_words <- bind_rows(data_frame(word = c("e.g", "i.e", "ii", "iii", "iv", "v", "vi", "vii", "ix", "x", "", "missions", "mission", "aims", "aimed", "aim", "values", "value", "vision", "about", "publisher", "funder", "society", "journal", "repository", "deutsche", "january", "febuary", "march", "april", "may", "june", "july", "august", "september", "october", "november", "december", "jan", "feb", "mar", "apr", "jun", "jul", "aug", "sep", "sept", "oct", "nov", "dec", "australasian", "australians", "australian", "australia", "latin", "america", "cameroon", "yaoundé", "berlin", "baden", "london", "whipsnade", "san", "francisco", "britain", "european", "europe", "malawi", "sweden", "florida", "shanghai", "argentina", "india", "florida", "luxembourg", "italy", "canadians", "canadian", "canada", "spanish", "spain", "france", "french", "antarctica", "antarctic", "paris", "cambridge", "harvard", "russian", "russia", "chicago", "colorado", "africans", "african", "africa", "japan", "japanese", "brazil", "zelanders", "zeland", "mori", "aotearoa", "american", "america", "australasia", "hamburg", "netherlands", "berlin", "china", "chinese", "brazil", "mexico", "germany", "german", "ladenburg", "baden", "potsdam", "platz", "oxford", "berlin", "asia", "budapest", "taiwan", "chile", "putonghua", "hong", "kong","helmholtz", "bremen", "copenhagen", "stuttgart", "hinxton", "mātauranga", "māori", "yaound", "egypt", "uk", "usa", "eu", "st", "miraikan", "makao", "billion", "billions", "eight", "eighteen", "eighty", "eleven", "fifteen", "fifty", "five", "forty", "four", "fourteen", "hundreds", "million", "millions", "nine", "nineteen", "ninety", "one", "ones", "seven", "seventeen", "seventy", "six", "sixteen", "sixty", "ten", "tens", "thirteen", "thirty", "thousand", "thousands", "three", "twelve", "twenty", "two", "iccb", "ca"), lexicon = c("custom")), my_stop_words)
data_tidy <- data_tidy %>%
anti_join(my_stop_words)
Joining with `by = join_by(word)`
# lemmatizing using lemma table
token_words <- tokens(data_tidy$word, remove_punct = TRUE)
tw_out <- tokens_replace(token_words,
pattern = lexicon::hash_lemmas$token,
replacement = lexicon::hash_lemmas$lemma)
tw_out_df<- as.data.frame(unlist(tw_out))
data_tidy <- cbind(data_tidy, tw_out_df$"unlist(tw_out)")
colnames(data_tidy)[which(names(data_tidy) == "word")] <- "orig_word"
colnames(data_tidy)[which(names(data_tidy) == "tw_out_df$\"unlist(tw_out)\"")] <- "word_mix"
# changing American English to British English
ukus_out <- tokens(data_tidy$word_mix, remove_punct = TRUE)
ukus_out <- quanteda::tokens_lookup(ukus_out, data_dictionary_us2uk, exclusive = FALSE, capkeys = FALSE)
ukus_df <- as.data.frame(unlist(ukus_out))
data_tidy <- cbind(data_tidy, ukus_df$"unlist(ukus_out)")
colnames(data_tidy)[which(names(data_tidy) == "ukus_df$\"unlist(ukus_out)\"")] <- "word"
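As a quick illustration of this lookup (a toy example, not part of the pipeline), data_dictionary_us2uk replaces American spellings with their British equivalents:
# Toy example of the same lookup; "color"/"behavior" should map to "colour"/"behaviour"
toy <- tokens("color and behavior")
quanteda::tokens_lookup(toy, data_dictionary_us2uk, exclusive = FALSE, capkeys = FALSE)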
Creating a column with subgroup information: OA vs non-OA journals, and for-profit vs not-for-profit publishers
data_words <- data_tidy
# Creating a column that will include info about OA and nonOA journals or publisher for profit and non-profit
data_words$org_subgroups <- data_words$stakeholder
data_words$stakeholder[data_words$stakeholder%in% c("journals_OA", "journals_nonOA" )] <- "journals"
data_words$stakeholder[data_words$stakeholder%in% c("publishers_Profit", "publishers_nonProfit" )] <- "publishers"
Information and tables with the number of documents per stakeholder group and the source websites of the statements
# Number of documents per stakeholder
number_of_documents <- data_tidy %>%
select(name, stakeholder) %>%
distinct(name, .keep_all = TRUE) %>%
group_by(stakeholder) %>%
count(stakeholder)
# Table with a number of documents per stakeholder group
number_of_documents %>%
kbl(caption = "Number of documents per stakeholder group") %>%
kable_classic("hover", full_width = F)
stakeholder | n |
---|---|
advocates | 24 |
funders | 30 |
journals_OA | 14 |
journals_nonOA | 16 |
publishers_Profit | 6 |
publishers_nonProfit | 9 |
repositories | 17 |
societies | 13 |
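As a quick sanity check (a small addition, not in the original script), the per-group counts above sum to the 129 statements reported at the start of this page:
# The group counts should total the 129 collected statements
sum(number_of_documents$n)  # 24+30+14+16+6+9+17+13 = 129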
# Creating a table with the source links of the statements
info <- corpus_df_website_info %>%
select(txt, filename, name, stakeholder)
info$stakeholder_more <- info$stakeholder
info$stakeholder[info$stakeholder%in% c("journals_OA", "journals_nonOA" )] <- "journals"
info$stakeholder[info$stakeholder%in% c("publishers_Profit", "publishers_nonProfit" )] <- "publishers"
# source links of the websites
source_website <- info$website <- word(info$txt, 1)
website_info_table <- info %>%
select(stakeholder, website)
website_info_table %>%
kbl(caption = "Source websites of the statements") %>%
kable_paper("hover", full_width = F)
# This data will be used in 2_Topic_Modeling, 4_Language_analysis
write_csv(data_words, "./output/created_datasets/cleaned_data.csv")
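For reference, the downstream analyses can reload this dataset with readr (a minimal sketch; the exact loading code lives in those scripts):
# A minimal sketch (not from the original scripts): downstream analyses such as
# 2_Topic_Modeling and 4_Language_analysis can reload the dataset with readr
data_words <- read_csv("./output/created_datasets/cleaned_data.csv")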
sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/London
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] kableExtra_1.3.4 stm_1.3.6.1
[3] ggraph_2.1.0 igraph_1.5.1
[5] reshape2_1.4.4 wordcloud_2.6
[7] RColorBrewer_1.1-3 topicmodels_0.2-14
[9] tm_0.7-11 NLP_0.2-1
[11] quanteda.dictionaries_0.4 quanteda.textplots_0.94.3
[13] quanteda_3.3.1 tidytext_0.4.1
[15] lubridate_1.9.3 forcats_1.0.0
[17] stringr_1.5.0 dplyr_1.1.3
[19] purrr_1.0.2 readr_2.1.4
[21] tidyr_1.3.0 tibble_3.2.1
[23] ggplot2_3.4.3 tidyverse_2.0.0
[25] workflowr_1.7.1
loaded via a namespace (and not attached):
[1] gridExtra_2.3 rlang_1.1.1 magrittr_2.0.3 git2r_0.32.0
[5] compiler_4.3.1 getPass_0.2-2 systemfonts_1.0.4 callr_3.7.3
[9] vctrs_0.6.3 rvest_1.0.3 pkgconfig_2.0.3 crayon_1.5.2
[13] fastmap_1.1.1 utf8_1.2.3 promises_1.2.1 rmarkdown_2.25
[17] tzdb_0.4.0 ps_1.7.5 bit_4.0.5 xfun_0.40
[21] modeltools_0.2-23 cachem_1.0.8 jsonlite_1.8.7 highr_0.10
[25] SnowballC_0.7.1 later_1.3.1 tweenr_2.0.2 syuzhet_1.0.7
[29] parallel_4.3.1 stopwords_2.3 R6_2.5.1 bslib_0.5.1
[33] stringi_1.7.12 jquerylib_0.1.4 Rcpp_1.0.11 knitr_1.44
[37] httpuv_1.6.11 Matrix_1.5-4.1 timechange_0.2.0 tidyselect_1.2.0
[41] rstudioapi_0.15.0 yaml_2.3.7 viridis_0.6.4 processx_3.8.2
[45] lattice_0.21-8 plyr_1.8.9 withr_2.5.1 evaluate_0.21
[49] RcppParallel_5.1.7 polyclip_1.10-6 xml2_1.3.5 pillar_1.9.0
[53] lexicon_1.2.1 janeaustenr_1.0.0 whisker_0.4.1 stats4_4.3.1
[57] generics_0.1.3 vroom_1.6.3 rprojroot_2.0.3 hms_1.1.3
[61] munsell_0.5.0 scales_1.2.1 glue_1.6.2 slam_0.1-50
[65] tools_4.3.1 data.table_1.14.8 tokenizers_0.3.0 webshot_0.5.5
[69] fs_1.6.3 graphlayouts_1.0.2 fastmatch_1.1-4 tidygraph_1.2.3
[73] grid_4.3.1 colorspace_2.1-0 ggforce_0.4.1 cli_3.6.1
[77] fansi_1.0.4 viridisLite_0.4.2 svglite_2.1.2 gtable_0.3.4
[81] sass_0.4.7 digest_0.6.33 ggrepel_0.9.4 farver_2.1.1
[85] htmltools_0.5.6 lifecycle_1.0.3 httr_1.4.7 bit64_4.0.5
[89] MASS_7.3-60