Last month, I was busy preparing a report for the Hong Kong Monetary Authority. Our team conducted a hypothetical e-HKD experiment on the HKUST campus. The pilot program concluded on September 22, with the report due by September 28, so I started drafting the basic structure of the RMD file in early September. In total, I wrote more than 20,000 lines. I tried my best to define functions to shorten the code; even so, the final RMD file is still more than 7,000 lines long.
Professor Kohei tasked me with writing the report in RMD. This was a first for me, as I had never written an RMD report from scratch. My usual practice is to code in Java or Python and write reports in LaTeX. Once I finished the report, however, I recognized the merits of RMD. It can even generate a Word document, a feature LaTeX lacks. This is particularly valuable today, as some top-tier journals still require submissions in Word format, which has been a challenge for me as a LaTeX user.
Another aspect of this experience that I came to appreciate was Kohei’s strict coding requirements. His “coding rules” lead to an elegant and clear code structure, and adhering to them has notably improved the quality of my code. Complying takes additional effort, but it is a worthwhile practice for keeping code readable.
Thanks to this experience and to all the research team members, I have learned a great deal.
Here are a few takeaways from my experience using RMD:
- Use `bookdown` to number sections. This can be achieved with the following YAML header:

      documentclass: book
      output:
        bookdown::html_document2: default

- Use BrBG colors to represent contrasting features.
- Use a label such as `table01` to name an R code chunk. The compiler will automatically number figures and tables. Note that the ‘01’ in the label doesn’t affect the order in the final report. It is just a name.
- Commonly used packages include `dplyr`, `modelsummary`, `scales`, `kableExtra`, `ggplot2`, and `foreach`.
- When compiling into a Word document, make sure to add `options(kableExtra.auto_format = FALSE)`.
- `stargazer` provides a good format for multinomial probit models. However, be aware that the resulting tables may not be compatible with Word documents.
- Defining functions is essential. However, remember to use `!!sym(variable)` when referring to column names passed in as strings.
- In the main text, try to use inline R code to retrieve values rather than typing them in manually.
- Current AI tools, such as ChatGPT, can greatly enhance coding efficiency. Make sure to fully utilize them.
- When using `factor()` to assign value orders, remember to use `arrange` to sort the variable later. This process also applies when ordering labels in figures.
- Use `bind_rows` and `bind_cols` to merge dataframes.
- Use `case_when` to conditionally replace values.
- Use `mutate` to create new variables.
- Use `group_by` and `summarize` to calculate summary statistics.
- Use `modelsummary` to generate regression tables.
- Use `scale_fill_manual` to assign colors to bars in a bar chart.
- Use `geom_text` to add labels to a bar chart.
- Use `geom_col` (together with `coord_flip()`) to generate a horizontal bar chart.
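The tips on functions, `!!sym(variable)`, `case_when`, and `factor()` with `arrange` can be combined in one small sketch. The data frame `responses`, its columns, and the helper `share_by()` are made up for illustration and are not taken from the actual report; `sym()` is called through `rlang`, which is installed alongside `dplyr`.

```r
library(dplyr)

# Hypothetical survey data; the column names are invented for this example.
responses <- data.frame(
  group  = c("student", "student", "staff", "staff", "staff"),
  answer = c("yes", "no", "yes", "yes", "no")
)

# Share of each value within each group.
# Column names arrive as strings, so they are turned into symbols with
# sym() and unquoted with !! inside the dplyr verbs.
share_by <- function(data, group_var, value_var) {
  data %>%
    group_by(!!rlang::sym(group_var), !!rlang::sym(value_var)) %>%
    summarize(n = n(), .groups = "drop_last") %>%
    mutate(proportion = n / sum(n)) %>%
    ungroup()
}

shares <- share_by(responses, "group", "answer") %>%
  mutate(
    # case_when() conditionally replaces values
    answer_label = case_when(
      answer == "yes" ~ "Yes",
      answer == "no"  ~ "No"
    ),
    # factor() fixes the display order ...
    answer_label = factor(answer_label, levels = c("Yes", "No"))
  ) %>%
  # ... and arrange() then sorts the rows by that order
  arrange(group, answer_label)
```

Because the column names are plain strings, the same helper can be reused for any pair of columns, which is one way to keep a long RMD file shorter.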
The basic structure of a data analysis should follow these steps:
- load the data
- clean the data
  - remove missing values
  - remove outliers
  - lower case column names
- transform the data
  - merge data
  - create new variables
- analyze the data
- visualize the data
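As a rough illustration of these steps, here is a minimal sketch with `dplyr`. The data frames `raw` and `users`, their columns, and the outlier threshold are all assumptions made for the example; in a real report the data would be loaded from a file instead. The sketch stops before the visualization step.

```r
library(dplyr)

# 1. Load the data (small made-up data frames stand in for real files).
raw <- data.frame(
  UserID = c(1, 2, 3, 4, 5),
  Amount = c(20, 35, NA, 5000, 28)
)
users <- data.frame(
  userid = c(1, 2, 3, 4, 5),
  role   = c("student", "staff", "student", "staff", "student")
)

# 2. Clean the data.
clean <- raw %>%
  rename_with(tolower) %>%      # lower case column names
  filter(!is.na(amount)) %>%    # remove missing values
  filter(amount < 1000)         # remove outliers (arbitrary cutoff, assumed)

# 3. Transform the data.
transformed <- clean %>%
  left_join(users, by = "userid") %>%     # merge data
  mutate(large_payment = amount >= 30)    # create new variables

# 4. Analyze the data.
summary_table <- transformed %>%
  group_by(role) %>%
  summarize(mean_amount = mean(amount), n = n())
```

Keeping each stage in its own named object makes it easy to inspect intermediate results and to reuse `summary_table` in a table or figure later.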
Below are the code templates I use for styling tables and figures.
Table:
    table_dataframe %>%
      kable(
        caption =
          "Title",
        col.names =
          c(
            "col_name",
            "col_name",
            "col_name"
          ),
        align = "ccc"
      ) %>%
      kable_styling() %>%
      add_footnote(
        "footnote"
      )
Horizontal bar chart:
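    # Requires dplyr (for %>%), ggplot2, and scales (for percent()).
    # plot_dataframe is a placeholder name for the summarized data frame.
    plot_dataframe %>%
      ggplot(
        aes(
          x = var_x,
          y = proportion,
          fill = var_fill,
          # hide labels on segments smaller than 4%
          label = percent(
            ifelse(
              proportion < .04,
              NA,
              proportion)
          )
        )
      ) +
      geom_col(
        aes(
          fill = var_fill
        ),
        width = 0.7,
        position =
          position_fill(
            reverse = TRUE
          )
      ) +
      coord_flip() +  # flip the axes to make the bars horizontal
      theme_classic() +
      geom_text(
        position = position_stack(
          vjust = 0.5,
          reverse = TRUE),
        size = 2.5) +
      scale_fill_manual(
        values = c(
          "#01665e",
          "#35978f",
          "#80cdc1",
          "#c7eae5",
          "#f5f5f5"
        )
      )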