R

Interchanging RMarkdown and "spinnable" R

Dean Attali wrote this nice post a few years ago describing knitr's spin function. This function allows a regular R file, with comments written using the roxygen2-style comment tag #', to be rendered as an HTML document, with the comments rendered as text and the results of the R code rendered in place, much as an RMarkdown document would be. The basic rules for this are (from Dean's post):

- Any line beginning with #' is treated as a markdown directive (#' # title will be a header; #' **some bold text** results in some bold text)
- Any line beginning with #+ is parsed as code chunk options
- Run knitr::spin on the file

In effect, this “spinnable” R script is the complement of an RMarkdown document with respect to format: the RMarkdown document is primarily a text (Markdown) document with code chunks, while the R script is primarily a code document with text chunks.
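For instance, a minimal spinnable script might look like the sketch below (the file name analysis.R is just for illustration); running knitr::spin("analysis.R"), or rmarkdown::render("analysis.R"), which understands the same format, renders it with the text and results interleaved.

```r
#' # A small spin demo
#'
#' Summary statistics for **mpg** in the built-in mtcars data:

#+ mpg-summary, echo=TRUE
summary(mtcars$mpg)

#' A scatterplot, with chunk options supplied via the #+ line:

#+ mpg-plot, fig.width=6, fig.height=4
plot(mtcars$wt, mtcars$mpg)
```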

Modifying Excel Files using openxlsx

I’ve been creating several output tables for a paper, which I usually store as sheets in an Excel file, since my collaborators are entirely in the Microsoft Office ecosystem. One issue I often run into is having to modify a single sheet in that file with updated data, while keeping the rest of the file intact. This is necessary since I’ve perhaps done some custom formatting in Excel on some of the tables, and I don’t want to re-format them every time I modify a single sheet.
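A minimal sketch of how openxlsx can do this (not necessarily the post's exact code; the file name, sheet name, and the updated_table data frame are hypothetical):

```r
library(openxlsx)

# Load the existing workbook; all other sheets and their formatting are preserved
wb <- loadWorkbook("results.xlsx")

# Overwrite just the one sheet's cells with the updated data
writeData(wb, sheet = "Table2", x = updated_table)

# Save back to disk, replacing the old file
saveWorkbook(wb, "results.xlsx", overwrite = TRUE)
```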

Bootstrapping clustered data

When evaluating the sampling variability of different statistics, I’ll often use the bootstrap procedure to resample my data, compute the statistic on each sample, and look at the distribution of the statistic over several bootstrap samples. In principle, the bootstrap is straightforward to do. However, if you have correlated data (like repeated measures or longitudinal data or circular data), the unit of sampling is no longer the individual data point but the second-level unit within which the data are correlated; a naive bootstrap would break the correlation structure of the data and distort the resulting distributions.
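A minimal sketch of this cluster bootstrap, assuming a data frame dat with a cluster identifier column id (both names hypothetical):

```r
# Resample whole clusters with replacement, keeping each sampled
# cluster's rows intact, and compute a statistic on each resample
cluster_boot <- function(dat, id, statistic, B = 1000) {
  ids <- unique(dat[[id]])
  replicate(B, {
    sampled <- sample(ids, length(ids), replace = TRUE)
    # A cluster drawn twice contributes its rows twice
    boot_dat <- do.call(rbind, lapply(sampled, function(i) dat[dat[[id]] == i, ]))
    statistic(boot_dat)
  })
}

# e.g., the bootstrap distribution of the mean of a column y:
# boots <- cluster_boot(dat, "id", statistic = function(d) mean(d$y))
```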

Joint Statistical Meetings Talk

The Joint Statistical Meetings are being held in the beautiful city of Vancouver, British Columbia this year. I gave a talk on data visualization, which is a new topic for me, but an area I’m quite excited about. I’ve been looking into the newer JavaScript-based graphics toolsets for a while now for different projects, and this talk gave me a chance to coalesce my thoughts about data visualization principles and usage, and how the new toolsets can help.

Cleaning up tables

Context

One of the things I have to do quite often is create tables for papers and presentations. Often the “Table 1” of a paper has descriptives about the study, broken down by subgroups. For presentation purposes, it doesn’t look good (to me, at least) to have the name of each subgroup repeated down one column of the table. One way to deal with this is, of course, by hand: save the table as a CSV or Excel file, open it up in your favorite spreadsheet program, and prettify things.
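A scripted alternative is to blank out the repeats before exporting. A minimal sketch with dplyr, assuming a table tab sorted by a grouping column subgroup (both names hypothetical):

```r
library(dplyr)

# Keep a subgroup label only on its first row; blank it on the repeats.
# This assumes the table is already sorted by subgroup.
pretty_tab <- tab %>%
  mutate(subgroup = ifelse(duplicated(subgroup), "", subgroup))
```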

Tidying messy Excel data (tidyxl)

Well, here’s what I was dealing with: (You can download this dataset for your playtime here.) Notice that we have 3 header rows: first with patient IDs, second with spine region, and third with variable names (A and B, to protect the innocent).

Goal

A dataset that, for each patient and each angle, gives us the corresponding values of A and B. So this would be a four-column data set with ID, angle, A and B.
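A sketch of one way to get there with tidyxl and the tidyverse. This is my assumed reconstruction, not necessarily the post's code: it assumes header rows 1-3 hold the patient ID, region, and variable name, angles sit in column 1 below the headers, and the file is called messy_data.xlsx (hypothetical).

```r
library(tidyxl)
library(tidyverse)

# Read every cell with its row/column position instead of forcing a rectangle
cells <- xlsx_cells("messy_data.xlsx", sheets = 1)

# Rebuild the three header rows; IDs and regions span several columns,
# so carry them across (fill works downward after pivoting to one row per column)
headers <- cells %>%
  filter(row <= 3, col > 1) %>%
  select(row, col, value = character) %>%
  pivot_wider(names_from = row, values_from = value, names_prefix = "h") %>%
  arrange(col) %>%
  fill(h1, h2)

# The angle labels live in column 1 below the headers
angles <- cells %>%
  filter(col == 1, row > 3) %>%
  select(row, angle = numeric)

# Attach headers and angles to each data cell, then spread A/B into columns
tidy_dat <- cells %>%
  filter(row > 3, col > 1) %>%
  select(row, col, value = numeric) %>%
  left_join(headers, by = "col") %>%
  left_join(angles, by = "row") %>%
  select(id = h1, angle, variable = h3, value) %>%  # h2 (region) dropped for the 4-column goal
  pivot_wider(names_from = variable, values_from = value)
```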

Tidying messy Excel data (Introduction)

Personal expressiveness, or how data is stored in a spreadsheet

When you get data from a broad research community, the variability in how that data is formatted and stored is truly astonishing. Of course there are the standardized formats that are output from machines, like Next Generation Sequencing and other automated systems. That is a saving grace! But for smaller data, or data collected in the lab, the possibilities are truly endless!

Surprising result when exploring Rcpp gallery

I’m starting to incorporate more Rcpp in my R work, and so decided to spend some time exploring the Rcpp Gallery. One example by John Merrill caught my eye. He provides a C++ solution to transforming a list of lists into a data frame, and shows impressive speed savings compared to as.data.frame. This got me thinking about how I do this operation currently. I tend to rely on the do.call method.
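For reference, a minimal sketch of that do.call pattern (the input lst is a hypothetical example, each element a named list of scalars):

```r
# A list of 1000 "rows", each a named list
lst <- replicate(1000, list(a = rnorm(1), b = runif(1)), simplify = FALSE)

# Convert each inner list to a one-row data frame, then rbind them all
df <- do.call(rbind, lapply(lst, as.data.frame))
```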

Finding my Dropbox in R

I’ll often keep non-sensitive data on Dropbox so that I can access it on all my machines without gumming up git. I just wrote a small script to find the Dropbox location on each of my computers automatically. The crucial information is available here, from Dropbox. My small snippet of code is the following:

```r
if (Sys.info()['sysname'] == 'Darwin') {
  info <- RJSONIO::fromJSON(
    file.path(path.expand("~"), '.dropbox', 'info.json'))
}
if (Sys.info()['sysname'] == 'Windows') {
  info <- RJSONIO::fromJSON(
    if (file.
```

Changing names in the tidyverse: An example for many regressions

A collaborator posed an interesting R question to me today. She wanted to do several regressions using different outcomes, with models being computed on different strata defined by a combination of experimental design variables. She then just wanted to extract the p-values for the slopes for each of the models, and then filter the strata based on p-value levels. This seems straightforward, right? Let's set up a toy example: library(tidyverse) ## Warning: package 'dplyr' was built under R version 3.
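The general shape of the solution (my assumed reconstruction of the pattern, not the post's exact code) uses nested data frames with purrr and broom; the data and variable names below are hypothetical:

```r
library(tidyverse)
library(broom)

# Toy data: outcome y, predictor x, strata defined by g1 x g2
dat <- expand_grid(g1 = letters[1:3], g2 = 1:2, rep = 1:20) %>%
  mutate(x = rnorm(n()), y = 0.5 * x + rnorm(n()))

results <- dat %>%
  group_by(g1, g2) %>%
  nest() %>%                                      # one row per stratum
  mutate(model  = map(data, ~ lm(y ~ x, data = .x)),
         tidied = map(model, tidy)) %>%           # coefficient table per model
  unnest(tidied) %>%
  select(g1, g2, term, estimate, p.value) %>%
  filter(term == "x", p.value < 0.05)             # strata with significant slopes
```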