More on ONCOgen: What Is Context-Free Grammar?

This post contains more details about the inner workings of ONCOgen, a software program that can generate New England Journal of Medicine-formatted clinical trials. You can read more on the ONCOgen page or at the first post in this series.

When you pick up a scientific paper, you expect the manuscript to include a specific set of information that lets you understand why the experiment was performed, what the experiment was, what the results were, and how they were analyzed. While we expect the authors to provide their interpretation of the results and their significance, we as readers rightfully expect to have enough information to judge the results for ourselves.

In broad strokes, the information in every clinical trial paper is organized into four major sections:

  • Background & Introduction
  • Methods
  • Results
  • Discussion

Each of these can be subdivided further. For example, a Methods section may include any or all of the items below:

  • Patient Population
  • Study Design
  • End Points
  • Assessments
  • Statistical Analysis
  • Study Oversight

While each section might be worded a little differently, and subsections might be more or less inclusive, everything should be there. You can keep breaking this down further: a sentence or two about inclusion criteria; a sentence about the a priori significance level; and a few sentences describing the target disease with lab values or specific biomarker requirements.
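The nesting described above (paper, then sections, then subsections, then sentences) can be written down as a small set of rewrite rules. The sketch below is only an illustration of that idea, not ONCOgen's actual code: the rule names (`PAPER`, `METHODS`, `PATIENTS`, and so on), the placeholder sentences, and the `expand` helper are all hypothetical.

```python
import random

# Hypothetical toy rules: each key is a structural unit, and each value is a
# list of alternative ways to build it. Any string that is not a key is
# treated as finished text. The wording is placeholder text only.
GRAMMAR = {
    "PAPER": [["BACKGROUND", "METHODS", "RESULTS", "DISCUSSION"]],
    "BACKGROUND": [["(background sentences)"]],
    "METHODS": [["PATIENTS", "DESIGN", "STATISTICS"]],
    "PATIENTS": [
        ["Eligible patients had", "DISEASE"],
        ["We enrolled adults with", "DISEASE"],
    ],
    "DISEASE": [["advanced disease."], ["previously untreated disease."]],
    "DESIGN": [["(study design sentences)"]],
    "STATISTICS": [["(a sentence about the a priori significance level)"]],
    "RESULTS": [["(results sentences)"]],
    "DISCUSSION": [["(discussion sentences)"]],
}


def expand(symbol, rng):
    """Recursively replace a symbol with one of its alternatives."""
    if symbol not in GRAMMAR:
        return symbol  # plain text: nothing left to expand
    production = rng.choice(GRAMMAR[symbol])  # pick one alternative at random
    return " ".join(expand(part, rng) for part in production)


if __name__ == "__main__":
    # A seeded generator makes the random choices repeatable.
    print(expand("METHODS", random.Random(0)))
```

Each call walks the rules from the top down, so the overall structure stays fixed while the wording of any one piece can vary between runs.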

For the purposes of writing a fake paper, the high-level structure seems like an easy target; even the paragraph level seems pretty straightforward. But there are practically a million ways to say, well, stuff. How do you make semi-convincing fake verbiage?

Well, we have to zoom in a little more for that.
