NEJM’s Disappointing Decision to Publish the Boston School Mask Study

A point-by-point analysis of Cowger TL, et al.

Nov 12, 2022

This week, the New England Journal of Medicine published an observational study claiming that lifting mask mandates in schools in Massachusetts was associated with increased COVID-19 rates.

First, even if these results were true, they have no applicability in the present. Seroprevalence has rocketed past 90%. Most kids have already had COVID. Ergo, they no longer need protection from COVID (which they will get again, eventually). Whether mask mandates avert RSV or other viruses is speculation, and contradicted by a mountain of pre-pandemic data.

Second, these results aren’t true: they are unreliable, as detailed in this post by Tracy Beth Hoeg. She is going to walk you through the technical issues, including the crux of the matter: difference in difference analysis cannot handle time-varying confounders, and time-varying confounders are guaranteed when mask mandates fall.

Third, observational data will never settle mask debates. In recent years, a number of groups have asked what happens if the same dataset is approached with different analytic plans: what happens if you give the same data to multiple research teams, or if you simulate many analytic plans? These studies show that observational analyses can give a range of results; in many cases, they can conclude an intervention is harmful or helpful depending on the plan.

Masking kids is a divisive issue. Our friends across the pond do not do it. They do not like doing it to kids under 12. And they will never do it to kids under five. Our country is full of pro-mask zealots. They love masking kids. In this case, some of the authors of the NEJM paper have repeatedly advocated for the policy of masking kids in school.

What am I to think? When you give someone a question that has tons of analytic flexibility, and they have already said they vehemently support the policy, their analysis delivers what they promised.

Is this science?

Fourth, the discussion of the NEJM paper says, “structural racism is embedded in public policies and that policy decisions have the potential to rectify or reproduce health inequities.” The authors frame masks as a tool to lower structural racism.

Whether masking kids slows respiratory viruses is a scientific question, and it is dangerous to turn this into a proxy for political debates. I believe it is a mistake to weaponize structural racism to support masking kids.

If masks slow COVID-19 and improve long term outcomes, then they might help minority kids more than the majority kids. But if masks don't actually improve long term outcomes (because all kids get COVID anyway, or because masks don’t slow spread), then masking kids disproportionately hurts poor minority kids. This study actually shows American society was happy to mask minority kids longer.

During the pandemic, poor people had to mask more than rich people. I mask on the public bus, but not when I drive my car. Workers mask when they serve you at a restaurant, while you enjoy life unmasked.

Let's not convert a scientific question about whether masks slow viral spread into a referendum about who cares more about structural racism or socioeconomic disparities. Evoking this rhetoric is inappropriate for the New England Journal of Medicine. It is a shame editors allowed it.

I suspect there is a reason why masking proponents want to tie their policy to structural racism. It allows them to claim their opponents aren't just scientists who question their methods, but bad people. We cannot turn masking kids and COVID-19 into a proxy war for political issues. We can’t make a scientific question a moral one just to shut down debate.

Vinay Prasad MD MPH

The New England Journal’s Disappointing Decision to Publish the Boston School Mask Study

By Tracy Beth Høeg, MD, PhD

I was surprised to see an observational study on school mask mandates, which was problematic as a preprint, published in the New England Journal of Medicine.

Why did NEJM publish this paper, which has many issues I will discuss below, when it runs counter to randomized data, and a nicely done regression discontinuity study from Spain? It also runs counter to a huge body of pre-pandemic randomized data finding limited to no effectiveness of cloth, surgical or N95 masks against influenza.

Proponents of the study, “Lifting Universal Masking in Schools — Covid-19 Incidence among Students and Staff” believe the authors can infer causation (of mask mandates being effective against COVID-19 cases) because they used the Difference in Difference technique. I am going to explain why I don’t think the necessary assumptions are met to infer causality and why observational studies of school interventions have been so challenging in general.

The study

This was a study of 72 public K-12 school districts in the greater Boston area during the 2021-2022 academic year including 294,084 students and 46,530 staff. They used a difference in difference analysis for staggered policy implementation to compare the COVID-19 incidence among districts that lifted or sustained mask mandates.

The districts that dropped their mask mandates (n=70) did so in a staggered fashion by week in Feb-March and are shown below in the blue colors; those that didn’t (n=2) are shown in black.

The first thing that jumped out at me when I read the study was Figure 1, which curiously shows the case rates started to increase in the blue (unmasking) districts independent of when they dropped the mask mandate. This suggests there is at least one factor independent of masking leading to the rise in cases in the unmasking districts. The second thing that stood out from this figure was the district that dropped the mandates at the second time point had higher case rates post lifting of the mask mandate than the district that dropped them first despite these districts having indistinguishable case rates prior to lifting the mandates. There is a lack of dose-response effect here.

These two points make a convincing case that the difference between masked and unmasked districts cannot alone be attributed to the masks.

The populations in the masked and unmasked districts are also different.

The two districts that did not lift mask mandates were located in and around the metro area of Boston (black on this map) while the districts that lifted mandates were more suburban. As the authors describe, “districts that chose to sustain masking requirements longer tended to have school buildings that were older and in worse condition and to have more students per classroom than districts that chose to lift masking requirements earlier. In addition, these districts had higher percentages of low-income students, students with disabilities, and students who were English-language learners, as well as higher percentages of Black and Latinx students and staff.”

There was also a pronounced difference in pediatric vaccination rates: the district that unmasked second had the highest city/town pediatric vaccination rate, and the district that never lifted mask mandates had the lowest.

The authors state, despite these differences, that causality can still be inferred from the masks because of their use of the Difference in Difference technique.

The difference in difference technique has long been used in econometrics, and its use dates back to the mid-19th century. It is also called the “controlled before and after study” and can be used to estimate the effectiveness of an intervention using observational data if certain assumptions are met.

One critical assumption is that of parallel trends, which is shown below.

This means the difference between the control and treatment groups is constant over time prior to the treatment or intervention. It also assumes that only one thing that could affect the results changes at the time of the treatment, and that’s the treatment itself. Finally, it assumes there are no other changes in behavior, policy or any other time-varying differences between the groups that could falsely increase or decrease the appearance of effectiveness of the intervention (masks).
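To make these assumptions concrete, here is a minimal sketch of the simplest two-group, two-period difference in difference calculation. Every number in it is invented for illustration and none comes from the Boston study; the function name `did_estimate` and the "extra detected cases" term are hypothetical. It is not the staggered-adoption estimator the authors used, only the basic idea behind it.

```python
def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Basic 2x2 difference in difference: change in treated minus change in control."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)

# Hypothetical weekly cases per 1,000 (invented numbers, not study data).
ctrl_pre, ctrl_post = 10.0, 14.0       # control group rises by 4 on its own
treat_pre = 6.0
true_effect = 3.0                       # assumed true effect of the policy change
common_trend = ctrl_post - ctrl_pre     # parallel trends: same underlying rise in both groups

# Case 1: the assumptions hold, so only the treatment differs between groups.
treat_post_clean = treat_pre + common_trend + true_effect
clean = did_estimate(treat_pre, treat_post_clean, ctrl_pre, ctrl_post)

# Case 2: a time-varying confounder (for example, more testing) adds 2 detected
# cases in the treated group only, at the same time as the policy change.
confounder_shift = 2.0
treat_post_confounded = treat_post_clean + confounder_shift
confounded = did_estimate(treat_pre, treat_post_confounded, ctrl_pre, ctrl_post)

print(clean)       # 3.0, the assumed true effect is recovered
print(confounded)  # 5.0, the confounder is counted as if it were the treatment
```

In the second case the calculation has no way to separate the policy from whatever else changed alongside it, so the full 5.0 gets attributed to the policy.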

Just intuitively, most people reading this will know that when schools change masking policies, other things are likely to change too. One of these things may be testing policies or reporting policies. Schools may also have changed masking policies or practices on their own during the study period (which I will get to in more detail below). Since these districts are spread out geographically, there will be changing differences in community case rates. There will also be differences in levels of immunity due to differing levels of vaccination and natural immunity.

Visual inspection is a useful way to see if the parallel trend assumption is met. Where you decide to start measuring the pre-intervention period can also affect your results; a shorter pre-period may be preferable for capturing recent deviations from a constant difference between groups. As you can see here, preceding the mask lifting, the difference between the mask lifters and controls is not stable. So, over this particular time frame, the assumption is not met.
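The same check can be done numerically as a rough complement to looking at the plot. The sketch below uses invented weekly rates (not the study's data) and two hypothetical helper functions; it only asks whether the gap between groups stays constant across the pre-intervention weeks.

```python
def pre_period_gaps(treated, control):
    """Week-by-week gap (treated minus control) in the pre-intervention period."""
    return [t - c for t, c in zip(treated, control)]

def gap_range(gaps):
    """Spread of the gap; 0 means the two series moved exactly in parallel."""
    return max(gaps) - min(gaps)

# Hypothetical pre-intervention weekly case rates (invented numbers).
control = [20.0, 16.0, 12.0, 9.0]
parallel = [15.0, 11.0, 7.0, 4.0]       # always 5 below the control group
non_parallel = [15.0, 12.0, 10.0, 9.0]  # gap shrinks from 5 to 0

print(gap_range(pre_period_gaps(parallel, control)))      # 0.0
print(gap_range(pre_period_gaps(non_parallel, control)))  # 5.0
```

A gap that drifts in the weeks before the policy change, as in the second series, is the kind of pattern that undermines the parallel trend assumption.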

Second, testing rates in the cities/towns were declining at different rates prior to the dropping of the mask mandates, so this is a time-varying confounder with an inconsistent trend for the unmasked vs control group.

To get back to immunity levels: these will vary, and vary inconsistently, across the control and mask-lifting groups. The more highly vaccinated students (in the mask-lifting group) would be expected to experience an early decrease in infection risk followed by decreasing vaccine-related immunity, increasing their susceptibility to infection over time. The masked group with lower vaccination rates may have had higher levels of seroprevalence, which increased during the omicron wave. This may have provided the masked group with more durable protection going into spring. It would be fascinating to see the seroprevalence of these different areas of Boston over the course of the school year, and importantly, the seroprevalence rates were not included in this study.

Finally, to use the Diff in Diff technique to infer causation, there should not have been changes in masking policies or behaviors before or after the “treatment” time. It appears that at least one major change did occur that I believe the authors were unaware of (though I invite their comments!): a number of the schools that this study states lifted mask mandates in the Feb-March period had actually received a waiver and already lifted their mask mandates prior to time 1, once they achieved an 80% vaccination rate. I understand a list of the confirmed schools to which this applies is being compiled.

Further, in the Boston study, the authors do not have information on testing rates, including at-home testing, at the district level. This is a hugely important limitation and potential confounder, even with the diff in diff design. They state that in January 2022, the Massachusetts DESE strongly recommended replacing test to stay with rapid at-home antigen testing. They state they do not know which programs the districts participated in. This would not matter with the Diff in Diff design if the policies or programs did not change over time. But we do know that, though DESE had recommended discontinuing close contact testing in January and replacing it with at-home tests, the CDC continued to recommend testing of unmasked close contacts through the spring of 2022, and we are not certain whether any schools or districts continued to test unmasked close contacts either at school or at home. Further, as at-home testing became more heavily relied upon, the more affluent districts that dropped mask mandates may have been more likely to test.

For those who have spent time studying the epidemiology of COVID-19 cases in schools and children (I have; for example 1, 2), you will know there are many constantly moving parts that can lead to bias, changing biases and changing degrees of bias over time when evaluating the effect of a school intervention. My research group published a preprint on masking in two very similar districts in North Dakota (it is currently being revised after peer review and based on helpful outside feedback we received). We discussed using the Diff in Diff method, but I argued we could not assume there were no time-varying confounders or other changes in policy or behavior that accompanied the mask policy changes and could affect the appearance of mask effectiveness… even if we could show a parallel trend prior to the change in policy.

As you can see, in our study of two demographically very similar, geographically adjacent K-12 districts in the same municipality in Fargo, North Dakota, there was no difference in student case rates (y axis) while the districts had different mask policies (FPS, in blue, had a mask mandate prior to the vertical line at 1/17/2022 and WF, in purple, did not) or when they had the same lack of mask mandate after 1/17/2022.

Because student infection rates are so highly dependent on current community rates and prior immunity levels within the community, I would argue our findings of lack of major difference in mask vs no mask mandate districts are more convincing, yet still with many limitations due to being an observational study with many factors unique to the districts that can influence case rates.

But this also brings up the issue of publication bias, with many observational studies of school masking finding conflicting results. For example, in September of 2021, MMWR released a publication by Budzyn et al which found significantly lower increases in pediatric COVID-19 cases in counties with school mask mandates, but included only two weeks of in-school data from a limited subset of counties in the US. When my research group extended Budzyn et al’s analysis out to nine weeks and included 1832 counties, instead of the original 520, we failed to identify a significant correlation between school mask mandates and pediatric covid cases.

Interestingly, our more robust analysis with the opposite conclusion was rejected by MMWR (though no faults were identified in our methodology), but did go on to be published in the Journal of Infection. This points to the issue of some journals being more inclined to publish findings that align with their beliefs. It is another reason we really should not be using observational data to justify recommending or mandating masking when we have some randomized data finding little to no effect of community masking, which is consistent with randomized data from influenza, and when there are obvious downsides of recommending or mandating that people, especially children, continue to mask.

Effect Size and Community Cases

In the Boston study, the identified masking effect size against cases is implausibly high. They say the dropping of the mask mandates corresponded with an additional 11,901 cases, which was 33.4% of ALL cases in the unmasked districts. Among the staff they found 40.4% of the cases to be attributable to the lifting of the mask mandates.
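To get a sense of scale, the two student-and-staff figures quoted above can be turned into an implied total. This is simple arithmetic on the numbers as stated in this post, taken at face value; the derived totals are back-of-the-envelope values and are not figures reported by the study.

```python
# Figures quoted in this post for the districts that lifted mandates.
attributed_cases = 11_901   # cases attributed to lifting the mask mandates
attributed_share = 0.334    # stated share of all cases in those districts

# Implied totals over the same period (derived here, not reported figures).
implied_total = attributed_cases / attributed_share
implied_other = implied_total - attributed_cases

print(round(implied_total))  # roughly 35,600 cases in total
print(round(implied_other))  # roughly 23,700 cases not attributed to the policy
```

Put differently, the estimate implies that about one in every three cases in those districts would not have occurred had the mandates stayed in place.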

This is unrealistic considering most cases come from the community into the school AND we have a randomized study from Bangladesh failing to find any effect of community masking, including with cloth masks, in anyone under 50 (and the overall signal was modest: around an 11% decrease with surgical masks, which was uncertain, and no significant decrease with cloth masks). We also have a regression discontinuity design study from Spain, which takes advantage of the fact that 5-year-olds don’t mask and 6-year-olds do; there was no significant discontinuity from age 5 to 6, as compared with other ages, to suggest an effect of masking on case rates.

Additionally, the authors of the Boston study made the difficult-to-understand choice of “consider[ing] community rates of COVID-19 as part of the causal effect of school masking policies rather than a source of bias.” In other words, they saw community case rates as a result of school masking policies and school case rates, rather than seeing community case rates as the major source of school cases. A large body of research has suggested the opposite, finding that COVID-19 in schools is up to 10-20x more likely to come from outside of the school than from within. This includes a study from the UK where children <12 were not masked.

But they did plot the results of school COVID cases vs community covid cases and, as you can see here:

school case rates had a similar relationship to city/town case positivity rates in all districts, which speaks against any large impact of school masking. Further, it is unclear why cases would be presumed to be coming from the school to the community when the school peak lags the community peak in two of the districts, or why any difference seen in the masked district would be presumed to be due to masks when the March 17 and the did-not-lift groups appear so similar in the difference between school (black) and community (orange) case rates. Looking overall at how much the masked district differs from the community compared with the unmasked districts, it’s clear the difference is modest, and of course may not be attributable to masks!

Even in light of the additional supplementary analyses and information (I am not presenting every single analysis and figure here), my critique above still holds. I am trying to keep my explanation of the limitations of this study as simple and concise as possible, though I want to acknowledge all of the hard work and thought the authors put into the analyses. Also, I welcome critique on critical points anyone feels I did not mention!

Discussion & Limitations

In their Discussion, the only limitations they list are the lack of information on district testing and that this was a study of mask mandates, not masks. While these are appropriate to mention, they did not spend time addressing the many other limitations that I have mentioned above.

Instead they discussed structural racism and educational inequities, which are important topics, but their study did not directly address inequalities or structural racism. Their conclusion that “universal masking may be an important tool for mitigating the effects of structural racism in schools” struck me as odd.

If we are considering structural racism and educational inequalities, they also failed to weigh the known downsides of continued masking of children. Beyond the obvious fact that there is value in children seeing their friends smile, understanding them and not seeing them as vectors of disease, research on this topic has already found children with hearing impairment to have impaired word recognition in settings with mask wearing. Even children without hearing impairment have been found to have reduced word identification, particularly in a noisy environment, when the speaker is masked. Face masks also appear to impair recognition of emotions, trustworthiness and perceived closeness and may “undermine the success of our social interactions.” These are drawbacks that need to be weighed… COVID-19 is for most children a small threat at this point and very far from the largest threats they face to their health and wellbeing.

So why was this study published in the NEJM with its limitations and implausibility when COVID-19 is a diminishing threat to children, and we have higher quality data that contradict their findings? I’ll leave that to the readers of Sensible Medicine to discuss.

A guest post by

Tracy Beth Høeg, MD, PhD

PhD Epidemiologist & Practicing MD. Currently at NCOA, UC San Francisco, Associate Prof Clinical Research-U of Southern Denmark. Former Team 🇺🇸 &🇩🇰 ultra/mountain runner, musician, writer, podcast host, mom of 4