Qiong Wang (1), John F. Quensen (1), Jordan Fish (1), Tae Kwon Lee (2), Yanni Sun (3), James M. Tiedje (1), James R. Cole (1)
(1) Center for Microbial Ecology, Michigan State University, East Lansing, Michigan, USA
(2) School of Civil and Environmental Engineering, Yonsei University, Seoul, Republic of Korea
(3) Computer Science and Engineering Department, Michigan State University, East Lansing, Michigan, USA
ABSTRACT
Biological nitrogen fixation is an important component of sustainable soil fertility and a key component of the nitrogen cycle. We used targeted metagenomics to study the nitrogen fixation-capable terrestrial bacterial community by targeting the gene for nitrogenase reductase (nifH). We obtained 1.1 million nifH 454 amplicon sequences from 222 soil samples collected from 4 National Ecological Observatory Network (NEON) sites in Alaska, Hawaii, Utah, and Florida. To accurately detect and correct frameshifts caused by indel sequencing errors, we developed FrameBot, a tool for frameshift correction and nearest-neighbor classification, and compared its accuracy to that of two other rapid frameshift correction tools. We found FrameBot was, in general, more accurate as long as a reference protein sequence with 80% or greater identity to a query was available, as was the case for virtually all nifH reads for the 4 NEON sites. Frameshifts were present in 12.7% of the reads. Those nifH sequences related to the Proteobacteria phylum were most abundant, followed by those for Cyanobacteria in the Alaska and Utah sites. Predominant genera with nifH sequences similar to reads included Azospirillum, Bradyrhizobium, and Rhizobium, the latter two without obvious plant hosts at the sites. Surprisingly, 80% of the sequences had greater than 95% amino acid identity to known nifH gene sequences. These samples were grouped by site and correlated with soil environmental factors, especially drainage, light intensity, mean annual temperature, and mean annual precipitation. FrameBot was tested successfully on three ecofunctional genes but should be applicable to any.
IMPORTANCE
High-throughput phylogenetic analysis of microbial communities using rRNA-targeted sequencing is now commonplace; however, such data often allow little inference with respect to either the presence or the diversity of genes involved in most important ecological processes. To study the gene pool for these processes, it is more straightforward to assess the genes directly responsible for the ecological function (ecofunctional genes). However, analyzing these genes involves technical challenges beyond those seen for rRNA. In particular, frameshift errors cause garbled downstream protein translations. Our FrameBot tool described here both corrects frameshift errors in query reads and determines their closest matching protein sequences in a set of reference sequences. We validated this new tool with sequences from defined communities and demonstrated the tool’s utility on nifH gene fragments sequenced from soils in well-characterized and major terrestrial ecosystem types.
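To make the frameshift problem concrete, the following minimal Python sketch shows how a single-base deletion garbles every codon downstream of the error, and how restoring the reading frame recovers the rest of the translation. It is an illustration only and not FrameBot's algorithm: the toy sequence, the `translate` helper, and the known deletion position are all assumptions made for the example, whereas a real tool has to infer the indel position, for instance by comparing the read against reference protein sequences.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate DNA in the given frame; unknown codons (e.g. containing N) become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(frame, len(dna) - 2, 3)
    )


# Hypothetical error-free read (toy sequence, not a real nifH fragment).
original = "ATGGCTAAAGGTGAACTG"

# Simulate a sequencing error: the base at index 4 is dropped.
deletion_pos = 4
shifted = original[:deletion_pos] + original[deletion_pos + 1:]

# Naive repair, assuming the deletion position is already known:
# insert a placeholder base so the downstream codons fall back into frame.
repaired = shifted[:deletion_pos] + "N" + shifted[deletion_pos:]

print(translate(original))  # MAKGEL
print(translate(shifted))   # MVKVN
print(translate(repaired))  # MXKGEL
```

In the shifted read, every residue after the deletion is read from the wrong frame (any that still match the original do so only by coincidence), while the repaired read loses only the single codon that contained the error.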