Explanation
"Search the BNC for concordances" provides a user-friendly
but powerful interface for querying the BNC. It returns up to 1000 examples
with your search terms highlighted in context: the sentence in which they occur,
flanked by the preceding and following sentences. An option to display results
in traditional concordance lines is being tested for cross-browser compatibility.
Simple Query supports several kinds of match. The options Match
"the phrase", "all the words" and "any of the words" are
self-explanatory. "Boolean" adds support for logical AND, OR, NOT and grouping
with ( ). When Boolean mode is selected, a list of the Boolean
operators appears below the input box; the list can be hidden by
unchecking the box above it.
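The three non-Boolean match modes can be pictured with a small sketch. This only illustrates the semantics described above and is not the site's implementation: the function names, the mode labels ("phrase", "all", "any"), the case-insensitive comparison and the naive tokenizer are all assumptions.

```python
def tokenize(text):
    # Naive tokenizer (assumption): split on whitespace, strip surrounding
    # punctuation, lowercase, and drop anything that ends up empty.
    stripped = (w.strip('.,;:!?"\'()').lower() for w in text.split())
    return [w for w in stripped if w]


def matches(sentence, query, mode):
    """Return True if `sentence` satisfies `query` under the given match mode."""
    words = tokenize(sentence)
    terms = tokenize(query)
    if mode == "phrase":
        # The terms must occur as one contiguous run, in the order given.
        n = len(terms)
        return any(words[i:i + n] == terms for i in range(len(words) - n + 1))
    if mode == "all":
        # Every term must occur somewhere in the sentence, in any order.
        return all(t in words for t in terms)
    if mode == "any":
        # At least one term must occur in the sentence.
        return any(t in words for t in terms)
    raise ValueError("unknown mode: " + mode)
```

With the sentence "The cat sat on the mat.", the query `cat sat` matches in "phrase" mode, `sat cat` does not, and `mat cat` matches in "all" mode because word order is ignored there.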
To maximize performance, wildcard operators are not supported.
Instead, you can choose between exact word-form matching and lemma
matching. A lemma is the base form of a word; entering any form
of a lemma matches all of its other forms. For example, been
matches itself and be, am, is, are, was, were and being. Currently
there is no distinction by part of speech (PoS),
so being matches both verb and noun forms.
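The difference between exact word-form matching and lemma matching can be sketched as follows. The lemma table here is a toy, hand-written assumption covering only two lemmas, and `forms_matched` is a hypothetical helper; the real index behind the search page is not described in this document.

```python
# Toy lemma table (assumption): maps each word-form to its lemma.
LEMMA_OF = {
    "be": "be", "am": "be", "is": "be", "are": "be",
    "was": "be", "were": "be", "been": "be", "being": "be",
    "go": "go", "goes": "go", "went": "go", "gone": "go", "going": "go",
}


def forms_matched(query_form, lemma_mode=True):
    """Return the set of word-forms that a one-word query would match."""
    form = query_form.lower()
    if not lemma_mode:
        # Exact word-form matching: only the form that was typed.
        return {form}
    # Lemma matching: look up the lemma, then collect every form sharing it.
    lemma = LEMMA_OF.get(form, form)
    matched = {f for f, l in LEMMA_OF.items() if l == lemma}
    # Forms missing from the table simply match themselves.
    return matched | {form}
```

Because the table is keyed on the word-form alone, with no part-of-speech tag, the entry for being cannot tell the verb from the noun, which mirrors the limitation noted above.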
Random sort returns a random sample of matches from the BNC in random
order. Sort by text id also yields a random sample in text id
order, which is more likely to show excerpts from similar texts near
each other.
Advanced Query has a larger query entry box for more complex queries. It also adds two functions:
filtering by text-type and extended mode matching.
There are two sets of text-type categories. The compilers of the BNC assigned each
text to one of 17 "domains" detailed in the BNC User's Guide. David Lee later introduced a far more nuanced distinction
among 71 "genres", some of which have a very small sample size. In
this detailed article Lee motivates and discusses his finer classification.
This response by Guy Aston raises a number of valid counterarguments and concerns.
One argument is particularly compelling:
The BNC, which contains just over 4,000 texts, uses a framework which guarantees at least 100 texts in most principal categories.
You may or may not like the categories chosen, but the corpus arguably allows you to generalize about these categories
– about spoken and written texts, the nine different domains of written texts, the four different domains of
"context-governed" spoken texts, and so forth – with reasonable certainty that findings will not be unduly biased
by any particular text or any particular subcategory of texts.
Anyone who undertakes serious research on language in various text-types in the BNC
should read both papers thoughtfully.
In addition to domain and genre there is a further breakdown of written texts into six medium types. Eventually medium filters
will be combined with filtering by domain or genre.
Four "Power Select" buttons assist in selecting related groups of domain and genre types:
- Select all selects every category.
- Toggle selection changes "selected" categories to "not selected" and vice-versa.
Tips: To clear filters by deselecting all, click Select all, then Toggle selection.
To select all but a few categories, first select the categories to exclude, then click Toggle selection.
- Select items containing / Deselect items containing
selects / deselects all text-types whose descriptions contain the words or
word fragments you enter
(case-sensitive; separate multiple terms with a space). For example, wri selects all written texts.
For spoken texts, specify at least spok, since spo also matches sport.
To exclude a subset of the categories that match the string you enter, add
a string that distinguishes that subset, preceded by a -, e.g.
aca -non- matches academic but not non-academic.
As you (de)select text-types, you will see a running tally of the sample size in the right-hand column.
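The behaviour of the "containing" and "Toggle selection" buttons can be sketched as simple set operations. Everything here is illustrative: the category descriptions in `CATEGORIES` are invented, the function names are hypothetical, and the sketch assumes that a category is selected when it contains any one of the entered terms, which the description above does not state explicitly.

```python
# Invented category descriptions (assumption), not the real BNC labels.
CATEGORIES = [
    "written: academic prose",
    "written: non-academic prose",
    "written: newspaper sport",
    "spoken: conversation",
    "spoken: lecture",
]


def select_containing(categories, selected, pattern, select=True):
    """Return a new selection after (de)selecting categories that match `pattern`.

    Terms are separated by spaces and compared case-sensitively.
    A term with a leading "-" excludes the categories that contain it.
    """
    terms = pattern.split()
    include = [t for t in terms if not t.startswith("-")]
    exclude = [t[1:] for t in terms if t.startswith("-") and len(t) > 1]
    hits = {
        c for c in categories
        if any(t in c for t in include) and not any(t in c for t in exclude)
    }
    return (selected | hits) if select else (selected - hits)


def toggle(categories, selected):
    """Swap selected and unselected categories."""
    return set(categories) - selected
```

For instance, `select_containing(CATEGORIES, set(), "aca -non-")` picks only the academic category, while selecting with `spok` and then calling `toggle` leaves just the three written categories selected, which is the "select all but a few" tip in code form.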
Extended query matching supports more sophisticated queries than Boolean mode. In addition to the Boolean operators,
extended mode supports:
- proximity match – forms must occur within n words of each other;
- quorum match – at least n of the words you specify must occur in the sentence;
- strict order match – one form must occur before another one;
- a specified word-form must occur in sentence-initial / sentence-final position.
Operators and examples are displayed when you select "extended mode" on the Advanced Query tab.
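To make the extended-mode conditions concrete, here is a minimal sketch of each one as a test on a single tokenized sentence. It does not use the page's operator syntax, which is shown only on the Advanced Query tab; the function names, the tokenizer and the exact reading of "within n words" (a position difference of at most n) are assumptions.

```python
def tokens(sentence):
    # Naive tokenizer (assumption): whitespace split, punctuation stripped, lowercased.
    stripped = (w.strip('.,;:!?"\'()').lower() for w in sentence.split())
    return [w for w in stripped if w]


def proximity(sentence, a, b, n):
    """Forms a and b occur at positions no more than n apart."""
    words = tokens(sentence)
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)


def quorum(sentence, terms, n):
    """At least n of the given terms occur in the sentence."""
    words = set(tokens(sentence))
    return sum(t in words for t in terms) >= n


def strict_order(sentence, first, second):
    """Some occurrence of `first` precedes some occurrence of `second`."""
    words = tokens(sentence)
    if first not in words or second not in words:
        return False
    last_second = len(words) - 1 - words[::-1].index(second)
    return words.index(first) < last_second


def at_edge(sentence, form, edge):
    """`form` is the first (edge="initial") or last (edge="final") word."""
    words = tokens(sentence)
    if not words:
        return False
    return words[0] == form if edge == "initial" else words[-1] == form
```

On "The cat sat on the mat.", cat and mat are four positions apart, so `proximity(..., "cat", "mat", 4)` holds but a limit of 3 does not; `quorum` with the terms cat, dog and mat is satisfied for n = 2 but not for n = 3.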
To make your queries more productive, this page can return datasets with up to 1000 matches.
Please be gentle with our server and request no more than you need. On the other hand, if you do need larger
or complete datasets, feel free to request them from me.