Natural class sandbox

Last updated (release 1.0) July 3, 2020.

Please feel free to email me about errors, bugs, or possible improvements.

Drag symbols between boxes to move them in/out of your language's inventory and target set. The inventory is initialized to resemble English.

After you click submit, a report will appear below the boxes indicating whether the targets form a natural class given the inventory. A natural class is a group of one or more phones that can be isolated from the rest of the language's inventory by a conjunction (matrix) of one or more features. If the targets form a natural class, you'll see one or more such matrices, and the report will also list all the features that the target phones share, that is, the maximal characterization of the class. If they don't, the report will say so.
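The natural-class test can be sketched in a few lines of Python. Any matrix that all the targets satisfy can only use features they share, so it always matches at least as many phones as the maximal characterization does; the targets therefore form a natural class exactly when the maximal characterization picks out the targets and nothing else. This is a minimal sketch, not the tool's code: it assumes a hypothetical six-consonant inventory with made-up binary values for four features (lab, cor, dor, voice), and the function names are illustrative.

```python
# Hypothetical toy feature table (True = +, False = -); illustrative values only.
FEATURES = {
    "p": {"lab": True,  "cor": False, "dor": False, "voice": False},
    "b": {"lab": True,  "cor": False, "dor": False, "voice": True},
    "t": {"lab": False, "cor": True,  "dor": False, "voice": False},
    "d": {"lab": False, "cor": True,  "dor": False, "voice": True},
    "k": {"lab": False, "cor": False, "dor": True,  "voice": False},
    "g": {"lab": False, "cor": False, "dor": True,  "voice": True},
}


def shared_features(targets):
    """Maximal characterization: every (feature, value) pair that all targets share."""
    first, *rest = sorted(targets)
    return {
        (f, v) for f, v in FEATURES[first].items()
        if all(FEATURES[t][f] == v for t in rest)
    }


def extension(matrix, inventory):
    """Every phone in the inventory that matches all (feature, value) pairs in the matrix."""
    return {ph for ph in inventory if all(FEATURES[ph][f] == v for f, v in matrix)}


def is_natural_class(targets, inventory):
    """The targets form a natural class iff their shared features pick out exactly them."""
    return extension(shared_features(targets), inventory) == set(targets)


inventory = set(FEATURES)
print(is_natural_class({"p", "b"}, inventory))  # True: [+lab, -cor, -dor] matches only p, b
print(is_natural_class({"p", "d"}, inventory))  # False: the only shared value, [-dor], also matches b, t
```

A real version would load the full 28-feature chart in place of the toy table; the logic of the check stays the same.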

If the targets form a natural class, the program searches for its minimal characterization(s): the matrices with the fewest features that successfully isolate the class from the rest of the inventory. In the frequent event of a tie, all such options are given. The program does not use brute force, since that would require checking up to 268 million (2^28) combinations of 28 features. Rather, it uses a branch-and-bound algorithm (Ailsa Land and Alison Doig, 1960), as implemented for phonological features by Hubie Chen and Mans Hulden, “The computational complexity of distinctive feature minimization in phonology” (2018). As the authors note, this algorithm, which in the present context usually checks a few hundred to a few thousand partitions, is not guaranteed to find the true minimal solution. In practice, though, it seems to work well here. Please tell me if you find a case where it doesn't converge on the minimal solution.
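The search for minimal matrices can be framed as a set-cover problem: each shared feature value rules out some non-target phones, and a matrix isolates the targets once every non-target has been ruled out. Below is a generic depth-first branch-and-bound sketch of that idea. It is not Chen and Hulden's implementation; it assumes the same hypothetical toy inventory as a stand-in for the real chart, and `minimal_matrices` is an illustrative name. It branches on the feature values that could rule out one remaining non-target and abandons any branch already as large as the best matrix found so far, keeping all tied minima.

```python
# Hypothetical toy feature table (True = +, False = -); illustrative values only.
FEATURES = {
    "p": {"lab": True,  "cor": False, "dor": False, "voice": False},
    "b": {"lab": True,  "cor": False, "dor": False, "voice": True},
    "t": {"lab": False, "cor": True,  "dor": False, "voice": False},
    "d": {"lab": False, "cor": True,  "dor": False, "voice": True},
    "k": {"lab": False, "cor": False, "dor": True,  "voice": False},
    "g": {"lab": False, "cor": False, "dor": True,  "voice": True},
}


def minimal_matrices(targets, inventory):
    """Return every smallest matrix (set of shared feature values) isolating the targets."""
    targets = set(targets)
    first, *rest = sorted(targets)
    # Only feature values shared by all targets can appear in a valid matrix.
    shared = [
        (f, v) for f, v in FEATURES[first].items()
        if all(FEATURES[t][f] == v for t in rest)
    ]
    non_targets = set(inventory) - targets
    # For each shared feature value, the non-target phones it rules out.
    excludes = {
        (f, v): {ph for ph in non_targets if FEATURES[ph][f] != v}
        for f, v in shared
    }
    best_size = len(shared) + 1  # larger than any possible solution
    best = set()

    def search(chosen, remaining):
        nonlocal best_size, best
        if not remaining:  # every non-target is ruled out: a complete matrix
            if len(chosen) < best_size:
                best_size, best = len(chosen), set()
            if len(chosen) == best_size:
                best.add(chosen)
            return
        # Bound: at least one more feature value is needed, so a branch that is
        # already at best_size can only produce larger matrices.
        if len(chosen) >= best_size:
            return
        # Branch: any solution must rule out this phone with some feature value.
        phone = min(remaining)
        for fv in shared:
            if phone in excludes[fv]:
                search(chosen | {fv}, remaining - excludes[fv])

    search(frozenset(), non_targets)
    return best  # empty if the targets are not a natural class


inventory = set(FEATURES)
print(minimal_matrices({"p", "b"}, inventory))  # one matrix: [+lab]
print(minimal_matrices({"k"}, inventory))       # one matrix: [+dor, -voice]
print(minimal_matrices({"p", "d"}, inventory))  # set(): not a natural class
```

Returning frozensets inside a set removes duplicates that arise when the same matrix is reached through different branch orders.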

The feature system is adapted from Bruce Hayes, Introductory Phonology (2009), except that Hayes values features as +, -, or 0, while the present implementation translates 0 to -. This was done for a few reasons.

First, consider the inventory {p, k}. How would one isolate {k}? A phonologist would suggest [(+)dor] (or perhaps [-lab]). But with ternary specification, other, less intuitive solutions emerge, such as [-low] (since k is [-low] while p is [0low]). Dichotomizing seems to rein in these unintuitive solutions. One might reply that [low] is dependent on [dor], so any matrix that includes [low] should also include [dor], rendering the [-low] solution non-minimal. In practice, though, that's not how phonologists usually write up matrices: we're usually happy to characterize, say, low vowels as [+low] (without mentioning [dor]) or sibilants as [+strid] (without mentioning [cor]).

A second problem with 0 is that Hayes (2009) is inconsistent in whether he uses it to mean "underspecified" or "dependent." For example, [low] is specified (non-zero) only for [+dor] segments, suggesting that [low] is dependent on [dor]. This is intuitive. However, [back] is given as - for c, 0 for k, and + for q. Here 0 marks not dependency (on [dor]) but a third value in a ternary classification. For consistency, Hayes should treat k as [-back] (as well as [-front]), so that we could say [back] is dependent on [dor] just as [low] is.

Finally, the published algorithms (notably Chen and Hulden 2018, cited above) assume that features are independent, so it was easiest for me to maintain that assumption.

I'd be happy to hear any suggestions or discussion of these matters. In future versions, I may improve the minimization algorithm and/or add other proposed feature systems as options.
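The {p, k} example can be checked mechanically. This sketch lists the single features that isolate k, first with ternary values and then after translating 0 to -. Only the [low] values (k [-low], p [0low]) come from the discussion of {p, k}; the [lab] and [dor] values are simplifying assumptions, and the helper names are hypothetical.

```python
# Illustrative ternary values for the inventory {p, k}. The [low] values follow
# the text (k: -, p: 0); the [lab] and [dor] values are assumptions.
TERNARY = {
    "p": {"lab": "+", "dor": "-", "low": "0"},
    "k": {"lab": "-", "dor": "+", "low": "-"},
}


def dichotomize(table):
    """Translate every 0 value to -, as the present implementation does."""
    return {
        ph: {f: ("-" if v == "0" else v) for f, v in feats.items()}
        for ph, feats in table.items()
    }


def single_feature_solutions(table, target):
    """Features whose value on the target no other phone in the table shares."""
    feats = table[target]
    return sorted(
        f"[{v}{f}]" for f, v in feats.items()
        if all(table[ph][f] != v for ph in table if ph != target)
    )


print(single_feature_solutions(TERNARY, "k"))               # ['[+dor]', '[-lab]', '[-low]']
print(single_feature_solutions(dichotomize(TERNARY), "k"))  # ['[+dor]', '[-lab]']
```

With ternary values, [-low] joins [+dor] and [-lab] as a one-feature solution; once 0 becomes -, p and k share [-low], and only the two intuitive solutions remain.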