Carving at the Joints: On Segmentation, Its Limits, Its Use
A tour through the literature on classification, from Plato and Mill to Hacking, Foucault, Bowker and Star, on why we cannot think without categories and why the line drawn is never innocent.
In 1936, the British statistician Ronald Fisher published a short paper on the iris flower. From four measurements per specimen (sepal and petal length and width), he showed that a single linear combination of the four could separate Iris setosa from Iris versicolor with no error. The technique, linear discriminant analysis, draws the best dividing surface between two classes of points in the space of their features.[1] Once the surface is fixed, the labels look like found facts about the world rather than choices made by an analyst, and the act of carving disappears under the categories it produced.
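To make the carving visible, here is a minimal sketch of a two-class linear discriminant, assuming two features instead of Fisher's four and a handful of invented points (not the iris measurements). The names `fisher_direction` and `classify` are hypothetical, and placing the cut at the midpoint of the two class means is one common convention rather than the only defensible one.

```python
# Two-class linear discriminant in two dimensions, standard library only.
# The points are invented for illustration; they are NOT Fisher's iris data.

def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter(points, m):
    # Entries of the 2x2 within-class scatter matrix around the class mean m.
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    return sxx, sxy, syy

def fisher_direction(class_a, class_b):
    ma, mb = mean(class_a), mean(class_b)
    axx, axy, ayy = scatter(class_a, ma)
    bxx, bxy, byy = scatter(class_b, mb)
    sxx, sxy, syy = axx + bxx, axy + bxy, ayy + byy
    det = sxx * syy - sxy * sxy
    dx, dy = mb[0] - ma[0], mb[1] - ma[1]
    # w = S_w^{-1} (m_b - m_a), using the closed-form inverse of a 2x2 matrix.
    w = ((syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det)
    # Assumption: the cut sits at the projection of the midpoint between means.
    mid = ((ma[0] + mb[0]) / 2, (ma[1] + mb[1]) / 2)
    threshold = w[0] * mid[0] + w[1] * mid[1]
    return w, threshold

def classify(point, w, threshold):
    # Everything on one side of the line gets one label, everything else the other.
    score = w[0] * point[0] + w[1] * point[1]
    return "b" if score > threshold else "a"

class_a = [(1.4, 0.2), (1.3, 0.3), (1.5, 0.2), (1.6, 0.4)]
class_b = [(4.5, 1.5), (4.0, 1.3), (4.7, 1.4), (3.9, 1.1)]

w, threshold = fisher_direction(class_a, class_b)
labels_a = [classify(p, w, threshold) for p in class_a]
labels_b = [classify(p, w, threshold) for p in class_b]
```

The analyst's choices (which features, which two classes, where the threshold sits) are all inside `fisher_direction`; the output of `classify` carries none of them.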
These notes pick up from a recent digest and a long-standing discomfort around classification thresholds in psychometrics. The longer reading is overdue.
Why we cannot think without categories
The case for classification is easy to forget, because we could not think without it. Plato, in Phaedrus (265e), describes the right method of thought as one that divides things “according to their natural joints, and not breaking any limb in half as a bad carver might.” The metaphor of carving nature at its joints has survived twenty-four centuries for a reason. John Stuart Mill, in A System of Logic (1843), turned it into the doctrine of natural kinds. Some groupings hold together in a way that lets us learn from one instance and predict the next. Gold conducts electricity, melts at a known temperature, has a fixed atomic weight, resists corrosion. Discover one of those properties on one piece of gold, and you can expect the next piece to share them. Other groupings have no such grip. “Things in my room” or “items that are blue or square” point at no underlying structure, and an observation about one member tells us nothing about the next. On Mill’s account, science is the search for the kinds that bear the weight of induction, the move from past observations to future expectations. The kinds sit at the beginning of inquiry, before any data is gathered. Every experiment is run against a category presupposed before it begins, and behind that category sit intuition, imagination, prior experience, and belief about how the result is likely to come out. W. V. O. Quine put the worry sharpest in his 1969 essay “Natural Kinds”. Induction works only when our categories pick out projectible regularities, the features that genuinely repeat from instance to instance, and there is no view from nowhere that tells us in advance which categories will project. We start with the rough partitions handed down by evolution, language, and tradition, and keep the ones that keep paying.
Mill’s case for science has cognitive and computational siblings. George Miller’s 1956 paper on the magical number seven put working memory at about seven items, which means a population larger than that cannot be reasoned about without first chunking it. The same constraint runs through any computer program. Every operation on a dataset, a groupby, a join, a filter, a model fit,[2] is preceded by a partition, a rule for which rows belong together and which do not. Without partitions, nothing computes, no policy applies, no intervention targets. To think a thing at all, or to act on it, is already to have placed it in a class.
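A small sketch of that claim, in plain Python rather than any particular data library. The rows and field names (`order_id`, `customer_id`, `amount`, `country`) are hypothetical; the point is only that each operation begins with a rule for which rows belong together.

```python
from collections import defaultdict

# Hypothetical rows; the field names are illustrative only.
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": 250},
    {"order_id": 2, "customer_id": "c2", "amount": 40},
    {"order_id": 3, "customer_id": "c1", "amount": 120},
]
customers = {"c1": {"country": "FR"}, "c2": {"country": "DE"}}

# Join: the rule "same customer_id" decides which rows are matched.
joined = [{**order, **customers[order["customer_id"]]} for order in orders]

# Filter: the rule "amount above 100" splits rows into kept and dropped.
large = [row for row in joined if row["amount"] > 100]

# Groupby: the rule "same country" partitions rows into classes.
by_country = defaultdict(list)
for row in joined:
    by_country[row["country"]].append(row["order_id"])
```

Change any of the three rules (the join key, the cut-off of 100, the grouping column) and the same rows yield different classes, with no change to the underlying data.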
The joints that are not always there
The trouble starts when the joints are not always there to be carved. Nelson Goodman, in Fact, Fiction, and Forecast (1955), introduced a small predicate-monster called grue. An object counts as grue if it has been examined before some future time t and is green, or has not yet been examined by t and is blue. Every emerald examined to date is therefore both green and grue, since each was examined before t and is green. Both “all emeralds are green” and “all emeralds are grue” fit the existing data perfectly. The two predicates only disagree about emeralds first seen after t. The first hypothesis predicts they will be green, the second that they will be blue, and the data we have cannot tell us which to bet on. We bet on green anyway, because “green” is entrenched in our practice and “grue” is not. Goodman’s lesson is that the apparent neutrality of a category, the feeling that it tracks the world rather than constituting it, is a function of how often the category has been used, not of the world itself.
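Goodman's predicate can be written down directly, which makes the symmetry with "green" easy to see. This is a toy encoding under stated assumptions: the time `T`, the observation times, and the function names are all hypothetical, and "not yet examined by t" is simplified to "first examined at or after `T`".

```python
# Toy encoding of grue. T and the observation times are arbitrary illustrations.
T = 100  # the future time t, in arbitrary units

def is_grue(colour, first_examined):
    # Grue: examined before T and green, or not examined before T and blue.
    if first_examined < T:
        return colour == "green"
    return colour == "blue"

# Every emerald examined so far: all green, all examined before T.
observed = [("green", t) for t in (3, 17, 42, 58, 99)]

fits_green = all(colour == "green" for colour, _ in observed)
fits_grue = all(is_grue(colour, t) for colour, t in observed)

def predicted_colour(hypothesis, first_examined):
    # What each hypothesis says about an emerald first examined at a given time.
    if hypothesis == "green":
        return "green"
    # "All emeralds are grue": green if examined before T, blue otherwise.
    return "green" if first_examined < T else "blue"

next_time = 150  # an emerald first seen after T
prediction_green = predicted_colour("green", next_time)
prediction_grue = predicted_colour("grue", next_time)
```

Both `fits_green` and `fits_grue` hold on the observed data, while the two predictions for `next_time` disagree. Nothing in the code marks one hypothesis as the natural one.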
Categories as social stakes
The sociological version landed earlier. Émile Durkheim and Marcel Mauss, in De quelques formes primitives de classification (1903), argued that a society’s cosmology, the way it carves up the natural world into kinds, is a projection of its own social structure. The four elements, the seasons, totems, kin groups, the great chain of being. Each scheme for sorting the world reproduces the schemes by which the society sorts itself. There is no neutral grid of natural categories on which social ones are then laid; the natural ones were social all along. Mary Douglas, in Purity and Danger (1966), gave the line still quoted, that dirt is matter out of place. The point is that “dirt” is not a property of objects but of arrangement. Soil on a boot is dirt indoors and not dirt in the garden, food on a plate is normal and food on a face is mess, and what counts as misplaced presupposes a scheme that says where things belong. Pierre Bourdieu carried the argument into the politics of taste. In La Distinction (1979) and the work around it, he showed how cultural classifications, what counts as art versus craft, serious music versus entertainment, a real intellectual versus a mere commentator, function as instruments of class reproduction. The dominant group sets the distinctions, the dominated internalise them as habitus, the durable dispositions that feel personal but are socially produced, and the partition stabilises the hierarchy under the appearance of taste. Bourdieu’s compressed formulation, that les goûts sont avant tout des dégoûts, tastes are above all distastes, the disgust at the tastes of others, captures the mechanism precisely. Every aesthetic preference is also a refusal, every refusal sorts a population, and the sort is doing political work whether or not the taster notices.
When the classified can read
Things change again when the classified can read about the classification. This is Ian Hacking’s contribution (1936-2023, Canadian philosopher of science). In Rewriting the Soul (1995) and the essay “Making Up People” (2006), Hacking argued that human kinds (anorexic, autistic, fugueur, multiple personality, “the kind of person who acts out of frustration”) produce looping effects that natural kinds do not. The category, once published, reaches the people it labels. They recognise themselves in it, organise around it, contest or embrace it, and change because of it. The category in turn gets revised to fit the people now occupying it, in a loop the original namers do not control. Hacking’s signature case is multiple personality disorder, which moved from a handful of recorded cases in the 1970s to tens of thousands by the late 1990s, then receded again as the diagnostic frame shifted across two decades of co-authorship. The DSM-5 threshold for major depressive disorder (five symptoms of nine over two weeks with clinically significant distress) is the same kind of artefact, a defensible line drawn on a smooth gradient, producing the realities of who carries the diagnosis for life. Autism is the live case today. Clinical prevalence has expanded several-fold since the 1990s, partly through diagnostic substitution, partly through real broadening of the criteria, and partly through an active self-advocacy movement (identity-first language, the rejection of cure-based research priorities, the #ActuallyAutistic hashtag) that has reframed autism as identity rather than pathology, with the category shifting accordingly across each new edition of the manual.
Hacking’s diagnosis admits its own inverse. Paul Watzlawick, John Weakland, and Richard Fisch, of the Palo Alto school of brief therapy, name the move in Change (W. W. Norton, 1974). They distinguish first-order change (moving pieces inside the existing frame) from second-order change (treating the frame itself as a degree of freedom). Their book is full of clinical cases where the cure is reframing, swapping the description under which a situation is held so that the problem of the old category dissolves in the new one. If categories make their objects, then a different category, deliberately substituted, can sometimes unmake the problem the old one defined. The line is movable. Noticing that it is a line is the first step in moving it.
Madness, upstream of psychiatry
The DSM threshold and MPD cases are recent. The broader category of madness sits upstream of both. Michel Foucault, in Histoire de la folie à l’âge classique (1961), traced how the modern category of madness took shape in seventeenth-century Europe with the great confinement, when the mad were locked up alongside the poor, the unemployed, and the criminal under a shared institutional rubric of unreason. Whether some forms of psychic distress have an underlying neurological basis is a separate, empirical question, and for several conditions (severe depression, schizophrenia, bipolar disorder) decades of work in genetics and neuroimaging have given partly affirmative answers. Whether “the mad” is a coherent kind, fit to bind those dysfunctions into a single sort of person, is a different question, and Foucault’s answer was that the modern concept does more institutional work than scientific. The anti-psychiatry tradition that followed, Thomas Szasz in The Myth of Mental Illness (1961) and R. D. Laing soon after, pressed the same line. Mainstream psychiatry has gone the other way. The DSM-III (1980) and its successors reorganised diagnosis around discrete operational checklists (lists of symptoms with explicit thresholds) in place of psychoanalytic narrative, in what is sometimes called the neo-Kraepelinian turn, after Emil Kraepelin’s late-nineteenth-century attempt at a categorical taxonomy of mental disorders. Each subsequent edition has tried to refine the kinds and ground them in biology where it can. Some categories do sit on firm biological ground. Down syndrome shows up on a karyotype, Huntington’s on a single-gene test, certain epilepsies on EEG, and the structural brain differences associated with several psychotic disorders show up on MRI, at least as group-level averages. The first of these are kinds in something close to Mill’s sense, with boundaries that live in biology and support indefinitely many true generalisations. Most DSM categories sit at lower resolution.
The contemporary attempt to break the impasse is the Research Domain Criteria (RDoC) framework, launched by the US National Institute of Mental Health under Thomas Insel from 2010. RDoC explicitly sets aside DSM categories as research targets and organises mental dysfunction along measurable dimensions (negative and positive valence, cognitive systems, social processes, arousal and regulation), betting that biology will track the dimensions rather than the categories. The parallel HiTOP consortium[3] makes the empirical case that DSM’s categorical structure is poorly supported by factor analyses of symptom data, and proposes a hierarchical dimensional alternative. Neither has displaced the manual in clinical practice, but the field’s most active researchers no longer take its categories at face value. The Foucault objection bites where the category outruns what the instruments can confirm, and the disagreement between his reading and the manual’s project has not been resolved, only redrawn with each new edition.
A working defence: classifications as scaffolding
This is where my running discomfort with psychometrics sits, and where I owe a small concession. The colour-personality frameworks that travel through corporate workshops (DISC, MBTI, Insights, the Big Five rendered in pastel) are the same kind of artefact at consumer scale, and a defence of them is available. The British statistician George Box gave it the working line, “all models are wrong, but some are useful”. Held lightly, as a temporary scaffold for noticing that a colleague processes information differently from oneself, these tools offer rough orientation. People I respect have found them useful as keys for testing behaviours and forming working hypotheses about colleagues. The trouble lies in the more common move, where the category slides from heuristic to identity. The test result becomes a self-description, the loop closes tighter than any DSM’s, and ten years on the user is still introducing themselves as an INTJ. The class outlives its function, and what was scaffolding gets treated as discovery.
Categories as institutional infrastructure
The stakes harden when a category becomes institutional infrastructure. Geoffrey Bowker and Susan Leigh Star, in Sorting Things Out: Classification and Its Consequences (MIT Press, 1999), walk through the International Classification of Diseases, the racial categories of South African apartheid, and the Nursing Interventions Classification, showing how a formal scheme, once built into a payment system or a database, reaches back into bodies, accounts, and rights. They coin torque[4] for what happens when a person fails to fit cleanly and the infrastructure bends the person into shape. James C. Scott, in Seeing Like a State (1998), tells the same story from the state’s side. Cadastral surveys, surnames, standardised weights, and uniform forestry each made populations legible,[5] which made them taxable, conscriptable, and governable. Legibility, on Scott’s reading, is what brings a population into existence as a manipulable object.
What categories do
A category is performative, doing real work in the world the moment it is in use rather than describing the world from a distance. The work is sometimes welfare, sometimes harm, often both. The trap is older than any modern instrument. Abraham Maslow named it in 1966 in The Psychology of Science, the law of the instrument. “I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.” The hammer changes shape across the entry. For the diagnostician it is the manual. For the auditor the chart of accounts. For the conservation biologist the Linnaean tree. For the consultant the colour-personality framework. Each sees only the categories it brought to the encounter. Agüera y Arcas’s What Is Intelligence? makes the same move in biology, with what counts as alive resolving differently depending on whether mitochondria, viruses, or Daisyworld sit inside or outside the frame. The same operation now runs at machine speed in algorithmic classification systems, where the categories are no longer designed by a committee but emerge from training on billions of examples. That arc deserves its own treatment, taken up in a sequel entry.
What remains
Carving the world is what produces the kinds we then think with, and a kind once produced does work whose first cost is its invisibility. Classification cannot be refused. One set of joints is rarely strictly better than another. What remains is a discipline of attention, to keep the line in sight as a line.
Some categories have earned their place. Each has been stress-tested across decades or centuries, and the institutional infrastructure built on top of them encodes a collective learning no individual could replicate. The periodic table, the Linnaean species concept, double-entry bookkeeping, the core taxonomies of common law each pick out kinds that keep paying their inductive rent.
Others are debris of forgotten choices that have stopped being interrogated, or fictions whose costs are now hidden in the systems built around them.
The hard part is that the two often look identical from inside. Plausibility is not evidence of fit. A category that pattern-matches recognisable fragments feels true even when it carves nothing, the trick behind the Barnum effect (the readiness with which people accept vague horoscope-style descriptions as uniquely accurate to themselves, recognising the description precisely because it was engineered to recognise everyone).
The classifier who keeps the carving visible to themselves does less harm than the one who has stopped seeing it.
And the cost of an unexamined line is rarely paid by the one drawing it. The DSM committee does not carry the diagnosis; the apartheid reclassifier did not lose his neighbourhood; the actuary who set the underwriting threshold does not pay the premium. This is the recurring asymmetry across every case the entry has visited, from Bourdieu’s distastes to Bowker and Star’s torque to Scott’s legibility, and it is what keeps the question of classification a political one rather than a merely epistemic one.
Footnotes
1. [Definition] Given two clouds of labelled points in a space whose dimensions are the measurements (here sepal and petal length and width), linear discriminant analysis draws the flat surface that best separates them. In two dimensions the separator is a line, in three a plane, in N dimensions a flat hyperplane of dimension N-1. The technique sits at the foundation of spam filters, medical diagnostics, and most pre-deep-learning industrial machine learning.
2. [Definition] Standard operations in any data pipeline. Groupby partitions rows by the value of a chosen column (e.g. group customers by country). Join matches rows across two tables on a shared key (e.g. attach each order to its customer). Filter keeps only the rows that satisfy a condition (e.g. orders above 100 euros). Model fit tunes a function’s parameters so its predictions track the data, the elementary act of learning a relationship from examples. Each presupposes that the data is already organised into rows that count as the same kind of thing.
3. [Expansion] Hierarchical Taxonomy of Psychopathology. The consortium took shape around 2015 and published its key statement of the dimensional model in Kotov et al. (2017) in Journal of Abnormal Psychology. HiTOP’s empirical argument rests on factor analyses of symptom-level data showing that DSM’s categorical boundaries do not carve the underlying psychopathological space at its joints. The proposed alternative is a hierarchical dimensional structure with broad spectra (internalising, thought disorder, externalising) at the top and narrower symptom dimensions below.
4. [Definition] In Bowker and Star’s usage, the friction generated when a body, a record, or a life fails to fit cleanly into a classification scheme that the surrounding infrastructure cannot easily accommodate. Rather than revising the scheme, the system bends the person, the record, or the life to fit it. The South African racial reclassifier, who could move someone from one category to another and redirect their schools, neighbourhoods, marriages, and rights accordingly, is the limit case.
5. [Definition] In Scott’s usage, the property that makes a population, a forest, or a landscape readable to a centralising power, especially the modern state. Surnames, cadastral surveys, standardised weights, single-species plantations, and street numbering each impose schemes that allow the centre to count, tax, and govern at distance. The cost is paid in the local complexity that the standardising scheme erases.
