March 10, 2015
Guest blog post by Lusann Yang
When I met Dr. Steve Cole, I didn’t know what to make of his hypothesis. It’s easy to imagine poetic, heartfelt words as a window into the soul, but can the way we speak every day reveal meaningful insights into our health and well-being? Cole is the VP of R&D for HopeLab and a professor of medicine at UCLA. At HopeLab, he develops tools to promote human resilience, an innate capacity that enables people to thrive in the face of adversity. Drawing upon research by Dr. James Pennebaker, Cole hypothesized that data analysis of a person’s natural language style—not what they say, but how they say it—could provide a measure for the key psychological ingredients of resilience. And he was going to let me dig into the data.
What Makes Us Resilient
Psychologists who study resilience have identified three attributes common among resilient people: a sense of purpose in life, meaningful connection with others, and a sense of control or agency in shaping the future. The first ingredient in this resilience formula, a sense of purpose in life, is related to what’s known in academic circles as eudaimonic well-being. The second is connection: the experience of deep bonds with others – the opposite of loneliness. The third is self-efficacy, a can-do spirit reflecting a person’s confidence that they can take action to overcome challenges and reach their goals.
Psychologists often measure these markers of resilience through surveys. But surveys are notoriously unreliable—people interpret questions differently, or answer in the way that they think will please the researcher or present themselves in a positive light.
This is where Cole’s idea came in: he wanted to borrow the big-data analytics tools we use in my field, applied physics, to explore resilience and natural language. He theorized that people’s word choices in essays about mundane things, like descriptions of images or places, could provide insight into resilience markers. This idea grows out of research by Pennebaker that suggests that how people say things might be as revealing as what they say.
Investigating New Measures of Resilience
Cole collaborated with Pennebaker’s group at the University of Texas at Austin to collect data from 800 undergraduates: demographic information, self-reported survey measures of loneliness, self-efficacy, and eudaimonic well-being, and five essays written by each student. The essays included stream-of-consciousness writing, responses to an ambiguous picture (the Thematic Apperception Test), and descriptions of places.
Our first step in analyzing the language data was to run it through the Linguistic Inquiry and Word Count (LIWC) toolkit developed by Pennebaker et al. The toolkit counts how often specific types of words are used, drawing on a dictionary that assigns words to categories such as personal pronouns, self-references, and terms that connote positive or negative emotion.
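To make the word-counting idea concrete, here is a minimal sketch of LIWC-style feature extraction. The category lists below are illustrative stand-ins invented for this example, not the actual LIWC dictionaries, which are far larger and carefully validated.

```python
from collections import Counter
import re

# Toy word-category lists -- illustrative stand-ins for the real,
# much larger LIWC dictionaries.
CATEGORIES = {
    "self_reference": {"i", "me", "my", "mine", "myself"},
    "positive_emotion": {"happy", "hope", "love", "good", "great"},
    "negative_emotion": {"sad", "alone", "afraid", "hurt", "bad"},
}

def liwc_style_counts(text):
    """Return the fraction of words in `text` that fall in each category."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1
    counts = Counter()
    for w in words:
        for cat, vocab in CATEGORIES.items():
            if w in vocab:
                counts[cat] += 1
    return {cat: counts[cat] / total for cat in CATEGORIES}

features = liwc_style_counts("I hope I am not alone; my friends love me.")
```

The output is a vector of category rates per essay, which is the form of input the classifiers described below consume.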
Once we had the LIWC data, we developed language-based prediction algorithms for the three resilience markers (purpose, connection, control) using support vector machines (SVM), a classification tool popular in machine learning. Our goal was to develop a system of language analysis that could match, and therefore potentially predict, the outcomes of standard self-report psychological surveys about loneliness, eudaimonic well-being, and self-efficacy. Figure 1 shows results using a metric called “percentage swapped,” in which values below 50% mean our predictions are more accurate than chance. There was a definite signal, especially for the stream-of-consciousness writing samples. Our algorithm showed as much predictive power as demographic information, including age, sex, ethnicity, health, religion, and employment.
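As a rough illustration of this step, the sketch below trains a linear SVM on synthetic "LIWC-rate" features using scikit-learn. The data, feature count, and labels are all fabricated for the example; only the overall shape (language features in, cross-validated survey-matching accuracy out) mirrors the analysis described above.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data: rows are essays, columns are LIWC-style
# category rates; labels mimic a median split on a survey score.
# These numbers are fabricated for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))          # 8 language features per essay
w = rng.normal(size=8)                 # hidden "true" relationship
y = (X @ w + rng.normal(scale=0.5, size=200) > 0).astype(int)

clf = SVC(kernel="linear")             # linear SVM classifier
scores = cross_val_score(clf, X, y, cv=5)
print(round(scores.mean(), 2))         # cross-validated accuracy
```

Cross-validation matters here: with only hundreds of essays and many word-category features, an SVM evaluated on its own training data would look misleadingly accurate.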
Figure 1: The predictive power of our language-analysis algorithm at matching psychological surveys
Exploring Opposite Extremes
Psychologists are especially interested in people at the ends of the spectrum—those who are extremely resilient and those who struggle. With this in mind, we tested how accurately our SVM language predictors could identify the top and bottom 20% of students on each trait in our resilience formula. A perfect algorithm would achieve 100% accuracy; random chance would achieve 20%. Our SVM algorithms far outperformed chance (see Figure 2), especially when we analyzed creative storytelling (the Thematic Apperception Test) and essays that described places.
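One simple way to score this kind of extremes analysis (not necessarily the exact metric used in the study) is the fraction of the true top 20% recovered when you take the top 20% of model predictions; random guessing recovers about 20%:

```python
import numpy as np

def extremes_hit_rate(true_scores, predicted_scores, frac=0.2):
    """Fraction of the true top-`frac` group recovered by taking the
    top-`frac` of predictions; chance level is roughly `frac` itself."""
    n = len(true_scores)
    k = max(1, int(frac * n))
    true_top = set(np.argsort(true_scores)[-k:])   # indices of true extremes
    pred_top = set(np.argsort(predicted_scores)[-k:])
    return len(true_top & pred_top) / k
```

A perfect predictor returns 1.0; a predictor that ranks students backwards returns 0.0.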
Glen Coppersmith, a scientist at Johns Hopkins’ Human Language Technology Center of Excellence, gave us another inspiration. He suggested that we explore the specific words that are more or less likely to be used by the 20% of our sample with the highest resilience scores, compared with the lowest 20%. These results were especially intriguing. Cole noted that the people who scored high for resilience used words that suggested being in motion – striving and moving forward.
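A simple stand-in for this word-level comparison is smoothed log-odds: for each word, compare its rate in the high-resilience essays with its rate in the low-resilience essays. The helper below is our own illustration of the idea, not the study's actual method.

```python
from collections import Counter
import math
import re

def distinctive_words(high_texts, low_texts, min_count=2):
    """Rank words by smoothed log-odds of appearing in the high group
    versus the low group (illustrative, not the study's exact analysis)."""
    def counts(texts):
        return Counter(w for t in texts for w in re.findall(r"[a-z']+", t.lower()))
    hi, lo = counts(high_texts), counts(low_texts)
    vocab = {w for w in hi | lo if hi[w] + lo[w] >= min_count}
    n_hi, n_lo = sum(hi.values()), sum(lo.values())
    def log_odds(w):
        p = (hi[w] + 1) / (n_hi + 2)   # add-one smoothing avoids log(0)
        q = (lo[w] + 1) / (n_lo + 2)
        return math.log(p / q)
    return sorted(vocab, key=log_odds, reverse=True)

ranked = distinctive_words(["keep moving forward", "striving forward"],
                           ["alone alone", "so alone"])
```

Words at the head of the ranking are characteristic of the high-resilience group; words at the tail are characteristic of the low-resilience group.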
Some Key Words Used by Resilient People
Next, we moved from words to phrases: we divided our sample essays into word windows, analyzed each phrase for its relationship to the markers of resilience using LIWC, and ran the data through our SVM-rank model. For the samples in Figures 4 and 5, we coded phrases associated with loneliness in red and phrases associated with its absence in blue. Stand back and squint, and you get a good picture of which person might be lonelier.
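The word-window step can be sketched as a simple sliding window over each essay; the window size and step below are arbitrary choices for illustration, not the values used in the study.

```python
import re

def word_windows(text, size=10, step=5):
    """Split an essay into overlapping word windows; each window can then
    be scored independently (illustrative parameters, not the study's)."""
    words = re.findall(r"\S+", text)
    return [" ".join(words[i:i + size])
            for i in range(0, max(1, len(words) - size + 1), step)]

windows = word_windows(" ".join(str(i) for i in range(20)))
```

Overlapping windows give a smooth, phrase-by-phrase score along the essay, which is what makes the red/blue coloring in Figures 4 and 5 readable at a glance.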
Finding Resilience in Natural Language
Our exploration of natural language with the tools of big-data analysis yielded intriguing results. We found clear links between the words people use and psychological measures of their well-being, opening the possibility of supplementing notoriously problematic psychological surveys with a new strategy for measuring resilience.
For Cole and his colleagues at HopeLab, this new approach to resilience measurement might be used both to create new resilience-promoting interventions and to evaluate their efficacy. That’s exciting, ground-breaking work – a great example of how insights from scientific inquiry can be translated into practical tools for people everywhere.
Words, as it turns out, may indeed be a window into our well-being.
Pennebaker, J. W., Booth, R. J., & Francis, M. E. (2007). Linguistic Inquiry and Word Count: LIWC [Computer software]. Austin, TX: LIWC.net.
Vapnik, V. N. (1995). The Nature of Statistical Learning Theory. Springer.
Joachims, T. (2006). Training Linear SVMs in Linear Time. Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD).