Truth of varying shades: Analysing language in fake news and political fact

This research paper presents an analytic study of the language of news media in the context of political fact-checking and fake news detection. The authors compare the language of real news with that of satire, hoaxes, and propaganda to identify the characteristics of untrustworthy text. They also probe the feasibility of automatic political fact-checking through a case study based on PolitiFact.com's factuality judgements on a 6-point scale.

Highlights:

The results show that first-person and second-person pronouns are used more often in less reliable or deceptive news types, while more trustworthy news outlets tend to be more impersonal in their reporting. This corroborates previous work, which finds the use of such pronouns to be indicative of imaginative writing. It suggests that detecting imaginative writing may be a closer analogue to fake news detection than lie detection on opinions is. The results also show that untrustworthy news uses more superlatives, subjective words, and modal adverbs.
To test news reliability prediction, the researchers trained a Max-Entropy classifier to assess articles without relying on author-specific cues. The model achieved an F1 score of 65% on out-of-domain sources, which leaves room for improvement.
Using PolitiFact's 6-point truthfulness scale, the study finds that a combination of a long short-term memory (LSTM) model with the Max-Entropy classifier, a Naive Bayes model, and Linguistic Inquiry and Word Count (LIWC) features is the most effective measure of truthfulness.
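
The pronoun finding in the first highlight can be illustrated with a small sketch. The word lists and the `pronoun_rate` helper below are hypothetical and hand-picked for illustration; they are not the lexicons or features used in the study. The sketch only shows how a per-article rate of first- and second-person pronouns might be computed as a single stylistic feature.

```python
import re

# Hypothetical, hand-picked word lists for illustration only;
# they are not taken from the study.
FIRST_PERSON = {"i", "me", "my", "mine", "myself",
                "we", "us", "our", "ours", "ourselves"}
SECOND_PERSON = {"you", "your", "yours", "yourself", "yourselves"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def pronoun_rate(text):
    """Return the fraction of tokens that are first- or second-person pronouns."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in FIRST_PERSON or t in SECOND_PERSON)
    return hits / len(tokens)


# Two made-up sentences: one impersonal, one addressing the reader directly.
impersonal = "The committee approved the budget on Tuesday."
personal = "You won't believe what we found in your budget!"

print(pronoun_rate(impersonal))
print(pronoun_rate(personal))
```

A higher rate on its own says little about a single article; in a setup like the one described above, such a rate would be one feature among many passed to a classifier.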

Sources: