The 2016 US Presidential Election ended with a surprise that Mr. Donald Trump won, despite the overwhelming prediction of a Clinton victory. There have been many studies challenging the theories in traditional political forecasting.

Some took an approach regarding statistics. Many studies concluded that many election forecasting models did not take into account between individual states predictions. However, a classical computation method limited such type of models that connects individual states (or fully-connected models). Hence, a group from QxBranch and Standard Cognition resorted to adiabatic quantum computation. (See: arXiv:1802.00069.)

D-Wave computers are adiabatic quantum computers that perform quantum annealing. A D-Wave 2X has 1152 qubits, and can naturally describes a Boltzmann Machine (BM) model, equivalent to Ising model in statistical physics. The energy function is described by:

$E[\mathbf{s}] = -\sum_{\mathbf{s}_i \in \mathbf{S}} b_i s_i - \sum_{\mathbf{s}_i, \mathbf{s}_j \in \mathbf{S}} W_{ij} s_i s_j$ ,

where $\mathbf{s}$ are the values of all qubits (0, 1, or their superpositions). The field strength $b_i$ and coupling constants $W_{ij}$ can be tuned. Classical models can handle the first term, which is linear; but the correlations, described by the second term, can be computationally costly for classical computers. Hence, the authors used a D-Wave quantum computer to trained the election models from June 30, 2016 to November 11, 2016 for every two weeks, and retrieved the correlations between individual states. Then The correctly simulated that Mr. Trump would win the election.

This Ising model of election was devised after the election, and it is prone to suspicion for fixing the problems using the results. However, this work demonstrated the power of a quantum computer that it solves some political modeling problems that can be too complicated for classical computers.

Today is the presidential election.

Regardless of the dirty things, we can do some simple simulation about the election. With the electoral college data, and the poll results from various sources, simple simulation can be performed.

Look at this sophisticated model in R: http://blog.yhat.com/posts/predicting-the-presidential-election.html

(If I have time after the election, I will do the simulation too…)

The first presidential debate 2016 was held on September 26, 2016 in Hofstra University in New York. An interesting analysis will be the literacy level demonstrated by the two candidates using Flesch readability ease and Flesch-Kincaid grade level, demonstrated in my previous blog entry and my Github: stephenhky/PyReadability.

First, we need to get the transcript of the debate, which can be found in an article in New York Times. Copy and paste the text into a file called first_debate_transcript.txt. Then we want to extract out speech of each person. To do this, store the following Python code in first_debate_segment.py.

# Trump and Clinton 1st debate on Sept 26, 2016

from nltk import word_tokenize
from collections import defaultdict
import re

def untokenize(words):
"""
Untokenizing a text undoes the tokenizing operation, restoring
punctuation and spaces to the places that people expect them to be.
Ideally, untokenize(tokenize(text)) should be identical to text,
except for line breaks.
"""
text = ' '.join(words)
step1 = text.replace(" ", '"').replace(" ''", '"').replace('. . .',  '...')
step2 = step1.replace(" ( ", " (").replace(" ) ", ") ")
step3 = re.sub(r' ([.,:;?!%]+)([ \'"])', r"\1\2", step2)
step4 = re.sub(r' ([.,:;?!%]+)\$', r"\1", step3)
step5 = step4.replace(" '", "'").replace(" n't", "n't").replace(
"can not", "cannot")
step6 = step5.replace("  ", " '")
return step6.strip()

ignored_phrases = ['(APPLAUSE)', '(CROSSTALK)']
persons = ['TRUMP', 'CLINTON', 'HOLT']
fin = open('first_debate_transcript.txt', 'rb')
fin.close()

lines = filter(lambda s: len(s)>0, map(lambda s: s.strip(), lines))
speeches = defaultdict(lambda : '')
person = None

for line in lines:
tokens = word_tokenize(line.strip())
ignore_colon = False
for token in tokens:
if token in ignored_phrases:
pass
elif token in persons:
person = token
ignore_colon = True
elif token == ':':
ignore_colon = False
else:
speeches[person] += ' ' + untokenize(added_tokens)

for person in persons:
fout = open('speeches_'+person+'.txt', 'wb')
fout.write(speeches[person])
fout.close()


There is an untokenize function adapted from a code in StackOverflow. This segmented the transcript into the individual speech of Lester Holt (the host of the debate), Donald Trump (GOP presidential candidate), and Hillary Clinton (DNC presidential candidate) in separate files. Then, on UNIX or Linux command line, run score_readability.py on each person’s script, by, for example, for Holt’s speech,

python score_readability.py speeches_HOLT.txt --utf8

Beware that it is encoded in UTF-8. For Lester Holt, we have

Word count = 1935
Sentence count = 157
Syllable count = 2732
Flesch-Kincaid grade level = 5.87694629602

For Donald Trump,

Word count = 8184
Sentence count = 693
Syllable count = 10665
Flesch-Kincaid grade level = 4.3929136992

And for Hillary Clinton,

Word count = 6179
Sentence count = 389
Syllable count = 8395
Flesch-Kincaid grade level = 6.63676650035