2020-12-20

The Trump tweet machine

As usual, something silly and simple has turned into many hours of work, and some lessons learned.

I made a Trump tweet machine that prints Trump's tweets on an ASR33 teletype. It was a fun use of the teletype as it only does ALL CAPS anyway, meaning Trump tweets were a perfect fit. It was also printing on the only paper I had, which was a sort of tissue paper (so you could, in theory, wipe your arse with the tweets if you wanted to). I have some proper teletype paper rolls now.

The original version was very simple: it used IFTTT, which can trigger an action whenever a given person tweets. It can send the name, time, and tweet text via HTTP. I printed that. It worked.

However, it was not perfect. The main issue is a lag of an hour or so with IFTTT. I decided I could do better.

The code is asrtweet.c on GitHub. It has had an annoying amount of tinkering to get it right. I suspect there will be more.

I decided to use twarc after a bit of googling. It seemed to be one of the few command line tools that would feed the raw JSON which Twitter provide. Handling JSON is easy, but it seems handling tweets has a lot of gotchas, and is not as obvious as it seems.

  • There are two formats: text in the main body, and full_text in extended_tweet (if present). I can only assume this is for legacy reasons, from when tweets were shorter.
  • There is a range of characters to display, as the text will often start with lots of @names which you don't usually show in the text.
  • The character positions / ranges are unicode characters not bytes - arg!
  • It, or something, changes " and ' and several other characters to Unicode versions, so I replaced them with plain ASCII to allow printing.
  • The text content is XML-escaped for &, < and >. Why? This is JSON, so why do that? Crazy! And these are not counted as one character for character counts, but as separate characters: &, a, m, p, and ;.
  • There is a timestamp in Unix time to the millisecond, which is good, but not for the timestamp of the original tweet in a retweet. Otherwise dates/times seem to be in a text format only. It even includes a time zone (e.g. +0000), but shows UTC (yes, +0000) for a Trump tweet, which is clearly wrong.
  • A retweet only has the abbreviated text with RT on it; you have to look in retweeted_status to find the original text, possibly within an extended_tweet block.
  • Whilst referenced Twitter handles are @name in the text, some things are not as you might want to display them, e.g. a URL appears as a shortened Twitter URL. There is then a list of replacements/expansions which you can use to replace things like this with the original. Again, these all use Unicode character counts, not byte counts.
  • It seems twarc can filter/follow a user by user number, but that gets all events mentioning that user, so I have to filter if I want to see only tweets by that user. Mostly that is simple, but when it is Trump, there are tweets mentioning him several times a second.
  • I replaced media with [photo] or [video] as a teletype is not good at such things. I also did not bother printing tweets that only have media in them and no text.
  • There seems to be no flag in the JSON for Twitter's "disputed" status, though as I am printing tweets as they arrive, they would not have had time to add it. But even when printing an old tweet, there is no way to know. So I had to add this to his tweets myself :-)
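
To make a few of those gotchas concrete, here is a minimal sketch in Python. It is not the C in asrtweet.c and not necessarily how that code works. The field names display_text_range, entities, extended_entities, expanded_url, indices and type are assumptions based on my reading of Twitter's v1.1 JSON, and the function names and the character map are made up for illustration. It picks the full text, drops the leading @names, expands links by code point index, and only then undoes the &amp; style escaping. The last helper shows the extra walk a byte-oriented program needs to turn a character count into a byte offset.

```python
import html

# Illustrative subset of typographic characters mapped to teletype-safe ASCII.
ASCII_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",    # single curly quotes
    "\u201c": '"', "\u201d": '"',    # double curly quotes
    "\u2014": "-", "\u2026": "...",  # long dash, ellipsis
})


def pick_body(tweet):
    """Return the object holding the complete text and its entities."""
    # A retweet's own text is a truncated "RT ..."; the original is nested inside.
    source = tweet.get("retweeted_status", tweet)
    # Long tweets keep the complete text in extended_tweet.full_text.
    return source.get("extended_tweet", source)


def printable_text(tweet):
    """Expand links, mark media, drop leading @names, then convert to ASCII."""
    body = pick_body(tweet)
    text = body.get("full_text") or body.get("text", "")
    # Only the start of the range is used, so a trailing media link still gets
    # replaced by a marker instead of being hidden (a choice made for this sketch).
    start = body.get("display_text_range", [0, len(text)])[0]
    entities = body.get("entities", {})
    media = body.get("extended_entities", entities).get("media", [])
    edits = {}
    for url in entities.get("urls", []):
        edits[tuple(url["indices"])] = url.get("expanded_url") or url["url"]
    for item in media:
        marker = "[video]" if item.get("type") == "video" else "[photo]"
        edits.setdefault(tuple(item["indices"]), marker)
    # Indices count code points, which is how Python indexes str, so slicing
    # works directly. They refer to the still-escaped text ("&amp;" counts as 5).
    out, pos = [], start
    for (begin, end), replacement in sorted(edits.items()):
        if begin < pos:
            continue  # inside the hidden prefix, or overlapping a previous edit
        out += [text[pos:begin], replacement]
        pos = end
    out.append(text[pos:])
    # Unescape only after all index-based work is done.
    return html.unescape("".join(out)).translate(ASCII_MAP)


def byte_offset(utf8, n_chars):
    """Byte offset of the n-th code point in UTF-8 encoded bytes."""
    offset = 0
    while n_chars > 0 and offset < len(utf8):
        offset += 1
        # Continuation bytes look like 0b10xxxxxx and belong to the same character.
        while offset < len(utf8) and (utf8[offset] & 0xC0) == 0x80:
            offset += 1
        n_chars -= 1
    return offset
```

The ordering matters: if the text were unescaped first, every index after an &amp; would be off by four.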

There was a lot to learn.

But the end result (so far) is code that runs twarc to filter/follow one or more users, and picks out tweets by them. It handles retweets and replies by printing an extra line of text, and wraps to 72 characters. It is typically printing a Trump tweet within about 10 seconds of him tweeting. No more hour-long lag.
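
The overall shape of that pipeline can be sketched roughly as follows, again in Python rather than the C of asrtweet.c. It assumes the tweets arrive as one JSON object per line, and that the user, id_str, screen_name, timestamp_ms and created_at fields look the way I understand Twitter's v1.1 JSON to look. The function names and the header layout are invented for this example, and for brevity it wraps the plain text field rather than the full text.

```python
import json
import textwrap
from datetime import datetime, timezone

WIDTH = 72  # line width mentioned in the post


def tweet_time(tweet):
    """Prefer timestamp_ms; fall back to the textual created_at field."""
    if "timestamp_ms" in tweet:
        seconds = int(tweet["timestamp_ms"]) / 1000
        return datetime.fromtimestamp(seconds, tz=timezone.utc)
    # Assumed text format, e.g. "Sun Dec 20 12:00:00 +0000 2020".
    return datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")


def format_for_teletype(lines, follow_ids):
    """Yield upper-case, wrapped lines for tweets written by followed users."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        user = tweet.get("user", {})
        # The stream also carries tweets written by other accounts; skip those.
        if user.get("id_str") not in follow_ids:
            continue
        stamp = tweet_time(tweet).strftime("%Y-%m-%d %H:%M:%S")
        yield f"{stamp} @{user.get('screen_name', '')}".upper()
        for out in textwrap.wrap(tweet.get("text", ""), width=WIDTH):
            yield out.upper()
        yield ""  # blank line between tweets
```

A caller could feed it sys.stdin with the output of something like twarc filter --follow followed by a user number piped in (that exact invocation is my assumption), and send each yielded line to the teletype.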

It is rather surreal having a teletype (very noisy) suddenly wake up and print a message from the President of the United States in my study here.

Here's one which did not even need converting to upper case :-)

5 comments:

  1. Haven't you got your oh and zero the wrong way round, or is that deliberate?: https://en.wikipedia.org/wiki/Slashed_zero

    Edit, I see it's even more complicated:
    https://en.wikipedia.org/wiki/Slashed_zero#Slashed_'O'

  2. > The character positions / ranges are unicode characters not bytes

    Why on earth would you expect anything else? Characters have been Unicode (or some encoding of it anyway, ideally UTF-8) for decades now unless you're an American who speaks nothing but English. Bytes are for non-textual content.

    Replies
    1. Because utf8 has bytes, that is all. Way simpler to add N to a pointer than count N characters from a pointer.

  3. can it print an ascii penis made of 1-9 and a-z
