Each of Elon Musk’s tweets has the power to turn market prices upside down. So how do you code an AI that can quickly analyze them?
Elon Musk’s fans have understood that when he tweets about a cryptocurrency, prices go crazy!
On the other side of the field, NLP (natural language processing) models are increasingly powerful. Some of them can even hold a conversation with you.
Crypto-minded data scientists have understood that there is an opportunity in combining AI with tweet analysis.
Good news! They’ve just found the tutorial they need!
Step one: Scrape the tweets
🐦 The Twitter problem
In Python there are web scraping libraries.
These libraries let us retrieve text from web pages.
Ideal for the algorithm we want to create!
The problem is that Twitter doesn’t let scrapers read the content of its pages.
Indeed, to scrape tweets, we need access to the Twitter API, which means requesting personal credentials that are tied to a single user.
But we want an algorithm that anyone can use easily and that doesn’t create long-lasting problems.
So we will bypass this problem…
🥷 Bypass Twitter
To bypass the Twitter API, we’ll have to be smarter than the blue bird.
On the Internet, there are sites that reproduce the content of Twitter exactly.
They are called Viewers, and they exist for many other social networks too.
Now the solution seems simple! Instead of scraping Twitter directly, we will scrape a Viewer site that lets us access its content.
In this first piece of code, we use the requests library to fetch our viewer page (https://twstalker.com/elonmusk):
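import requests
# Viewer page that mirrors Elon Musk's Twitter timeline
URL = "https://twstalker.com/elonmusk"
# A timeout keeps the script from hanging if the site does not respond
page = requests.get(URL, timeout=10)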
Now we have Elon Musk’s tweet page in HTML format.
To browse this HTML page more easily, we use BeautifulSoup.
It is a library that allows us to easily extract content from an HTML page:
from bs4 import BeautifulSoup
soup = BeautifulSoup(page.content, "html.parser")
Above, we have parsed our HTML page into a BeautifulSoup object.
We will then be able to search only the tags (the elements of this page) that interest us:
job_elements = soup.find_all("p")
That’s it, we have scraped the tweets in HTML format.
Tip: feel free to print the output of each code block. It will help you understand what is going on in our algorithm.
Here, we extract only the text of each of these tweets. In other words, we go from HTML elements to Python strings:
tweets = []
for job_element in job_elements:
    element = job_element.text
    tweets.append(element)
# Remove 'View a Private Twitter Instagram Account' (the first paragraph) from the list
tweets = tweets[1:]
We now have a list of the last 20 tweets from Elon Musk 🔥
You can already do your own analysis by displaying them:
print(tweets)
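The tweets[1:] slice assumes the viewer's banner is always the first paragraph. As a slightly more defensive sketch, one could filter on the banner text itself. The `BANNER` constant and the `drop_banner` helper are hypothetical names, and the sample strings are made up for illustration:

```python
# Banner text taken from the comment in the scraping code; the helper name is hypothetical
BANNER = "View a Private Twitter Instagram Account"


def drop_banner(paragraphs):
    """Return stripped paragraph texts, skipping empty ones and the viewer banner."""
    cleaned = []
    for text in paragraphs:
        text = text.strip()
        if text and BANNER not in text:
            cleaned.append(text)
    return cleaned


# Made-up sample standing in for the scraped paragraph texts
sample = [BANNER, "  Doge to the moon  ", "", "Starbase is ready"]
print(drop_banner(sample))
```

This way the list stays clean even if the viewer moves its banner or adds empty paragraphs.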
However, developers are known to be lazy. We don’t want to bother going on Twitter, and we certainly don’t want to dissect each and every Elon tweet by ourselves.
No, what we want is an AI working for us!
Step two: Program an AI
🤖 Analyzing tweets
To analyze text with Python, a particularly efficient library exists: NLTK.
With it, we’ll be able to filter out the irrelevant words from a text and keep only the most informative ones!
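We start by importing the library and downloading the resources we need (if a recent NLTK release reports a missing resource, the error message names the one to download, for example punkt_tab):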
import nltk
nltk.download('averaged_perceptron_tagger')
nltk.download('punkt')
Next, a mandatory step in any AI dealing with text: transforming sentences into tokens.
By the way, if your goal is to master Deep Learning, I've prepared the Action Plan to Master Neural Networks for you.
7 days of free advice from an Artificial Intelligence engineer to learn how to master neural networks from scratch:
- Plan your training
- Structure your projects
- Develop your Artificial Intelligence algorithms
I have based this program on scientific facts, on approaches proven by researchers, but also on my own techniques, which I have devised as I have gained experience in the field of Deep Learning.
To access it, click here :
Now we can get back to what I was talking about earlier.
Currently, we have lists of tweets. For an AI to analyze them, we need lists of words.
So we will transform each tweet into a list of words. This is tokenization:
from nltk.tokenize import word_tokenize
tweets_tokenized = []
for tweet in tweets:
    # Split the tweet into tokens and add them to one flat list
    tokens = word_tokenize(tweet)
    tweets_tokenized.extend(tokens)
That’s it, we have a list that contains only tokens.
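To get a feel for what tokens look like without downloading any NLTK resource, here is a rough stand-in based on a regular expression. It is not NLTK's algorithm, only an illustration; `rough_tokenize` and the sample tweet are made up:

```python
import re


def rough_tokenize(text):
    """Split text into runs of word characters and single non-space symbols."""
    return re.findall(r"\w+|[^\w\s]", text)


# Made-up tweet for illustration
sample_tweet = "Doge to the moon 🚀!"
print(rough_tokenize(sample_tweet))
# ['Doge', 'to', 'the', 'moon', '🚀', '!']
```

Note that the emoji and the punctuation each end up as their own token.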
In a classic NLP project, we would have had to clean it up: remove punctuation, emojis, etc.
But on reflection, we decided that this cleaning should not be applied to tweets.
Indeed, in internet language, it is common to use emojis when referring to a company or a cryptocurrency.
So on Twitter, emojis are not necessarily an indicator of emotion. On the contrary, they can carry important information… 👻
So, no text cleaning for this algo!
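To see what a classic cleanup would cost us, here is a minimal sketch of a typical punctuation-stripping step. The `naive_clean` helper and the sample tweet are hypothetical; the point is only that the emoji disappears along with the punctuation:

```python
import re


def naive_clean(text):
    """Drop every character that is neither a word character nor whitespace."""
    return re.sub(r"[^\w\s]", "", text)


# Made-up tweet for illustration
sample_tweet = "Doge to the moon 🚀!"
cleaned = naive_clean(sample_tweet)
print(repr(cleaned))
```

The rocket, which arguably carries the whole message of the tweet, is gone after cleaning.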
We then use the NLTK tag module.
It will tell us the grammatical role (part of speech) of each word: adjective, preposition, noun, etc.
from nltk.tag import pos_tag
# Tag every token with its part of speech
result = pos_tag(tweets_tokenized)
🏆 Display the Results
If you display the result variable as is, you may have no idea what you are looking at.
The result is a list of (token, tag) tuples.
NLTK uses abbreviations, called tags, to describe the part of speech of each word.
In a previous post we described each of these abbreviations.
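As a quick reminder, here is a minimal sketch of what such a list looks like and how a few tags read. The tagged sample is hand-written for illustration, not real NLTK output, and `TAG_MEANINGS` is a hypothetical helper covering only a handful of Penn Treebank tags:

```python
# A few Penn Treebank tags; the dictionary name is hypothetical
TAG_MEANINGS = {
    "NNP": "proper noun, singular",
    "NNPS": "proper noun, plural",
    "NN": "noun, singular",
    "JJ": "adjective",
    "IN": "preposition or subordinating conjunction",
    "VBZ": "verb, 3rd person singular present",
}

# Hand-written sample in the same (token, tag) shape as `result`
sample_result = [("Tesla", "NNP"), ("is", "VBZ"), ("wild", "JJ")]

for word, tag in sample_result:
    print(f"{word}: {TAG_MEANINGS.get(tag, 'unknown tag')}")
```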
Here, the tags we’re interested in are ‘NNP’ and ‘NNPS’: singular proper nouns and plural proper nouns.
Hence we filter the result to display only the ‘NNP’ and ‘NNPS’ tokens:
[word for (word, pos) in result if pos == 'NNP' or pos == 'NNPS']
Output : [‘Doge’, ‘Doge’, ‘Endurance’, ‘Hangar’, ‘Dragon’, ‘Wow’, ‘Tesla’, ‘Starbase’, ‘Tesla’, ‘Bee’, ‘Who’, ‘Truth’, ‘Hertz’, ‘Tesla’, ‘Autopilot’, ‘QA’, ‘Sorry’, ‘Tesla’, ‘FSD’, ‘QA’, ‘’’, ‘Internal’, ‘QA’, ‘Strange’, ‘Tesla’, ‘Wild’, ‘T1mes’]
The result is immediately more accessible!
Apparently Elon Musk is talking a lot about a certain ‘Doge’ these days… is it going to increase in value in the next few minutes?
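Rather than eyeballing the list, we can count the mentions. This is a minimal sketch using `collections.Counter` from the standard library; the `proper_nouns` list is copied from the output above, minus the stray quote token:

```python
from collections import Counter

# Proper nouns copied from the output above (stray quote token left out)
proper_nouns = [
    "Doge", "Doge", "Endurance", "Hangar", "Dragon", "Wow", "Tesla",
    "Starbase", "Tesla", "Bee", "Who", "Truth", "Hertz", "Tesla",
    "Autopilot", "QA", "Sorry", "Tesla", "FSD", "QA", "Internal", "QA",
    "Strange", "Tesla", "Wild", "T1mes",
]

# Count how often each proper noun appears and show the top three
mention_counts = Counter(proper_nouns)
for word, count in mention_counts.most_common(3):
    print(word, count)
```

On this sample, Tesla (5 mentions) and QA (3) come out ahead of Doge (2), which is a good reason to count rather than guess.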
Going further
For this tweet analysis, we could have used another library: spaCy.
This library provides a more detailed text analysis.
With it, we could extract not only proper nouns but also company names.
spaCy is therefore an interesting option, but for cryptocurrency analysis its features would not be very effective.
For example, the word Bitcoin would be recognized as a company name because it is so well known. But a cryptocurrency like Dogecoin would not get the same treatment and would fall by the wayside.
Our advice is to use the approach that best suits your needs!
And above all, feel free to improve the code we just saw together.
For those more curious about NLP and text processing in general, we wrote a detailed tutorial on sentiment analysis in Machine Learning. It’s right here! 😉