How to Stream Tweets using the Twitter API and Python

Photo by Alexander Shatov / Unsplash

Collecting tweets from Twitter can be done in a number of different ways. In a previous blog post, we showed how to extract social network interactions using TWINT - a Python package that scrapes Twitter without using the API itself, so no account is needed!

In this blog post, we show how to stream tweets directly using the API. Unlike TWINT, this approach collects tweets as and when they are being tweeted. This means that in order to collect data, you need to let this run in the background over a good few days to ensure that enough data is collected. This is particularly useful if you wish to monitor an ongoing campaign or event.

Getting Started

To get things going you'll need to have created a set of API keys in order to access all the features (such as streaming) of the Twitter API.

Don't know how to generate API keys or what they are? Check out this post.
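
The streaming code in this post needs one of those keys, the bearer token. Rather than pasting it straight into a script, one option is to read it from an environment variable. The snippet below is a minimal sketch of that idea; the variable name TWITTER_BEARER_TOKEN and the load_bearer_token helper are our own choices for illustration and are not part of tweepy or the Twitter API.

```python
import os

# Hypothetical variable name: any name works as long as it matches
# what you export in your shell.
TOKEN_ENV_VAR = "TWITTER_BEARER_TOKEN"


def load_bearer_token(env=None):
    """Return the bearer token from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    token = env.get(TOKEN_ENV_VAR, "").strip()
    if not token:
        raise RuntimeError(f"Set the {TOKEN_ENV_VAR} environment variable first")
    return token
```

The value returned by load_bearer_token() could then be used wherever "[BEARER_TOKEN]" appears in the code further down.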

Once you have obtained your API keys, you'll need to install tweepy - a simple Python wrapper for interacting with the Twitter API. This package makes it extremely easy to build your very own streamer with very few lines of code.

$ pip install tweepy

Creating a Stream

Once you've installed it, go ahead and copy the code below and add your API bearer token where appropriate.

# Load in tweepy
import tweepy

# Create streamer class
class TweetStream(tweepy.StreamingClient):
    def on_tweet(self, tweet):
        print(tweet)

# Create a new streamer
stream = TweetStream("[BEARER_TOKEN]")

# Apply rules
stream.add_rules(tweepy.StreamRule("#coffee"))
stream.filter()

And that's it! You've created a very basic streamer for tracking #coffee. But what exactly does all this code mean? The main parts of the code are broken down as follows:

Streamer Class

...
# Create streamer class
class TweetStream(tweepy.StreamingClient):
    def on_tweet(self, tweet):
        print(tweet)
...

To handle incoming tweets, a class named TweetStream is created as a subclass of StreamingClient - a class provided by tweepy for basic tweet streaming.

As new tweets are created, the on_tweet method is called, which is responsible for handling each tweet as it arrives. In this case, we simply print out the tweet object.

This method can be adapted to do more than just printing tweets. It can also be used to export tweets to a database (such as MySQL or even a NoSQL database such as MongoDB) or as a raw JSON object. Ultimately, this is up to you.
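
As a rough illustration of the JSON option, the sketch below appends each tweet to a file as one JSON object per line. The append_jsonl and read_jsonl helpers and the tweets.jsonl file name are made up for this example, and the commented call assumes that the tweet object exposes its raw dictionary as tweet.data, as it does in tweepy 4.x.

```python
import json


def append_jsonl(path, record):
    """Append one dictionary to a file as a single line of JSON."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Load every non-empty line of the file back into a list of dictionaries."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Inside the streamer class, the tweet handler could then call
# (assumption: tweet.data is the raw tweet dictionary):
#     append_jsonl("tweets.jsonl", tweet.data)
```

Writing one object per line means the file can be appended to for days without ever being rewritten, which suits a long-running stream.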

In addition to this, there are a bunch of other methods which can be used in this class for handling other events. A few of these are defined below:

  • on_errors: For when Twitter sends error messages through the stream
  • on_closed: When a stream has been closed by Twitter
  • on_connect: When a successful connection has been made
  • on_disconnect: For when the stream has been disconnected

The complete list can be found here.

Applying Filters

...     
# Create a new streamer
stream = TweetStream("[BEARER_TOKEN]")

# Apply rules
stream.add_rules(tweepy.StreamRule("#coffee"))
stream.filter()

Filters can be applied by calling the add_rules method on an instance of the TweetStream class created above.

These are used to stream tweets which feature a certain hashtag or keyword in the body of the text. This is achieved by applying a StreamRule, here with the example query "#coffee".

It's worth mentioning that it's not just hashtags and keywords that can be streamed but individual users and other items too. This makes it possible to keep track of tweets that are being published by a specific user as opposed to a keyword.

For example, if I wanted to track Elon Musk's tweets, I would replace "#coffee" with "from:elonmusk".

Or if you wanted to track tweets featuring a certain URL, you could use url:"https://dataground.io".

There are many other rules that could be applied for streaming, such as streaming according to location, mentions and even retweets. A complete list of streaming rules provided by Twitter can be found here.
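
Several of these terms can also be combined into a single rule. The helper below is a small sketch of how such a query string might be assembled; build_rule is a hypothetical function of our own (not part of tweepy), and it assumes Twitter's rule syntax of joining alternatives with OR, grouping them in parentheses and excluding retweets with -is:retweet.

```python
def build_rule(hashtags=(), from_users=(), urls=(), exclude_retweets=False):
    """Combine several search terms into one rule string.

    All terms are joined with OR. The optional retweet filter is applied
    to the whole group by placing it after the parentheses.
    """
    terms = [f"#{tag.lstrip('#')}" for tag in hashtags]
    terms += [f"from:{user.lstrip('@')}" for user in from_users]
    terms += [f'url:"{url}"' for url in urls]
    if not terms:
        raise ValueError("at least one search term is required")
    query = " OR ".join(terms)
    if exclude_retweets:
        query = f"({query}) -is:retweet"
    return query


print(build_rule(hashtags=["coffee"]))
# prints: #coffee
print(build_rule(hashtags=["coffee", "tea"], from_users=["elonmusk"], exclude_retweets=True))
# prints: (#coffee OR #tea OR from:elonmusk) -is:retweet
```

The resulting string would be passed to tweepy.StreamRule in place of "#coffee".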

Conclusions

Overall, I hope you can see that streaming tweets from Twitter is fairly straightforward using the tweepy package for Python. In this post, we covered a very basic setup for collecting tweets from the Twitter API using a single search term, but this setup can be extended to include multiple search queries and could be used to export tweets to a database.