GUEST POST: ANALYZING AN NHL GAME THROUGH THE TWITTER APIS

Wednesday, 18 March 2015


Early on, when I was experimenting with Tweepy, I began thinking about
interesting projects that could come out of all the data I was now able to
collect. One idea that stuck was collecting Tweets during sports games and
seeing what could be done with the resulting data. Being a Philadelphia Flyers
(@NHLFlyers) fan, I chose to use Twitter's streaming API to collect Tweets
sent during their last playoff game against the New York Rangers (@NYRangers).
(Spoiler alert: my Flyers lost.)



The following code captures every Tweet containing the keyword 'flyers' and
sends its creation time, text, location (if available), and source to a local
MongoDB database. I've removed my own consumer and access keys; you can obtain
yours by creating an app on Twitter's dev site.

import sys

import pymongo
import tweepy

# Fill these in with your own credentials from Twitter's dev site
consumer_key = ""
consumer_secret = ""

access_token = ""
access_token_secret = ""

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

class CustomStreamListener(tweepy.StreamListener):
    def __init__(self, api):
        self.api = api
        super(CustomStreamListener, self).__init__()

        # Local MongoDB instance; Tweets land in the Flyers database
        self.db = pymongo.MongoClient().Flyers

    def on_status(self, status):
        print(status.text, "\n")

        # Keep only the fields we care about
        data = {}
        data['text'] = status.text
        data['created_at'] = status.created_at
        data['geo'] = status.geo
        data['source'] = status.source

        self.db.Tweets.insert_one(data)

    def on_error(self, status_code):
        print('Encountered error with status code:', status_code, file=sys.stderr)
        return True  # Don't kill the stream

    def on_timeout(self):
        print('Timeout...', file=sys.stderr)
        return True  # Don't kill the stream

sapi = tweepy.streaming.Stream(auth, CustomStreamListener(api))
sapi.filter(track=['flyers'])


Once you run the script, you will see the Tweets appear in the terminal window
and the MongoDB collection begin to fill; I use the Robomongo GUI to keep track
of this. For the sake of consistency, I started the script fifteen minutes
before the game began and stopped it fifteen minutes after it ended. By the end
I had collected 35,443 Tweets. For a little context, I was collecting around
7,000 Tweets per game during the Flyers' regular season, and I gathered 640,000
during the Super Bowl.
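
If you want to confirm that Tweets are actually landing in the database, a
quick check from a second Python session works too. A minimal sketch, assuming
the same local MongoDB and the Flyers/Tweets names used above:

import pymongo

db = pymongo.MongoClient().Flyers
print(db.Tweets.count_documents({}))  # how many Tweets are stored so far
print(db.Tweets.find_one())           # peek at one stored document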



Once I had all of the data collected, I exported everything to a CSV file and
began looking at it in IPython. The code below creates a pandas DataFrame from
the CSV file, makes the created_at column the index, and converts it into a
pandas time series. I also converted the times to EST in 12-hour format for
graph readability.
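
The post doesn't show the export step itself; a minimal sketch of one way to do
it, using pymongo and Python's csv module to write out the same fields the
stream script stored:

import csv
import pymongo

db = pymongo.MongoClient().Flyers
fields = ['created_at', 'text', 'geo', 'source']

# Dump every stored Tweet to a CSV with one column per field
with open('PHINYRG3.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=fields)
    writer.writeheader()
    for doc in db.Tweets.find():
        writer.writerow({k: doc.get(k) for k in fields})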

import pandas as pd
from pandas.tseries.offsets import DateOffset

# Load the exported Tweets and index them by creation time
flyers = pd.read_csv('/Users/danielforsyth/Desktop/PHINYRG3.csv')
flyers['created_at'] = pd.to_datetime(pd.Series(flyers['created_at']))
flyers.set_index('created_at', drop=False, inplace=True)

# Tweet timestamps arrive in GMT; convert to EST and shift to a 12-hour clock
flyers.index = flyers.index.tz_localize('GMT').tz_convert('EST')
flyers.index = flyers.index - DateOffset(hours=12)
flyers.index

Next, I took a quick look at everything using the head and describe methods
built into pandas.

flyers.head()




flyers.describe()




Now it was time to get the data ready to graph. One quick line puts the
created_at time series into a per-minute format.

flyers1m = flyers['created_at'].resample('1min').count()
flyers1m.head()




You can also quickly find the average number of Tweets per minute, which in
this case was 187.

avg = flyers1m.mean()


Now that I had all of the data formatted properly, I imported Vincent and
created a graph.

import vincent

vincent.core.initialize_notebook()  # load Vega/D3 assets into the notebook
area = vincent.Area(flyers1m)       # area chart of Tweets per minute
area.colors(brew='Spectral')        # ColorBrewer 'Spectral' palette
area.display()
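
Vincent renders the chart inline in the notebook; if you also want to keep the
underlying Vega specification, it can be written to disk (the filename here is
my own choice):

area.to_json('flyers_per_minute.json')  # save the Vega spec for reuse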




Because the search term used here was 'flyers', the results are heavily skewed
toward Flyers fans. The two highest peaks in Tweet volume come during the first
Flyers goal (700 Tweets per minute) and the final Rangers goal by ex-Flyer Dan
Carcillo (938 Tweets per minute). There are also two large peaks at the
beginning and end of the game.
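
If you want to check which minutes those spikes actually fall on, the
resampled series can answer that directly; a quick sketch using the flyers1m
series from above:

print(flyers1m.nlargest(5))  # the five busiest minutes
print(flyers1m.idxmax())     # timestamp of the single biggest spike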

This was a very interesting project with some pretty cool results, especially
considering I was only using around one percent of all the Tweets being sent
during the game. If you have any questions, feedback or advice, please get in
touch with me on Twitter.

Note: This post is edited and abridged from the original version; to get more
detail on geo-locating NHL Tweets and identifying fans, read the original post.







