6 Twitter Streams
This overview shows an example script that collects tweets matching a set of keywords on a daily basis via crontab. The script saves the stream for the specified keywords and writes out a data frame at regular intervals so that no single file becomes too big to handle.
6.1 Presettings
6.2 Libraries
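The package list was not included in this excerpt; as a sketch, the code in this chapter only appears to need rtweet, which provides the stream_tweets() and parse_stream() functions used below.

```r
# Packages used in this chapter (the list is an assumption, reconstructed
# from the functions called below); rtweet provides stream_tweets()
# and parse_stream().
library(rtweet)
```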
6.3 Connect with Twitter App
The next lines of code build a connection with your Twitter app. If you have not set up this file already, please do so as explained in the rtweet package documentation; otherwise this script will not work.
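A minimal sketch of loading a previously saved token; the file path here is an assumption and should point to wherever you stored the token created following the rtweet documentation.

```r
# Load the previously saved OAuth token (the path is an assumption --
# adjust it to the location of your own token file).
token_path <- "~/.rtweet_token.rds"
token <- if (file.exists(token_path)) readRDS(token_path) else NULL
if (is.null(token)) {
  message("No saved token found -- create one first, see the rtweet documentation.")
}
```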
6.4 Data Collection
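6.4.1 Step 1)
Define the search terms for the stream. The original list was not included in this excerpt; the term below is an assumption, inferred from the file names used in the later steps.

```r
# Search terms for the stream -- the exact terms are an assumption;
# stream_tweets() accepts a comma-separated string of phrases.
keywords <- "Bundesrat"
```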
6.4.2 Step 2)
Set up the time frame of the stream, i.e., the interval after which a data frame is saved to keep the data files small. This version defines a stream that collects data for four hours straight before starting a new data frame. Repeating this six times thus gives a full day of Tweets.
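The duration is passed to stream_tweets() in seconds; a minimal sketch (the variable name streamtime matches its use in Step 4):

```r
# Four hours per part, in seconds, as expected by stream_tweets()
streamtime <- 4 * 60 * 60
```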
6.4.3 Step 3)
Prepare the local directory to save the data frames.
# Where is your directory?
mainDir <- "~/Data/Twitter/Streams/Bundesrat"
# Get today's date:
date <- Sys.Date()
# Make a directory for the data if it does not already exist.
subDir <- paste0("bundesrat_", date)
if (!dir.exists(file.path(mainDir, subDir))) {
  dir.create(file.path(mainDir, subDir))
}
Sys.chmod(file.path(mainDir, subDir), mode = "777", use_umask = FALSE)
6.4.4 Step 4)
With the token ready, the timing set, and a place to save the tweets, data collection can be started.
# Stream Tweets to a large json file (one file for every four hours)
# Six parts of four hours give one full day of Tweets
setwd(paste0(mainDir, "/", subDir)) # set dir
for (i in 1:6) {
  part <- i
  cat("Currently ", i, " steps have been streamed! ", "Day Var is: ", date,
      " and partday is: ", part, "\n")
  stream_tweets(q = keywords, timeout = streamtime,
                parse = FALSE, verbose = FALSE,
                file_name = paste0("Bundesrat_Stream_", date, "_part_", part,
                                   ".json"),
                token = token)
  cat("One part of the day has been streamed...\n")
  Sys.chmod(paste0("Bundesrat_Stream_", date, "_part_", part, ".json"),
            mode = "777", use_umask = FALSE)
  # Brief pause before reopening the stream
  Sys.sleep(1)
}
6.4.5 Step 5)
In a last step, combine all JSON files into a single RDS file.
# Get the list of all files with the tweets collected today
path <- getwd()
filenames <- list.files(path, pattern = "\\.json$", full.names = TRUE)
tweetsdf <- data.frame()
for (j in filenames) {
  # Parse each raw stream file and append it to the combined data frame
  tmp <- parse_stream(j)
  tweetsdf <- rbind(tweetsdf, as.data.frame(tmp))
}
saveRDS(tweetsdf, paste0("bundesrat_tweets_", date, ".rds"))
Similar to the user data collection, you can use crontab to schedule a recurring stream collection.
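A crontab entry for this could look as follows; both the schedule and the script path are assumptions and need to be adapted to your setup.

```
# m h dom mon dow  command
# Start the stream script once a day at midnight (path is a placeholder)
0 0 * * * Rscript ~/Data/Twitter/Streams/Bundesrat/stream_bundesrat.R
```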