There are several limitations imposed by the Twitter API, but there are some workarounds. If you’re tracking a specific account, you can retrieve up to 3,200 of its most recent tweets using this method ( https:///rest/reference/get/statuses/user_timeline ). An example implementation in Python is in my book ( https:///bonzanini/Book-SocialMediaMiningPython/blob/master/Chap02-03/twitter_get_user_ ). Besides the cap on the total number of tweets you can retrieve with this approach, there is also a rate limit (described in the Twitter API link above), so retrieving a lot of data will take some time simply because you need to pause between requests (they don’t let you hammer the API). If a user tweets a lot, you’re unlikely to be able to capture a specific time window in the past, because only the most recent 3,200 tweets are available.
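To make the pause-and-paginate idea concrete, here is a rough sketch of the loop. It is not the code from the book. `fetch_page` is a hypothetical callable standing in for whatever client call you use to hit the user timeline endpoint, and the page size of 200 and the one-second pause are assumptions you should check against the current API documentation.

```python
import time

API_CAP = 3200    # most recent tweets reachable per account (the limit mentioned above)
PAGE_SIZE = 200   # assumed maximum number of tweets per request


def collect_timeline(fetch_page, pause_seconds=1.0, sleep=time.sleep):
    """Page backwards through a timeline until the cap or the end is reached.

    fetch_page(count, max_id) is a hypothetical callable returning a list of
    dicts with an integer "id", newest first. max_id=None means "start from
    the newest tweet".
    """
    tweets = []
    max_id = None
    while len(tweets) < API_CAP:
        page = fetch_page(count=PAGE_SIZE, max_id=max_id)
        if not page:
            break  # no older tweets are available
        tweets.extend(page)
        # Ask only for tweets strictly older than the last one seen.
        max_id = page[-1]["id"] - 1
        # Pause between requests so we stay under the rate limit.
        sleep(pause_seconds)
    return tweets[:API_CAP]
```

In practice `fetch_page` would wrap the real API call with your credentials and the target screen name, and the pause would be tuned to the rate limit that applies to your account.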
Domain knowledge is also critical for the outlier detection needed to clean data and avoid
classic problems such as a juvenile crime committed by an 80-year-old "child". If
a data mining model were built using the data in Figure 1, outliers
(most likely caused by incorrect data entry) could skew the resulting model (especially the
zero-year-old children, which are more plausible than eighty-year-old children). The
role of visualization here is mostly to annotate model structures with
the domain knowledge that they violate.
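
As an illustration of how such a domain rule might be applied before modelling, the following minimal sketch separates records whose age falls outside a plausible range for juvenile offenders. The age bounds, the field name `age`, and the sample records are assumptions made for this example; they are not taken from Figure 1, and the legal definition of a juvenile varies by jurisdiction.

```python
# Assumed plausible age range for a juvenile offender (inclusive).
JUVENILE_AGE_RANGE = (10, 17)


def split_by_age_rule(records, age_range=JUVENILE_AGE_RANGE):
    """Separate records that satisfy the domain rule from suspect ones.

    Records with a missing or non-numeric age are treated as suspect.
    """
    low, high = age_range
    valid, suspect = [], []
    for record in records:
        age = record.get("age")
        if isinstance(age, (int, float)) and low <= age <= high:
            valid.append(record)
        else:
            suspect.append(record)
    return valid, suspect


# Hypothetical sample records, invented for illustration only.
sample_records = [
    {"case": "A", "age": 15},
    {"case": "B", "age": 80},  # likely a data entry error
    {"case": "C", "age": 0},   # less conspicuous, but still violates the rule
    {"case": "D", "age": 12},
]

valid, suspect = split_by_age_rule(sample_records)
```

Flagging suspect records rather than deleting them keeps them available for review, which matches the role described above of annotating violations with domain knowledge.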