This repository was archived by the owner on Aug 12, 2024. It is now read-only.

Added universal search function which passes testing #148

Open
wants to merge 8 commits into
base: master
57 changes: 42 additions & 15 deletions README.md
@@ -23,7 +23,7 @@ Linux and macOS:
```bash
git clone https://github.com/bisguzar/twitter-scraper.git
cd twitter-scraper
sudo python3 setup.py install
```

Also, you can install with PyPI.
@@ -37,23 +37,51 @@ pip3 install twitter_scraper
Just import **twitter_scraper** and call functions!


### → function **get_tweets(query: str [, pages: int])** -> dictionary
You can get tweets of profile or parse tweets from hashtag, **get_tweets** takes username or hashtag on first parameter as string and how much pages you want to scan on second parameter as integer.
### → function **get_tweets([query: str] [, search: str] [, pages: int])** -> dictionary
You can get the tweets of a profile or parse the tweets from a hashtag. **get_tweets** takes a username or hashtag as `query` (a string) and the number of pages to scan as `pages` (an integer).

#### Keep in mind:
* First parameter need to start with #, number sign, if you want to get tweets from hashtag.
* **pages** parameter is optional.
The **get_tweets** function now supports a `search` parameter for the new search functionality.

For backwards compatibility with existing twitter_scraper users, `query` can be passed either by keyword (`query=`) or as the first positional string. It returns the tweets of the given Twitter user or, if the string starts with `#`, the tweets for that hashtag.

Example:

```python
Python 3.7.3 (default, Mar 26 2019, 21:43:19)
[GCC 8.2.1 20181127] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from twitter_scraper import get_tweets
>>>
>>> for tweet in get_tweets('twitter', pages=1):
...     print(tweet['text'])
...
spooky vibe check
...
>>> # Which will function identically to:
>>> from twitter_scraper import get_tweets
>>>
>>> for tweet in get_tweets(query='twitter', pages=1):
...     print(tweet['text'])
...
```

If `search` is specified, **get_tweets** yields a dictionary for each tweet that matches the given term. The term can be any string, including Twitter's search operators (for example `to:username`).


#### Keep in mind:
* You must specify either `query` or `search`. If you pass a single positional string, it is used as `query`.
* You cannot pass more than one string, and you cannot specify both `query` and `search`.
* The **pages** parameter is optional; the default is 25.

```python
Python 3.7.3 (default, Mar 26 2019, 21:43:19)
[GCC 8.2.1 20181127] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from twitter_scraper import get_tweets
>>>
>>> for tweet in get_tweets(search='to:bugraisguzar', pages=1):
...     print(tweet['text'])
...
pic.twitter.com/h24Q6kWyX8
```
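
The argument rules listed under "Keep in mind" can be sketched as a small standalone check. This is only an illustration: `resolve_target` is a hypothetical helper that is not part of twitter_scraper. It mirrors the checks in this PR without touching the network.

```python
def resolve_target(query=None, search=None):
    """Hypothetical helper (not part of twitter_scraper): classify a request."""
    # Neither argument given: nothing to look up.
    if not query and not search:
        raise RuntimeError("Specify either 'query' or 'search'.")
    # Both given: the two modes are mutually exclusive.
    if query and search:
        raise RuntimeError("Specify only one of 'query' or 'search'.")
    if search:
        return ("search", search)
    # A leading '#' marks a hashtag; anything else is treated as a username.
    if query.startswith("#"):
        return ("hashtag", query)
    return ("profile", query)


print(resolve_target("twitter"))
print(resolve_target(query="#python"))
print(resolve_target(search="to:bugraisguzar"))
```

A single positional string lands in `query`, which is why existing calls such as `get_tweets('twitter', pages=1)` keep working.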

@@ -78,7 +106,7 @@ It returns a dictionary for each tweet. Keys of the dictionary;
You can get the Trends of your area simply by calling `get_trends()`. It will return a list of strings.

```python
Python 3.7.3 (default, Mar 26 2019, 21:43:19)
[GCC 8.2.1 20181127] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from twitter_scraper import get_trends
@@ -91,7 +119,7 @@ You can get personal information of a profile, like birthday and biography if ex


```python
Python 3.7.3 (default, Mar 26 2019, 21:43:19)
[GCC 8.2.1 20181127] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from twitter_scraper import Profile
@@ -109,7 +137,7 @@ Type "help", "copyright", "credits" or "license" for more information.
**to_dict** is a method of the *Profile* class. It returns the profile data as a Python dictionary.

```python
Python 3.7.3 (default, Mar 26 2019, 21:43:19)
[GCC 8.2.1 20181127] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from twitter_scraper import Profile
@@ -118,8 +146,6 @@ Type "help", "copyright", "credits" or "license" for more information.
{'name': 'Buğra İşgüzar', 'username': 'bugraisguzar', 'birthday': None, 'biography': 'geliştirici@peptr', 'website': 'bisguzar.com', 'profile_photo': 'https://pbs.twimg.com/profile_images/1199305322474745861/nByxOcDZ_400x400.jpg', 'banner_photo': 'https://pbs.twimg.com/profile_banners/1019138658/1555346657/1500x500', 'likes_count': 2512, 'tweets_count': 756, 'followers_count': 483, 'following_count': 255, 'is_verified': False, 'is_private': False, 'user_id': '1019138658'}
```



## Contributing to twitter-scraper
To contribute to twitter-scraper, follow these steps:

@@ -139,6 +165,7 @@ Thanks to the following people who have contributed to this project:
* @bisguzar (maintainer)
* @lionking6792
* @ozanbayram
* @sean-bailey
* @xeliot


11 changes: 11 additions & 0 deletions test.py
@@ -40,6 +40,17 @@ def test_languages(self):
         self.assertIsInstance(tweets[0]["replies"], int)
         self.assertGreaterEqual(tweets[1]["retweets"], 0)

+class TestSearch(unittest.TestCase):
+    def test_search_pages(self):
+        tweets = list(get_tweets(search="hello, world!", pages=2))
+        self.assertGreater(len(tweets), 1)
+    def test_search_user(self):
+        user = "gvanrossum"
+        tweets = list(get_tweets(user, pages=2))
+        self.assertGreater(len(tweets), 1)
+
+
+

 class TestTrends(unittest.TestCase):
     def test_returned(self):
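
A note on method naming in test classes such as `TestSearch`: unittest's default loader collects only methods whose names start with `test`, so any other method is silently skipped. The standalone sketch below shows this; `DemoSearch` and its methods are hypothetical and exist only for the demonstration.

```python
import unittest


class DemoSearch(unittest.TestCase):
    def search_user(self):
        # No "test" prefix: the default loader never collects this method.
        self.fail("never collected, so never run")

    def test_search_user(self):
        # Has the "test" prefix: collected and run.
        self.assertTrue(True)


# Ask the default loader which methods it would run for this class.
collected = unittest.TestLoader().getTestCaseNames(DemoSearch)
print(collected)
```

Only `test_search_user` appears in the collected names, so a failing assertion inside `search_user` would never be reported.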
17 changes: 11 additions & 6 deletions twitter_scraper/modules/tweets.py
@@ -6,13 +6,22 @@

 session = HTMLSession()

-def get_tweets(query, pages=25):
+def get_tweets(query=None, search=None, pages=25):
     """Gets tweets for a given user, via the Twitter frontend API."""

+    if not query and not search:
+        raise RuntimeError("Please specify a 'query' or a 'search' to check the tweets on.")
+    elif query and search:
+        raise RuntimeError("Please specify only one of either a 'search' or 'query'.")
+
     after_part = (
         f"include_available_features=1&include_entities=1&include_new_items_bar=true"
     )
-    if query.startswith("#"):
+    if not query:  # no query was given, so this is a search
+        search_term = quote(search)
+        url = f"https://twitter.com/i/search/timeline?f=tweets&vertical=default&q={search_term}&src=tyah&reset_error_state=false&"
+
+    elif query.startswith("#"):
         query = quote(query)
         url = f"https://twitter.com/i/search/timeline?f=tweets&vertical=default&q={query}&src=tyah&reset_error_state=false&"
     else:
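
The search branch percent-encodes the term before placing it in the `q=` parameter. A minimal sketch of that step follows, assuming `quote` is `urllib.parse.quote` (the import is not visible in this hunk). `build_search_url` and `SEARCH_URL` are hypothetical names used only for illustration.

```python
from urllib.parse import quote

# URL template copied from the hunk above; {term} stands in for the encoded search term.
SEARCH_URL = (
    "https://twitter.com/i/search/timeline?f=tweets&vertical=default"
    "&q={term}&src=tyah&reset_error_state=false&"
)


def build_search_url(term):
    """Hypothetical helper: percent-encode the term and place it in q=."""
    return SEARCH_URL.format(term=quote(term))


print(build_search_url("hello, world!"))
print(build_search_url("#python"))
print(build_search_url("to:bugraisguzar"))
```

By default `quote` leaves `/` unescaped. Passing `safe=""` would encode it as well, which may matter for search terms that contain slashes.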
@@ -59,13 +68,9 @@ def gen_tweets(pages):


tweet_id = tweet.attrs["data-item-id"]

tweet_url = profile.attrs["data-permalink-path"]

username = profile.attrs["data-screen-name"]

user_id = profile.attrs["data-user-id"]

is_pinned = bool(tweet.find("div.pinned"))

time = datetime.fromtimestamp(