Scrape User Accounts from Instagram & TikTok
In today’s data-driven landscape, social media platforms like Instagram and TikTok are rich sources of information. Whether you're analyzing trends, gathering insights, or building a dataset for research, scraping user accounts can be highly beneficial. This article will guide you through scraping user accounts from Instagram and TikTok using AWS infrastructure, along with Python tools and libraries.
AWS (Amazon Web Services) provides scalable and reliable cloud computing resources. By leveraging AWS, you can:
- Scale your scraping operations efficiently.
- Access powerful EC2 instances for processing.
- Store large datasets securely in S3.
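To make the storage point concrete, here is a minimal sketch of pushing a local results file to S3. It assumes boto3 is installed (`pip3 install boto3`, which is not part of the install step later in this guide) and that AWS credentials are already configured on the instance. The bucket name, the `scrapes` key prefix, and both helper names are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path


def build_s3_key(platform: str, filename: str, when: datetime, prefix: str = "scrapes") -> str:
    """Build a date-partitioned object key, e.g. scrapes/instagram/2024/01/31/users.json."""
    return f"{prefix}/{platform}/{when:%Y/%m/%d}/{Path(filename).name}"


def upload_results(local_path: str, bucket: str, platform: str) -> str:
    """Upload one file to S3 and return the key it was stored under."""
    import boto3  # imported lazily so build_s3_key works without boto3 installed

    key = build_s3_key(platform, local_path, datetime.now(timezone.utc))
    boto3.client("s3").upload_file(local_path, bucket, key)
    return key
```

Calling `upload_results("users.json", "my-scrape-bucket", "instagram")` would store the file under a key partitioned by platform and date, which keeps daily runs from overwriting each other.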
Before diving into the scraping process, ensure you have:
- An AWS account.
- Basic knowledge of Python and web scraping concepts.
- Familiarity with command-line operations.
Sign up for an account at AWS. You may need to provide billing information, but AWS offers a free tier for new users.
Navigate to the EC2 dashboard and launch a new instance. Choose an Amazon Machine Image (AMI), preferably Ubuntu for ease of setup. Select an instance type; t2.micro is often sufficient for low-volume scraping. Configure the security group to allow inbound SSH access (TCP port 22), ideally restricted to your own IP address.
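If you prefer to script the security group step rather than click through the console, something like the following sketch could work. It assumes boto3 is installed and configured; the group ID, the example IP address, and the helper names are hypothetical placeholders.

```python
import ipaddress


def ssh_ingress_rule(my_ip: str) -> list:
    """Return an IpPermissions list that allows SSH (TCP 22) from one IPv4 address."""
    ipaddress.IPv4Address(my_ip)  # raises ValueError if the address is malformed
    return [{
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "IpRanges": [{"CidrIp": f"{my_ip}/32", "Description": "SSH from my workstation"}],
    }]


def allow_ssh(group_id: str, my_ip: str) -> None:
    """Add the SSH rule to an existing security group."""
    import boto3  # imported lazily so the rule builder works without boto3 installed

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(GroupId=group_id, IpPermissions=ssh_ingress_rule(my_ip))
```

Restricting the rule to a single `/32` address keeps the SSH port closed to the rest of the internet.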
Use SSH to connect to your EC2 instance:
ssh -i your-key.pem ubuntu@your-ec2-public-ip
Once connected, install Python and the required libraries.
sudo apt update
sudo apt install python3 python3-pip

pip3 install requests beautifulsoup4 selenium instaloader TikTokApi
Install Chrome and ChromeDriver (if you plan to use Selenium):
- Download Chrome from Google's official download page.
- Download the ChromeDriver release that matches your installed Chrome version.
Instaloader is a powerful tool specifically designed for Instagram scraping.
Basic Usage
- Log in and Scrape User Data:
import instaloader

# Create the Instaloader session and log in
L = instaloader.Instaloader()
L.login('your_username', 'your_password')  # Replace with your credentials

# Get profile information
profile = instaloader.Profile.from_username(L.context, 'target_username')  # Replace with target username

print(f'Username: {profile.username}')
print(f'Bio: {profile.biography}')
print(f'Followers: {profile.followers}')
print(f'Following: {profile.followees}')

# Scraping posts: post.url is the media (image or thumbnail) URL,
# so the post page URL is built from the shortcode instead
for post in profile.get_posts():
    print(f'Post URL: https://www.instagram.com/p/{post.shortcode}/')
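Printing is fine for a quick check, but for a dataset you will usually want to persist each profile. The sketch below flattens the same four attributes used above into a dictionary and appends it to a JSON Lines file. The helper names and the output path are hypothetical, and a `SimpleNamespace` stands in for a real `Profile` so the example runs without logging in.

```python
import json
from types import SimpleNamespace


def profile_to_record(profile) -> dict:
    """Copy the profile attributes used in this guide into a plain dictionary."""
    return {
        "username": profile.username,
        "biography": profile.biography,
        "followers": profile.followers,
        "following": profile.followees,
    }


def append_jsonl(record: dict, path: str) -> None:
    """Append one record as a single JSON line."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


# Stand-in object with made-up values; replace with a real instaloader Profile
fake_profile = SimpleNamespace(username="example_user", biography="Hello", followers=120, followees=80)
record = profile_to_record(fake_profile)
print(record)
```

One record per line makes the file easy to append to across runs and easy to load later, one line at a time.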
If you need to scrape data from a public profile or handle specific interactions:
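from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

# Set up Selenium; headless mode is needed on an EC2 instance without a display
options = webdriver.ChromeOptions()
options.add_argument('--headless=new')
driver = webdriver.Chrome(options=options)  # Ensure chromedriver is in your PATH
wait = WebDriverWait(driver, 15)

try:
    driver.get('https://www.instagram.com/accounts/login/')

    # Wait for the login form to load
    username_input = wait.until(EC.presence_of_element_located((By.NAME, 'username')))
    password_input = driver.find_element(By.NAME, 'password')

    # Log in
    username_input.send_keys('your_username')
    password_input.send_keys('your_password')
    password_input.submit()

    # Give the login request time to complete
    time.sleep(5)

    # Navigate to target profile
    driver.get('https://www.instagram.com/target_username/')  # Replace with target username

    # Scrape user data. Instagram's class names are generated and change often,
    # so inspect the page and update this selector if it no longer matches.
    bio = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'div.-vDIg > span'))).text
    print(f'Bio: {bio}')
finally:
    # Close the driver even if a lookup fails
    driver.quit()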
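Both Instagram login examples above hard-code the username and password. On a shared or long-running EC2 instance, it may be safer to read them from environment variables instead. This is only a sketch; the variable names (`IG_USERNAME`, `IG_PASSWORD`) and the helper are hypothetical.

```python
import os


def load_credentials(prefix: str = "IG") -> tuple:
    """Read <PREFIX>_USERNAME and <PREFIX>_PASSWORD from the environment."""
    username = os.environ.get(f"{prefix}_USERNAME")
    password = os.environ.get(f"{prefix}_PASSWORD")
    if not username or not password:
        raise RuntimeError(f"Set {prefix}_USERNAME and {prefix}_PASSWORD before running the scraper")
    return username, password
```

The returned pair can then be passed to `L.login(...)` or `send_keys(...)` in place of the literal strings, so the credentials never end up in the script or in version control.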
The TikTokApi library allows easy access to TikTok's public data.
Basic Usage
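import asyncio
import os
from TikTokApi import TikTokApi

# Assumes the async interface of recent TikTokApi releases (v6+), which also need
# Playwright browsers (python3 -m playwright install). Older releases differ.
ms_token = os.environ.get('ms_token')  # msToken cookie from a tiktok.com browser session

async def main():
    async with TikTokApi() as api:
        await api.create_sessions(ms_tokens=[ms_token], num_sessions=1, sleep_after=3)

        # Get user object
        user = api.user('username')  # Replace with target username
        info = (await user.info())['userInfo']

        # Key names follow TikTok's JSON response and may change
        print(f"Username: {info['user']['uniqueId']}")
        print(f"Display Name: {info['user']['nickname']}")
        print(f"Followers: {info['stats']['followerCount']}")
        print(f"Following: {info['stats']['followingCount']}")

asyncio.run(main())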
If you want to interact with TikTok's web interface:
from selenium import webdriver
from selenium.webdriver.common.by import By
import time

# Set up Selenium; headless mode is needed on an EC2 instance without a display
options = webdriver.ChromeOptions()
options.add_argument('--headless=new')
driver = webdriver.Chrome(options=options)

try:
    driver.get('https://www.tiktok.com/@target_username')  # Replace with target username

    # Wait for the page to load
    time.sleep(5)

    # Scrape user data. These selectors are assumptions about TikTok's current
    # markup; inspect the page and update them if they no longer match.
    username = driver.find_element(By.TAG_NAME, 'h1').text
    followers = driver.find_element(By.CSS_SELECTOR, '[data-e2e="followers-count"]').text

    print(f'Username: {username}')
    print(f'Followers: {followers}')
finally:
    # Close the driver even if a lookup fails
    driver.quit()
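The follower text scraped from the page is a string, and profile pages typically show large counts in abbreviated form such as `1.2M`. That format is an assumption about the rendered page, not something the platform guarantees. A small helper like this hypothetical `parse_count` can turn such strings into integers before you store them.

```python
from decimal import Decimal, InvalidOperation

# Multipliers for the abbreviated suffixes this sketch understands
_SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(text: str) -> int:
    """Convert strings like '1.2M', '15.3K' or '1,024' to an integer."""
    cleaned = text.strip().upper().replace(",", "")
    multiplier = 1
    if cleaned and cleaned[-1] in _SUFFIXES:
        multiplier = _SUFFIXES[cleaned[-1]]
        cleaned = cleaned[:-1]
    try:
        # Decimal avoids floating-point rounding surprises on values like 1.2
        return int(Decimal(cleaned) * multiplier)
    except InvalidOperation as exc:
        raise ValueError(f"unrecognised count: {text!r}") from exc
```

Abbreviated values are rounded by the site, so the parsed number is an approximation of the true count.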
Keep these best practices in mind when scraping:
- Rate Limiting: Both Instagram and TikTok have rate limits. Be mindful of how frequently you make requests to avoid being banned.
- Respect Privacy: Scrape only public data and adhere to each platform's terms of service.
- Captcha Handling: Be prepared to handle CAPTCHA challenges, especially with automated scripts.
- Proxy Management: Rotate proxies regularly to reduce the risk of being blocked.
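The rate-limiting and proxy points can be sketched with the standard library alone. The `Throttle` class, `proxy_cycle`, and `backoff_delays` below are hypothetical helpers, and the interval values are illustrative rather than limits published by either platform.

```python
import itertools
import random
import time


class Throttle:
    """Enforce a minimum delay (plus optional random jitter) between consecutive calls."""

    def __init__(self, min_interval: float, jitter: float = 0.0):
        self.min_interval = min_interval
        self.jitter = jitter
        self._last = None  # monotonic timestamp of the previous call

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the seconds slept."""
        delay = 0.0
        if self._last is not None:
            target = self.min_interval + random.uniform(0, self.jitter)
            delay = max(0.0, target - (time.monotonic() - self._last))
        if delay:
            time.sleep(delay)
        self._last = time.monotonic()
        return delay


def proxy_cycle(proxies):
    """Return an iterator that yields the given proxies in round-robin order, forever."""
    return itertools.cycle(proxies)


def backoff_delays(base: float = 1.0, factor: float = 2.0, retries: int = 5) -> list:
    """Exponential backoff schedule to use after a blocked or failed request."""
    return [base * factor ** attempt for attempt in range(retries)]
```

In a scraping loop you would call `throttle.wait()` before each request and take `next(proxies)` for each one; with `requests`, the chosen proxy can be passed as `proxies={"http": proxy, "https": proxy}`.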
Scraping user accounts from Instagram and TikTok using AWS can provide valuable insights while allowing for scalable operations. By following this guide, you can set up a robust scraping environment and gather the data you need ethically and responsibly.