Scrape User Accounts from Instagram & TikTok

Posted: Sep 26, 2024
Last Updated: Nov 25, 2024

In today’s data-driven landscape, social media platforms like Instagram and TikTok are rich sources of information. Whether you're analyzing trends, gathering insights, or building a dataset for research, scraping user accounts can be highly beneficial. This article will guide you through scraping user accounts from Instagram and TikTok using AWS infrastructure, along with Python tools and libraries.

Why Use AWS for Scraping?

AWS (Amazon Web Services) provides scalable and reliable cloud computing resources. By leveraging AWS, you can:

  • Scale your scraping operations efficiently.
  • Access powerful EC2 instances for processing.
  • Store large datasets securely in S3.
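As an illustration of the S3 point, the sketch below serializes scraped records as JSON Lines and uploads them with boto3's `put_object`. The helpers `records_to_jsonl`, `make_s3_key`, and `upload_records`, the key layout, and the bucket name are hypothetical. boto3 must be installed and AWS credentials configured, for example through an IAM role attached to the EC2 instance.

```python
import json
from datetime import datetime, timezone


def records_to_jsonl(records):
    """Serialize a list of dicts as JSON Lines (one JSON object per line)."""
    return "".join(json.dumps(r, ensure_ascii=False, sort_keys=True) + "\n" for r in records)


def make_s3_key(platform, when=None):
    """Build a date-partitioned key such as 'raw/instagram/2024/11/25/accounts.jsonl'.

    This layout is a hypothetical convention, not an AWS requirement.
    """
    when = when or datetime.now(timezone.utc)
    return f"raw/{platform}/{when:%Y/%m/%d}/accounts.jsonl"


def upload_records(records, bucket, platform):
    """Upload records to S3 and return the object key.

    boto3 is imported here so the helpers above work without it installed.
    """
    import boto3  # requires `pip install boto3` and configured AWS credentials

    key = make_s3_key(platform)
    boto3.client("s3").put_object(
        Bucket=bucket,
        Key=key,
        Body=records_to_jsonl(records).encode("utf-8"),
        ContentType="application/x-ndjson",
    )
    return key
```

Usage would look like `upload_records([{"username": "example"}], bucket="my-scraper-bucket", platform="instagram")`, with the bucket name replaced by one you own. Date-partitioned keys keep each day's output separate, which makes later batch processing simpler.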

Prerequisites

Before diving into the scraping process, ensure you have:

  • An AWS account.
  • Basic knowledge of Python and web scraping concepts.
  • Familiarity with command-line operations.

Step 1: Setting Up Your AWS Environment

A. Create an AWS Account

Sign up for an account at AWS. You may need to provide billing information, but AWS offers a free tier for new users.

B. Launch an EC2 Instance

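Navigate to the EC2 dashboard and launch a new instance. Choose an Amazon Machine Image (AMI), preferably Ubuntu for ease of setup. Select an instance type: t2.micro (1 GiB of memory) is often sufficient for low-volume, request-based scraping, but headless Chrome for Selenium may need more memory, such as a t3.small (2 GiB). Configure the security group to allow SSH access, ideally restricted to your own IP address.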
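The console steps can also be scripted. Below is a minimal sketch using boto3's `run_instances`, assuming boto3 is installed and AWS credentials are configured. The AMI ID, key pair name, and security group ID are placeholders you must replace, and `build_launch_params`/`launch_instance` are hypothetical helper names.

```python
def build_launch_params(ami_id, key_name, security_group_id, instance_type="t2.micro"):
    """Assemble keyword arguments for EC2 run_instances (exactly one instance)."""
    return {
        "ImageId": ami_id,  # an Ubuntu AMI ID for your region (placeholder)
        "InstanceType": instance_type,
        "KeyName": key_name,  # name of an existing EC2 key pair
        "SecurityGroupIds": [security_group_id],  # group that allows SSH
        "MinCount": 1,
        "MaxCount": 1,
        "TagSpecifications": [
            {"ResourceType": "instance", "Tags": [{"Key": "Name", "Value": "scraper"}]}
        ],
    }


def launch_instance(region, **params):
    """Launch the instance and return its ID (requires boto3 and AWS credentials)."""
    import boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**params)
    return response["Instances"][0]["InstanceId"]
```

For example: `launch_instance("us-east-1", **build_launch_params("ami-...", "your-key", "sg-..."))`. Keeping the parameters in a plain dict makes them easy to review before anything is launched.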

C. Connect to Your Instance

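Restrict the key file's permissions (SSH refuses private keys that other users can read), then connect to your EC2 instance:

```bash
chmod 400 your-key.pem
ssh -i your-key.pem ubuntu@your-ec2-public-ip
```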

Step 2: Install Necessary Dependencies

A. Install Python and Libraries

Once connected, install Python and the required libraries.

1. Install Python:

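```bash
sudo apt update
sudo apt install -y python3 python3-pip python3-venv
```

2. Install Libraries:

Recent Ubuntu releases (such as 24.04) block pip from installing packages into the system Python (PEP 668), so create a virtual environment first. TikTokApi also depends on Playwright, which needs its own browser download:

```bash
python3 -m venv ~/scraper-env
source ~/scraper-env/bin/activate
pip install requests beautifulsoup4 selenium instaloader TikTokApi
python -m playwright install --with-deps chromium
```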
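After installation, it is worth confirming which versions actually landed, since TikTokApi in particular changed its interface considerably between major versions. A minimal sketch using the standard library's `importlib.metadata`; `installed_versions` is a hypothetical helper, and it should be run with the same Python interpreter you installed the packages into.

```python
from importlib.metadata import PackageNotFoundError, version

PACKAGES = ["requests", "beautifulsoup4", "selenium", "instaloader", "TikTokApi"]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```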

B. Set Up Web Driver for Selenium

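Install Chrome (and, on older Selenium versions, ChromeDriver) if you plan to use Selenium:

  • Install Google Chrome on the instance from Google's official .deb package for Ubuntu.
  • Selenium 4.6 and later downloads a matching ChromeDriver automatically through Selenium Manager; download ChromeDriver manually only if you are on an older Selenium release.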
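An EC2 instance has no display, so Chrome has to run headless there. Below is a minimal driver setup sketch, assuming Selenium 4.6 or later so that Selenium Manager resolves ChromeDriver. The flag list is a commonly used set for servers rather than a strict requirement, and `chrome_arguments`/`make_headless_driver` are hypothetical helper names.

```python
HEADLESS_ARGS = [
    "--headless=new",           # Chrome's current headless mode
    "--no-sandbox",             # often needed when Chrome runs as root or in containers
    "--disable-dev-shm-usage",  # write shared memory to /tmp instead of a small /dev/shm
    "--window-size=1920,1080",  # fixed viewport so page layouts are predictable
]


def chrome_arguments(extra=None):
    """Return a new list of Chrome command-line flags, plus any extra ones."""
    return HEADLESS_ARGS + list(extra or [])


def make_headless_driver(extra=None):
    """Create a headless Chrome WebDriver (Selenium is imported lazily)."""
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments(extra):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

Extra flags can be passed per run, for example `make_headless_driver(["--proxy-server=http://proxy.example:8080"])` to route the browser through a proxy (the proxy address is a placeholder).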

Step 3: Scraping Instagram Accounts

A. Using Instaloader

Instaloader is a powerful tool specifically designed for Instagram scraping.

Basic Usage

  1. Log in and Scrape User Data:
```python
import itertools
import os

import instaloader

L = instaloader.Instaloader()
# Read credentials from environment variables (set IG_USERNAME and IG_PASSWORD)
L.login(os.environ['IG_USERNAME'], os.environ['IG_PASSWORD'])

# Get profile information
profile = instaloader.Profile.from_username(L.context, 'target_username')  # Replace with target username

print(f'Username: {profile.username}')
print(f'Bio: {profile.biography}')
print(f'Followers: {profile.followers}')
print(f'Following: {profile.followees}')

# Scrape only the 10 most recent posts to keep request volume low
for post in itertools.islice(profile.get_posts(), 10):
    # post.url is the media file URL; the shortcode gives the post's page URL
    print(f'Post URL: https://www.instagram.com/p/{post.shortcode}/')
```
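To store profiles rather than just print them, it helps to collect the fields into a plain dictionary. A minimal sketch: `profile_to_record` and `save_records` are hypothetical helpers that read the same Instaloader `Profile` attributes used above, plus `full_name`, `mediacount`, and `is_private`, which Instaloader also exposes. Any object with those attributes works, which keeps the helper easy to test.

```python
import json
from datetime import datetime, timezone


def profile_to_record(profile):
    """Collect selected Instaloader Profile attributes into a JSON-serializable dict."""
    return {
        "platform": "instagram",
        "username": profile.username,
        "full_name": profile.full_name,
        "biography": profile.biography,
        "followers": profile.followers,
        "following": profile.followees,
        "posts": profile.mediacount,
        "is_private": profile.is_private,
        "scraped_at": datetime.now(timezone.utc).isoformat(),
    }


def save_records(records, path):
    """Append records to a JSON Lines file, one profile per line."""
    with open(path, "a", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
```

For example, `save_records([profile_to_record(profile)], "instagram_profiles.jsonl")` appends one line per profile, so repeated runs build up a history instead of overwriting it.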

B. Using Selenium

If you need to scrape data from a public profile or handle specific interactions:

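```python
import os

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Set up Selenium (Selenium 4.6+ locates a matching ChromeDriver automatically)
driver = webdriver.Chrome()
wait = WebDriverWait(driver, 15)  # wait up to 15 seconds for elements

try:
    driver.get('https://www.instagram.com/accounts/login/')

    # Wait for the login form instead of sleeping for a fixed time
    username_input = wait.until(EC.presence_of_element_located((By.NAME, 'username')))
    password_input = driver.find_element(By.NAME, 'password')

    # Read credentials from environment variables (set IG_USERNAME and IG_PASSWORD)
    username_input.send_keys(os.environ['IG_USERNAME'])
    password_input.send_keys(os.environ['IG_PASSWORD'])
    password_input.send_keys(Keys.RETURN)

    # Wait until the browser has left the login page
    wait.until(lambda d: '/accounts/login' not in d.current_url)

    # Navigate to target profile
    driver.get('https://www.instagram.com/target_username/')  # Replace with target username

    # Instagram's CSS class names are obfuscated and change often, so class-based
    # selectors break quickly. Reading the profile <header> text is a sketch;
    # inspect the page in your browser's developer tools and adjust as needed.
    header = wait.until(EC.presence_of_element_located((By.TAG_NAME, 'header')))
    print(f'Profile header: {header.text}')
finally:
    # Close the driver even if a step fails
    driver.quit()
```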
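Logging in on every run is slow and is the kind of repeated activity that can trigger Instagram's security checks. A common Selenium pattern is to save the session cookies after a successful login and restore them on later runs. A minimal sketch: `save_cookies` and `load_cookies` are hypothetical helpers built on WebDriver's `get_cookies()` and `add_cookie()`. Because a browser only accepts cookies for the domain it is currently on, the loader opens the site before adding them. Treat the cookie file like a password.

```python
import json


def save_cookies(driver, path):
    """Write the browser's current cookies to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(driver.get_cookies(), fh)


def load_cookies(driver, path, base_url="https://www.instagram.com/"):
    """Restore cookies from a JSON file and return how many were added."""
    with open(path, encoding="utf-8") as fh:
        cookies = json.load(fh)
    driver.get(base_url)  # cookies can only be set for the current page's domain
    for cookie in cookies:
        driver.add_cookie(cookie)
    driver.refresh()  # reload so the site sees the restored session
    return len(cookies)
```

Call `save_cookies(driver, "ig_cookies.json")` once the login succeeds, and on the next run call `load_cookies(driver, "ig_cookies.json")` instead of filling in the login form.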


Step 4: Scraping TikTok Accounts

A. Using TikTokApi

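The TikTokApi library is an unofficial wrapper around TikTok's web endpoints. Recent versions (v6 and later) are asynchronous and drive a Playwright browser session, so calls must run inside an event loop. The library's documentation recommends supplying an ms_token cookie taken from a tiktok.com browser session.

Basic Usage

```python
import asyncio
import os

from TikTokApi import TikTokApi

# ms_token: cookie copied from a tiktok.com browser session (recommended)
ms_token = os.environ.get('ms_token')

async def main():
    async with TikTokApi() as api:
        await api.create_sessions(ms_tokens=[ms_token], num_sessions=1, sleep_after=3)
        user = api.user(username='target_username')  # Replace with target username
        data = await user.info()
        # Response layout follows TikTok's undocumented web API and may change
        profile, stats = data['userInfo']['user'], data['userInfo']['stats']
        print(f"Username: {profile.get('uniqueId')}")
        print(f"Display Name: {profile.get('nickname')}")
        print(f"Followers: {stats.get('followerCount')}")
        print(f"Following: {stats.get('followingCount')}")

asyncio.run(main())
```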
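However the data is fetched, TikTok returns profile information as nested JSON. A minimal sketch of flattening it into a single record, assuming a `userInfo` object containing `user` (with keys such as `uniqueId`, `nickname`, `signature`, `verified`) and `stats` (with `followerCount`, `followingCount`, `heartCount`, `videoCount`). TikTok does not document this format and may change it, so the hypothetical helper `flatten_tiktok_user` uses `.get()` and returns `None` for anything missing.

```python
def flatten_tiktok_user(data):
    """Flatten a TikTok user-info response into a flat dict of profile fields."""
    info = data.get("userInfo", {})
    user = info.get("user", {})
    stats = info.get("stats", {})
    return {
        "platform": "tiktok",
        "username": user.get("uniqueId"),
        "display_name": user.get("nickname"),
        "bio": user.get("signature"),
        "verified": user.get("verified"),
        "followers": stats.get("followerCount"),
        "following": stats.get("followingCount"),
        "likes": stats.get("heartCount"),
        "videos": stats.get("videoCount"),
    }
```

With TikTokApi v6, for example, this could be applied inside the async session as `flatten_tiktok_user(await user.info())`.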

B. Using Selenium

If you want to interact with TikTok's web interface:

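```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Set up Selenium
driver = webdriver.Chrome()

try:
    driver.get('https://www.tiktok.com/@target_username')  # Replace with target username

    # Wait for the profile heading instead of sleeping for a fixed time
    wait = WebDriverWait(driver, 15)
    username = wait.until(EC.presence_of_element_located((By.TAG_NAME, 'h1'))).text

    # TikTok labels profile stats with data-e2e attributes; these change
    # occasionally, so verify them in your browser's developer tools
    followers = driver.find_element(By.CSS_SELECTOR, '[data-e2e="followers-count"]').text

    print(f'Username: {username}')
    print(f'Followers: {followers}')
finally:
    # Close the driver even if a step fails
    driver.quit()
```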
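The follower count scraped from the page is display text, usually abbreviated (for example, 1.2M or 45.6K). A minimal sketch for turning that text into an integer; `parse_count` is a hypothetical helper that assumes English-style K/M/B suffixes, commas as thousands separators, and a period as the decimal separator.

```python
import re
from decimal import Decimal

_SUFFIXES = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
# A number (optionally with thousands commas and decimals) followed by an optional suffix
_COUNT_RE = re.compile(r"(\d+(?:,\d{3})*(?:\.\d+)?)\s*([KMB]?)\b", re.IGNORECASE)


def parse_count(text):
    """Convert display counts like '1.2M' or '3,456' to an int; None if no number is found."""
    match = _COUNT_RE.search(text)
    if not match:
        return None
    number = Decimal(match.group(1).replace(",", ""))  # Decimal avoids float rounding
    return int(number * _SUFFIXES[match.group(2).upper()])
```

For example, `parse_count("1.2M")` gives 1200000. Using `Decimal` rather than `float` keeps values like 1.2 × 1,000,000 exact.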


Step 5: Important Considerations

  1. Rate Limiting: Both Instagram and TikTok enforce rate limits. Throttle your requests, add randomized delays between them, and back off when requests fail to avoid being banned.
  2. Respect Privacy: Scrape only public data and adhere to each platform's terms of service.
  3. CAPTCHA Handling: Be prepared to handle CAPTCHA challenges, especially with automated scripts.
  4. Proxy Management: Rotate proxies regularly to reduce the risk of being blocked.
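The rate-limiting and proxy advice can be combined in a small retry helper. A minimal sketch, assuming your fetch function raises an exception when a request is blocked or fails: the hypothetical `fetch_with_backoff` waits with exponential backoff plus random jitter between attempts and switches to the next proxy from a rotating pool on each try. `requests_fetch` shows one possible fetch function built on the requests library; the proxy URLs you pass in are placeholders.

```python
import itertools
import random
import time


def backoff_delays(base=2.0, factor=2.0, max_delay=60.0, retries=5):
    """Yield exponentially growing delays, capped at max_delay."""
    delay = base
    for _ in range(retries):
        yield min(delay, max_delay)
        delay *= factor


def fetch_with_backoff(fetch, url, proxies, retries=5, sleep=time.sleep):
    """Call fetch(url, proxy) until it succeeds, rotating proxies and backing off.

    fetch is any callable that raises on failure; sleep is injectable for testing.
    """
    pool = itertools.cycle(proxies)
    delays = list(backoff_delays(retries=retries - 1))  # one fewer wait than attempts
    last_error = None
    for attempt in range(retries):
        proxy = next(pool)
        try:
            return fetch(url, proxy)
        except Exception as exc:  # e.g. HTTP 429, timeouts, connection resets
            last_error = exc
            if attempt < len(delays):
                delay = delays[attempt]
                sleep(delay + random.uniform(0, delay / 2))  # jitter avoids lockstep retries
    raise RuntimeError(f"giving up on {url} after {retries} attempts") from last_error


def requests_fetch(url, proxy):
    """Example fetch function using requests (imported lazily)."""
    import requests

    response = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=10)
    response.raise_for_status()  # turn 4xx/5xx responses (including 429) into exceptions
    return response.text
```

For example, `fetch_with_backoff(requests_fetch, url, ["http://proxy1.example:8080", "http://proxy2.example:8080"])` retries up to five times, waiting roughly 2, 4, 8, and 16 seconds (plus jitter) between attempts.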

Conclusion

Scraping user accounts from Instagram and TikTok using AWS can provide valuable insights while allowing for scalable operations. By following this guide, you can set up a robust scraping environment and gather the data you need ethically and responsibly.
