Scraping data from a dynamic website is not easy. Many of our readers requested a few examples of how to scrape a dynamic website using Python. This is the second example in the tutorial series. You can see the first example here.

ULTA Beauty, America’s largest beauty specialty retailer, is turning to technology to make shopping for beauty products simpler and more enjoyable. It is building an omnichannel approach to enhance the customer experience whether customers buy in person or online, a concept ULTA calls “connected beauty.” Our goal is to help fragrance enthusiasts worldwide explore, discover, and enjoy the enthralling world of scent.

If you did this by manually searching the website, it would take forever to find every detail about a fragrance. That’s where web scraping comes in. Web scraping is the process of extracting data from websites and transforming it into CSV or JSON files, which helps you analyze and understand each fragrance’s scent, brand, price, reviews, and much more.

In this blog, we’ll see how to scrape data from Ulta Beauty’s website. We’ll use Python to scrape the Women’s Fragrance data from Ulta Beauty’s website and save it as a CSV file. Then, we can analyze the data using Python or another program. We will extract the following data attributes from the individual product pages of Ulta Beauty.

  • Product URL – The URL of the Women’s Fragrance product page.

  • Product Name – The name of the Women’s Fragrance product.

  • Brand – The brand of the Women’s Fragrance product.

  • Number of Reviews – The number of reviews of the Women’s Fragrance product.

  • Rating – The rating of the Women’s Fragrance product.

  • Fragrance Description – The description of the Women’s Fragrance product.

  • Details – The details of the Women’s Fragrance product, including the Composition, Fragrance Family, Scent Type, Key Notes, and Features of each product.

  • Ingredients – The ingredients of the Women’s Fragrance product.

Web scraping with Python Packages

In this tutorial, we will be using Python to extract data. There are several reasons why Python is a good choice for web scraping:

  1. Python has a large and active community, which means that many libraries and frameworks can help you with web scraping. For example, Beautiful Soup is a popular library for parsing HTML and XML documents.

  2. Python is easy to learn and use, especially for those new to programming. The syntax is simple and readable, which makes it a good choice for prototyping and rapid development.

  3. Python is efficient and fast. With the right libraries, it can handle large amounts of data and scrape websites that rely heavily on JavaScript, CSS, and other resources.

  4. Python has good support for handling different types of data, such as text, images, and videos. This makes it easy to extract and process data from websites containing various media types.

  5. Python is a versatile language that can be used for many purposes beyond web scraping. This means that you can use the skills you learn for web scraping in other areas of programming as well.

Importing Libraries:

The first step is importing the required libraries. Here we use a mix of BeautifulSoup and Selenium to scrape the data, so we first import BeautifulSoup, Selenium, webdriver, time, and lxml’s ElementTree (etree) modules.

Beautiful Soup parses HTML into an easily machine-readable tree format so DOM elements can be extracted quickly. It lets you pull out specific paragraph and table elements that carry a particular HTML ID or class. lxml’s etree, on the other hand, is a library for parsing and generating XML data. It is a fast alternative to the standard ElementTree package and, unlike Beautiful Soup, supports XPath expressions, which we will use to locate elements precisely.
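To make the hand-off between the two libraries concrete, here is a minimal sketch (the HTML snippet, class names, and values are invented for illustration) showing Beautiful Soup locating an element by class and lxml’s etree running an XPath query over the same markup:

```python
from bs4 import BeautifulSoup
from lxml import etree as et

html = """
<div>
  <p class="prod-desc"><a href="/p/rose-eau-123">Rose Eau de Parfum</a></p>
  <span class="price">$59.50</span>
</div>
"""

# Beautiful Soup: locate an element by tag name and HTML class
soup = BeautifulSoup(html, 'html.parser')
name = soup.find('p', class_='prod-desc').a.text

# lxml etree: re-parse the soup's markup and query it with XPath
dom = et.HTML(str(soup))
price = dom.xpath("//span[@class='price']/text()")[0]

print(name, price)   # Rose Eau de Parfum $59.50
```

This `et.HTML(str(soup))` round-trip is exactly the pattern the tutorial uses later in `extract_content()`.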

Selenium is a tool designed to automate web browsers. It is also very useful for web scraping because of these automation capabilities: clicking specific form buttons, entering information into text fields, and extracting DOM elements from the browser’s HTML.

These are the necessary packages that are required to extract data from an HTML page.

import re
import time
import random
import warnings
import pandas as pd
from typing import List
from lxml import etree as et
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

Here, the ChromeDriverManager library downloads and manages the ChromeDriver executable that matches your installed Chrome version; webdriver then uses this executable to control the Chrome browser.

To scrape or extract data, you first need to know where that data is located, which is why locating website elements is one of the essential requirements of web scraping. There are a few standard ways to find a specific element on a page: you can search by tag name, filter for a specific HTML class or HTML ID, or use CSS selectors or XPath expressions. As usual, the easiest way to locate an element is to open your Chrome dev tools and inspect the element you need.
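These locator strategies all point at the same data; here is a small sketch (the HTML fragment, ID, and price are made up for illustration) showing one element found by tag name, by class, by ID, and by XPath:

```python
from bs4 import BeautifulSoup
from lxml import etree as et

html = '<div id="price-box"><span class="price">$48.00</span></div>'
soup = BeautifulSoup(html, 'html.parser')

by_tag = soup.find('span').text                       # by tag name
by_class = soup.find('span', class_='price').text     # by HTML class
by_id = soup.find('div', id='price-box').span.text    # by HTML ID
by_xpath = et.HTML(html).xpath('//div[@id="price-box"]/span/text()')[0]  # by XPath

print(by_tag == by_class == by_id == by_xpath)   # True
```

On a real page, the class or ID you filter on is the one you read off in Chrome dev tools.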

How to Scrape a Dynamic Website Using Python

Extraction of Page Links:

The second step is extracting the links of the resultant pages. The Women’s Fragrance products are spread across several pages of the website, so we need to go from one page to another to see the remaining products.


So first, we want to collect the URLs of the different pages of search results. There are six resultant pages, so we can scrape the URL of each page starting from the base URL. A while loop is used to iterate through the search result pages. The loop starts by navigating to the current URL using the driver.get() method. It then obtains the page’s HTML source code using driver.page_source and parses it using the BeautifulSoup library. In other words, we open the website with Selenium and parse the page’s main content with BeautifulSoup.

# Collect the URL of every resultant page
def get_page_urls(url):
    page_urls = [url]
    while url:
        driver.get(url)
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        next_page = soup.find('li', class_='next-prev floatl-span').find('a', class_='next')
        if next_page:
            url = "" + next_page['href']
            page_urls.append(url)
        else:
            url = None
    return set(page_urls)

# Ulta website link
url = ""
page_urls = get_page_urls(url)

We need the URL of every page. After the while loop finishes executing, every page URL is stored in page_urls. Here we used the HTML class to locate the elements.
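The next-link logic can be exercised without a browser. In the sketch below, a dictionary of three made-up in-memory pages stands in for driver.page_source, and the same pattern of chasing the “next” anchor runs until it disappears:

```python
from bs4 import BeautifulSoup

# Three hypothetical result pages, keyed by URL, standing in for driver.page_source
pages = {
    '/page1': '<li class="next-prev"><a class="next" href="/page2">Next</a></li>',
    '/page2': '<li class="next-prev"><a class="next" href="/page3">Next</a></li>',
    '/page3': '<li class="next-prev"></li>',   # last page: no "next" link
}

def collect_page_urls(start):
    urls, url = [start], start
    while url:
        soup = BeautifulSoup(pages[url], 'html.parser')
        next_link = soup.find('li', class_='next-prev').find('a', class_='next')
        if next_link:
            url = next_link['href']
            urls.append(url)
        else:
            url = None   # no "next" anchor ends the loop
    return urls

print(collect_page_urls('/page1'))   # ['/page1', '/page2', '/page3']
```

The real scraper does the same thing, only with driver.get() fetching each live page first.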

Extraction of Product Links:

The next step is extracting the product links from the resultant pages. Using the page links extracted in the second step, we can easily extract the product links from the corresponding pages. Here, the page_urls variable contains the page links from which you want to scrape product links. The code iterates through each page link, uses the web driver to navigate to that page, and then uses BeautifulSoup to parse the HTML of the page and extract all product links.

# Fetching all resulted product links
def get_product_links(page_urls: List[str]) -> List[str]:
    product_links = []
    for url in page_urls:
        driver.get(url)
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        links = ["" + row.a['href'] for row in soup.find_all('p', class_='prod-desc')]
        product_links.extend(links)
    return product_links

product_links = get_product_links(page_urls)

# Indicate scraping completion
print("Got All Product Links! There are {} products in total.".format(len(product_links)))

We need the URL of every Women’s Fragrance product. So we loop over each page, collect every product URL, and store them in the list product_links. Here we used the HTML class to locate the elements.

Creating Dataframe to Store the Data:

The next step is to create a dataframe to store the extracted data. Here we create a dataframe with nine columns: Product URL, Product Name, Brand, Number of Reviews, Rating, Price, Fragrance Description, Details, and Ingredients.

# Creating a dictionary of the required columns
data = {
    'Product_url': [], 
    'Brand': [], 
    'Product_name': [],
    'Number_of_reviews': [],
    'Details': [], 
    'Star_rating': [],
    'Price': [], 
    'Fragrance Description': [],
    'Ingredients': []
}

# Creating a dataframe with those columns
df = pd.DataFrame(data)

Information Extraction:

In this step, we will identify wanted attributes from Ulta Beauty’s Website and extract the Product Name, Brand, Number of Reviews, Rating, Fragrance Description, Details, and Ingredients of each product.

def extract_content(url):
    driver.get(url)
    page_content = driver.page_source
    product_soup = BeautifulSoup(page_content, 'html.parser')
    dom = et.HTML(str(product_soup))
    return dom

The extract_content() function scrapes the content of the web page at the specified URL using the Selenium web driver. The content is then parsed with the BeautifulSoup library and returned as an lxml element tree.

We pass the URL as an argument, load it with the Selenium web driver, and store the page source in page_content. We create the product_soup variable by parsing page_content with BeautifulSoup and then build the dom with lxml’s etree. The function returns the dom, which we can then use to extract specific elements from the page with methods like .xpath().

Extraction of Brand of the Products:

Here is the function to extract the brand name from the dom object using an XPath expression. We iterate through the products one by one. Whenever the loop picks up a URL, we use XPath to find the above attribute. Once the attribute is extracted, the data is added to the corresponding column. Sometimes the data is obtained in the format [“Brand”], so we remove those unwanted characters here.

def Brand(dom):
    brand = dom.xpath('//*[@id="92384e5c-2234-4e8f-bef7-e80391889cfc"]/h1/span[1]/a/text()')
    if brand:
        brand = re.sub(r'[\[\]\']', '', str(brand))
    else:
        brand = 'Brand is not available'
    df.loc[each_product, 'Brand'] = brand
    return brand
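The cleanup step deserves a quick illustration. .xpath() returns a Python list, so wrapping the result in str() produces text like ['Chanel'] (the brand name here is just an example), and re.sub() strips the brackets and quotes:

```python
import re

# str() on the list that .xpath() returns yields text like "['Chanel']"
raw = str(['Chanel'])
clean = re.sub(r"[\[\]\']", '', raw)
print(clean)   # Chanel
```

The same bracket-and-quote stripping is reused in all the extraction functions below.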

Extraction of Product Name:

Here is the function to extract the product name. It is similar to the Brand() function, but it extracts the product name from the dom object using an XPath expression. Every time the loop picks up a URL, we use XPath to find the attribute listed above. Once the attribute is extracted, the data is added to the corresponding column. Sometimes the data is obtained in the format [“product name”], so we remove those unwanted characters here.

def Product_name(dom):
    product = dom.xpath('//*[@id="92384e5c-2234-4e8f-bef7-e80391889cfc"]/h1/span[2]/text()')
    if product:
        product = re.sub(r'[\[\]\'\"]', '', str(product))
    else:
        product = "Product name is not available"
    df.loc[each_product, "Product_name"] = product
    return product

Similarly, we can extract the Number of Reviews, Rating, and Ingredients.

Number of Ratings of the Products :

def Reviews(dom):
    number_of_reviews = dom.xpath('//*[@id="92384e5c-2234-4e8f-bef7-e80391889cfc"]/div/span[2]/text()')
    if number_of_reviews:
        number_of_reviews = re.sub(r'[\,\(\)\[\]\'\"]', '', str(number_of_reviews))
    else:
        number_of_reviews = "Number of reviews is not available"
    df.loc[each_product, "Number_of_reviews"] = number_of_reviews
    return number_of_reviews

Ratings of the Products :

def Star_Rating(dom):
    star_rating = dom.xpath('//*[@id="92384e5c-2234-4e8f-bef7-e80391889cfc"]/div/a/span/text()')
    if star_rating:
        # Remove the "Q & A" / "Ask A Question" link text and stray punctuation
        star_rating = re.sub(r'Q & A|Ask A Question|[\,\(\)\[\]\'\"]', '', str(star_rating)).strip()
    else:
        star_rating = "Star rating is not available"
    df.loc[each_product, "Star_rating"] = star_rating
    return star_rating

Ingredients of the Products :

def Ingredients(dom):
    ingredients = dom.xpath("//*[@aria-controls='Ingredients']//p/text()")
    if ingredients:
        ingredients = re.sub(r'[\[\]\']', '', str(ingredients))
    else:
        ingredients = "Ingredients are not available"
    df.loc[each_product, "Ingredients"] = ingredients
    return ingredients

In the next step, we call the functions. The loop iterates over the scraped product links, stores each URL in the Product_url column, and passes it to the extract_content() function to get the page content as a dom object. It then calls Brand(), Product_name(), Reviews(), Star_Rating(), and Ingredients() on that object to extract the specific data from the page.

for each_product, product_url in enumerate(product_links):
    # .loc with a new index label appends a row for this product
    df.loc[each_product, 'Product_url'] = product_url
    product_content = extract_content(product_url)
    Brand(product_content)
    Product_name(product_content)
    Reviews(product_content)
    Star_Rating(product_content)
    Ingredients(product_content)

Extraction of Price of the Products :

Here is the function to extract the price of a product from the web page. We iterate through the products one by one. Sometimes, when we try to extract data with BeautifulSoup alone, we can’t access dynamic content: BeautifulSoup only parses the HTML it is given and cannot execute JavaScript. In that case, we use Selenium to get the data, because Selenium drives a full browser with a JavaScript engine. Whenever the loop picks up a URL, we use XPath to find the above attribute. Once the attribute is extracted, the data is added to the corresponding column. Sometimes the data is obtained in a different format, so we remove the unwanted characters here.

def Price():
    try:
        prices = driver.find_element(By.XPATH, '//*[@id="1b7a3ab3-2765-4ee2-8367-c8a0e7230fa4"]/span').text
        prices = re.sub(r'[\$\,\(\)\[\]\'\"]', '', prices)
    except NoSuchElementException:
        prices = "Price is not available"
    return prices

Similarly, we can extract the Fragrance Description and Details.

Fragrance Description of the Products:

def Fragrance_Description():
    try:
        element = driver.find_element(By.XPATH, '//*[@id="b46bc3ad-9907-43a6-9a95-88c160f02d7f"]/p')
        description = re.sub(r'[\[\]]', '', element.text)
    except NoSuchElementException:
        description = "Fragrance description is not available"
    return description

Details of the Products:

def Detail():
    try:
        details = driver.find_element(By.XPATH, "//*[@aria-controls='Details']").text
    except NoSuchElementException:
        details = "Details are not available"
    return details

To extract the details, we first need to click on the ‘+’ button that expands the Details accordion. We click this button with Selenium using XPath.


The ‘Details’ data includes information about the Composition, Fragrance Family, Scent Type, Key Notes, and Features of each Women’s Fragrance product, so we can extract that information into separate columns if we need to.
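For example, if the accordion’s text comes back as one newline-separated block (the field values below are hypothetical), it can be split into a dictionary of named fields like this:

```python
# Hypothetical raw "Details" text, as returned by the accordion's .text
details = "Composition: Eau de Parfum\nFragrance Family: Floral\nScent Type: Warm Floral"

# Split each line on the first ": " to get (field, value) pairs
parsed = dict(line.split(': ', 1) for line in details.splitlines())
print(parsed['Fragrance Family'])   # Floral
```

Each key of the resulting dictionary could then become its own dataframe column.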

In the next step, we call the functions. A loop iterates over the rows of the dataframe and extracts the remaining data from the web page at the URL in the Product_url column of each row. The data is extracted using the Price(), Fragrance_Description(), and Detail() functions and added to the corresponding columns of the dataframe.

for each_product in range(len(df)):
    driver.get(df.loc[each_product, 'Product_url'])
    df.loc[each_product, 'Price'] = Price()
    df.loc[each_product, 'Fragrance Description'] = Fragrance_Description()
    df.loc[each_product, 'Details'] = Detail()


Finally, we write the data of each Women’s Fragrance product to a CSV file.

# Converting the data to a CSV file
df.to_csv('ulta_women_fragrance.csv', index=False)

Python and Selenium are powerful tools for web scraping dynamic websites. Python is a popular and easy-to-learn programming language that offers a wide range of libraries and frameworks for handling different types of data. Selenium is a browser automation library that can simulate user interactions with websites, making it possible to scrape websites that use JavaScript and other dynamic features. Together, Python and Selenium can be used to effectively and efficiently scrape data from dynamic websites, making them a valuable choice for data extraction tasks.

Want to gain a competitive advantage by gathering product information through web scraping?

Unlock the power of data with our web scraping services. Don’t let your competition stay ahead; contact us today and see how we can help you gain a competitive edge!

Related Reading:

  1. How to Build an Amazon Price Tracker using Python

  2. The Most Popular Smartwatches: A Data-Driven Comparison

  3. How to Scrape Data from Flipkart using Python