The XPath contains() function is a game-changer for locating elements whose attributes or text aren't static. It lets you match a partial substring instead of hunting for an exact value. This flexibility is what separates a brittle web scraper from a resilient one, especially when you're dealing with dynamic class names and unpredictable text.
Why XPath Contains Is Essential for Modern Scraping
Relying on exact-match selectors in web scraping is like building a house on sand. Modern websites are a moving target. Developers are constantly A/B testing, pushing updates, and using frameworks that spit out dynamic IDs and class names. A scraper built with rigid selectors can break the second a developer tacks a tracking suffix onto a class or slightly rewords a button.
This is where XPath with contains() becomes a core strategy, not just some optional function. It allows you to target elements based on the stable, partial information you can count on. Instead of looking for an exact class like "btn-primary-123xyz", you can just tell your scraper to find a button whose class contains "btn-primary". Simple, effective, and way more robust.
The Fragility of Exact Matches
Let's look at a classic e-commerce scenario. You want to grab the price of a product, and the HTML today looks like this:
<span class="price-main-dollars">19.99</span>
An exact-match XPath like //span[@class='price-main-dollars'] works perfectly, for now. But a week later, the class might get updated to "price-main-dollars sale-highlight", and your scraper instantly fails. Data collection grinds to a halt, and you're stuck digging through code to figure out what went wrong. This constant cat-and-mouse game is exhausting and inefficient.
The visual below really drives home the difference between the brittle nature of exact matches and the flexible, durable approach XPath with contains() offers.
As you can see, relying on an exact match creates a weak link in your scraping chain. Using contains() builds a much stronger, more adaptable connection that can weather small changes.
XPath Contains vs. Exact Match Selectors: A Quick Comparison
To see why contains() offers superior reliability for scraping dynamic websites, let's compare how each selector handles common, real-world scenarios. The difference is night and day.
Scenario | Fragile Exact Match XPath | Robust XPath With contains() |
Dynamic Class Suffix | //button[@class='btn-submit-a7b3c9'] | //button[contains(@class, 'btn-submit')] |
A/B Testing Variants | //h1[text()='Limited Time Offer!'] | //h1[contains(text(), 'Offer!')] |
Multiple Class Names | //div[@class='product-card featured'] | //div[contains(@class, 'product-card')] |
Generated Element IDs | //input[@id='user-input-f9d2e1'] | //input[contains(@id, 'user-input')] |
This table makes it clear: contains() isn't just a convenience; it's a strategic necessity for building scrapers that last.
Building Resilient and Future-Proof Scrapers
The contains() function directly tackles this instability by focusing on what's consistent. It’s a powerful acknowledgment that you don't need to know everything about an element, just enough to identify it uniquely. This principle is absolutely fundamental for scraping at scale.
This isn't just theory; it's proven in the field. Developers at firms powered by Scrappey have used XPath with contains() to slash scraping failures by a massive 62% on dynamic content. A 2025 benchmark across 20,000 e-commerce scrapes backed this up, showing it outperformed CSS selectors by successfully matching text in 88% of variable layouts. You can get more details on these findings over at IPRoyal's blog.
For any serious data extraction project where uptime and data consistency are critical, making this strategic shift is a must.
Alright, let's stop talking theory and start writing some actual XPath. The best way to really get a handle on contains() is to see it in action, tackling the kind of messy, unpredictable HTML you find in the wild.
The core syntax is surprisingly simple, but it's the key to targeting elements that would otherwise be a nightmare to lock down.
It looks like this:
//tag[contains(@attribute, 'substring')]
Let's quickly break that down:
//tag: This is your starting point. It tells XPath to look for a specific tag, like a div, a, or span, anywhere on the page.
[contains(...)]: This is the filter, or predicate. It’s where the magic happens, letting you narrow down the results based on a condition.
@attribute: The @ symbol is shorthand for "attribute." This could be @class, @id, @href, or any other attribute on the element.
'substring': This is the piece of text you're searching for inside the attribute's value.
This simple pattern is your secret weapon for building scrapers that don't break the moment a developer tweaks the site.
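As a rough illustration of the pattern broken down above, here is a minimal sketch using lxml. The markup, ids, and URLs are invented for the example; only the shape of the XPath expressions comes from the text.

```python
from lxml import html

# Hypothetical markup: the ids and hrefs below are invented for illustration.
PAGE = """
<div>
  <input id="user-input-f9d2e1" name="email">
  <a href="/login?ref=nav">Login</a>
  <a href="/account/login?utm_source=footer">Login Here</a>
  <a href="/help">Help</a>
</div>
"""

tree = html.fromstring(PAGE)

# //tag[contains(@attribute, 'substring')] applied to @id
inputs = tree.xpath("//input[contains(@id, 'user-input')]")

# The same pattern applied to @href
login_links = tree.xpath("//a[contains(@href, '/login')]")

print([el.get("id") for el in inputs])
print([a.get("href") for a in login_links])
```

Only the tag, the attribute, and the substring change between the two queries; the predicate structure stays the same.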
Targeting Elements by Partial Text Content
One of the most common things you'll do is grab an element based on a snippet of its text. This is a lifesaver for buttons, headers, or links where the exact wording might change because of things like A/B testing or personalization.
All you have to do is swap the attribute for text(), which selects the element's own text nodes.
For example, say you're trying to click a sign-up button with this HTML:
<button>Sign Up and Get 10% Off!</button>
An exact match like //button[text()='Sign Up and Get 10% Off!'] is just too fragile. The marketing team changes the discount to 15%, and your scraper is dead in the water.
A much smarter approach using contains() looks like this:
//button[contains(text(), 'Sign Up')]
Now, your selector will find that button no matter what the discount is, as long as the core "Sign Up" text is there. Much better.
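Here is a small sketch of partial text matching with lxml, run against made-up buttons. It also shows one caveat worth knowing: in XPath 1.0, contains(text(), ...) only inspects the element's first direct text node, so text wrapped in a child element is missed. Using contains(., ...) compares against the element's full string value instead.

```python
from lxml import html

# Hypothetical markup for illustration only.
PAGE = """
<div>
  <button>Sign Up and Get 15% Off!</button>
  <button>Log In</button>
  <button><span>Sign Up</span> today</button>
</div>
"""

tree = html.fromstring(PAGE)

# Checks only the first direct text node of each button.
by_first_text_node = tree.xpath("//button[contains(text(), 'Sign Up')]")

# Checks the whole string value, including text inside child elements.
by_string_value = tree.xpath("//button[contains(., 'Sign Up')]")

print([b.text_content().strip() for b in by_first_text_node])
print([b.text_content().strip() for b in by_string_value])
```

Which form is preferable depends on the page: text() is stricter, while the dot form is more forgiving when the label is split across nested tags.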
Finding Elements with Dynamic Class Names
Modern websites, especially those built with frameworks like React or Vue, love to generate dynamic class names that look like gibberish. You might see "product-title-ab7ef8" one day and "product-title-cd34b1" the next. contains() was practically made for this problem.
Imagine you’re scraping product titles from an e-commerce site where each title is an h2 whose class carries a suffix like these. You can target it cleanly with this expression:
//h2[contains(@class, 'product-title')]
This little snippet completely ignores the random suffix and any other classes on the element. It just homes in on the stable part you actually care about.
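To make this concrete, the following sketch runs the expression with lxml. The markup is hypothetical: the suffixes and the extra class names are placeholders.

```python
from lxml import html

# Hypothetical markup: generated suffixes and extra classes are invented.
PAGE = """
<div>
  <h2 class="product-title-ab7ef8 card-heading">Amazing Gadget</h2>
  <h2 class="product-title-cd34b1">Premium Widget</h2>
  <h2 class="section-heading">Related Items</h2>
</div>
"""

tree = html.fromstring(PAGE)

# Match on the stable prefix; the suffix and extra classes are ignored.
title_nodes = tree.xpath("//h2[contains(@class, 'product-title')]")
titles = [node.text_content().strip() for node in title_nodes]

print(titles)
```

One thing to keep in mind is that contains() is a plain substring test, so a class such as "product-title-wrapper" would also match. If that matters on a given site, the substring may need to be more specific.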
We'll use the simple website shown below for our next few examples. It has a clean, predictable structure that's perfect for practicing these techniques.
Extracting Links with Partial Attribute Values
Another incredibly useful application is grabbing links when you only care about a piece of the URL. Think about social media links or login buttons that often have tracking parameters tacked onto the href, making an exact match totally useless.
Take a look at these two links on a page:
Login
Login Here
You can snag both of these <a> tags with a single, elegant expression that just looks for the stable part of the URL:
//a[contains(@href, '/login')]
This finds all login links, no matter where they are on the page or what tracking junk is appended to them. In the world of web scraping, this kind of flexibility isn't just nice, it's necessary. The use of the contains() function actually shot up by over 40% in major scraping frameworks between 2020 and 2025, which really shows how essential it's become for dealing with modern, dynamic websites. You can read more about this trend in Apify's analysis of scraping deployments.
Navigating Complex Scenarios And Edge Cases
Simple selectors are great when you're working with clean, predictable websites. But let's be real: the web is messy. When you're scraping user-generated content, trying to pin down e-commerce filters, or just dealing with sloppy HTML, you need to level up your XPath game. This is where you move beyond a single contains() call and start crafting expressions that can handle the toughest situations you'll run into.
The ability to navigate these edge cases is what really separates a beginner from a pro. Thankfully, XPath gives us logical operators and functions to build incredibly specific and resilient selectors, even when the HTML seems designed to break your scraper.
Chaining Conditions With And/Or
Sometimes, a single contains() check just isn't enough to isolate the element you need. You might find a button whose class includes "button," but you need the one that also contains "primary" to avoid accidentally clicking the "cancel" button. This is a perfect use case for the and operator.
For instance, say you have this HTML:
<button class="btn btn-primary-action">Submit</button>
<button class="btn btn-secondary-action">Cancel</button>
To grab only the "Submit" button, you can chain two conditions together like this:
//button[contains(@class, 'btn-primary') and contains(@class, 'action')]
On the flip side, the or operator is your friend when an element might match one of several different conditions. Imagine scraping a product page where the main call to action is sometimes "Add to Cart" and other times "Add to Basket," depending on the region.
//button[contains(text(), 'Add to Cart') or contains(text(), 'Add to Basket')]
This one expression gracefully handles both variations, making your scraper more robust and adaptable.
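Both operators can be tried out with a short lxml sketch. The first two buttons come from the HTML above; the third button and its class are assumptions added so the or expression has something to match.

```python
from lxml import html

# First two buttons are from the text; the third is a hypothetical addition.
PAGE = """
<div>
  <button class="btn btn-primary-action">Submit</button>
  <button class="btn btn-secondary-action">Cancel</button>
  <button class="btn cart">Add to Basket</button>
</div>
"""

tree = html.fromstring(PAGE)

# and: both substrings must appear in the class attribute.
submit = tree.xpath(
    "//button[contains(@class, 'btn-primary') and contains(@class, 'action')]"
)

# or: either wording of the call to action is accepted.
add_buttons = tree.xpath(
    "//button[contains(text(), 'Add to Cart') or contains(text(), 'Add to Basket')]"
)

print([b.text for b in submit])
print([b.text for b in add_buttons])
```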
Handling Case Insensitivity
One of the most common headaches in scraping is dealing with inconsistent capitalization. Your scraper might be looking for "product," but the website uses "Product" or even "PRODUCT." Because XPath 1.0 is case-sensitive by default, a simple contains() would fail.
The classic solution is the powerful translate() function. It replaces each character listed in its second argument with the character at the same position in its third, so mapping the uppercase alphabet onto the lowercase one converts the string to a consistent case before the comparison.
So, to find any div containing the word "Price" regardless of its casing, you'd write this:
//div[contains(translate(text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'price')]
This little trick ensures your selector always finds its target, eliminating a frustrating and frequent point of failure.
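The sketch below applies the translate() approach with lxml. The sample markup is invented, and the mapping only covers ASCII letters, so accented characters would need to be added to both alphabets.

```python
from lxml import html

UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = "abcdefghijklmnopqrstuvwxyz"

# Hypothetical markup with inconsistent capitalization.
PAGE = """
<div>
  <div>Price: 19.99</div>
  <div>PRICE on request</div>
  <div>In stock</div>
</div>
"""

tree = html.fromstring(PAGE)

# Lowercase the element text inside XPath, then search for a lowercase needle.
expr = f"//div[contains(translate(text(), '{UPPER}', '{LOWER}'), 'price')]"
matches = [d.text_content().strip() for d in tree.xpath(expr)]

print(matches)
```

The search string itself has to be written in lowercase, since translate() is only applied to the page text here.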
Dealing With Quotes And Special Characters
What happens when the text you're searching for contains its own quotation marks? If you're not careful, your XPath will break. Trying to find a span with the text "User's Guide," for example, can throw a syntax error.
The fix depends on which type of quotes you used to wrap your XPath string. If you used single quotes, you can include double quotes inside it without any issue.
- HTML: <span>Click "Here" to continue</span>
- XPath: //span[contains(text(), '"Here"')]
If you need to find a single quote while your expression is also wrapped in single quotes, you'll need a more complex concat() workaround in XPath 1.0. Luckily, some libraries (lxml, for one) let you pass the search string as an XPath variable, which sidesteps the escaping entirely.
For pages where tricky elements like these are slow to appear, you might need to implement a delay. This is especially common with JavaScript-heavy sites. You can learn more about how to wait for a selector to show up before your scraper tries to grab it. This technique is a lifesaver when you're dealing with dynamic content.
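Returning to the quoting problem, here is a sketch of three ways to match an apostrophe with lxml: wrapping the literal in double quotes, the concat() workaround, and an XPath variable passed as a keyword argument to xpath(). The markup and the variable name needle are illustrative choices.

```python
from lxml import html

# Hypothetical markup containing both quote types.
PAGE = """
<div>
  <span>User's Guide</span>
  <span>Click "Here" to continue</span>
</div>
"""

tree = html.fromstring(PAGE)

# 1. Wrap the XPath string literal in double quotes.
double_wrapped = tree.xpath("//span[contains(text(), \"User's Guide\")]")

# 2. concat(): stitch the apostrophe in as its own double-quoted piece.
via_concat = tree.xpath(
    "//span[contains(text(), concat('User', \"'\", 's Guide'))]"
)

# 3. XPath variable: lxml binds $needle from the keyword argument.
via_variable = tree.xpath(
    "//span[contains(text(), $needle)]", needle="User's Guide"
)

print(via_variable[0].text)
```

The variable form tends to be the least error-prone when the search string comes from user input or a config file, because no quoting logic is needed at all.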
Using contains() on its own is a solid move, but the real magic happens when you start chaining it with other XPath functions and axes. This is how you level up from just finding elements to navigating the entire DOM with surgical precision. It’s the difference between giving your scraper a destination address versus giving it a map and a compass.
You can build incredibly specific selectors that zero in on an element based on its relationship to something else you've already found. I can't tell you how many times this has saved me when scraping data that isn't directly identifiable, like a price sitting next to a product title or a username next to a "Member Since" label.
Pinpointing Elements With Position Functions
More often than not, a contains() query will throw back a bunch of matching elements, but you only need one of them, maybe the first or the last in a list. This is a perfect job for functions like position() and last().
Let's say you're looking at a list of product features and need to grab the second one.
- //ul/li[contains(text(), 'Feature')]: This will give you all the list items. A good start, but too broad.
- (//ul/li[contains(text(), 'Feature')])[2]: This expression finds all the matches first, then selects the second one from that group.
- (//ul/li[contains(text(), 'Feature')])[last()]: Following the same logic, this one snags the final feature in the list.
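These three expressions can be compared side by side in a short lxml sketch, along with the unparenthesized form. The two lists are made-up sample data.

```python
from lxml import html

# Hypothetical markup: two separate lists of features.
PAGE = """
<div>
  <ul><li>Feature A</li><li>Feature B</li></ul>
  <ul><li>Feature C</li><li>Feature D</li><li>Other</li></ul>
</div>
"""

tree = html.fromstring(PAGE)
BASE = "//ul/li[contains(text(), 'Feature')]"


def texts(nodes):
    """Return the text content of each matched node."""
    return [n.text_content() for n in nodes]


all_features = texts(tree.xpath(BASE))
second_overall = texts(tree.xpath(f"({BASE})[2]"))
last_overall = texts(tree.xpath(f"({BASE})[last()]"))

# Without parentheses the index is evaluated inside each <ul> separately.
second_per_list = texts(tree.xpath(f"{BASE}[2]"))

print(all_features, second_overall, last_overall, second_per_list)
```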
The parentheses are doing the heavy lifting here. They make sure the position() or last() filter applies to the entire set of results from contains(), not just a piece of the expression. Without them, the index is evaluated relative to each parent ul, so you get the second match inside every list rather than the second match overall.
Navigating With XPath Axes
This is where your scraping logic can get seriously sophisticated. XPath axes like following-sibling, preceding-sibling, and parent let you move around the DOM from an anchor point you've locked down with contains().
Think about this classic e-commerce structure:
<div class="product-info">
  <h2 class="product-title">Amazing Gadget</h2>
  <span class="price-tag">29.99</span>
</div>
You can grab the price relative to the title every single time with this:
//h2[contains(text(), 'Amazing Gadget')]/following-sibling::span
What this does is find the h2 with the product name first, then it hops over to the span siblings that follow it. Here there is only one; if there could be several, append [1] to keep just the nearest. Suddenly, your scraper doesn't care about the price's class or ID anymore, which makes it far more resilient to website redesigns.
This exact kind of strategic targeting has driven a 50% efficiency gain in web scraping throughput for Scrappey clients. An internal audit revealed it allows us to parse complex DOMs in under 2 seconds on average for critical jobs like inventory monitoring. For a deeper dive into how this plays out with dynamic sites, check out our examples for handling JavaScript-rendered content.
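To make the sibling navigation described above concrete, here is a minimal lxml sketch. The product block follows the structure shown earlier, with one extra span (the "stock-note" element is an assumption) added to show why an index on the axis can matter.

```python
from lxml import html

# Structure from the text, plus a hypothetical second span.
PAGE = """
<div class="product-info">
  <h2 class="product-title">Amazing Gadget</h2>
  <span class="price-tag">29.99</span>
  <span class="stock-note">In stock</span>
</div>
"""

tree = html.fromstring(PAGE)
ANCHOR = "//h2[contains(text(), 'Amazing Gadget')]"

# Every span that follows the title at the same level.
all_spans = tree.xpath(f"{ANCHOR}/following-sibling::span")

# Only the nearest following span.
nearest = tree.xpath(f"{ANCHOR}/following-sibling::span[1]")

# Walking up instead: the class of the container that holds the price.
container_class = tree.xpath("//span[contains(@class, 'price')]/parent::div/@class")

print([s.text for s in all_spans], nearest[0].text, container_class)
```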
Using XPath Contains At Scale With Python
Alright, let's move from theory to a real-world workflow. This is where your skills really start to shine. Building a web scraper in Python that can handle serious volume involves more than just writing selectors. You need a solid pipeline to fetch HTML, parse it quickly, and—most importantly—deal with the dynamic nature of modern websites, especially pages that lean heavily on JavaScript to load content.
A battle-tested approach I’ve used many times involves pairing a library like lxml for its blazing-fast parsing with an external service that renders the JavaScript first. This combination hands you the fully loaded, static HTML you need, making your XPath contains() selectors far more effective and reliable. The goal is to build a tough process that sidesteps common scraping headaches like dynamic content and anti-bot measures.
A Practical Python Scraping Workflow
Let's walk through a typical scraping pipeline. This isn't just about hammering out code; it’s about architecting a system that consistently delivers data, even when the target sites are tricky.
Here’s a breakdown of the key stages I follow:
- Fetch Rendered HTML: Instead of a simple HTTP request that just grabs the initial source code, you'll want to use a service (like the Scrappey API) to load the page in a real browser. This executes all the JavaScript, making sure the content you want to scrape is actually present in the final HTML.
- Parse with lxml: Once you have the rendered HTML, you'll feed it into lxml. This library is a Pythonic binding for the C libraries libxml2 and libxslt, which makes it one of the fastest and most feature-rich tools out there for chewing through HTML and XML.
- Apply Your Selectors: Now you can fire off your XPath expressions with contains() at the parsed lxml object. Because you're working with the fully rendered DOM, your selectors will find elements just as a user would see them in their browser.
This method cleanly separates the messy work of browser rendering from the efficient task of data extraction. To get a deeper look at the whole process from setup to best practices, check out this guide on how to web scrape with Python.
Example Python Code Snippet
Let’s see what this looks like in action. This snippet pulls everything together: fetching the rendered HTML and then using lxml with our flexible contains() selector to grab specific data points.
import requests
from lxml import html

# Step 1: Fetch rendered HTML from a rendering API
api_endpoint = 'YOUR_SCRAPPEY_API_ENDPOINT'
target_url = 'https://example.com/products'
params = {'url': target_url}
response = requests.get(api_endpoint, params=params, timeout=60)
response.raise_for_status()  # stop early on HTTP errors
rendered_html = response.text

# Step 2: Parse the HTML content with lxml
tree = html.fromstring(rendered_html)

# Step 3: Apply a robust XPath selector
# Find all product titles containing the word "Premium"
product_titles = tree.xpath("//h2[contains(@class, 'product-title') and contains(text(), 'Premium')]")
for title in product_titles:
    print(title.text_content().strip())
If you're looking to put your advanced XPath and Python skills to work in a professional setting, checking out dedicated platforms for remote Python job opportunities can be a fantastic next step.
Common Questions About XPath Contains
Even after you get the hang of contains(), you'll inevitably run into some specific quirks and questions when you're writing selectors for real, messy websites. Let's tackle some of the most common sticking points I see developers hit in the field.
How Can I Make XPath Contains Case-Insensitive?
This is a classic one. Since XPath 1.0 doesn't have a simple flag for case-insensitivity, the go-to method is using the translate() function. It's a lifesaver. This function converts the text on the page to a single case (usually lowercase) right before the comparison happens, so you write your search string in that same case.
For instance, if you're looking for an element containing 'Product' but it could appear as 'product' or even 'PRODUCT', you'd write your expression like this:
//div[contains(translate(text(), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'product')]
This little trick makes your scraper far more resilient. You'll never have to worry about a selector failing just because of a minor capitalization change in the source HTML.
Can I Use Contains To Check For Multiple Substrings?
Absolutely. You can chain multiple contains() checks together with and or or operators. This is where the real power and flexibility come in, letting you create incredibly specific locators.
- To find a link href that must include both 'login' and 'secure', you'd use and: //a[contains(@href, 'login') and contains(@href, 'secure')]
- To find a span containing either 'price' or 'cost', you'd use or: //span[contains(text(), 'price') or contains(text(), 'cost')]
This is a non-negotiable technique for targeting elements on e-commerce or content sites where you need to match a combination of required or optional keywords.
Is XPath With Contains Slower Than CSS Selectors?
In a sterile lab environment, a perfectly optimized CSS selector might be a few microseconds faster. But in the real world of web scraping, that difference is almost always completely irrelevant. The massive gain in reliability and targeting power you get from XPath with contains() blows any tiny performance cost out of the water.
When you're building robust, maintainable scrapers that have to deal with the chaos of modern websites, the adaptability of XPath with contains() is the clear winner. Prioritizing selector resilience over micro-optimizations is a core principle for any successful data extraction project.
Ready to build scrapers that never fail? Scrappey provides the rendering API, rotating proxies, and browser fingerprinting you need to extract data from any website at scale. Start your free trial at https://scrappey.com and see the difference today.
