Advanced Crawling Actions
Master the available crawler actions to efficiently extract data from complex websites.

Understanding Crawler Actions
Scrapify allows you to create sophisticated crawler sequences by combining different actions. This tutorial will cover all available actions and how to use them effectively to extract data from even the most complex websites.
Important Note
When creating crawlers, always practice ethical web scraping by respecting robots.txt files, implementing reasonable delays between requests, and avoiding excessive server load. Scrapify provides built-in tools to help you scrape responsibly.
Navigation Actions
Click Button Action
The Click Button action simulates a user clicking on an element on the webpage. This is essential for navigating through pages, submitting forms, or interacting with the website.
- When you add a Click Button action, you'll be prompted to select the element to click
- The crawler identifies clickable elements (buttons, links, etc.) as you hover over them
- Once selected, the element's XPath and other identifying attributes are stored for reliable selection during crawling
- This action is useful for navigating through pagination, opening dropdown menus, or clicking on product details
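Scrapify handles this for you in the visual editor, but if you're curious what a Click Button action boils down to, here is a minimal browser-automation sketch using Selenium. The URL and XPath are hypothetical examples, not Scrapify's actual internals.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/products")  # hypothetical target page

# Locate the stored element by its XPath and click it,
# just as a Click Button action does during a crawl.
next_button = driver.find_element(By.XPATH, "//a[@class='pagination-next']")
next_button.click()
```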
Hover Action
The Hover action simulates a user hovering over an element, which can be essential for websites that reveal content or navigation options on hover.
- Select any element on the page to hover over
- This action is useful for dropdown menus that only appear on hover
- Can be combined with a subsequent Click action to navigate multi-level menus
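A hover-then-click sequence might look like this in plain Selenium terms; the page and XPaths are made up for illustration.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains

driver = webdriver.Chrome()
driver.get("https://example.com")  # hypothetical page with a hover menu

# Move the mouse over the menu trigger so the hidden submenu renders,
# then click one of the revealed entries.
menu = driver.find_element(By.XPATH, "//li[@id='categories']")
ActionChains(driver).move_to_element(menu).perform()
driver.find_element(By.XPATH, "//li[@id='categories']//a[text()='Shoes']").click()
```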
Scroll to Element Action
This action scrolls the viewport until the selected element is visible, which is useful for websites with lazy-loading content.
- Select any element on the page to scroll to
- Especially useful for "infinite scroll" websites where content loads as you scroll down
- Ensures that elements are in view before attempting to interact with them
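In automation terms, this amounts to a scrollIntoView call on the stored element. A sketch with an assumed page and XPath:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/feed")  # hypothetical lazy-loading page

# Scroll the target into the viewport so lazy-loaded content renders
# and the element becomes interactable.
target = driver.find_element(By.XPATH, "//div[@id='reviews']")
driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", target)
```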
Scroll to Bottom Action
This action scrolls the page all the way to the bottom, useful for triggering lazy-loading or infinite scrolling mechanisms.
- No element selection is needed; the action simply scrolls to the bottom of the current page
- Particularly useful for social media feeds or product listings that load more items when you reach the bottom
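A common pattern this action enables is "scroll until no new content appears." Here is one way to express that idea in Selenium, assuming a hypothetical infinite-scroll page:

```python
import time

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://example.com/listings")  # hypothetical infinite-scroll page

# Scroll to the bottom repeatedly until the page height stops growing,
# i.e. no more items are being lazy-loaded.
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # give the new batch of items time to load
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
```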
Input Actions
Enter Text Action
The Enter Text action allows you to input text into form fields, search boxes, or any text input element.
- Select an input element on the page
- Specify the text you want to enter
- This action is essential for search forms, login credentials, or filter inputs
- Combine with Click Button to submit forms after entering text
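The equivalent low-level sequence, sketched with an assumed search form, might be:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com")  # hypothetical page with a search form

# Type a query into the search box, then submit it with a button click,
# mirroring an Enter Text action followed by a Click Button action.
search_box = driver.find_element(By.XPATH, "//input[@name='q']")
search_box.clear()
search_box.send_keys("wireless headphones")
driver.find_element(By.XPATH, "//button[@type='submit']").click()
```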
Keypress Action
The Keypress action simulates pressing a specific key on the keyboard, which can trigger various website behaviors.
- Specify which key to press (e.g., Enter, Escape, Space)
- Useful for submitting forms with Enter, closing modals with Escape, or advancing carousels with arrow keys
- No element selection is needed as this action applies to the entire page
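In Selenium terms, a page-level keypress can be sent through an action chain without selecting an element first; the keys below are just examples.

```python
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()
driver.get("https://example.com")  # hypothetical page

# Send a key to the page as a whole rather than to a specific element:
# Escape to dismiss a modal, then a right arrow to advance a carousel.
ActionChains(driver).send_keys(Keys.ESCAPE).perform()
ActionChains(driver).send_keys(Keys.ARROW_RIGHT).perform()
```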
Timing and Control Actions
Wait Action
The Wait action pauses the crawler for a specified amount of time. This is crucial for several reasons:
- Specify a duration in seconds (up to 30 seconds)
- Allows time for dynamic content to load after actions like clicking or scrolling
- Helps the crawler appear more human-like and avoid detection
- Reduces server load by spacing out requests
- Essential when navigating single-page applications where content loads dynamically
Pro Tip
Add Wait actions with varying durations (2-5 seconds) between navigation actions to make your crawler behave more like a human user. This both improves reliability by giving pages time to load and helps avoid triggering anti-bot measures on websites.
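If you scripted this yourself, the randomized wait from the tip could be a small helper like the following; the 2-5 second range is the example from the tip, not a Scrapify default.

```python
import random
import time


def human_wait(min_s: float = 2.0, max_s: float = 5.0) -> None:
    """Pause for a random duration, like a Wait action with varied timing."""
    time.sleep(random.uniform(min_s, max_s))


# ... click or scroll ...
human_wait()  # let dynamic content load and space out requests
# ... next action ...
```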
Loop Action
The Loop action allows you to repeat a sequence of actions over a set of similar elements or URLs, essential for handling pagination or processing lists of items.
Scrapify supports several loop types:
- Single Element Loop: Repeatedly perform actions on a single element (useful for clicking "Load More" buttons multiple times)
- Fixed List Loop: Iterate through a pre-defined list of similar elements (like product cards)
- Variable List Loop: Dynamically identify and loop through similar elements based on a reference element
- URL List Loop: Crawl multiple pages from a list of URLs
Loop actions can contain nested actions that will be executed for each iteration of the loop.
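To show the shape of the control flow, here are two of these loop types sketched in Selenium; the button text and URLs are invented for illustration.

```python
import time

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/products")  # hypothetical listing page

# Single Element Loop: click the same "Load More" button until it disappears.
while True:
    try:
        driver.find_element(By.XPATH, "//button[text()='Load More']").click()
    except NoSuchElementException:
        break  # button is gone, so all items are loaded
    time.sleep(3)  # nested Wait executed on every iteration

# URL List Loop: visit each starting URL in turn.
for url in ["https://example.com/page/1", "https://example.com/page/2"]:
    driver.get(url)
    time.sleep(3)
    # ... nested actions for each page go here ...
```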
Data Extraction
Scrape Action
The Scrape action is at the core of data extraction, allowing you to select and extract content from elements on the page.
- Select individual elements or tables to extract data from
- Use "Similar Elements" mode to automatically identify and scrape similar elements across the page
- Choose from different selection strategies for similar elements:
  - Highest Frequency: Select elements with the most common pattern (default)
  - Lowest Frequency: Select elements with less common patterns
  - Longest Path: Prioritize elements with more specific XPaths
  - Shortest Path: Use broader XPaths for selection
  - ClassList: Match elements with identical CSS classes
- Specify custom column headers for the extracted data
- Extract data from tables with automatic row/column detection
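Conceptually, scraping similar elements means "one selector, many matches." A rough Selenium equivalent that writes two custom columns to CSV, where the product-card selectors are assumptions about the target page:

```python
import csv

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/products")  # hypothetical listing page

# One selector matches every product card; here the shared CSS class
# plays the role of the ClassList selection strategy.
cards = driver.find_elements(By.CSS_SELECTOR, "div.product-card")

# Write the extracted fields under custom column headers.
with open("products.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "price"])  # custom headers
    for card in cards:
        name = card.find_element(By.CSS_SELECTOR, "h2").text
        price = card.find_element(By.CSS_SELECTOR, "span.price").text
        writer.writerow([name, price])
```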
Building Advanced Crawler Workflows
Combining Actions for Complex Navigation
Real-world crawling tasks often require combining multiple actions into a logical sequence. Here's an example workflow for extracting product data across multiple pages (a code sketch of the same sequence follows the list):
1. Page Navigation Setup:
   - Start with a Loop (URL List) to process multiple starting URLs
   - Within the loop, add a Wait action (3 seconds) to ensure the page loads completely
2. Category Selection:
   - Add a Click Button action to select a product category
   - Add a Wait action (2 seconds) for the category page to load
3. Filter Application:
   - Add a Click Button action to open a filter dropdown
   - Add a Click Button action to select a specific filter option
   - Add a Wait action (3 seconds) for filtered results to load
4. Data Extraction:
   - Add a Scrape action with Similar Elements enabled to extract all product listings
5. Pagination:
   - Add a Loop (Single Element) to click through the pagination control
   - Inside the loop, add a Click Button action targeting the "Next Page" button
   - Add a Wait action (3 seconds) for the new page to load
   - Add another Scrape action to extract products from each page
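For readers who think in code, here is roughly the same workflow expressed with Selenium. Every URL, XPath, and selector below is a placeholder for your target site; the structure of the sequence, not the specifics, is the point.

```python
import time

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
results = []

# 1. URL List Loop over the starting pages
for url in ["https://example.com/shop/a", "https://example.com/shop/b"]:
    driver.get(url)
    time.sleep(3)  # Wait: let the page load completely

    # 2. Category selection
    driver.find_element(By.XPATH, "//a[text()='Electronics']").click()
    time.sleep(2)

    # 3. Filter application
    driver.find_element(By.XPATH, "//button[@id='filter-toggle']").click()
    driver.find_element(By.XPATH, "//label[text()='In stock']").click()
    time.sleep(3)

    # 4 and 5. Scrape each page, then paginate until "Next Page" disappears
    while True:
        for card in driver.find_elements(By.CSS_SELECTOR, "div.product-card"):
            results.append(card.text)
        try:
            driver.find_element(By.XPATH, "//a[text()='Next Page']").click()
        except NoSuchElementException:
            break  # last page reached
        time.sleep(3)
```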
Best Practice
When building complex workflows, always test with a small subset of pages first. This allows you to identify and fix any issues before scaling up to the full dataset. Remember to include adequate Wait actions between steps to ensure reliable crawling.
Anti-Detection Strategies
Many websites implement measures to detect and block automated crawlers. Here are strategies to make your crawler more human-like and avoid detection:
- Variable wait times: Add Wait actions with different durations (2-7 seconds) between actions
- Natural navigation paths: Include occasional clicks on non-target elements before returning to your main crawling path
- Scroll actions: Add scroll actions between clicks to simulate human reading behavior
- Session management: Maintain consistent sessions rather than creating new ones for each request
- Rate limiting: Limit the number of pages you crawl per minute
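The last two points can be combined into a small throttling helper. This is a sketch of the idea, not a Scrapify feature, and the 10-pages-per-minute budget is an arbitrary example value.

```python
import random
import time

MAX_PAGES_PER_MINUTE = 10  # arbitrary example budget
_minute_start = time.monotonic()
_pages_this_minute = 0


def throttled_visit(driver, url: str) -> None:
    """Load a page while enforcing a per-minute budget and jittered delays."""
    global _minute_start, _pages_this_minute
    now = time.monotonic()
    if now - _minute_start >= 60:
        _minute_start, _pages_this_minute = now, 0  # new minute, reset budget
    if _pages_this_minute >= MAX_PAGES_PER_MINUTE:
        time.sleep(60 - (now - _minute_start))  # wait out the current minute
        _minute_start, _pages_this_minute = time.monotonic(), 0
    time.sleep(random.uniform(2, 7))  # variable wait between requests
    driver.get(url)
    _pages_this_minute += 1
```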
Troubleshooting Common Crawler Issues
Even well-designed crawlers can encounter issues. Here are solutions to common problems:
- Elements not found: Use more robust element identification by combining XPath with other attributes like ID, text content, or class lists
- Dynamic content not loading: Increase the duration of Wait actions to allow more time for JavaScript rendering
- Inconsistent scraping results: Try different selection strategies for Similar Elements to find the most reliable pattern
- Navigation failures: Add conditional checks and retry logic using Loop actions to handle unexpected site behaviors
- Being blocked or rate-limited: Implement longer Wait times and more human-like browsing patterns
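The first and fourth points can also be combined in code: try several locators for the same element, and retry with a delay before giving up. A sketch of that pattern, where the locators in the commented usage example are hypothetical:

```python
import time

from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By


def find_with_fallbacks(driver, locators, retries=3, delay=2):
    """Try several locators in order, retrying to ride out slow rendering."""
    for _ in range(retries):
        for by, value in locators:
            try:
                return driver.find_element(by, value)
            except NoSuchElementException:
                continue
        time.sleep(delay)  # give dynamic content more time, then retry
    raise NoSuchElementException(f"No locator matched after {retries} attempts")


# Example usage: XPath first, then an ID, then a class-based match.
# next_btn = find_with_fallbacks(driver, [
#     (By.XPATH, "//a[text()='Next Page']"),
#     (By.ID, "next-page"),
#     (By.CSS_SELECTOR, "a.pagination-next"),
# ])
```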
Conclusion
By mastering the various crawler actions in Scrapify and combining them effectively, you can build powerful data extraction workflows for even the most complex websites. Remember to maintain ethical scraping practices by:
- Respecting robots.txt files and website terms of service
- Implementing reasonable delays between requests
- Only extracting publicly available data
- Limiting the frequency and volume of your crawling
In our next tutorial, we'll cover how to effectively organize and export the data you've collected using Scrapify's data processing features.