Training Platforms

Adding new Platforms to ScrapeBox

Platform Guide

  • Train new Platforms
  • Enhance Existing Platforms
  • Detect Different Captchas
  • Modify Search Footprints
  • Multi-step Forms
  • Change Success/Fail Footprints

[Screenshot: a platform definition .ini file open in Notepad]

Since 2011, ScrapeBox has had the ability to learn new platforms, and it can post to virtually any platform or form that doesn’t require a user account to be created on the website. This includes blog platforms, guestbooks, contact forms, trackbacks, some open forums and wikis.

To work with a platform, you need to create a definition file, which is just a plain text file using the Microsoft .ini format like the screenshot above. It consists of [Sections], each containing a number of Name=value keys. The first section in the ScrapeBox platform files is…

[Setup]

The setup defines the basics: which footprints the harvester uses to find the platform, how ScrapeBox identifies the platform once it loads a page, how ScrapeBox detects whether a comment posted to the platform succeeded or failed, and how to handle URLs and navigate between pages. Below are the Name= entries that are valid for the setup.

FriendlyName= Any name you want to call the platform, will be used in the GUI.
UseBlackList= Values can be 1 to use the blacklist or 0 to not use the blacklist. This is the bad words list you can edit in the poster.
UseWhiteList= Values can be 1 to use the whitelist or 0 to not use the whitelist. This is the whitelist you can edit in the poster.
Platform= This is the type of platform it is, such as Blog, GuestBook, Image, Forum, Contact Form or Trackback, and is used to group similar platforms.
Markup= How to handle links and code, values can be HTML or BB
PageMustContain= If any of the given strings can be found in the pagecode, the page is valid. | is interpreted as OR, * is interpreted as AND
Success= If any of the given strings can be found in the resultpage after post, the submission was a success. | is interpreted as OR, * is interpreted as AND
Failed= If any of the given strings can be found in the resultpage after post, the submission failed. | is interpreted as OR, * is interpreted as AND

All platform definition files should have the above fields added and set; they are essentially the minimum “Required” fields that form the [Setup] for a platform file. The fields below are not required, but are often needed to perform more advanced functions in order to post to some platforms.
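
As a rough illustration, a minimal [Setup] using only the required keys might look like the sketch below. The platform name and all marker strings are hypothetical placeholders, not values from a real definition file. The ; comment lines follow the common .ini convention; remove them if ScrapeBox does not accept comments.

```ini
; Hypothetical example: every value below is a placeholder.
[Setup]
FriendlyName=Example Guestbook
UseBlackList=1
UseWhiteList=0
Platform=GuestBook
Markup=HTML
; The page is treated as this platform if either marker is found (| = OR).
PageMustContain=Powered by Example Guestbook|gb_comment
; The post counts as a success if either string is in the result page.
Success=Thank you for your entry|Your message has been added
; The post counts as failed if either string is in the result page.
Failed=Invalid security code|Your entry was rejected
```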

PageMustNotContain= If any of the given strings can be found in the pagecode, the page is invalid. | is interpreted as OR, * is interpreted as AND
Enctype= The encoding type, if you wish to override the form’s default encoding, such as application/x-www-form-urlencoded
LoadUrl= Locate the given url and load the target page. Will skip RemoveFromUrl, RemoveFromUrlAfter, and ModifyUrl
LoadUrlFromAnchor= Locate the given anchor, grab the url and load the target page. Will skip RemoveFromUrl, RemoveFromUrlAfter, and ModifyUrl
RemoveFromUrl= Remove given strings from the baseurl. Multiple strings are separated with |
RemoveFromUrlAfter= Remove everything from the position of given strings in the baseurl. Multiple strings are separated with |
ModifyUrl= Add something to the baseurl. The variables %host% and %path% can be used to rebuild the baseurl.
DeleteCookies= List of cookie names to delete

Guestbook Example

Here you can see a basic example of the [Setup] for Bella Guestbook.

The PageMustContain, PageMustNotContain, Success and Failed values are markers that ScrapeBox scans the page contents for, so you can use text, HTML, JavaScript or anything else that appears in the page content.

This platform also uses two optional values, RemoveFromUrl and ModifyUrl. These tell ScrapeBox that when it lands on the guestbook, no matter which page, it should trim index.php and sign.php and everything after them (such as query strings) from the URL, then load %host%%path%sign.php. So if it landed on scrapebox.com/guestbook/index.php?page=123 it would strip the last part and load scrapebox.com/guestbook/sign.php

This is used when the page you need to post the comment on is different from the page you load, so you can train ScrapeBox to navigate to the correct page to make the post.
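
The URL handling described above could be written along these lines. The actual Bella Guestbook definition is not reproduced here, so the values are inferred from the description and should be treated as assumptions. Only the two optional keys are shown; the required [Setup] keys would sit alongside them.

```ini
; Sketch only: values inferred from the description, not copied from the real file.
[Setup]
; Strip the page name from whichever guestbook page ScrapeBox lands on.
RemoveFromUrl=index.php|sign.php
; Rebuild the URL so the signing page is loaded instead.
ModifyUrl=%host%%path%sign.php
```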

Once the [Setup] has been created, next is the [Step] which deals with making the post. The following are the available options and variables for the Step sections.

DoStepIf= Process this step only when any of the given strings can be found in the page code. | is interpreted as OR, * is interpreted as AND. If not set, the step will be processed always.
FormMustContain= The form is valid when any of the given strings can be found in the form. | is interpreted as OR, * is interpreted as AND
FormMustNotContain= If the form contains any of the given strings, the form is invalid. | is interpreted as OR, * is interpreted as AND
PostUrl= A | separated list of url parts used to grab the post url. It looks between <form and >
AddToPostUrl= A value added to post url. Masks (%…%) can be used.
DelayPost= Delay post by the given number of seconds. The variable %rndnum-x-y% can be used too.
DelayPostIf= Only delay the post when any of the listed strings can be found. Multiple strings are separated with |
AddToPostDataIfInpage= Will add all AddToPostData= fields when any of the given strings is found in the pagecode. Multiple strings are separated with |
AddToPostData= fieldname=variable will be added to the postdata when the AddToPostDataIfInpage condition is true. When no AddToPostDataIfInpage is set, AddToPostData will always be added.
EncodeFieldNames= 1 will url encode fieldnames.

Fieldnames can contain * as a wildcard. So if fieldname is captcha_code123 where 123 is different on each blog/post then captcha_code*=%captcha% will match.
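
A [Step] built from these options might look like the following sketch. The form markers, field names and delay range are hypothetical. The fieldname=variable line at the end follows the pattern of the wildcard example above. The ; comment lines are standard .ini convention and may need to be removed if ScrapeBox does not accept them.

```ini
; Hypothetical [Step]: markers and field names are placeholders.
[Step]
; Only use a form that contains the comment field and is not a search form.
FormMustContain=gb_comment
FormMustNotContain=search
; Part of the post url to look for between <form and >.
PostUrl=sign.php
; Wait a random 3 to 8 seconds before posting.
DelayPost=%rndnum-3-8%
; Add an extra field only when the page contains the marker.
AddToPostDataIfInpage=gb_token
AddToPostData=gb_time=%unixtimestamp%
; Wildcard: intended to match captcha_code123, captcha_code456, and so on.
captcha_code*=%captcha%
```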

Variables:
All ini settings that use variables allow spintax, for example thename={%rnd-name%|%rnd-email%} is valid. Values assigned to variables also allow spintax.
%host% Represents the host name of the target url
%path% Represents the path of the target url
%rnd-name% Returns a random name from the file ~cpn.txt. Spintax allowed.
%rnd-email% Returns a random email from the file ~cpe.txt. Spintax allowed.
%rnd-website% Returns a random website from the file ~cpw.txt. Spintax allowed.
%rnd-comment% Returns a random comment from the file ~cpc.txt. Spintax allowed.
%rnd-option% Returns a random option. Values are grabbed from the <select>/<option> tags of the form.
%rnd-location% Spintax allowed.
%rndnum-x-y% Returns a random number between x and y.
%ignore% Just use the original value represented in the form.
%user-domain% The domain extracted from the user’s website previously generated by %rnd-website%
%user-name% The username previously generated by %rnd-name%
%user-email% The email previously generated by %rnd-email%
%user-comment% The comment previously generated by %rnd-comment%
%user-location% The location previously generated by %rnd-location%
%user-website% The website previously generated by %rnd-website%
%wphashcash% Result of WPHashCash processing (internal code)
%captcha% Image captcha result
%question% Text captcha result
%serverstatus-200% Represents server status code 200
%serverstatus-302% Represents server status code 302
%header-xxxx% Checks the post header for the presence of xxxx.
%unixtimestamp% Returns the current unix timestamp
%unixtimestampms% Returns the current unix timestamp in milliseconds
%xxxxxx% Executes the section with the name xxxxxx
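
To show how the variables fit together, here is a sketch of field assignments inside a [Step]. The field names on the left are hypothetical and depend on the target form; only the variables on the right come from the list above.

```ini
; Hypothetical field names; real forms will use their own.
[Step]
gb_name=%rnd-name%
gb_email=%rnd-email%
gb_url=%rnd-website%
gb_comment=%rnd-comment%
; Spintax: one of the alternatives is chosen.
gb_city={%rnd-location%|London|Paris}
; Keep whatever value the form already holds for this field.
gb_hidden=%ignore%
gb_captcha=%captcha%
```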

You can have multiple [Step] configured for multi-step forms that may require you to fill out info on 2 or more pages.

Comment Poster Tutorial

View our video tutorial showing the Comment Poster in action. This feature is included with ScrapeBox, and is also compatible with our Automator Plugin.
