[PHP] Retrieve pages from an ASP driven site

Posted by EPA WC 
EPA WC
[PHP] Retrieve pages from an ASP driven site
May 03, 2012 06:40AM
Hi List,

I am trying to write a crawler to go through the web pages at
http://www.freebookspot.es/CompactDefault.aspx?Keyword=. But I am not
familiar with how ASP.NET uses the __doPostBack function with the "next"
button below the book list to advance to the next page. I hope someone
who knows ASP.NET well can help out here. I need to know how to retrieve
the next page with PHP code.

Kind regards,
Tom

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
Terry Ally (Gmail)
Re: [PHP] Retrieve pages from an ASP driven site
May 03, 2012 09:40AM
Tom,

Here is how you would paginate in PHP.

/****************************************/
// Assumes $db is an open mysqli connection. The mysql_* functions
// originally used here were removed in PHP 7.

// Number of records to show per page:
$display = 4;

// Determine how many pages there are.
if (isset($_GET['np'])) {
    // Cast user input to an integer before using it.
    $num_pages = max(1, (int) $_GET['np']);
} else {
    // Let the database count the rows instead of fetching them all.
    $query = "SELECT COUNT(*) FROM mytable";
    $query_result = mysqli_query($db, $query) or die(mysqli_error($db));
    $row = mysqli_fetch_row($query_result);
    $num_records = (int) $row[0];
    $num_pages = max(1, (int) ceil($num_records / $display));
}

// Determine where in the database to start returning results.
$start = isset($_GET['s']) ? max(0, (int) $_GET['s']) : 0;
/****************************************/






--
*Terry Ally*
Twitter.com/terryally
Facebook.com/terryally
~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~
To print or not to print this email is the environmentally-searching
question!
Which has the highest ecological cost? A sheet of paper or constantly
switching on your computer and connecting to the Internet to read your
email?
Lester Caine
Re: [PHP] Retrieve pages from an ASP driven site
May 03, 2012 10:00AM
Terry Ally (Gmail) wrote:
> Here is how you would paginate in PHP.

Terry - Tom is not trying to create this in PHP, but read existing ASP pages.

Tom - I don't think that it's simply a matter of the ASP code here, but rather
of how they have constructed the set of information they are sending back. That
is done in JavaScript, but the navigation buttons are simple form submits:
BNext is submitted for 'next'.
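
A minimal sketch of what "BNext is submitted" means for a crawler: the clicked submit button adds its own name=value pair to the other form fields in the POST body. The field name BNext comes from this thread; the helper name buildNextPageBody and the button value 'Next>' are assumptions for illustration, so check the page source for the real value.

```php
<?php
// Sketch only: build the POST body a browser would send when the "next"
// submit button is clicked. BNext is the name mentioned in the thread;
// the default button value and this helper's name are assumptions.

/**
 * @param array  $formFields  name => value pairs taken from the current page
 * @param string $buttonName  name attribute of the clicked submit button
 * @param string $buttonValue value attribute of that button (assumed)
 */
function buildNextPageBody(
    array $formFields,
    string $buttonName = 'BNext',
    string $buttonValue = 'Next>'
): string {
    // A browser sends only the submit button that was actually clicked,
    // so exactly one button entry is added to the other fields.
    $formFields[$buttonName] = $buttonValue;

    // Encode as application/x-www-form-urlencoded.
    return http_build_query($formFields, '', '&');
}
```

The returned string is what would go into the body of the POST request for the next page.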

Interestingly, the sales side seems to be .php ;)

--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php

Thanks Lester.

tamouse mailing lists
Re: [PHP] Retrieve pages from an ASP driven site
May 05, 2012 04:10AM


Looking at that page source, I think this might be a bit problematic.

Notice that practically the whole page is inside a form. When you get
down to the "Next> " button, that is going to submit the form with
its appropriate fields set. If you look at the beginning of the form,
you'll see some interesting fields; one in particular, __VIEWSTATE, is
pretty clearly an encoded value of some sort.

When your crawler parses the page, it will have to stash the field
values that the form sets in order to process the form correctly to
get the next page of entries by simulating a POST-data submit. This is
(probably?) most easily handled via libcurl.
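
The approach described above could be sketched roughly as follows, assuming PHP's DOM and cURL extensions are available. The function names extractFormFields and postForNextPage, and the button value 'Next>', are hypothetical; only __VIEWSTATE and BNext come from this thread. The sketch reads only <input> elements, so a page that relies on <select> or <textarea> fields would need more work.

```php
<?php
// Sketch only: stash the form's field values from the fetched HTML, then
// POST them back with cURL to ask for the next page. Function names and
// the 'Next>' button value are assumptions, not taken from the site.

/** Return name => value for every named <input>, skipping buttons. */
function extractFormFields(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $fields = [];
    foreach ($doc->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        $type = strtolower($input->getAttribute('type'));
        // Buttons are only sent when clicked; the caller adds the one it wants.
        if ($name === '' || in_array($type, ['submit', 'image', 'button', 'reset'], true)) {
            continue;
        }
        // Unchecked checkboxes and radio buttons are not submitted.
        if (in_array($type, ['checkbox', 'radio'], true) && !$input->hasAttribute('checked')) {
            continue;
        }
        $fields[$name] = $input->getAttribute('value');
    }
    return $fields;
}

/** POST the stashed fields plus the clicked button; return the response HTML. */
function postForNextPage(string $url, array $fields, string $button = 'BNext', string $buttonValue = 'Next>'): string
{
    $fields[$button] = $buttonValue;

    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($fields, '', '&'),
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
    ]);
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    return $html;
}
```

A crawler would fetch the first page, pass its HTML to extractFormFields(), hand the result to postForNextPage(), and repeat with each response, since the __VIEWSTATE value generally changes from page to page.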

Unsolicited advice: Many sites do not appreciate scraping activity;
make sure your crawler obeys robots.txt rules, and do not overtax the
site with crawler activity.
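
As a rough illustration of that advice, here is a deliberately simplified robots.txt check. It only understands "User-agent: *" groups and plain Disallow path prefixes (no Allow lines, no wildcards), and the function names are hypothetical, so treat it as a starting point rather than a complete parser.

```php
<?php
// Sketch only: a simplified robots.txt check. Handles "User-agent: *"
// groups and plain Disallow prefixes; ignores Allow rules and wildcards.

/** Collect the Disallow prefixes that apply to all user agents. */
function disallowedPrefixes(string $robotsTxt): array
{
    $prefixes = [];
    $applies = false;       // does the current group apply to "*"?
    $prevWasAgent = false;  // was the previous rule a User-agent line?

    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        // Strip comments and surrounding whitespace.
        $line = trim(preg_replace('/#.*/', '', $line));
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        [$key, $value] = array_map('trim', explode(':', $line, 2));
        $key = strtolower($key);

        if ($key === 'user-agent') {
            // Consecutive User-agent lines share one group of rules.
            $applies = ($prevWasAgent && $applies) || $value === '*';
            $prevWasAgent = true;
        } else {
            $prevWasAgent = false;
            // An empty Disallow means "nothing is disallowed".
            if ($key === 'disallow' && $applies && $value !== '') {
                $prefixes[] = $value;
            }
        }
    }
    return $prefixes;
}

/** True if $path does not start with any disallowed prefix. */
function isPathAllowed(string $path, array $prefixes): bool
{
    foreach ($prefixes as $prefix) {
        if (strncmp($path, $prefix, strlen($prefix)) === 0) {
            return false;
        }
    }
    return true;
}
```

To avoid overtaxing the site, a crawler could also pause between requests, for example with sleep(2) after each page.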
