TCP Clients with IO::Socket
For those preferring a higher-level interface to socket programming, the IO::Socket module
provides an object-oriented approach. IO::Socket is included as part of the standard Perl
distribution as of the 5.004 release. If you're running an earlier version of Perl, just fetch
IO::Socket from CPAN, where you'll also find modules providing easy interfaces to the
following systems: DNS, FTP, Ident (RFC 931), NIS and NISPlus, NNTP, Ping, POP3, SMTP, SNMP,
SSLeay, Telnet, and Time--just to name a few.
Here's a client that creates a TCP connection to the "daytime" service at port 13
of the host name "localhost" and prints out everything that the server there cares
to provide.
#!/usr/bin/perl -w
use IO::Socket;
$remote = IO::Socket::INET->new(
    Proto    => "tcp",
    PeerAddr => "localhost",
    PeerPort => "daytime(13)",
)
    or die "cannot connect to daytime port at localhost";
while ( <$remote> ) { print }
When you run this program, you should get something back that looks like this:
Wed May 14 08:40:46 MDT 1997
Here's what those parameters to the new constructor mean:
Proto
- This is which protocol to use. In this case, the socket handle returned will be
connected to a TCP socket, because we want a stream-oriented connection, that is, one that
acts pretty much like a plain old file. Not all sockets are of this type. For
example, the UDP protocol can be used to make a datagram socket, used for message-passing.
PeerAddr
- This is the name or Internet address of the remote host the server is running on. We
could have specified a longer name like "www.perl.com", or an address like "204.148.40.9".
For demonstration purposes, we've used the special hostname "localhost", which should
always mean the current machine you're running on. The corresponding Internet address for
localhost is "127.1", if you'd rather use that.
PeerPort
- This is the service name or port number we'd like to connect to. We could have gotten
away with using just "daytime" on systems with a well-configured system services
file,[FOOTNOTE: The system services file is in /etc/services under Unix] but just in case,
we've specified the port number (13) in parentheses. Using just the number would also have
worked, but constant numbers make careful programmers nervous.
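As a rough illustration of the three parameters, the following sketch connects using a dotted-quad address and a plain port number instead of names. So that it runs without a real daytime service, it first opens its own listening socket on 127.0.0.1; that stand-in server, the LocalPort => 0 trick for picking a free port, and the use of $@ in the error messages are assumptions of this sketch rather than part of the example above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket;

# Stand-in server so the sketch needs no real service (an assumption of
# this sketch). LocalPort => 0 asks the system for any free port.
my $server = IO::Socket::INET->new(
    Proto     => "tcp",
    LocalAddr => "127.0.0.1",
    LocalPort => 0,
    Listen    => 1,
) or die "cannot listen on 127.0.0.1: $@";
my $port = $server->sockport;

# The same three constructor parameters, this time with a numeric
# address and a numeric port instead of a host name and service name.
my $remote = IO::Socket::INET->new(
    Proto    => "tcp",
    PeerAddr => "127.0.0.1",
    PeerPort => $port,
) or die "cannot connect to port $port at 127.0.0.1: $@";

print "connected to ", $remote->peerhost, " port ", $remote->peerport, "\n";
```

The peerhost and peerport methods report what the socket actually connected to, which can be handy when the address or port was given by name.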
Notice how the return value from the new constructor is used as a filehandle
in the while loop? That's what's called an indirect filehandle, a scalar variable
containing a filehandle. You can use it the same way you would a normal filehandle. For
example, you can read one line from it this way:
$line = <$handle>;
all remaining lines from it this way:
@lines = <$handle>;
and send a line of data to it this way:
print $handle "some data\n";
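To try these operations without a network, here is a small sketch that uses a connected pair of local sockets in place of a client and a server. The socketpair setup and the variable name $peer are assumptions made for the demonstration; only the reads and the print on the handles correspond to the text above.

```perl
use strict;
use warnings;
use IO::Socket;

# A connected pair of local sockets stands in for client and server.
my ($handle, $peer) = IO::Socket->socketpair(AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair failed: $!";

# Send three lines through one end, then close it so that the reader
# sees end-of-file after the last line.
print $peer "first line\n", "second line\n", "third line\n";
close $peer;

my $line  = <$handle>;    # scalar context: read one line
my @lines = <$handle>;    # list context: read all remaining lines

print "one line: $line";
print "remaining: ", scalar(@lines), " lines\n";
```

Reading in list context only returns once the other end has closed the connection, which is why the sketch closes $peer before reading.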
Here's a simple client that takes a remote host to fetch a document from, and then a list
of documents to get from that host. This is a more interesting client than the previous one
because it first sends something to the server before fetching the server's response.
#!/usr/bin/perl -w
use IO::Socket;
unless (@ARGV > 1) { die "usage: $0 host document ..." }
$host = shift(@ARGV);
$EOL = "\015\012";
$BLANK = $EOL x 2;
foreach $document ( @ARGV ) {
    $remote = IO::Socket::INET->new( Proto    => "tcp",
                                     PeerAddr => $host,
                                     PeerPort => "http(80)",
                                   );
    unless ($remote) { die "cannot connect to http daemon on $host" }
    $remote->autoflush(1);
    print $remote "GET $document HTTP/1.0" . $BLANK;
    while ( <$remote> ) { print }
    close $remote;
}
The web server handling the "http" service is assumed to be at its standard
port, number 80. If the web server you're trying to connect to is at a different port (like
1080 or 8080), you should specify it as the named-parameter pair, PeerPort => 8080.
The autoflush method is used on the socket because otherwise the system would
buffer up the output we sent it. (If you're on a Mac, you'll also need to change every "\n"
in your code that sends data over the network to be a "\015\012"
instead.)
Connecting to the server is only the first part of the process: once you have the
connection, you have to use the server's language. Each server on the network has its own
little command language that it expects as input. The string that we send to the server
starting with "GET" is in HTTP syntax. In this case, we simply request each
specified document. Yes, we really are making a new connection for each document, even though
it's the same host. That's the way you always used to have to speak HTTP. Recent versions of
web browsers may request that the remote server leave the connection open a little while, but
the server doesn't have to honor such a request.
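To make the request format concrete, the following sketch builds the same string the client sends and prints it with the line terminators made visible. The helper name http_get_request is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;

my $EOL   = "\015\012";    # CR LF, the line terminator sent on the wire
my $BLANK = $EOL x 2;      # ends the request line, then adds an empty line

# http_get_request is a hypothetical helper name used only in this sketch.
sub http_get_request {
    my ($document) = @_;
    return "GET $document HTTP/1.0" . $BLANK;
}

my $request = http_get_request("/guanaco.html");

# Show the control characters as escapes so they can be seen in a terminal.
(my $visible = $request) =~ s/\015/\\r/g;
$visible =~ s/\012/\\n/g;
print "$visible\n";    # prints: GET /guanaco.html HTTP/1.0\r\n\r\n
```

The empty line after the request line is what tells the server that the request is complete; without it the server would keep waiting for more input.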
Here's an example of running that program, which we'll call webget:
% webget www.perl.com /guanaco.html
HTTP/1.1 404 File Not Found
Date: Thu, 08 May 1997 18:02:32 GMT
Server: Apache/1.2b6
Connection: close
Content-type: text/html
<HEAD><TITLE>404 File Not Found</TITLE></HEAD>
<BODY><H1>File Not Found</H1>
The requested URL /guanaco.html was not found on this server.<P>
</BODY>
OK, so that's not very interesting, because it didn't find that particular document. But a
long response wouldn't have fit on this page.
For a more fully-featured version of this program, you should look to the lwp-request
program included with the LWP modules from CPAN.
Well, that's all fine if you want to send one command and get one answer, but what about
setting up something fully interactive, somewhat like the way telnet works? That way
you can type a line, get the answer, type a line, get the answer, etc.
This client is more complicated than the two we've done so far, but if you're on a system
that supports the powerful fork call, the solution isn't that rough. Once you've
made the connection to whatever service you'd like to chat with, call fork to
clone your process. Each of these two identical processes has a very simple job to do: the
parent copies everything from the socket to standard output, while the child simultaneously
copies everything from standard input to the socket. To accomplish the same thing using just
one process would be much harder, because it's easier to code two processes to do one
thing than it is to code one process to do two things. (This keep-it-simple principle is one
of the cornerstones of the Unix philosophy, and of good software engineering as well, which
is probably why it's spread to other systems.)
Here's the code:
#!/usr/bin/perl -w
use strict;
use IO::Socket;
my ($host, $port, $kidpid, $handle, $line);
unless (@ARGV == 2) { die "usage: $0 host port" }
($host, $port) = @ARGV;
# create a tcp connection to the specified host and port
$handle = IO::Socket::INET->new(Proto    => "tcp",
                                PeerAddr => $host,
                                PeerPort => $port)
    or die "can't connect to port $port on $host: $!";
$handle->autoflush(1); # so output gets there right away
print STDERR "[Connected to $host:$port]\n";
# split the program into two processes, identical twins
die "can't fork: $!" unless defined($kidpid = fork());
# the if{} block runs only in the parent process
if ($kidpid) {
    # copy the socket to standard output
    while (defined ($line = <$handle>)) {
        print STDOUT $line;
    }
    kill("TERM", $kidpid);    # send SIGTERM to child
}
# the else{} block runs only in the child process
else {
    # copy standard input to the socket
    while (defined ($line = <STDIN>)) {
        print $handle $line;
    }
}
The kill function in the parent's if block is there to send a
signal to our child process (currently running in the else block) as soon as the
remote server has closed its end of the connection.
If the remote server sends data a byte at a time, and you need that data immediately without
waiting for a newline (which might not happen), you may wish to replace the while
loop in the parent with the following:
my $byte;
while (sysread($handle, $byte, 1) == 1) {
print STDOUT $byte;
}
Making a system call for each byte you want to read is not very efficient (to put it
mildly) but is the simplest to explain and works reasonably well.
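If the per-byte cost matters, one possible variation is to ask sysread for a larger chunk. It still returns as soon as any data is available, so no newline is needed, but it makes far fewer system calls. This is only a sketch: the helper name copy_available and the 4096-byte buffer size are arbitrary choices made here, and a pipe stands in for the socket so that the example runs on its own.

```perl
use strict;
use warnings;

# copy_available is a hypothetical helper name; 4096 is an arbitrary size.
# It copies whatever sysread delivers until end-of-file and returns the
# number of bytes copied.
sub copy_available {
    my ($in, $out) = @_;
    my ($buffer, $total) = ("", 0);
    while (1) {
        my $got = sysread($in, $buffer, 4096);
        die "read error: $!" unless defined $got;
        last if $got == 0;            # end of file
        print {$out} $buffer;
        $total += $got;
    }
    return $total;
}

# Demonstration: a pipe stands in for the socket, and the data sent
# through it deliberately contains no newline.
pipe(my $reader, my $writer) or die "pipe failed: $!";
syswrite($writer, "no newline here");
close $writer;

$| = 1;    # unbuffer STDOUT so partial lines appear right away
my $copied = copy_available($reader, \*STDOUT);
print "\n[$copied bytes copied]\n";
```

In the interactive client, the parent's loop could call such a helper with $handle as the input and STDOUT as the output.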