connect function.
IP (Internet Protocol) is the network-layer protocol we'll
be using. It is connectionless and unreliable.
TCP (Transmission Control Protocol) is one of the
transport-layer protocols we'll be using. It is
connection-oriented, reliable, full-duplex, and byte-stream oriented.
UDP (User Datagram Protocol) is another transport-layer
protocol we'll be using. It is connectionless,
unreliable, and full-duplex.
socket.
int socket(int domain, int type, int protocol);
It returns a file descriptor for the new socket, or -1 on failure. The
protocol parameter is usually set to zero,
which allows the actual protocol to be deduced from the given
domain and type.
The domain will be PF_INET (for
IPv4) in our
examples, although others are possible, like
PF_INET6 for IPv6. (Our code passes AF_INET, which has the
same value as PF_INET.)
The argument type will depend on what you want to
do: the constant SOCK_STREAM gives us a TCP
connection (reliable, connection-oriented bytestream)
and SOCK_DGRAM gives us UDP (unreliable,
connectionless). The choice of which to use really depends on
the application.
A socket on a host is
addressed by two things: a hostname and a port
number.
We say the socket is "bound" to a particular port number on
a host.
Only one socket can be bound to a given port (for a given
protocol, such as TCP) at any one time on a host, so that
address is unique.
However, many file descriptors may
be referring to that socket simultaneously. Ports are
numbered by a 16-bit non-negative number (C's unsigned
short int), so they range from 0 to 65535. Some port numbers
are dedicated to particular services, like port 22 for ssh,
port 143 for imap, or port 3724 for WoW.
These assignments are coordinated by ICANN (through IANA).
Higher-numbered ports are there for whatever you need to do.
So, to communicate with a process on a remote host,
you need to know both the host name (or IP address) and the port
number.
A socket can be bound to a port number in different ways,
depending on whether the socket is being used as a "client"
or as a "server".
The netstat utility can be used to show the
current socket/port bindings. Here's an excerpt (I've cut
out lots of stuff!):
michcsdbrownu$ netstat -a
TCP: IPv4
Local Address Remote Address Swind Send-Q Rwind Recv-Q State
-------------------- -------------------- ----- ------ ----- ------ -----------
michcsdbrownu.ssh 131.122.90.34.40215 135424 0 49232 0 ESTABLISHED
michcsdbrownu.40923 nwtime.usna.edu.ldap 6064 0 49640 0 ESTABLISHED
michcsdbrownu.893 chessie.cs.usna.edu.nfsd 49640 0 49640 76 ESTABLISHED
localhost.6010 localhost.41013 49152 0 49152 0 ESTABLISHED
michcsdbrownu.6000 hercules1.usna.navy.mil.3833 65535 0 49640 0 CLOSE_WAIT
michcsdbrownu.41055 chessie.cs.usna.edu.ssh 49640 0 49640 0 TIME_WAIT
michcsdbrownu.32774 chessie.cs.usna.edu.54831 49640 0 49640 0 ESTABLISHED
You see that we're looking at entries for sockets in the IPv4 domain of type TCP. You get the hostname.portnum for both ends of the sockets. When you see a name instead of a port number, like the .ssh in
michcsdbrownu.ssh, that means that the port (22
in this case) is well-known so that the system has a name for
it, and uses the name instead of the number. The last column
is the "state" and that tells you something about the state of
the current connection.
There's a simple server, ~wcbrown/courses/IC221/labs/L12/server, that
sits and listens at port 10000. What it does we'll see later.
If we run that server (in the background with an &
please), and try netstat again, we see a change:
michcsdbrownu$ ~wcbrown/courses/IC221/labs/L12/server &
[2] 1774
michcsdbrownu$ netstat -a
TCP: IPv4
Local Address Remote Address Swind Send-Q Rwind Recv-Q State
-------------------- -------------------- ----- ------ ----- ------ -----------
michcsdbrownu.ssh 131.122.90.34.40215 135424 0 49232 0 ESTABLISHED
michcsdbrownu.40923 nwtime.usna.edu.ldap 6064 0 49640 0 ESTABLISHED
michcsdbrownu.893 chessie.cs.usna.edu.nfsd 49640 0 49640 76 ESTABLISHED
localhost.6010 localhost.41013 49152 0 49152 0 ESTABLISHED
michcsdbrownu.6000 hercules1.usna.navy.mil.3833 65535 0 49640 0 CLOSE_WAIT
michcsdbrownu.41055 chessie.cs.usna.edu.ssh 49640 0 49640 0 TIME_WAIT
michcsdbrownu.32774 chessie.cs.usna.edu.54831 49640 0 49640 0 ESTABLISHED
michcsdbrownu.10000 *.* 0 0 49152 0 LISTEN
Notice that we now have an entry for port 10000, where our
simple server is in the LISTEN state. There's no "Remote
Address" yet (just *.*), because no client has connected to it.
I also have a simple client written, and you'll see how to
write it momentarily. But if I launch the client, I should
see a connection. The client knows to try to connect to port
10000, but you must give it the hostname on the command-line.
So let's log in on another machine and run the client:
michcsdbrownu$ ssh mich302csd07d
Password: **********
mich302csd07d$ ~wcbrown/courses/IC221/labs/L12/client michcsdbrownu
the
THE
So, now if we look at netstat again, we see a change in the
line for michcsdbrownu.10000 --- now there's a
remote address, and the "state" is ESTABLISHED. Notice how
the client's port number is some big, essentially random
number (an "ephemeral" port the kernel picked for it); its
actual value is pretty much irrelevant.
michcsdbrownu$ netstat -a
TCP: IPv4
Local Address Remote Address Swind Send-Q Rwind Recv-Q State
-------------------- -------------------- ----- ------ ----- ------ -----------
michcsdbrownu.ssh 131.122.90.34.40215 135424 0 49232 0 ESTABLISHED
michcsdbrownu.40923 nwtime.usna.edu.ldap 6064 0 49640 0 ESTABLISHED
michcsdbrownu.893 chessie.cs.usna.edu.nfsd 49640 0 49640 76 ESTABLISHED
localhost.6010 localhost.41013 49152 0 49152 0 ESTABLISHED
michcsdbrownu.6000 hercules1.usna.navy.mil.3833 65535 0 49640 0 CLOSE_WAIT
michcsdbrownu.41055 chessie.cs.usna.edu.ssh 49640 0 49640 0 TIME_WAIT
michcsdbrownu.32774 chessie.cs.usna.edu.54831 49640 0 49640 0 ESTABLISHED
michcsdbrownu.10000 mich302csd07d.cs.usna.edu.32881 49640 0 49640 0 ESTABLISHED
So, to summarize: Sockets are kernel resources, which we
request via the socket system call. To
communicate across a network, sockets are bound to ports
so that the hostname/port-number combination gives you an address
for a particular socket --- only one socket has that
hostname/portnumber combination, so the address is unique. Servers have to "listen"
at a port number that's known ahead of time. Clients
connect to a server at a well-known hostname/portnumber, and
though the client uses a socket bound to some port, the
actual port number is more or less irrelevant. Next, what
does the client do?
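The client uses socket to create a socket, and a system call
conveniently named connect to both bind that socket to some local
port and connect it to the server's hostname/port. The prototype for
connect: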
#include <sys/types.h>
#include <sys/socket.h>

int connect(int s, const struct sockaddr *name, int namelen);
While conceptually this is quite simple, the code for it is a bit baroque because the "host + port" information is stored in a struct that isn't easy to use.
struct sockaddr_in {
...
sa_family_t sin_family;
in_port_t sin_port;
struct in_addr sin_addr;
...
};
struct in_addr {
...
in_addr_t s_addr; ← this is just the 32-bit network byte order IP!
...
};
The "family" is easy for us: AF_INET to indicate IPv4. The
port is easy, port 10000, though we must remember to put that
number into network byte order: htons(10000). The
last issue is the IP address itself, which is made difficult
by the nested structs. All together, we need to do
something like this:
struct sockaddr_in mysa;
mysa.sin_family = AF_INET;
mysa.sin_port = htons(10000);
mysa.sin_addr.s_addr = ???;
But what do we do to get the address, the 32-bit int in
network byte order? Well, we could hardcode the IP
address as a 32-bit unsigned int, we could hardcode (or get
from argv) the IP address as a dotted quad and use
inet_addr, or we could get the symbolic name
(from argv) and use gethostbyname like last
class.
Another piece of nastiness is that the second argument to
connect is a struct sockaddr*, not a
struct sockaddr_in*, which is what we have.
And what about that third argument? Well,
connect is supposed to work for several
domains, not just IPv4 (a.k.a. AF_INET). In object oriented
languages, "sockaddr" would be a base class and
"sockaddr_in" would be a derived class, one among
several. This is C, and there's no OOP. So, we cast the
struct sockaddr_in* to a struct
sockaddr*, even though the cast is not really
valid. Each sockaddr_* type keeps its address family field
(sin_family for us) in the same position, so connect can tell
what kind of address it was passed, and the third argument, the
sizeof the actual struct being passed, tells it how many bytes
that address occupies. So connect looks like:
connect(sfd,(struct sockaddr*)&mysa,sizeof(mysa));
If successful in connecting, connect returns 0; on failure it
returns -1 and sets errno. Putting it all together:
/***********************************************************
 * Simple TCP Client.
 * This client connects to the server and sends the user
 * input to the server and echoes back what the server sends it.
 * Compile like this: gcc -o client client.c -lnsl -lsocket
 * (-lnsl -lsocket are needed on Solaris; on Linux, plain gcc works.)
 ***********************************************************/
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

int main(int argc, char **argv)
{
  // Print usage
  if (argc < 2) { fprintf(stderr,"usage: %s <hostname>\n",argv[0]); exit(1); }

  // Set up socket
  int sfd = socket(AF_INET,SOCK_STREAM,0);
  if (sfd == -1) { fprintf(stderr,"Socket not created!\n"); exit(2); }

  // Get IP address from symbolic name
  struct hostent *p = gethostbyname(argv[1]);
  if (p == NULL) { fprintf(stderr,"Name not found!\n"); exit(1); }

  // Set up address structure (zeroed first so unused fields are 0)
  struct sockaddr_in mysa;
  memset(&mysa,0,sizeof(mysa));
  mysa.sin_family = AF_INET;
  // h_addr_list[0] already holds the 32-bit IP in network byte order
  memcpy(&mysa.sin_addr.s_addr,p->h_addr_list[0],sizeof(mysa.sin_addr.s_addr));
  mysa.sin_port = htons(10000);

  // Connect!
  if (connect(sfd,(struct sockaddr*)&mysa,sizeof(mysa)) != 0)
  {
    fprintf(stderr,"Client could not connect!\n");
    exit(3);
  }

  // Communicate with server: send each character, print the reply
  char inc, outc;
  while(scanf("%c",&inc) == 1)
  {
    if (write(sfd,&inc,1) != 1) break;
    if (read(sfd,&outc,1) != 1) break;   // 0 = server closed, -1 = error
    printf("%c",outc);
  }
  close(sfd);
  return 0;
}
The prototype for connect is
int connect(int s, const struct sockaddr *name, int namelen);
... and calling connect always requires some casting.
If we were in an object oriented world, we'd have the following class hierarchy:
            --- sockaddr_in
           /
sockaddr -<
           \
            --- sockaddr_in6
and connect's prototype would be
connect(int s, sockaddr name)
and we'd simply call it like this:
connect(sfd,mysa)
The mysa argument could be either a sockaddr_in (for IPv4) or a sockaddr_in6 (for IPv6), because inheritance says both are sockaddr objects.
However, C is not an object oriented language, so the
struct sockaddr* argument can only point to
a struct sockaddr object. Therefore, to allow
connect to take different kinds of objects as
a second parameter, we cast whatever pointer we really have to
a struct sockaddr* (tricking the compiler!).
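Then connect needs some way to know what the second argument is
really pointing to. Every sockaddr type stores its address family
(AF_INET for our sockaddr_in) in the same position, so connect can
check that field, and we pass the true size of the object as a
third parameter so connect knows how many bytes the address occupies.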