Introducing YippieMove '09. Easy email transfers. Now open for all destinations.

Today when I logged into Google Docs I discovered that I had a new document. With Google Docs’ collaboration feature, this is nothing shocking. What was weird was the following:

  • I don’t know the person who shared the document
  • I never received any notification regarding the document

When looking closer I saw something really interesting. The document was shared with ‘Everyone.’ I immediately became curious about this and tried to replicate it myself, but unfortunately failed.

I have no idea what the person sharing the document did to accomplish this, but it’s quite a serious problem. Somehow I do not think the person sharing the document intended to share it with ‘Everyone.’

A serious Google Docs sharing bug?



After talking to a few other Google Docs users, it appears that the document is not shared with everyone after all. Please let us know if you see the same document in your Docs too.

Author: Tags: , ,

Recently we hooked up with Zendesk for our support system. We hooked up its Remote Authentication API so that our customers could use their YippieMove logins in our support system as well. Jon Gales provided a great recipe for doing this in Django which we kicked off with, making some minor updates too. Only one problem remained: if an account had international characters in the first or the last name, Chinese, Japanese, and so on, the code would error out.

Here’s what the error looks like in practice:

Traceback (most recent call last):

File "/usr/local/lib/python2.5/site-packages/django/core/handlers/base.py", line 86, in get_response
response = callback(request, *callback_args, **callback_kwargs)
...
hash = md5('%s %+s%s%s%s' % (first, last, u.email, settings.ZENDESK_TOKEN, timestamp)).hexdigest()

UnicodeEncodeError: 'ascii' codec can't encode character u'\xef' in position 2: ordinal not in range(128)

The problem is that md5() only works on byte strings. If you give it a unicode string, Python will first try to .encode('ascii') it, which works fine as long as there are no international characters in the string.

The solution is simple: just explicitly UTF-8 encode the source string. The Zendesk documentation did not mention what encoding they use themselves, but my first guess would have been UTF-8, which turned out to be right. So with that in mind, here is our current version of Jon Gales’ Zendesk Remote Authentication snippet for Django:

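from hashlib import md5

from django.conf import settings
from django.contrib.auth.decorators import login_required
from django.http import Http404, HttpResponseRedirect
from django.views.decorators.cache import never_cache


@never_cache
@login_required
def authorize(request):
  # The timestamp arrives as a query parameter; without it we cannot sign.
  try:
    timestamp = request.GET['timestamp']
  except KeyError:
    raise Http404

  u = request.user

  first = u.first_name
  last = u.last_name
  # Fall back to a placeholder name for accounts that have none.
  if not first and not last:
    first = "Yippie"
    last = "Mover"

  # Build the string as unicode, then encode it to UTF-8 bytes before hashing.
  data = u'%s %+s%s%s%s' % (first, last, u.email, settings.ZENDESK_TOKEN,
    timestamp)
  hash = md5(data.encode('utf-8')).hexdigest()

  url = u"%s/access/remote/?name=%s %s&email=%s&timestamp=%s&hash=%s" % (
    settings.ZENDESK_URL, first, last, u.email, timestamp, hash)

  return HttpResponseRedirect(url)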

Notice that we feed a unicode URL to HttpResponseRedirect – Django is smart enough to do the right thing when encoding the URL.
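The hashing step can also be tried in isolation, outside Django. The following is only a minimal sketch, assuming Python 3, where hashlib refuses text strings outright with a TypeError instead of attempting the implicit ASCII encode. The helper name zendesk_hash and all sample values are made up for illustration.

```python
from hashlib import md5


def zendesk_hash(first, last, email, token, timestamp):
    """Return the hex MD5 of the concatenated fields, encoded as UTF-8."""
    data = u"%s %s%s%s%s" % (first, last, email, token, timestamp)
    # Encode explicitly: md5() wants bytes, not text.
    return md5(data.encode("utf-8")).hexdigest()


# Hypothetical sample values; the first name contains a non-ASCII character.
digest = zendesk_hash(u"Lo\xefc", u"Dupont", u"loic@example.com",
                      u"secret-token", u"1234567890")
print(digest)
```

Whatever the input, the result is a 32 character hex string, and a name with international characters no longer causes an error.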

Author: Tags: , , , ,

New Playing With Wire writer Lore Dionne Candelaria takes a look at the history of email and what led to its enormous impact on our modern lives.


Old meets new: early version of Hotmail displayed in Safari 3


Ever since the dawn of human existence, it has been an instinct of man to find ways to reach out and be heard. Speech and language proved to be the most important tools of the ancient human. Humankind has used these tools to survive and to exchange thoughts, and they later evolved into other forms that made use of written symbols, drawings, runes and many other variations. The natural next step in this desire to innovate was the attempt to conquer distance: man wanted to stay in touch despite being apart. Carrier pigeons and smoke signals paved the way for a line of development aiming for an unlimited means of communication. Thus came the invention of technologies like the telegraph, the telephone, and ultimately, the e-mail system.

Before we can talk about the creation of electronic mail, we should first understand the beginnings of the Internet itself. Many people have probably heard that the Internet began in some military computers in the famous Pentagon, that it was called ARPANET, and that the year was 1969. The theory goes on to suggest that the network was designed to survive a nuclear attack. True enough, the Internet was designed in part to provide a communications network that would work even if some of the sites were destroyed by nuclear attack. In 1969, the US Department of Defense wanted a communication system that could not be destroyed in the event of an emergency. They linked computers over telephone lines so that if one computer failed, the others could still communicate with each other. This system was called ARPANET.

ARPANET (Advanced Research Projects Agency Network) was the network that became the basis for the development of the Internet. It was developed under the direction of the U.S. Advanced Research Projects Agency (ARPA). In 1969, the idea became a modest reality with the interconnection of four university computers. The initial purpose was to communicate with and share computer resources among mainly scientific users at the connected institutions. ARPANET took advantage of the new idea of sending information in small units called packets that could be routed on different paths and reconstructed at their destination. This packet switching stood in contrast to the circuit switching of the traditional telephone network, where a dedicated circuit is tied up for the duration of the call and communication is only possible with the single party on the other end of the circuit.


ARPANET logic map, March 1977 (From Computer Science Museum)

The starting point for host-to-host communication on the ARPANET was the 1822 protocol, which defined the way that a host sent messages to an ARPANET IMP (Interface Message Processor). The message format was designed to work unambiguously with a broad range of computer architectures. Essentially, an 1822 message consisted of a message type, a numeric host address, and a data field. To send a data message to another host, the sending host would format a data message containing the destination host’s address and the data to be sent, and transmit the message through the 1822 hardware interface. The IMP would see that the message was delivered to its destination, either by delivering it to a locally connected host or by delivering it to another IMP. When the message was ultimately delivered to the destination host, the IMP would send an acknowledgment message (called Ready for Next Message or RFNM) to the sending host.
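To make that exchange a little more concrete, here is a toy model in Python. It is purely illustrative: the class names, the string message types and the single-IMP setup are assumptions of this sketch and not the real 1822 wire format, which is not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Message1822:
    msg_type: str      # "DATA" or "RFNM" (labels assumed for illustration)
    host: int          # numeric host address
    data: bytes = b""  # payload; empty for acknowledgments


class ToyIMP:
    """A single IMP that serves only locally connected hosts."""

    def __init__(self) -> None:
        self.inboxes: Dict[int, List[Message1822]] = {}

    def attach(self, host: int) -> None:
        self.inboxes[host] = []

    def send(self, sender: int, msg: Message1822) -> None:
        # Deliver the data to the destination host. In this sketch the
        # address field of the delivered copy names the sender.
        self.inboxes[msg.host].append(Message1822("DATA", sender, msg.data))
        # Acknowledge: the sender is ready for its next message.
        self.inboxes[sender].append(Message1822("RFNM", msg.host))


imp = ToyIMP()
imp.attach(1)
imp.attach(2)
imp.send(1, Message1822("DATA", 2, b"hello"))
```

After the call, host 2 holds the data message and host 1 holds the RFNM acknowledgment. Forwarding to another IMP is left out to keep the sketch short.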

So, how exactly did email evolve from the classic ARPANET? The answer lies with Raymond Tomlinson. Tomlinson, born in 1941, is a programmer who implemented an email system on the ARPANET in 1971. Email had previously been sent on other networks, such as AUTODIN. Before internetworking began, however, email could only be used to send messages to other users of the same computer. Once computers began talking to each other over networks, the problem became a little more complex: a message now had to be put in an envelope and addressed. Just as in the conventional postal system, there had to be a way to indicate which mail goes to whom, in a form that the electronic post offices understood. Tomlinson’s was the first system able to send mail between users on different hosts connected to the ARPANET. To achieve this, he used the @ sign (the “commercial at” symbol) to separate the user name from the host name, providing the naturally meaningful notation “user@host” that has been the standard for email addressing ever since. These early programs had simple functionality and were command line-driven, but they established the basic transactional model that still defines the technology: email gets sent to someone’s mailbox.

The first important email standard was called SMTP, or Simple Mail Transfer Protocol. SMTP was very simple and is still in use. However, as we will hear later in this series, SMTP was a fairly naïve protocol, and made no attempt to find out whether the person claiming to send a message was the person they purported to be. Forging a sender address was (and still is) very easy. These basic flaws in the protocol were later exploited by viruses and worms, and by fraudsters and spammers forging identities. But as it developed, email started to take on some pretty neat features. One of the first good commercial systems was Eudora, developed by Steve Dorner in 1988. When Internet standards for email began to mature, POP (or Post Office Protocol) servers began to appear as a standard; before that, each server was a little different. POP was an important standard which allowed users to develop mail systems. These were the days of per-minute charges for email for individual dialup users. For most people on the Internet in those days, email and email discussion groups were its main uses. There were many hundreds of these groups on a wide variety of topics, and as a body of newsgroups, they became known as USENET.

Raymond Tomlinson


With the World Wide Web, email started to be made available with friendly web interfaces by providers such as Yahoo and Hotmail. Usually this was without charge. Now that email was affordable, everyone wanted at least one email address, and the medium was adopted by not just millions, but hundreds of millions of people. This only proves how emailing has reached new horizons in helping people to connect with the virtual world.

Though it is undeniable that email has come a long way since it was first conceived, it now faces more problems than ever. While one cannot question the importance of email and instant messaging nowadays, one question remains: what is in store for electronic mail in the future? In this age when so many communication options are emerging, does email still make sense? What’s worse, e-mail has become a tool for criminal hackers ready to show off their technical skills. Recently, organized crime has become more of a force in the spam arena. Criminals have developed a series of get-rich-quick schemes and have also leveraged spam as an entry point for collecting and then misusing individuals’ personal financial information. As a result, it is estimated that spam represents 80% to more than 90% of all e-mail messages. Consequently, some businesspeople are flocking to the new communications options to rid themselves of the tedious task of constantly hitting the delete button. Can one still trust the reliability of email for connecting and communicating with other people?

The answer lies in the platform that email is built on. The flaw that makes email so easy to abuse for spam and other nefarious activities is also its strength: it’s easy to get an e-mail address and nearly everyone has one. E-mail will continue to be a popular communication option precisely because it is so popular. It continues to generate billions of messages each year.

Yes, new communication channels have emerged, they appeal to different sorts of users, and they will at times be used instead of e-mail. However, e-mail still has the broadest range of supporters and will continue to be the primary communications medium for most businesspeople for the foreseeable future, especially now that there is a boom in the online business industry. Email has proven to be an efficient marketing and advertising medium. The increase in electronic commerce, or e-commerce, has once again put pressure on mail developers to improve the functionality of email systems.

Things will surely change and, like everything else, e-mail will evolve, but its use as a communication medium is still unparalleled. It was, after all, created on top of technology built to survive.

Author: Tags: , ,

Our email transfer service YippieMove is essentially software as a service. The customer pays us to run some custom software on fast machines with a lot of bandwidth. We initially picked VMware virtualization technology for our back-end deployment because we wanted to isolate individual runs, to simplify maintenance and to make scaling dead easy. VMware ultimately proved to be the wrong choice for these requirements.

Ever since the launch over a year ago we had used VMware Server 1 for instantiating the YippieMove back-end software. For that year performance was not a huge concern because there were many other things we were prioritizing for YippieMove ’09. Then, towards the end of development we began doing performance work. We switched from a data storage model best described as “a huge pile of files” to a much cleaner sqlite3 design. The reason for this was technical: the email mover process opened so many files at the same time that we’d hit various limits on simultaneously open file descriptors. While running sqlite over NFS posed its own set of challenges, they were not as insurmountable as juggling hundreds of thousands of files in a single folder.

The new sqlite3 system worked great in testing – and then promptly bogged down on the production virtual machines.

Tough CPU week on one of our core servers running VMware

We had heard before that I/O and disk performance are the weaknesses of virtualization, but we thought we could work around that by putting the job databases on an NFS export from a non-virtualized server. Instead, the slowness we saw blew our minds. The core servers spent a constant 70% of CPU time on system tasks, and despite an uninterrupted 100% CPU usage we could not transfer more than 400KBit/s worth of IMAP traffic per physical machine. This was off by an order of magnitude from our expected throughput.

Obviously something was wrong. We doubled the amount of memory per server, we quadrupled sqlite’s internal buffers, we turned off sqlite auto-vacuuming, we turned off synchronization, we added more database indexes. These things helped but not enough. We twiddled endlessly with NFS block sizes but that gave nothing. We were confused. Certainly we had expected a performance difference between running our software in a VM compared to running on the metal, but that it could be as much as 10X was a wake-up call.
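For readers who want to try the same kind of sqlite tuning, the knobs listed above map onto SQLite PRAGMA statements. This is only a rough sketch using Python's built-in sqlite3 module. The function name, the cache size, and the table and index names are hypothetical and are not the values used in production.

```python
import sqlite3


def open_tuned(path, cache_pages=8000):
    """Open a database with the performance-oriented settings applied."""
    conn = sqlite3.connect(path)
    # Auto-vacuum off. This has to be set before any tables exist.
    conn.execute("PRAGMA auto_vacuum = NONE")
    # Larger page cache, given as a number of pages (hypothetical figure).
    conn.execute("PRAGMA cache_size = %d" % cache_pages)
    # Do not wait for data to reach the disk. Faster, but a crash or
    # power loss can corrupt the database.
    conn.execute("PRAGMA synchronous = OFF")
    return conn


conn = open_tuned(":memory:")
# Hypothetical job table with an index on the columns used for lookups.
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, folder TEXT, uid INTEGER)")
conn.execute("CREATE INDEX idx_messages_folder_uid ON messages (folder, uid)")
```

Turning off synchronization trades durability for speed, so it is only reasonable when the data can be rebuilt after a crash.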

At this point we realized that no amount of tweaking was likely to get our new sqlite3 version out of its performance hole. The raw performance just wasn’t there. We suspected at least part of the problem was that we were running FreeBSD guests in VMware. We checked that we were using the right network card driver (yes, we were). We checked the OS version: 7.1, supposedly the best you could get for VMware. We tuned various sysctl values according to guides we found online. Nothing helped.

We could have switched to a more VM-friendly guest OS such as Ubuntu and hoped it would improve performance. But what if that didn’t resolve the situation? That’s when FreeBSD jails came up.

Jails are a sort of lightweight virtualization technique available on the FreeBSD platform. They are like a chroot environment on steroids where not only the file system is isolated out but individual processes are confined to a virtual environment – like a virtual machine without the machine part. The host and the jails use the same hardware but the operating system puts a clever disguise on the hardware resources to make the jail seem like its own isolated system.

Since nobody could think of an argument against using jails we gave them a shot. Jails feature all the things we wanted to get out of VMware virtualization:

  • Ease of management: you can pack up a whole jail and duplicate it easily
  • Isolation: you can reboot a jail if you have to without affecting the rest of the machine
  • Simple scaling: it’s easy to give a new instance an IP and get it going

At the same time jails don’t come with half the memory overhead. And theoretically I/O performance should be a lot better since there is no emulated hard drive.

And sure enough, system CPU usage dropped by half. That CPU time was immediately put to good use by our software. So even though we still ran at 100% CPU usage, overall throughput was much higher: up to 2.5MBit/s. Sure, there was still room for us to get closer to the theoretical maximum performance, but now we were in the right ballpark at least.

More expensive versions of VMware offer process migration and better resource pooling, something we’ll be keen to look into when we grow. It’s very likely our VMware setup had some problems, and perhaps they could have been resolved by using fancier VMware software or porting our software to run in Ubuntu (which would be fairly easy). But why cross the river for water? For our needs today the answer was right in front of us in FreeBSD: jails offer a much more lightweight virtualization solution and in this particular case it was a smash hit performance win.

Author: Tags: , , , , ,

As you guys have noticed by now, we have done a little refresh of Playing With Wire. At the same time we chose to upgrade to WordPress 2.7 from WordPress 2.2.3.

Unfortunately, early versions of WordPress did not specify UTF-8 encoding for the tables created in the database. After the upgrade, WordPress was using UTF-8 but our tables were still in Latin 1, and we got quite a collection of funny characters in some of our postings. Examples include “â€™” instead of a quotation mark, or “Â” in the middle of some whitespace.
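These funny characters are what UTF-8 bytes look like when they are read back as Latin 1. A small Python sketch can reproduce the effect, assuming Python 3 and using the cp1252 codec as a stand-in for what MySQL calls latin1. The function names are made up for illustration.

```python
def mojibake(text):
    """Encode text as UTF-8, then misread the bytes as cp1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(garbled):
    """Undo the misreading by reversing the two steps."""
    return garbled.encode("cp1252").decode("utf-8")


# A curly apostrophe (U+2019) and a non-breaking space (U+00A0).
garbled_quote = mojibake(u"\u2019")
garbled_space = mojibake(u"\u00a0")
print(repr(garbled_quote), repr(garbled_space))
```

Note that Python's cp1252 codec leaves a handful of byte values undefined, so this round trip will raise an error for some characters. It is meant to explain the symptom, not to replace the dump and reimport approach.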

After searching for a while we found the solution at bawdo2001’s blog:

mysqldump -u root -p --opt --default-character-set=latin1 --skip-set-charset DBNAME > DBNAME.sql
sed -e 's/latin1/utf8/g' -i ./DBNAME.sql
mysql -p --default-character-set=utf8 DBNAME < DBNAME.sql

In other words, just dump the database in latin1, swap out latin1 for utf8 in the output SQL and then reimport in utf8. Just make sure you get a good backup of your database in a separate file before you start reimporting.

Author: Tags: , , , ,

© 2006-2009 WireLoad, LLC.
Logo photo by William Picard. Theme based on BlueMod © 2005 - 2009 FrederikM.de, based on blueblog_DE by Oliver Wunder.