Scala Day Three

The third day of Scala studies leaves me feeling lukewarm. Since Seven Languages in Seven Weeks was written, Scala’s actors have been deprecated in favour of Akka, a concurrency library.

Scala treats XML as a first-class construct, but I find it hard to care about that. Given my JavaScript background, I generally avoid XML where possible in favour of JSON. XML was popular and useful in the dinosaur age of the Internet, but since JSON rose to ubiquity, XML (and by extension, Scala) seems dated to me.

  1. XML parsing and concurrency
  2. Thoughts

XML parsing and concurrency

Take the sizer application and add a message to count the number of links on the page.

The first issue I faced was a problem with encoding. Java and Scala seem to struggle when reading an input stream with an unexpected encoding. I’m not exactly sure, but I think Scala was expecting UTF-8: when no encoding is given, Source falls back to the platform’s default charset. For half of the URLs I tried, the program threw this exception at runtime:

The Source.fromURL method takes an optional encoding parameter, which we can use to work around this issue.
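As a minimal sketch of that workaround, the helper below passes an explicit encoding to Source.fromURL. The names PageFetcher and fetch, and the choice of ISO-8859-1 as the default, are assumptions for illustration rather than anything taken from the book.

```scala
import scala.io.Source

// Hypothetical helper, not from the book: reads a URL with an explicit encoding.
object PageFetcher {
  // ISO-8859-1 is an assumed default; it maps every byte to a character,
  // so decoding cannot fail on unexpected input.
  def fetch(url: String, encoding: String = "ISO-8859-1"): String = {
    val source = Source.fromURL(url, encoding)
    try source.mkString   // read the whole page into one string
    finally source.close() // always release the underlying stream
  }
}
```

The trade-off is that text in other encodings may come out garbled. For counting tags that is harmless on pages in ASCII-compatible encodings such as UTF-8, because the ASCII markup itself is unaffected.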

Finding the number of anchor tags on a given page would have been an excellent opportunity to make use of Scala’s XML parser, but sadly it’s not robust enough to handle malformed XML (like real HTML). There are some libraries that apparently do a better job of parsing HTML, but I didn’t manage to make any of them work with Scala. At this point, I’ll fall back to using a regular expression.

For some reason, my parser seems to think there is only one anchor on the Google homepage, so something is not working, but I don’t feel it’s worth investigating at this point. What we have is close enough. Here’s the method for finding anchor tags in an HTML string:
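A rough sketch of the regex approach, which is not necessarily the pattern used in this post, could look like the following. The names LinkCounter and countLinks are hypothetical.

```scala
import scala.util.matching.Regex

// Hypothetical helper: counts opening anchor tags in an HTML string.
object LinkCounter {
  // (?i) makes the match case-insensitive, \b keeps <abbr> or <article>
  // from matching, and [^>]* stops at the end of each tag.
  private val anchorTag: Regex = """(?i)<a\b[^>]*>""".r

  def countLinks(html: String): Int = anchorTag.findAllIn(html).size
}
```

The `[^>]*` part matters: a greedy pattern such as `<a.*>` runs to the last `>` on a line, which is one possible way to end up with a single match on a minified page.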

Our final chunk of code looks like this:
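For readers on a current Scala version, where the actors library is no longer available, the same idea can be sketched with Futures from the standard library. This deliberately swaps in a different concurrency technique from the actor-based sizer in the book. ConcurrentSizer, PageStats, statsFor and the load parameter are all hypothetical names, and the URL in main is only an example.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.io.Source

object ConcurrentSizer {
  // Hypothetical result type: page size in characters plus the anchor count.
  final case class PageStats(url: String, size: Int, links: Int)

  private val anchorTag = """(?i)<a\b[^>]*>""".r

  // Decode as ISO-8859-1 (an assumption) so unexpected bytes cannot abort the read.
  def fetch(url: String): String = {
    val source = Source.fromURL(url, "ISO-8859-1")
    try source.mkString finally source.close()
  }

  def stats(url: String, html: String): PageStats =
    PageStats(url, html.length, anchorTag.findAllIn(html).size)

  // One Future per URL; Future.sequence gathers the results in input order.
  // `load` is injectable so the logic can be exercised without a network.
  def statsFor(urls: Seq[String], load: String => String = fetch(_))
              (implicit ec: ExecutionContext): Future[Seq[PageStats]] =
    Future.sequence(urls.map(url => Future(stats(url, load(url)))))

  def main(args: Array[String]): Unit = {
    import ExecutionContext.Implicits.global
    val urls = Seq("https://www.google.com/") // example URL only
    Await.result(statsFor(urls), 30.seconds).foreach(println)
  }
}
```

Each page is fetched and measured on its own Future, so slow sites do not hold up the others. Blocking with Await is only there to keep the example short.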

Thoughts

I still have mixed feelings about Scala. Some of the ideas feel elegant, and I think it’s worth incorporating these ideas into my work in other languages. One of Scala’s supposed strengths is its extensibility; everything is available in libraries. I’m sure this is a good thing for a number of reasons, but it doesn’t feel user-friendly to me. The language feels possibly too big, with too many idioms. I guess I’ll have to learn to love it.