Showing posts with label php. Show all posts

Wednesday, April 10, 2013

Outputting iCal with PHP

I'm a big Google Calendar user. If I don't have it on my calendar, then it's probably *not* going to happen. If I'm trying to schedule something into my week, then I'm always consulting my calendar to see how it fits in with everything else, or if it's making my week too busy. And, hey, I'm pretty sure I'm not the only GCal addict out there. (Oh, and before GCal, I was totally a Yahoo! Calendar user. Retro!)

So, when I first joined Coursera, I brought with me a list of ways I wanted to improve the student experience, and one of those was "Create a Google calendar of deadlines."

I was hoping this would be an easy thing, something I'd do in my first month. Of course, I didn't realize then that our legacy codebase was a tangle of PHP, that it was split across 5 git repositories, and that it was largely untested. So I repressed my dreams and worked on improving our architecture so that features like that *would* be an easy thing.

Well, as I just announced on the Coursera blog, I finally got to a place where I could write and test the feature, and we've started surfacing it in our classes.

I still had to write it in our legacy PHP codebase, but I don't actually mind PHP when it's written relatively cleanly and testably. I found the hardest part was figuring out exactly how to format my ICS files, and I spent a while going back and forth between this handy iCal Validator and the rather boring iCalendar specification.

I started by writing 2 general classes: CalendarEvent for generating VEVENTs, and Calendar for generating VCALENDARs. Here's the most important function of the CalendarEvent class, the one that generates the string based on the event data:

public function generateString() {
  // Generation time, used for DTSTAMP, CREATED, and LAST-MODIFIED.
  // (DTSTAMP should not be the event's start time.)
  $created = new DateTime();

  $content = "BEGIN:VEVENT\r\n"
           . "UID:{$this->uid}\r\n"
           . "DTSTART:{$this->formatDate($this->start)}\r\n"
           . "DTEND:{$this->formatDate($this->end)}\r\n"
           . "DTSTAMP:{$this->formatDate($created)}\r\n"
           . "CREATED:{$this->formatDate($created)}\r\n"
           . "DESCRIPTION:{$this->formatValue($this->description)}\r\n"
           . "LAST-MODIFIED:{$this->formatDate($created)}\r\n"
           . "LOCATION:{$this->formatValue($this->location)}\r\n"
           . "SUMMARY:{$this->formatValue($this->summary)}\r\n"
           . "SEQUENCE:0\r\n"
           . "STATUS:CONFIRMED\r\n"
           . "TRANSP:OPAQUE\r\n"
           . "END:VEVENT\r\n";
  return $content;
}
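The formatValue() and formatDate() helpers aren't shown here. Going by the iCalendar spec (RFC 5545), TEXT values like SUMMARY and DESCRIPTION need backslashes, semicolons, commas, and line breaks escaped, and UTC date-times use the basic form 20130410T153000Z. Here's a minimal JavaScript sketch of those two rules; escapeText and formatUtcDate are hypothetical stand-ins, not the gist's actual code:

```javascript
// Hypothetical stand-in for formatValue(): RFC 5545 TEXT escaping.
// Backslashes are escaped first so the escapes added below aren't doubled.
function escapeText(value) {
  return String(value)
    .replace(/\\/g, '\\\\')          // \  -> \\
    .replace(/;/g, '\\;')            // ;  -> \;
    .replace(/,/g, '\\,')            // ,  -> \,
    .replace(/\r\n|\r|\n/g, '\\n');  // line break -> \n
}

// Hypothetical stand-in for formatDate(): UTC DATE-TIME in basic format,
// e.g. 2013-04-10T15:30:00Z -> 20130410T153000Z.
function formatUtcDate(date) {
  const pad = (n) => String(n).padStart(2, '0');
  return String(date.getUTCFullYear()) +
    pad(date.getUTCMonth() + 1) +
    pad(date.getUTCDate()) + 'T' +
    pad(date.getUTCHours()) +
    pad(date.getUTCMinutes()) +
    pad(date.getUTCSeconds()) + 'Z';
}
```

Escaping the backslash first matters: otherwise the backslashes added for semicolons and commas would themselves get doubled. Colons don't need escaping in TEXT values.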
And here's the function in the Calendar class that wraps the events in a VCALENDAR string:
public function generateString() {
  $content = "BEGIN:VCALENDAR\r\n"
           . "VERSION:2.0\r\n"
           . "PRODID:-//" . $this->author . "//NONSGML//EN\r\n"
           . "X-WR-CALNAME:" . $this->title . "\r\n"
           . "CALSCALE:GREGORIAN\r\n";

  foreach ($this->events as $event) {
    $content .= $event->generateString();
  }
  // Every content line, including the last one, must end with CRLF.
  $content .= "END:VCALENDAR\r\n";
  return $content;
}
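The spec also says content lines SHOULD NOT be longer than 75 octets; longer ones get "folded" by inserting a CRLF followed by a single space. I'm not sure whether the classes above fold long DESCRIPTION lines, so here's a JavaScript sketch of folding under that assumption (foldLine and serializeLines are hypothetical names). It counts UTF-8 bytes rather than characters, and it never splits a multi-byte character:

```javascript
const encoder = new TextEncoder();

// Folds one content line so no physical line exceeds `limit` octets (75 per
// RFC 5545). Continuation lines begin with a single space, which counts
// toward the limit. for...of walks whole code points, so a multi-byte
// character is never split across lines.
function foldLine(line, limit = 75) {
  const parts = [];
  let current = '';
  let currentBytes = 0;
  for (const ch of line) {
    const bytes = encoder.encode(ch).length;
    const max = parts.length === 0 ? limit : limit - 1;
    if (currentBytes + bytes > max) {
      parts.push(current);
      current = '';
      currentBytes = 0;
    }
    current += ch;
    currentBytes += bytes;
  }
  parts.push(current);
  return parts.join('\r\n ');
}

// Folds each content line and terminates every one, including the final
// END:VCALENDAR, with CRLF.
function serializeLines(lines) {
  return lines.map((line) => foldLine(line)).join('\r\n') + '\r\n';
}
```

Clients unfold by deleting every CRLF that is immediately followed by a space or tab, so the folded output reads back as the original line.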

Here's an example of using those classes to create a calendar with one event:

// Hypothetical deadline for illustration: one week from now, as a Unix timestamp.
$time = time() + 7 * 24 * 60 * 60;

// The event covers the hour leading up to the deadline.
$event_parameters = array(
  'uid'         => '123',
  'summary'     => 'Introduction Quiz Deadline',
  'description' => 'Make sure you check the website for the latest information',
  'start'       => new DateTime('@' . ($time - (60 * 60))),
  'end'         => new DateTime('@' . $time),
  'location'    => 'http://class.coursera.org/ml/quiz/index?id=2'
);
$event = new CalendarEvent($event_parameters);

$calendar = new Calendar();
$calendar->events = array($event);
$calendar->title  = 'Machine Learning Deadlines';
$calendar->author = 'Coursera Calendars';
$calendar->generateDownload();

In our own code, I wrote two more classes to help with generating those events for our own data, CourseItem and CourseCalendar (a subclass of Calendar).

You can check out the Calendar classes in this gist. If you've worked with iCalendar files in the past and know of anything we should tweak in what we're outputting, let me know in the comments.

Sunday, March 31, 2013

Rewriting our Forums with Backbone

When we rolled out the redesign of the Coursera class platform back in January, I put up a prominent message asking for feedback, and as can be expected, we got a lot of feedback. Much of the feedback was on the forums, where we had improved the aesthetics but neglected to improve the core usability. We had feedback like:

  • "I want to be able to link to a post."
  • "I can only edit comments, I can't edit posts."
  • "Whenever I do anything, the whole page reloads and I lose my place."

When I started to tackle the problems, I was faced with our legacy codebase: spaghetti code of PHP outputting HTML that was then manipulated in JavaScript. Some of the actions were done via API-like calls and some via form POSTs and server redirects. There was no consistent architecture, and that made it hard to make changes that felt like they should be minor improvements. There was also an increasing amount of JavaScript, but it wasn't written cleanly, and it worried me every time I added to it.

So I did the thing that everyone tells you not to do: rewrite the codebase. I knew it would be a lot of work and that I risked introducing regressions, but I decided it would be worth it if it enabled us to iterate faster and innovate more in the future.


The Process

I started by turning the backend into a RESTful API, with logically organized, object-oriented classes representing the routes and the database models. Once I had enough of an API to give me thread data, I started on the frontend, a single-page Backbone web app following the style of our other Backbone apps (which I've spoken about in the past). From there, I just kept iterating, building back the features that we had before and figuring out the best way to approach our top usability bugs.

At a certain point, I realized that I couldn't handle this rewrite myself (a hard thing for me to admit; I may be a bit of a cowboy coder), and I enlisted the help of my colleague Jacob (and his expertise as an avid Reddit moderator and user).

Once we had it 80% done, I started writing tests for the frontend. When we were 95% done, we enabled it via a per-class feature flag for our Community TAs class, and spent a week addressing feedback from the TAs and from our QA team. Then we started enabling it on classes, and after addressing the biggest concern from students (lack of Markdown support in the editor), we've enabled it for all our classes. From start to end, the rewrite took us about 6 weeks - three times as long as I hoped. One day I'll learn that most things take 3x as long as I expect them to. ☺


The Database

Since I wanted to be able to introduce the new forums in old classes - and also because I wanted to scope my rewrite down - I decided to stick with the same database design and model relations.

We use MySQL (in an effectively sharded way because each class has its own database), and this is my not so technical diagram of what our tables look like for forums:

A big thing to note is that each of our threads is always related to a forum, and that we do not have infinite nesting of comments like Disqus or Reddit. Instead, we have top-level posts, each of which can have associated comments. We may change this in a future rewrite to allow arbitrarily deep nesting, but for now, the post/comment relation is ingrained in our database design.
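To make that two-level shape concrete, here's a JavaScript sketch that assembles a thread from flat post and comment rows. The field names (id, post_id) are my assumptions for illustration, not our actual column names:

```javascript
// Assembles a thread from flat rows. Hypothetical shapes:
//   posts:    { id, ...other post columns }
//   comments: { id, post_id, ...other comment columns }
// Comments attach only to posts, so the result is exactly two levels deep.
function nestThread(posts, comments) {
  const postsById = new Map();
  for (const post of posts) {
    postsById.set(post.id, { ...post, comments: [] });
  }
  for (const comment of comments) {
    const parent = postsById.get(comment.post_id);
    if (parent) {
      parent.comments.push(comment);
    } // comments whose post is missing are skipped
  }
  return Array.from(postsById.values()); // posts keep their input order
}
```

Because a comment can only point at a post, never at another comment, one pass over the comments is enough; arbitrarily deep nesting would need a recursive tree build instead.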


The Backend

Our class platform is currently written in PHP, and much of it is custom libraries, but, hey, if you're interested, here's how the new forums backend works:

  • We model the data with the PHP Active Record library, and use instance methods and static methods to capture model-specific functionality. We use the Active Record functions as much as possible, but sometimes use our own SQL query system (like for INSERT IGNORE, which Active Record doesn't handle).
  • We have a simple Rest_Router class which can recognize registered routes and pass the requests to the appropriate class for processing.
  • We have a routes.php file which lists all of the forum API related routes.
  • We have a file of classes that extend our Rest_Controller class and handle the routes, defining get/patch/post/delete methods as needed. (We prefer PATCH to PUT, since Backbone makes partial updates easy and they're preferable to sending the whole model.)

For example, this route in the routes file is for deleting a user's subscription to a thread:

$router->delete('threads/:thread_id/subscriptions', 'Subscriptions#delete');
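Rest_Router itself isn't shown, but the idea is simple: turn each registered pattern into a regular expression, capture the :placeholder segments as params, and hand them to the Controller#action target. Here's a hypothetical JavaScript sketch of that matching (the Router class and its add/match methods are illustrative, not our PHP API):

```javascript
// Hypothetical JavaScript sketch of what a Rest_Router-style class does;
// the real PHP class isn't shown in this post.
class Router {
  constructor() {
    this.routes = [];
  }

  // Registers e.g. ('DELETE', 'threads/:thread_id/subscriptions', 'Subscriptions#delete').
  add(method, pattern, target) {
    const names = [];
    const source = pattern
      .split('/')
      .map((segment) => {
        if (segment.startsWith(':')) {
          names.push(segment.slice(1)); // remember the placeholder name
          return '([^/]+)';             // match exactly one path segment
        }
        return segment.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // literal text
      })
      .join('/');
    const [controller, action] = target.split('#');
    this.routes.push({ method, regex: new RegExp('^' + source + '$'), names, controller, action });
  }

  delete(pattern, target) {
    this.add('DELETE', pattern, target);
  }

  // Returns { controller, action, params } for the first matching route, or null.
  match(method, path) {
    for (const route of this.routes) {
      if (route.method !== method) continue;
      const m = route.regex.exec(path);
      if (!m) continue;
      const params = {};
      route.names.forEach((name, i) => {
        params[name] = decodeURIComponent(m[i + 1]);
      });
      return { controller: route.controller, action: route.action, params };
    }
    return null;
  }
}
```

With `router.delete('threads/:thread_id/subscriptions', 'Subscriptions#delete')` registered, matching `DELETE threads/2703/subscriptions` yields the Subscriptions controller, the delete action, and `{ thread_id: '2703' }`, which corresponds to the `$params['thread_id']` that the controller reads.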

This class in the controller file handles that URL:

class Subscriptions extends \Rest_Controller {
    
  public function delete($params) {
    $response = new \Rest_Http_Response();
    try {
      $request_body = $this->get_request_body();
      $data = json_decode($request_body, true);
      $data['thread_id'] = $params['thread_id'];
      $data['user_id']   = _current_user('id');
      $subscription_data = \Forum\Thread_Subscription::delete_subscription($data);
      $response->set_json_body(json_encode($subscription_data));
    } catch (\Exception $e) {
      return $this->error_request($e);
    }
    return $response;
  }
}

And this is the Active Record model that is called:

class Thread_Subscription extends \ActiveRecord\Model {

  static $table_name = 'thread_subscriptions';

  public static function delete_subscription($data) {
    $subscription = self::get_for_user($data['thread_id'], $data['user_id']);
    // Deleting a missing subscription is a no-op instead of calling delete() on null.
    if ($subscription) {
      $subscription->delete();
    }
    return null;
  }
}

The Frontend

We're a bit of a Backbone shop at Coursera now. We're not absolutely in love with it, but we've built up a lot of internal knowledge and best practices around it, so it makes sense for us to build our new apps in Backbone to keep our approach consistent. However, we do like to experiment in each app with different ways of using Backbone - like using Backbone-stickit for data binding in our most recent app. Sometimes those ways stick and become part of our best practices, and sometimes they fade away into oblivion.

With all that said, here's a breakdown of how the forum Backbone app works. It's not perfect, but hey, it's a start.


The "Routes"

Most Backbone single-page web apps start with a routes file that maps URLs to views, and Backbone looks at the URL to figure out what view function to kick off. In this case, however, I wanted to code it so I could easily embed a forum thread on any page, regardless of URL. I want widgets, not routes.

To accomplish widget-like functionality, I wrote it so that the main JS file for the forum app looks for DIVs on the page with particular data attributes and replaces them with the relevant view. For example, here's our code for loading in a thread widget:

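$('[data-forum-thread]').each(function() {
  var threadId = Number($(this).attr('data-thread-id'));
  var thread = new ThreadModel({id: threadId});
  // opt and threadMode come from the enclosing scope (not shown here).
  // Extending a fresh object keeps each widget's options separate.
  new ThreadView(_.extend({}, opt, {
    el: this,
    model: thread,
    mode: threadMode
  }));
});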
  

The Views

All of our views use Jade templates for HTML generation and separate Stylus files for CSS. Many of them listen to "change" or "sync" events on their respective models and then check the hash returned by changedAttributes() to see whether an attribute they care about changed. That minimizes the amount of re-rendering that has to happen.
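Here's a sketch of that check as a plain JavaScript helper. changedAttributes() is real Backbone API (it returns a hash of the changed attributes, or false if nothing changed); the shouldRerender name and the attribute names in the commented wiring are hypothetical:

```javascript
// Decides whether a view needs to re-render. `changed` is what Backbone's
// model.changedAttributes() returns: a hash of changed attributes, or false
// when nothing changed. `watchedKeys` lists the attributes this view renders.
function shouldRerender(changed, watchedKeys) {
  if (!changed) {
    return false;
  }
  return watchedKeys.some((key) =>
    Object.prototype.hasOwnProperty.call(changed, key));
}

// Hypothetical wiring inside a Backbone view's initialize():
//   this.listenTo(this.model, 'change', function () {
//     if (shouldRerender(this.model.changedAttributes(), ['title', 'tags'])) {
//       this.render();
//     }
//   });
```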

As an example, let's walk through ThreadView and its nested views. First, a diagram:

  • ThreadView is responsible for handling infinite loading and scrolling to permalinks. It defers all other rendering and event handling to one of its nested views, which each know to only re-render themselves when relevant properties of the thread change:
    • ThreadHeaderView: manages the title, subscription and thread admin controls.
    • ThreadTagsView: shows tags and handles adding tags.
    • PostContainerView: creates containers for each post using PostView and each comment using CommentView.
      • PostView and CommentView both extend EntryView with no modification. The slight differences between them are handled with if checks inside EntryView (e.g., only a post can be pinned, not a comment).
      • EntryView handles rendering an entry in view mode with its admin controls and voting controls, and it knows how to render an edit mode when the user wants it.
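Here's a plain-JavaScript sketch of that "one base view, empty subclasses" arrangement. It uses ES classes for brevity where Backbone would use `var PostView = EntryView.extend({})`, and the entry shape and control names are hypothetical:

```javascript
// Hypothetical entry shape: { id, type: 'post' | 'comment', text }.
// EntryView holds all the logic; the subclasses add nothing.
class EntryView {
  constructor(entry) {
    this.entry = entry;
  }

  isPost() {
    return this.entry.type === 'post';
  }

  // Hypothetical control names; the point is the single if check.
  adminControls() {
    const controls = ['edit', 'delete'];
    if (this.isPost()) {
      controls.push('pin'); // only a post can be pinned, not a comment
    }
    return controls;
  }
}

class PostView extends EntryView {}
class CommentView extends EntryView {}
```

Keeping the branching in one place means a new entry feature only touches EntryView; the tradeoff is that EntryView accumulates if checks as posts and comments diverge.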

The Models

Most of our models use BackboneRelational, an extension of Backbone that knows how to take JSON and turn its keys into related Collections of Models. Our models also use our custom API wrapper, which takes care of CSRF tokens and displays AJAX loading messages at the top of the page.

For example, let's look at ThreadModel and its related models. First, a diagram:

  • ThreadModel extends Backbone.RelationalModel, turning its "posts" and "comments" keys into PostCollection and CommentCollection, respectively. It is responsible for fetching thread JSON from the server and for figuring out how to fetch previous/next pages of the JSON. We debated how best to do this, and settled on always passing down a "post skeleton" where each post has an "id" and "order", and then we track which parts of the skeleton we've filled in (based on "post_text" existing), and fill in above/below. ThreadModel also must keep track of which user IDs it's seen on posts, and it fetches user profiles for any new user IDs from our main user database.
    • PostModel and CommentModel both extend EntryModel, and they differ only by their url (as the APIs distinguish between post and comment). PostCollection and CommentCollection are just collections of those models.
    • EntryModel extends Backbone.RelationalModel and is used for saving individual posts and comments - creating new ones and editing existing ones. EntryModel is never used for fetching JSON, because we always fetch on the Thread level, but it theoretically could be if we wanted a standalone entry view one day.
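The post skeleton is the trickiest part, so here's a JavaScript sketch of the idea: sort the skeleton by order, treat posts with post_text as loaded, and pick the next unloaded ids just below or above the loaded block. It assumes the loaded posts form one contiguous run, and nextPageIds, collectNewUserIds, and the user_id field are hypothetical names:

```javascript
// Returns up to `pageSize` ids of unloaded posts adjacent to the loaded block.
// `skeleton` is [{ id, order, post_text? }]; a post counts as loaded once
// post_text is present. Assumes loaded posts form one contiguous run.
function nextPageIds(skeleton, direction, pageSize) {
  const posts = skeleton.slice().sort((a, b) => a.order - b.order);
  const loaded = posts.map((post) => typeof post.post_text === 'string');
  if (direction === 'below') {
    const last = loaded.lastIndexOf(true); // -1 if nothing is loaded yet
    return posts.slice(last + 1, last + 1 + pageSize).map((post) => post.id);
  }
  // direction === 'above'
  const first = loaded.indexOf(true);
  if (first <= 0) {
    return []; // nothing loaded yet, or already at the top of the thread
  }
  return posts.slice(Math.max(0, first - pageSize), first).map((post) => post.id);
}

// Records user ids seen on newly loaded posts and returns only the ones not
// seen before, i.e. the profiles that still need fetching.
function collectNewUserIds(posts, seen) {
  const fresh = [];
  for (const post of posts) {
    if (post.user_id !== undefined && !seen.has(post.user_id)) {
      seen.add(post.user_id);
      fresh.push(post.user_id);
    }
  }
  return fresh;
}
```

Sending the full skeleton up front costs a little bandwidth on long threads, but it makes permalinks and "load above/below" simple: the client always knows where every post sits, even before its text arrives.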

The Tests

We can't reasonably write so much logic in our JavaScript without also writing tests to verify that our logic is sound. We write our tests using the Mocha test framework and the Chai assertion library. We use Sinon to mock out our API responses with local test JSON. When we want to test our views, we use JSDOM to render a fake DOM and react to fake events, and then we can test that the resulting DOM looks like what we expect. JSDOM does not do everything the browser DOM does (notably, it's missing contenteditable support), but it does an awful lot and is much faster than spinning up an actual browser.

For example, here's a snippet of a test for checking that save works as expected:

it('should save new post and render in view mode', function() {
  postView.render();

  server.respondWith("POST", getPath('/api/forum/threads/2703/posts'), 
    [200, {"Content-Type":"application/json;charset=utf-8"}, JSON.stringify(postJSON)]);

  postView.$('.course-forum-post-edit-link').click();
  postView.$('button.course-forum-post-edit-save').click();

  chai.expect(postView.$('.course-forum-post-edit-save').attr('disabled'))
    .to.be.equal('disabled');
  server.respond();
  chai.expect(postView.$('.course-forum-post-text').text())
    .to.be.equal(postJSON.post_text);
});

The Result

You can try out our forums by enrolling in a class and participating, and if you're really curious to learn more about the frontend Backbone code, you can browse our source snapshot here. We accomplished what I set out to do: we can fix our major issues without feeling like we're hacking the code horribly, easily add new features on top of the codebase, and test them. I imagine we'll have a forum rewrite v3 in the future, like if we decide to do truly real-time forums or infinitely nested comments, but I hope that we will be able to reuse much of what we developed for this version. Was the rewrite worth it? I think so, but ask me again next year. ☺

Tuesday, December 4, 2012

Unit Testing our PHP Templates with Selector

I'm finally digging into the codebase that powers class.coursera.org, and it's a wild ride. The original Coursera prototype was built by a few grad students working in the co-founder's machine learning research labs, and like all scrappy prototypes, it was just meant to test whether the whole massive online class idea had any merit. As it turns out, it did, and that prototype went on to serve the next class, then the next class, until finally, today, it's turned into the code that's serving 32 live courses. Needless to say, when those grad students first built the codebase, they didn't realize it would one day be handed over to a team of bright and eager engineers who had never seen it before, so it's not built on the most beautiful architecture or around an open source framework.

Well, what is it then? It's a PHP codebase with a custom-built router, templating, and SQL querying engine, and it's a fair bit of code. When I first started here, I figured it wouldn't be that much code, based on what I'd seen as a student, but I didn't take into account just how many administrative interfaces power what students see. The professors need to upload their lectures, design quizzes (with variations and pattern-based grading), create peer assessments, view statistics, calculate grades, issue certificates, ban students for cheating, etc., etc. Now that I've dug into it, I realize how much we enable on our platform, and because of that, I realize how important it is to test our platform when we make changes.

Unfortunately, our legacy codebase didn't exactly have a lot of testing when I arrived (I will leave it as an exercise to the reader to figure out how much it did have), so now that I am making changes in it, I'm adding unit tests as I go along, which includes figuring out how to test the different aspects of the codebase, plus how to mock out functions and data.

Testing Templates

My most recent changes have all revolved around fixing broken "messaging" — making sure that students understand deadlines, that they know how many submissions they have left, that they know why they got the score they did — and much of that comes down to figuring out what strings to show to users and what should be rendered in the HTML templates.

Some people may argue that you shouldn't be testing HTML output, because that's simply your presentation, but I would argue that your presentation should be tested, because if a user sees a yellow warning instead of a red one when they've passed the deadline, then that's a bug. Some could also argue that no logic whatsoever should be in your templates, but well, I've never managed to do completely logic-less templates in an app, and I didn't attempt to tackle that goal here.

I started out testing the rendered HTML by checking for string equality or containment, but of course, that was horribly brittle and broke whenever I changed the slightest thing. I soon moved on to using Selector, a library that accepts a string of HTML and lets you query its contents as a DOM, so that you can check for elements and their attributes. It's a better technique because you can check for what matters (like class names) and ignore what doesn't (like whitespace).

As an example, here's the test for our quiz start screen, which makes sure that it renders the appropriate data, start button, and alert message given the passed-in parameters.

function test_quiz_start_template() {
    $fake_quiz = $this->fake_quiz(1);

    // Untimed quiz that can be started, with a 10-minute retry delay and no alert.
    $rendered = _prepare_template('A:app:quiz:start', array(
        'quiz'        => $fake_quiz,
        'view_state'  => array('time'    => 'before_soft_close_time',
                               'message' => ''),
        'retry_delay' => '600',
        'can_start'   => true,
    ));
    $this->verify_quiz_start_template($rendered, 'Untimed', '0/100', '10 minutes 0 seconds', true, false);

    // Timed quiz (100 seconds) that can't be started and shows a warning.
    $fake_quiz['duration'] = '100';
    $rendered = _prepare_template('A:app:quiz:start', array(
        'quiz'        => $fake_quiz,
        'view_state'  => array('time'    => 'before_soft_close_time',
                               'message' => 'Warning!'),
        'retry_delay' => '0',
        'can_start'   => false,
    ));
    $this->verify_quiz_start_template($rendered, '1 minute 40 seconds', '0/100', '1 second', false, 'Warning!');
}

Notice how that function calls another function to actually do the verifying. Since I'm usually testing the output of a particular template with multiple sets of parameters, I typically make a single verify function that can be used for verifying the desired results, to avoid repeating myself. Here's what that function looks like, and this is the function that actually uses that Selector library:

function verify_quiz_start_template($rendered, $duration, $attempts, $retry, $start_button, $message) {
    $dom = new SelectorDOM($rendered);
    $this->verify_element_text_equals($dom, 'tr:contains(Duration) td', $duration);
    $this->verify_element_text_equals($dom, 'tr:contains(Retry) td', $retry);
    $this->verify_element_text_equals($dom, 'tr:contains(Attempts) td', $attempts);
    if ($start_button) {
        $this->verify_element_exists($dom, 'input.success');
    } else {
        $this->verify_element_doesnt_exist($dom, 'input.success');
    }
    if ($message) {
        $this->verify_element_text_equals($dom, '.course-quiz-start-alert', $message);
    } else {
        $this->verify_element_doesnt_exist($dom, '.course-quiz-start-alert');
    }
}

Okay, well, now you might notice that I'm calling a lot of verify_* functions inside. Those are functions that I've defined in my base TestCase class, so that I can use them anywhere I want to test DOM output using the Selector library. Here are all the helper functions I've written so far:

function verify_and_find_element($dom, $selector) {
    $selected = $dom->select($selector);
    // The message is only shown if the assertion fails.
    $this->assertTrue(count($selected) > 0, 'Failed to find ' . $selector);
    return $selected;
}

function verify_element_exists($dom, $selector) {
    $this->verify_and_find_element($dom, $selector);
}

function verify_element_doesnt_exist($dom, $selector) {
    $selected = $dom->select($selector);
    $this->assertEquals(0, count($selected), 'Unexpectedly found ' . $selector);
}

// PHPUnit's assertEquals() takes the expected value first, then the actual value.
function verify_element_text_equals($dom, $selector, $text) {
    $selected = $this->verify_and_find_element($dom, $selector);
    $this->assertEquals(trim($text), trim($selected[0]['text']));
}

function verify_element_attribute_equals($dom, $selector, $attribute, $text) {
    $selected = $this->verify_and_find_element($dom, $selector);
    $this->assertEquals(trim($text), trim($selected[0]['attributes'][$attribute]));
}
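For comparison, here's a JavaScript sketch of the same helpers against any `dom` object whose select(selector) returns an array of `{ text, attributes }` entries, which is the shape the PHP code reads from SelectorDOM. The camelCase names are hypothetical, and failures throw with the selector in the message instead of printing it:

```javascript
// Each helper takes a `dom` whose select(selector) returns an array of
// { text, attributes } objects, mirroring what the PHP helpers read.
function verifyAndFindElement(dom, selector) {
  const selected = dom.select(selector);
  if (selected.length === 0) {
    throw new Error('Failed to find ' + selector);
  }
  return selected;
}

function verifyElementExists(dom, selector) {
  verifyAndFindElement(dom, selector);
}

function verifyElementDoesntExist(dom, selector) {
  const count = dom.select(selector).length;
  if (count !== 0) {
    throw new Error(`Expected no match for ${selector}, found ${count}`);
  }
}

// Compares trimmed text so surrounding whitespace doesn't cause failures.
function verifyElementTextEquals(dom, selector, expected) {
  const actual = verifyAndFindElement(dom, selector)[0].text.trim();
  if (actual !== expected.trim()) {
    throw new Error(`${selector}: expected "${expected.trim()}", got "${actual}"`);
  }
}

function verifyElementAttributeEquals(dom, selector, attribute, expected) {
  const value = verifyAndFindElement(dom, selector)[0].attributes[attribute];
  const actual = String(value === undefined ? '' : value).trim();
  if (actual !== expected.trim()) {
    throw new Error(`${selector}[${attribute}]: expected "${expected.trim()}", got "${actual}"`);
  }
}
```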

Please know that I am not a PHP expert. I used it in college to put together websites and at Google to show developers how to use the Maps API in a few articles, but I never worked on any sizable piece of software written in it. I'm trying to wrap my head around the best practices for PHP codebases, in terms of testing, architecture, and object-oriented design, and this may not be the best way to test template output. I'd love to hear your recommendations in the comments.

Oh, and yes, yes, we don't want this codebase to be PHP-powered forever — but re-writing it will take time and will be much easier once I fully understand it from all these tests I'm writing. ☺