The Odd-yssey: My Epic Journey Chasing Down Webpage Performance Issues
July 2, 2014
No debugging adventure is boring.
As a case in point, I was recently trying to nail down the cause of some weird behavior experienced by a specific customer with a specific setup on a specific mobile device running EliteForm’s Paperless application. The majority of the work on Paperless was completed by our student-led Design Studio team, so I wasn’t fully sure of all the nuances in the code. They did an awesome job with the app and ran into many of these issues along the way, but as often happens with code, there’s always another bug hiding somewhere.
I was determined to find it.
When an athlete logs into Paperless, they can be presented with a large number of photos: one for each member of the team. The more images there are, the laggier the application becomes.
The Console Error
One of the first things I did was run the application on my desktop with 10x as many athletes as our customer. I opened the Developer Tools in Chrome and right away noticed a bug in the third-party Fastbutton library we were using. The iPad sometimes sends touch events whose touches are null.
What’s Fastbutton? Many mobile browsers have a default 300ms delay after a tap before they fire the click. This allows the browser to determine whether a single tap or a double-tap was performed. Unfortunately, 300ms is way too slow for users. Fastbutton can eliminate that delay.
I forked and made a pull request to the open source library and tested again.
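For illustration, a guard against this kind of error might look like the sketch below. It is not the actual patch: `getTouchPoint` is a hypothetical helper, and it assumes the failure came from reading coordinates off a touch list that was missing or empty.

```javascript
// Hypothetical helper (not part of Fastbutton): read the first touch point
// from a touch event without assuming the touch lists are populated.
function getTouchPoint(event) {
  if (!event) return null;
  const candidates = [event.touches, event.changedTouches];
  for (const list of candidates) {
    // A list can be null, undefined, or empty, and its first entry can be null.
    if (list && list.length > 0 && list[0]) {
      return { x: list[0].clientX, y: list[0].clientY };
    }
  }
  return null; // The caller can ignore the event instead of throwing.
}
```

A handler would then bail out early when `getTouchPoint(event)` returns `null` rather than dereferencing `event.touches[0]` directly.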
Error gone, but still no fix for the lag.
Validate That HTML
Invalid HTML forces browsers to guess when rendering, and this can cause many issues. I ran the HTML through the W3C’s validator and found a handful of problems: a duplicate id, missing browser prefixes for ellipses, invalid HTML comments (you can’t put a double dash inside one), a space in an id, invalid attributes, missing src attributes for bound items, and a couple of other minor things.
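Two of those problems, duplicate ids and ids containing spaces, are easy to spot-check by hand between validator runs. The snippet below is a rough sketch, not a replacement for the validator; `findDuplicateIds` and `findIdsWithSpaces` are hypothetical helper names.

```javascript
// Return every id value that appears more than once, in first-seen order.
function findDuplicateIds(ids) {
  const counts = new Map();
  for (const id of ids) {
    counts.set(id, (counts.get(id) || 0) + 1);
  }
  return [...counts].filter(([, n]) => n > 1).map(([id]) => id);
}

// Return every id value that contains whitespace.
function findIdsWithSpaces(ids) {
  return ids.filter((id) => /\s/.test(id));
}

// In a browser console, the ids could be collected like this:
//   const ids = Array.from(document.querySelectorAll("[id]"), (el) => el.id);
```

Both functions take a plain array of strings, so they can be tried outside the browser as well.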
It’s valid, but still no fix for the lag.
Next, it was on to the Chrome Profiler. I ran a CPU profile and noticed that a large share of it was program time.
What’s program time? It’s Chrome calling into its native code when any number of actions are performed (garbage collection, network, I/O, etc.). What made this weird was that it showed up even when I wasn’t doing anything.
Something’s not right.
Paint the Town Beige
I hopped over to IE to check out the UI Responsiveness (turns out Chrome also has this, under “Timeline”). See all those Paint calls? That’s the browser having to repaint part of the screen (something it won’t do unless it has to).
That’s weird. Nothing is changing or moving.
Chrome also showed me this Timer event, shown in yellow, firing (seemingly) before every Paint call.
Looking at the source, it was from our URL hash router, SammyJS. Sammy watches the location bar in your browser and looks for changes (in case your browser doesn’t support events on location). Well, Sammy must be doing something on the page to cause a repaint!
I bumped the timer duration up from 200ms to 10000ms. This should help, right? Nope, it was still repainting every 50-100ms.
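For context, hash polling of this kind can be sketched roughly as follows. This is not Sammy’s actual code: `createHashPoller`, `getHash`, and `onChange` are hypothetical names, and the hash source is injected so the logic can run outside a browser.

```javascript
// Hypothetical sketch of hash polling (not Sammy's implementation).
function createHashPoller(getHash, onChange) {
  let lastHash = getHash();
  return function check() {
    const current = getHash();
    if (current !== lastHash) {
      const previous = lastHash;
      lastHash = current;
      onChange(current, previous); // Only fires when the hash actually changed.
    }
  };
}

// In a browser this might be wired up as (route is a hypothetical callback):
//   const check = createHashPoller(() => window.location.hash, route);
//   setInterval(check, 200);
```

A check like this only reads and compares strings, which fits with the timer turning out not to be the source of the repaints.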
I enabled Chrome’s Show paint rectangles option in the Rendering tab on the Console to highlight any Paint calls. It showed me where the painting was occurring. All of the athlete images were constantly being repainted.
Inspecting the element showed the culprit. The loading spinner!
Because of how we retrieve athlete images, we place this gif as the background behind each image. It then gets covered up by the loaded image.
We store athlete images as PNGs, which can have transparency. The browser says, “There’s an animation going on behind this possibly transparent image. I’d better repaint!” This happens even if the picture has no transparent parts. I can’t hold a grudge against the browser; it didn’t know any better. 🙂
I also found that the Enable continuous page repainting option can be useful for finding elements that paint particularly slowly.
Removing that spinner background after the image is loaded fixed the issue on some devices, but the iPad was still a bit laggy.
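A minimal sketch of that fix is shown below, assuming the spinner is applied as a CSS background image on the `img` element itself. `clearSpinnerOnLoad` is a hypothetical helper, not the code we shipped.

```javascript
// Hypothetical helper: drop the animated spinner background once the image
// has loaded. `img` is expected to behave like an HTMLImageElement.
function clearSpinnerOnLoad(img) {
  const clear = () => {
    // An inline "none" overrides the stylesheet background, so nothing
    // keeps animating behind the loaded image.
    img.style.backgroundImage = "none";
    img.removeEventListener("load", clear);
  };
  if (img.complete && img.naturalWidth > 0) {
    clear(); // Already loaded, for example served from cache.
  } else {
    img.addEventListener("load", clear);
  }
}

// In a browser (the "img.athlete" selector is an assumption):
//   document.querySelectorAll("img.athlete").forEach((el) => clearSpinnerOnLoad(el));
```

The `complete` check matters because a cached image may finish loading before the listener is attached, in which case the `load` event would never be seen.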
The last piece was to replace our old Fastbutton implementation with FastClick from FT Labs. The version we were using attached a lot of move, touch, and mouse handlers to the entire page and to individual elements. This made it a nightmare when scrolling on a tablet or phone.
The Last Chapter
With all of these changes we now have a better product and a happy customer!
A lot of blog posts, technical talks, and even committed source code only show the end result of the process. Personally, 90% of my time is spent making mistakes, failing, and learning. That’s where the experience comes from that tells you how to sniff out and fix problems. That’s what makes this job so much fun.
I’m happy we could share one of our adventures with you. Stay tuned for a sequel!