by Peng Lyu

Visual Studio Code 1.21 includes a brand new text buffer implementation which is much more performant, both in terms of speed and memory usage. In this blog post, I'd like to tell the story of how we selected and designed the data structures and algorithms that led to those improvements.

Performance discussions about JavaScript programs usually involve a discussion about how much should be implemented in native code. For the VS Code text buffer, these discussions started more than a year ago. During an in-depth exploration, we found that a C++ implementation of the text buffer could lead to significant memory savings, but we didn't see the performance enhancements we were hoping for. Converting strings between a custom native representation and V8's strings is costly, and in our case compromised any performance gained from implementing text buffer operations in C++. We will discuss this in more detail at the end of this post.

Not going native, we had to find ways to improve our JavaScript/TypeScript code. Inspiring blog posts like this one from Vyacheslav Egorov show ways to push a JavaScript engine to its limits and squeeze out as much performance as possible. Even without low-level engine tricks, it is still possible to improve speed by one or more orders of magnitude by using better-suited data structures and faster algorithms.

The mental model for an editor is line based. Developers read and write source code line by line, compilers provide line/column based diagnostics, stack traces contain line numbers, tokenization engines run line by line, etc. Although simple, the text buffer implementation powering VS Code hadn't changed much since the first day we kicked off the Monaco project. We used an array of lines, and it worked pretty well because typical text documents are relatively small. When the user was typing, we located the line to modify in the array and replaced it. When inserting a new line, we spliced a new line object into the line array and the JavaScript engine would do the heavy lifting for us.

However, we kept receiving reports that opening certain files would cause Out-Of-Memory crashes in VS Code. For example, one user failed to open a 35 MB file. The root cause was that the file had too many lines: 13.7 million. We would create a ModelLine object for each line, and every object used around 40-60 bytes, so the line array used around 600 MB of memory to store the document. That's roughly 20 times the initial file size!

Another problem with the line array representation was the speed of opening a file. To construct the array of lines, we had to split the content by line breaks, such that we would get one string object per line. The split itself hurts performance, which you'll see in the benchmarks further down.

The old line array representation can take a lot of time to create and consumes a lot of memory, but it gives fast line look-up. In a perfect world, we would store only the text of the file and no additional metadata. Thus, we started looking for data structures that require less metadata. After reviewing various data structures, I found that a piece table may be a good candidate to start with.

Avoiding too much meta-data by using a piece table

A piece table is a data structure used to represent a series of edits on a text document (source code in TypeScript):

class PieceTable

Boost line lookup by using a balanced binary tree

With string concatenation out of the way, we can now open large files, but this leads us to another potential performance issue. Say we load a 64 MB file; the piece table will have 1000 nodes. Even though we cache line break positions in every node, we don't know which absolute line number is in which node.
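The TypeScript source that the piece table description refers to (beginning with `class PieceTable`) did not survive in this copy of the post. Below is a minimal sketch of the classic piece-table design the text describes: the original file content is kept read-only, typed text goes into an append-only buffer, and the document is a list of pieces pointing into one buffer or the other. Every member name other than `PieceTable` is an assumption, not the post's actual code.

```typescript
// Hedged sketch of a piece table; member names beyond `PieceTable`
// are assumptions based on the classic piece-table design.
enum NodeType {
  Original, // piece points into the read-only original buffer
  Added,    // piece points into the append-only "added" buffer
}

interface Piece {
  type: NodeType;
  start: number;  // offset into the source buffer
  length: number; // number of characters in this piece
}

class PieceTable {
  private readonly original: string; // file contents as loaded, never mutated
  private added = "";                // everything the user types, append-only
  private pieces: Piece[];

  constructor(original: string) {
    this.original = original;
    this.pieces = original.length > 0
      ? [{ type: NodeType.Original, start: 0, length: original.length }]
      : [];
  }

  // Insert text at an absolute offset: append to `added`, then split
  // the piece containing the offset into before/new/after pieces.
  insert(offset: number, text: string): void {
    const newPiece: Piece = {
      type: NodeType.Added,
      start: this.added.length,
      length: text.length,
    };
    this.added += text;

    let remaining = offset;
    for (let i = 0; i < this.pieces.length; i++) {
      const p = this.pieces[i];
      if (remaining <= p.length) {
        const before: Piece = { type: p.type, start: p.start, length: remaining };
        const after: Piece = {
          type: p.type,
          start: p.start + remaining,
          length: p.length - remaining,
        };
        // Drop zero-length fragments produced by splitting at a piece boundary.
        const replacement = [before, newPiece, after].filter((x) => x.length > 0);
        this.pieces.splice(i, 1, ...replacement);
        return;
      }
      remaining -= p.length;
    }
    this.pieces.push(newPiece); // offset at (or past) the end: append
  }

  // Materialize the document by concatenating every piece's text.
  getText(): string {
    return this.pieces
      .map((p) =>
        (p.type === NodeType.Original ? this.original : this.added).slice(
          p.start,
          p.start + p.length
        )
      )
      .join("");
  }
}
```

The appeal for large files is that the buffers are never copied or split on load: an edit only appends to `added` and rewrites a few small `Piece` records, which is why the metadata overhead stays tiny compared to one object per line.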