Accelerated erasure coding system and method – StreamScale

The Accelerated erasure coding system and method software patent was filed by StreamScale, a patent holding company, and granted by the US patent office in march 2014 (filed july 2013). It claims to own the idea to use SIMD instructions to speed up the computation of Erasure Code. It is a patent-ineligible abstract idea and can be ignored.

The abstract can be translated as follows (original in italics, translation in bold):

  • An accelerated erasure coding system includes a processing core for executing computer instructions and accessing data from a main memory, and a non-volatile storage medium for storing the computer instructions. It runs on a computer with RAM and disks.
  • The processing core, storage medium, and computer instructions are configured to implement an erasure coding system, which includes: a data matrix for holding original data in the main memory; a check matrix for holding check data in the main memory; an encoding matrix for holding first factors in the main memory, the first factors being for encoding the original data into the check data; and a thread for executing on the processing core. The accelerated erasure code function input is in RAM.
  • The thread includes: a parallel multiplier for concurrently multiplying multiple entries of the data matrix by a single entry of the encoding matrix; and a first sequencer for ordering operations through the data matrix and the encoding matrix using the parallel multiplier to generate the check data.¬† The SIMD instructions of the processor are used to accelerate the erasure encoding function.

The only claim of substance (23 out of 40) can be translated as follows:

  • A method of accelerated error-correcting code (ECC) processing on a computing system comprising a non-volatile storage medium, a processing core for accessing instructions and data from a main memory, and a computer program comprising a plurality of computer instructions for implementing an erasure coding system, An algorithm implementing erasure code,
  • the method comprising: storing the computer program on the non-volatile storage medium; executing the computer instructions on the processing core; using disk, CPU and RAM;
  • arranging original data as a data matrix in the main memory; arranging first factors as an encoding matrix in the main memory, the first factors being for encoding the original data into check data, loads a data from disk and into a matrix and
  • the check data being arranged as a check matrix in the main memory;and computes coding chunks
  • and generating the check data using a parallel multiplier for concurrently multiplying multiple data entries of a matrix by a single factor, the generating of the check data comprising ordering operations through the data matrix and the encoding matrix using the parallel multiplier. using SIMD instructions of the processor.

After a failed attempt to bully USENIX, StreamScale  intimidated (i.e. there was no lawsuit and therefore no ruling) James Plank, a known researcher in the field, also author of widely used Free Software libraries using the same techniques as those described in the linux kernel. James Plank agreed to publish the following on his web site as part of a settlement, presumably in exchange for a promise from StreamScale to not threaten to sue him in the future.

On this page I (James Plank) am providing notice that:

  • GF-Complete and Jerasure versions 2.0 and later are no longer supported.
  • StreamScale, Inc. offers a similar solution for commercial purposes.
  • I offer no representations or warranties in general about StreamScale’s products.
  • I have verified that StreamScale’s solution is faster than GF-Complete or Jerasure in at least some respects.
  • StreamScale, Inc. asserts that the use of GF-Complete (particularly as part of Jerasure 2.0 or later) or any similar software, method or code for erasure coding infringes StreamScale’s issued United States Patent No. 8,683,296.
  • I express no opinion on StreamScale’s claims, but I believe that parties should be aware that StreamScale asserts such claims.

The repositories on which James Plank published the software implementing the ideas from his research papers ( gf-complete and jerasure ) have been removed from bitbucket the same day, meaning James Plank had to agree to never work on implementing erasure coded software in the future.

The most prominent prior art invalidating this patent is the RAID6 (one of the most commonly used Erasure Code) implementation of the linux kernel. In an article dated 2004 (i.e. ten years before the patent was granted to StreamScale) it is described to be optimized as follows : For additional speed improvements, it is desirable to use any integer vector instruction set that happens to be available on the machine, such as MMX or SSE-2 on x86, AltiVec on PowerPC, etc. Where SSE2 is the acronym of Streaming SIMD Extensions 2. The patent cites Anvin aticle’s but only to state the problem and does not acknowledge it also contains the solution.

StreamScale pressured James Plank into withdrawing the software he published and this demonstrates there is nothing more than SIMD optimization to the patent claims. The gf-complete code can be audited to verify it only contains SIMD optimizations of known mathematical functions used to perform Erasure Code encoding and decoding. If the patent was about something else, StreamScale would have no interest in the GF complete implementation.

As of December 2014, the development of jerasure and gf-complete has a new home at jerasure.org. The most prominent addition to GF-Complete since James Plank was forced to quit is the support for the SIMD instruction set of ARMv8.

In March 2015, Red Hat announced world wide availability of Red Hat Ceph Storage including Jerasure and GF-Complete as found in the corresponding source branch.