Fixing "ERROR: failed to generate sha1rnds4 instruction" (and friends)

Jeffrey Walton-3
Hi Everyone,

This is a big FYI...

The test script has been reporting a small issue when run on a minimally featured CPU:

c++ -DNDEBUG -g2 -O2 -msse -msse2 -fPIC -march=native -pipe -c sha.cpp
ERROR: failed to generate sha1rnds4 instruction
ERROR: failed to generate sha1nexte instruction
ERROR: failed to generate sha1msg1 instruction
ERROR: failed to generate sha1msg2 instruction
ERROR: failed to generate sha256rnds2 instruction
ERROR: failed to generate sha256msg1 instruction
ERROR: failed to generate sha256msg2 instruction

This is the distro use case: they build without -march=native and then distribute the library to users with a variety of machines.
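Because a distro binary cannot assume anything about the user's machine, the presence of the SHA instructions has to be checked at run time rather than implied by compiler flags. The following is a minimal sketch of such a check, not the library's own code. It assumes GCC or Clang (for <cpuid.h>), and the helper name HasSHA is hypothetical. The SHA extensions are reported in CPUID leaf 7, subleaf 0, EBX bit 29.

```cpp
#if defined(__i386__) || defined(__x86_64__)
# include <cpuid.h>   // GCC/Clang helper header for the CPUID instruction
#endif

// Hypothetical helper (not a Crypto++ function): returns true when the
// processor reports the SHA extensions through CPUID.
bool HasSHA()
{
#if defined(__i386__) || defined(__x86_64__)
    // Leaf 7 must exist before it can be queried.
    if (__get_cpuid_max(0, nullptr) < 7)
        return false;

    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);

    // CPUID.(EAX=7,ECX=0):EBX bit 29 signals SHA support.
    return (ebx & (1u << 29)) != 0;
#else
    // Assumption for this sketch: non-x86 targets report "no SHA" here.
    return false;
#endif
}
```

A caller would test HasSHA() once and fall back to the C++ implementation when it returns false.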

We used to have code that handled the use case. It was removed at http://github.com/weidai11/cryptopp/commit/fb6a11ff08b9. The code was removed for two reasons. First, it caused a few minor problems, like http://github.com/weidai11/cryptopp/issues/53 due to C++11 constexpr (for example, an immediate (IMM) needs a constexpr or template parameter, not a function parameter). Second, GCC Bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57202 stated intrinsics would always be available in GCC version 5 and above.

With the fb6a11ff08b9 removal and the 57202 bug, I thought the new use cases would be: (1) GCC 5 users and above would get the intrinsics, and (2) GCC 4 and below would get CXX by default. GCC 4.8 and 4.9 users could install an updated compiler from backports, and move from (2) into (1). It would simplify the code and sidestep the 53 bug. It seemed like a good trade-off between simplicity and performance while sidestepping bugs.

It turns out 57202 did not enable intrinsics all the time. Additionally, it only enables intrinsics for IA32 (and not other platforms like ARM). I advised Wei incorrectly, so I got the OK to remove the code. Arg...

I want to start adding the code back incrementally. I'm going to start with SHA since it's the immediate pain point. But we need to do it for AES-NI, PCLMUL, SSE4, and some SSSE3. And we need to do it with ARM, but ARM's a little trickier because of some assembler goodness. For the ARM assembler issue see https://sourceware.org/ml/binutils/2017-04/msg00171.html.

Jeff

--
--
You received this message because you are subscribed to the "Crypto++ Users" Google Group.
To unsubscribe, send an email to [hidden email].
More information about Crypto++ and this group is available at http://www.cryptopp.com.
---
You received this message because you are subscribed to the Google Groups "Crypto++ Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.

Re: Fixing "ERROR: failed to generate sha1rnds4 instruction" (and friends)

Jeffrey Walton-3

I want to start adding the code back incrementally. I'm going to start with SHA since it's the immediate pain point. But we need to do it for AES-NI, PCLMUL, SSE4, and some SSSE3. And we need to do it with ARM, but ARM's a little trickier because of some assembler goodness. For the ARM assembler issue see https://sourceware.org/ml/binutils/2017-04/msg00171.html.

Here's an example of what it looks like using CRC32: https://github.com/weidai11/cryptopp/commit/9d2455a69949 .

There are two differences from the original code that was removed. First, the workaround is in the relevant source file when possible. This keeps cpu.h from growing without bound. Second, some inline ASM will need template versions to handle immediates. For example, here's the 32-bit shuffle, pshufd (an SSE2 instruction):

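#  define MM_SHUFFLE_EPI32(a,b) MM_SHUFFLE_EPI32_TEMPLATE<(b)>((a))

// The shuffle control is a template argument so it reaches the
// assembler as a compile-time constant.
template <unsigned int b>
GCC_INLINE __m128i GCC_INLINE_ATTRIB
MM_SHUFFLE_EPI32_TEMPLATE(__m128i a)
{
    // pshufd uses imm8; "N" accepts an unsigned 8-bit constant
    __m128i r;
    asm ("pshufd %2, %1, %0" : "=x"(r) : "x"(a), "N"(b));
    return r;
}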
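The same template trick applies when wrapping the intrinsic instead of inline ASM. The sketch below is illustrative only and assumes an x86-64 target with GCC or Clang; the names Shuffle32 and ReverseLanes are hypothetical and do not come from the library. It shows the control value travelling as a template argument, which keeps it a constant expression at the point where the instruction is emitted.

```cpp
#include <emmintrin.h>  // SSE2: _mm_shuffle_epi32, _mm_loadu_si128, _mm_storeu_si128
#include <cstdint>

// Hypothetical wrapper: CONTROL is a template argument, so the compiler
// always sees a compile-time constant for the imm8 operand.
template <unsigned int CONTROL>
inline __m128i Shuffle32(__m128i v)
{
    static_assert(CONTROL < 256, "pshufd takes an 8-bit immediate");
    return _mm_shuffle_epi32(v, CONTROL);
}

// A plain function parameter is not a constant expression, so a wrapper
// like the one below is not guaranteed to compile (it typically fails
// when the call is not inlined and constant-folded):
//
//   inline __m128i Shuffle32(__m128i v, unsigned int control)
//   { return _mm_shuffle_epi32(v, control); }

// Reverse the four 32-bit lanes of 'in' into 'out'.
// Control 0x1B is binary 00 01 10 11: lane 0 takes source lane 3,
// lane 1 takes lane 2, lane 2 takes lane 1, lane 3 takes lane 0.
inline void ReverseLanes(const uint32_t in[4], uint32_t out[4])
{
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
    v = Shuffle32<0x1B>(v);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), v);
}
```

Calling ReverseLanes on {1, 2, 3, 4} should produce {4, 3, 2, 1}.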

Jeff

Re: Fixing "ERROR: failed to generate sha1rnds4 instruction" (and friends)

Jeffrey Walton-3
Hi Everyone,

I don't like the result of this strategy. It makes things very messy for X86 and X64. And ARMv7a and ARMv8 need a different strategy because the tricks below don't work. (The tricks don't work on ARM because the linker behaves differently there.)

I put together some sample code that mostly takes Android's approach of splitting the source files. It's in the context of CRC-32C, which both SSE4.2 and ARMv8 provide. You can find it at https://github.com/noloader/CRC-Test.

In the split source approach, crc.cpp has the C++ implementation, while crc-simd.cpp provides only the SSE4.2 and ARMv8 implementations, Calculate_SSE4() and Calculate_ARMv8().
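The split can be sketched in a single listing as follows. This is not the code from the CRC-Test repository. It is a rough illustration that assumes GCC or Clang, covers only the x86 side, and uses hypothetical function names. The target attribute stands in for compiling crc-simd.cpp alone with -msse4.2, and __builtin_cpu_supports stands in for the library's run-time feature test.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__i386__) || defined(__x86_64__)
# include <nmmintrin.h>      // SSE4.2: _mm_crc32_u8
# define SKETCH_HAVE_SSE42 1 // hypothetical macro for this sketch only
#endif

// Would live in crc.cpp: portable bitwise CRC-32C
// (reflected polynomial 0x82F63B78). Built with baseline flags.
uint32_t CRC32C_CXX(const uint8_t* data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int k = 0; k < 8; ++k)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

#ifdef SKETCH_HAVE_SSE42
// Would live in crc-simd.cpp, the only file built with -msse4.2.
// The attribute below simulates that per-file flag.
__attribute__((target("sse4.2")))
uint32_t CRC32C_SSE42(const uint8_t* data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i)
        crc = _mm_crc32_u8(crc, data[i]);
    return ~crc;
}
#endif

// Dispatcher in the baseline file: it never contains SSE4.2 instructions,
// it only calls into the SIMD file after a run-time check succeeds.
uint32_t CRC32C(const uint8_t* data, size_t len)
{
#ifdef SKETCH_HAVE_SSE42
    if (__builtin_cpu_supports("sse4.2"))
        return CRC32C_SSE42(data, len);
#endif
    return CRC32C_CXX(data, len);
}
```

Both paths should yield the standard CRC-32C check value 0xE3069283 for the ASCII string "123456789". The point of the layout is that only the SIMD translation unit is ever compiled with the extra instruction set enabled, so the compiler cannot emit those instructions in code that runs before the feature check.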

Please take a moment to look over the sample code, and provide comments or objections.

Jeff

On Saturday, May 20, 2017 at 6:11:07 PM UTC-4, Jeffrey Walton wrote:

I want to start adding the code back incrementally. I'm going to start with SHA since it's the immediate pain point. But we need to do it for AES-NI, PCLMUL, SSE4, and some SSSE3. And we need to do it with ARM, but ARM's a little trickier because of some assembler goodness. For the ARM assembler issue see https://sourceware.org/ml/binutils/2017-04/msg00171.html.

Here's an example of what it looks like using CRC32: https://github.com/weidai11/cryptopp/commit/9d2455a69949 .

There are two differences from the original code that was removed. First, the workaround is in the relevant source file when possible. This keeps cpu.h from growing without bound. Second, some inline ASM will need template versions to handle immediates. For example, here's the 32-bit shuffle, pshufd (an SSE2 instruction):

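#  define MM_SHUFFLE_EPI32(a,b) MM_SHUFFLE_EPI32_TEMPLATE<(b)>((a))

// The shuffle control is a template argument so it reaches the
// assembler as a compile-time constant.
template <unsigned int b>
GCC_INLINE __m128i GCC_INLINE_ATTRIB
MM_SHUFFLE_EPI32_TEMPLATE(__m128i a)
{
    // pshufd uses imm8; "N" accepts an unsigned 8-bit constant
    __m128i r;
    asm ("pshufd %2, %1, %0" : "=x"(r) : "x"(a), "N"(b));
    return r;
}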
