How to catch SqlExceptions without getting into trouble with transactions aborted by SQL Server (or: how can a transaction be executed only halfway?)


We witnessed a strange behavior in one of our projects that caused me some headache. Unfortunately, we could only observe it in the production environment and never reproduce it. I took a look at the databases and could not imagine any circumstances that could lead to such data. One day I said:

It looks like the transaction was only executed halfway or like a broken rollback.

But we were using Microsoft SQL Server, and I could not imagine the existence of a bug that does such a thing and hadn't already been fixed in the current version.

By luck I got a clue and was able to fix it in our code. Here is what happened:

The database

For the sake of simplicity let’s use this database with just two tables:

  • Table Data
    • Id (PK, int, not null)
    • Data (varchar(50), not null)
  • Table Log
    • Id (PK, uniqueidentifier, not null, autovalue)
    • Text (varchar(max), not null)

The code

try
{
    using (var transaction = new TransactionScope(TransactionScopeOption.Required))
    using (var connection = new SqlConnection(ConnectionString))
    {
        connection.Open();

        var insertDataCmd = connection.CreateCommand();
        insertDataCmd.CommandText = "INSERT INTO Data(Id, Data) VALUES(1, 'Foo')";
        try
        {
            insertDataCmd.ExecuteNonQuery();
        }
        catch (SqlException e)
        {
            Console.WriteLine(e.Message);
            // for some reason I don't care about this PK conflict
        }

        var insertLogCmd = connection.CreateCommand();
        insertLogCmd.CommandText = "INSERT INTO Log(Text) VALUES('this should not find its way to the database')";
        insertLogCmd.ExecuteNonQuery();

        // don't commit this transaction
        // transaction.Complete();
    }
}
catch (Exception e)
{
    Console.WriteLine(e.Message);
}

 

We use TransactionScope to handle transactions. The first INSERT (with Id = 1) will throw an exception since a record with this Id already exists. I don't care about this exception, so I catch it and go on. Then I insert something into the log table. Finally, I never commit the transaction at all.

This is a very artificial program, reduced to show just the one thing that happened.

After running this piece of code I can find a new record in the log table even though the transaction was never committed! That scared me!

The analysis

The first INSERT does not only throw a primary-key-violation exception; SQL Server itself aborts the transaction at this point. The following commands are executed without any transaction or transaction scope at all. I couldn't imagine that this could happen with that code!

The reason for this behavior of SQL Server was a trigger like this:

CREATE TRIGGER InsteadOfInsertIntoData
ON [Data]
INSTEAD OF INSERT
AS
BEGIN
    INSERT INTO [Data] (Id, Data)
    SELECT Id, [Data]
    FROM inserted
END
GO

On tables without such a trigger my code works great. I can catch the exception and go on.

With this trigger, the PK error is not raised in user code but within the trigger. If an error occurs within a trigger, the transaction is always aborted.

(Please don’t start a discussion about this trigger itself. I don’t like it, but it was in the database. Can’t argue about that.)

Maybe with more explicit transaction handling

A colleague pointed out that TransactionScope might be the problem and that I should try explicit transaction handling with the plain old SqlTransaction class.

try
{
    using (var connection = new SqlConnection(ConnectionString))
    {
        connection.Open();
        var transaction = connection.BeginTransaction();

        var insertDataCmd = connection.CreateCommand();
        insertDataCmd.CommandText = "INSERT INTO Data(Id, Data) VALUES(1, 'Foo')";
        insertDataCmd.Transaction = transaction;
        try
        {
            insertDataCmd.ExecuteNonQuery();
        }
        catch (SqlException e)
        {
            Console.WriteLine(e.Message);
            // for some reason I don't care about this PK conflict
        }

        var insertLogCmd = connection.CreateCommand();
        insertLogCmd.CommandText = "INSERT INTO Log(Text) VALUES('this should not find its way to the database')";
        insertLogCmd.Transaction = transaction;
        insertLogCmd.ExecuteNonQuery();

        transaction.Rollback();
    }
}
catch (Exception e)
{
    Console.WriteLine(e.Message);
}

But it does not change much: the INSERT INTO Log was written to the database even with the Rollback() call at the end.

 

But you can observe something interesting on the transaction instance after SQL Server has aborted the transaction: the transaction object has a private (!) property named “IsZombied” that becomes true afterwards. But it’s private, so I cannot access it in my code.

At the same moment the transaction’s public Connection property loses its value and becomes null. That’s a clue one could check, but it’s not very explicit, and I don’t like checking such a property every time I want to call ExecuteNonQuery().
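If you really wanted to rely on that clue, it would look like this little sketch (fragile by design, which is exactly why I don’t like it):

// Sketch only: after SQL Server has aborted the transaction,
// SqlTransaction.Connection becomes null. Checking it before every
// command works, but it is implicit and easy to forget.
if (transaction.Connection == null)
    throw new InvalidOperationException("The transaction was aborted by SQL Server.");

insertLogCmd.ExecuteNonQuery();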

A better solution (?)

I figured out that I can dig deeper into the SqlException and check whether this exception aborted my transaction:

private static bool WasTransactionAborted(SqlException ex)
{
    foreach (SqlError error in ex.Errors)
    {
        // 3621: "The statement has been terminated."
        if (error.Number == 3621)
            return true;
    }
    return false;
}

If any of the SqlErrors within the SqlException has number 3621, then the transaction is dead and I don’t want to run any more code against my SQL Server connection.

The code from above can be changed at the point where I catch (and ignore) the exception, to check for the described condition and not ignore that one:

try
{
    cmd.ExecuteNonQuery();
}
catch (SqlException e) when (WasTransactionAborted(e))
{
    Console.WriteLine("That's it - sorry - over and out");
    throw;
}
catch (SqlException e)
{
    Console.WriteLine(e.Message);
    // for some reason I don't care about this PK conflict
}

 

If the transaction was aborted, we re-throw the exception and quit this method. Otherwise I can safely ignore the exception and go on.

Accessing git repositories with hg (mercurial) tools


Preface

Once upon a time we switched from Subversion to Mercurial (or hg) as our source control system. We chose hg because at that time the tool support on Windows was much better for Mercurial than for git. Not all of us wanted to go the console-only approach, and I still love my TortoiseHg very much. We are very happy with our hg setup and have no intention to move away from it.

But from time to time I come across a git repository I want to clone, or I have to collaborate with somebody else on a git repository. Soon I’m planning to release a project on GitHub, since this is currently the community to go to if you want to publish your stuff. But still I would like to stick with only a small set of tools.

The Hg-Git Mercurial Plugin

Instead of installing git on my machine (in fact it is running here anyway) I found the Hg-Git Mercurial plugin. It allows you to use hg on your local box but pull from and push to a git repository.

At first sight it works pretty nicely, but I had some challenges to overcome when installing it. And that’s what this post will be about.

Installing the Plugin

The homepage of the plugin can be found at http://hg-git.github.com/ with the source code available at GitHub and Bitbucket.

Even with easy_install available on my Windows machine I could not install the plugin as described in “Installing Hg-Git the easy way”, but the other way, “Installing Hg-Git manually”, is not much work at all. Dulwich had already been installed when I tried the easy way before, so I only had to get the code and edit the hg config file.
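For reference, the edit to the hg config file (Mercurial.ini or .hgrc) boils down to enabling the extension; a minimal sketch, where the path is just an example from my setup:

[extensions]
; point hg at the hggit package inside the cloned hg-git sources (example path)
hggit = C:\src\hg-git\hggit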

Get authentication working with github

For each repository over at GitHub the SSH URL is shown on the website, like
git@github.com:irgendwienet/testing.git

To use this with your hg installation just prefix it with git+ssh:// like
git+ssh://git@github.com:irgendwienet/testing.git

But then a “PuTTY Fatal Error” message box pops up: “Disconnected: No supported authentication methods available (server sent: publickey)”

So you have to do two things: upload your public SSH key to GitHub and tell your local system to connect using your private key.

To upload a public key just log into GitHub, go to Account Settings and into the SSH Keys section.

Then open putty.exe and configure the following:

  • Host Name = github.com
  • Connection / SSH / Auth / Private key file for authentication = your key file

Save this session as “github” (the name will be important in the next step).

Now try cloning the repository again, but instead of the actual host name (github.com) use the name of the session (just “github” in my case). PuTTY looks up this session and finds the actual URL along with the key.
git+ssh://git@github:irgendwienet/testing.git

Developing R#-PlugIns with a little comfort?


I am playing around with R# plugins these days. I started off with this nice introduction. Unfortunately it is silent about some problems that may bite you in the beginning.

1) When trying to copy the given source code you’ll notice that all the classes are unknown to Visual Studio, because a lot of references are missing. So far I have no clue about the R# class architecture. So I started .NET Reflector, opened ALL the .dlls from the R#/bin folder in Reflector and used the search to find the .dlls to reference in my project.

2) When you have your first R# plugin ready you want to test it. To do so you have to copy the dll containing your plugin into the R# plugin folder. You may want to do this in a post-build event like this:


copy $(TargetPath) "C:\Program Files (x86)\JetBrains\ReSharper\v5.1\Bin\PlugIns\WobTest"

When you build your project with this post-build event, it fails for two reasons. First: Win 7 denies access to that folder. Second: after another start of Visual Studio, R# has loaded the plugin, so you can’t copy a newer version over it.

My solution to this is a .cmd that starts Visual Studio without R# (devenv /safemode). I run this .cmd as administrator.

Now at least I can debug my plugins by starting another instance of Visual Studio (not in safe mode!).

This solution is not cool, because developing an R# plugin without using R# in the development process sucks… any ideas?

a better solution…

Some minutes after I published this post, Benjamin pointed me to this page. Forget about my lines in 2). You need neither a silly .cmd run as administrator nor a post-build event. All you need to do is start your debugging Visual Studio with the command line option /ReSharper.Plugin followed by the path to your plugin dll.

/ReSharper.Plugin "R:\sandbox\ReSharperPlugInTest\PublicGoesVirtual\PublicGoesVirtual\bin\Debug\PublicGoesVirtual.dll"

Be careful with certain ctors of XmlSerializer


In my current project we’re using XmlSerializer a lot. At some point I needed to write about 400 files with the XmlSerializer. For some reason I created a new XmlSerializer for each of these files.

During debugging I discovered a lot of messages in the Visual Studio output window like:

'Application.vshost.exe' (Managed (v4.0.30319)): Loaded 'm0dayvr5'
'Application.vshost.exe' (Managed (v4.0.30319)): Loaded 'oxxqw1rq'
'Application.vshost.exe' (Managed (v4.0.30319)): Loaded 'dgh3mgtl'
'Application.vshost.exe' (Managed (v4.0.30319)): Loaded '00sdpqlv'
'Application.vshost.exe' (Managed (v4.0.30319)): Loaded 'yokpozj4'

 

Then I remembered that the XmlSerializer creates dynamic assemblies at runtime and loads them into the AppDomain (and never unloads these assemblies, since you cannot unload from an AppDomain). In the MSDN I found an article that explains this behavior:

Dynamically Generated Assemblies

To increase performance, the XML serialization infrastructure dynamically generates assemblies to serialize and deserialize specified types. The infrastructure finds and reuses those assemblies. This behavior occurs only when using the following constructors:

XmlSerializer.XmlSerializer(Type)

XmlSerializer.XmlSerializer(Type, String)

If you use any of the other constructors, multiple versions of the same assembly are generated and never unloaded, which results in a memory leak and poor performance. The easiest solution is to use one of the previously mentioned two constructors. Otherwise, you must cache the assemblies in a Hashtable, as shown in the following example.

Of course you can use any other data structure to do the caching. But you have to do it on your own if you do not use the two mentioned constructors. I’ve got no idea why Microsoft built caching into the code for some constructors and not for others, but that’s the way it is.
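The MSDN example uses a Hashtable; here is a variant of mine using a ConcurrentDictionary instead. A sketch only, assuming the XmlSerializer(Type, XmlRootAttribute) overload (one of the non-caching constructors) is the one you actually need:

using System;
using System.Collections.Concurrent;
using System.Xml.Serialization;

static class SerializerCache
{
    // One serializer per (type, root element) pair, so the dynamic
    // assembly behind it is generated only once and then reused.
    private static readonly ConcurrentDictionary<string, XmlSerializer> cache =
        new ConcurrentDictionary<string, XmlSerializer>();

    public static XmlSerializer Get(Type type, string rootElement)
    {
        string key = type.FullName + ":" + rootElement;
        return cache.GetOrAdd(key, _ =>
            new XmlSerializer(type, new XmlRootAttribute(rootElement)));
    }
}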

Some performance measurements

I didn’t build a great benchmark, but at least I got a few numbers. Serializing multiple files without caching takes about 140 ms per file on my machine in debug mode and 130 ms in release mode. Each time. With caching, only the first file takes these 140/130 ms; every additional file takes about 8 ms (debug) or 6 ms (release).

That’s a speedup of about 17x to 20x!

And you do not produce memory leaks!

Be careful when using GetCallingAssembly() and always use the release build for testing


This looks like such an innocent method, but it led to big trouble in one of my projects. But let’s start with why someone would use this method, which is declared as a static method on the Assembly class. In the MSDN you can read:

Returns the Assembly of the method that invoked the currently executing method.

In our project we had an assembly with a lot of helper methods. One of these gets resources from the calling assembly. In various places of our code we called this method to get icons or other resources. This method used exactly this GetCallingAssembly() method to figure out which assembly to search for resources.

That worked pretty well in debug mode, but exceptions were thrown in release mode. We could not understand what was going on. It became even worse: when we built a release version and tried to debug that version (using the Visual Studio debugger) it worked again. It looked like a heisenbug.

It took us some time to figure out what is also written in MSDN:

If the method that calls the GetCallingAssembly method is expanded inline by the compiler (that is, if the compiler inserts the function body into the emitted Microsoft intermediate language (MSIL), rather than emitting a function call), then the assembly returned by the GetCallingAssembly method is the assembly containing the inline code. This might be different from the assembly that contains the original method. To ensure that a method that calls the GetCallingAssembly method is not inlined by the compiler, you can apply the MethodImplAttribute attribute with MethodImplOptions.NoInlining.

The JIT compiler moves code around to optimize for performance. Small methods (up to about 56 bytes of IL code, if I remember right) can be inlined at the former call site. But the compiler does this only in release mode, not in debug mode. Also, when we attached the debugger to our release build, the JIT compiler stopped inlining to enable debugging, and our bug was gone.

After understanding this, the fix is easy. Just don’t allow the compiler to inline the particular method that calls Assembly.GetCallingAssembly(). Then the method stays in the assembly where the source code is written, and everything will be fine.

[MethodImplAttribute(MethodImplOptions.NoInlining)]
public void SomeFunction(int i)
{
    // ...
    var a = Assembly.GetCallingAssembly();
    // ...
}

This attribute does the trick, and I recommend using it on all methods that call GetCallingAssembly(), can be called from another assembly, and need the real calling assembly.

A Change event for Dependency Properties


WPF comes with Dependency Properties, and everybody using WPF will know about this new kind of property. When you define your own Dependency Properties in your own class you can pretty easily add a property change event handler:

public int MyProperty
{
    get { return (int)GetValue(MyPropertyProperty); }
    set { SetValue(MyPropertyProperty, value); }
}
public static readonly DependencyProperty MyPropertyProperty =
    DependencyProperty.Register("MyProperty",
                        typeof(int),
                        typeof(MainWindow),
                        new UIPropertyMetadata(0,
                            MyPropertyvalueChangeCallback));

private static void MyPropertyvalueChangeCallback
                        (DependencyObject d,
                         DependencyPropertyChangedEventArgs e)
{
}

But how do you add such an event handler to an already existing Dependency Property, or somewhere other than in the defining class? E.g. to a property of a WPF control that was not built by you. Take a standard WPF TextBox: both the Text and the FontSize properties are Dependency Properties, but the TextBox class only provides a change event for the Text property. Nevertheless you can get a change event for any Dependency Property:

DependencyPropertyDescriptor dpd =
    DependencyPropertyDescriptor.FromProperty
        (Control.FontSizeProperty, typeof (TextBox));
dpd.AddValueChanged(someTextBox, SomeTextBoxFontSizeChanged);

Every time the FontSizeProperty on the instance someTextBox changes, the given method is called. It’s that easy, and you can implement this code everywhere, not only within the class that defines the property.
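AddValueChanged expects a plain EventHandler, so the handler wired up above could be as simple as this sketch (the body is of course mine):

private static void SomeTextBoxFontSizeChanged(object sender, EventArgs e)
{
    // sender is the TextBox whose FontSize just changed
    var textBox = (TextBox)sender;
    Console.WriteLine("New font size: " + textBox.FontSize);
}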

Shell-Lib reloaded: Another OS, another bug.


Some time ago we blogged about a nifty little .NET assembly that enables you to access the basic Windows shell operations in shell32.dll. You may want to do so especially on Win7, because your file operations then integrate with the fancy Win7 dialogs, the flashing progress bar etc.

The problem we blogged about in the former post was a missing “Pack” instruction in the marshalling description of a structure. I guess we didn’t test it with WinXP64… but some days ago we tested it with Win7/64 and it failed. We figured out that we had introduced a new bug that occurred on 64-bit machines only. After some more testing we found out that:

  • The original code (without Pack-value) crashes on 32 bit machines (as described in the former post).
  • The original code (without Pack-value) works fine on 64 bit machines (but we don’t think the original developer had 64 bits 7 years ago 😉 ).
  • Our fixed code (with Pack=2) works fine on 32 bit machines.
  • Our fixed code (with Pack=2) crashes on 64 bit machines.

It seems that the “SHFILEOPSTRUCT” structure in shell32.dll has different layouts on Win7/64 and Win7/32. I guess the guys at MS aim for performance and tell the compiler to optimize. The compiler optimizes by assuming the optimal Pack-value, which usually is the byte-width of the processor: 4 for 32 bit, 8 for 64 bit.

What effect does the Pack-value have at all?

The "Pack"-value controls the alignment of the starting addresses of each single member of a structure (often called "offset").

Let’s say you have this struct:

struct MammaMia
{
    public short sweet16;
    public byte bite;
    public string tanga;
}

Usually you don’t care about the exact way your data is stored in memory, but when you pass it from .NET to the Win API you have to make sure the Win API receives and returns exactly the right format. This process is called “marshalling”.

One crucial instruction for doing so is the StructLayout attribute.

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct MammaMia
{
    public short sweet16;
    public byte bite;
    public string tanga;
}

(Note: There are a lot more options for the StructLayout attribute and a lot more attributes that help you with marshalling.)

LayoutKind.Sequential means that your data is represented in memory in the same order as you declared it: first "sweet16", then "bite" and then "tanga".

Pack=1 tells .NET that every byte can be the starting point of the next member. So .NET produces this structure in memory:

Pack=1

byte | data
-----|-------------------------
   0 | sweet16
   1 | sweet16
   2 | bite
   3 | tanga
   4 | tanga
   … | …
  20 | last character of tanga

Now Pack=2 makes sure that only every 2nd byte can be the starting point of the next member. That leads – of course – to unused bytes in memory:

Pack=2

byte | data
-----|-------------------------
   0 | sweet16
   1 | sweet16
   2 | bite
   3 | ??? (unused)
   4 | tanga
   5 | tanga
   … | …
  21 | last character of tanga

The higher the Pack-value, the more unused bytes you get:

Pack=4

byte | data
-----|-------------------------
   0 | sweet16
   1 | sweet16
   2 | ??? (unused)
   3 | ??? (unused)
   4 | bite
   5 | ??? (unused)
   6 | ??? (unused)
   7 | ??? (unused)
   8 | tanga
   9 | tanga
   … | …
  25 | last character of tanga

Is this a waste of memory? Sure. But processors can access aligned addresses a bit faster than other addresses. So it’s a performance/memory tradeoff.
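You can watch the effect of the Pack value directly with Marshal.SizeOf. A small sketch of mine; I swapped the string member for an int here, because a marshalled string is only a pointer and would muddy the byte counting:

using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct Packed1 { public short sweet16; public byte bite; public int tanga; }

[StructLayout(LayoutKind.Sequential, Pack = 4)]
struct Packed4 { public short sweet16; public byte bite; public int tanga; }

class PackDemo
{
    static void Main()
    {
        Console.WriteLine(Marshal.SizeOf(typeof(Packed1))); // 7: no padding at all
        Console.WriteLine(Marshal.SizeOf(typeof(Packed4))); // 8: one unused byte before the int
    }
}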

After some more testing I figured out that the correct Pack-value for Win7/64 is 8, while the correct Pack-value for Win7/32 is 2 (see the former post). Why 2? I’d expect 4 here. I don’t know, and Microsoft won’t provide the source code, I guess. Maybe in former Windows versions they preferred a middle course between performance and saving memory.

The simple solution

Now it was obvious how to make the ShellLib assembly work with 32 and 64 bit: check what kind of machine we run on and use a different marshalling. Since I had to declare different structs here, the major part of the code is copying data back and forth, not very interesting.

The only interesting part is: how do you check what kind of machine you are running on? I was looking for some property on System.Environment but didn’t find anything. Then Andreas pointed me to this post, and the solution is too simple to cross one’s mind:

“Just do a IntPtr.Size (static method) and since IntPtr is designed to be an integer whose size is platform specific, it will be 8 if you are on a 64-bit machine and 4 if you are on a 32-bit machine.“
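In code that boils down to a one-liner; a tiny sketch (it reports the bitness of the running process, which is what matters for picking the marshalling):

// IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit process.
bool is64Bit = IntPtr.Size == 8;
int packValue = is64Bit ? 8 : 2; // the Pack values determined above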

You can find more information and our fixed code soon at codeproject.com.

Epilog

You can also do the marshalling all by yourself, starting with memory allocation and ending with freeing memory, just like in the old times. But this is another story and will be told later.

Maybe.

Hunting high and low


Here is a nice little challenge Benjamin came up with some months ago: “From a list of numbers find the smallest and the largest one – but do not use more than 1.5 comparisons per number.”

Dear reader: Try it!

The solution is simple, straightforward and very nice. And I didn’t find it.

I found something else. My approach was to look at the largest-number-so-far (max) and the smallest-number-so-far (min) as the boundaries of a region. A number outside this region must be larger than max or smaller than min and causes that boundary to change.

To check whether a number (a) was outside that region, a little computation came to my mind:

check = (max - a) * (a - min);

You would expect a to be smaller than max, so (max - a) should be positive.

You would expect a to be larger than min, so (a - min) should be positive, too.

Two positive numbers multiplied result in a positive number again.

So, if a was outside the boundaries, check would become negative. I only had to check which boundary was exceeded and that’s it:

if (check < 0)
    if (a > max)
        max = a;
    else
        min = a;

Using this approach, the number of comparisons per number converges to 1 as the length of the list grows. So I found a solution that is way better than 1.5 comparisons per number, didn’t I?

Boooooh

No, I didn’t. I cheated. In fact (max - a) and (a - min) are comparisons, they just don’t use > or <.

(Actually it’s the other way around: To compute < or > most processors do a subtraction and compare the result to 0.)

So – if you count the subtractions as well, you get 3+ comparisons per number…

The intended solution to the challenge (and the code you should provide in your exam) is:

a = list[count++];
b = list[count++];

if (a<b)
{
    if (a<min) min = a;
    if (b>max) max = b;
}
else
{
    if (b < min) min = b;
    if (a > max) max = a;                    
}
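For completeness, here is what a full loop around that pairwise idea might look like; a sketch of mine, not part of the exam answer, which also handles the leftover element when the list length is odd:

// Pairwise min/max: 3 comparisons per 2 elements = 1.5 per element.
static void MinMax(int[] list, out int min, out int max)
{
    // sentinel start values; for an empty list they are simply returned
    min = int.MaxValue;
    max = int.MinValue;
    int count = 0;
    while (count + 1 < list.Length)
    {
        int a = list[count++];
        int b = list[count++];
        if (a < b)
        {
            if (a < min) min = a;
            if (b > max) max = b;
        }
        else
        {
            if (b < min) min = b;
            if (a > max) max = a;
        }
    }
    if (count < list.Length) // odd length: one element left over
    {
        int a = list[count];
        if (a < min) min = a;
        if (a > max) max = a;
    }
}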

 

So – what’s the point of this post?

The point is: my silly approach can be faster than the standard solution. Depending on the type and range of the numbers in the list, calculating and probing check needs less time than the comparisons in the standard solution.

It may not be much, but sometimes small advantages matter. For example: for an integer list of length 10^8 with values from -10,000 to 10,000 my approach is 0.02 seconds faster (on my current laptop). And has less code.

So if you feel like using it – I won’t charge you.

Access to modified closure


Imagine a small C# WinForms app with just one ListBox and the following code:

private void Form1_Load(object sender, EventArgs e)
{
    for (int i = 0; i < 5; i++)
    {
        Thread t = new Thread(() => AddToListbox(i));
        t.Start();
    }
}

private void AddToListbox(int i)
{
    if (this.InvokeRequired)
        this.Invoke(new Action<int>(AddToListbox), i);
    else
        this.listBox1.Items.Add(i);
}

A simple loop iterating over the numbers from 0 to 4 and adding these values asynchronously to a ListBox. What do you expect? I expected the numbers from 0 to 4 shown in the ListBox, but in any random order, since I do not control the threads in any way.

2 - 3 - 4 - 5 - 5 4 - 2 -  2 - 4 - 5

I didn’t expect any number to appear multiple times, and I’m totally surprised to see the number 5!

But ReSharper gave me a hint that I had often seen but never understood:

"Access to modified closure" by  R#

So what’s going on?

I use the syntax of a lambda expression instead of a regular function call (e.g. using the ThreadStart class and a delegate). This lambda expression is not evaluated until the thread uses it. And by this time, the loop can be in its next iteration. If the loop has already finished, i will be 5.

That is exactly what R# tries to tell me: “Hey, you are accessing a variable here but changing it later on. Maybe that is not a good idea.” – It isn’t.

The solution

Just make a copy of i before passing it into the expression. This copy must be a private copy that will not be changed later. The easiest way to do so is declaring a variable inside the body of the loop. In every iteration of the loop a fresh variable is created, and the lambda expression will capture this one.

for (int i = 0; i < 5; i++)
{
    int copy = i;
    Thread t = new Thread(() => AddToListbox(copy));
    t.Start();
}

x += x++;


Ok, what does this expression do or mean? First of all: never write something like this in your code. Nobody understands the meaning with a single look and that will lead to confusion and maybe to a bug in your software someday.

Now let’s take a deeper look. First you may think about the difference between x++ and ++x. Written as a single statement, both expressions are equal; but used as part of a larger expression, the first evaluates to the value of x before increasing it, while the second evaluates to the new value of x. So y = x++; leads to a different result for y than y = ++x;.
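A quick illustration of that difference:

int x = 5;
int y = x++;   // y = 5, x = 6: the old value goes into y
x = 5;
int z = ++x;   // z = 6, x = 6: the new value goes into z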

x += a simply is a ‘shortcut’ for x = x + a.

Now let’s do it step by step. Say x is 5. First x is increased by one to 6, but the old value goes into the formula, which is left as x = x + 5. Since x was already increased to 6, the result should be 11.

If you think that is all right, then please take a break and test it with your favorite compiler. If you are a C or C++ guy you will in fact receive 11 as an answer (strictly speaking the expression is undefined behavior in C/C++, but typical compilers yield 11) and everything seems fine. But if you are a C# or Java guy, x will be 10. Why?

.NET as well as the Java runtime are stack machines. The operands of the expression are pushed onto the stack step by step before the whole thing is evaluated. At that time x is 5. x is changed to 6 by the x++ part, but that only happens in the variable in main memory. The old value (5) is still on the stack. After executing the whole expression, the changed x is overwritten with 10 (5 + 5).
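A minimal C# snippet to verify it yourself:

int x = 5;
x += x++;               // evaluates as x = 5 + 5; the increment to 6 is lost
Console.WriteLine(x);   // prints 10 in C# (and the same in Java)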

And once again: NEVER write code like this!
