Finding Zombie Processes

Apologies in advance if this article gets a tad scary but not because of the threat of flesh-eating zombies appearing but because of some of the tools I’ll be using although I’ll show you how easy they actually are to use.

So What is a Zombie Process?

There seem to be a few different views out there as to what constitutes a zombie process on Windows operating systems.

There’s the school of thought here that they are processes which have open handles to non-existent items such as processes which have exited. The article links to a handy tool on GitHub which can identify these processes. This led me to write a PowerShell script which opens handles via the OpenProcess() API to a given process and then waits for user input or a specified amount of time before closing them. Whilst it is waiting, if the process(s) to which it has the handles open are terminated then the PowerShell process running my script is deemed by the tool as being a Zombie.

self induced zombies

It’s definitely a useful tool but my interpretation of a Zombie process is one that should be dead and thus gone completely but isn’t so this doesn’t fit the case here in my opinion since my PowerShell process is alive and well, and living in West Yorkshire, but it has handles to what are dead processes, so almost Zombies.

Helge Klein wrote a great post on finding the causes for lack of session re-use on RDS/Citrix servers, so where  session ids returned by the quser.exe command are relatively high numbers unless it has just been rebooted, which is a subject close to my heart as I see this frequently in my consultancy work and this is what initially led to my Zombie hunt. I’ve had some success with this to find Zombie sessions, mainly where processes show in task manager with a status of “Suspended”. I haven’t had any joy though in using SysInternals handle.exe utility to close all open handles and thus free up the session (which is also potentially dangerous). The “suspended” processes are closer to zombies though, in my view of the world, as they should be dead but aren’t as they are generally present in sessions which have been logged off so all processes in that session should have been terminated.

I’ve yet to find a way in PowerShell to find processes that task manager shows as suspended as that doesn’t seem to be in any property returned by Get-Process or from a WMI/CIM query on the win32_process class. The latter has an “ExecutionState” property which ought to be what I need but it is always empty so appears to not be implemented. However, I have had some success in looking for processes whose session id is for a session that no longer exists or which have no threads or open handles as that seems to be a sympton of some of these “Suspended” processes. I thought it would be a case of finding that all threads for a given process were suspended but how does that then relate to processes flagged as suspended in task manager but which have no threads at all unless task manager assumes that any process with no threads must be suspended since it has nothing that could be scheduled to run on a CPU?

Microsoft have a debugger extension command “!zombies” but I’ve never got that to show any Zombie processes as yet.

My own definition of a Zombie process is where they are not visible in any user mode tools like task manager or Process Explorer but still have an EPROCESS block in the kernel address space although these are tricky to find as you have to look for symptoms of their existence, such as Zombie sessions causing a lack of session reuse, given their inherent lack of visibility.

Finding Zombie Processes

So how do we see these EPROCESS blocks which reside in the kernel address space? Why, with a debugger of course. However, we would normally look at this in kernel dumps which we usually get after a BSoD (although there are some neat ways of converting hypervisor snapshot memory files and saved state files into dump files for VMware and Hyper-V that I’ve used).

We can point a debugger at a running machine but only if it has been started in debug mode which isn’t generally how they should be booted in BAU situations.

Fortunately, SysInternals comes to the rescue, again, in that they have a tool called livekd that in conjunction with Windows debuggers like windbg (a debugger with a GUI) or kd (a command line debugger), can debug a live system. It’s easy to use; we just need to tell it where windbg or kd are (they are in the same folder as each other when installed) and also decide what we are going to do about symbols which are what are required to convert raw memory addresses into meaningful function names. If you don’t do anything about symbols then livekd will prompt with some defaults but I prefer to first set the _NT_SYMBOL_PATH environment variable to “srv*c:\Symbols*http://msdl.microsoft.com/download/symbols” which uses Microsoft’s online symbol server and stores the downloaded symbols in the c:\Symbols folder.

So to start windbg to debug the local system, we run the following, elevated, obviously changing folders depending on where livekd.exe and windbg.exe live; where I’m using a PowerShell session, not a (legacy) cmd:

& 'C:\Program Files\sysinternals\livekd.exe' -k 'C:\Program Files (x86)\Windows Kits\10\Debuggers\x64\windbg.exe' -w -b

This will then start windbg which should look something like this:

windbg via livekd

Don’t worry about the error for symbols for livekd.sys as we can safely ignore this.

We’re now ready to view the process entries contained in EPROCESS blocks which we achieve via running the “!process 0 0″ debugger extension command where the first zero instructs it to dump all processes rather than the current process or a specific pid (which we would specify in hex, not decimal) and the second is a flags value which tells it here to just dump basic information. We type this in at the bottom where it shows the “0: kd>” prompt.

This should then give something similar to the following although it may take some time to complete if there are a lot of processes (if you’re on a remote session then minimising the windbg window can make it complete faster since it has to do less screen updating).

windbg livekd bang process

To quit the debugger, enter “q”.

So there we have it, all of the processes as far as the kernel sees it. But hang on, that’s not exactly easy to digest is it now and what does all the data mean? The Cid is actually the process id, albeit in hex, and thus the ParentCid is the parent process id for the process.

Rather than manually wading through this data, I wrote a quick PowerShell script that uses regular expressions to parse this data, which can be copied and pasted into a text file or saved directly to file via the “Write Window Text to File” option from the windbg Edit menu, correlates these processes against currently running processes via the Get-Process cmdlet and then outputs the data as below where I’ve filtered the grid view where “Running” is false:

selfservice zombies

Ignore the fact that it flags the System process as being a Zombie as it isn’t – I’ll filter that out presently.

So here we can see lots of Citrix SelfService.exe Zombie processes since there are none shown in task manager and yet the processed !process output has over 25 instances in the EPROCESS table. We don’t have any Zombie sessions here though, since the Zombie processes all exist in active or disconnected sessions. I have discovered Zombie sessions too using this technique where there are processes shown, and typically the same executables, for sessions that no longer exist and aren’t shown in task manager, etc.

You will potentially get a few false positives for processes that have exited between when the !process was executed and then when the script was run to process its output. However, I’m working on a script to automate the whole process, (weak) pun intended, by using the command line debugger kd.exe, rather than windbg, to run the !process command and then pipe its output straight into a PowerShell script to process it immediately.

The script will be made available in the publicly accessible ControlUp Script Based Actions (SBA) library although all of the available SBAs there can be used standalone without ControlUp software.

 

Advertisements

Guide to my GitHub Scripts

This article, which will be updated as new scripts are added, serves as an index to the scripts I have uploaded to GitHub with a quick summary of what the script can do and links to explanatory blog articles. The scripts are split logically into a number of GitHub repositories, namely:

Citrix

  1. DailyChecks.ps1 – allows you to get a summary of your Citrix XenApp/XenDesktop 7.x deployment emailed to you via a scheduled task to help spot issues. Blog Post
  2. End Disconnected sessions.ps1 – finds sessions disconnected over a given duration and logs them off, optionally terminating specified processes in case they are preventing logoff.
  3. Get PVS boot time stats.ps1 – pull PVS target device boot times from PVS server event logs to show fastest, slowest, mean, median and mode values with the option to send an email if thresholds are breached. Blog Post
  4. Get PVS device info.ps1 – retrieve PVS target device information from PVS servers and display their configuration along with corresponding data from Citrix Studio, Active Directory, VMware and the devices themselves such as last boot time & IP address. Selected devices can then have operations performed on them such as deleting from PVS/AD/Studio or rebooting. Blog Post
  5. Ghost Hunter.ps1 – find disconnected Citrix XenApp sessions which Studio/Director say still exist but do not and mark them such that they cannot prevent affected users from launching further published applications. Blog Post
  6. Show PVS audit trail.ps1 – collect PVS auditing events in a given date/time range and show on-screen or export to a csv file. Can also enable auditing if it is not already enabled.
  7. Show Studio Access.ps1 – show all users granted access to Citrix Studio and their access levels and optionally export to a csv file. It will recursively enumerate AD groups to show each individual user with Studio access.
  8. StoreFront Log Levels.ps1 – display and/or change the logging levels on Citrix StoreFront servers. It can operate on multiple servers from a single script invocation. Blog Post
  9. Parse storefront log files.ps1 – show Citrix StoreFront log files in a sortable and filterable consolidated view, optionally filtering on entry type and date ranges. Selected lines will be placed in the clipboard to enable further research. Blog Post
  10. Get Citrix admin logs.ps1 – retrieve the logs viewable in Studio in a given time window and write to a csv file or display in an on screen sortable/filterable grid view. The logs can be filtered on the user who performed the action, where the action was performed from, either Studio or Director, whether it was an admin or config change action and the type of action such as logoff or shadow.

Microsoft

  1. Change CPU priorities.ps1 – dynamically change the base priorities of processes which over consume CPU so other processes get preferential access to the CPU. If a process stops over consuming then its original base priority will be restored. Can include/exclude specific users, processes and sessions.
  2. Trimmer.ps1 – trim the working sets of processes to make more memory available for other processes/users on a system. Can trim on demand or when processes are unlikely to need the memory such as when a session is idle, disconnected or locked. Can also set hard working set limits to cap leaky processes. Blog Post Blog Post Blog Post
  3. Get installed software.ps1 – show the installed software on one or more computers where the computers are specified on the command line or via a csv file. Queries the registry rather than the win32_product WMI/CIM class which is faster and gives more complete results. Output can be to a csv file, an on screen grid view or standard output for piping into something else. If -uninstall is specified, items selected when OK is clicked in the grid view will be uninstalled. Similarly, a -remove option takes a comma separated list of package names or regular expressions and will run the uninstaller for them, silently if -silent is specified and the uninstall program is msiexec.exe.
  4. Group Membership Modifier.ps1 – add or remove a specified list of user accounts from local groups, such as Administrators or Remote Desktop Users, on one or more machines.
  5. Clone VHD.ps1 – create a new Hyper-V virtual machine from a .vhd/.vhdx file containing an existing VM, selecting the VM configuration in a GUI. Will integrate itself into Windows Explorer so you right-click on a virtual disk file and run it, elevating itself if required. Can make linked clones which can reduce disk space. Blog Post
  6. Fix Sysprep Appx errors.ps1 – parses sysprep logs looking for failures due to AppX packages causing sysprep to fail, removes them and runs sysprep again until successful.
  7. Show NTFS zone info.ps1 – Google Chrome and Internet Explorer store the URL of where downloaded files have come from in an NTFS Alternate Data Stream (ADS). This script shows these and optionally removes this information. Blog Post
  8. Profile Cleaner.ps1 – retrieve local profile information from one or more machines, queried from Active Directory OU, group or name, present them in an on-screen filterable/sortable grid view and delete any selected after prompting for confirmation. Options to include or exclude specific users and write the results to a csv file. Blog Post
  9. Show users.ps1 – Show current and historic logins including profile information, in a given time range or since boot, across a number of machines queried from Active Directory OU, group or name, write to csv file or display in an on-screen sortable/filterable grid view and logoff any selected sessions after confirmation. Works on RDS and infrastructure servers as well as XenApp. Blog Post
  10. Profile.ps1 – a PowerShell profile intended to be used on Server Core machines, with PowerShell set as the shell, which reports key configuration and status information during logon.
  11. Add firewall rules for dynamic SQL ports.ps1 – find all SQL instances and create firewall rules for them to work with dynamic ports
  12. Find Outlook drafts.ps1 – find emails in your Outlook drafts folder of a given age, prompt with the information with the option to open the draft. Designed to help you stop forgetting to complete and send emails. Has options to install & uninstall itself to launch at logon. Blog Post
  13. Outlook Leecher.ps1 – find SMTP email addresses in all your Outlook folders including calendars and write them to a csv file including context such as the subject and date of the email.
  14. Check Outlook recipient domains – an Outlook macro/function which will check the recipient addresses when sending an email and will warn if the email is going to more than a single external domain. Designed to help prevent accidental information leakage where someone may pick the wrong person when composing.
  15. Fix reminders – an Outlook macro/function which will find any non-all day Outlook meetings which have no reminder set, display the details in a popup and add a reminder for a number of minutes before the event as selected by the user. Blog Post.
  16. Check Skype Signed in.ps1 – uses the Lync 2013 SDK to check Skype for Business is signed in and will alert if it is not via a popup and playing an optional audio file. Can also pop up an alert if the client has been in “Do Not Disturb” in excess of a given period of time.
  17. Redirect Folders.ps1 – show existing folder redirections for the user running the script or set one or more folder redirections with a comma separated list of items of the form specialfolder=path. For example Music=H:\Music
  18. Check and start processes.ps1 – check periodically if each of a given list of processes is running and if not optionally start it, after an optional prompt is displayed. Any necessary parameters for each process can be specified after an optional semicolon character in the process name argument. Can install or uninstall itself to the per user or per machine registry run key so it runs at logon. Use it to launch and monitor key processes such as Outlook or Skype for Business (lync.exe).
  19. Autorun.ps1 – list, remove or add logon autoruns entries in the file system or registry for the user running the script or all users if the user has permissions. Can also operate on the RunOnce key and wow6432node on x64 systems. Uses regular expressions for matching the shortcut/registry value name and/or the command so knowing the exact names or commands is not required. Uses PowerShell’s built in confirmation mechanism before overwriting/deleting anything.

General Scripts

  1. Regrecent.ps1 – find registry keys modified in a given time/date window and write the results to a csv file or in an on-screen sortable/filterable grid view. Can include and/or exclude keys by name/regular expression. Blog Post
  2. Leaky.ps1 – simulate a leaky process by causing the PowerShell host process for the script to consume working set memory at a rate and quantity specified on the command line.
  3. Twitter Statistics.ps1 – fetch Twitter statistics, such as the number of followers and tweets, for one or more Twitter handles without using the Twitter API
  4. Sendto Checksummer.ps1 – when a shortcut to this script, by setting the shortcut target to ‘powershell.exe -file “path_to_the_script.ps1”, is added to the user’s Explorer SendTo folder, a right-click option for calculating file checksums/hashes is available. The user will be prompted for which hashing algorithm to use and then the checksums of all selected files will be calculated and shown in a grid view where selected items will be copied to the clipboard when “OK” is clicked.
  5. Zombie Handle Generator.ps1 – opens handles to a given list of processes and then closes them after a given time period or after keyboard input. Used to simulate handle leaks to test other software.
  6. Sendto folder size.ps1 – shows the sizes of each folder/file selected in explorer, or passed directly on the command line. For each item then selected in the grid view, it will show the largest 50 files. If any files are selected when OK is pressed in that grid view, a prompt to delete will be shown and if Yes is clicked, the files will be deleted via the recycle bin. To install for explorer right-click use and add a shortcut to this script via Powershell.exe -file in the shell:sendto folder.

Ivanti

  1. AMC configuration exporter.ps1 – export the configuration of one or more AppSense/Ivanti DesktopNow Management Servers to csv or xml file
  2. Get process module info.ps1 – interrogate running processes to extract file and certificate information for their loaded modules which can be useful in composing Ivanti Application Control configurations

VMware

  1. ESXi cloner.ps1 – create one or more new VMware ESXi virtual machines from existing VMs nominated as templates. For use when not using vCenter which has a built in templating mechansim. Can created linked clones to save on disk space and drastically speed up new VM creation. Can be used with or without a GUI.
  2. Get VMware or Hyper-V powered on vm details.ps1 – retrieves details of all powered on virtual machines, or just those matching a name pattern, from either VMware vSphere/ESXi or Hyper-V and either displays them in an on screen sortable & filterable grid view, standard output for further processing or writes to a text file that can be used in a custom field in SysInternals BGinfo tool to show IP addresses of these VMs on your desktop wallpaper which is useful when they are on an isolated network or not registered in DNS.

Dynamically Creating Process Monitor Filters

Introduction

I recently had the need to automate the use of SysInternals’ Process Monitor such that no manual intervention is required to initiate the capture, with a filter, and then to process the results, in PowerShell of course. Searching around, I found that the format of a procmon configuration (.pmc) file didn’t appear to be documented anywhere and, being a binary format, could prove tricky, and time-consuming, to fully reverse engineer. Indeed, web searches showed others looking for ways to dynamically create these configuration files, which contain the filters as well as included columns, but apparently without success.

Of course, one could run it without a filter but that will make for potentially much larger trace files, which could impact free disk space and performance and would take longer to process in PowerShell. I therefore set about trying to figure out how I could add a process id (PID) filter for a specific process via a script and I present the research and relevant script parts here for the benefit of others.

Isolating the Relevant Section of the Configuration File

In order to see if it was feasible to take an existing procmon configuration file containing a PID filter and change it, I performed a binary comparison between two configuration files I had manually saved from the procmon user interface. In terms of the filter parameters they contained, they were identical except one was for a PID of 123456 and the other for a PID 567890, e.g:

procmon controlup pid filter

To perform a binary comparison, I used the built-in Windows File Compare utility fc.exe. Note that when calling this from PowerShell, you must append the .exe to the end of the command since “fc” is a built-in alias for the Format-Custom cmdlet which is not what we want to call. The results looked like this:

procmon pmc binary comparison

Which instantly gave me hope that what I was trying to accomplish was achievable since there were only nine differences and being the sad geek that I am from my 6502 hand assembly days on Commodore computers, I already knew that hex 31 is the ASCII code for the number 1, hex 32 is 2 and so on so that the first six rows of the first column were representing the PID 123456 and the second column 567890. But what about the last 3 bytes which are different? Well, 123456 in hex is 01E240 and 567890 is 08AA52 which we can see stored in those last 3 different bytes, albeit in little endian format.

If we look at the area around these differences in a hex editor (I use XVI32 which has served me well for many years), then we get some context and more information:

xvi32 pmc file

Where the selected character is the start of the “123456” PID string. However, notice that after each ASCII character there is a null (00) – this means that the characters are technically Unicode (16 bit) rather than ASCII (8 bit). This is for information only, it doesn’t cause an issue as we’ll see presently. Also, the (crudely) highlighted portion shows that there is a Unicode null terminator character (00 00) after the “6” of “123456” followed by the PID in little endian format.

So my thinking was that I could produce a template configuration file with a placeholder PID of 123456 and then replace that with the actual PID I wanted procmon to trace. One potential issue was that PIDs can be between 1 and 6 digits long and I didn’t want to risk changing the size/layout of the file since that may have broken procmon. Fortunately I found that procmon was quite happy accepting a PID with leading zeroes such as “000123” so that meant as long as I padded my PID to six digits, procmon would still work.

Dynamically Creating a Configuration File

Whilst it is easy with the procmon GUI to set the filters how you want them and also include or exclude display columns and then save this as a .pmc file, I had the added complication that this script, because it is run as a Script Based Action (SBA) by ControlUp’s Real-Time console, needed to be self-contained so had to have the .pmc configuration file embedded within the script itself.

This fortunately is easy to achieve since we can base64 encode the binary file, which converts it to text that we then just assign to a variable within the script. To base64 encode a file and place it in the clipboard so that it can be pasted into a script, run the following:

[System.Convert]::ToBase64String( [byte[]][System.IO.File]::ReadAllBytes( 'c:\temp\demo.pmc' ) | Clip.exe

and we then paste it into the value of a variable assignment in our script, remembering to place the terminating quote character at the end of what will be a very long line:

base64 encoded

At run time, we can convert this base64 encoded text back to binary data simply by running the following:

[byte[]]$filter = [System.Convert]::FromBase64String( $procmonFilter )

I initially just used XVI32 to determine at what point the “123456” string appeared in the file data and placed that offset into a variable but I found as I tweaked the filter that I had to keep using XVI32 to see what the offset was which became laborious. I therefore wrote a function which returns the offset of a Unicode string within a PowerShell byte array, or -1 if it is not found. I then ended up with the following code snippet which, using the aforementioned function, finds the offset of the “FilterRules” block in the config file (see hex view above), finds “123456” after that offset and replaces the PID with ours from the $processId variable:

posh to replace pmc pid

The in memory configuration file, contained in the $filter variable, can then be written to a .pmc file and the full path to the file specified as an argument to procmon.exe via the “/LoadConfig” option. The arguments highlighted below are the ones I used to capture a trace where I used “/RunTime” which runs the capture for a given number of seconds and then terminates procmon, and thus the trace, cleanly. You could also run it without “/RunTime” and call procmon.exe again with a “/Terminate” argument when you have finished the capture. If you just kill the procmon.exe or procmon64.exe processes then the trace file will not be closed cleanly and will not be usable.

procmon automated options

Once this has finished capturing and exited, it will leave a binary trace in the file argument given to the “/BackingFile” option so to convert this to a CSV file that PowerShell can read in using the Import-Csv cmdlet, we run procmon again thus:

procmon.exe /Quiet /AcceptEula /OpenLog `"$backingFile`" /Minimized /SaveAs `"$csvTrace`" /SaveApplyFilter /LoadConfig `"$pmcFile`"

where $backingFile is the .pml trace, $csvTrace is the csv file that we want it to produce and $pmcFile is the configuration file we constructed and wrote to disk. Notice the quoting of the variables in case they contain spaces.

Downloading Procmon

But what if procmon isn’t already on the system where the script needs to run? Technically I could’ve used the same base64 encoding technique to embed it within the script but this would tie it to a specific version of procmon and may also fall foul of the licence agreement as it could be construed as distributing procmon. Thankfully, for many years, the SysInternals tools have been individually available via live.sysinternals.com so the following will download procmon to the file specified in the $procmon variable (proxy willing):

(New-Object System.Net.WebClient).DownloadFile( 'https://live.sysinternals.com/procmon.exe' , $procmon )

However, depending on security settings, sometimes running of the downloaded executable will produce a dialogue box warning that the file may be untrusted. We can prevent that by running the following:

Unblock-File -Path $procmon

The Finished Script

Will be made available shortly in the ControlUp Script Library. In fact, it’s likely there will be a number of Script Based Actions authored that utilise this dynamic filtering method.

Whilst this technique has shown how a custom PID filter can be dynamically constructed and used, the same techniques could be used to set other filters. The only caveat is that the patterns, such as “123456”, would need to be unique as the simple mechanism presented here cannot determine the column or relation for the filter rule.

Where did that download come from?

I was investigating something completely unrelated recently when I came across the fact that the Zone.Identifier information for downloaded files, on Windows 10, which is stored in NTFS Alternate Data Streams (ADS) on each downloaded file, contains the URL from which the file came. Yes, the whole URL so could potentially be very useful and/or very embarrassing. It’s this Zone.Identifier file that Windows Explorer checks when it puts restrictions on files that it deems could be unsafe because they have come from the internet zone.

Let me illustrate this with an example  where I have downloaded a theme from Microsoft using Chrome version 68 on Windows 10 and saved it into C:\Temp. One can then easily examine the ADS on this downloaded file using PowerShell version 3.0 or higher:

zone info chrome

The ZoneId is 3, which is the “Internet” zone as can be checked by looking at the “DisplayName” value in “HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Internet Settings\Zones\3”, and notice that it gives the actual path to where the file came from, which is actually different to the URL that I clicked. I reckon that could be handy if you forget where a particular file came from but also potentially embarrassing/incriminating depending on what you download where clearing your browser history and cache will only delete some of the evidence.

I’ve been aware of the Zone.Identifier ADS for a long time but I only ever remember seeing the zone number in there, not URLs, so I went back to a 2008R2 system, downloaded the same file with IE11 and sure enough there was only the ZoneId line. I then tried IE11 on Windows 10 and it too only had the ZoneId in the ADS file which gave rise to this table for my Windows 10 laptop since the behaviour is browser specific:

Browser Version Captures URL in ADS
Internet Explorer 11 No
Edge 42.17134 Yes
Chrome 68 Yes
Firefox 61 No
Tor 7.5.6 No

Although both Chrome and Edge don’t put the URL in the Zone.Identifier ADS when browsing in Incognito and InPrivate modes respectively.

This got me sufficiently interested to write a PowerShell script which finds files with a Zone.Identifier ADS in a given folder, and sub-folders if the -recurse option is specified. The script just outputs the data found so you can pipe it through cmdlets like Export-CSV or Out-GridView – below is an example of piping it through Out-GridView:

zone info script

The script also has -remove and -scrub options which will either completely remove the Zone.Identifier ADS file or just remove the URLs from it, so keeping the zone information, respectively.

The script is available here and you use it entirely at your own risk.

XenApp/XenDesktop 7.x Availability & Health Summary Script

This script started life because I became aware that my (former) colleagues in customer technical support were performing manual checks for customers at the start of the working day so it seemed obvious to me to automate as much as possible.

There are already some great scripts out there that will give you very detailed machine by machine health but I wanted something that would give an overview of the environment(s) given that many I work in have many hundreds of machines so one or two being unavailable at any one time isn’t necessarily a disaster but wading through an email with a list of 200+ machines trying to get a feel for overall health can be error prone.

The email that the script sends starts with a summary:
citrix daily checks health summary
and then below that there are a series of tables to give specific details on each of these items as well as a per-delivery group summary, including scheduled reboot information, but separately for XenApp and XenDesktop since you probably want to see different information for these.

health check dg summary

In addition it will also show the following in separate tables together with delivery group and catalogue information for each machine:

  • PVS devices with the highest number of retries, which might suggest problems with storage, networking or both if the numbers are high.
  • File share usage and percentage free space for a list of UNCs passed to the script.
  • Availability metrics for application groups and desktops which are tag restricted since the high level per-delivery group statistics can’t give this information.
  • Machines not powered on (a -excludedMachines option is available if you want/need to exclude machine names which are expected to be powered off such as PVS maintenance mode masters).
  • Unregistered powered on machines which are not in maintenance mode.
  • Machines with the highest number of sessions.
  • Machines with the highest load balancing indexes.

The “powered on machines failed to return boot time” table may indicate where machines are in a bad state of health such as having fallen off the domain, stuck at boot, hung, etc.

The “users disconnected more than xxx minutes” table is designed to show users whose sessions have failed to be terminated by settings in Citrix policy, which I have seen at some customers, and I have a separate script to help rectify this, available on GitHub. It will also show, by cross referencing the user’s session to the User Profile Service event log on the server where Citrix thinks they have their disconnected session to see if they do still have that session as I have seen issues where this session has already been logged off. I call these “ghost” sessions and this can cause a problem if an affected user tries to launch another application that would session share on that server as they will get an error since there is no session to share. I came across a workaround for this, by setting the “hidden” flag for that session which means that it won’t try and session share in that specific session and, yes, there is a script for that on GitHub too.

If your machines are not power managed by Citrix, so the Power State shows as “unmanaged” in Studio, the -vCentres option can be used, along with a comma separated list of vCentres, which allows the script to get the power state from VMware instead. VMware PowerCLI must be installed in order for this to work.

Options wise, the script accepts the following although not all are mandatory and many take defaults (there are a few others but I’ve omitted these as they’re not especially interesting) plus you can tab complete them if running interactively and only need to specify enough of the option for it to not be ambiguous:

Option Purpose
-ddcs Comma separated list of Delivery Controllers (only specify one per SQL connection)
-pvss Comma separated list of PVS servers (only specify one per SQL connection)
-vCentres Comma separated list of VMware vCentres
-UNCs Comma separated list of file shares to report on capacity & free space
-mailserver Address of SMTP server to use to send the email
-proxyMailServer If the SMTP server does not allow relaying via the machine where you run the script, use this option to proxy it via an allowed machine
-from The sender of the email. The default is the machine running the script but this may fail as it isn’t a valid email address
-subject The subject of the email. The default includes the date/time
-qualifier Prepended to the subject. E.g. “Production” or “Test”
-recipients Comma separated list of email recipients
-excludedMachines A regular expression where matching machines are excluded
-disconnectedMinutes Report sessions disconnected over this time which should be greater than any setting in Citrix policy. Default is 480 (8 hours)
-lastRebootedDaysAgo Report on machines which have not been rebooted in more than this number of days. The default is 7 days
-topCount Report this number of machines per category. Default is 5
-excludedTags Comma separated list of Citrix tags to exclude if machines are tagged

It must be run where the Citrix Delivery Controller and PVS PowerShell cmdlets are available locally which can be anywhere where the Studio and PVS consoles are installed. I tend to have these installed on dedicated management servers so as not to risk compromising the performance of production servers like Delivery Controllers.

If you don’t have scheduled reboots set and don’t want to report on workers not rebooted in a given timeframe then  pass zero to the -lastRebootedDaysAgo option.

I tend to schedule it to run it at least a couple of times a day for customers – once early in the morning so issues spotted can be rectified before the busier periods and again at just before midday when I think usage will be at its maximum so overloaded servers, etc can more easily be spotted and capacity increased if necessary. A typical command line to run it as a scheduled task is:

-ddcs ctxddc001 -pvss ctxpvs001 -UNCs \\clus01\AppV,\\clus01\commonfiles,\\clus01\usersettings -mailserver smtp.org.uk -recipients guy.leech@somewhere.uk -excludedMachines "\\(win10|win7)"

The script is available on GitHub here , requires version 3.0 of PowerShell as a minimum and is purely passive, other than sending an email, so risks associated with it are very low although you do use it entirely at your own risk. Note that it also requires the “Guys.Common.Functions.psm1” module which should be placed in the same folder as the script itself and is available ion the same GitHub repository.

Ghost Hunting in XenApp 7.x

The easily scared amongst you needn’t worry as what I am referring to here are disconnected XenApp sessions where the session that Citrix believes is still alive on a specific server have actually already ended, as in been logged off. “Does this cause problems though or is it just cosmetic?” I can hear you ask. Well, if a user tries to launch another application which is available on the same worker then it will cause a problem because XenApp will try and use session sharing, unless disabled, but there is no longer a session to share so the application launch fails. These show as “machine failures” in Director. Trying to log off the actually non-existent session, such as via Director, won’t fix it because there is no session to log off. Restarting the VDA on the effected machine also doesn’t cause the ghost session to be removed.

So how does one reduce the impact of these “ghost” sessions? In researching this, I came across this article from @jgspiers detailing the “hidden” flag which can be set for a session, albeit not via Studio or Director, such that session sharing is disabled for that one session.

I therefore set about writing a script that would query Citrix for disconnected sessions, via Get-BrokerSession, cross reference each of these to the XenApp server they were flagged as running on via running quser.exe and report those which didn’t actually have a session on that server. In addition, the script also tries to get the actual logoff time from the User Profile Service event log on that server and also checks to see if they have any other XenApp sessions, since that is a partial indication that they are not being hampered by the ghost session.

If the -hide flag is passed to the script then the “hidden” flag will be set for ghost sessions found.

The script can email a list of the ghost sessions if desired, by specifying the -recipients and -mailserver options (and -proxymailserver if the SMTP mail server does not allow relaying from where you run the script) and if a history file is specified, via the -historyFile option, then it will only email when there is a new ghost session discovered.

ghosted sessions example

I have noticed that the “UserName” field return by Get-BrokerSession is quite often blank for these ghost sessions and the user name is actually in the “UntrustedUserName” field about which the documentation states “This may be useful where the user is logged in to a non-domain account, however the name cannot be verified and must therefore be considered untrusted” but it doesn’t explain why the UserName field is blank since all logons are domain ones via StoreFront non-anonymous applications.

If running the script via a scheduled task, which I do at a frequency of every thirty minutes, with -hide, also specify the -forceIt flag otherwise the script will hang as it will prompt to confirm that you want to set any new ghost sessions to hidden.

The script is available on GitHub here and you use it at your own risk although I’ve been running it for one of my larger customers for months without issue; in fact we no longer have reports of users failing to launch applications which we previously had tracked down to the farm being haunted with these ghosts although it rarely affects more than 1% of disconnected sessions. This is on XenApp 7.13.

Outlook Draft Email Reminder

How many times have you either sat there wondering why someone hasn’t responded to an email you’ve sent or someone chases you asking why you haven’t replied to a certain email and in both cases the partial response is actually still in your Outlook drafts folder? Of course, you had every intention of sending that email but you got sidetracked and  then either Outlook got restarted after exiting or crashing, you logged off and back on, shutdown, etc. In both cases, that once open email is then no longer open on your desktop but hidden away in your drafts waiting for you to remember to send it – out of sight, out of mind!

Yes, it has happened to me on more than one occasion so I therefore decided to script a solution to it, or at least something that would politely remind you that you had draft emails that perhaps you might want to finish. I started off writing in VBA but I couldn’t get it to trigger at startup or asynchronously so I switched to PowerShell, which I much prefer anyway.

The script has a number of options but I would suggest that the easiest way to use it is to have it run at logon and give it parameters -waitForOutlook and -wait which  mean that it will wait for an Outlook process to start before it starts checking, although it doesn’t have to since it uses COM to instantiate an Outlook instance of its own anyway, and the -wait means that it will loop around rather than performing one check and exiting.

If it finds draft emails created in the last seven days, although this can be changed via the -withinDays option, a popup will be displayed, which will be on top of all other windows, asking if you want to open them:

outlook drafts

Clicking “Yes” will result in the emails being opened, giving you the opportunity to finally finish and send them. Selecting “No” will either cause the script to exit if the -nowait option isn’t specified or put it to sleep until either a new Outlook instance appears, for instance because you close the current one and at some point start another one, or until the nag timer expires. The nag option, triggered by using the -nag parameter with a value in minutes, will cause the script to remind you, via the popup, that there are drafts that could probably do with your attention.

As I believe the best way to run this is to have it run at logon and then continue to check for draft emails, I added options to install and uninstall it into the registry so that it will be run at logon to save you the hassle of doing this yourself. If you run the following command line, it will create a registry value “Outlook draft nagger” in HKCU\Software\Microsoft\Windows\CurrentVersion\Run, or HKLM  if you want it to run for all users and the -allusers option is specified:

& '.\Find Outlook drafts.ps1' -waitForOutlook -withinDays 7 -wait -install "Outlook Drafts Checker" -nag 120

This will nag the user if there are drafts created in the last seven days as soon as Outlook is launched and then nag again either if Outlook is relaunched in that session or every two hours. Alternatively, it could be setup as a scheduled task if preferred but you lose some of its responsiveness such as being able to nag immediately if a new Outlook process for that user is detected.

If you need to remove this autorun, simply run with -uninstall “Outlook draft nagger”.

The script is available on GitHub here and you use it entirely at your own risk although there’s not exactly a great deal of damage that it can wreak. None in fact, other than perhaps you finally finishing and sending an email that perhaps you shouldn’t but don’t blame the script for that, after all you can always delete draft emails rather than send them!