XenApp/XenDesktop 7.x Availability & Health Summary Script

This script started life because I became aware that my (former) colleagues in customer technical support were performing manual checks for customers at the start of the working day, so it seemed obvious to me to automate as much as possible.

There are already some great scripts out there that will give you very detailed machine-by-machine health, but I wanted something that would give an overview of the environment(s). Many of those I work in have many hundreds of machines, so one or two being unavailable at any one time isn’t necessarily a disaster, but wading through an email listing 200+ machines trying to get a feel for overall health can be error prone.

The email that the script sends starts with a summary:
[Image: Citrix daily checks health summary]
Below that there are a series of tables giving specific details on each of these items, as well as a per-delivery group summary including scheduled reboot information, shown separately for XenApp and XenDesktop since you probably want to see different information for each.

[Image: health check delivery group summary]

It will also show the following in separate tables, together with delivery group and catalogue information for each machine:

  • PVS devices with the highest number of retries, which might suggest problems with storage, networking or both if the numbers are high.
  • File share usage and percentage free space for a list of UNCs passed to the script.
  • Availability metrics for application groups and desktops which are tag restricted since the high level per-delivery group statistics can’t give this information.
  • Machines not powered on (an -excludedMachines option is available if you want or need to exclude machines which are expected to be powered off, such as PVS maintenance mode masters).
  • Unregistered powered on machines which are not in maintenance mode (see the sketch after this list).
  • Machines with the highest number of sessions.
  • Machines with the highest load balancing indexes.
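For anyone wondering what such a check looks like under the covers, here is a minimal sketch of the kind of query involved for the unregistered machines table, assuming the Citrix Broker PowerShell snap-in is available (the Delivery Controller name is just an example):

Add-PSSnapin Citrix.Broker.Admin.V2

# Powered on machines which are unregistered but not in maintenance mode
Get-BrokerMachine -AdminAddress 'ctxddc001' -PowerState On -RegistrationState Unregistered -InMaintenanceMode $false |
    Select-Object -Property MachineName, DesktopGroupName, CatalogName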

The “powered on machines failed to return boot time” table may indicate machines in a bad state of health, such as having fallen off the domain, being stuck at boot, having hung, etc.

The “users disconnected more than xxx minutes” table is designed to show users whose sessions have failed to be terminated by settings in Citrix policy, which I have seen at some customers, and I have a separate script to help rectify this, available on GitHub. By cross referencing each user’s session with the User Profile Service event log on the server where Citrix thinks the disconnected session resides, it will also show whether that session still exists, as I have seen issues where the session has already been logged off. I call these “ghost” sessions and they can cause a problem: if an affected user tries to launch another application that would session share on that server, the launch will fail since there is no session to share. I came across a workaround for this, namely setting the “hidden” flag for the session, which means session sharing won’t be attempted for that specific session and, yes, there is a script for that on GitHub too.

If your machines are not power managed by Citrix, so the Power State shows as “unmanaged” in Studio, the -vCentres option can be used with a comma separated list of vCentres, allowing the script to get the power state from VMware instead. VMware PowerCLI must be installed for this to work.
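As an illustration, and not necessarily how the script itself does it, retrieving the power state via PowerCLI boils down to something like this (the server and VM names are examples; recent PowerCLI versions ship as a module):

Import-Module VMware.PowerCLI
Connect-VIServer -Server 'vcenter01','vcenter02'

# PowerState will be PoweredOn or PoweredOff, filling the gap where Citrix shows "unmanaged"
Get-VM -Name 'ctxworker*' | Select-Object -Property Name, PowerState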

As for options, the script accepts the following, although not all are mandatory and many have defaults (there are a few others but I’ve omitted them as they’re not especially interesting). You can also tab complete them if running interactively and only need to specify enough of each option name for it to be unambiguous:

  • -ddcs : Comma separated list of Delivery Controllers (only specify one per SQL connection)
  • -pvss : Comma separated list of PVS servers (only specify one per SQL connection)
  • -vCentres : Comma separated list of VMware vCentres
  • -UNCs : Comma separated list of file shares to report on for capacity & free space
  • -mailserver : Address of the SMTP server used to send the email
  • -proxyMailServer : If the SMTP server does not allow relaying from the machine where you run the script, use this option to proxy the email via an allowed machine
  • -from : The sender of the email. The default is the name of the machine running the script, which may cause sending to fail as it isn’t a valid email address
  • -subject : The subject of the email. The default includes the date/time
  • -qualifier : Prepended to the subject, e.g. “Production” or “Test”
  • -recipients : Comma separated list of email recipients
  • -excludedMachines : A regular expression where matching machines are excluded
  • -disconnectedMinutes : Report sessions disconnected for longer than this number of minutes, which should be greater than any disconnect timeout set in Citrix policy. The default is 480 (8 hours)
  • -lastRebootedDaysAgo : Report machines which have not been rebooted in more than this number of days. The default is 7
  • -topCount : Report this number of machines per category. The default is 5
  • -excludedTags : Comma separated list of Citrix tags to exclude if machines are tagged

It must be run where the Citrix Delivery Controller and PVS PowerShell cmdlets are available locally, which means anywhere the Studio and PVS consoles are installed. I tend to have these installed on dedicated management servers so as not to risk compromising the performance of production servers like Delivery Controllers.

If you don’t have scheduled reboots set and don’t want to report on workers not rebooted in a given timeframe then pass zero to the -lastRebootedDaysAgo option.

I tend to schedule it to run at least a couple of times a day for customers – once early in the morning, so issues spotted can be rectified before the busier periods, and again just before midday, when I think usage will be at its maximum, so overloaded servers and the like can more easily be spotted and capacity increased if necessary. A typical command line to run it as a scheduled task is:

-ddcs ctxddc001 -pvss ctxpvs001 -UNCs \\clus01\AppV,\\clus01\commonfiles,\\clus01\usersettings -mailserver smtp.org.uk -recipients guy.leech@somewhere.uk -excludedMachines "\\(win10|win7)"

The script is available on GitHub here, requires version 3.0 of PowerShell as a minimum and is purely passive, other than sending an email, so the risks associated with it are very low, although you use it entirely at your own risk. Note that it also requires the “Guys.Common.Functions.psm1” module, which should be placed in the same folder as the script itself and is available in the same GitHub repository.


Ghost Hunting in XenApp 7.x

The easily scared amongst you needn’t worry, as what I am referring to here are disconnected XenApp sessions where the session that Citrix believes is still alive on a specific server has actually already ended, as in been logged off. “Does this cause problems though or is it just cosmetic?” I can hear you ask. Well, if a user tries to launch another application which is available on the same worker then it will cause a problem, because XenApp will try to use session sharing, unless disabled, but there is no longer a session to share so the application launch fails. These show as “machine failures” in Director. Trying to log off the actually non-existent session, such as via Director, won’t fix it because there is no session to log off. Restarting the VDA on the affected machine doesn’t cause the ghost session to be removed either.

So how does one reduce the impact of these “ghost” sessions? In researching this, I came across this article from @jgspiers detailing the “hidden” flag which can be set for a session, albeit not via Studio or Director, such that session sharing is disabled for that one session.

I therefore set about writing a script that queries Citrix for disconnected sessions via Get-BrokerSession, cross references each of these to the XenApp server it is flagged as running on by running quser.exe, and reports those which don’t actually have a session on that server. In addition, the script tries to get the actual logoff time from the User Profile Service event log on that server and also checks whether the user has any other XenApp sessions, since that is a partial indication that they are not being hampered by the ghost session.
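A rough sketch of that cross referencing, assuming the Broker snap-in is loaded; the real script parses quser’s fixed width output properly rather than just relying on the exit code as here:

Add-PSSnapin Citrix.Broker.Admin.V2

ForEach ($session in (Get-BrokerSession -SessionState Disconnected)) {
    $machine = ($session.MachineName -split '\\')[-1]   # strip the domain prefix
    # UserName can be blank for ghosts, so fall back to UntrustedUserName
    $user = $(if ($session.UserName) { $session.UserName } else { $session.UntrustedUserName }) -replace '^.*\\'
    quser.exe $user /server:$machine > $null 2>&1
    if ($LASTEXITCODE -ne 0) {
        "Possible ghost session for $user on $machine"
    }
}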

If the -hide flag is passed to the script then the “hidden” flag will be set for ghost sessions found.
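Setting the flag manually amounts to a one liner – a sketch, assuming Set-BrokerSession exposes the flag as per @jgspiers’ article (the user name is hypothetical):

Get-BrokerSession -UserName 'DOMAIN\fredbloggs' -SessionState Disconnected | Set-BrokerSession -Hidden $true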

The script can email a list of the ghost sessions if desired, by specifying the -recipients and -mailserver options (and -proxymailserver if the SMTP server does not allow relaying from where you run the script). If a history file is specified via the -historyFile option, it will only email when a new ghost session is discovered.

[Image: ghosted sessions example]

I have noticed that the “UserName” field returned by Get-BrokerSession is quite often blank for these ghost sessions and the user name is actually in the “UntrustedUserName” field, about which the documentation states “This may be useful where the user is logged in to a non-domain account, however the name cannot be verified and must therefore be considered untrusted”. It doesn’t explain why the UserName field is blank, though, since all logons here are domain ones via non-anonymous StoreFront applications.

If running the script via a scheduled task with -hide, which I do at a frequency of every thirty minutes, also specify the -forceIt flag, otherwise the script will hang when it prompts for confirmation before setting any newly found ghost sessions to hidden.

The script is available on GitHub here and you use it at your own risk, although I’ve been running it for one of my larger customers for months without issue; in fact, we no longer have reports of users failing to launch applications, which we had previously tracked down to the farm being haunted by these ghosts, although they rarely affect more than 1% of disconnected sessions. This is on XenApp 7.13.

Outlook Draft Email Reminder

How many times have you either sat there wondering why someone hasn’t responded to an email you’ve sent, or been chased by someone asking why you haven’t replied to a certain email, and in both cases the partial response is actually still in your Outlook drafts folder? Of course, you had every intention of sending that email but you got sidetracked, and then either Outlook got restarted after exiting or crashing, or you logged off, shut down, etc. Either way, that once open email is no longer open on your desktop but is hidden away in your drafts waiting for you to remember to send it – out of sight, out of mind!

Yes, it has happened to me on more than one occasion, so I decided to script a solution, or at least something that would politely remind you that you have draft emails you might want to finish. I started off writing it in VBA but I couldn’t get it to trigger at startup or run asynchronously, so I switched to PowerShell, which I much prefer anyway.

The script has a number of options but I would suggest the easiest way to use it is to have it run at logon with the parameters -waitForOutlook and -wait. The former means it will wait for an Outlook process to start before it begins checking, although it doesn’t strictly have to since it uses COM to instantiate an Outlook instance of its own anyway, and -wait means it will loop around rather than performing one check and exiting.

If it finds draft emails created in the last seven days, although this can be changed via the -withinDays option, a popup will be displayed on top of all other windows, asking if you want to open them:

[Image: Outlook drafts reminder popup]

Clicking “Yes” will result in the emails being opened, giving you the opportunity to finally finish and send them. Selecting “No” will either cause the script to exit, if the -wait option isn’t specified, or put it to sleep until either a new Outlook instance appears, for instance because you close the current one and at some point start another, or until the nag timer expires. The nag option, triggered by using the -nag parameter with a value in minutes, will cause the script to remind you, via the popup, that there are drafts that could probably do with your attention.
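At its heart the draft detection is just Outlook COM automation. A minimal sketch of the idea, not the script’s exact code, where 16 is the well known olFolderDrafts constant:

$daysBack = 7                                            # the equivalent of -withinDays
$outlook  = New-Object -ComObject 'Outlook.Application'  # attaches to a running Outlook or starts one
$drafts   = $outlook.Session.GetDefaultFolder(16)        # 16 = olFolderDrafts

$recent = @($drafts.Items | Where-Object { $_.CreationTime -gt (Get-Date).AddDays(-$daysBack) })
if ($recent.Count) {
    # the script shows a topmost popup at this point; here we simply open the drafts
    $recent | ForEach-Object { $_.Display() }
}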

As I believe the best way to run this is to have it run at logon and then continue to check for draft emails, I added options to install and uninstall it in the registry so that it is run at logon, saving you the hassle of doing this yourself. Running the following command line will create a registry value, named after the -install argument, in HKCU\Software\Microsoft\Windows\CurrentVersion\Run, or in HKLM if you specify the -allusers option to have it run for all users:

& '.\Find Outlook drafts.ps1' -waitForOutlook -withinDays 7 -wait -install "Outlook Drafts Checker" -nag 120

This will nag the user about drafts created in the last seven days as soon as Outlook is launched, and then nag again either if Outlook is relaunched in that session or every two hours. Alternatively, it could be set up as a scheduled task if preferred, but you lose some of its responsiveness, such as being able to nag immediately when a new Outlook process for that user is detected.

If you need to remove this autorun, simply run with -uninstall “Outlook Drafts Checker”.

The script is available on GitHub here and you use it entirely at your own risk, although there’s not exactly a great deal of damage it can wreak. None in fact, other than perhaps you finally finishing and sending an email that perhaps you shouldn’t, but don’t blame the script for that; after all, you can always delete draft emails rather than send them!

Showing Current & Historical User Sessions

One of my pet hates, other than hamsters, is when people log on to infrastructure servers, which provide a service to users either directly or indirectly, to run a console or command when that item is available on another server which isn’t providing user services. For instance, I find people log on to Citrix XenApp Delivery Controllers to run the Studio console when, in my implementations, there will always be a number of management servers where all of the required consoles and PowerShell cmdlets are installed. They compound the issue by then logging on to other infrastructure servers to run additional consoles, which is actually more effort for them than just launching the required console instance(s) on the aforementioned management server(s). To make matters even worse, I find they quite often disconnect these sessions rather than log off and have the temerity to leave consoles running in these disconnected sessions! How not to be in my good books!

Even if I have to troubleshoot an issue on one of these infrastructure servers, I will typically access their event logs, services, etc. remotely via the Computer Management MMC snap-in and, if I need to run non-GUI commands, I’ll use PowerShell’s Enter-PSSession cmdlet to remote to the server, which has much less impact than a full blown interactive session via mstsc or similar.

To find these offenders, I used to run quser.exe, which is what the command “query user” calls, with the /server argument against various servers to check if people were logged on when they shouldn’t have been, but I thought I really ought to script it to make it easier and quicker to run. I then also added the ability to select one or more of these sessions and log them off.
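The core of it is simply marrying Get-ADComputer to quser; a stripped down sketch (the real script parses the fixed width output into objects rather than emitting raw lines):

Import-Module ActiveDirectory

ForEach ($computer in (Get-ADComputer -Filter 'Name -like "CTX*"')) {
    quser.exe /server:$($computer.Name) 2>$null |
        Select-Object -Skip 1 |                # skip the column header row
        ForEach-Object { "$($computer.Name): $_" }
}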

It also pulls in details of the “offending” user’s profile in case it’s too big and needs trimming or deleting. I have written a separate script for user profile analysis and optional deletion which is also available in my GitHub repository.

For instance, running the following command:

& '.\Show users.ps1' -name '^ctx2[05]\d\d' -current

will result in a grid view similar to the one below:

[Image: show users grid view]

It works by querying Active Directory via the Get-ADComputer cmdlet, runs quser.exe against all machines named CTX20xx and CTX25yy, where xx and yy are numerical, and displays the results in a grid view. Sessions selected in this grid view when the “OK” button is pressed will be logged off, although PowerShell’s built in confirmation mechanism is used, so if “OK” is accidentally pressed the world probably won’t end because of it.

The script can also be used to show historical logons on a range of servers where the range can be specified in one of three ways:

  1. -last x[smhdwy] where x is a number and s=seconds, m=minutes, h=hours, d=days, w=weeks and y=years. For example, ‘-last 7d’ will show sessions logged on in the preceding 7 days
  2. -sinceboot
  3. -start “hh:mm:ss dd/MM/yyyy” -end “hh:mm:ss dd/MM/yyyy” (if the date is omitted then the current date is used)

For example, running the following:

& '.\Show users.ps1' -ou 'contoso.com/Servers/Citrix XenApp/Production/Infrastructure Servers' -last 7d

gives something not totally unlike the output below where the columns can be sorted by clicking on the headings and filters added by clicking “Add criteria”:

[Image: show users with historical sessions]

Note that the OU is specified in this example as a canonical name, so can be copied and pasted out of the properties tab for an OU in AD Users and Computers rather than you having to write it in distinguished name form, although it will accept that format too. It can take a -group option instead of -ou and will recursively enumerate the given group to find all computers and the -name option can be used with both -ou and -group to further restrict what machines are interrogated.

The results are obtained from the User Profile Service operational event log and can be written to file, rather than being displayed in a grid view, by using the -csv option.
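For the curious, pulling historical logons from that event log looks roughly like this, assuming event IDs 1 (logon notification) and 4 (logoff notification) and a hypothetical server name:

Get-WinEvent -ComputerName 'ctxinfra01' -FilterHashtable @{
    LogName   = 'Microsoft-Windows-User Profile Service/Operational'
    Id        = 1, 4
    StartTime = (Get-Date).AddDays(-7)    # the equivalent of -last 7d
} | Select-Object -Property TimeCreated, Id, UserId    # UserId is the user's SID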

Sessions selected when “OK” is pressed will again be logged off although a warning will be produced instead if a session has already been logged off.

If you are looking for a specific user, then this can be specified via the -user option which takes a regular expression as the argument. For instance adding the following to the command line:

-user '(fredbloggs|johndoe)'

will return only sessions for usernames containing “fredbloggs” or “johndoe”.

Although I wrote it for querying non-XenApp/RDS servers, you can point it at those too rather than using tools like Citrix Director or EdgeSight, as long as the account you use has sufficient privileges.

The script is available on GitHub here and use of it is entirely at your own risk, although if you run it with the -noprofile option it will not show the OK and Cancel buttons so logoffs cannot be initiated from the script. It requires a minimum of version 3.0 of PowerShell, needs access to the Active Directory PowerShell module and pulls data from servers from 2008 R2 upwards.

If you are querying non-English operating systems, there may be an issue since the script parses the fixed width output of the quser command using the column headers, namely ‘USERNAME’, ‘SESSIONNAME’, ‘ID’, ‘STATE’, ‘IDLE TIME’ and ‘LOGON TIME’ on an English OS. You may need to either edit the script or specify the column names via the -fieldNames option.

Profile Cleaner Utility

We EUC consultants can spend a considerable amount of time deciding on and building the most suitable user profile mechanism for our Citrix, VMware and RDS deployments, but very little, if any, time is spent doing the same for infrastructure servers. I’m not saying that this is an issue – it isn’t generally – as most people take the out of the box default, which is local profiles. However, over time as people leave we can get disk space issues caused by these stale profiles, and even when people haven’t left, their profiles can become large without them realising, which can potentially impact the performance of these servers since a machine with a full file system generally doesn’t function well. The script can of course also be used on persistent XenApp/RDS servers to check for and delete stale or oversized profiles there.

Having checked this manually for rather too long, I decided to write a script to give visibility of local profiles across a range of machines pulled from Active Directory, where the machines to interrogate can be selected by a regular expression matching their name, by an organisational unit (e.g. copied to the clipboard from the properties of an OU in the AD Users and Computers MMC snap-in) or by an AD group.

This actually turned out to be easier than I anticipated, for once, in that I didn’t have to go anywhere near the ProfileList registry key directly, since there is a WMI class, Win32_UserProfile, which contains the required information, albeit with the profile owner as a SID rather than a username, but in PowerShell it’s easy to get the username for a SID. I’ve pulled out what I think are the most useful fields, but if you were to use it, say, for persistent XenApp servers using roaming profiles then you might want to pull out more of the fields.
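A minimal sketch of that discovery, with a hypothetical server name; the Special = FALSE filter drops system profiles such as LocalService:

ForEach ($profile in (Get-WmiObject -Class Win32_UserProfile -ComputerName 'ctxinfra01' -Filter 'Special = FALSE')) {
    $account = try {
        # translate the SID to DOMAIN\username
        ([System.Security.Principal.SecurityIdentifier]$profile.SID).Translate([System.Security.Principal.NTAccount]).Value
    } catch { $profile.SID }    # the account may have been deleted
    [pscustomobject]@{
        Account     = $account
        ProfilePath = $profile.LocalPath
        LastUsed    = $(if ($profile.LastUseTime) { $profile.ConvertToDateTime($profile.LastUseTime) })
    }
}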

The script requires the Active Directory PowerShell module to be present where the script is run from, since it queries AD to retrieve various properties for the domain users associated with the profiles, making it easy to spot users who may have left because their AD account is disabled or their last AD logon was a long time ago.

Thanks to the great PowerShell Out-GridView cmdlet, it was straightforward to take the list of user profiles selected when the “OK” button was clicked in the grid view and then delete those profiles, albeit with PowerShell prompting for confirmation before deletion. The deletion is achieved by calling the Delete() method of the Win32_UserProfile WMI object previously returned for that profile. Obviously the script needs to be run under an account that has the rights to remotely delete profiles.
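The deletion itself, shown here against a hypothetical profile path (note the doubled backslashes that WQL requires):

$profile = Get-WmiObject -Class Win32_UserProfile -ComputerName 'ctxinfra01' -Filter "LocalPath = 'C:\\Users\\leaver'"
$profile.Delete()    # removes both the profile folder and its registry entry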

It’s very simple to use; for example, running the script with the following options will produce a grid view where any profiles that you want to delete can be selected and the OK button pressed to delete them:

& '.\Profile Cleaner.ps1' -excludeLocal -excludeUsers '[^a-z]SVC-[a-z]' -name '^CTX\d{4}'

[Image: profiles to delete]

This will exclude all local, as in non-domain, accounts and any accounts whose username starts with SVC-, as these may be service accounts that are best left well alone, unless the profile size is of concern. It will run against all servers named CTXxxxx, where xxxx is numerical, as specified by the regular expression, aka regex, which really isn’t that scary, honest!

An OU, either in canonical or distinguished name format, or AD group can be specified via the -OU and -group options respectively. The -name option can also be specified with either of these to restrict what machines are returned from the OU or group specified.

It will write the profile information to a csv file if the -csv option is specified instead of displaying it in a grid view.

Run with -verbose to get more detail as it runs, such as which machine it is querying. It may seem to run slowly but that is most likely because it has to traverse each user’s profile in order to determine its size.

The script is available for download from GitHub here and you use it entirely at your own risk.

This is very much an interactive tool – if you need an automated mechanism for removing profiles then I would recommend looking at the delprof2 tool from Helge Klein which is available here.

Memory Control Script – Capping Leaky Processes

In the third part of the series covering the features of a script I’ve written to control process working sets (aka “memory”), I will show how it can be used to prevent leaky processes from consuming more memory than you believe they should.

First off, what is a memory leak? For me, it’s trying to remember why I’ve gone into a room but, in computing terms, it is when a developer has dynamically allocated memory in their programme but not subsequently informed the operating system that they have finished with that memory. Older programming languages, like C and C++, have no built in garbage collection, so memory which is no longer required is not automatically released. Note that just because a process’s memory increases but never decreases doesn’t actually mean that it is leaking – it could be holding on to the memory for reasons that only the developer knows.

So how do we stop a process from leaking? Well, short of terminating it, we can’t as such, but we can limit the impact by forcing it to relinquish other parts of its allocated memory (working set) in order to fulfil memory allocation requests. What we shouldn’t do is deny the memory allocations themselves, which we could actually do with hooking methods like Microsoft’s Detours library. This is because if a memory allocation fails, the developer probably can’t do a lot about it other than output an error to that effect and exit – and that’s assuming they even bother checking the return status of the allocation before using it, which, when they don’t, results in the infamous error “the memory referenced at 0x00000000 could not be read/written” (aka a null pointer dereference).

What we can do, or rather the OS can do, is apply a hard maximum working set limit to the process. This means the working set cannot grow above the limit, so if more memory is required, part of the existing working set must be paged out. The memory paged out is the least recently used, which is very likely to be the memory the developer forgot to release, so it won’t be used again and can sit in the page file until the process exits. The result is increased page file usage but decreased RAM usage, which should help performance and scalability and reduce the need for reboots or manual intervention.
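Hard limits are applied via the Windows API SetProcessWorkingSetSizeEx with the QUOTA_LIMITS_HARDWS_MAX_ENABLE flag, which is presumably what the script wraps. A hand rolled illustration, with a hypothetical process name:

Add-Type -Namespace 'Win32' -Name 'Kernel32' -MemberDefinition @'
[DllImport("kernel32.dll", SetLastError = true)]
public static extern bool SetProcessWorkingSetSizeEx(IntPtr hProcess, UIntPtr dwMin, UIntPtr dwMax, uint flags);
'@

$QUOTA_LIMITS_HARDWS_MIN_DISABLE = 0x2    # leave the minimum as a soft limit
$QUOTA_LIMITS_HARDWS_MAX_ENABLE  = 0x4    # enforce the maximum as a hard limit

$process = Get-Process -Name 'leakyprocess' | Select-Object -First 1
[Win32.Kernel32]::SetProcessWorkingSetSizeEx($process.Handle,
    [UIntPtr][uint64]1MB, [UIntPtr][uint64]100MB,
    $QUOTA_LIMITS_HARDWS_MIN_DISABLE -bor $QUOTA_LIMITS_HARDWS_MAX_ENABLE)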

Applying a hard working set limit is easy with the script, the tricky part is knowing what value to set as the limit – too low and it might not just be leaked memory that is paged out so performance could be negatively affected due to hard page faults. Too high a limit and the memory savings, if the limit is ever hit, may not be worth the effort.

To set a hard working set limit on a process we run the script thus:

.\trimmer.ps1 -processes leakyprocess -hardMax -maxWorkingSet 100MB

or, if the process has yet to start, we can use the waiting feature of the script, together with the -alreadyStarted option in case the process has in fact already started:

.\trimmer.ps1 -processes leakyprocess -hardMax -maxWorkingSet 100MB -waitFor leakyprocess -alreadyStarted

You will then observe in task manager that its working set never exceeds 100MB.

To check that hard limits are in place, you can use the reporting option of the script, since tools like Task Manager and SysInternals Process Explorer won’t show whether any limits are hard ones. Run the following:

.\trimmer.ps1 -report -above 0

which will give a report similar to this where you can filter where there is a hard working set limit in place:

[Image: report showing hard working set limits]

There is a video here which demonstrates the script in action and uses task manager to prove that the working set limit is adhered to.

One way to implement this for a user would be to have a logon script that uses the -waitFor option as above, together with -loop so that the script keeps running and picks up further new instances of the process to be controlled. To implement it for system processes, such as a leaky third party service or agent, use the same approach but in a computer start-up script.

Once implemented, check that hard page fault rates are not impacting performance because the limit you have imposed is too low.

The script is available here and use of it is entirely at your own risk.

Changing/Checking Citrix StoreFront Logging Settings

Enabling, capturing and diagnosing StoreFront logs is not something I have to do often, but when I do, I find it time consuming to enable, and disable, logging across multiple StoreFront servers and also to check on the status of logging, since Citrix provide cmdlets to change tracing levels but not, as far as I can tell, to query them.

After looking at reported poor performance of several StoreFront servers at one of my customers, I found that two of them were set for verbose logging, which wouldn’t have been helping. I therefore set about writing a script that allows the logging (trace) level to be changed across multiple servers and also reports on the current logging levels. I use the plural as there are many discrete modules within StoreFront and each can have its own log level and log file.

So which module needs logging enabled? The quickest way, which is all the script currently supports, is to enable logging for all modules. The Citrix cmdlet that changes trace levels, namely Set-DSTraceLevel, can seemingly be used more granularly, but I have found insufficient detail to be able to implement that in my script.
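For reference, enabling verbose tracing on a single server by hand goes something like this, assuming the default install path; the ImportModules.ps1 script that ships with StoreFront loads the required cmdlets:

# run on the StoreFront server itself
& 'C:\Program Files\Citrix\Receiver StoreFront\Scripts\ImportModules.ps1'
Set-DSTraceLevel -All -TraceLevel Verbose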

The script works with clustered StoreFront servers in that you can specify just one of the servers in the cluster via the -servers option, together with the -cluster option, which will (remotely) read the registry on that server to find where StoreFront is installed so that the required cmdlets can be loaded to retrieve the list of all servers in the cluster.

To set the trace level on all servers in a StoreFront cluster run the following:

& '.\StoreFront Log Levels.ps1' -servers storefront01 -cluster -traceLevel Verbose

The available trace levels are:

  • Off
  • Error
  • Warning
  • Info
  • Verbose

To show the trace levels, without changing them, on these servers and check that they are consistent on each server and across them, run the following:

& '.\StoreFront Log Levels.ps1' -servers storefront01 -cluster -grid

which will give a grid view similar to this:

[Image: StoreFront log settings grid view]

It will also report the version of StoreFront installed although the -cluster option must be used and all servers in the cluster specified via -servers if you want to display the version for all servers.

The script is available here and you use it entirely at your own risk although I do use it myself on production StoreFront servers. Note that it doesn’t need to run on a StoreFront server as it will remote commands to them via the Invoke-Command cmdlet. It has so far been tested on StoreFront versions 3.0 and 3.5 and requires a minimum of PowerShell version 3.0.

Once you have the log files, there’s a script introduced here that stitches the many log files together and then displays them in a grid view, or csv, for easy filtering to hopefully quickly find anything relevant to the issue being investigated.

For those of an inquisitive nature, the retrieval side of the script works by calling the Get-DSWebSite cmdlet to get the StoreFront web site configuration, which includes the applications, and for each of these it finds the settings by examining the XML in its web.config file.
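By way of illustration, reading trace switch values out of a web.config can be done like this, assuming the settings live in the standard system.diagnostics switches section (the path is just an example):

$config = [xml](Get-Content -Path 'C:\inetpub\wwwroot\Citrix\Roaming\web.config')
$config.configuration.'system.diagnostics'.switches.add |
    Select-Object -Property name, value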

Don’t forget to return logging levels to what they were prior to your troubleshooting, although I would recommend leaving them set to “Error” rather than “Off”.