[R] [P] Generating Tabular Data with GANs/VAEs for datasets with both Continuous and Discrete Features


I'm not aware of any current techniques that convincingly address this problem, but after hours of Googling I found the following resources.

The article below has a really nice idea: it combines several softmax outputs (one for each discrete feature) with continuous outputs at the end of the GAN.

https://medium.com/jungle-book/towards-data-set-augmentation-with-gans-9dd64e9628e6
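To make the idea concrete, here is a minimal sketch of what such an output head might look like, written in plain Python so it runs without a deep learning framework. It is my own reading of "one softmax per discrete feature plus continuous outputs", not code from the article. The function name `mixed_output_head`, the column layout (continuous values first, then one block of logits per discrete feature), and the example cardinalities are all assumptions for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_output_head(raw, n_continuous, cardinalities):
    """Turn a generator's raw final-layer vector into one mixed-type row.

    raw           -- flat list of floats from the last linear layer
    n_continuous  -- number of continuous features (passed through unchanged)
    cardinalities -- number of categories for each discrete feature
    (Hypothetical layout: continuous values first, then one logit block
    per discrete feature.)
    """
    expected = n_continuous + sum(cardinalities)
    if len(raw) != expected:
        raise ValueError(f"expected {expected} values, got {len(raw)}")

    continuous = list(raw[:n_continuous])

    # Apply a separate softmax to each discrete feature's block of logits,
    # so every block is its own probability distribution.
    discrete = []
    start = n_continuous
    for k in cardinalities:
        discrete.append(softmax(raw[start:start + k]))
        start += k
    return continuous, discrete


# Stand-in for a generator's raw output: 2 continuous features,
# then discrete features with 3 and 4 categories (made-up sizes).
rng = random.Random(0)
raw = [rng.gauss(0.0, 1.0) for _ in range(2 + 3 + 4)]
continuous, discrete = mixed_output_head(raw, 2, [3, 4])
```

In a real GAN the same split would sit on top of the generator's last linear layer, and the discriminator would see the continuous values concatenated with the per-feature probability vectors. To get an actual categorical value at sampling time you would take the argmax of each block or sample from it.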

The docs below also mention using a VAE to generate prototypes for counterfactual explanations, but I don't think it's as relevant.

https://docs.seldon.io/projects/alibi/en/stable/methods/CFProto.html

https://docs.seldon.io/projects/alibi/en/stable/examples/cfproto_cat_adult_ohe.html

MIT also has a nice repository for doing this, but it's in TensorFlow and not easy to take apart and extend for research.

https://github.com/sdv-dev/CTGAN

I don't suppose anyone has suggestions on which of these methods might be best? I'm leaning towards the first one, but surprisingly no one has published a paper doing it, so I'd have to code it myself. It would also be hard to justify in a conference review process if all I can cite is an internet article, i.e. "some guy on the internet said it worked well, so we did the same thing here".

I want to use such a generative model for research into explainable AI, but I've never surveyed this literature before and it's pretty hectic. Thanks for any responses.

submitted by /u/skeering

Source: Reddit Machine Learning
