Data Miscellaneous

My boss proposes infeasible projects and doesn’t like confrontation. Advice?

Hey guys, so I’m working as a Data Scientist in real estate. My boss proposes projects built on insanely large datasets he’s thrown into Google BigQuery.

He has a very limited background in CS, and I’ve had to optimize a lot of his code / ETL pipelines just to make everyone else’s life easier. He got the position because the company was started by a group of friends, and he designated himself head of data science.

He’s proposing ideas that would only be feasible in an optimal setting with a large budget. I’ve talked to him about it, and he’s dismissed my concerns.

Additional alarming concerns: 1) he was amazed that I used the terminal; 2) he doesn’t understand basic linear algebra.

I’m concerned about my job security. If I can’t deliver on my boss’s project proposals and ideas, I’ll be let go.

Please, I’d love some advice. Thanks Gang!

submitted by /u/expatwithajetpack

Source: Reddit Data Science

How do you peer review Data Science projects?

Hey everyone. 🙂

I'm Shay (written erroneously; pronounced Shy), a data science consultant out of Israel.

I've been dedicating a lot of thought to peer review in small data science teams (like the ones I used to run, and now consult for). Sure, some of it entails reviewing code, but a lot of our work products and processes are different, and require, I believe, a dedicated peer review process.

I'd love to hear your thoughts on the topic. Is peer review a regular part of the work process in your team? Have you reviewed or been reviewed by a peer? What is your approach? What do you feel is still missing? Have you encountered any structured approaches to this process that are unique to DS/ML teams – especially small ones, where 1 project = 1 data scientist?

If you're interested in my approach so far – which we have started implementing at one of my clients, and which I've already used to review a DS project – you are more than welcome to take a look at this blog post and shout at me for all of my mistakes (friend link, so no paywall): 😗
https://medium.com/@shay.palachy/peer-reviewing-data-science-projects-7bfbc2919724?source=friends_link&sk=914d618224f713cbcabf1f6ead3ba3d9

Cheers (and Coronavirus),
Shay

submitted by /u/shaypal5

Source: Reddit Data Science

[R] [P] Generating Tabular Data with GANs/VAEs for datasets with both Continuous and Discrete Features

I'm not aware of any current techniques that convincingly address this problem, but after Googling around for hours I found the following resources.

This post has a really nice idea: it combines several softmax outputs (one for each discrete feature) with continuous outputs at the end of the GAN's generator.

https://medium.com/jungle-book/towards-data-set-augmentation-with-gans-9dd64e9628e6
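A minimal sketch of that mixed output layer, assuming the continuous columns have been scaled to [-1, 1] (so `tanh` fits) and that the columns are ordered continuous-first, then each discrete feature's categories. NumPy stands in for the last layer of a real PyTorch or TensorFlow generator, and the function name `mixed_output_head` is hypothetical, not taken from the linked post:

```python
import numpy as np


def softmax(z, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def mixed_output_head(raw, n_continuous, category_sizes):
    """Map raw generator outputs to a mixed continuous/discrete row (hypothetical helper).

    raw: array of shape (batch, n_continuous + sum(category_sizes))
    n_continuous: number of continuous columns, squashed to (-1, 1) with tanh
    category_sizes: number of categories for each discrete column
    """
    expected = n_continuous + sum(category_sizes)
    if raw.shape[1] != expected:
        raise ValueError(f"expected {expected} columns, got {raw.shape[1]}")
    parts = [np.tanh(raw[:, :n_continuous])]
    start = n_continuous
    for k in category_sizes:
        # One softmax per discrete feature, normalised over that feature's k categories only.
        parts.append(softmax(raw[:, start:start + k]))
        start += k
    return np.concatenate(parts, axis=1)


# Example: 2 continuous columns, then discrete columns with 3 and 5 categories.
rng = np.random.default_rng(0)
sample = mixed_output_head(rng.normal(size=(4, 10)), n_continuous=2, category_sizes=[3, 5])
```

Each discrete block is a probability vector, so the discriminator compares it against the real data's one-hot encoding; at sampling time one would typically take the argmax of each block to recover a category.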

This also mentions using a VAE to generate prototypes for counterfactual explanations, but I don't think it's as relevant.

https://docs.seldon.io/projects/alibi/en/stable/methods/CFProto.html

https://docs.seldon.io/projects/alibi/en/stable/examples/cfproto_cat_adult_ohe.html

MIT also has a nice repository for doing this, but it's in TensorFlow and not easy to take apart and extend for research etc.

https://github.com/sdv-dev/CTGAN

I don't suppose anyone has any suggestions as to which of these methods might be best? I'm leaning towards the first one, but surprisingly no one has published a paper doing it, so I'd have to code it myself, and it'd be hard to justify in a conference review process by citing an internet article and saying that "some guy on the internet said it worked well, so we did the same thing here".

I want to use such a generative model for research into explainable AI, but I've never surveyed this literature before, and it's pretty hectic. Thanks for any responses.

submitted by /u/skeering

Source: Reddit Machine Learning

Quarantined in Spain

Hi everyone! The coronavirus situation over here is pretty wild, so I'll be forced to postpone my Big Data master's from the end of April to October. I would hate to spend the next six months doing absolutely nothing, but I'm also racking my brain trying to find something related to data science (education aside) rather than just any random job. For context, I have a BS in Economics, so I studied a lot of econometrics but did zero programming. I'd appreciate any ideas you have, and I hope you're safe wherever you are!

Edit: I said "education aside" because I've already found resources in that area that I'm excited about 🙂

submitted by /u/elenabdt

Source: Reddit Data Science

[Project] Amerikana: A Python decoder module for simpler ImageNet synsets!

Hey guys, I saw a cool project the other day that used simpler ImageNet labels instead of the current ImageNet or Keras synsets. So I went ahead and made a decoder module that is 100% tf.keras-compatible and can be used as a drop-in replacement for tf.keras decode_predictions!

Happy detecting.

PS: The GitHub repo is here: Amerikana
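The post doesn't show Amerikana's actual API, so here is a hypothetical sketch of what "drop-in replacement" means in practice: the same call shape as `tf.keras` `decode_predictions(preds, top=5)` and the same return format (one list per sample of `(synset_id, label, score)` tuples, sorted by descending score), with only the label strings swapped for simpler ones. The three-entry `SIMPLE_LABELS` table and its simplified names are placeholders for illustration; a real table would cover all 1000 ImageNet classes.

```python
import numpy as np

# Hypothetical index -> (synset id, simplified label) table; placeholder entries only.
SIMPLE_LABELS = {
    0: ("n01440764", "fish"),
    1: ("n01443537", "fish"),
    2: ("n01484850", "shark"),
}


def decode_predictions(preds, top=5, labels=SIMPLE_LABELS):
    """Mimic the tf.keras decode_predictions return format with a custom label table."""
    preds = np.asarray(preds)
    if preds.ndim != 2 or preds.shape[1] != len(labels):
        raise ValueError(f"expected shape (samples, {len(labels)}), got {preds.shape}")
    results = []
    for row in preds:
        # Indices of the `top` highest scores, best first.
        top_idx = np.argsort(row)[::-1][:top]
        results.append([(labels[int(i)][0], labels[int(i)][1], float(row[i])) for i in top_idx])
    return results


decoded = decode_predictions(np.array([[0.1, 0.7, 0.2]]), top=2)
```

Because the return structure matches, code that already loops over `decode_predictions` output should not need to change; only the human-readable label differs.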

submitted by /u/gwolf3

Source: Reddit Machine Learning